Scrape Email Lists with ScrapeBox, Part 1
Today, we will learn how to scrape an unlimited number of targeted e-mail addresses from forums using Scrapebox. Before we continue, you should know that there are anti-spam laws, such as the CAN-SPAM Act in the US, which require you to provide a way for the people on your list to unsubscribe from your e-mails, and many jurisdictions also require you to use opt-in lists. If you are thinking about e-mail marketing, make sure you understand these rules and proceed at your own risk.
So the first thing we will need is a target site. Forums work great for this. You want to end up with a large list of URLs which potentially contain e-mail addresses. In this step, we are going to be looking for either a large forum that has a numerical URL structure or a large forum that has a keyword URL structure.
Keyword URL Structure
If you are scraping from a forum that has a keyword URL structure, you’re going to head over to Scrapebox’s harvester. First, open up Notepad and enter the following:
Basically, you’re going to want to enter the entire URL up until the point where it begins to change. Look at a few different forum threads and see which part of the URL is constant. Take that and add it to your inurl: function. Add a space and then as many related keywords as you possibly can. Use keywords which are likely to come up often in that particular forum. For example, depending on the forum’s topic, use keywords like
And so on…
*Note: You can use the keyword suggestion tool found in Scrapebox for this.
Add all of these keywords to the harvester window in Scrapebox.
Now, we will use the merge function to combine our keyword list with our URL. Click the “M” button on the top left of Scrapebox and you should end up with a long list in which your URL function is attached to each keyword. Check custom footprint and begin scraping.
*Note: You will be scraping Google for this so make sure you have proxies loaded and tested.
When you’re finished, you should have a large number of URLs. Save this list.
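For reference, here is a minimal Python sketch of what the merge step produces, assuming each merged line is simply the inurl: footprint, a space, and one keyword. The footprint and keywords shown are hypothetical placeholders, and this is not Scrapebox's own code.

```python
def merge_footprint(footprint, keywords):
    """Combine one search footprint with each keyword, one query per keyword."""
    queries = []
    for keyword in keywords:
        keyword = keyword.strip()
        if keyword:  # skip blank lines in the keyword list
            queries.append(f"{footprint} {keyword}")
    return queries


# Hypothetical footprint and keywords, for illustration only.
footprint = "inurl:example.com/forum/threads/"
keywords = ["gardening", "tomato plants", ""]

for query in merge_footprint(footprint, keywords):
    print(query)
```

Each printed line has the same shape as one entry of the merged list you would scrape with.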
Numerical URL Structure
Before we go any further, I want to show you how to capture all of the web pages from a forum which uses a numerical URL structure. What I mean by this is a forum which uses thread URLs such as:
This is an ideal situation because we will be able to extract every single thread URL on the forum. To do this, first you have to open up Excel. Identify the range of numbers in the URL and create them in Excel.
When you do this, use as small a range as possible. If you look at those two URLs above, it is likely that we only want to focus on the last four or five digits of the URL. Anything beyond that is likely either to not exist or to be too old to be relevant.
Enter the lowest number in your range in cell A1. In A2, enter =A1+1
Copy A2 and paste it down column A until you have the desired amount of numbers.
Enter the “common” part of the URL in B1 and paste it all the way down column B.
Enter =B1&A1 in cell C1
Paste C1 down the column.
Now you have the full list of URLs, which we will use to extract e-mails with Scrapebox.
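If you would rather not use a spreadsheet, the same list can be built with a few lines of Python. This is only a sketch of the steps above: the common URL part, the starting number, and the count are hypothetical values you would replace with your own.

```python
def build_urls(common_part, lowest, count):
    """Append consecutive numbers to the common part of the URL."""
    # range() plays the role of the number column; the f-string joins
    # the common part and the number the same way =B1&A1 does.
    return [f"{common_part}{number}" for number in range(lowest, lowest + count)]


# Hypothetical values, for illustration only.
urls = build_urls("http://example.com/forum/showthread.php?t=", 1000, 5)

for url in urls:
    print(url)
```

Keeping the count small follows the same advice as above: use as narrow a range of thread numbers as you can.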
In either case, we have our URLs. Tomorrow, we will learn how to scrape the emails from our URL list with Scrapebox.