Extract emails from a site using email extractor
When this mode is selected program will search for E-mails starting from the page you entered in Starting URL field.
Starting URL Type in URL from which search for emails begins
Search limitations
Search depth means how deep application will go into the site searching for E-mails.
Depth 0 means that only the first page will be processed, Depth 1 means first page and links on it are processed and so on as shown on the picture below:
Process external links – weather to process links which are external to starting page (with different domain than starting url, for example if start page is CNN.com all links without CNN.com in its url are external)
Searching depth for external links – same thing as Search depth but tells how deep to go into external sites. There are no All pages option here because then it would be impossible to finish search at all, even if depth is greater then 0 or 1 searching can take very long.
URL filter
Here you can specify which words (or any text) URL must or must not contain.
This way you can limit search to some specific pages or skip some pages which are not of interest. Using this option search for E-mails is much more effective.
For example if a site has a few 1000s of pages searching all of them can be time consuming.
But if you know in advance that some pages contain many emails and others has no emails at all you can limit searching accordingly.
Imagine a site with articles, where every article contains Name and E-mail of the author and articles are stored on the site in a directory called “articles”.
Then URLs for articles would be something like this:
www.somesite.com/articles/text1.htm
www.somesite.com/articles/text2.htm
…
Putting word “articles” in URL must contain field will limit the search only to pages of that interest us.
Also a site may have pages with no E-mails at all like:
www.somesite.com/images/description1.htm
www.somesite.com/images/description2.htm
…
Then why searching all that pages? Just put word “images” in URL must not contain field and pages with image descriptions are skipped saving a lot of time.