Scrapebox is notorious for scraping Google and harvesting data. It has been available for years and remains one of the best tools for essential SEO tasks.

However, it’s commonly referred to as a spam link building tool because it also provides mass blog comment functionality. This has given the software a bad rap in the SEO scene, but what many don’t realize is how effective Scrapebox can be for white hat SEO. You can use Scrapebox to speed up tasks that would otherwise take forever to do manually, and it comes out-of-the-box with features you simply can’t get in any other tool.

1. Basic Scrapebox Tasks

Scrapebox can handle dozens of basic tasks, either built right into the software or through one of its many free plugins. These tasks may not seem like much right now, but in day-to-day SEO work they can be incredibly helpful.

– Check Domain and Page Authority

Scrapebox provides a free add-on that utilizes the Moz API for checking domain metrics like Domain Authority and Page Authority. For white hat link building, these metrics are still very important as they help you decide which sites are worthwhile for guest post prospecting and other legitimate outreach.

– Scrape Emails/Phone Numbers

Don’t forget Scrapebox’s email and phone number grabber. This simple tool works great for scraping leads off small business directories, websites, communities or anywhere that your customers can be found online.

– Check If Indexed

A backlink will only benefit your search engine rankings if it has been crawled and indexed. When you’re building links through legitimate outreach, this generally won’t be an issue. However, Scrapebox lets you bulk check whether your backlinks have been indexed in Google, which can be useful when you’re building a large number of links.

– Remove Duplicates/Trim

Scrapebox lets you clean up sites you have scraped by trimming URLs down to their root domain and removing duplicates. These are simple little tools, but having them at your disposal can save you an immense amount of time.
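Outside of Scrapebox, the same trim-and-dedupe step is easy to reproduce. Here is a minimal Python sketch (the `urls` list is just example data, not Scrapebox output):

```python
from urllib.parse import urlsplit

def trim_to_root(urls):
    """Trim each URL to its root domain and drop duplicates, preserving order."""
    seen, roots = set(), []
    for url in urls:
        parts = urlsplit(url)
        root = f"{parts.scheme}://{parts.netloc}/"
        if root not in seen:
            seen.add(root)
            roots.append(root)
    return roots

urls = [
    "https://example.com/blog/post-1",
    "https://example.com/about",
    "https://another-site.org/guest-post-guidelines",
]
print(trim_to_root(urls))
# ['https://example.com/', 'https://another-site.org/']
```

Keeping the first occurrence of each root domain means the deduped list stays in the same order the URLs were scraped in.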

– Outbound Link Checker

This add-on allows you to easily view the number of outbound links from a list of URLs. You can export the results and use the data for link building opportunities on low OBL pages.

– Alive Check

Useful for checking which pages are still live in a list of URLs. Alive Check can also be utilized for expired domain checking by scanning indexed domains for 404 errors.

– Dofollow Test

A dofollow backlink passes authority to your website and is considerably more valuable in terms of SEO. Scrapebox provides a free add-on for dofollow/nofollow testing.
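Under the hood, a dofollow test is just a matter of inspecting each anchor tag's `rel` attribute. A minimal sketch using Python's standard-library HTML parser (the sample HTML is invented for illustration):

```python
from html.parser import HTMLParser

class RelChecker(HTMLParser):
    """Collect (href, is_dofollow) pairs from a page's <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        self.links.append((attrs.get("href"), "nofollow" not in rel))

def dofollow_links(html):
    checker = RelChecker()
    checker.feed(html)
    return checker.links

html = '<a href="/a" rel="nofollow">x</a><a href="/b">y</a>'
print(dofollow_links(html))
# [('/a', False), ('/b', True)]
```

A link counts as dofollow whenever `rel` is absent or doesn't contain the `nofollow` token, which matches how search engines interpret the attribute.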

– Link Extractor

Similar to the Outbound Link Checker, the Link Extractor add-on not only checks how many internal and external links exist on a page but it also allows you to export those links and save them to a file.

– Vanity Name Checker

Expired web 2.0 properties can be used to build your own free private blog network. The vanity name checker works similarly to Alive Check, scanning a list of web 2.0 domain names and checking for 404 errors to see if they have expired.

– Social Checker

Checks a list of URLs for social metrics such as Facebook, LinkedIn, Pinterest and Google +1. You can export to a file and use this for analysis of your own websites or competitors.

– Page Scanner

Use custom footprints to scan and extract data from a list of URLs.

– Google Competition Finder

The Google Competition Finder checks the number of pages indexed in Google for a list of keywords. This is a simple approach to keyword research, but it provides a reasonable first estimate of competition.

– Anchor Text Checker

Check a list of backlinks and see which keywords are used as the anchor text for your links. Anchor text diversity is very important to natural link building and this add-on makes it easy to evaluate your pre-existing backlinks.

– Social Account Scraper

The Social Account Scraper scans a list of URLs and retrieves the social accounts for those businesses. This add-on finds published profiles on Twitter, Facebook, LinkedIn, Google+ and Pinterest.

– Google META Scraper

Easily scrape the META details from Google search results. You can enter any keyword or even a list of keywords and this add-on will retrieve the titles, descriptions and URLs for the Google search results.

– Whois Scraper

Export the registrant names, emails and domain creation dates for a list of URLs. Whois details can be used for competitor research, outreach and a variety of other tasks.

– Google Cache Extractor

Find the exact Google cache date for a list of URLs and easily export this data to a file. The cache date is the most recent date that Google crawled the page and captured its content.

– Google Image Scraper

Need a large number of images for link building or other purposes? The Google Image Scraper will scrape hundreds to thousands of relevant images.

– Malware and Phishing Filter

Trim your URLs by removing sites that have malware or phishing detected. You can also use this add-on for contacting websites with malware and offering a fix in exchange for a link back to your site.

– Alexa Rank Checker

Alexa is a metric that represents the traffic a website receives. A lower Alexa rank means a website has a larger audience and a steady flow of traffic. You can use Alexa scores to target sites with potential to drive inbound traffic.

– DupRemove

Remove duplicates in massive URL lists that contain up to 180 million lines. If you’re working with extremely large lists of websites, DupRemove can remove duplicates without crashing the software.

– TDNAM Scraper

This add-on scrapes the GoDaddy database for domains that are soon to expire in the TDNAM $5 closeout auctions. You can scrape by keyword and domain extension. After scraping, you can easily export the domains back to Scrapebox and check Domain Authority, Page Authority, indexed status, Alexa rank, social metrics and other key data.

– Sitemap Scraper

If you need to retrieve an entire site’s internal URL list, the Sitemap scraper makes this an easy task.
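Sitemaps follow a standard XML schema, so extracting a site's URL list is a straightforward parse of the `<loc>` elements. A minimal sketch (the sample sitemap is invented; in practice you would fetch `https://example.com/sitemap.xml` first):

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def parse_sitemap(xml_text):
    """Return every <loc> URL found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc")]

sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""
print(parse_sitemap(sample))
# ['https://example.com/', 'https://example.com/about']
```

Large sites often publish a sitemap index that points at child sitemaps; the same `<loc>` extraction applies at each level.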

– Mass URL Shortener

Easily shorten a massive list of URLs using services like TinyURL.com.

– YouTube Downloader

Download relevant videos from YouTube, Vimeo, DailyMotion and other video sites. This add-on also retrieves video metrics such as the number of views, likes, dislikes, video upload date, the category it’s published in and more.

– Broken Links Checker

Check a list of URLs for broken links. A broken link is an outbound link that potentially once worked but now points to a page resulting in a 404 or similar error. Broken link checking can be used for link building by recommending the site owner fix the downed link and replace it with a link of your own.

2. Scrapebox Scraping Techniques

Being able to export Google’s results for whatever keywords you want, in bulk, is what Scrapebox does best.

You can use the Scrapebox “harvester” to effectively scrape websites that are indexed in Google based on any footprint and keywords you provide.

A footprint is a single keyword (or a keyword phrase) that you want to be present on every site that you are scraping. For instance, WordPress has a footprint of “Powered by WordPress” that can be found at the bottom of millions of blogs that use WordPress as their content management system. Therefore, “Powered by WordPress” is an example of a pretty good footprint for scraping WordPress sites.

A keyword list is a long list of keyword phrases that will combine with your footprint to perform searches in Google. Scrapebox uses these keywords to scrape Google’s index. The more keywords you have to combine with your footprint, the more results you will be able to scrape off Google. If one of your keywords was “white hat SEO” and your footprint was “Powered by WordPress”, Scrapebox would combine the two to perform a search that looks like: “Powered by WordPress” “white hat SEO”
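The footprint-plus-keyword combination described above is simple string assembly, and sketching it makes the harvester's behavior concrete. A minimal example (the footprint and keywords are taken from the text; the exact query format Scrapebox emits is an assumption):

```python
def build_queries(footprint, keywords):
    """Combine one footprint with each keyword into a Google search query."""
    return [f'{footprint} "{kw}"' for kw in keywords]

queries = build_queries('"Powered by WordPress"', ["white hat SEO", "link building"])
print(queries)
# ['"Powered by WordPress" "white hat SEO"', '"Powered by WordPress" "link building"']
```

With a footprint and a few hundred niche keywords, this expansion is why a bigger keyword list yields more scraped results: each keyword produces its own distinct search.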

You should use keywords that you would expect to find on the type of sites you are scraping. If you want to scrape niche related sites only, you would use a list of keywords that are related to your own niche.

Scrape sites that accept guest posts

To demonstrate how you might use Scrapebox for white hat scraping, consider scraping sites that accept guest posts.

First you need to decide on a few footprints that would be good for scraping sites for guest blogging. Here are some examples:

  • allintitle: guest post guidelines
  • allintitle: guest post requirements
  • allintitle: guest post submission form
  • allintitle: submit guest post

You can then use niche related keywords in your keyword list, which will combine with these footprints and scrape niche sites that accept guest posts.

Scrape sites to manually build links on

Directories and communities with open registration are still viable link building assets. Although less effective today, and often requiring a tiered/pyramid structure to deliver full value, these sites are excellent targets for manual link building.

With so many different varieties of content and ways it can be reformatted, there are lots of directories you can utilize. Below you’ll find some examples of how you can search for these directories using footprints in Google.

Footprints:

  • allintitle: submit share article
  • allintitle: upload share powerpoint
  • allintitle: upload share pdf document
  • allintitle: upload share infographic
  • allintitle: upload share audio

– You can change the content/directory type such as searching for “videos” instead of “articles.”

– You can also change the keywords such as searching “submit” instead of “upload.”

Think Outside the Box

Scrapebox is very effective at what it does, particularly with scraping websites. However, you really have to think outside the box (pun intended) to get the most value out of this tool.

While the above scraping techniques are great and enough to keep you busy for quite some time, don’t hesitate to try out new things and think up innovative uses for it. Scrapebox is considered the Swiss army knife of SEO tools and it’s really something you can apply in unique and creative ways that are exclusive to your own needs.


Dylan

I’m 25, a software developer and I’m excited to be here and provide transparency to everything I do online. I build, grow and sustain profitable web-based businesses and I’ll continue to use this platform to strengthen my own accountability and provide others with useful insight and data.
