You know what really grinds my gears? Opening up a report in Google Analytics and having to deal with referral spam. In this post, I’ll tell you how to deal with referral spam and why it’s dangerous.

In the past year, I’ve noticed an alarming trend of referral spam creeping into my Google Analytics reports. Referral spam is the practice of sending bogus referral traffic to a website or product. It may sound relatively harmless, but referral spam is quickly turning into a serious issue.

TYPES OF REFERRAL SPAM

In the context of Google Analytics, referral spam comes in two main flavors: spammy web crawlers and ghost referral traffic.
Web crawlers are robots that visit websites, usually with the intention of indexing content. Most web crawlers identify themselves as such to web servers and are then left out of analytics reports. However, some web crawlers like those from Semalt (boo!) don’t identify themselves as robots and end up showing up in analytics reports as sessions with a 100% bounce rate and 0 second duration. Google recently introduced a feature to filter out known bots and spiders, though it’s definitely not perfect (more on that later).
Ghost referral traffic, arguably the greater of the two referral spam evils, never actually visits a website. In these cases, spammers exploit the fact that Google Analytics now transfers information via HTTP requests directly to Google Analytics servers, meaning someone can “spoof” a session very easily. Ghost referral traffic can be generated by a simple program that sends fake HTTP requests aimed at different Google Analytics properties, so this traffic doesn’t even hit your site. Even more annoying is the fact that this type of spam can be used to spoof organic search results, as well. See the screenshot below for an example:

Google Analytics Spam Traffic Report
Note: For ghost referral traffic, modifying .htaccess won’t help at all, since these spammers never actually visit your site. For more information, see Google's Measurement Protocol documentation.
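To make that concrete, here's a rough sketch of what a Measurement Protocol hit carries. The payload is made up (the property ID, client ID, and both domains are placeholders, not real data); the parameter names `tid`, `dh`, and `dr` are the ones Google's Measurement Protocol documentation uses for the property, hostname, and referrer. The point is that every field is just whatever the sender typed in.

```python
from urllib.parse import parse_qs

# Hypothetical hit payload; every value is a placeholder for illustration.
SAMPLE_HIT = (
    "v=1&tid=UA-XXXXX-Y&cid=555&t=pageview"
    "&dh=apple.com&dp=%2F&dr=http%3A%2F%2Fspam-referrer.example%2F"
)

def describe_hit(payload: str) -> dict:
    """Return the self-reported property, hostname, and referrer of a hit."""
    fields = {key: values[0] for key, values in parse_qs(payload).items()}
    return {
        "property": fields.get("tid"),
        "hostname": fields.get("dh"),
        "referrer": fields.get("dr"),
    }

hit = describe_hit(SAMPLE_HIT)
print(hit)
```

Nothing in the payload depends on your web server, which is why server-side rules never see this traffic.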

NEGATIVE IMPLICATIONS


“A referrer is a simple HTTP header that's passed along when a browser goes from one page to another page, normally used to indicate where a user's coming from. But users can change it, and some people will set referrer at pages they want to promote and visit tons of people around the web -- people see it and say 'Oh, I should check it out'. It's not necessarily a link… there are some people who try to drive traffic by visiting a ton of websites with an automated script and setting the referrer to be the URL they want to promote... there's no 'authentication'… You can’t automatically assume that it was the owner of the URL if you see something showing up in your dashboard. Somebody is trying to do some hijinx.”
Matt Cutts, Head of Google Webspam Team

So, why is referral spam so bad? For one, it’s screwing up my web analytics data. “Sessions” entering via referral spam skew the data, clouding the accuracy of engagement metrics and inflating traffic volume metrics. Unfortunately, those unaware of spam issues may base decisions on inaccurate data, especially for sites with low traffic.
Moreover, referral spam makes SEO more difficult for everyone. One aim of referral spam is to earn links from sites that publish their access logs. Some websites publish web analytics data publicly, which can include hyperlinks back to the spammer’s designated URL. These backlinks can improve search engine results for that URL since many websites publishing referrer data are presumably trustworthy.
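A quick back-of-the-envelope sketch shows how much damage this can do on a small site. The numbers are invented for illustration, and I'm assuming every spam session counts as a bounce (which matches the 100% bounce rate described earlier for spammy crawlers).

```python
def blended_metrics(real_sessions, real_bounces, spam_sessions):
    """Combine real traffic with spam sessions that are assumed to all bounce."""
    total_sessions = real_sessions + spam_sessions
    total_bounces = real_bounces + spam_sessions  # assumption: spam always bounces
    return total_sessions, total_bounces / total_sessions

# Hypothetical low-traffic site: 400 real sessions, 160 of them bounces (40%).
sessions, bounce_rate = blended_metrics(400, 160, 100)
print(sessions, f"{bounce_rate:.0%}")
```

In this made-up case, 100 spam sessions push the reported bounce rate from 40% to 52% while inflating sessions by a quarter.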
There are also more nefarious opportunities available to referral spammers. If a spammer wanted to send a website unwanted and unqualified traffic, they could simply change the name of the referral URL to the victim’s URL. As mentioned in the above quote from Matt Cutts, referral spam can’t truly be “authenticated” and tracked back to a specific source. With this in mind, referral spam could be used to harm reputations, possibly framing an innocuous website as a spam referrer.
Exposure to malware is another potential threat to anyone curious enough to visit referral spam addresses. With the rise of electronic data theft, it would be simple for referrer spam networks to point to URLs containing malicious software aimed at stealing valuable information.
Finally, no one wants to be advertised to while looking at web analytics acquisition reports.

SOLUTIONS

Within Google Analytics, there are multiple options to remove referral spam:

Exclude Foreign Hostnames and Filter Spammy Crawlers

One defining attribute of many ghost referrals is an inaccurate hostname attribution. When reviewing referral data in Google Analytics, the hostname will be completely unrelated to your website (e.g., “apple.com”). With this knowledge, it’s relatively simple to create a filter to only include data with an accurate hostname. For Google Analytics users using only one or a handful of domains, this solution may be the simplest (check here for a quick refresher on regular expressions in GA):
Google Analytics Hostname Filter
In most cases, substituting your top domain name for example.com will be sufficient. For multiple domains, check your regular expressions with Regex Pal.
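If you'd like to sanity-check a hostname pattern before saving the filter, a few lines of Python will do. This is only a sketch: the anchored pattern is my own assumption (it's stricter than a bare `example\.com`), `re.search` stands in for the partial matching I'm assuming GA applies, and the sample hostnames are made up.

```python
import re

# Assumed include pattern; swap example.com for your own domain.
VALID_HOSTNAME = re.compile(r"(^|\.)example\.com$", re.IGNORECASE)

def is_valid_hostname(hostname: str) -> bool:
    """True when the hostname is example.com or one of its subdomains."""
    return VALID_HOSTNAME.search(hostname) is not None

# Sample hostnames are invented for illustration.
for host in ["www.example.com", "blog.example.com", "apple.com", "(not set)"]:
    print(host, is_valid_hostname(host))
```

Anchoring the pattern keeps lookalikes such as `example.com.evil.net` from slipping through, at the cost of having to list every legitimate domain you track.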
That first filter will remove ghost referral traffic that reports a foreign hostname. However, an additional filter will also be required to remove spammy web crawlers (like Semalt) since they actually visit the site and will report an accurate hostname. A filter to remove the two most popular web crawler offenders can be seen below:
Google Analytics Web Crawler Referral Filter
Featured Regular Expression:
  1. .*(semalt|buttons\-for\-website)\.com.*
Note: You should always retain an unfiltered view, as data processed by GA filters cannot be reverted.
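Before trusting the crawler pattern, it's worth running it against a few referrers. The sketch below copies the regular expression from the filter above; the sample referrers are made up, and `re.search` is my stand-in for how I'm assuming GA applies filter patterns.

```python
import re

# Pattern copied from the filter above. re.search stands in for the
# partial matching assumed for GA filter patterns.
CRAWLER_SPAM = re.compile(r".*(semalt|buttons\-for\-website)\.com.*")

def is_crawler_spam(referrer: str) -> bool:
    """True when the referrer matches one of the two crawler domains."""
    return CRAWLER_SPAM.search(referrer) is not None

# Sample referrers are invented for illustration.
for referrer in ["semalt.semalt.com", "buttons-for-website.com", "google.com"]:
    print(referrer, is_crawler_spam(referrer))
```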

Filter All Referral Spam Sources

In cases where domains in a measured view can easily change, blocking referral spam may require a more exhaustive referral filter encompassing all offending referral sites. Over the past few months, I’ve created a list of offending sites and updated the filter accordingly, as seen below. As a quick caveat, while this list targets many of the offending referral spam sources, it’s by no means an exhaustive list. With the discovery of more spam referrals, I've updated the regular expressions below the image, and this solution will now require two Exclude Referral filters.

Google Analytics Spam Referral Filter
Featured Regular Expressions:
  1. .*((darodar|priceg|semalt|buttons\-for\-website|makemoneyonline|blackhatworth|hulfingtonpost|bestwebsitesawards|o\-o\-6\-o\-o|(social|simple\-share)\-buttons)\.com)|((ilovevitaly|econom)(\.co(m)?|\.ru))|((humanorightswatch|4webmasters)\.org).*
Update - I've added another regular expression since the first one has reached the 255 character limit.
  1. .*best\-seo\-solution\.com.*
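Maintaining these expressions by hand gets tedious as the list grows. One possible approach (not how I built the filters above, just a sketch) is to generate them from a plain list of domains and start a new pattern whenever the next domain would push the current one past the 255 character limit. The helper name `build_patterns` is hypothetical.

```python
import re

MAX_PATTERN_LENGTH = 255  # the limit mentioned in the update above

# Domains taken from the list of offenders in this post.
OFFENDERS = [
    "semalt.com", "buttons-for-website.com", "darodar.com", "priceg.com",
    "makemoneyonline.com", "blackhatworth.com", "hulfingtonpost.com",
    "bestwebsitesawards.com", "o-o-6-o-o.com", "ilovevitaly.com",
    "simple-share-buttons.com", "social-buttons.com", "best-seo-solution.com",
    "econom.co", "ilovevitaly.co", "ilovevitaly.ru", "humanorightswatch.org",
    "4webmasters.org",
]

def build_patterns(domains, limit=MAX_PATTERN_LENGTH):
    """Pack escaped domains into alternation patterns that fit the limit."""
    patterns, current = [], ""
    for domain in domains:
        escaped = re.escape(domain)
        candidate = f"{current}|{escaped}" if current else escaped
        if len(candidate) > limit and current:
            # Adding this domain would overflow, so close the current pattern.
            patterns.append(current)
            current = escaped
        else:
            current = candidate
    if current:
        patterns.append(current)
    return patterns

patterns = build_patterns(OFFENDERS)
for number, pattern in enumerate(patterns, start=1):
    print(number, len(pattern))
```

Each generated pattern would go into its own Exclude filter. The output is longer than a hand-tuned expression because it doesn't share suffixes like `\.com`, but it's much easier to keep up to date.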

Advanced Segments for Historical Data

Since filters only process data moving forward, use advanced segments to review historical data from before filters were implemented. Similar to the above solutions, decide which approach is most appropriate for your site and use regular expressions to remove sessions from referral spam, as seen below:
Google Analytics Advanced Segment
Featured Regular Expressions:
  1. .*((darodar|priceg|semalt|buttons\-for\-website|makemoneyonline|blackhatworth|hulfingtonpost|bestwebsitesawards|o\-o\-6\-o\-o|(social|simple\-share)\-buttons)\.com)|((ilovevitaly|econom)(\.co(m)?|\.ru))|((humanorightswatch|4webmasters)\.org).*
Update - I've added another regular expression since the first one has reached the 255 character limit.
  1. .*best\-seo\-solution\.com.*
Note: Advanced Segments can be applied retroactively to historical data, while Filters only process data moving forward. If unfamiliar with segments and filters, a quick comparison summary between the two can be found here.
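If you've already exported historical data, the same idea works outside of GA. Here's a minimal sketch that drops spam rows from an export and re-totals sessions. The CSV layout, column names, and numbers are all made up, and the pattern is a shortened stand-in for the full expressions above.

```python
import csv
import io
import re

# Shortened, illustrative pattern; use the full expressions in practice.
SPAM_SOURCE = re.compile(r"(semalt|darodar|ilovevitaly)\.")

# Hypothetical export: one row per source with its session count.
EXPORT = """source,sessions
google,120
semalt.semalt.com,35
darodar.com,20
newsletter,15
"""

def clean_rows(csv_text):
    """Return the rows whose source does not match the spam pattern."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [row for row in rows if not SPAM_SOURCE.search(row["source"])]

kept = clean_rows(EXPORT)
total_sessions = sum(int(row["sessions"]) for row in kept)
print(total_sessions)
```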

Bot Filtering within View Settings

In July 2014, Google introduced bot and spider filtering to give users more accurate data. From the admin view interface, you can select this option, as seen below. This will exclude any sessions named in the IAB known bots and spiders list (at no extra cost for you).
In theory, this is great! However, this feature is still new, and we’re still seeing referral spam from some web crawlers make its way through the bot and spider filtering. That said, there's no harm in checking the box, especially if Google decides to introduce more functionality to this feature.
Google Analytics Bots and Spiders
For those familiar with Google Tag Manager, I'd highly recommend reading Sayf Sharif's post Eliminating Dumb Ghost Referral Traffic in Google Analytics.
 

LIST OF OFFENDING SITES

The current list of offenders includes:
  • semalt.com
  • buttons-for-website.com
  • darodar.com
  • priceg.com
  • makemoneyonline.com
  • blackhatworth.com
  • hulfingtonpost.com
  • bestwebsitesawards.com
  • o-o-6-o-o.com
  • ilovevitaly.com
  • simple-share-buttons.com
  • social-buttons.com
  • best-seo-solution.com
  • econom.co
  • ilovevitaly.co
  • ilovevitaly.ru
  • humanorightswatch.org
  • 4webmasters.org

THIS ISN’T A LONG-TERM SOLUTION

Unfortunately, the solutions above are just short-term band-aids at the moment. As spammers innovate, users of products like Google Analytics are in danger of falling prey to bogus referral traffic through more sophisticated means. Google and other web analytics providers will, hopefully, create new mechanisms to combat this referral spam. However, without some serious changes to the current system, the world of web analytics may be in for some unpleasant surprises. To improve your GA setup in other ways, check out some of our other posts like: 15 Ways Your Google Analytics Might Be Broken.


For reasons why not to use the Referral Exclusion List and much more on referral spam in general, check out Mike Sullivan's guide: http://www.analyticsedge.com/2014/12/removing-referral-spam-google-analytics/
For a great visual walkthrough of referral spam solutions, view Carlos Escalera Alonso's recent write-up: http://www.ohow.co/block-referrer-spam-list/
