Referrer Spamming
January 14th 2010
The Problem of Referrer Spamming Begins...
In the process of checking our web stats, I noticed a couple of odd links showing up as external links (referrers). Some of the sites listed didn't look anywhere near legitimate, so I decided to pull the HTML for the pages with a custom application I had created. After reviewing the HTML, CSS, JavaScript and images for the site, I concluded that there weren't actually any links from the site in question back to our site. It was about this time that I noticed the site's content was not something our company would support or condone.
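The custom application isn't described here, but the core of that check is easy to sketch. The following is a minimal Python example, assuming you already have the page's HTML as a string. It collects every `href` and `src` attribute, resolves relative URLs against the page address, and reports whether any of them point at your domain. The function name `page_links_to` and the domains `spam.example` and `ourcompany.example` are hypothetical. It does not look inside CSS or JavaScript, which the manual review above also covered.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects every href/src URL found in an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Covers <a href>, <link href>, <img src>, <script src>, etc.
            if name in ("href", "src") and value:
                # Resolve relative URLs against the page's own address.
                self.urls.append(urljoin(self.base_url, value))


def page_links_to(html, page_url, our_domain):
    """Return True if any href/src on the page points at our_domain
    or one of its subdomains (hypothetical helper for illustration)."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    collector.close()
    for url in collector.urls:
        host = (urlparse(url).hostname or "").lower()
        if host == our_domain or host.endswith("." + our_domain):
            return True
    return False
```

For example, `page_links_to('<a href="/casino">Play</a>', "http://spam.example/", "ourcompany.example")` returns `False`, because the only link resolves to `spam.example` itself. This matches the situation above: the site was listed as a referrer, but nothing on it actually linked back.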
Something had to be done!
What did Google know about this referrer spammer?
I decided to double-check the questionable site with Google's "link:" operator. Once again, I didn't find any legitimate links from the questionable site back to ours. I also found that the site appeared to be banned from Google altogether. These discoveries led me to wonder whether the reference from the questionable site to ours might be hurting our search ranking. I needed to find a solution to the problem.
Internal Research on Referrer Spamming
I discussed several possibilities with the other WSI departments, including banning the referring site's IP, the stats system incorrectly reporting backlinks, and a virus. None of these checked out, so I went back to the drawing board.
I ran a traceroute to their site to see whether a DNS server somewhere along the way was mangling a reference; I have seen this happen before when an IIS default site isn’t set up correctly. This was another dead end.
A lookup on Network Solutions also turned up little additional information, because the questionable site had used a DNS registration proxy to avoid supplying accurate registration information.
By searching Google and various web developer forums, I found that other people, with a range of web stats configurations, were also seeing random results in their referrer links. While many people were posting about the problem, there were very few technical answers. Many of the proposed solutions or causes were little more than a virtual shoulder shrug.
The Solution to Web Stats Referrer Spamming
Just as I was about to leave a thread that seemed like another dead end, I found the holy grail of posts on this issue. While the post itself read only “You are getting referrer spammed,” it held the key to solving the problem.
Of course! This was a very tricky way of manipulating the system, and one I hadn’t thought of. I started digging into referrer spamming and how it relates to web stats systems. The Wiki entry on referrer spamming gave me the information I needed.
Security Hole
Late last year we changed the stats program for our corporate site and didn’t replace the username and password protecting it. The stats for our site were public to the world, but only if you knew the correct URL. This didn’t seem like a big deal, but "security by obscurity" solutions are often left vulnerable precisely because they don’t seem like a big deal.
How the Referrer Spamming was Working
The questionable site had somehow found our public stats page and was running a robot that sent requests to our server carrying false referrer information. Our stats program treated each false referrer as an external link and listed it as a link to the referring page.
So what does that mean? The questionable site wasn’t trying to refer visitors to us; it was trying to get our web stats page to link to them. If a search engine then indexed our stats pages, it would find their link there and count it as an inbound link to their site. Tricky! Fortunately, our web stats pages aren’t listed in any search engine, so any benefit they hoped to gain was minimal.
The Fix for Referrer Spamming
For the fix, I had our network administrator restore the missing password and add a robots.txt file instructing spiders not to index the stats system. Now it’s a waiting game to see how long the other site keeps trying to “cheat” the system. Since I have no way to turn off their robot, it can still spam our server with requests claiming that their site links to us.
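The actual robots.txt isn't shown here. As a rough sketch, assuming the stats pages live under a hypothetical `/stats/` path, a file like the one below asks every compliant spider to stay out. The Python snippet uses the standard library's `urllib.robotparser` to confirm which URLs a well-behaved crawler would skip. robots.txt is purely advisory. It keeps search engines from indexing the stats pages, which is what removes the spammer's payoff, but the spam robot itself is free to ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "/stats/" is an assumed location for the
# stats pages, not the path used on the actual server.
ROBOTS_TXT = """\
User-agent: *
Disallow: /stats/
"""


def is_crawlable(url, user_agent="*"):
    """Check a URL against ROBOTS_TXT the way a compliant spider would."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, `is_crawlable("http://www.example.com/stats/index.html")` returns `False` and `is_crawlable("http://www.example.com/products.html")` returns `True`. The rest of the site stays indexable, and only the stats area is excluded.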
I plan to research this issue further by identifying exactly which IP address the robot is coming from and banning it from our servers.
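The Referer header is supplied by the client, which is why a robot can forge it. The request's source IP, however, is recorded by the server. A minimal sketch of the first step is to scan the access log for requests whose referrer names the spam domain and tally them by IP. This assumes Apache/NCSA "combined" log format. IIS W3C logs use a different field layout, so the regular expression would need adapting. The function name `ips_for_referrer` and the domain `spam.example` are hypothetical. A robot that rotates IPs or uses proxies will not be stopped by a single ban.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# NCSA "combined" format: IP, identity, user, [time], "request",
# status, bytes, "referrer", "user agent". Escaped quotes inside
# fields are not handled in this simplified pattern.
COMBINED_LOG = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def ips_for_referrer(log_lines, spam_domain):
    """Count requests per client IP whose Referer names spam_domain
    (or a subdomain of it)."""
    counts = Counter()
    for line in log_lines:
        match = COMBINED_LOG.match(line)
        if not match:
            continue  # skip lines that aren't in combined format
        # A "-" referrer (none sent) has no hostname and is ignored.
        host = (urlparse(match.group("referrer")).hostname or "").lower()
        if host == spam_domain or host.endswith("." + spam_domain):
            counts[match.group("ip")] += 1
    return counts
```

Running `ips_for_referrer(open("access.log"), "spam.example").most_common(5)` would list the addresses sending the most spoofed referrers. Those are the candidates to ban, ideally after confirming they aren't shared proxies used by legitimate visitors.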
Web Department Manager