Natural Search Blog

Recent Google Improvements Fail To Halt Massive Malware Attack

Various news sites are reporting ^[1] that a malware attack was deployed in the last couple of days, apparently based entirely upon black hat SEO tactics.

Software security company Sunbelt blogged about ^[2] how the attack was generated: a network of spambots apparently added links into blog comments and forums pointing to the bad sites, over a period of months in some cases, enabling those sites to achieve fair rankings in search engine result pages for a great many potential keyword search combinations. The pages either contained iframes which attempted to load malware onto visitors’ machines, or they began redirecting to the sites containing malware at some point after achieving rankings. Sunbelt provided interesting screenshots of the SERPs in Google:

Malware in SERPs

And also showed some screenshots of some of the keyword-stuffed pages which apparently got indexed:

Malware site page

I think it’s not at all a coincidence that the attack was timed to occur right on the first weekend of the holiday shopping season and Cyber Monday when more people are likely conducting keyword searches than any other time of year. Deploying the malware now was likely intended to infect as many computers as possible before the malware was detected and the sites deleted from listings.

The methods these unethical developers used are pretty “classic” black-hat tactics. For many years now, black-hat optimizers have used automated agents to insert keyworded text links into blog and forum comment areas and online guestbooks, pointing back to their sites in an effort to build PageRank. In addition, really old and crusty black-hat techniques include keyword stuffing: adding tons of keywords on a page in an effort to make the page relevant for those words and phrases. Also, the bait-and-switch technique of allowing one page to get indexed by search engines while redirecting human users to a different URL is pretty well known.

In recent months, Google has apparently ^[3] been working particularly industriously to penalize more sites that may be buying/selling links or which may be involved in various linking schemes. So much so that there’s been considerable talk about how some of the affected sites may’ve been unfairly red-flagged by bad assumptions made by its algorithmic policing software. So, it’s disappointing that a network of egregious malware sites was able to effectively employ legacy black-hat tactics which ought to’ve been detectable earlier.

It feels a bit like having the police devote all their time to writing minor speeding tickets while violent murders are happening!

Now, to be fair, any site which appears on the level could suddenly start redirecting to a bad location, and there’d naturally be a period of time before the search engine bots re-spider the page and realize that there’s malware on it. During that window between when the page was first spidered while appearing alright and the later time when it starts launching evil, it could naturally continue to appear in the SERPs, where innocent people could click on it and get infected. Also, the term combinations that Sunbelt cited were moderately arcane in some cases, so average users might not’ve been impacted in any significant numbers. It could also be that Sunbelt is hyping up the issue in order to get attention for themselves, so you have to consider their assessment as possibly non-objective.

Even so, just the fact that this rather pedestrian combination of black-hat tactics could be used to effectively poison search results with malware listings is significant and disturbing.

Why wasn’t the comment spam detected early on? One assumes that the slow accretion of links over months may not’ve set off alarms, or perhaps the comment text added was made to be cleverly relevant.

And, the spam-laden content of the pages looks blatantly unnatural to me — that should’ve also been detectable.
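To give a rough idea of how blatantly stuffed text might be caught, here is a minimal sketch. It is only an illustration under my own assumptions, not how any search engine actually does it: the function names, the four-letter minimum word length, and the 0.3 threshold are all made up for the example.

```python
import re
from collections import Counter


def top_term_share(text: str, min_len: int = 4) -> float:
    """Return the fraction of counted words taken up by the most frequent one."""
    # Keep only alphabetic words of at least min_len letters, which is a
    # crude stand-in for proper stop-word removal (an assumption).
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= min_len]
    if not words:
        return 0.0
    _, count = Counter(words).most_common(1)[0]
    return count / len(words)


def looks_stuffed(text: str, threshold: float = 0.3) -> bool:
    """Flag text whose most common word exceeds the assumed threshold."""
    return top_term_share(text) > threshold
```

A real page is much longer than a sentence, so a production check would need a tuned threshold and probably phrase-level counts as well; this only shows that a page repeating the same few words over and over stands out numerically.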

And how about perhaps being suspicious of gobbledy-gook domain names? And domains ending in “.CN”? I know gobbledy-gook in and of itself might be hard to detect (particularly considering all the gobbledy-gook that still slips past spam filters on email), and it’s unclear whether it by itself indicates a bad content site, but you’d perhaps expect that one could tell whether the character strings contain patterns which match names/words by some percentage of fuzziness, and red-flag those that don’t match more normal naming patterns by associating lower trust scores or quality scores with them.
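The fuzzy name-matching idea could be sketched along these lines. This is purely my own illustration: the tiny `REFERENCE_WORDS` list, the letter-pair approach, and the 0.5 threshold are assumptions, and a real system would want a large dictionary or a trained language model rather than a dozen words.

```python
from typing import Iterable, Set

# Hypothetical reference vocabulary, far too small for real use.
REFERENCE_WORDS = [
    "search", "natural", "blog", "shop", "news", "google",
    "market", "online", "travel", "book", "sunbelt", "software",
]


def build_bigram_set(words: Iterable[str]) -> Set[str]:
    """Collect every adjacent letter pair seen in the reference words."""
    known: Set[str] = set()
    for word in words:
        word = word.lower()
        known.update(word[i:i + 2] for i in range(len(word) - 1))
    return known


KNOWN_BIGRAMS = build_bigram_set(REFERENCE_WORDS)


def naturalness(label: str, known: Set[str] = KNOWN_BIGRAMS) -> float:
    """Return the share of the label's letter pairs found in the known set."""
    letters = "".join(ch for ch in label.lower() if ch.isalpha())
    pairs = [letters[i:i + 2] for i in range(len(letters) - 1)]
    if not pairs:
        return 0.0
    return sum(1 for pair in pairs if pair in known) / len(pairs)


def looks_like_gibberish(domain: str, threshold: float = 0.5) -> bool:
    """Flag a domain whose first label scores below the assumed threshold."""
    label = domain.split(".")[0]
    return naturalness(label) < threshold
```

With this toy vocabulary, `looks_like_gibberish("luewusxrijke.cn")` comes out true, while a made-up name such as `onlinebookshop.example` does not get flagged. A score like this would be one input to a trust score, not a verdict on its own, since plenty of legitimate names are abbreviations or non-English words.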

Even sadder, some of the domain names involved were so new they should’ve easily been detectable and flagged as suspicious just on that basis alone. For instance, I just looked up registration info for one of the sites identified by Sunbelt, luewusxrijke.cn, and found that it’d only been registered on November 24th! Why didn’t the registration date alone generate enough distrust to “sandbox” these sites?
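A domain-age check of this kind is about as simple as a heuristic gets. Here is a minimal sketch, assuming the registration date has already been looked up by some other means; the function name and the 30-day window are my own arbitrary choices, not anything a search engine has published.

```python
from datetime import date


def is_newly_registered(
    registered_on: date, observed_on: date, min_age_days: int = 30
) -> bool:
    """Return True if the domain is younger than min_age_days when observed."""
    if observed_on < registered_on:
        raise ValueError("observation date precedes registration date")
    return (observed_on - registered_on).days < min_age_days


# Placeholder year for illustration only; the lookup above gives just the
# month and day, and the observation date here is likewise hypothetical.
YEAR = 2000
registered = date(YEAR, 11, 24)
observed = date(YEAR, 11, 28)
suspicious = is_newly_registered(registered, observed)
```

A site flagged this way would not have to be dropped outright; it could simply be held at a lower trust level until it has been around for a while.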
