Natural Search Blog Matt Cutts reveals underscores now treated as word separators in Google

After the recent WordCamp conference, Stephan Spencer reports here ^[1] and here ^[2] that Matt Cutts ^[3] stated that Google now treats underscores as white-space characters or word separators when interpreting URLs. Read on for more details and my take on it…

Previously, if a developer created a page name containing underscores, like “Coca_Cola.html”, Google wouldn’t have considered it as close a match to keyword searches for “Coca Cola” as a page named with a more widely accepted word separator such as a period, comma, dash, colon, or semicolon. “Coca-Cola.html” would have matched the keyword search much more closely. With this change, Google now likely treats both “Coca_Cola.html” and “Coca-Cola.html” as equally relevant to web searches for “Coca Cola”. This matters for rankings because exact matches of terms are more likely to rank higher than fuzzy-logic matches.
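To make the difference concrete, here is a minimal sketch of how a page name might be split into search terms before and after the change. It is only an illustration: the two separator sets and the url_terms helper are my own assumptions, not Google's actual tokenizer.

```python
import re

# Hypothetical separator sets for illustration only (not Google's real rules).
OLD_SEPARATORS = r"[-.,:;\s]+"   # underscore is NOT a separator, so it joins words
NEW_SEPARATORS = r"[-_.,:;\s]+"  # underscore is treated like any other separator


def url_terms(page_name: str, separators: str) -> list:
    """Drop the file extension, then split the page name into lowercase terms."""
    stem = page_name.rsplit(".", 1)[0]
    return [term.lower() for term in re.split(separators, stem) if term]


print(url_terms("Coca_Cola.html", OLD_SEPARATORS))  # ['coca_cola']
print(url_terms("Coca-Cola.html", OLD_SEPARATORS))  # ['coca', 'cola']
print(url_terms("Coca_Cola.html", NEW_SEPARATORS))  # ['coca', 'cola']
```

Under the old assumption, “Coca_Cola.html” yields the single term “coca_cola”, which is not an exact match for either word in the query “Coca Cola”. Under the new assumption it yields the same two terms as the dashed version.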

Of course, the other search engines may not be changing their interpretation of underscores, so for the sake of cross-engine optimization it may still be important to avoid using underscores as word separators.

Even more importantly from my point of view, this announcement reveals that Google is indeed paying attention to keywords in URLs, something that was previously somewhat open for speculation in SEO circles.

Matt further states that the file extension of your page doesn’t matter to Google: .php, .html, .htm, .asp, .aspx, .jsp, and so on are all fine. The one exception is .exe, because Google doesn’t want to link directly to executables from its results pages.
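If you want to audit your own URLs against that one exception, a rough check could look like the sketch below. The is_linkable name and the EXCLUDED_EXTENSIONS set are hypothetical; the only fact taken from Matt's statement is that .exe is the exception.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumption: .exe is the only excluded extension, as reported in the post.
EXCLUDED_EXTENSIONS = {".exe"}


def is_linkable(url: str) -> bool:
    """Return False only when the URL path ends in an excluded extension."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return suffix not in EXCLUDED_EXTENSIONS


print(is_linkable("http://example.com/coca-cola.php"))    # True
print(is_linkable("http://example.com/coca-cola.aspx"))   # True
print(is_linkable("http://example.com/tools/setup.EXE"))  # False
```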

The major takeaway is: the page names in your URLs shouldn’t be esoteric ID numbers or generic names like “index.jsp” or “file.php”. They should be named meaningfully after the primary contents of the pages. This would provide an additional bit of signal weight for your primary keywords, perhaps giving the page a slightly better chance of ranking higher in searches for those words.
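As one way to put that takeaway into practice, here is a small sketch that derives a keyword-bearing page name from a page title. The page_name function and its defaults are my own illustration, not anything Google or Matt prescribed. It joins words with hyphens rather than underscores, since the other engines may still not treat underscores as separators.

```python
import re
import unicodedata


def page_name(title: str, extension: str = ".html") -> str:
    """Build a hyphen-separated, lowercase page name from a page title."""
    # Strip accents and drop any characters that cannot be expressed in ASCII.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Keep only runs of letters and digits; everything else acts as a separator.
    words = re.findall(r"[a-z0-9]+", ascii_title.lower())
    if not words:
        raise ValueError("title contains no usable keywords")
    return "-".join(words) + extension


print(page_name("Coca Cola: History & Facts"))  # coca-cola-history-facts.html
```

A content management system could call something like this when a page is first published, so the URL carries the page's primary keywords instead of a numeric ID.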