Google forgetting stuff: wiping older sites right out of its index like cleaning a blackboard

As if we didn’t have enough problems, there’s a mounting body of evidence that Google now has an attention span somewhat shorter than ten years. After ten years or so, Google forgets things. Or, perhaps, Google just can’t be bothered to index these older web pages, because there’s no money in it.

A commenter mentioned this after my post wherein I spoke of the pain of the Kink.com transition to their “new” (2016) Kink Unlimited product that broke many hundreds of my old links. It turns out that blogging pioneer and web-bones architect Tim Bray noticed the Google-dementia phenomenon about a year ago, writing that “Google has stopped indexing the older parts of the Web.”

Bray had discovered that his old blog posts weren’t turning up in Google searches even when he chased them with extremely precise search terms. I had noticed the same thing, but I assumed it was the “Google hates porn” filter that was killing me. (More on this later.)

Bray also noticed that Bing and DuckDuckGo were finding his old posts just fine. The implication is that it’s not some inherent “the web has gotten too big to index” problem, but rather a deliberate choice by Google to focus on newer, fresher material. Bray:

“My mental model of the Web is as a permanent, long-lived store of humanity’s intellectual heritage. For this to be useful, it needs to be indexed, just like a library. Google apparently doesn’t share that view.”

Indeed.

A couple of days later, Marco Fioretti expanded on Bray’s post with his own examples of the things Google forgets, and had this additionally to say:

“Unless we’re all missing something here, it seems more correct to say that Google forgets stuff that is more than 10 years old. If this is the case, Google will remember and index a smaller part of the web every year. Google may do so simply because it would be impossible to do more, for economical and/or technological constraints, which sooner or later would also hit its competitors. But this only makes bigger the problem of what to remember, what to forget and above all who and how should remember and forget.”

Neither Bray nor Fioretti applied the term “dementia” to Google. I got that term from an earlier (2017) blog post by open-data maven Tony Hirst, which was referenced in the comments on Bray’s post. Hirst posits that Google is getting both paranoid (because of SEO and other factors) and forgetful. To Hirst, Google seems rooted in the past: it credits a signal of link authority that people mostly aren’t generating these days (publication of links on websites), and it can’t properly weight or remember the social media signals that accompany most links today. It’s a different problem, to be sure, from the one that Bray and Fioretti highlighted, but the terminology seems applicable here too.

My observations, from my perspective inside the adult/porn parts of the web, are parallel with Hirst’s. Google’s digital dementia is even more severe with respect to adult URLs, because our #pornocalypse-driven exclusion from so much social media means that our links are automatically absent from so many of Google’s modern page quality signals and ranking algorithms.

Here’s my own example, showing the type of digital dementia Bray highlighted. There’s an ErosBlog post from 2005 called Dildoes In the Subway (that’s the post title). As of this writing, if you search for those four words in quotes, Google will admit to knowing of four places on the web (including three on ErosBlog) where that phrase exists, but Google doesn’t seem to know that the post itself exists:

Google digital dementia search result

Bing? Bing still has possession of all its faculties, and returns the proper post as the first search result:

bing can find it

I’ve been seeing this phenomenon for years, but honestly? I just assumed it was a porn thing. Google hates stinky porn sites like mine, and is always pretending not to know about pages that are actually in its index. Usually what this means is that you haven’t used enough “porn words” in your search query to convince Big Brother Google that you realio-trulio want a porn result, so the porn result is being hidden from you for your own good. But that’s probably not the case here, because “dildoes” ought to be porny enough. And anyway, we can test this; adding the “site:erosblog.com” search filter should override the “it’s for your own good” anti-porn filters:

Google forgot the dildoes

Nope! Google is being adamant here; it knows of three places on ErosBlog that mention this post, but the post itself? Not in the Google index any more.
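If you want to rerun this kind of test on your own old pages, here’s a minimal sketch in Python that builds the two queries I used: the exact phrase in quotes, and the same phrase with a site: restriction. The function names and the ENGINES table are mine, invented for illustration, and the code only constructs search URLs to paste into a browser. It doesn’t fetch or parse any results.

```python
from urllib.parse import quote_plus

# Public search URL prefixes. The dictionary itself is a hypothetical
# convenience for this sketch, not part of any search engine API.
ENGINES = {
    "google": "https://www.google.com/search?q=",
    "bing": "https://www.bing.com/search?q=",
}


def build_query(phrase, site=None):
    """Return an exact-phrase query, optionally restricted to one site."""
    query = f'"{phrase}"'          # quotes ask for the exact phrase
    if site:
        query += f" site:{site}"   # site: limits results to one domain
    return query


def search_url(engine, phrase, site=None):
    """Return a URL that runs the query on the named engine."""
    return ENGINES[engine] + quote_plus(build_query(phrase, site))


if __name__ == "__main__":
    title = "Dildoes In the Subway"
    for engine in ENGINES:
        print(search_url(engine, title))
        print(search_url(engine, title, site="erosblog.com"))
```

Open each printed URL and compare: if one engine returns the page itself and another returns only pages that mention it, you’re looking at the same pattern described here.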

Just in case you’re skeptical or curious, though, here’s what it looks like when you’re searching for an ErosBlog page that actually is (unlike the Dildoes In the Subway page) in Google’s dementia-ridden memory, only Google doesn’t want to show it to you, because stinky porn. I wrote a post in 2005 called The Pony Girls Of Ancient Egypt that contains the unique-on-the-web (until I hit the publish button on this post) phrase “a charioteer boffing a woman”.

Google knows about it. Google hasn’t forgotten it. Search for that phrase with the site:erosblog.com constraint, and Google has the charioteer-boffing in its index, all right:

google search result for charioteer boffing

But apparently “boffing” is an insufficiently pornographic word to signify that I am an adult who wants to see porn, genuinely and truly. Because, even though I have all the so-called “safe search” settings turned as far off as Google will allow these days, here’s what Google pretends to know about my Egyptian pony girls once I remove the site:erosblog.com search constraint. That’s right, it’s Sergeant Schultz time: they know nothing! Pony girls? Boffing charioteers? New phone, new search engine, who dis?

new phone, who is this -- google knows nothing

Increasingly I find myself going to Bing when I need completeness in a search result. Google’s digital dementia, it turns out, is part of why that has become necessary.