Wednesday, May 15th, 2013 -- by Bacchus
If you’ve got an adult blog on Tumblr, there’s a good chance Tumblr uses robots.txt to exclude the search engines from indexing it. Did you know that?
Two weeks ago in The Pornocalypse Comes For Us All, I wrote:
Who is next? My guess would be Tumblr. Tumblr is, of all the big platforms, perhaps the most porn friendly; there’s lots of porn on there and the Terms of Service do not prohibit it… But Tumblr is, famously, a popular platform in search of a revenue-generating business model. And we’ve learned that the suits have no loyalty to the porn users who made their platform popular. So, my bold prediction is that as Tumblr casts about for a business model, one of their steps will be to “clean this place up”…
And now, guess what? I’ve discovered that Tumblr uses robots.txt to bar all search engine access to blogs flagged as adult. If you’ve got an adult Tumblr, go look at your own settings. Do you see that first checkbox, the one that says “allow search engines to index your blog”?
That checkbox is a lie. It’s nicely checked, it’s not greyed out, but if your blog is flagged “adult” it’s a lie. Do you see the “Learn more about what this means” link under “Your blog was flagged NSFW” selector? It leads to this page, where Tumblr requests users to appropriately self-flag their blogs:
Please respect the choices of people in our community and flag your blog as NSFW or Adult from your blog Settings page.
- NSFW blogs contain occasional nudity or mature/adult-oriented content.
- Adult blogs contain substantial nudity or mature/adult-oriented content.
If you’re not sure if you should flag your blog you can leave it unflagged, but keep in mind that we might flag it later if we see a lot of mature/adult-oriented content.
To answer the question “What happens to blogs that are flagged NSFW or Adult?” Tumblr offers this handy chart. The key piece of information is the white space indicated by my red superimposed arrow:
That’s right — where the “Blog indexed by Google” row intersects the “Adult Blogs” column, we find a ringing silence.
Would you have noticed? None of the adult Tumblr bloggers I know ever did. I knew from my porn researching that adult Tumblrs tended to be poorly represented in Google search results, but I chalked it up to the sheer scale of Tumblr and Google’s growing bias against returning porn search results. Nope, I found out the truth in one stark moment of astonishment, summed up by this image:
Let’s click the “See wickedknickers.tumblr.com robots.txt page” link:
From me: Aghast. Fucking. Gulp.
In robot, that means, roughly “All robots: stay out!” No search spiders allowed. No Internet Archive crawler. The Wicked Knickers tumblr is there, but you have to know about it, or you have to be linked to it. You won’t find it in Google, you won’t find it in any other search engine that honors robots.txt, and when Tumblr decides to stop hosting it, you won’t find the pages in the Wayback Machine — it will be gone for good, lost to humanity unless somebody with the technical chops and outlaw sensibilities of Archive Team finds a way to archive it anyway, robots.txt be damned.
Wicked Knickers is just an example, one that has some meaning to me because it’s one of the first Tumblr blogs I ever noticed, and I’ve been linking to it since 2010. That’s almost 6,000 vintage erotica posts since January 2009, and none of those pages are in Google or the Wayback Machine. It was only when I twigged to that anomaly that I finally understood what Tumblr is doing to adult blogs.
In all the years that I’ve been preaching Bacchus’s First Rule (“Anything worth doing on the internet is worth doing on your own domain that you control”), I’ll confess that I never considered the power of robots.txt, or what it means to be putting stuff on an internet site where somebody else controls what robots.txt says. Not only do they control your visibility to search engines, they control whether history will remember what you said. That strikes me as a high price to pay for a “free” blogging platform.
It’s worth noting that there’s still rather a lot we don’t know about the Tumblr robots.txt blockade on adult Tumblr sites. Unanswered questions include:
- Does Tumblr have any flexibility on this? Would their support, if asked, remove or modify the robots.txt barrier in specific cases?
- When did Tumblr start using robots.txt to block Google from adult blogs? Has it always been like this, or is it a recent innovation?
- Why does Tumblr display the misleading checkbox that falsely implies that search engines can see flagged adult blogs?
- What is the actual reason for excluding adult Tumblrs from search engine and (especially) archive crawls?
In an unusual move for me, I actually reached out to press@tumblr.com, told Tumblr I was going to write this post, and asked them for answers to those questions. That was on May 11th. No response so far. If they ever do answer, I’ll be sure to update this post.
Similar Sex Blogging: