ErosBlog

The Sex Blog Of Record
 
 

Adult Tumblrs Hidden From Search (Again)

Monday, July 31st, 2017 -- by Bacchus

[Image: robots forbidden]

Back in June when Tumblr announced that blogs containing “primarily explicit content” would no longer be visible on the open web (but only to logged-in Tumblr users), I wrote:

Although the email does not say so, I predict that explicit-content blogs will go back to flying that involuntary robots.txt that makes them invisible to the search engines, too. No more outside search-discovery for Tumblr porn!

That day is here, and it gives me no pleasure to announce that I was right. A reader forwarded the email they got from Tumblr:

We’re contacting you to let you know that your Tumblr has been marked as containing explicit content. This means it won’t be visible to minors, people who are using Tumblr in Safe Mode, and people who aren’t logged into Tumblr.

No mention of search, but when I went to check the robots.txt on their Tumblr, sure enough:

[Image: tumblr disallows search robots]

As I explained in Thou Shalt Not Search Adult Tumblr Blogs back in 2013 when Tumblr first tried to sneakily hide all the porn blogs:

In robot, that means, roughly “All robots: stay out!” No search spiders allowed. No Internet Archive crawler. The tumblr is there, but you have to know about it, or you have to be linked to it. You won’t find it in Google, you won’t find it in any other search engine that honors robots.txt, and when Tumblr decides to stop hosting it, you won’t find the pages in the Wayback Machine – it will be gone for good, lost to humanity unless somebody with the technical chops and outlaw sensibilities of Archive Team finds a way to archive it anyway, robots.txt be damned.

So it is now official. The ghetto walls are up and the gates are closed. The adult-Tumblr community is no longer part of the open web. The #pornocalypse has claimed another social media victim.

Image credit: the graphic at the top of the post has been adapted from part of a panel that appeared in Action Comics #292 (1962).


Thou Shalt Not Search Adult Tumblr Blogs

Wednesday, May 15th, 2013 -- by Bacchus

If you’ve got an adult blog on Tumblr, there’s a good chance Tumblr uses robots.txt to exclude the search engines from indexing it. Did you know that?

Two weeks ago in The Pornocalypse Comes For Us All, I wrote:

Who is next? My guess would be Tumblr. Tumblr is, of all the big platforms, perhaps the most porn friendly; there’s lots of porn on there and the Terms of Service do not prohibit it… But Tumblr is, famously, a popular platform in search of a revenue-generating business model. And we’ve learned that the suits have no loyalty to the porn users who made their platform popular. So, my bold prediction is that as Tumblr casts about for a business model, one of their steps will be to “clean this place up”…

And now, guess what? I’ve discovered that Tumblr uses robots.txt to bar all search engine access to blogs flagged as adult. If you’ve got an adult Tumblr, go look at your own settings. Do you see that first checkbox, the one that says “allow search engines to index your blog”?

[Image: misleading tumblr settings showing adult blogs as visible to search engines when they are not]

That checkbox is a lie. It’s nicely checked, it’s not greyed out, but if your blog is flagged “adult” it’s a lie. Do you see the “Learn more about what this means” link under the “Your blog was flagged NSFW” selector? It leads to this page, where Tumblr asks users to self-flag their blogs appropriately:

Please respect the choices of people in our community and flag your blog as NSFW or Adult from your blog Settings page.

  • NSFW blogs contain occasional nudity or mature/adult-oriented content.
  • Adult blogs contain substantial nudity or mature/adult-oriented content.

If you’re not sure if you should flag your blog you can leave it unflagged, but keep in mind that we might flag it later if we see a lot of mature/adult-oriented content.

To answer the question “What happens to blogs that are flagged NSFW or Adult?” Tumblr offers this handy chart. The key piece of information is the white space indicated by my red superimposed arrow:

[Image: tumblr chart showing that adult blogs are not indexed by Google no matter what preference the user has expressed]

That’s right — where the “Blog indexed by Google” row intersects the “Adult Blogs” column, we find a ringing silence.

Would you have noticed? None of the adult Tumblr bloggers I know ever did. I knew from my porn research that adult Tumblrs tended to be poorly represented in Google search results, but I chalked it up to the sheer scale of Tumblr and Google’s growing bias against returning porn search results. Nope, I found out the truth in one stark moment of astonishment, summed up by this image:

[Image: Internet Archive Wayback Machine page showing a Tumblr blog where robots.txt is blocking access]

Let’s click the “See wickedknickers.tumblr.com robots.txt page” link:

[Image: a sample robots.txt for an adult tumblr showing that all user agents are forbidden]

From me: Aghast. Fucking. Gulp.

In robot, that means, roughly “All robots: stay out!” No search spiders allowed. No Internet Archive crawler. The Wicked Knickers tumblr is there, but you have to know about it, or you have to be linked to it. You won’t find it in Google, you won’t find it in any other search engine that honors robots.txt, and when Tumblr decides to stop hosting it, you won’t find the pages in the Wayback Machine — it will be gone for good, lost to humanity unless somebody with the technical chops and outlaw sensibilities of Archive Team finds a way to archive it anyway, robots.txt be damned.
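For the curious, here is a minimal sketch of how a well-behaved crawler reads “All robots: stay out!”, using Python’s standard-library `urllib.robotparser`. The file contents are an assumption on my part: the screenshot only shows that all user agents are forbidden, and `User-agent: *` followed by `Disallow: /` is the standard two-line way of saying that. Tumblr’s real file may well have contained more. The crawler names are illustrative too. Keep in mind that robots.txt is a request, not a lock: the parser just reports what a polite robot would do, which is exactly why an outfit like Archive Team can decide to ignore it.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: a reconstruction of an "all user agents forbidden" robots.txt,
# not a verbatim copy of the file Tumblr actually served.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# For contrast: an empty Disallow rule means nothing is off limits.
OPEN = """\
User-agent: *
Disallow:
"""


def may_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Report whether a crawler that honors robots.txt would fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "https://wickedknickers.tumblr.com/"
    # Illustrative agent names: two search spiders and an archive crawler.
    for agent in ("Googlebot", "bingbot", "archive.org_bot"):
        print(agent,
              "blocked:", not may_crawl(BLOCK_ALL, agent, page),
              "open:", may_crawl(OPEN, agent, page))
```

Run as a script, every crawler comes back blocked under the first file and allowed under the second. The `*` wildcard is what sweeps in Google, Bing, and the Wayback Machine’s crawler all at once; no robot gets named, so no robot gets a pass.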

Wicked Knickers is just an example, one that has some meaning to me because it’s one of the first Tumblr blogs I ever noticed, and I’ve been linking to it since 2010. That’s almost 6,000 vintage erotica posts since January 2009, and none of those pages are in Google or the Wayback Machine. It was only when I twigged to that anomaly that I finally understood what Tumblr is doing to adult blogs.

In all the years that I’ve been preaching Bacchus’s First Rule (“Anything worth doing on the internet is worth doing on your own domain that you control”), I’ll confess that I never considered the power of robots.txt, or what it means to be putting stuff on an internet site where somebody else controls what robots.txt says. Not only do they control your visibility to search engines, they control whether history will remember what you said. That strikes me as a high price to pay for a “free” blogging platform.

It’s worth noting that there’s still rather a lot we don’t know about the Tumblr robots.txt blockade on adult Tumblr sites. Unanswered questions include:

  • Does Tumblr have any flexibility on this? Would their support, if asked, remove or modify the robots.txt barrier in specific cases?
  • When did Tumblr start using robots.txt to block Google from adult blogs? Has it always been like this, or is it a recent innovation?
  • Why does Tumblr display the misleading checkbox that falsely implies that search engines can see flagged adult blogs?
  • What is the actual reason for excluding adult Tumblrs from search engine and (especially) archive crawls?

In an unusual move for me, I actually reached out to press@tumblr.com, told Tumblr I was going to write this post, and asked them for answers to those questions. That was on May 11th. No response so far. If they ever do answer, I’ll be sure to update this post.
