How To Rescue 60 Terabytes Of Tumblr Porn
Wednesday, December 5th, 2018 -- by Bacchus
My post this morning gave you the tools to back up and save your own porn Tumblr, plus perhaps a handful of others. Good. You’ve carried your fully-loaded book bag safely out of the burning library. Now what else can you do?
Say hello to our rogue archivist friends at Archive Team. Powered by “rage, paranoia, and kleptomania” in all their data-rescue efforts, their Tumblr project page and associated IRC channel indicate that they’re rumbling and grumbling into full power-up mode on this Tumblr #pornocalypse crisis. Their previous data-rescue exploits are legendary. They can’t get it all, but they are going to get a bunch. How can we, how can you, help?
I’ve been monitoring their public-facing channels. From what I can tell, here’s what’s going on. They are tweaking the scripts for the software they use for distributed scraping and downloading. (They call it “Archive Team Warrior.”) When Warrior is ready — hopefully real soon now — if you’ve got a good internet connection you could help by running an instance of that.
But even before it starts to run, they need a list of Tumblr porn blogs to feed into the Warrior. They’ve already got lists of course, massive ones. But the lists are far from complete. They want more and better ones. There’s a web form here where you can contribute your favorite porn Tumblr URLs. The form accepts ten URLs at a time, but you can fill it out as many times as you like. And if you have a much bigger list? Just paste it up online somewhere and paste the URL to that into the form — they’ll make it work.
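If your list is longer than ten but not big enough to bother hosting somewhere, it may help to tidy it up and split it into form-sized batches first. Here is a minimal sketch of that, assuming you have your URLs in a plain Python list. The blog addresses below are made-up placeholders, and the function name is mine, not anything Archive Team provides. It only groups the URLs; you would still paste each batch into the form by hand.

```python
def batches_of_ten(urls, size=10):
    """Strip whitespace, drop blanks and duplicates, then group URLs by `size`."""
    cleaned = []
    seen = set()
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            cleaned.append(url)
    # Slice the cleaned list into consecutive groups of at most `size` URLs.
    return [cleaned[i:i + size] for i in range(0, len(cleaned), size)]


# Hypothetical placeholder URLs, purely for illustration.
example_urls = [f"https://example-blog-{n}.tumblr.com" for n in range(1, 24)]

for number, batch in enumerate(batches_of_ten(example_urls), start=1):
    print(f"Form submission {number}: {len(batch)} URLs")
```

Dropping duplicates before batching keeps you from burning form slots on blogs you have already listed.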
All told, based on past experience, they figure they have time and bandwidth and volunteers enough to rescue maybe 60 terabytes of porn blogs. That’s enormous, but it’s only a tenth, maybe a twentieth, of the total amount of porn on Tumblr. (Everybody is guessing; these are wild-ass guesses.) Still, it will be a massive save if it works. What will happen to all that data?
Eventually (and these things can take a lot of time) the idea is that rescued blogs can end up in the Wayback Machine at the Internet Archive. Every terabyte that winds up in the Wayback Machine adds, they estimate, about $1,500 to $2,000 in long-term cost. Archive.org donations are needed, and welcome.
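To put those figures side by side, here is a rough back-of-envelope sketch. It uses only the numbers quoted in this post (60 terabytes, a tenth to a twentieth of the total, $1,500 to $2,000 per terabyte), all of which are guesses, so treat the output as an order-of-magnitude picture and nothing more. The function and constant names are mine.

```python
# Rough inputs taken from the estimates quoted in the post.
RESCUE_TB = 60                  # terabytes Archive Team hopes to rescue
COST_PER_TB_USD = (1500, 2000)  # estimated long-term cost per terabyte
SHARE_DENOMINATORS = (10, 20)   # the rescue is "a tenth, maybe a twentieth"


def storage_cost_range(terabytes, cost_per_tb=COST_PER_TB_USD):
    """Return the (low, high) long-term cost in dollars for this many terabytes."""
    low, high = cost_per_tb
    return terabytes * low, terabytes * high


def implied_total_tb(terabytes, denominators=SHARE_DENOMINATORS):
    """Return the (low, high) total size implied if `terabytes` is 1/d of it."""
    return tuple(terabytes * d for d in denominators)


print(storage_cost_range(RESCUE_TB))  # (90000, 120000)
print(implied_total_tb(RESCUE_TB))    # (600, 1200)
```

In other words, if the whole 60 terabytes made it into the Wayback Machine, the long-term bill would land somewhere around $90,000 to $120,000, and the guessed total of porn on Tumblr works out to roughly 600 to 1,200 terabytes.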