Trying Harder At The Turing Test
Tuesday, November 22nd, 2011 -- by Bacchus
The spam robots, they try harder and harder.
Yesterday there was a trackback “comment” in my moderation queue, commenting on my post from six weeks or so ago called Son Of “Anything Worth Doing…”. The quoted snippet (which in a trackback comment represents a chunk of what the remote blog post said about the post being commented on) looked like this:
[…] Eros Blog’s contention of a amicable media issue isn’t pornographic. Yet it’s framed on a page that includes dual bare paintings and several links to sites that are possibly publishing or deliberating adult passionate issues. […]
Talking about me, we like; using full sentences, we like. Already a better comment than 99% of robots and a not-insubstantial fraction of human commenters. This one looks set to make the moderation cut. But there’s a nonsense fragment (“a amicable media issue”?) that raises a mental flag…
That’s the best I can recollect of my mental processes during the time it took me to navigate my mouse to the approve/disapprove buttons. By the time my mouse was hovering over the approve button, the jangle of “amicable media” had blocked up my clicking finger. Instead, I looked at the linking URL, and followed it.
Throw an “http:” and an “//” in front of carrie.zoko.in/?p=16 if you want to see it. What you’ll find is a fairly garbled essay on a blog with a default theme. The paragraphs about ErosBlog look like this:
Recently we wrote about how Facebook and other vital amicable networks bluster a open and eccentric web. This essay wasn’t pornographic. It was a contention of a record process issue.
Several websites picked adult a contention and wrote their possess views on a issue. One was Eros Blog, a site that by a possess outline is about “sex blogging, tributary nudity, eccentric sex [and] various sensuality”.
Eros Blog’s contention of a amicable media issue isn’t pornographic. Yet it’s framed on a page that includes dual bare paintings and several links to sites that are possibly publishing or deliberating adult passionate issues.
The ISPs web patience filters will roughly positively retard entrance to Eros Blog and take a post about my amicable media views with it. There will be one reduction place on a web where people can confront my ideas and other people’s perspectives on them. Not my porn ideas. Not my adult calm ideas. My ideas about record and how to run a web.
It’s nonsense, but it’s far from total nonsense. It has human sentence structure. It’s almost like something written in another language and run through Google Translate, but the grammar is too good for that and the wrong words are not wrong enough. There’s plenty of meaning — and the arc of an argument — to be discerned, even though some of the words make no sense at all, in context. What’s going on here?
And then it hit me — I’ve seen these paragraphs before.
In fact, I’ve seen them — a version of them — in another blog post that comments on that same Son Of Anything blog post. There’s already an approved trackback comment on that post, and it links back to this blog which contains the paragraphs in the original:
Recently I wrote about how Facebook and other major social networks threaten the open and independent web. This article wasn’t pornographic. It was a discussion of a technology policy issue.
Several websites picked up the discussion and wrote their own views on the issue. One was Eros Blog, a site that by its own description is about “sex blogging, gratuitous nudity, kinky sex [and] sundry sensuality”.
Eros Blog’s discussion of the social media issue isn’t pornographic. Yet it’s framed on a page that includes two nude paintings and several links to sites that are either pornography or discussing adult sexual issues.
The ISPs web blocking filters will almost certainly block access to Eros Blog and take the post about my social media views with it. There will be one less place on the web where people can encounter my ideas and other people’s perspectives on them. Not my porn ideas. Not my adult content ideas. My ideas about technology and how to run the web.
So I repeat, what’s going on here?
Near as I can tell, it’s the end result of a robot that’s trying to build human-looking blogs to be used as fodder to feed search engine robots, in an attempt to scam search traffic that could be sold onward to advertisers. Apparently the robot is doing something like this:
- Scan blogs for posts that have approved trackbacks (confirms trackback links are available at spam target)
- Follow the trackback link, grab some text that mentions the current spam target (appeal to vanity of spam target)
- Run the text through a gentle prose morphing algo that mostly substitutes synonyms (to avoid getting hit by duplicate text penalties, and to make it harder for the monkey on the approve/unapprove button to recognize the text)
- Post the text to the SEO-spam blog (triggering the automatic trackback comment and moderation process at the spam target, and hopefully getting an inbound link with some traffic and/or search engine juice)
- Profit???
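For the curious, here is a minimal sketch of what the synonym-swapping step might look like, assuming the robot uses nothing fancier than a fixed word-for-word lookup table. The table is hypothetical: I built it from pairs you can read straight off the two versions quoted above (social/amicable, discussion/contention, two/dual, and so on). The function name `morph` is made up for illustration. I have no idea what the real robot's code looks like.

```python
import re

# Hypothetical synonym table, reconstructed from before/after pairs
# visible in the quoted spam. The real robot's table is unknown.
SYNONYMS = {
    "social": "amicable",
    "major": "vital",
    "threaten": "bluster",
    "independent": "eccentric",
    "article": "essay",
    "discussion": "contention",
    "two": "dual",
    "nude": "bare",
    "technology": "record",
    "the": "a",
}


def morph(text):
    """Swap every word found in SYNONYMS; leave all other text untouched."""

    def swap(match):
        word = match.group(0)
        replacement = SYNONYMS.get(word.lower())
        if replacement is None:
            return word
        # Preserve a leading capital so sentence starts still look human.
        if word[0].isupper():
            replacement = replacement[0].upper() + replacement[1:]
        return replacement

    # Only runs of ASCII letters are candidates; punctuation and
    # apostrophes pass through as-is.
    return re.sub(r"[A-Za-z]+", swap, text)


print(morph("Eros Blog's discussion of the social media issue isn't pornographic."))
```

Fed the original's sentence about the social media issue, this produces the spam version's "contention of a amicable media issue" word for word, clumsy "a amicable" included, because a blind lookup has no idea that swapping "the" for "a" needs an "an" before a vowel. The real robot is evidently less consistent than this sketch: it left "The ISPs" alone, where this version would emit "A ISPs".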
The heartening thing in all this is that it’s discernibly algorithmic, and the algorithms are still very simple. There are some clever hooks designed to appeal to human weaknesses (like vanity) but the robots are still not doing anything fundamentally clever. I worry, though, every time I encounter one of these that gets closer to getting through my defenses. These mundane tasks — like spam comment moderation — only get a limited amount of attention (single-digit seconds per task, typically) and it doesn’t take a lot of mechanical cleverness to overwhelm that much monkey-meat processing. The spam robots aren’t quite ready to take over the world, but they grow smarter every month…