Spam Robot Finally Rolls 00 Versus Turing
I like to read Bruce Sterling’s Beyond the Beyond blog for its glimpse into his favela-chic future. There, most everybody is poor, the whole world is a squat, and the people we now mostly aspire to be (the college-educated info-worker lucky few with white collar jobs or better — you know, the folks who have a shot at actually being wealthy while calling themselves middle class, the ones who can go to a doctor without financial stress, the ones who get to go to a dentist when their teeth don’t hurt yet) are trying to adjust to the culture shock of discovering themselves to be just another wretched class of pixel-stained technopeasantry.
Perhaps you ride the “no silver linings, no lemonade” bus with Douglas Coupland, or maybe you prefer to hope that people will pull together and Make some fun among the rubble (see A Happy Mutant’s Guide To the Near Future by Jim Leftwich, which struck me, though not so billed, as a direct response to the Douglas Coupland piece, and tasted very Sterling-ish to my tentacles). Either way, though, the point of the future is that it’s supposed to be happening tomorrow. Today, we still expect to find our knowledge workers chasing that 20th-century rainbow, not farming WOW gold elbow-to-elbow with a roomful of Chinese gamerboyz or working as a manual blog comment spamming robot.
Yes, boys and girls, that’s what has precipitated this morning’s rant. Recent improvements in my back-end blog technology have nuked most of the automated blog comment spam that used to plague me; at one point, I was having to clear five or six hundred robot spams a day from the moderation queue. These were machine generated texts, usually starting from some base text and then auto-morphed with a synonym replacement algorithm; so they looked like real comments but read like nonsense. It was annoying, and in bulk, taxing — but at least I understand the economics of robot spam. I hate it, but I understand it. And, I can fight it with better software. Battle won.
Of course we have always had people who would drop by and leave a comment for the sole purpose of dropping their URL in the box provided. With Google’s NoFollow, this is pretty pointless; but maybe they get a few clicks of traffic, and it happens. If the comment is real and the URL looks normal, there’s no way to avoid it, either. Fortunately people like this are usually lazy and/or greedy; there are usually obvious spam keywords in the link, and (it used to be) they would be unable or unwilling to spend the time to write a decent comment. So we’d get “Haha, nice pic!” and a link to some buy-my-penis-pills-now site from a first time commenter, not hard to nuke from first-time-commenter moderation.
No, boys and girls, what’s new in the last six months, and growing rapidly, is the seemingly hand-rolled comment spam that is so good it cannot be distinguished from the comments my regular commenters leave. It’s two or three sentences long, it’s on topic, it’s friendly, it’s fun, it would in every respect qualify for being passed through moderation. Except that it’s clearly been submitted in order to support a horrid spammy link. Instead of putting a name in the link box, the person will put whatever keyword they are trying to promote their site for in the search engines; and then they will put their keyword-laden URL in the URL box.
These happen with increasing frequency now. The sites are of all kinds, the emails and IP addresses differ, the commenting styles seem to differ, it has not seemed like just one person doing it. I’ve been open to the idea that there’s one “work from home” ring of people doing it in support of some solitary Black Hat SEO evil mastermind, but it’s been hard to believe they would actually be getting paid enough to make it worth their time. And given what I think I know about search engines these days, I don’t think this behavior makes economic sense. But the world changes rapidly, and one of the “features” is that what you think you know often turns out not to be so.
And so it would go, ever since my tech improvement made most of the automated stuff vanish from my site. I’d see one or two of these hand-rolled comment spams in my moderation queue every so often, I’d marvel at them as I nuked them, and then I would get on with my life.
Until this morning. One of these comments, on a post from 2007, seemed very familiar…
Light dawns on Marble Head. Ding ding ding ding!
Clever robot, have a pellet.
Challenge: You need a comment text good enough to convince a blog owner it was written by a human in response to a post on his blog. Where do you get one?
Duh answer: You just steal the comment text from the existing comment thread, and then you rely on the blog owner’s bad memory not to recognize that he’s already seen that comment years previously. On a blog with “only” 13,989 approved comments, what are the odds he’s going to remember the one you borrowed?
The reason this works so well — and had me convinced for a month I was dealing with pixel-stained technopeasant wretches — is that all of the comments were good enough to approve. Of course they were! Because they’d all been approved before, duh. The robot is well crafted in several ways. Not only is it smart enough to get around the ice that’s keeping most of the robots away from my site now, it’s set to stay several years in the past, hit my site no more than two or three times a day, generally not more than two or three times a week. But (I discover on looking in the spam bin) it always works by taking the first comment from the comment thread and attempting to repost it with its own spammy link.
As humbled as I am to discover that I’ve spent the last month failing to recognize the robot on the other side of the Turing Test, I’ll confess to a bit of relief that the world of bulk hand-crafted comment spam hasn’t arrived (yet).
{pause for breath}
That’s the end of the post, but if you’re still reading this deep into one of my ranty posts that has nothing to do with sex, I figure you’ll indulge me further. When I went to Boing Boing for those two links that I used in the second paragraph, I thought it was a sign of the apocalypse that there was nothing visible “above the fold” but their header banner and advertising, thusly:
Given that I’ve been considering a few tweaks to the ErosBlog template to modernize the look and create some of the larger ad spaces that advertisers are looking for these days, I was wondering whether I should take that as an omen. I hope I’ll never go that far (shudder) but if one can get away with that and still enjoy the cool reputation Boing Boing enjoys, maybe I’ve been too conservative?
That seems like a really short-lived tactic. It should be a trivial matter to write a WordPress (or whatever) filter that compares a comment body against the corpus of approved comments for identity or similarity.
Of course, on the next round, they’ll probably synthesize two or more previous comments in a non-obvious way.
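A minimal sketch of the kind of filter described above, written in Python rather than as a real WordPress plugin. The function names, the 0.9 threshold, and the idea of passing the approved comments in as a plain list are all assumptions for illustration; nothing here is taken from the blog's actual software.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and keep only word characters, so punctuation or
    spacing tweaks do not hide a copied comment."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def find_recycled(new_comment, approved_comments, threshold=0.9):
    """Return (ratio, original) for the closest approved comment whose
    similarity is at or above threshold, or None if nothing is close.

    Hypothetical helper: a linear scan with difflib, which is only a
    sketch and may be slow on a large comment archive.
    """
    candidate = normalize(new_comment)
    best = None
    for original in approved_comments:
        ratio = SequenceMatcher(None, candidate, normalize(original)).ratio()
        if ratio >= threshold and (best is None or ratio > best[0]):
            best = (ratio, original)
    return best


approved = [
    "Great photo, reminds me of an old pulp magazine cover!",
    "I disagree about the lighting.",
]
hit = find_recycled("great photo -- reminds me of an old pulp magazine cover", approved)
miss = find_recycled("Completely new thought about tentacles.", approved)
```

An exact-match check on the normalized text would catch the robot described in the post; the similarity ratio is there for lightly edited copies, and it would probably not catch comments stitched together from several originals.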
The first AI to pass the Turing test really is going to be built by spammers, isn’t it?
Yeah, I suspect it will. Chat bots are also being used to good effect these days by cam site marketers; they advertise “chat for free with our cam girls” and then use cheesy bots to try and lure people into the paid part of the site. Maybe people have a low opinion of the conversational skills of cam girls, maybe it doesn’t work too well yet, but given the per-minute rates they charge I can see them having incentive to keep improving the state of the art…
I’ve seen a few of those come in. A cunning tactic – I’m thinking I should put up a “commenting rules” post like I’ve seen some blogs do, where one of the rules is “Thy link shalt point to a blog and only a blog.”
Hee, hee, I was fooled by those “stolen comment text from existing comment threads” too! However, I ran a hardcore grep search over my blog’s database that scoured out the bot dupes. (Mad skillz for just a Japanese girlie, ne?)
I gladly approve all human and human-like comments, BUTT if a new seemingly-human commenter smells the tiniest bit spammy I do three things:
1) I add an anonymizer like //anonym.to/? to their URL.
2) I subtly fuck up their spammer’s name in some way such as adding an extra or duplicate character (so the comment bot’s tracking program registers their comment as a failed spam).
3) I remove a few spaces between key terms or add random spaces in key words (so the comment bot’s tracking program cannot “find” their comment).
I never had a real human commenter complain about a few added spaces or their name being duplicated, but I am sure the Comment Bots are gnashing their teeth/bites/bytes about me screwing with their keywords, anonymizing their URL, and pooping on their tracking name, tee, hee.
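For anyone who wants to automate those three steps, here is a rough Python sketch. The function names are made up for illustration, the anonymizer prefix is simply the one quoted in the list above, and the claim that a spammer's tracking program would be thrown off is the commenter's, not something this code verifies.

```python
def anonymize_url(url, prefix="//anonym.to/?"):
    """Step 1: route the commenter's link through an anonymizer prefix."""
    return prefix + url


def double_a_letter(name, index=0):
    """Step 2: duplicate one character of the name (hypothetical helper)."""
    if not name:
        return name
    index = min(index, len(name) - 1)
    return name[: index + 1] + name[index] + name[index + 1 :]


def split_keyword(text, keyword):
    """Step 3: insert a space in the middle of each occurrence of keyword."""
    if len(keyword) < 2:
        return text
    mid = len(keyword) // 2
    return text.replace(keyword, keyword[:mid] + " " + keyword[mid:])


link = anonymize_url("http://example.com/pills")
name = double_a_letter("Free Sex Toys", 1)
body = split_keyword("buy cheap pills now", "pills")
```

With these inputs, `name` comes out as "Frree Sex Toys" and `body` as "buy cheap pi lls now", which a human reader will barely notice.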
It seems to me that the website box, and indeed all parsing of href in comments, might have reached the point where the signal is no longer worth the noise.
Well Smog, I have to say there are days when it seems that way to me, too.
Especially now that blogging is on the decline, and a lot of people are doing their “social” net interactions on more immediate platforms like Twitter and Facebook and such.
And yet, and yet…
Having a community of regular commenters makes a website better. And some tool for letting those community members communicate who they are and where they “live” on the internet seems sensible. Sure, you could take away the href parsing and provide something a bit more modern and “with it” — but that, too, would immediately be abused by spammers, I figure.
One of these enterprising little monsters showed up on an EroticMadScience post today (one less than a month old, too), shamelessly plagiarizing the work of a valued commenter. Of course, since the post was less than a month old, I was able to have the “wow, that looks suspicious” reaction, so maybe they’re slipping.
I know this post is already a few days old, but today’s xkcd is a perfect comment on the whole spam problem:
http://xkcd.com.../810/
You got noted on Making Light. Over there, we have a collective spam-hunting trick, so it’s not all up to the moderators: Anytime someone spots spam on a comment, they reply with “sees spam” appended to their own name. That way, any moderator (we have several) can scan the “recent comments” list for commenters with “spam” in their name, and zap the spam (and optionally, the report). Works a treat!
Meanwhile, over on a lightly-trafficked Wiki, I’m rooting out a less-sophisticated spamwave where I just missed the “ranging shot”. Very annoying….
Nifty, David! I’ve seen the “so-and-so sees spam” remarks at Making Light, but I wasn’t sufficiently in-clued to comprehend the specific mechanical way in which that was helpful to the moderators.
Note that the Making Light trick works because there’s a list of links to recent comments on the front page — the last 20 or so posted. I use this primarily to position myself in the right area of a comment thread, and then scroll up if necessary to find the place where I last left off reading. But it also notes comments posted on old threads, which is an invitation to any regular contributor to go look and see if it’s spam. As David says, it takes a lot of work off the moderators, because it’s effectively crowdsourcing the spam detection process.
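The moderator's side of that trick is easy to sketch in Python. This assumes the recent-comments list is available as a list of dicts with "author", "thread", and "body" keys; that data shape and the function name are invented for illustration and say nothing about how Making Light's software actually works.

```python
def flagged_reports(recent_comments, marker="spam"):
    """Return the recent comments whose author name contains the marker,
    e.g. a regular posting as "David sees spam"."""
    return [c for c in recent_comments if marker in c["author"].lower()]


recent = [
    {"author": "David", "thread": "Open thread 150", "body": "Lovely weather."},
    {"author": "David sees spam", "thread": "Old post from 2007", "body": "Spam just above."},
    {"author": "Cheap Pills", "thread": "Old post from 2007", "body": "Nice post!"},
]
reports = flagged_reports(recent)
threads_to_check = sorted({c["thread"] for c in reports})
```

The moderator then only has to visit the threads in `threads_to_check`, zap the spam, and optionally remove the report.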
I’m a web marketing specialist involved with search engine marketing, and I have a background in IT; I watch spam and spamming tools. Like you, I’ve seen the thesaural commenting, and the terminal-three-non-alphabetic-character markings. Most of the blogs I monitor have Akismet, which, without any kind of CAPTCHA, has been hitting 99%+ correct spam detection – until the most recent trend of evasion.
Comments praising a post (and often either asking for permission to link to the article, or an assurance that the blog will be added to an RSS feed) are escaping the filters; I believe that they work because the praise is often taken by bloggers as fair due, and so the crowdsourcing defence is defeated. Signatures? Some of the blogs I work on have been tweaked to make the “Comment Policy” more attractive. Spammers usually find the blogs they target using search engines, so tweaking the comment policy page to attract spammers there, rather than to the articles, makes spam detection easier: “your article is great and highlights points other bloggers have not considered” becomes blindingly obvious as “flam” (flattery spam) when applied to a comment policy page. There is an older meaning of “flam” and this seems an appropriate re-use as a name for this category of spam!
“Flam” does have a signature though. While IP addresses tend to be varied, the email address and/or the keyword/name are often retained. And the spammers’ controls over the spamming process are insufficient – the same comment can be added twice or more to different articles (“About” is another good candidate as a honeypot to detect “flam”) in the same blog.
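Both signatures can be checked mechanically. The Python sketch below assumes each comment is a dict with "email", "ip", "post", and "body" keys; the field names and the function are hypothetical, and it looks only at the email address (the keyword/name could be handled the same way).

```python
from collections import defaultdict


def flam_signatures(comments):
    """Return (emails seen from more than one IP address,
    comment bodies posted on more than one article)."""
    ips_by_email = defaultdict(set)
    posts_by_body = defaultdict(set)
    for c in comments:
        ips_by_email[c["email"].lower()].add(c["ip"])
        # Collapse whitespace and case so trivial differences do not matter.
        posts_by_body[" ".join(c["body"].lower().split())].add(c["post"])
    repeat_emails = {e for e, ips in ips_by_email.items() if len(ips) > 1}
    repeat_bodies = {b for b, posts in posts_by_body.items() if len(posts) > 1}
    return repeat_emails, repeat_bodies


sample = [
    {"email": "a@example.com", "ip": "10.0.0.1", "post": "about",
     "body": "Your article is great!"},
    {"email": "A@example.com", "ip": "10.0.0.2", "post": "comment-policy",
     "body": "your article is  great!"},
    {"email": "reader@example.org", "ip": "10.0.0.3", "post": "about",
     "body": "I think you missed the point about NoFollow."},
]
emails, bodies = flam_signatures(sample)
```

A regular reader who comments from home and from work would also show up in the first set, so this is a list of candidates to look at, not a verdict.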
For humans, detection is somewhat easier, so long as you assume that flattery is merely to weaken defences. The comments are impressively non-specific.
When this technique finally falls to better filtering, I’m betting the next evolution is to target a phrase in the article and comment on it. A bit of praise about how you mentioned “bulk hand-crafted comment spam” and you can’t fail to publish this… but is it spam? When does a hand-crafted comment designed for publishing become unworthy to be published? Interesting problem, eh? Just because it has a spammy author-name, a useful contribution is lost? Difficult choice. My policy has been that I will rewrite the spammy name, and/or lose the link, but keep the comment – but that has an ethical dimension too. Spammers profit from their activity, but if they usefully contribute and are returned no value, does that discourage the evolution of useful substantive comment by spammers?
It is an evolutionary arms race, so the independent decisions by thousands of blogmasters and their moderators will affect the next generation of spam. Fun, eh?
Jeremy, a slightly different perspective here because I never used Akismet; I tend to avoid plugins that require me to register for a third-party API key. However, I’ve done very well with the family of plugins that subtly alter the comment form in ways that discombobulate robots without bothering people. It’s not perfect, but it tends to keep the dumb bots out and let me see the efforts of the smartest ones.
The flattery spam has always struck me as laughable; sincere flattery on the web always involves republication elsewhere and currency in inbound links. “No applause, just send traffic.” If there’s somebody who genuinely is just a bystander and wants to say something nice, they generally have enough to say that it’s unique at length, looking nothing like the hilariously vague blandishments of the bots.
I used to rewrite spammy names and/or links in order to preserve useful hand-crafted spam comments, but as my overall volume of comments has gone up, my patience for doing so has diminished considerably. The barrier is now very high; if the comment is actually long and interesting, I might do so, but otherwise, I’m inclined to plonk it to the trash and move on. My reasoning is that my readers don’t really want to engage in a conversation with “Free Sex Toys” no matter how engaging a conversationalist he, she, or it might be.