dsandler.org

A couple of years ago I worked on a TrackBack Validator which identified and rejected TrackBacks posted on your blog from sites that didn’t actually link to your blog.

[image]

Trackback spam. (Figure taken from TR-06-876.)

In our 2006 tech report on the subject (co-authored by my advisor and a number of undergrads in his computer security class), we speculated that—given sufficiently widespread use of inbound-link validation—spammers would be forced to either (a) close up shop, moving on to some other exploitable technology, or (b) start actually linking to their victims. To wit:

Spammers who wish to overcome our mechanism are forced to indefinitely maintain reciprocal links from their own web sites, effectively increasing their necessary investment of time and resources. Furthermore, the spammer’s site, by linking to its victims, will actually benefit the victims’ search engine rankings by sharing part of the spammer’s ranking with each of its victims. Best of all, if the spammer is effectively publishing a list of its victims, that list would provide compelling evidence that could be used against the spammer in legal proceedings.

In the limit, we are effectively pushing spammers to run “legitimate” weblogs. If spammers’ weblogs are following the TrackBack protocol correctly and are legitimately providing reciprocal links, then we face a more fundamental question: is such a TrackBack message actually spam? If a “real” blog is linking to the victim, regardless of any spam-like content it might contain, then the TrackBack the victim receives could well be defined as “legitimate.” At that point, the issue is not one of spam vs. non-spam, but rather one of relevance.

Well, we were right and not right. I just received some TrackBack spam (probably not coincidentally, on a blog post about trackback spam) that fooled the Validator and yet can’t really be considered to be legitimate.

[image]

A tricky TrackBack.

The inbound link is included, but hidden from the user with CSS tricks! Here’s an excerpt of the source of the page:

  <style type="text/css" media="screen">
    .trackback { position:absolute; top:0px; left:0px; visibility:hidden; }
  </style>
  <div>
    <div class="trackback">
    [...]
        <p>
          [...] far out site now comment this synopsis
          <a href='http://dsandler.org/wp/archives/2005/11/14/trackback-spammers-upping-the-ante'>http://dsandler.org/wp/archives/2005/11/14/trackback-spammers-upping-the-ante</a>
          and give comments [...]
        </p>

As you can see, all the inbound links are surrounded with irrelevant content, but what’s more, they’re children of the <div class="trackback"> and hence invisible to readers. In our paper we point to readers as one of two “last resorts” to help weed out irrelevant but otherwise Validated TrackBacks; obviously they won’t be able to help here. (The other technique, which would still work in this case, is the same sort of statistical classification currently used for email; see §5 of the TR for details.)
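To make the failure concrete, here is a minimal Python sketch of the kind of check that gets fooled. This is not the Validator's actual code: `LinkCollector`, `naive_has_inbound_link`, and the trimmed-down `SPAM_PAGE` are hypothetical names and data made up for illustration, and the sketch assumes a check that only asks whether the page contains an anchor pointing at the post.

```python
from html.parser import HTMLParser

TARGET = "http://dsandler.org/wp/archives/2005/11/14/trackback-spammers-upping-the-ante"

# Trimmed-down stand-in for the spammer's page (the elided parts are left out).
SPAM_PAGE = """
<style type="text/css" media="screen">
  .trackback { position:absolute; top:0px; left:0px; visibility:hidden; }
</style>
<div>
  <div class="trackback">
    <p>far out site now comment this synopsis
      <a href='TARGET'>TARGET</a>
      and give comments</p>
  </div>
</div>
""".replace("TARGET", TARGET)


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, visible or not."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def naive_has_inbound_link(page_html, target_url):
    """Hypothetical check: is there any anchor pointing at target_url?"""
    collector = LinkCollector()
    collector.feed(page_html)
    return target_url in collector.hrefs


# The hidden link satisfies the naive check, so the TrackBack is accepted.
fooled = naive_has_inbound_link(SPAM_PAGE, TARGET)
```

The check never looks at the stylesheet, so a link that no human will ever see counts exactly as much as one in the middle of a real post.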

In the end, this “break” of the Validator may not yield much for this spammer aside from the satisfaction of successfully defacing my blog. Google has been known to apply a PageRank penalty to websites with large regions of hidden text, so the currency gained by inbound links may very well be more than offset. What’s more, like most modern blogs and CMSes, dsandler.org applies rel="nofollow" to any links found in comments or TrackBacks, so the spammer gets zero Google-juice in this situation.

But since spam is so cheap, the spammer probably doesn’t care. That’s why the Validator was so important: it proved remarkably effective at reducing the “collateral damage” of spam, namely, blog defacement. In order to continue to be effective against this sort of attack, it would probably need to include some sort of CSS/DOM interpreter.
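For a rough idea of what even the most bare-bones version of that might look like, here is a Python sketch. It is an assumption-laden toy, not a real CSS engine and not anything the Validator does: it only understands simple `.class { ... }` rules in inline `<style>` blocks plus `style` attributes, treats `visibility:hidden` and `display:none` as hiding everything beneath them, and ignores external stylesheets, off-screen positioning, JavaScript, and every other trick. All names (`hidden_classes`, `VisibleLinkCollector`, `has_visible_inbound_link`) are hypothetical.

```python
import re
from html.parser import HTMLParser

HIDING = re.compile(r"visibility\s*:\s*hidden|display\s*:\s*none", re.I)
CLASS_RULE = re.compile(r"\.([\w-]+)\s*\{([^}]*)\}")
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


def hidden_classes(page_html):
    """Class names whose rule in an inline <style> block hides the element."""
    hidden = set()
    for css in re.findall(r"<style[^>]*>(.*?)</style>", page_html, re.I | re.S):
        for name, body in CLASS_RULE.findall(css):
            if HIDING.search(body):
                hidden.add(name)
    return hidden


class VisibleLinkCollector(HTMLParser):
    """Collects hrefs of <a> tags that have no hidden ancestor (simplified)."""

    def __init__(self, hidden):
        super().__init__()
        self.hidden = hidden
        self.stack = []          # (tag, is_hidden) for each open element
        self.visible_hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        hides = (bool(self.hidden.intersection(classes))
                 or bool(HIDING.search(attrs.get("style") or "")))
        inherited = bool(self.stack) and self.stack[-1][1]
        is_hidden = hides or inherited
        if tag == "a" and attrs.get("href") and not is_hidden:
            self.visible_hrefs.append(attrs["href"])
        if tag not in VOID_TAGS:
            self.stack.append((tag, is_hidden))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy markup.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def has_visible_inbound_link(page_html, target_url):
    collector = VisibleLinkCollector(hidden_classes(page_html))
    collector.feed(page_html)
    return target_url in collector.visible_hrefs


# Illustrative pages (made up for this sketch).
POST = "http://dsandler.org/wp/archives/2005/11/14/trackback-spammers-upping-the-ante"
sneaky = ('<style>.trackback { visibility:hidden; }</style>'
          '<div class="trackback"><p><a href="%s">link</a></p></div>' % POST)
honest = '<div class="entry"><p>See <a href="%s">this post</a>.</p></div>' % POST

sneaky_ok = has_visible_inbound_link(sneaky, POST)
honest_ok = has_visible_inbound_link(honest, POST)
```

Even this toy needs a tag stack and a CSS scraper, and a spammer can sidestep it with an external stylesheet or a script, which is roughly why a real fix means dragging in a real interpreter.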

(Yuck.)

For more on all these icky edge-cases in TrackBack (and other forms of Web) spam, read the report. (It’s just a six-pager.)

2 Responses to “Validator foiled!”

jack says:

great article about spam. Didn't know that it was possible to use CSS also.
Actually, I think spam can to a large degree be blamed on how the SE’s rank sites. If the ranking was made in a different way then the spam would also go away.

Spam y Trackbacks en Buayacorp - Diseño y Programación says:

[…] Based on a modified wp-trackback.php file that Maty sent me, I made some changes to it so that it does almost the same thing as the Trackback Validator plugin, which basically verifies that the site sending the request contains a reciprocal link to the entry being referenced (see the paper for more details). The limitation of this method, as one of the people who took part in that project acknowledges, is that it can easily be evaded in various ways (with CSS, HTML comments, JavaScript, dynamic content generation, etc.). […]

dsandler.org is Dan Sandler's website and notebook.
