You can click, but you can't hide
Keith Dawson
2000-03-08
The unknown perpetrators of the recent, much-publicized denial-of-service
attacks used tools designed to obliterate their tracks in
cyberspace. By contrast, the ordinary user who surfs thinking (or
hoping) that "on the Internet, nobody knows you're a dog" leaves
tracks deeper and more persistent than he or she may imagine. The
sites you visit not only know you're a dog -- they know your breed,
the condition of your coat, and the cost of your last shampoo at the
grooming shop.
Let's look into how much Internet sites know about you without even
trying, and how much they can find out with a little more
persistence.
Connection data. Every time your browser requests a file over
the Web, it sends a packet of information to the server. Visit this
page at Junkbusters for a quick look at some of the information
conveyed in every HTTP connection request. The bold text you see was
customized just for your visit. (Bet you didn't know about this one:
click the "check on whether you are being revealed" link in the
penultimate paragraph.)
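To make the request side concrete, here is a minimal Python sketch that splits a raw HTTP request into its parts. The request is hypothetical: the host, path, browser string, referring page, and cookie value are all invented for illustration. Your IP address does not appear among the headers; the server reads it from the network connection itself.

```python
from typing import Dict, Tuple

# Hypothetical request: every value below is invented for illustration.
RAW_REQUEST = (
    "GET /toys/index.html HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "User-Agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)\r\n"
    "Referer: http://www.example.org/links.html\r\n"
    "Accept-Language: en-us\r\n"
    "Cookie: visitor_id=4f2a9c\r\n"
    "\r\n"
)


def parse_request(raw: str) -> Tuple[str, str, Dict[str, str]]:
    """Split a raw HTTP request into (method, path, headers)."""
    # Headers end at the first blank line; anything after it is the body.
    head, _, _body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, path, _version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        # Split on the first colon only, so URLs in values stay intact.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, headers


if __name__ == "__main__":
    method, path, headers = parse_request(RAW_REQUEST)
    print(method, path)
    for name, value in headers.items():
        print(f"{name}: {value}")
```

The browser string, the page you came from, your language preference, and any cookie the site set earlier all arrive unasked with the very first request.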
More data. If the site owners want to do a little more work,
they can learn considerably more about you:
- Whether your browser supports Java and/or JavaScript
- What browser plug-ins you have loaded
- How large your monitor is and how many colors it's using at the moment
- How many Web pages you have viewed in this window
- The local time according to your computer
- Who your ISP is, where it's located, and who its upstream provider is
Server logs. Web servers routinely store basic information about
every request they process. For example, I just now visited the
DigitalMASS home page. My single click left 5 records in this site's
server log (one for each file requested) and 28 more in the log file
of akamai.net (which DigitalMASS uses to serve some of its graphics
files). Each log-file record contains my IP address, the time,
information about my browser, the page I was viewing when I clicked,
and other information. Visit this page on Privacy.net for a good
overview of what log-file entries look like and how to read them.
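As a rough illustration of how such records are read, the sketch below parses one line in the "combined" log format that Apache and many other servers can write. It assumes the site uses that format (servers are configurable, and not all do); the sample line, with its documentation-range IP address, is made up.

```python
import re
from datetime import datetime
from typing import Optional

# Assumes the Apache-style "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Invented sample line (192.0.2.x is reserved for documentation).
SAMPLE_LINE = (
    '192.0.2.44 - - [08/Mar/2000:10:15:32 -0500] '
    '"GET /index.html HTTP/1.0" 200 5120 '
    '"http://www.example.org/links.html" '
    '"Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"'
)


def parse_log_line(line: str) -> Optional[dict]:
    """Return the fields of one log record, or None if it doesn't match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Convert the bracketed timestamp, including its UTC offset.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


if __name__ == "__main__":
    record = parse_log_line(SAMPLE_LINE)
    for field in ("ip", "time", "request", "referer", "agent"):
        print(f"{field:8} {record[field]}")
```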
Many Web sites store their server logs in a database and analyze
their visitors using powerful data-mining tools. Others simply dump
the log files out onto the Web for all to see.
Cookies. If the last few paragraphs have succeeded in making
you uneasy about how much you reveal while surfing the Web in
imagined anonymity, consider how much more you disclose when you
accept cookies from the sites you visit.
A cookie can act as a unique identifier for your computer. A site
can't count on your IP address (which it gets from the server log)
to identify you, because you may be assigned a different IP address
the next time you dial in to your ISP. Or you may be visiting from
behind a firewall at work, and all the site has to go on is the IP
address of the firewall. But if your browser returns the same unique
identifier (that is, the site's cookie) each time you visit, the site
can correlate your actions across weeks and months.
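The sketch below shows the mechanism using Python's standard http.cookies module. The cookie name visitor_id and the five-year lifetime are hypothetical choices for illustration. The server sends the full Set-Cookie value once; the browser thereafter returns just the name=value pair with every request to that site.

```python
import uuid
from http.cookies import SimpleCookie
from typing import Optional

COOKIE_NAME = "visitor_id"               # hypothetical cookie name
LIFETIME_SECONDS = 5 * 365 * 24 * 3600   # hypothetical five-year lifetime


def issue_visitor_cookie() -> str:
    """Build a Set-Cookie header value carrying a fresh random visitor ID."""
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = uuid.uuid4().hex   # 32 random hex digits
    cookie[COOKIE_NAME]["path"] = "/"         # returned for every page on the site
    cookie[COOKIE_NAME]["max-age"] = LIFETIME_SECONDS
    return cookie[COOKIE_NAME].OutputString()


def read_visitor_id(cookie_header: str) -> Optional[str]:
    """Pull the visitor ID out of the Cookie header a browser sends back."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(COOKIE_NAME)
    return morsel.value if morsel is not None else None


if __name__ == "__main__":
    set_cookie = issue_visitor_cookie()
    print("Set-Cookie:", set_cookie)
    # On later requests the browser returns only the name=value pair.
    returned = set_cookie.split(";", 1)[0]
    print("Recognized visitor:", read_visitor_id(returned))
```

Because the ID rides in the Cookie header rather than depending on the IP address, the same value comes back whether you dial in from home with a fresh address or arrive through the office firewall.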
Let's say you visit a toy company's site in July. Their server
stores a cookie on your computer without your awareness. Every few
weeks you go back to see what's new, browsing for toys your kids
might enjoy. The site's owners don't know who you are, but they can tie
your visits together using their cookie. They can mine the server-log
data for insights into how the site's navigation is working (what
paths did you take through the pages?), which pages aren't pulling
their weight (what's the last page you saw in each visit?), etc. So
far, so innocent.
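Here is a minimal sketch of that kind of analysis, assuming the site has already extracted (cookie, time, page) triples from its logs. The 30-minute inactivity cutoff for splitting one visit from the next is a common rule of thumb, not anything this toy site is known to use, and the sample data is invented.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed rule of thumb: a gap longer than this starts a new visit.
SESSION_GAP = timedelta(minutes=30)


def visits_by_cookie(records):
    """Group (cookie_id, time, page) records into each cookie's visits.

    Returns {cookie_id: [visit, ...]}, where each visit is the ordered
    list of pages viewed, i.e. the visitor's path through the site.
    """
    hits = defaultdict(list)
    for cookie_id, when, page in sorted(records, key=lambda r: (r[0], r[1])):
        hits[cookie_id].append((when, page))

    visits = {}
    for cookie_id, timeline in hits.items():
        sessions = []
        previous = None
        for when, page in timeline:
            if previous is None or when - previous > SESSION_GAP:
                sessions.append([])  # long silence: a new visit begins
            sessions[-1].append(page)
            previous = when
        visits[cookie_id] = sessions
    return visits


def exit_pages(sessions):
    """The last page seen in each visit, where the visitor gave up or left."""
    return [session[-1] for session in sessions]


if __name__ == "__main__":
    # Invented sample: one cookie, two visits three weeks apart.
    records = [
        ("4f2a9c", datetime(2000, 7, 3, 20, 0), "/index.html"),
        ("4f2a9c", datetime(2000, 7, 3, 20, 2), "/puzzles.html"),
        ("4f2a9c", datetime(2000, 7, 24, 21, 15), "/index.html"),
        ("4f2a9c", datetime(2000, 7, 24, 21, 17), "/dolls.html"),
        ("4f2a9c", datetime(2000, 7, 24, 21, 20), "/dolls-sale.html"),
    ]
    sessions = visits_by_cookie(records)["4f2a9c"]
    print("Paths:", sessions)
    print("Exit pages:", exit_pages(sessions))
```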
Months later, as the holidays approach, you finally buy a toy from
the site. As soon as this transaction reveals your identity, the
site owners can -- if they wish -- tie together your entire history
on the site with other data about you, data purchased perhaps from a
credit agency, a direct-mail marketing firm, or a magazine.
Now consider how much more data DoubleClick can collect about you.
The Internet ad agency serves ads, with a side of cookies, for
more than a thousand popular Web sites. DoubleClick can mine your
meanderings across hundreds of sites over a period of months or years.
If they are ever able to associate a real-world identity with your
unique DoubleClick-assigned cookie, think how much more they could
learn.
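The following sketch illustrates the general mechanism, not DoubleClick's actual system. It assumes that each ad request reaches the ad network's server carrying the network's own cookie plus a Referer header naming the page on which the ad appeared. Grouping those referring sites by cookie yields one browser's trail across otherwise unrelated sites. The cookie IDs and hostnames are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def sites_per_cookie(ad_requests):
    """Map each ad-network cookie ID to the distinct sites it was seen on.

    ad_requests: (cookie_id, referer_url) pairs, one per ad served, in
    the order served. Assumes the Referer names the page showing the ad.
    """
    trail = defaultdict(list)
    for cookie_id, referer in ad_requests:
        host = urlsplit(referer).hostname
        if host and host not in trail[cookie_id]:
            trail[cookie_id].append(host)  # first sighting on this site
    return dict(trail)


if __name__ == "__main__":
    # Invented ad-server records for two browsers.
    requests = [
        ("dc-77", "http://www.toys.example/index.html"),
        ("dc-77", "http://news.example/today.html"),
        ("dc-12", "http://news.example/"),
        ("dc-77", "http://www.toys.example/dolls.html"),
        ("dc-77", "http://travel.example/fares.html"),
    ]
    for cookie_id, sites in sites_per_cookie(requests).items():
        print(cookie_id, "->", ", ".join(sites))
```

Each host site sees only its own slice of your browsing; the ad server, embedded in all of them, sees the whole trail.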
For hands-on fun, visit Privacy.net's simulated advertising
collaborative. It demonstrates, quite graphically, how such
cross-site tracking operates.