Life as a Dog

Joseph Pearson July 28

Jumping online is second nature for many of us. We do it greedily, automatically and almost without thought. It is all so easy, after all, so wonderfully infinite, and there are so many new variables to be tried. We like to think also that we are in control, that the public personas we create on Facebook, Twitter and so on are resources of our own making, that we can block, unsubscribe and filter as we see fit. But every time we make a command – search, access, refresh, decline – we leave an imprint. We create data waiting to be picked up. Stats can be collected by big search engines like Yahoo and Google, as well as advertisers, citizen journalists and even your own curious social circle. How much choice is there really, and who is interested in our virtual footprints? In the June edition of Meanjin, Joseph Pearson asks that we not be alarmed – the problem is entirely too interesting for that. A brief extract of his essay ‘Life as a Dog’ is below; read the full text here.


In 2003, when I should have been working on my thesis, I spent several months rebuilding my personal weblog. As an afterthought, I wrote a stats reporting tool.

Most people who keep a blog use a stats package of some kind. At least until your writings attract regular comments, it is the core charm of the practice—what transmutes blogging from shouting in a fog to a feeling of speaking from the pulpit (or chattering in the town square, if your humility is more mature than that of most bloggers). It gives you some measure of your audience. You learn how many people have visited your site recently, and from where on the web they came. You see—and almost invariably, are horrified by—the Google search queries that brought this itinerant mob to your digital doorstep. You find out what other sites are linking to you, and get the cosy feeling of community membership from it. You find out what times are busiest, what articles are most popular, and often, the geographic concentration of your audience. It is exhilarating and puzzling to see that you have readers in Iran and India. All this knowledge is essentially benevolent, as web writers learn through feedback how to challenge and entertain their patrons.

I tinkered with my stats tool rather a lot in the early months. My curiosity compelled (and my small readership enabled) me to super-charge it. With ten to fifty unique visits a day, I could afford a much finer granularity than any commercial stats package typically offers. Every visitor was listed individually. If you visited my site in early 2004, yes, I knew the site or search whence you arrived and your approximate geographic location. Yours specifically. I knew how big your display was. I knew which pages you accessed, the order in which you accessed them, how many seconds you spent on each page. [1] I knew the minute you arrived, the minute you departed. I could tell if you were still on my site while I was looking at you. Of course, I was only looking at your shadow—even though your stats were essentially unique, I didn’t really know who you were.

Unless you left a comment. In that case I had a name (or a handle), an email address (which could be fake) and possibly a website, if you had volunteered one. And of course, I could associate this information with your stats. On each subsequent visit, I knew it was you—the creature who left that earlier comment.
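As a rough illustration of the kind of per-visitor bookkeeping described above, here is a minimal sketch in Python. It is not the tool from the essay: the `Hit` record, the `visitor_id` key (say, a cookie value) and both function names are assumptions made up for this example. It orders each visitor's page requests, estimates the seconds spent on each page from the gap to the next request, and attaches a commenter's name to the matching visitor.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    """One page request. All fields are hypothetical, for illustration only."""
    visitor_id: str      # assumed stable key per visitor, e.g. a cookie value
    timestamp: int       # seconds since some epoch
    path: str            # page requested
    referrer: str = ""   # site or search the visitor arrived from


def summarise_visits(hits):
    """Group hits by visitor and estimate the time spent on each page."""
    by_visitor = defaultdict(list)
    for hit in hits:
        by_visitor[hit.visitor_id].append(hit)

    summaries = {}
    for visitor_id, visitor_hits in by_visitor.items():
        visitor_hits.sort(key=lambda h: h.timestamp)
        pages = []
        # Dwell time is the gap until the next request. This assumes the
        # visitor reads one page at a time, in order.
        for current, following in zip(visitor_hits, visitor_hits[1:]):
            pages.append((current.path, following.timestamp - current.timestamp))
        # Nothing follows the last page, so its dwell time is unknown.
        pages.append((visitor_hits[-1].path, None))
        summaries[visitor_id] = {
            "arrived": visitor_hits[0].timestamp,
            "last_seen": visitor_hits[-1].timestamp,
            "referrer": visitor_hits[0].referrer,
            "pages": pages,
            "name": None,  # unknown until the visitor leaves a comment
        }
    return summaries


def attach_comment_identity(summaries, visitor_id, name):
    """Link a commenter's name (or handle) to that visitor's stats."""
    if visitor_id in summaries:
        summaries[visitor_id]["name"] = name
```

The dwell-time estimate rests entirely on the one-page-at-a-time assumption, and the last page of any visit has no following request to measure against, so it is recorded as unknown rather than guessed.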

Soon enough there were limits—in the processing power of my server, and more importantly, in my own curiosity. But what if you had oodles of CPU and data storage, and a business case for being inquisitive? A few years ago, I listened to one of the lead engineers at Google claiming that their server farms stored the internet thirty times over. They have data mining algorithms that make my attempts at log analysis look like finger painting. Across their internal networks they crunch terabytes of data in seconds. So does Yahoo, so does Microsoft. Marginally more specialised, so do Amazon and its subsidiary, Alexa. So do Akamai, Atlas and older players such as AOL. There’s a whole alphabet more. Many major companies have a lot of data about what’s on the internet, and how it’s accessed, and by whom. In recent years storage has become so cheap and ubiquitous that these businesses have no compelling reason to discard any data. It accumulates, gleaned from all levels of the technology stack we call the internet…



1. In modern browsers, which encourage non-linear browsing, the assumptions I safely made back then are now more fraught.


 

 
