The now ubiquitous blog format — a timestamped series of posts in reverse chronological order — is a truly wonderful invention.
It’s wonderful for users, who can quickly see whether there’s anything new and get the most up-to-date stuff first. But it’s also wonderful for authors, because it’s immediately obvious to visitors when the content they’re looking at may be out of date. This means authors can almost completely dispense with one of the most tedious management tasks normally associated with any large corpus of information: revisiting what you’ve written in the past and making sure that it’s still correct.
If you’ve ever had to maintain a large website which doesn’t have this kind of built-in auto-obsolescence, you’ll know what I mean. Marketing people, for example, often feel that the more content they can put on the website about their product, the more impressive and compelling it will be. Keeping it updated as the product line evolves, however, then becomes a bit like painting the Forth Bridge. The value of blogs, in contrast, is that you don’t need to tidy up after yourself. So pervasive has the timestamped article become that I get frustrated when I’m reading a review or an opinion piece which doesn’t show the date. What information was available to the author at the time? Is he reviewing this version of the software or the previous one? Did he know about the competing device from another company?
So, with blogs, we’ve come up with this cunning way of handling the problem of producing too much content. But what about the similar challenge of having too much to consume?
Well, we’re still evolving ways of dealing with that, and we’ve already passed through several stages. I can, because I’m Really Old, remember the time when there were fewer than a dozen websites in the whole world. So it was pretty easy to remember which ones you liked, and when you’d run out of interesting things to read on those, you might start one of your own.
Since then, we’ve moved through a series of different ways of coping with the ever-increasing amount of information.
- When there was a small amount of stuff, bookmarks helped you remember it.
- When there was a bit more stuff, Yahoo helped you navigate it.
- When there was a larger amount of stuff, Google helped you find it.
- When there was too much stuff, social networks showed you the bits your friends liked.
- When there was even more stuff, streams forced you to ignore most of it.
Now, we’re almost at a couch-potato level of consumption. You fire up your Twitter, Facebook or Google+ app, and information flows past you. Next time you look at it, new stuff will be there. The process of finding new stuff to read has thus been reduced, for most of us, to a single button-click on a phone. Actually typing something into a search engine now constitutes ‘research’, especially if you have to click through more than one or two pages.
This is, arguably, a new kind of page-ranking, where novelty plays a greater role than it ever has before. Yes, some old material gets recirculated, but generally, the river keeps flowing, and this morning’s news will be well downstream by the time you dip your toe in during the afternoon.
Now, novelty is exciting, but it is very different from quality. In fact, it is often the opposite. C.S. Lewis once observed, in an essay called ‘On the Reading of Old Books’, that, since there were many more books being published than could ever be read, one very good way of filtering out the dross was to stick to those that had stood the test of time. This is an idea that has stuck with me ever since I first came across the essay as a child, and I have since tried to read one book written before my lifetime for every one written during it. That is still outrageously biased towards the present, I know, but it’s a start.
Now, how does ‘the test of time’ translate into our modern world? I think there’s an argument that this is a very powerful page-ranking metric that has not yet been fully exploited. (Perhaps, ironically, because it is not a new idea!) Surely, there must be value in knowing which pages people are still reading several years after they first hit the web?
At least once a day, when I’m trying to avoid out-of-date documentation or reviews, I’ll make use of Google’s time-filtering option to limit search results to those created in, say, the last year. And in fact, you can create more complex filters to restrict output to particular ranges of dates. So you can search for pages more than 5 years old. (I’m ignoring, for the moment, the fact that the real dates of publication can often be hard to establish. If one newspaper is bought by another and its content copied to a new server, for example, the creation dates may not be preserved very well.) Still, you can, in general, limit your searches to ‘old stuff’.
Google’s PageRank algorithm, however, makes substantial use of the overall number of times a page is linked to when determining its importance, though it is no doubt biased towards the present. What I really want to know is the number of times an old page has been linked to recently: I want a page-ranking algorithm based on recently-published pages’ references to older pages.
Can I get an RSS feed of blog posts and web pages that people are still referring to now, but were published more than three years ago? It’s challenging, in a world where even the URLs that worked last year may not work today. But I think it would be worth pursuing. How’s that for a project, Google?