Tag Archives: data

Let’s ask Quentin (RIP)!

I’ve been contemplating how to achieve immortality.  I’m sure you often do the same thing over breakfast on a sunny morning.

It has often occurred to me that, because so much of my output is in digital form, it may vanish without a trace once I'm gone, since nobody will be paying the hosting fees. Had I written more things that made it into print, they might at least have lingered in the dark recesses of a library somewhere for rather longer. Perhaps they would even have gathered dust on one or two people's bookshelves. Probably nobody would ever read them, but it would be comforting to know that they were there!

In reality, of course, digital data should last a lot longer, as long as it’s maintained.  If I were really wealthy and cared enough about this vanity project, I would leave behind an invested sum big enough to pay for web hosting in perpetuity plus one day per year of an IT consultant’s time to update the formats, check the backups, etc.

Fortunately, though, I have some hope that my 20+ years of blog posts won’t just vanish into the ether when Rose forgets to pay the web hosting bill after I’m gone, partly because there are periodic snapshots on the wonderful Internet Archive. (Here’s what Status-Q looked like in early 2001.)  

Brewster Kahle, the man behind the Archive, was good enough back in 2005 to give me a tour of their headquarters, which was then located in the Presidio of San Francisco. Brewster’s an inspiring guy doing important work, and a much better use for my hypothetical legacy would be to leave it to them.    I wonder if they would guarantee, in exchange, to keep my memory alive, in much the same way that donors to religious organisations used to get prayers said in perpetuity for their departed souls….

But then I started wondering about the next stage.

If you were to train an AI system on all of my blog posts, YouTube videos, academic papers, podcast & media interviews, etc… how convincingly could you get it to respond to questions in the way that I would have done?  Perhaps a deepfake video character could even give future interviews on my behalf?  I can’t quite decide whether that’s exciting, or thoroughly creepy.

But I tell you what… I do think it’s inevitable.  

Perhaps not for me: I imagine that when I'm gone a few friends will shed a quiet tear and everyone else will breathe a huge sigh of relief and switch the servers off. But for others, those more prolific, more wise, more entertaining, I think this is bound to happen. You will be able to ask questions of Mother Teresa, or Christopher Hitchens, or the Dalai Lama, or Warren Buffett. You'll be able to get Handel to compose your wedding march, and Peter Ustinov to speak at the reception afterwards. And for a bit of spiritual advice, you could always ask God. Or a ChatGPT engine trained exclusively on his revelations to mankind, from whichever source you prefer them.

Today's systems would, of course, do a very fallible job, but what will AI systems be like in 100 years' time? Even so, that will only help you if they still have access to your data, in non-proprietary, open, standard formats. In the past, if you had sufficient wealth, you might have chosen to spend it on cryonics. I can't help feeling that, to achieve immortality now, a better bet would be to spend it on good, globally-accessible backups of your data.


Looking backwards at the future

Searching recently for emails from one of my academic colleagues, I came across one or two that appeared to have the address written backwards. He works in the Computer Lab at Cambridge, and the email was from user@uk.ac.cam.cl. What was going on?

Well, the simple answer is that my mail archives stretch back quite a long way. I have emails I received from my friend Peter just last week, but I also have some from him that arrived in the early 1990s. That was just about the time when the UK's academic networks were switching from the Name Registration Scheme (NRS) they had used up to that point over to the Domain Name System (DNS), which was becoming the standard in other parts of the world. NRS addresses started with the most general part and worked down to the most specific, hence uk.ac.cam.cl; DNS puts the same parts the other way round, as cl.cam.ac.uk.
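For the technically curious, here's a minimal Python sketch of the reordering, assuming the only change needed is to reverse the dot-separated parts of the domain. Real NRS-to-DNS gateways had more to cope with than this, so treat `nrs_to_dns` (a hypothetical helper, not taken from any real gateway) as an illustration only. It also lower-cases the domain, since domain names are case-insensitive, but leaves the user part alone.

```python
def nrs_to_dns(address: str) -> str:
    """Convert an NRS-style address (most general part first) to DNS order.

    Hypothetical helper: assumes a simple user@domain address and ignores
    the special cases a real mail gateway would have had to handle.
    """
    user, sep, domain = address.partition("@")
    if not sep:
        raise ValueError(f"no '@' in {address!r}")
    # NRS listed domain components from most general to most specific;
    # DNS lists them the other way round, so reverse them. Domain names
    # are case-insensitive, so lower-case them too (the user part may be
    # case-sensitive, so it is left untouched).
    labels = domain.lower().split(".")
    return f"{user}@{'.'.join(reversed(labels))}"


print(nrs_to_dns("user@uk.ac.cam.cl"))  # user@cl.cam.ac.uk
```

The same function turns a shouty mainframe-era `USER@UK.AC.CAM.CL` into `USER@cl.cam.ac.uk`.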

Actually, email addresses in general tended to look like USER@UK.AC.CAM.CL because on mainframes EVERYTHING TENDED TO BE IN CAPITALS. But Peter was fortunate enough to be an early user of Xerox and Unix-based systems, which were more lower-casey; more cuddly California, less corporate IBM. By the start of the 90s, I too had an email address that looked like quentin.stafford-fraser@uk.ac.cam.cl.

Anyway, the fact that I still have emails from 30 years ago made me reflect, once again, on how extraordinarily successful email has been, not just as a communication medium, but as a storage format.

When I think back on other electronic documents of the time, few, if any, could be read now. The companies behind my early 'desktop publishing' programs no longer exist. Microsoft Word long ago lost the ability to open documents it had created in the past. And I imagine my documents from WordStar, WordPerfect, Microsoft Works and others would be just as challenging, if I could even find them.

But my email messages I can find. And I can read them. This is despite the fact that they have been through dozens of different email systems, created by a wide range of apps on multiple operating systems, stored on servers around the world and hard disks in my various homes and offices, and accessed through a range of different protocols (IMAP, for most of that period). Not only is my email readable, but it’s easily searchable from multiple locations using a choice of apps on any of my devices. It’s tagged with helpful metadata about authorship, time of creation and receipt, etc. I can choose to store it myself or pay others to do so. And so on. Almost no other digital storage system has proved as powerful and flexible as IMAP-accessed email.

Much of this comes, of course, from the fact that email is governed by open standards, accessed through open protocols, and often stored in non-proprietary formats. Because it is fundamentally about inter-operation, email providers have had no choice. It bugs me that I don’t have my pre-1991 emails, but that was probably because of an inadvertent slip on my part, or a hard disk crash, or something, rather than because of a fundamental limitation of the technology. If I do ever find them on some backup, I’m confident I’ll be able to include them in my archive.
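Because the formats are open, even a language's standard library can read them. As a rough illustration, here's a Python sketch using the standard `mailbox` and `email` modules: it stores two hypothetical messages in an mbox file (one common non-proprietary mailbox format) and then searches them by subject and body, pulling the sender and date metadata out of the headers. The addresses, subjects and the `make_message`, `text_of` and `search_mbox` helpers are all made up for the example; a real archive might equally live in Maildir or on an IMAP server.

```python
import mailbox
import os
import tempfile
from datetime import datetime, timezone
from email.message import EmailMessage
from email.utils import format_datetime, parsedate_to_datetime


def make_message(sender: str, subject: str, when: datetime, body: str) -> EmailMessage:
    """Build a plain-text message that carries its own metadata in headers."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["Subject"] = subject
    msg["Date"] = format_datetime(when)
    msg.set_content(body)
    return msg


def text_of(msg) -> str:
    """Concatenate the decoded text/plain parts of a message."""
    parts = []
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            parts.append(payload.decode(charset, errors="replace"))
    return "\n".join(parts)


def search_mbox(path: str, term: str) -> list[tuple[datetime, str, str]]:
    """Return (date, sender, subject) for each message mentioning term, oldest first."""
    term = term.lower()
    hits = []
    box = mailbox.mbox(path)
    try:
        for msg in box:
            subject = msg["Subject"] or ""
            if term in subject.lower() or term in text_of(msg).lower():
                hits.append((parsedate_to_datetime(msg["Date"]), msg["From"], subject))
    finally:
        box.close()
    return sorted(hits)


# Usage: write two hypothetical messages to a throwaway mbox, then search it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "archive.mbox")
    box = mailbox.mbox(path)
    box.add(make_message("peter@example.org", "Hotel recommendation",
                         datetime(1996, 5, 1, 9, 30, tzinfo=timezone.utc),
                         "The B&B near the station was excellent."))
    box.add(make_message("peter@example.org", "Project notes",
                         datetime(2023, 3, 14, 17, 0, tzinfo=timezone.utc),
                         "Notes on the hotel booking system."))
    box.close()  # close() flushes the new messages to disk
    for when, sender, subject in search_mbox(path, "hotel"):
        print(when.date(), sender, subject)
```

The search loop doesn't care which program wrote the file, so the same approach should work on an mbox exported from another mail client, give or take the minor differences between mbox variants.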

This explains why, like some of my colleagues, I’ve resisted my University’s recent attempts to migrate our email accounts from our existing Open-Source-based system to Microsoft Exchange Online. It’s not because I dislike Exchange per se; after a rocky first decade or two it seems to be settling down quite nicely. But I don’t want to use a Microsoft email reader on all my devices — my own are much better, thank you — and Exchange has repeatedly shown an inability to support IMAP reliably. The messages are also not stored anywhere on a server where I could extract them by any other means in a standard format when I want to move them elsewhere. And I will want to move them elsewhere at some point; history shows me that. Fortunately, I have that power. If my email shows any danger of being locked into proprietary formats, I can simply arrange for it to be forwarded to my own servers and handle it however I like there; that’s what I’ll do if the University turns off the old system completely. And since almost everything does support IMAP, I can move emails around the world to my preferred location with a simple drag and drop.

One of my colleagues said in a recent meeting that his children don’t know what the fuss is about. Email is just something they glance at once a week to see if they’ve had any. As long as it works, they don’t mind where it comes from. Well, they may be right; perhaps it will be less important in future. But this may also be a natural tendency of the young just to focus on the immediate here and now, and the immediate future.

To me, and occasionally to other people, my email archive has turned out to be important. Something I wrote 20 years ago becomes relevant to a patent case now and earns me money because I can look back at the records. Interviewers ask me about the technologies used in a particular project and I can search back to find the answers. I forget the name of a good B&B or hotel in a particular city; email allows me to find it again. I generally had no idea, at the time, that these communications might prove to be important. But they’re a key part of the history of my life.

So here’s my question: If the things you’re doing today turn out to be important a few decades from now, what sort of digital archive would they need to be in for you to find and make use of them then? Best to start using that today, before it’s too late.

What did you do to keep warm between Thanksgiving and Christmas in the old days, daddy?

An interesting bit of data visualisation by Andy Kriebel gives some ideas.

I’d love to see how this varies for different countries/climates…

Personal Analytics

I wrote a few months back about how I was using a GPS logger to keep a record of my movements. Some people think I’m a little eccentric – I think that’s the word – for doing so.

But my data-gathering is nothing compared to Stephen Wolfram’s. In a splendid Wired article called The Personal Analytics of My Life, he discusses some of the insights he’s been able to glean from his own historical records. One inspired idea, which I confess had never occurred to me, is to run a keystroke logger; he’s captured everything he’s typed for many years. (Now, that’s data you wouldn’t want to fall into the wrong hands!)

I once thought seriously about capturing, say once or twice a minute, an image of my screen, which I could later OCR, search, use to recreate lost documents, etc. But other than helping Sheng Feng Li with a system that did some of this for VNC, I never took it any further. Worth reconsidering, perhaps…

Anyway, many thanks to Richard for pointing me at the Wolfram article, which is worth a read.

I suppose that another way to analyse data about your life is to do the analysis on the fly and record the results there and then. That’s called a blog.

© Copyright Quentin Stafford-Fraser