Category Archives: Internet

Wisdom of the crowds, or lowest common denominator?

I liked this:

People have too inflated sense of what it means to “ask an AI” about something. The AI are language models trained basically by imitation on data from human labelers. Instead of the mysticism of “asking an AI”, think of it more as “asking the average data labeler” on the internet.

But roughly speaking (and today), you’re not asking some magical AI. You’re asking a human data labeler. Whose average essence was lossily distilled into statistical token tumblers that are LLMs. This can still be super useful of course. Post triggered by someone suggesting we ask an AI how to run the government etc. TLDR you’re not asking an AI, you’re asking some mashup spirit of its average data labeler.

Andrej Karpathy

Thanks to Simon Willison for the link.

Surviving the search engine meltdown

  Today, I got yet more evidence that the web is sinking in a world of AI-generated slime.  

Our otherwise-fine Dualit toaster has, after many years, started to have occasional hiccups with its timer… I think the clockwork has become a little dodgy.  So I did a quick search to see if others had the same experience, and I got this page back as one of the top hits:

I quote: “Nowadays, there are so many products of dualit toaster timer keeps sticking in the market and you are wondering to choose a best one.”

There’s a danger that we may soon move past the time of useful online search — Peak Google, if you like — and the alternative approach of trying to ask questions of an AI will only make things worse, since studies have already shown that training AIs on AI-generated content quickly leads to madness (for the AIs, that is, not the users, though that too would probably follow soon afterwards). 

So making the most of online content in the future may depend, more than ever before, on being able to ensure that it comes from a trusted human source.  Who’s old enough to remember when the web was small enough that human-generated indexes were the best way to find things?

But this is also why, as John Naughton nicely reminds us in an Observer piece this weekend, the best human-generated and human-curated content out there is often available via your RSS reader, not your search engine.  (I happen to like News Explorer, and have used it for a few years.)

RSS — a system for telling you when your favourite sites, especially blogs, have been updated without you needing to go and look at each one every day — has existed since long before Facebook and these other trendy things now called ‘social networks’ existed, and I suspect will still be around after they’ve gone.  But if RSS doesn’t appeal for some reason, much of the best content — including, of course, John’s blog and this one — is also available via an even more time-tested channel.  Your email inbox.

 

Coffee Pot – The Movie

For a long time, it has both bugged and bemused me that, though the first webcam ran for 10 years taking photos of our departmental coffee pot, there are almost no original images saved from the millions it served up to viewers around the world! I had one or two.

Then, suddenly, in a recent conversation, it occurred to me to check the Internet Archive’s ‘Wayback Machine’, and, sure enough, in the second half of the coffeepot camera’s life (from 1996 to 2001) they had captured 28 of its images. I wrote a script to index and download these, and turned them into a slideshow, which you can find in my new and very exciting three-minute video:
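For anyone wanting to try something similar, here is a rough sketch of how such a script might index captures using the Wayback Machine's CDX search endpoint. It is not the script mentioned above, just one plausible shape for it: the function names are invented for illustration, and you would need to supply the real address of the image you are after.

```python
import json
import urllib.parse
import urllib.request

# The Wayback Machine's capture-index (CDX) search endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(page_url, first_year, last_year):
    """Build a CDX query URL listing successful captures of page_url."""
    params = {
        "url": page_url,
        "from": str(first_year),
        "to": str(last_year),
        "output": "json",            # rows come back as a JSON list of lists
        "filter": "statuscode:200",  # only captures that were fetched successfully
        "collapse": "digest",        # skip consecutive captures with identical content
    }
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)


def capture_urls(rows):
    """Turn CDX rows (first row is the header) into direct capture URLs."""
    if not rows:
        return []
    header, captures = rows[0], rows[1:]
    ts = header.index("timestamp")
    orig = header.index("original")
    # The 'id_' suffix asks for the archived file itself, without the
    # Wayback toolbar or rewritten links.
    return [f"https://web.archive.org/web/{row[ts]}id_/{row[orig]}" for row in captures]


def list_captures(page_url, first_year, last_year):
    """Query the archive (needs network access) and return the capture URLs."""
    query = build_cdx_query(page_url, first_year, last_year)
    with urllib.request.urlopen(query) as response:
        return capture_urls(json.load(response))
```

Each URL returned by `list_captures` could then be downloaded with `urllib.request.urlretrieve` and the resulting files stitched into a slideshow.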

What the internet was invented for

About a decade ago, my friend Richard wrote a short blog post entitled “This is what the internet was invented for”. In it, he linked to “Ian’s Shoelace Site”, his point being that if you suddenly realise you’ve always laced your shoes in a particular way without really wondering whether it was the best way, then there’s probably someone out there with sufficient interest in shoelacing that they’ll have compiled everything you need to know about how to lace your shoes… and this turns out in fact to be the case. Ian Fieggen lives in Melbourne, Australia, and his site is wonderful.

Well, in a minor way, this changed my life, because I went and perused the Shoelace Site at the time, and so for the last ten years, most of my shoes have been laced using the Double Helix Lacing method.  

Now, it’s pretty rare that I buy a new pair of shoes, and after my latest purchase, I forgot about this undeniable improvement, and left them laced in the way they came from the shop, can you believe?   For several months!  Well, while polishing them at the weekend, I realised the error of my ways, and immediately pulled the laces out and re-did them, and now my tensioning and untensioning is smoother, easier and more satisfying.

There are so many ways in which the internet is getting worse, and making life worse, all around us, that it’s nice to be reminded, from time to time, of all the ways in which it can also make things better.

Tips for Fastmail (and perhaps other email) users

Do you use one of those ‘free’ email services, which make money by reading your emails so they can sell things to you, and sell you to others?  Or do you get it from your ISP, which makes it hard for you to change supplier and possibly quite complicated when you move house?  Or do you pay for a proper email account?

I’ve been doing the latter — using Fastmail for all my personal, family and business email — for a dozen years now, and have always been very happy with the service, accessing it through standard email programs on each of my various platforms. They’ve also always had a very nice webmail interface, but in general I prefer to use native apps rather than websites where I can.  

This week, though, I’ve been experimenting with some features that turned out not to be quite so easy using my particular favourite apps, but were available through the site.  One example is ‘Snooze’, which takes an item out of your inbox now but brings it back at a time you specify — this evening, for example, or next weekend.  Things like this are nice to have, but there was another feature that was becoming a bit more essential to me.

I have a pretty large array of email addresses, which I use for various purposes.  Some of them are just aliases on my main account, and some are separate email services provided by the university and other organisations.  But I arrange that everything ends up at Fastmail, where it’s under my control and can be centrally searched, managed and archived.   I either forward mail from each other remote service, or configure Fastmail to go and fetch it periodically.   

Aside: I was bitten once in the past by having an email account at one of my previous companies, to which I still had access through being on the board for a while after I left, but when the company was closed down rather suddenly, the GSuite account was deleted almost instantaneously before I knew about it, and quite a bit of my email from that period simply vanished into the ether.  I have the majority of my email stretching back as far as 1991, and I value that archive, so this was a bit of a blow.  If you also value your data, make sure you look after it yourself!

Anyway, it’s becoming ever more important, in the battle against spam, that when you send mail, you send it using a server that is properly configured to be officially associated with the ‘From’ address you’re using. Some servers, especially Gmail, are pretty fierce now about rejecting email that purports to come from, say, myuser@statusq.org but is actually sent using my account at the university. (There’s a range of technical standards, such as SPF and DKIM, which help a receiving email server assess the claimed provenance of an email message.)
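As a rough illustration of the SPF idea: the owner of a domain publishes a DNS record listing the servers allowed to send mail for it, and the receiving server checks the connecting machine against that list. The sketch below is heavily simplified and rests on several assumptions: it handles only the `ip4:` and `ip6:` mechanisms and a final `all`, it skips the DNS lookup entirely, and the record and addresses are made-up documentation values rather than anything statusq.org actually publishes. The function name `spf_allows` is invented for this example.

```python
import ipaddress


def spf_allows(record: str, sender_ip: str) -> bool:
    """Very simplified SPF check: does this record authorise sender_ip?"""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return False  # not an SPF record at all
    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        qualifier = "+"  # '+' (pass) is the default qualifier
        if term[0] in "+-~?":
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name in ("ip4", "ip6"):
            # An address of the other IP version simply never matches.
            if ip in ipaddress.ip_network(value, strict=False):
                return qualifier == "+"
        elif name == "all":
            return qualifier == "+"
        # Other mechanisms (include, a, mx, ...) need DNS lookups
        # and are ignored in this sketch.
    return False


# Hypothetical record for the From domain: only 192.0.2.x may send.
record = "v=spf1 ip4:192.0.2.0/24 -all"
print(spf_allows(record, "192.0.2.25"))    # a server inside the listed range
print(spf_allows(record, "198.51.100.7"))  # some other server
```

A real receiver does a good deal more than this, and DKIM works differently again (it signs the message itself rather than checking the sending machine), but the basic shape of the check is the same.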

And I often do need to send using different From addresses, so I normally have my email apps set up to talk to all of my email accounts (currently whittled down to just Fastmail, iCloud, Gmail and University, though I used to have rather more), and once all of these are set up on all of my many devices, in general they’re pretty good about sending the right message through the right service.  But it’s a little untidy, and I end up with a lot of mail folders even if most of them are empty. And when I want to set up a new alias, I need to make sure it sends from the right place on every one of my devices…

This week, though, I hit one more issue.

There’s a service I’ve been using even longer than Fastmail, and that’s Pobox. This is one of those email-address-for-life services, and ‘quentin @ pobox.com’ has been the email address I’ve given out to people since roughly the turn of the millennium, ever since I realised how important it was to have an email address that would outlive any particular employer or ISP, but before it was easy to do so using your own domain (which is, of course, the best option now).  There’s also a great deal to be said for having an address that is easy to dictate to people: they’re always grateful not to have to copy down something like my.long.surname4356@hotmail.co.uk, as I am whenever I have to type or write it down myself!  (Even with my nice pobox address, I have keyboard shortcuts on all of my devices so that ‘qpo’ automatically expands to my email address. Highly recommended, if you don’t do something similar already.)

Anyway, Pobox now offers various services, but the basic one I’ve been using for 23 years simply forwards my mail to wherever I ask them, after having filtered out the most obvious spam.  I don’t actually store any email there, and I don’t have an IMAP account with them.

And this is where I was starting to come unstuck, because I do now need to send email using their SMTP service, whenever I want it to appear to come from pobox.com, but my normal favourite Mac email app, for example, though amazingly powerful in other ways, only really understands the concept of email servers which have both incoming and outgoing services.  Perhaps understandably, it, like many other apps, doesn’t cater for a complete email server that’s only used for sending, can’t receive, and for which any incoming emails appear in another account!  Not yet, anyway!

All of which is a long explanation about why I’ve been using the Fastmail web interface again, because Fastmail does allow you to select a different SMTP/Outlook/Gmail/whatever server for sending depending on your outgoing From address.  And it does allow you to retrieve email from multiple accounts without having to set them up on each of your devices.  And it offers a range of features that should satisfy the requirements of most power users, all without having to sell your soul to, say, Google, or use nasty proprietary systems like Outlook/Exchange.  Yet your email is still completely accessible using standards-compliant IMAP etc when you want it.  And my Pobox emails do get sent using Pobox. And my university mails get sent using Exchange without me having to touch Exchange for anything else. Perfect.

But what about the fact that I prefer to use native apps instead of web sites for this kind of thing?

Well, there are official Fastmail apps for iOS and Android which work pretty well, and on the Mac I’ve been rather pleased to discover FMail2: a thin wrapper around the web interface but implemented as a local app, meaning it can be your default mail program, handle `mailto:` URLs, show an icon in the menu bar, etc.

All of which helps ensure that if I send you an email from my favourite email address, you’re less likely to have it filtered out as spam.

Lucky you.

 

Worth a try?

I’ve been doing an experiment which I fear will end up costing me money.  And this is in response to the observation that so much of the online world we see is filtered through Google.  I have nothing against Google, but this means that the starting point for most online exploration is filtered  through Google’s business model.  

Suppose I viewed the world through somebody else’s business model instead?

Building a search engine is hard.  Building one that can come close to competing with Google is really hard.  

For a while, on some of my machines, I’ve been using the popular DuckDuckGo, and it’s been pretty good.  (The only way to try these things properly, I think, is to set them as your default search engine and then see how often you find them falling short.)  The name was a mystery to me, never having heard of the children’s game ‘Duck, duck, goose’ before, but the business model and the appeal is simple: they do run ads, but not as many; they do much less tracking, the ads aren’t targeted, and they help block other companies from tracking you as well.  It has many devotees.

But this weekend, I came across something better: Kagi.  No ads. No tracking. Nice and fast. Elegant layout, and lots of customisation options.  And, having used it as the default on my desktop, laptop and iPad for a few days, very good results!   But of course, there’s no such thing as a free search, so the catch here is that you have to pay.  For most people, the $5/month plan, which gets you 300 searches per month, will be sufficient, but there are lots of variations.  I think the Duo family plan, which gives two people unlimited searches for (effectively) £10/month, sounds appealing.

So, would I pay £120/year (or £42/year, for the individual basic plan purchased annually) for something which I could get for free? Well, their free trial, which got me 100 free searches, has made me think that I probably would.  Search is such a key part of day-to-day life, that this seems a modest premium to get a better version where you don’t have to start by scrolling past the sponsored links.

Here’s a short video showing a few of the extra bits you get for your money:

(Direct link)

Definitely tempted.

Posting for Posterity

I’ve written before (e.g. in May) about the importance of the Internet Archive, which I was fortunate enough to visit in its early days. It’s a hugely valuable resource for many reasons, not least in giving some protection against link rot through its ‘Wayback Machine’.

What I’m embarrassed to say I didn’t know until recently, or had forgotten, is that there is also a UK Web Archive at webarchive.org.uk. It’s a very nicely done collaborative project of the UK Legal Deposit Libraries, and performs a similar task for UK-based websites.

It’s been going for 10 years now, which is a good span but not nearly as long as the Internet Archive, so if, say, you were feeling gloomy about the situation in the UK and needed to be cheered up, you could go and look at the old News of the World site and be grateful that it ceased to exist 12 years ago. For that, though, you would need to go to the Internet Archive.

The UKWA is a great initiative, and worth supporting. If you have a UK-based site which isn’t already indexed, let them know. It’s another good way to try and ensure it outlives you, and they try to update their copy at least annually.

And if you want to know more about the UK’s Legal Deposit Libraries which are behind the project, Tom Scott (of course) has a nice new video.  

 

The day the internet died

optical fibre cut by hedge trimmer

Oops. At the start of the holiday weekend, I managed to cut the optical fibre providing our internet connection. I realise that it’s one of our most important cables, one of the thinnest and most vulnerable, and pretty much the only one we have that I’m incapable of repairing myself!

In case you’re wondering, the hedge wasn’t there when the fibre was installed, and had since grown up to cover it. I would have been alright if it weren’t for the fact that optical fibres can’t be bent around tight corners, and so had to bulge away from the wall before going through it…

A day that shall live on in infamy. Though not as much infamy as it might have had in the absence of phone-based backup connections.

Let’s ask Quentin (RIP)!

I’ve been contemplating how to achieve immortality.  I’m sure you often do the same thing over breakfast on a sunny morning.

It has often occurred to me that, since so much of my output is in digital form, it may vanish without a trace once I’m gone, since nobody will be paying the hosting fees. Had I written more things that had made it into print, they might at least have lingered in the dark recesses of a library somewhere for a rather longer period. Perhaps even gathered dust on one or two people’s bookshelves. Probably nobody would ever read them, but it would be comforting to know that they were there!

In reality, of course, digital data should last a lot longer, as long as it’s maintained.  If I were really wealthy and cared enough about this vanity project, I would leave behind an invested sum big enough to pay for web hosting in perpetuity plus one day per year of an IT consultant’s time to update the formats, check the backups, etc.

Fortunately, though, I have some hope that my 20+ years of blog posts won’t just vanish into the ether when Rose forgets to pay the web hosting bill after I’m gone, partly because there are periodic snapshots on the wonderful Internet Archive. (Here’s what Status-Q looked like in early 2001.)  

Brewster Kahle, the man behind the Archive, was good enough back in 2005 to give me a tour of their headquarters, which was then located in the Presidio of San Francisco. Brewster’s an inspiring guy doing important work, and a much better use for my hypothetical legacy would be to leave it to them.    I wonder if they would guarantee, in exchange, to keep my memory alive, in much the same way that donors to religious organisations used to get prayers said in perpetuity for their departed souls….

But then I started wondering about the next stage.

If you were to train an AI system on all of my blog posts, YouTube videos, academic papers, podcast & media interviews, etc… how convincingly could you get it to respond to questions in the way that I would have done?  Perhaps a deepfake video character could even give future interviews on my behalf?  I can’t quite decide whether that’s exciting, or thoroughly creepy.

But I tell you what… I do think it’s inevitable.  

Perhaps not for me: I imagine that when I’m gone a few friends will shed a quiet tear and everyone else will breathe a huge sigh of relief and switch the servers off. But for others, those more prolific, more wise, more entertaining, I think this is bound to happen. You will be able to ask questions of Mother Teresa, or Christopher Hitchens, or the Dalai Lama, or Warren Buffett. You’ll be able to get Handel to compose your wedding march, and Peter Ustinov to speak at the reception afterwards. And for a bit of spiritual advice, you could always ask God. Or a ChatGPT engine trained exclusively on his revelations to mankind from whichever source you prefer them.

Today’s systems would, of course, do a very fallible job, but what will the AI systems be like in 100 years’ time? That will only help you, of course, if they still have access to your data, in non-proprietary, open, standard formats. In the past, if you had sufficient wealth, you might have chosen to spend it on cryonics. I can’t help feeling that to achieve immortality now, a better bet would be to spend it on good, globally-accessible backups of your data.

How not to design the front page of your website

I seem to be seeing more and more of those pop-up windows that, within seconds of you first visiting a website, ask whether you immediately want to fill in your email address so they can send you spam.  

Usually, it happens before I’ve even read the first sentence, let alone the first paragraph, so my reaction to “Would you like to receive updates from us?” is generally, “How the hell should I know? I’ve only seen your URL so far!”

So my curmudgeonly questions of the morning are:

  • Does anyone, anywhere, ever fill these in?  My basic respect for human intelligence would suggest not, but I suppose roughly half the world has below-average IQ.
  • Who are the fools who, when planning a shiny new website, decide that immediately obscuring it with one of these, and simultaneously annoying every new visitor to your site, is a good idea?
  • Are people who work in marketing actually the kind of people who would fill these in themselves?  Or do they just think everyone else is an idiot?  Either option would not reflect well on them, which leads me to an inevitable conclusion and final question.
  • Why do so many of those people with below-average intelligence work in marketing?

 

Sign of the times: might ChatGPT re-invigorate GPG?

It’s important to keep finding errors in LLM systems like ChatGPT, to remind us that, however eloquent they may be, they actually have very little knowledge of the real world.

A few days ago, I asked ChatGPT to describe the range of blog posts available on Status-Q. As part of the response it told me that ‘the website “statusq.org” was founded in 2017 by journalist and author Ben Hammersley.’ Now, Ben is a splendid fellow, but he’s not me. And this blog has been going a lot longer than that!

I corrected the date and the author, and it apologised. (It seems to be doing that a lot recently.) I asked if it learned when people corrected it, and it said yes. I then asked it my original question again, and it got the author right this time.

Later that afternoon, it told me that StatusQ.org was the personal website of Neil Lawrence.

Neil is also a friend, so I forwarded it to him, complaining of identity theft!

A couple of days later, my friend Nicholas asked a similar question and was informed that “based on publicly available information, I can tell you that Status-Q is the personal blog of Simon Wardley”.  Where is this publicly-available information, I’d like to know!

The moral of the story is not to believe anything you read on the Net, especially if you suspect some kind of AI system may be involved.  Don’t necessarily assume that they’re a tool to make us smarter!

When the web breaks, how will we fix it?

So I was thinking about the whole question of attribution, and ownership of content, when I came across this post, which was written by Fred Wilson way back in the distant AI past (i.e. in December). An excerpt:

I attended a dinner this past week with USV portfolio founders and one who works in education told us that ChatGPT has effectively ended the essay as a way for teachers to assess student progress. It will be easier for a student to prompt ChatGPT to write the essay than to write it themselves.

It is not just language models that are making huge advances. AIs can produce incredible audio and video as well. I am certain that an AI can produce a podcast or video of me saying something I did not say and would not say. I haven’t seen it yet, but it is inevitable.

So what do we do about this world we are living in where content can be created by machines and ascribed to us?

His solution: we need to sign things cryptographically.

Now this is something that geeks have been able to do for a long time.  You can take a chunk of text (or any data) and produce a signature using a secret key to which only you have access.  If I take the start of this post: the plain text version of everything starting from “It’s important” at the top down to “sign things cryptographically.” in the above paragraph, I can sign it using my GPG private key. This produces a signature which looks like this:

-----BEGIN PGP SIGNATURE-----
iQEzBAEBCgAdFiEENvIIPyk+1P2DhHuDCTKOi/lGS18FAmRJq1oACgkQCTKOi/lG
S1/E8wgAx1LSRLlge7Ymk9Ru5PsEPMUZdH/XLhczSOzsdSrnkDa4nSAdST5Gf7ju
pWKKDNfeEMuiF1nA1nraV7jHU5twUFITSsP2jJm91BllhbBNjjnlCGa9kZxtpqsO
T80Ow/ZEhoLXt6kDD6+2AAqp7eRhVCS4pnDCqayz0r0GPW13X3DprmMpS1bY4FWu
fJZxokpG99kb6J2Ldw6V90Cynufq3evnWpEbZfCkCl8K3xjEwrKqxHQWhxiWyDEv
opHxpV/Q7Vk5VsHZozBdDXSIqawM/HVGPObLCoHMbhIKTUN9qKMYPlP/d8XTTZfi
1nyWI247coxlmKzyq9/3tJkRaCQ/Aw==
=Wmam
-----END PGP SIGNATURE-----

If you were so inclined, you could easily find my corresponding public key online and use it to verify that signature.  What would that tell you?

Well, it would say that I have definitely asserted something about the above text: in this case, I’m asserting that I wrote it.  It wouldn’t tell you whether that was true, but it would tell you two things:

  • It was definitely me making the assertion, because nobody else could produce that signature.  This is partly because nobody else has access to my private key file, and even if they did, using it also requires a password that only I know. So they couldn’t  produce that signature without me. It’s way, way harder than faking my handwritten signature.

  • I definitely had access to that bit of text when I did so, because the signature is generated from it. This is another big improvement on a handwritten signature: if I sign page 6 of a contract and you then go and attach that signature page to a completely new set of pages 1-5, who is to know? Here, the signature is tied to the thing it’s signing.
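Those two properties can be illustrated with a toy sketch in Python. It is emphatically not how GPG works internally: it uses 'textbook' RSA with a small key built from two well-known Mersenne primes, with no padding and no key management, and the names `digest`, `sign` and `verify` are invented for this illustration. Treat it as a demonstration of the principle only.

```python
import hashlib

# Toy key pair built from two well-known Mersenne primes.
# Far too small and too simple for real use; real tools such as GPG
# use much larger keys plus padding schemes that are omitted here.
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest(text: str) -> int:
    """Hash the text and reduce it to a number smaller than N."""
    raw = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(raw, "big") % N


def sign(text: str, private_exponent: int = D) -> int:
    """Produce a signature that depends on both the text and the private key."""
    return pow(digest(text), private_exponent, N)


def verify(text: str, signature: int) -> bool:
    """Check a signature using only the public values N and E."""
    return pow(signature, E, N) == digest(text)


post = "It's important to keep finding errors in LLM systems"
signature = sign(post)
print(verify(post, signature))                # the text that was actually signed
print(verify(post + " (edited)", signature))  # same signature, different text
print(verify(post, sign(post, D + 2)))        # signature made with the wrong key
```

Only the first check should succeed: changing the text, or signing with anything other than the matching private key, produces a signature that fails verification against the public values.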

Now, I could take any bit of text that ChatGPT (or William Shakespeare) had written and sign it too, so this doesn’t actually prove that I wrote it.  

But the key thing is that you can’t do it the other way around: somebody using an AI system could produce a blog post, or a video or audio file which claims to be created by me, but they could never assert that convincingly using a digital signature without my cooperation.  And I wouldn’t sign it. (Unless it was really good, of course.)

Gordon Brander goes into this idea in more detail in a post entitled “LLMs break the internet. Signing everything fixes it.”   The gist is that if I always signed all of my blog posts, then you could at least treat with suspicion anything that claimed to be by me but wasn’t signed.  And that soon, we’ll need to do this in order to separate human-generated content from machine-generated.

A tipping point?

This digital signature technology has been around for decades, and is the behind-the-scenes core of many technologies we all use.  But it’s never been widely, consciously adopted by ordinary computer users.  Enthusiasts have been using it to sign their email messages since the last millennium… but I know few people who do that, outside the confines of security research groups and similar organisations.  For most of us, the tools introduce just a little bit too much friction for the perceived benefits.

But digital identities are quickly becoming more widespread: Estonia has long been way ahead of the curve on this, and other countries are following along.  State-wide public key directories may eventually take us to the point where it becomes a matter of course for us automatically to sign everything we create or approve.

At which point, perhaps I’ll be able to confound those of my friends and colleagues who, according to ChatGPT, keep wanting to pinch the credit for my blog.

A new perspective?

Like everyone else, I’ve been having a bit of a play with ChatGPT. There have been some nice examples recently of people asking it questions and getting answers that are grammatically correct, well-presented, completely plausible, and completely wrong.

That didn’t happen to me. There was one question I felt I ought to ask it, and it gave a mostly correct response, though its description of the Trojan Room wasn’t quite right. Overall, though, not bad. But it was the last sentence that took me by surprise:

Screenshot of the response to “What was the first webcam?”: “The first webcam was created at the University of Cambridge in 1991. It was installed in the Trojan Room, a computer laboratory in the university's department of engineering, and was used to monitor the condition of a coffee pot. This way, researchers working in the lab could check remotely whether there was coffee available without having to physically go to the room to check. The webcam's feed was made available online, making it one of the earliest examples of a web-based surveillance system.”

Well, yes, I suppose it was a surveillance system, though no human has used that phrase to me before when describing it!

Perhaps it’s only natural, though, that a machine should think of things chiefly from the point of view of the coffee pot?

© Copyright Quentin Stafford-Fraser