How BitTorrent works

I knew the fundamental idea behind BitTorrent, the file-distribution system – that after some initial seeding every BitTorrent client provides uploads as well as downloads and so you can distribute much more data more quickly without ridiculously heavy loads on one server.

But I’ve just been reading Bram Cohen’s paper and so have a bit more of a grasp of the behind-the-scenes operation, which is really quite clever. Here’s a very simplistic overview:

Files are distributed by creating a .torrent file and putting it on a web server. The .torrent file includes information about the file (its name, length, checksums and so on) and also the URL of a tracker. The tracker is a very simple server which knows about the machines currently downloading the file (henceforth known as peers). When your client starts downloading a file, it connects to the tracker, gets added to the list, and receives back a random selection of other peers also downloading the file. It can then go off and talk to those peers.
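To make the tracker's job concrete, here is a minimal sketch in Python. It is only an in-memory toy: the Tracker class, its announce method and the max_peers cap are names I have made up for illustration, and a real tracker answers requests over the network rather than through direct method calls.

```python
import random


class Tracker:
    """Toy in-memory tracker: remembers which peers are fetching each torrent."""

    def __init__(self, max_peers=50):
        self.max_peers = max_peers  # cap on peers returned per request (assumed value)
        self.swarms = {}            # torrent id -> set of (host, port) peers

    def announce(self, torrent_id, peer):
        """Add `peer` to the list and return a random selection of the other peers."""
        swarm = self.swarms.setdefault(torrent_id, set())
        others = sorted(swarm - {peer})
        swarm.add(peer)
        return random.sample(others, min(self.max_peers, len(others)))


tracker = Tracker(max_peers=2)
tracker.announce("example.iso", ("10.0.0.1", 6881))
tracker.announce("example.iso", ("10.0.0.2", 6881))
tracker.announce("example.iso", ("10.0.0.3", 6881))
peers = tracker.announce("example.iso", ("10.0.0.4", 6881))
print(peers)  # two of the three earlier peers, chosen at random
```

The newcomer gets back at most max_peers addresses, and the tracker never touches the file data itself, which is why it can stay so simple.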

Files are downloaded in pieces, each typically a quarter of a megabyte in size. Your client connects to several peers and finds out from them which pieces of the file they each currently have. It can then start downloading different pieces from different peers; it doesn’t have to get the pieces of the file in order. Whenever you have a complete piece, it’s added to the list of pieces you can make available to others. Often the traffic will be two-way: you’ll be downloading one piece from a peer while uploading a different piece to them.
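Here is a rough sketch of the piece bookkeeping, assuming the checksums in the .torrent file are one SHA-1 hash per piece. The Download class and the helper names are my own inventions, and the transfers are faked by passing byte strings around instead of talking to peers.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # a quarter of a megabyte, as in the text


def split_into_pieces(data, piece_size=PIECE_SIZE):
    """Cut the file contents into fixed-size pieces (the last may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_hashes(pieces):
    """One checksum per piece: what the .torrent file would carry."""
    return [hashlib.sha1(piece).digest() for piece in pieces]


class Download:
    """Tracks which pieces we hold; only verified pieces are kept."""

    def __init__(self, expected_hashes):
        self.expected = expected_hashes
        self.pieces = {}  # piece index -> verified bytes

    def receive(self, index, data):
        """Accept a piece from a peer only if its checksum matches."""
        if hashlib.sha1(data).digest() != self.expected[index]:
            return False
        self.pieces[index] = data
        return True

    def available(self):
        """Indices of the complete pieces we can now offer to other peers."""
        return sorted(self.pieces)

    def assemble(self):
        """Join the pieces in order once every one has arrived."""
        return b"".join(self.pieces[i] for i in range(len(self.expected)))


original = bytes(range(256)) * 3000  # 768,000 bytes of sample data
pieces = split_into_pieces(original)
download = Download(piece_hashes(pieces))

rejected = download.receive(1, b"corrupted data")
for index in (2, 0, 1):  # pieces can arrive in any order
    download.receive(index, pieces[index])

print(len(pieces), rejected, download.assemble() == original)  # 3 False True
```

Because each piece is checked on its own, a bad piece from one peer costs you only that piece, and everything already verified can be offered to others straight away.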

The overall amount of data downloaded across the system must equal the overall data uploaded – every download has to come from somewhere! So, as a very rough approximation, you can download data as fast as you make it available for upload. This isn’t quite the case, because people often leave their clients running for some time after the download has finished, either because they’re good citizens or because they’re off having a cup of coffee. Others can therefore get more download capacity. Also at particular times, you’ll see the speed of your downloads or uploads fluctuate for a few minutes, though it roughly balances out over time.

A disadvantage of the system as a whole is that if, like many of us, you have a much faster downstream connection than upstream, your BitTorrent download is likely to run at something closer to your slower upstream speed. The advantage, though, is that if the file you’re downloading is at all popular, you’re likely to get it much more reliably than with a regular web download, you won’t be limited by the capacity of the originator’s server link, and they won’t end up paying a fortune in bandwidth charges.

There are lots of clever bits which I haven’t touched on here. For example, if you’re connected to several peers, how does your client decide which piece of the file to download next? Answer: it normally downloads the rarest one, the one which fewest of the others have, thus helping to redress the balance. Cute, eh? There’s also stuff related to starting up, to finishing, to finding new peers etc, but for more details have a look at the paper linked to above. All in all, a very nice system.
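The rarest-first choice can be sketched in a few lines of Python. This assumes we already know which pieces each connected peer holds; the function name and the peer labels are made up for the example, and real clients layer extra rules on top for starting up and finishing.

```python
import random
from collections import Counter


def pick_rarest_piece(have, peer_pieces):
    """Return the index of a piece we lack that the fewest peers hold.

    have:        set of piece indices we already have
    peer_pieces: dict mapping each peer to the set of piece indices it has
    """
    counts = Counter()
    for pieces in peer_pieces.values():
        counts.update(pieces - have)  # only count pieces we still need
    if not counts:
        return None                   # nothing useful on offer
    fewest = min(counts.values())
    rarest = [index for index, n in counts.items() if n == fewest]
    return random.choice(rarest)      # break ties at random


peer_pieces = {
    "peer A": {0, 1, 2},
    "peer B": {1, 2},
    "peer C": {1, 3},
}
print(pick_rarest_piece({0}, peer_pieces))  # 3
```

Piece 3 wins because only one peer has it, whereas pieces 1 and 2 are held by three and two peers respectively.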

Specifications

I’m reading a book edited by Joel Spolsky and came across this nice footnote:

This reminds me of my rule: if you can’t understand the spec for a new technology, don’t worry; nobody else will understand it either, and the technology won’t be important.

Mac OS X and Subversion

Non-geeks can skip all of this!

Subversion is a very nice version-control system which fixes many of the problems with its predecessor, CVS. You can use, for example, Martin Ott’s packages to get an up-to-date copy for your Mac. There’s some support for it in Xcode, and in general it works very nicely on the Mac as long as you don’t mind using the command line. I haven’t found a Mac GUI for it yet that I like; the best is SvnX and frankly, that’s not saying much, though I applaud Dominique Peretti for doing something.

Anyway, there is one thorny issue on the Mac. Many things which appear to be files in the Finder are in fact directories – ‘bundles’, they’re officially called. In the past, they were mostly just used for applications, but an increasing number of document formats are now bundles as well. Apple’s Pages and Keynote packages are examples.

When you check a directory tree out of Subversion onto your local disk, a hidden ‘.svn’ directory is created in each directory in the hierarchy. That’s where Subversion keeps its stuff. Having this in a document bundle does not usually upset applications; they just ignore it. But some apps assume (reasonably) that they’re the only ones interacting with the bundle. If you open a document in Pages, change something and then save the doc, it will overwrite the directory with a new one and in the process delete any .svn directories within it, which will confuse Subversion if you then try to check it back in. The latest version of Keynote doesn’t do this, because it reuses its old directory, but it’s unusual in that respect. Most things which create bundles will cause a problem if that bundle is managed using Subversion.

There are manual fixes for this (see ‘Things to watch out for’ at the bottom of this page, for example), but it’s very inconvenient if you do this often, especially if your bundle includes multiple subdirectories, because you’ll need to repeat the fix for each one.

Probably the right way to fix this is for Subversion to be able to view certain directories as untouchable, and store the information about them within the .svn directory of the parent. An alternative would be to tar and un-tar all such directories behind the scenes and check them in and out of the repository as if they were a single file. I discovered a thread from about three years ago discussing this, but I don’t think anything was done.
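The tar idea can at least be tried by hand. Here is a sketch using Python’s standard tarfile module; this is not a Subversion feature, and pack_bundle, unpack_bundle and the Talk.key example bundle are hypothetical names for illustration only.

```python
import tarfile
import tempfile
from pathlib import Path


def pack_bundle(bundle_dir, archive_path):
    """Store a bundle directory as a single tar file, leaving out any .svn dirs."""
    bundle = Path(bundle_dir)

    def skip_svn(info):
        # Returning None drops the entry (and, for a directory, its contents).
        return None if ".svn" in Path(info.name).parts else info

    with tarfile.open(archive_path, "w") as tar:
        tar.add(str(bundle), arcname=bundle.name, filter=skip_svn)


def unpack_bundle(archive_path, dest_dir):
    """Recreate the bundle directory from its tar file."""
    with tarfile.open(archive_path, "r") as tar:
        tar.extractall(dest_dir)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    bundle = root / "Talk.key"
    (bundle / ".svn").mkdir(parents=True)
    (bundle / ".svn" / "entries").write_text("svn bookkeeping")
    (bundle / "index.xml").write_text("<slides/>")

    pack_bundle(bundle, root / "Talk.key.tar")
    unpack_bundle(root / "Talk.key.tar", root / "restored")
    restored = sorted(p.name for p in (root / "restored" / "Talk.key").iterdir())

print(restored)  # ['index.xml']
```

You would then check the .tar file in and out as an ordinary single file. The obvious cost is that the repository sees one opaque blob, so you lose meaningful diffs of what changed inside the bundle.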

I’m really hoping that Apple, having made a major step forward in file systems by making them searchable, will be the first to introduce decent version control at a fundamental level. Well, the first since VMS, anyway.

Interesting patents – the Munch Box

Here’s one I came across by accident. In 1979, Susan E Brownlow patented a small ‘cool box’ in the shape and size of a cigarette packet. There’s a removable section at the bottom which you can put in the freezer, and the box is insulated so that you can carry it in your pocket and the contents are kept cool.

The motivation? People giving up smoking need something to chew on which doesn’t include too many calories. Carrot sticks are apparently good. But how do you carry carrot sticks around with you all day and keep them fresh and crunchy?

Florida madness comes to the UK

Oh dear. Someone in the UK has been prosecuted for using an open wifi connection. Three questions occur to me here:

  • If you connect to someone’s network by accident, like my aunt, are you liable for prosecution?
  • In any incident, does the owner of the network have to press charges?
  • Is there a way to say ‘I believe in sharing my resources and any passer-by is welcome to use this network?’ We should establish a convention, perhaps like including ‘open’ in the network name.

Tell me the old, old story…

Where Google leads, Microsoft follows.

Actually, both of these projects must have been in the pipeline for some time, and I bet MS was furious that Google stole their thunder.

And your word for the day…

…is frolicsome. I suggest you try to be a bit more frolicsome than usual today.

What Business Can Learn from Open Source

Paul Graham comes up with very good essays in an annoyingly consistent way. His recent one entitled What Business Can Learn from Open Source is particularly interesting because it isn’t about software.

Print it out, find a comfy chair and a good cup of coffee, and enjoy…

Productivity

Well, it’s so quiet in the office today that I guess it must be a Bank Holiday. Somehow I only tend to discover these things when I turn up at work and find the car park empty! It should be a productive day, though!

Pop

Chris DiBona uploaded this rather nice picture of Ward Cunningham (creator of the original Wiki) holding a half-exploded balloon.

The high-speed photography session was a fun one at Foo Camp, but I got there at the end and didn’t get anything as good as those in Chris’s collection.


A different kind of Bluetooth security flaw

A Cambridge Evening News article. Do laptops really respond to Bluetooth queries when they’re asleep?

(I never imagined I would ever post a link to the Cambridge Evening News here!)

Just enough piracy

An interesting post on Chris Anderson’s blog. An extract to whet your appetite:

I was chatting with a former Microsoft manager the other day and he revealed that after much analysis Microsoft had realized that some piracy is not only inevitable, but could actually be economically optimal. The reason is counterintuitive, but intriguing.

© Copyright Quentin Stafford-Fraser