I knew the fundamental idea behind BitTorrent, the file-distribution system – that after some initial seeding every BitTorrent client provides uploads as well as downloads and so you can distribute much more data more quickly without ridiculously heavy loads on one server.
But I’ve just been reading Bram Cohen’s paper and so have a bit more of a grasp of the behind-the-scenes operation, which is really quite clever. Here’s a very simplistic overview:
Files are distributed by creating a .torrent file and putting it on a web server. The .torrent file includes information about the file (its name, length, checksums and so on) and also the URL of a tracker. The tracker is a very simple server which knows about the machines currently downloading the file (henceforth known as peers). When your client starts downloading a file, it connects to the tracker, gets added to the list, and receives back a random selection of other peers also downloading the file. It can then go off and talk to those peers.
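To make the tracker's job concrete, here is a minimal in-memory sketch in Python. It is not how a real tracker is implemented (a real one speaks HTTP and expires stale peers); the class name ToyTracker, the announce method, the max_peers limit and the "host:port" strings are all made up for illustration.

```python
import random


class ToyTracker:
    """Hypothetical stand-in for a tracker: it only remembers who is downloading what."""

    def __init__(self, max_peers=50, seed=None):
        self.swarms = {}              # torrent id -> set of peer addresses
        self.max_peers = max_peers    # assumed cap on how many peers to hand back
        self.rng = random.Random(seed)

    def announce(self, torrent_id, peer):
        """Add `peer` to the list for this torrent and return a random selection of the others."""
        swarm = self.swarms.setdefault(torrent_id, set())
        others = sorted(swarm - {peer})   # never hand a peer its own address
        swarm.add(peer)
        return self.rng.sample(others, min(self.max_peers, len(others)))


tracker = ToyTracker(max_peers=2)
tracker.announce("some-file", "10.0.0.1:6881")          # first peer, nobody else yet
print(tracker.announce("some-file", "10.0.0.2:6881"))   # ['10.0.0.1:6881']
```

The point is how little the tracker needs to know: it never touches the file itself, only the list of who is interested in it.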
Files are downloaded in pieces, each typically a quarter of a megabyte in size. Your client connects to several peers and finds out from them which pieces of the file they each currently have. It can then start downloading different pieces from different peers; it doesn’t have to get the pieces of the file in order. Whenever you have a complete piece, it’s added to the list of pieces you can make available to others. Often the traffic will be two-way: you’ll be downloading one piece from a peer while uploading a different piece to them.
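Here is a rough sketch of the piece bookkeeping on the downloading side, assuming the .torrent carries one checksum per piece (the real protocol uses SHA-1 for this). The names PIECE_SIZE, piece_hashes and Download are my own, and a real client would write pieces to disk rather than hold them in memory.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # a quarter of a megabyte


def split_pieces(data, piece_size=PIECE_SIZE):
    """Cut the file into fixed-size pieces; the last one may be shorter."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_hashes(data, piece_size=PIECE_SIZE):
    """The per-piece checksums that would be listed in the .torrent file."""
    return [hashlib.sha1(p).digest() for p in split_pieces(data, piece_size)]


class Download:
    """Hypothetical tracker of which pieces we hold; pieces may arrive in any order."""

    def __init__(self, hashes):
        self.hashes = hashes
        self.pieces = [None] * len(hashes)

    def receive(self, index, payload):
        """Keep a piece only if it matches its checksum; return whether it was accepted."""
        if hashlib.sha1(payload).digest() != self.hashes[index]:
            return False
        self.pieces[index] = payload
        return True

    def have(self):
        """One flag per piece: this is what we can offer to other peers."""
        return [p is not None for p in self.pieces]

    def assemble(self):
        """Join the pieces back into the file once every one has arrived."""
        if not all(self.have()):
            raise ValueError("download is not complete yet")
        return b"".join(self.pieces)
```

Because each piece is checked on its own, a piece can be passed on to other peers as soon as it arrives, long before the whole file is complete.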
The overall amount of data downloaded across the system must equal the overall data uploaded – every download has to come from somewhere! So, as a very rough approximation, you can download data as fast as you make it available for upload. This isn’t quite the case, because people often leave their clients running for some time after the download has finished, either because they’re good citizens or because they’re off having a cup of coffee. That spare upload capacity lets the remaining downloaders get data somewhat faster than they upload it. Also, at particular times, you’ll see the speed of your downloads or uploads fluctuate for a few minutes, though it roughly balances out over time.
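The balance argument can be put into a few lines of arithmetic. This is only a back-of-the-envelope model: it assumes every client's upload capacity is fully used and shared evenly, which won't be true in practice, and the function name and the example rates (in KB/s) are invented.

```python
def average_download_rate(downloader_uploads, seed_uploads=()):
    """Average download rate per active downloader.

    Total downloaded must equal total uploaded, so the uploads of everyone
    (downloaders plus finished clients still running) are shared among the downloaders.
    """
    if not downloader_uploads:
        raise ValueError("need at least one downloader")
    total_upload = sum(downloader_uploads) + sum(seed_uploads)
    return total_upload / len(downloader_uploads)


# Four downloaders each uploading at 50 KB/s, nobody lingering: 50 KB/s each.
print(average_download_rate([50, 50, 50, 50]))
# The same four plus one finished client still uploading at 100 KB/s: 75 KB/s each.
print(average_download_rate([50, 50, 50, 50], seed_uploads=[100]))
```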
A disadvantage of the system as a whole is that if, like many of us, you have a much faster downstream connection than upstream, your BitTorrent download is likely to happen at something closer to your slower upstream speed. The advantage, though, is that if the file you’re downloading is at all popular, you’re likely to get it in a much more reliable way than a regular web download, you won’t be limited by the capacity of the originator’s server link, and they won’t end up paying a fortune in bandwidth charges.
There are lots of clever bits which I haven’t touched on here. For example, if you’re connected to several peers, how does your client decide which piece of the file to download next? Answer: it normally downloads the rarest one, the one which fewest of the others have, thus helping to redress the balance. Cute, eh? There’s also stuff related to starting up, to finishing, to finding new peers etc, but for more details have a look at the paper linked to above. All in all, a very nice system.
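The rarest-first choice is easy to sketch. In this Python version, peer_pieces maps each connected peer to the set of piece numbers it has; the function name and the rule of breaking ties by lowest piece number are my own assumptions, made so the result is predictable, and are not taken from the paper.

```python
from collections import Counter


def rarest_first(my_pieces, peer_pieces):
    """Pick the piece we still need that the fewest connected peers have.

    Returns None if none of the peers has anything we are missing.
    """
    counts = Counter()
    for pieces in peer_pieces.values():
        counts.update(pieces)          # one vote per peer that holds the piece
    candidates = [i for i in counts if i not in my_pieces]
    if not candidates:
        return None
    # Fewest holders first; lowest piece number breaks ties (an arbitrary choice).
    return min(candidates, key=lambda i: (counts[i], i))


peers = {"a": {0, 1, 2}, "b": {0, 1}, "c": {0, 3}}
print(rarest_first({3}, peers))   # 2, since only one peer has piece 2
```

Once you have fetched the rare piece, you become a second source for it, which is exactly the rebalancing effect described above.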