Open Buddha

Open Source Buddhism, Technology, and Geekery

Kindle, E-books, and Hacking

By Al Tesshin Billings

I’m a great lover of books. I spent much of my childhood with a room of the house dedicated to books (aka the “library room”). By the time I moved out after college, I already owned hundreds of books and I continued to amass them over the years. Being a former occultist, I had acquired the required collection of hard-to-find texts. At least one friend of mine complained, after a house move more than ten years ago, of having dreams the next night of seeing unending boxes labeled, “Al’s Books.” Needless to say, the number of books has gotten to be a problem and, at this point, I have a couple of thousand living in boxes in a garage because I don’t have room for them in my house and really don’t always need them at hand. I’ve gotten to the point where I will read a book, decide that I’m not going to read it again (nor keep it on hand for others), and then not be sure what to do with it. There aren’t a lot of good options for what to do with used books unless you don’t mind getting completely ripped off selling them to a used bookstore or the like. In an ideal world, I would be able to store the book (so I could have it at hand if I did want it) but not have it take up a lot of space.

All of this makes me a good candidate to go all digital, along with being a complete computer geek. I spend most of my time reading words on a screen for my day job. When the Kindle DX came out, I bought one because the high DPI screens are much nicer than reading an LCD (you do really have to see one to appreciate it) and because the DX had native PDF support. For my school program and scholastic interests, I wound up with a lot of PDF files of articles to read and this way I didn’t have to print them out. I’ve bought a lot of books for the Kindle since I got mine and I really have enjoyed using the device. One of the big problems of the Kindle and similar devices is the non-transferability of the books. I’d rather have books in an open format like EPUB or even PDF than the DRM-laden Kindle format. As has been pointed out, when you buy a Kindle book, you are really long-term leasing the book, not owning it. I’ve worked around this to some degree by seeking (and often finding) pirated versions of the same books that I own in more open formats. I don’t even feel vaguely bad about this. After all, I have already paid for the book. I’m just getting a more suitable archival copy of it. In fact, I’ve been doing this for paper books that I own as well. This gives me more room to clear out my shelf space from many pounds of dead tree.

I have been expecting for a while that the Kindle format would be cracked. Given the cracking of DVDs and even Blu-ray, I knew it was only a matter of time. No DRM scheme will survive the interest of enough talented individuals. Only one person has to be smarter than the creators, as Cory Doctorow has pointed out, and then everyone else can learn from that person. Well, that day has come, at least partially. It has been reported recently that the Kindle DRM has been hacked. As it turns out, this is not entirely true. Kindle books, while externally looking the same, come in more than one format. The Kindle supports (and Amazon sells) books using the Mobipocket format without DRM (.mobi), Amazon’s topaz format books (.tpz), and Amazon’s DRM-restricted Mobipocket format (AZW). The Amazon topaz format has not been hacked yet. What has been hacked is the Kindle form of the mobipocket format that added DRM. This format is based on the Open eBook standard. Most Kindle books are actually the same .mobi books that you can buy from other retailers but tweaked for Amazon DRM. (Well, actually, Amazon has a LOT more than any other retailer but they are still mobiformat books for the most part…) Some publishers use the topaz format, but the choice seems to be entirely up to the publisher, based on the instructions for creating a Kindle book that I’ve read. There are a lot of tools for working with mobipocket format, which is basically just HTML with some special additions and tweaks.
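If you are curious which of these formats your own files use, a few bytes at the start of each file are usually enough to tell. The following is only a rough sketch, not anything taken from the tools discussed here. It relies on the fact that Mobipocket files are Palm Database files carrying the type and creator codes “BOOKMOBI” at byte offset 60, and it assumes (I have not verified this against a specification) that topaz files begin with the ASCII bytes “TPZ”. The function names are my own invention for illustration, and the check does not tell you whether a mobipocket book carries DRM.

```python
from collections import Counter
from pathlib import Path


def sniff_ebook_format(path):
    """Return a rough format label based on the first bytes of a file."""
    with open(path, "rb") as f:
        header = f.read(68)
    # Palm Database files store a 4-byte type and a 4-byte creator code
    # at offset 60; Mobipocket books use "BOOK" followed by "MOBI".
    if len(header) >= 68 and header[60:68] == b"BOOKMOBI":
        return "mobipocket"
    # Assumption: topaz files start with the ASCII bytes "TPZ".
    if header.startswith(b"TPZ"):
        return "topaz"
    return "unknown"


def count_formats(directory):
    """Tally the sniffed format of every regular file in a directory."""
    return Counter(
        sniff_ebook_format(p) for p in Path(directory).iterdir() if p.is_file()
    )
```

Pointing `count_formats` at a folder of downloaded books gives a quick tally of how many are mobipocket-based and how many are topaz.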

The hack uses a combination of the Kindle for PC application for Windows and two Python scripts. One script extracts the books from the Kindle for PC application when it creates a session for reading the book, and the other script strips off the mobipocket DRM, leaving an unprotected mobipocket format book. I tested this out by firing up a Windows XP virtual machine (since I run OS X), installing Python and Kindle for PC, and downloading the two scripts. You fire up the main script (called “unswindle”), which then starts the Kindle for PC application. You then open the book that you want to extract and close the application; unswindle then grabs the book and fires up the mobidedrm script to strip off the DRM. I looked at my bought content, as an experiment, and it turned out that out of more than 50 books that I owned, only five or so were in topaz format. The rest were mobipocket books with DRM. I was able to extract a book that I had bought and then fire up Calibre, an open source ebook reader and reformatting tool, and view the book completely outside of any Amazon application.

It will be interesting to see what Amazon will (or can) do to stop this. They’ve updated their PC application once but the author of the script simply made an update and it worked again. While the Amazon application updates by default, users can turn this off. At that point, Amazon will have to make a choice between cutting off users who haven’t updated the application or letting the hack continue to work (since they can’t change the way they do DRM without cutting off users unless they update the application).

It is an interesting problem. As a number of people have pointed out, having no ebooks doesn’t really get rid of piracy either. There have been text file or PDF versions of popular books floating around the net for more than ten years. You can’t really stop a dedicated person from either typing a book in or scanning a book and running OCR on it. For a lot of people, a scanned OCR’d book is “good enough” for them. There is no DRM in the world that can stop that from happening, which makes books an entirely different problem than, say, music or movies. The printed book is not an analog hole that can be plugged. I’ve ripped the spines off of paperbacks and scanned them before, not for distribution but simply because I was tired of having the paper copy of a reference book on my shelf taking up space. I never bothered to OCR mine, just leaving them as high-DPI image-based PDFs. Most books are still under 10 megabytes in size and this isn’t much in an age when people have giant music collections where a single song is easily three to eight megabytes in size. Print is cheap. I found the process to be tedious but pretty easy if you own a scanner.

Like I said, it was only a matter of time until the DRM was hacked and this is probably the first salvo in what will be another ongoing DRM war between publishers of media and their own customers.

Update: Updated to clarify that there are two book formats, not three. I was confusing Amazon’s mobiformat with DRM with unprotected mobiformat. The only currently uncracked format is topaz.