Dominic Cleal's Blog

Automated news articles on the Kindle using headless Calibre

Calibre's a pretty good OSS e-book library app. It's good for running on your desktop/laptop, imports books in a variety of formats and can ship them back out to loads of devices by converting them to the device's native format (MOBI for the Kindle Wi-Fi in my case).

It can also pull articles from RSS feeds, fetch the original articles and convert them to read on the device. There's a library of over 800 "recipes" for scraping feeds, finding and converting the content with everything from world-wide daily newspapers to science journals, tech blogs and even some that pull articles from Google Reader.

While the GUI lets you schedule how often they should be downloaded, this means you've got to have Calibre up and running - a bit of a pain when you're running out of the door in the morning. So I decided to run it on my headless BitFolk VPS on a cronjob.

Edit: I originally wrote this in January 2011, when Debian Lenny was stable. Squeeze is now stable so the chroot isn't strictly needed, but schroot is neat enough to make it worth leaving this section in (though commands have been updated for the Squeeze version). Also Calibre development is very fast-paced, so having the latest release is handy.

The first step was to set up a chroot with Debian Sid (unstable, bleeding edge), as Calibre depends on glibc 2.10 and up, while Debian Lenny (stable) is on 2.7. The author has a strange position on this:

"If you bought into the notion that a real server must run a decade old version of Debian, then you will have to jump through a few hoops."

I've been using schroot (a Debian project) to manage this and debootstrap to create it:
# mkdir -p /srv/chroot/sid
# debootstrap sid /srv/chroot/sid http://ftp.uk.debian.org/debian/
And then in /etc/schroot/schroot.conf, added:
[sid]
type=directory
description=Debian Sid
location=/srv/chroot/sid
users=dominic

Change the "users" line to your own username. Now you can use commands such as schroot -c sid to switch into the chroot environment. The schroot app takes care of mounting /proc etc. Next, chroot in as root and then run apt-get install calibre. It'll install loads of X libraries and dependencies, but it's only a chroot so you're not filling your server's installation up with stuff.
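If you'd rather not hand-edit the config, a small helper can print the stanza for you. This is only a sketch: schroot_stanza is a made-up function name, "alice" is a hypothetical username, and the keys are simply the ones shown above.

```shell
#!/bin/bash
# Print a schroot.conf stanza for a directory-type chroot.
# Arguments: chroot name, chroot directory, user allowed to enter it.
schroot_stanza() {
    cat <<EOF
[$1]
type=directory
description=Debian $1
location=$2
users=$3
EOF
}

# Example with the hypothetical user "alice"; run as root to append it:
# schroot_stanza sid /srv/chroot/sid alice >> /etc/schroot/schroot.conf
```

Review the output before appending it, as a duplicate [sid] section in the file is likely to confuse schroot.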

Now you should be able to run Calibre to download articles by creating a short script along these lines:

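#!/bin/bash
# Run at the lowest CPU priority to be kind to the rest of the box
renice 19 -p $$
# Begin a schroot session; -b prints the session ID
chroot=$(schroot -c sid -b)
# Fetch the feed and convert it into a single MOBI file
schroot -c "$chroot" -r -- ebook-convert /usr/share/calibre/recipes/the_register.recipe the_register.mobi --title=Register
# E-mail the file to the Kindle via the local MTA
schroot -c "$chroot" -r -- calibre-smtp -r localhost -a the_register.mobi me@example.com me@kindle.com Register
# End the session
schroot -c "$chroot" -e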

This first downloads and converts the articles into a single the_register.mobi file and then uses a Calibre utility (calibre-smtp) to e-mail it via the localhost MTA to me@kindle.com. The Kindle picks it up automatically whenever it's connected to the net (Wi-Fi/3G). The script itself then gets added to a cronjob, so the news is ready each morning:

$ crontab -e
then add:
15  4  *   *   *     ~/download_news.sh >/dev/null 2>&1
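The crontab line above throws all output away, so a failed download goes unnoticed. One option is to have the script keep its own log. The helper below is only a sketch: run_logged is a made-up name, and the log path in the example is an assumption.

```shell
#!/bin/bash
# Run a command, appending its output plus timestamped start/exit lines to a log.
# Arguments: log file, then the command and its arguments.
run_logged() {
    log=$1
    shift
    echo "$(date '+%Y-%m-%d %H:%M:%S') starting: $*" >> "$log"
    status=0
    "$@" >> "$log" 2>&1 || status=$?
    echo "$(date '+%Y-%m-%d %H:%M:%S') exit status: $status" >> "$log"
    return "$status"
}

# Example (hypothetical log path), wrapping one step of the download script:
# run_logged "$HOME/download_news.log" schroot -c "$chroot" -r -- ebook-convert ...
```

The function returns the wrapped command's exit status, so the script can still decide whether to go on and e-mail the file.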

Calibre's news downloads are very CPU intensive and seem to make good use of a multi-core/threaded system, so do be warned if you're on a shared host. You can also run the Calibre library web server on a headless box, allowing you to get to e-books from anywhere.
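Given the CPU load, fetching several feeds one after another inside a single schroot session keeps it to one conversion at a time. Here's a rough sketch of how the download script might be generalised. The function names mobi_name and fetch_and_send are my own, the second recipe path in the example is made up, and I've left out --title so the recipe's own title is used.

```shell
#!/bin/bash
# Derive the output file name from a recipe path:
# /usr/share/calibre/recipes/the_register.recipe -> the_register.mobi
mobi_name() {
    printf '%s.mobi\n' "$(basename "$1" .recipe)"
}

# Fetch each recipe in turn inside one schroot session and mail the result.
# Arguments: sender address, Kindle address, then one or more recipe paths.
fetch_and_send() {
    from=$1
    to=$2
    shift 2
    session=$(schroot -c sid -b) || return 1
    for recipe in "$@"; do
        out=$(mobi_name "$recipe")
        # Only mail the file if the conversion succeeded
        schroot -c "$session" -r -- ebook-convert "$recipe" "$out" &&
            schroot -c "$session" -r -- calibre-smtp -r localhost -a "$out" "$from" "$to" "$out"
    done
    schroot -c "$session" -e
}

# Example (the second recipe path is hypothetical):
# fetch_and_send me@example.com me@kindle.com \
#     /usr/share/calibre/recipes/the_register.recipe /path/to/another.recipe
```

Running the feeds sequentially rather than in parallel is deliberate here, since each conversion already keeps the CPU busy.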
