Jul 3

#rdc2009 Hacker Wednesday – Successful Downloads and Benchmarks!


The RDC crew working remotely in Skype. L to R: Levi Thompson and Steven McKenzie. Portions of these chats will be available on our site on an ongoing basis as tutorials.

Greetings from Rebecca Malamud on behalf of the Rural Design Collective! Apologies for this delayed #rdc2009 Hacker Wednesday report – I am on the road again visiting friends and family – but always in touch with the RDC crew in Oregon via Skype and email during the trip. We are starting to make real progress on the IACL-4-OLPC project!

Steven has successfully downloaded his first book with his new auto-downloader program! Written entirely in Python, it retrieves a user-defined list of books and lets the user specify the directory where the files are saved. Scotty Auble, our Development Mentor for the IACL-4-OLPC project, notes that this differs from the Bulk Access Downloader available from Open Library in that it uses plain Python urllib calls instead of relying on Linux/Unix wget commands. We are investigating whether we can still apply rate limiting to reduce the load on the IA servers when downloading the books. Since we are only dealing with a subset of the Internet Archive Children’s Library (2,000 out of 3,322 books), and the examples set forth on the Open Library blog are 700K+, we are inclined to believe that the impact on the network will be marginal. But we plan to check with folks at the Archive just because it is good manners :-) .
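For readers curious what a urllib-only downloader with simple rate limiting might look like, here is a minimal sketch. It is not Steven's code: the function names, the two-second delay, and the injectable `fetch` and `sleep` parameters are all our own assumptions for illustration, and it is written for current Python 3 (`urllib.request`).

```python
import os
import time
import urllib.request
from urllib.parse import urlsplit


def fetch_bytes(url):
    """Fetch a URL with the standard library only (no wget)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_books(urls, dest_dir, delay_seconds=2.0,
                   fetch=fetch_bytes, sleep=time.sleep):
    """Save each URL into dest_dir, pausing between requests.

    The fixed pause is a very simple form of rate limiting. The default
    delay is an assumed value, not a figure from the Internet Archive.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # wait before every request after the first
        # Use the last path segment of the URL as the local file name.
        filename = os.path.basename(urlsplit(url).path) or "book_%d" % index
        path = os.path.join(dest_dir, filename)
        with open(path, "wb") as handle:
            handle.write(fetch(url))
        saved.append(path)
    return saved
```

Passing `fetch` and `sleep` as parameters keeps the loop easy to try out without touching the network; in normal use the defaults do the real work.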

Steven, who has code-named his application “Oghams Prayer”, has set some new personal goals on the project – he wants to (1) auto-detect file formats in the list so it does not have to be hard-coded, (2) develop a means to sort and file books based on topic (we built our collection based on the IACL Tag Cloud so we would like to retain that meta-information, another reason to not just do a wget), and (3) make the application extensible so it can download books from multiple sources, not just the IACL. He also plans to develop a simple user interface, and release the code as open source. The RDC team will be helping him with the rollout of his new application!
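Goals (1) and (2) could be approached in many ways. One possible sketch, assuming the format is detected from the first bytes of each downloaded file and each book carries a single topic tag, is shown here. The signature table, the `detect_format` and `topic_path` names, and the directory layout are hypothetical and not taken from Oghams Prayer.

```python
import os

# Leading byte signatures for a few formats (illustrative, not exhaustive).
SIGNATURES = [
    (b"AT&TFORM", "djvu"),
    (b"%PDF", "pdf"),
    (b"PK\x03\x04", "epub"),  # EPUB is a ZIP container; other ZIP files match too
]


def detect_format(data):
    """Guess a file extension from the first bytes of a file."""
    for magic, ext in SIGNATURES:
        if data.startswith(magic):
            return ext
    return "bin"  # unknown format


def topic_path(root, topic, book_id, data):
    """Build a path such as root/<topic>/<book_id>.<ext>."""
    safe_topic = topic.strip().lower().replace(" ", "_") or "untagged"
    filename = "%s.%s" % (book_id, detect_format(data))
    return os.path.join(root, safe_topic, filename)
```

Filing each book under its tag keeps the Tag Cloud information visible in the directory structure itself, which a plain wget mirror would not preserve.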

Levi has wrapped up his first draft of our DjVu Memory Chart – and mastered some new CSS and HTML skills in the process. Ever the skeptics, we were curious to see how much the DjVu memory per page varies across different types of books. It clearly can vary a great deal depending on the content, so it will be interesting to see how much memory the IACL books we have targeted will actually occupy. We are estimating that an 8GB memory stick will be more than enough. We do plan on finessing our sample collection a bit, both in terms of design and content, so it might provide a useful guide for someone else planning to build a collection in the DjVu format.
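As a rough sanity check on the 8GB estimate, the arithmetic can be sketched as follows. It uses only the two figures from this post (8GB and 2,000 books) and assumes decimal units (1 GB = 1,000 MB); the function names are ours, and no real book sizes are implied.

```python
def per_book_budget_mb(stick_gb, book_count):
    """Average space available per book, in decimal megabytes."""
    return stick_gb * 1000 / book_count


def fits(sizes_bytes, stick_gb):
    """True if the listed file sizes fit on a stick of the given size."""
    return sum(sizes_bytes) <= stick_gb * 1000 ** 3


# 8 GB shared by 2,000 books leaves an average of 4 MB per book.
budget = per_book_budget_mb(8, 2000)
```

So the collection fits as long as the targeted books average under about 4 MB each in DjVu, which is what the memory chart should help confirm.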

It was a busy and successful week! Next up: downloading our IACL collection and exploring the EPUB format! We also have a special surprise announcement that will be revealed at a later date. Stay tuned to the RDC!

