Tuesday, November 29, 2011

Explanation of What I Am Doing

I just wanted to make a quick post explaining what I am doing with this site.

Mises.org does an amazing job of releasing all of its books for free in PDF form (and is now converting many of them to EPUB):

http://mises.org/literature.aspx

I am doing multiple things:

  • Fixing any errors in the EPUB version if it exists.
  • Creating an EPUB version if it does not exist.
  • Releasing a more optimized PDF version if needed (much smaller file size, cleaned images, and an OCR backend that matches the EPUB)
    • (Update: August 2012): Before starting anything, I now export the book to PNG files and run them through ImageMagick. This cleans up virtually all specks.
    • This is a link to the .bat file I have been using. It takes a long time, but it cleans up every single speck and works better than anything else I have been able to find.
    • http://www.mediafire.com/?bzkxxl2qypjgxdh
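The .bat file itself is not reproduced in this post. As a rough illustration of the same idea, here is a minimal Python sketch that runs ImageMagick's -despeckle operator over every exported PNG page. It assumes ImageMagick 6's convert command is on the PATH (ImageMagick 7 calls it magick); the function names and the passes parameter are hypothetical and are not taken from the original script.

```python
import subprocess
from pathlib import Path


def build_despeckle_commands(src_dir, out_dir, passes=1):
    """Build one ImageMagick command per PNG page in src_dir.

    Assumption: ImageMagick 6's `convert` is on PATH. The -despeckle
    operator is repeated `passes` times in each command.
    """
    src = Path(src_dir)
    out = Path(out_dir)
    commands = []
    for page in sorted(src.glob("*.png")):  # sorted so pages run in order
        cmd = ["convert", str(page)]
        cmd += ["-despeckle"] * passes
        cmd.append(str(out / page.name))  # same file name, new folder
        commands.append(cmd)
    return commands


def despeckle_pages(src_dir, out_dir, passes=1):
    """Create out_dir, then run each command; raises on the first failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_despeckle_commands(src_dir, out_dir, passes):
        subprocess.run(cmd, check=True)
```

For example, despeckle_pages("pages", "pages_clean", passes=3) writes a cleaned copy of each page into pages_clean. More passes generally remove more specks at the cost of extra run time.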

While reading the EPUB version of a book, I fix any errors I find and then release a "Fixed" version. Each one comes with a thorough changelog of exactly what was changed from the original. Usually this just includes small typos such as extra spaces, missing punctuation, accidentally split paragraphs, wrong accent marks, etc. The professionals that Mises.org hires do a very good job of converting these books.

For books that do not have an EPUB version, I create one that is as close to 100% accurate as I can make it. This involves taking the original scans of the books (the PDFs released on Mises.org) and using ABBYY FineReader to OCR (Optical Character Recognition) them.

Most or all of the PDFs on Mises.org are run through an OCR program before release, and this automatic OCR is not 100% accurate. I manually go through each page and fix any errors I find, to create a text backend for the book that is as close to 100% accurate as possible.

This is then converted into a basic EPUB version. From there I touch up everything: recreating tables in HTML, making sure images transferred, creating chapters, making sure formatting (bold/italics) is correct, creating footnotes, creating/fixing CSS, making sure Unicode characters transferred (ç, È, Ê, é, £, ...), fixing small errors, etc.
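Checking that Unicode characters transferred can be partly automated. Below is a minimal Python sketch, assuming the proofread OCR text and the EPUB's XHTML are available as strings; the function names are hypothetical and not part of any EPUB tool. It counts each non-ASCII character on both sides, decoding HTML entities such as &eacute; in the XHTML first, and reports any character that occurs fewer times after conversion.

```python
import html
from collections import Counter


def non_ascii_counts(text):
    """Count each non-ASCII character (ç, È, Ê, é, £, ...) in a string."""
    return Counter(ch for ch in text if ord(ch) > 127)


def missing_characters(source_text, epub_xhtml):
    """Return {char: count} for non-ASCII characters lost in conversion.

    Entities like &eacute; or &#233; in the XHTML are decoded first so
    they count as the character they represent.
    """
    before = non_ascii_counts(source_text)
    after = non_ascii_counts(html.unescape(epub_xhtml))
    # Counter subtraction keeps only positive differences, i.e. characters
    # that appear fewer times in the EPUB than in the source text.
    return dict(before - after)
```

For example, missing_characters("Façade £5 café", "Facade &pound;5 caf&eacute;") returns {'ç': 1}, flagging the cedilla that was lost while the entity-encoded £ and é are counted as present.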

As of now, I am focusing on older books (less likely to be in the process of being converted officially by Mises.org) and books with bloated PDF versions (which would benefit the most from a tiny EPUB version).

Please contact me if you:

  • Find any errors in my documents or the original versions
  • Want me to create an EPUB of a specific book
  • Want me to create a certain format of a specific book from my Finereader OCRs (ODT, DOC, TXT, etc.)
    • I can't guarantee it will look pretty

Here are links to my backups of every Mises.org book on MediaFire:

EPUBs: http://www.mediafire.com/?iszjpz6pd1pya
PDFs: http://www.mediafire.com/?gcmj1km77kvy6

Foreign Language PDFs: http://www.mediafire.com/?somr2pyq3e0oe
