Resources

Using Tesseract

Additional Tutorials

How to Digitize Texts with Open-Source Command-Line Optical Character Recognition (OCR) Software;
Build Your Own Text-as-Data Corpus: A Print-to-Bytes Primer from WIDH@NYCDH Week 2021 by Nicholas Wolfe;
Images to Text: A Gentle Introduction to Optical Character Recognition with PyTesseract from the 2021 Text Analysis Pedagogy Institute by Hannah Jacobs.

Alternative OCR Tools

PyTesseract (a Python wrapper for Tesseract)
Adobe Acrobat (https://www.adobe.com/acrobat/how-to/ocr-software-convert-pdf-to-text.html)
ABBYY FineReader (https://www.abbyy.com/): well suited to work that demands high accuracy, such as preparing a book for publication

Additional Reading

Cordell, Ryan. 2017. “‘Q i-jtb the Raven’: Taking Dirty OCR Seriously.” Book History 20: 188-225.
Cordell, Ryan. 2019. “Why You (A Humanist) Should Care About Optical Character Recognition.”
Hawk, Brandon W. “OCR and Medieval Manuscripts: Establishing a Baseline.”
Smith, David, and Ryan Cordell. 2018. “A Research Agenda for Historical and Multilingual Optical Character Recognition.”