Resources expressed through a form of notation intended to be read, especially printed documents (books, serials, pamphlets, posters, broadsides, etc.), for which text extracted by optical character recognition (OCR) should be provided as an output of the digitization workflow.

External Resources

Technical Guidelines for Digitizing Cultural Heritage Materials: Creation of Raster Image Master Files (Federal Agencies Digitization Guidelines Initiative, August 2010)

Minimum Specifications

OCR source
Print source
File type TIFF TIFF JPEG TIFF UTF-8 compliant .txt file JPEG PDF*
Resolution 300 PPI**       n/a 72 PPI n/a
Bit depth 8-bit grayscale***     Bitonal, adjusted for contrast and brightness     n/a
Retention Permanent Permanent Create on ingest Temporary Permanent Create on ingest  
Ratio 1:1            

* PDF files should not exceed 100 MB.
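A size check like this can be automated at the end of the workflow. The sketch below is illustrative only: the function name `pdf_within_limit` is hypothetical, and because the guideline does not say whether "100 MB" means 100 × 1024 × 1024 bytes or 100 × 10^6 bytes, the binary reading is assumed here and exposed as a parameter.

```python
import os

# Assumption: "100 MB" is read as 100 MiB (100 * 1024 * 1024 bytes).
# Pass max_bytes=100 * 10**6 instead if your workflow uses decimal megabytes.
MAX_PDF_BYTES = 100 * 1024 * 1024


def pdf_within_limit(path: str, max_bytes: int = MAX_PDF_BYTES) -> bool:
    """Return True if the file at `path` is no larger than `max_bytes`."""
    return os.path.getsize(path) <= max_bytes
```

Oversized files flagged this way could then be split or recompressed before delivery.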

** Calculate the resolution from the dimensions of the object and the size of its text. For text of 10 pt or larger, 300 PPI should suffice; documents with text of 9 pt or smaller may require 400-600 PPI.
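The resolution rule and the 1:1 ratio can be combined into a quick planning calculation: pick a PPI from the smallest text size, then multiply the object's dimensions in inches by that PPI to get the expected pixel dimensions. This is a minimal sketch, not part of the guidelines; the helper names are hypothetical, and text between 9 and 10 pt (which the footnote does not address) is assumed to fall in the "smaller text" range.

```python
def recommended_ppi(min_text_pt: float) -> tuple[int, int]:
    """Return a (low, high) PPI range for the smallest text on the object.

    10 pt or larger -> 300 PPI; smaller text -> 400-600 PPI.
    Assumption: sizes between 9 and 10 pt are treated as smaller text.
    """
    if min_text_pt >= 10:
        return (300, 300)
    return (400, 600)


def pixel_dimensions(width_in: float, height_in: float, ppi: int) -> tuple[int, int]:
    """Pixel dimensions of a 1:1 capture of an object measured in inches."""
    return (round(width_in * ppi), round(height_in * ppi))


if __name__ == "__main__":
    low, high = recommended_ppi(8)
    print(low, high)                       # 400 600
    print(pixel_dimensions(8.5, 11, 300))  # (2550, 3300)
```

A letter-size page (8.5 × 11 in) with 10 pt body text would therefore yield a master of roughly 2550 × 3300 pixels.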

*** Whenever possible, capture color or monochrome objects, and objects with stains or marks, at 48-bit RGB and save at 24-bit RGB. For most objects where color is not a concern, capture at 16-bit grayscale and save at 8-bit grayscale. Use 1-bit bitonal for clean, high-contrast documents that contain printed type only.
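The bit-depth footnote amounts to a small decision rule, sketched below as one possible reading of it. The function and flag names are hypothetical. The footnote does not say whether the bitonal option applies to capture, save, or both, so both are assumed. The `to_8bit` helper illustrates one simple way to reduce 16-bit grayscale samples to 8-bit by keeping each sample's high byte.

```python
from typing import NamedTuple


class BitDepthPlan(NamedTuple):
    capture: str  # bit depth used when scanning
    save: str     # bit depth of the saved master file


def choose_bit_depth(has_color_or_marks: bool, clean_printed_type: bool) -> BitDepthPlan:
    """Pick capture/save bit depths following footnote *** (hypothetical helper).

    has_color_or_marks: color or monochrome object, or one with stains or marks.
    clean_printed_type: clean, high-contrast document with printed type only.
    """
    if has_color_or_marks:
        return BitDepthPlan(capture="48-bit RGB", save="24-bit RGB")
    if clean_printed_type:
        # Assumption: bitonal is applied to both capture and save.
        return BitDepthPlan(capture="1-bit bitonal", save="1-bit bitonal")
    # Default for most objects where color is not a concern.
    return BitDepthPlan(capture="16-bit grayscale", save="8-bit grayscale")


def to_8bit(samples: list[int]) -> list[int]:
    """Reduce 16-bit grayscale samples (0-65535) to 8-bit (0-255).

    Keeps the high byte of each sample, so 0 -> 0 and 65535 -> 255.
    """
    for s in samples:
        if not 0 <= s <= 65535:
            raise ValueError(f"sample out of 16-bit range: {s}")
    return [s >> 8 for s in samples]
```

Dropping the low byte truncates rather than rounds; scanning or image-editing software may instead apply rounding or a tone curve when it reduces bit depth.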

Last modified: 
Monday, March 24, 2014 - 8:39am