Textual

Definition

Resources expressed through a form of notation intended to be read, especially printed documents (books, serials, pamphlets, posters, broadsides, etc.). Text extracted by Optical Character Recognition (OCR) should be provided for these resources as an output of the digitization workflow.

External Resources

Technical Guidelines for Digitizing Cultural Heritage Materials: Creation of Raster Image Master Files (Federal Agencies Digitization Guidelines Initiative, August 2010)

Minimum Specifications

 
| | Master | Processed | Access | OCR source | Text | Thumbnail | Print source |
|---|---|---|---|---|---|---|---|
| File type | TIFF | TIFF | JPEG | TIFF | UTF-8 compliant .txt file | JPEG | PDF* |
| Resolution | 300 PPI** | | | | n/a | 72 PPI | n/a |
| Bit depth | 8-bit grayscale*** | | | Bitonal, adjusted for contrast and brightness | | | n/a |
| Retention | Permanent | Permanent | Create on ingest | Temporary | Permanent | Create on ingest | |
| Ratio | 1:1 | | | | | | |

* PDF files should not exceed 100 MB.

** Calculate the resolution from the dimensions of the object and the size of the text. For large text (10 pt or larger), 300 PPI should suffice. A document with smaller text (9 pt or smaller) may require 400-600 PPI.
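
As a rough illustration of the resolution guidance above, the sketch below picks a target resolution from the text size and converts an object's physical dimensions into pixel dimensions for a 1:1 scan. The function names are hypothetical. Treating text between 9 pt and 10 pt as small text is also an assumption, since the guidance does not cover that range.

```python
from typing import Tuple


def suggested_ppi_range(text_size_pt: float) -> Tuple[int, int]:
    """Return a (min, max) PPI range for a given text size in points."""
    if text_size_pt <= 0:
        raise ValueError("text size must be positive")
    if text_size_pt >= 10:
        # Large text: 300 PPI should suffice.
        return (300, 300)
    # Small text (9 pt or smaller) may require 400-600 PPI.
    # Assumption: sizes between 9 pt and 10 pt are treated as small text.
    return (400, 600)


def pixel_dimensions(width_in: float, height_in: float, ppi: int) -> Tuple[int, int]:
    """Pixel dimensions of a 1:1 scan of an object measured in inches."""
    return (round(width_in * ppi), round(height_in * ppi))


# Example: a letter-size page (8.5 x 11 inches) set in 8 pt type.
low, high = suggested_ppi_range(8)
print(low, high)
print(pixel_dimensions(8.5, 11, low))
```

The range for small text is returned as a pair rather than a single value because the guidance gives 400-600 PPI without saying how to choose within it. That choice is left to the operator.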

*** Whenever possible, capture at 48-bit RGB (for color or monochrome objects, or objects with stains or marks) and save at 24-bit RGB, or capture at 16-bit grayscale (for most objects where color is not a concern) and save at 8-bit grayscale. Use 1-bit bitonal for clean, high-contrast documents with printed type only.
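
The capture and save bit depths above can be written as a small lookup table, shown here as a minimal sketch. The `Material` categories, the `BitDepth` tuple, and the `uncompressed_bytes` helper are hypothetical names introduced for the example. Capturing bitonal material at 1 bit is an assumption, since the guidance gives only one bit depth for it.

```python
from enum import Enum
from typing import NamedTuple


class Material(Enum):
    COLOR_OR_MARKED = "color, monochrome, or objects with stains or marks"
    GRAYSCALE = "color is not a concern"
    CLEAN_PRINTED_TYPE = "clean, high-contrast printed type only"


class BitDepth(NamedTuple):
    capture_bits: int  # total bits per pixel at capture
    save_bits: int     # total bits per pixel in the saved file
    mode: str


RECOMMENDED = {
    Material.COLOR_OR_MARKED: BitDepth(48, 24, "RGB"),
    Material.GRAYSCALE: BitDepth(16, 8, "grayscale"),
    # Assumption: bitonal material is both captured and saved at 1 bit.
    Material.CLEAN_PRINTED_TYPE: BitDepth(1, 1, "bitonal"),
}


def uncompressed_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Approximate raw image size; ignores headers, row padding, and compression."""
    return (width_px * height_px * bits_per_pixel + 7) // 8


# Example: a 2550 x 3300 pixel page saved as 8-bit grayscale.
depth = RECOMMENDED[Material.GRAYSCALE]
print(depth.mode, uncompressed_bytes(2550, 3300, depth.save_bits))
```

The size estimate is only a planning aid. Actual TIFF files will differ depending on compression and metadata.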

Last modified: Monday, March 24, 2014 - 8:39am