Textual
Definition
Resources expressed through a form of notation intended to be read, especially printed documents (books, serials, pamphlets, posters, broadsides, etc.), for which text extracted by Optical Character Recognition (OCR) should be provided as an output of the digitization workflow.
External Resources
Technical Guidelines for Digitizing Cultural Heritage Materials: Creation of Raster Image Master Files (Federal Agencies Digitization Guidelines Initiative, August 2010)
Minimum Specifications
| Master | Processed | Access | OCR source | Text | Thumbnail | Print source |
File type | TIFF | TIFF | JPEG | TIFF | UTF-8 compliant .txt file | JPEG | PDF* |
Resolution | 300 PPI** | n/a | 72 PPI | n/a | |||
Bit depth | 8-bit grayscale*** | Bitonal, adjusted for contrast and brightness | n/a | ||||
Retention | Permanent | Permanent | Create on ingest | Temporary | Permanent | Create on ingest | |
Ratio | 1:1 |
* PDF files should not exceed 100 MB.
** The resolution should be calculated from the dimensions of the object and the size of the text. For large text, 10pt or higher, 300 PPI should suffice. A document with smaller text, 9pt or lower, may require 400-600 PPI.
*** Whenever possible, capture at 48-bit RGB (for color, monochrome, objects with stains or marks) and save at 24-bit RGB. Alternatively, capture at 16-bit grayscale (for most objects where color is not a concern) and save at 8-bit grayscale. Use 1-bit bitonal for clean, high-contrast documents with printed type only.
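
The three notes above can be read as simple decision rules. The following Python sketch is one illustrative way to encode them, not part of the specification. All function and class names are hypothetical. It assumes that text between 9pt and 10pt, which the notes do not address, is treated as small text; that "100 MB" means 100 × 1,048,576 bytes; and that bitonal objects are both captured and saved at 1-bit, which the note does not state explicitly.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumption: "100 MB" is interpreted as 100 * 1024 * 1024 bytes.
MAX_PDF_BYTES = 100 * 1024 * 1024


def recommended_ppi(smallest_text_pt: float) -> Tuple[int, int]:
    """Return a (minimum, maximum) PPI range for the smallest text on the object."""
    if smallest_text_pt >= 10:
        # Note **: for text of 10pt or higher, 300 PPI should suffice.
        return (300, 300)
    # Note **: text of 9pt or lower may require 400-600 PPI.
    # Assumption: sizes between 9pt and 10pt are treated as small text.
    return (400, 600)


@dataclass(frozen=True)
class BitDepthPlan:
    capture: str
    save: str


def bit_depth_plan(color_matters: bool, clean_printed_type_only: bool = False) -> BitDepthPlan:
    """Choose capture and save bit depths following note ***."""
    if clean_printed_type_only:
        # Assumption: bitonal objects are captured and saved at 1-bit.
        return BitDepthPlan(capture="1-bit bitonal", save="1-bit bitonal")
    if color_matters:
        return BitDepthPlan(capture="48-bit RGB", save="24-bit RGB")
    return BitDepthPlan(capture="16-bit grayscale", save="8-bit grayscale")


def pdf_within_limit(size_bytes: int) -> bool:
    """Check a print-source PDF against the size limit in note *."""
    return size_bytes <= MAX_PDF_BYTES


if __name__ == "__main__":
    print(recommended_ppi(8))      # small text
    print(bit_depth_plan(False))   # color is not a concern
    print(pdf_within_limit(42_000_000))
```

In practice the resolution choice also depends on the dimensions of the object, as note ** states, so the returned range is a starting point for operator judgment, not a fixed setting.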