ALTO Introduction
Introduction

METS offers greater opportunities to reflect complex structure than any other standard.

History

For this reason, the METAe project group chose METS for its challenging task of digitizing historic books and journals (1850-1920).

METS/ALTO XML Objects in Real Life

CCS developed docWORKS/METAe as content conversion software. Scanned images are processed (pre-processing, layout analysis, OCR, structure analysis) and exported as standard XML objects based on the METS/ALTO XML schemas. From the rich METS/ALTO XML object, you can easily build derivatives (PDF, METS/TEI, METS/TXT) using XSL style sheets. Several national and general libraries, as well as other cultural and educational institutions, already use docWORKS to digitize and preserve their books, newspapers and journals, for example: Harvard University Library

ALTO in NDNP

For the NDNP (National Digital Newspaper Program), the Library of Congress was looking for a METS extension schema that describes the layout and content of printed pages. ALTO was a perfect fit, as it had already been proven in the digitization of books and journals in previous years. In response to NDNP-related requests, the ALTO schema was extended to cover all of the program's needs.

ALTO Description

ALTO is a standardized XML format that stores the layout information and OCR-recognized text of pages of any kind of printed document, such as books, journals and newspapers. It is designed to be used as an extension schema to METS (Metadata Encoding and Transmission Standard), where METS provides metadata and structural information while ALTO contains content and physical information. Each ALTO file contains a style section in which the different styles (for paragraphs and fonts) are listed. The layout section describes what is on the page. A page is divided into several regions (print space, left margin, right margin, top margin and bottom margin).
For each region, all objects that have been detected inside it are listed.

Measurements in ALTO XML files are given in 1/10 mm or in 1/1200 inch. For presentation purposes, one might want to create low-resolution images. To use the coordinates in the ALTO file at any resolution, they need to be transformed into pixels, and that transformation depends on the image resolution (in dpi).

Convert inch1200 values into pixels as follows:

pixel = value × resolution / 1200

For 1/10 mm, convert the values into pixels as follows (one inch is 25.4 mm, that is 254 tenths of a millimetre):

pixel = value × resolution / 254
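The unit conversion described above can be sketched in Python. This is a minimal illustration, not part of docWORKS or of any ALTO toolkit: the function names `to_pixels` and `string_boxes` are hypothetical, and the sample document is a simplified ALTO-like fragment with the XML namespace left out for brevity. Real ALTO files declare a namespace, so the element lookups would need to be adjusted.

```python
import xml.etree.ElementTree as ET

# Units per inch for the two measurement units named in the text.
UNITS_PER_INCH = {
    "inch1200": 1200.0,  # 1/1200 inch
    "mm10": 254.0,       # 1/10 mm; one inch is 25.4 mm
}


def to_pixels(value, unit, dpi):
    """Convert an ALTO coordinate to pixels at the given image resolution."""
    return value * dpi / UNITS_PER_INCH[unit]


# Simplified, hypothetical ALTO fragment (namespace omitted on purpose).
SAMPLE = """
<alto>
  <Description><MeasurementUnit>inch1200</MeasurementUnit></Description>
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="Example" HPOS="1200" VPOS="2400" WIDTH="600" HEIGHT="150"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>
"""


def string_boxes(alto_xml, dpi):
    """Return (text, x, y, width, height) in pixels for each String element."""
    root = ET.fromstring(alto_xml)
    unit = root.findtext("Description/MeasurementUnit")
    boxes = []
    for s in root.iter("String"):
        x, y, w, h = (to_pixels(float(s.get(k)), unit, dpi)
                      for k in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
        boxes.append((s.get("CONTENT"), x, y, w, h))
    return boxes


# Word boxes for a 300 dpi derivative image.
print(string_boxes(SAMPLE, dpi=300))
# → [('Example', 300.0, 600.0, 150.0, 37.5)]
```

The results are floating-point values and can be fractional, so a viewer that draws boxes on a low-resolution image would typically round them to whole pixels.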


