# page-xml

Here are 27 public repositories matching this topic...

UglyToad / PdfPig

Read and extract text and other content from PDFs in C# (port of PDFBox)

pdf csharp pdfbox netstandard pdf-files pdf-document pdf-generation hocr document-analysis pdf-extractor alto-xml page-xml layout-analysis pdf-document-processor

Updated Jun 9, 2026
C#

mittagessen / kraken

OCR engine for all the languages

ocr neural-networks hocr optical-character-recognition htr handwritten-text-recognition alto-xml page-xml layout-analysis

Updated Jun 5, 2026
Python

BobLd / DocumentLayoutAnalysis

Document Layout Analysis resources and repositories for development with PdfPig.

pdf csharp hocr tei hocr-documents alto-xml table-extraction page-xml alto layout-analysis document-layout-analysis xycut docstrum pdfpig xy-cut recursive-xy-cut page-segmentation

Updated Oct 1, 2023
C#

UB-Mannheim / ocr-fileformat

Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)

ocr validation transformation hocr finereader page-xml alto ocr-d

Updated May 21, 2025
JavaScript

lquirosd / P2PaLA

Page to PAGE Layout Analysis Tool

deep-neural-networks computer-vision pytorch generative-adversarial-network gan image-segmentation pix2pix handwritten-text-recognition page-xml document-layout-analysis

Updated Jan 17, 2022
Python

cneud / ocr-conversion

Conversions between various OCR formats

ocr hocr tei-xml alto-xml page-xml abbyy-xml

Updated Feb 13, 2026

qurator-spk / dinglehopper

An OCR evaluation tool

ocr page ocr-evaluation alto-xml page-xml alto qurator ocr-d

Updated Aug 22, 2025
Python

kba / transkribus-to-prima

Convert Transkribus PAGE-XML to standard PAGE-XML

ocr page-xml

Updated Dec 10, 2025
Python

UB-Mannheim / blatt

NLP-helper for OCR-ed pages in PAGE XML format

page-xml

Updated Dec 6, 2024
Python

slub / textract2page

Convert AWS Textract JSON to PRImA PAGE XML

python ocr textract page-xml

Updated Feb 3, 2025
Python

VRI-UFPR / page-xml-draw

A powerful CLI tool for visualization and encoding of PAGE-XML files

visualization opencv ocr segmentation image-map page-xml layout-analysis

Updated May 19, 2021
Python

Heresta / OCR17plus

Data for layout analysis and HTR.

png ocr xml dataset segmentation htr alto-xml page-xml segmenter

Updated Sep 3, 2021
Python

IMAGO-Catalogues-Jjanes / cataloguesSegmentationOCR

Dataset and models for catalogs' Layout analysis and HTR

ocr catalog segmentation htr alto-xml page-xml segmenter

Updated Sep 5, 2021
Python

tboenig / gt-guidelines

OCR-D guidelines for Ground Truth production

ocr guidelines transcription ground-truth page-xml

Updated Apr 21, 2026
XSLT

flame-cai / gnn-synthetic-layout-historical

Graph based Layout Analysis for Historical Manuscripts in data scarce settings

ocr hocr htr handwritten-text-recognition page-xml layout-analysis graph-neural-networks

Updated Jun 10, 2026
Python

qurator-spk / ocrd_repair_inconsistencies

Automatically re-order lines, words and glyphs to become textually consistent with their parents.

ocr page page-xml ocr-d

Updated Jan 9, 2024
Python

OCR-D / gt_structure_1_4

The repo gt_structure_1_4 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.

repository segmentation ground-truth page-xml ocr-d

Updated Jun 24, 2024

Ifena-Xia / paddleocr-htr-ancient-chinese

PaddleOCR-based base segmentation pipeline for vertically-written ancient Chinese manuscripts, producing PAGE-XML for eScriptorium.

ocr digital-humanities htr page-xml ancient-chinese paddleocr escriptorium

Updated Jun 10, 2026
Python

SCDH / pygexml

Small pythonic wrapper around PAGE XML

python parser xml page-xml

Updated May 8, 2026
Python

OCR-D / gt_structure_1_3

The repo gt_structure_1_3 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.

repository segmentation ground-truth page-xml ocr-d

Updated Jun 24, 2024
