TexBiG (from the German Text-Bild-Gefüge, meaning Text-Image-Structure) is a document layout analysis dataset of historical documents from the late 19th and early 20th centuries. The dataset provides instance segmentation annotations (bounding boxes and polygons/masks) for 19 classes with more than 52,000 instances. All annotations were created manually by experts, and each document image was labeled by at least two different annotators; inter-annotator agreement was evaluated with Krippendorff's Alpha. The dataset uses the common COCO-JSON format.
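Because the annotations follow the COCO-JSON layout, they can be inspected with the Python standard library alone. The following is a minimal sketch, not part of the dataset tooling: it counts annotated instances per class in a COCO-style dictionary. The class names, the image entry, and the file path in the final comment are hypothetical placeholders; only the top-level keys (`images`, `categories`, `annotations`) and the per-annotation fields come from the general COCO convention.

```python
import json
from collections import Counter


def count_instances_per_class(coco: dict) -> dict:
    """Return {category name: number of annotated instances} for a COCO-style dict."""
    # Map numeric category ids to their human-readable names.
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    # Each annotation is one instance; tally them by category name.
    counts = Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])
    return dict(counts)


# Tiny in-memory stand-in for an annotation file (all values are hypothetical).
example = {
    "images": [
        {"id": 1, "file_name": "page_0001.jpg", "width": 1000, "height": 1500},
    ],
    "categories": [
        {"id": 1, "name": "paragraph"},
        {"id": 2, "name": "image"},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [100, 120, 400, 200],  # COCO convention: [x, y, width, height]
         "segmentation": [[100, 120, 500, 120, 500, 320, 100, 320]]},
        {"id": 2, "image_id": 1, "category_id": 1,
         "bbox": [100, 400, 400, 150],
         "segmentation": [[100, 400, 500, 400, 500, 550, 100, 550]]},
        {"id": 3, "image_id": 1, "category_id": 2,
         "bbox": [100, 700, 600, 500],
         "segmentation": [[100, 700, 700, 700, 700, 1200, 100, 1200]]},
    ],
}

print(count_instances_per_class(example))

# With a real annotation file (the path below is an assumption):
# with open("annotations/train.json", encoding="utf-8") as f:
#     print(count_instances_per_class(json.load(f)))
```

Run against the actual annotation files, the same function should give a per-class breakdown of the instances in the 19 classes.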
Please refer to the publications for citing the dataset. If you want to link the dataset, please use the dataset permalink [doi].
- David Tschirschwitz
- Franziska Klemstein
- Benno Stein
- Volker Rodehorst