TexBiG (from the German Text-Bild-Gefüge, meaning Text-Image-Structure) is a document layout analysis dataset of historical documents from the late 19th and early 20th century. The dataset provides instance segmentation annotations (bounding boxes and polygons/masks) for 19 different classes, with more than 52,000 instances. The annotations were created manually by experts and evaluated with Krippendorff's Alpha; each document image was labeled by at least two different annotators. The dataset uses the common COCO-JSON format.
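Because the annotations follow the COCO-JSON layout, they can be inspected with the Python standard library alone. The following is a minimal sketch, not an official loader: it assumes only the standard COCO top-level keys `categories` and `annotations`, and the helper names `load_coco` and `count_instances_per_class` are made up for illustration.

```python
import json
from collections import Counter


def load_coco(path):
    """Read a COCO-style JSON annotation file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_instances_per_class(coco):
    """Return {class name: number of annotated instances}.

    Relies only on the standard COCO fields: each category has
    "id" and "name", each annotation has "category_id".
    """
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    counts = Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])
    return dict(counts)
```

To use it, call `count_instances_per_class(load_coco(path))`, where `path` points to one of the annotation files in the downloaded archive. The actual file names depend on the release and are not assumed here.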


To cite the dataset, please refer to this publication. To link to the dataset, please use the dataset permalink [doi].

  • Download the dataset from Zenodo.
  • Find the related metadata at Google.


  • David Tschirschwitz
  • Franziska Klemstein
  • Benno Stein
  • Volker Rodehorst