eCodicology (project page) develops algorithms for the automatic tagging of medieval manuscripts. The implemented algorithms are to be applied to 500 medieval manuscripts from the Benedictine Abbey St. Matthias in Trier. Each manuscript consists of up to 1,000 pages, summing up to 150,000 pages in total. Because the pages are available in different resolutions and come with additional metadata files, an overall amount of 5 TB of data in 900,000 files, including bibliographical metadata, has to be stored and kept retrievable. For the eCodicology project, however, this is just the raw data.
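To give a feeling for these numbers, the following back-of-the-envelope calculation derives a few averages from the figures above. It is only a rough sketch: it assumes decimal terabytes (1 TB = 10^12 bytes), and the variable names are chosen for illustration.

```python
# Figures taken from the project description above.
manuscripts = 500
pages_total = 150_000
files_total = 900_000
bytes_total = 5 * 10**12  # 5 TB, assuming decimal terabytes

# Derived averages (averages only; individual manuscripts have up to 1,000 pages).
avg_pages_per_manuscript = pages_total / manuscripts   # 300.0
avg_files_per_page = files_total / pages_total         # 6.0
avg_file_size_mb = bytes_total / files_total / 10**6   # roughly 5.56 MB

print(avg_pages_per_manuscript, avg_files_per_page, round(avg_file_size_mb, 2))
```

On average, this corresponds to 300 pages per manuscript, six files per page, and somewhat more than 5.5 MB per file.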

The main challenge of eCodicology is the development, testing and optimization of new algorithms that automatically detect macro- and micro-structural elements of manuscript pages. To this end, an image processing workflow, currently consisting of six steps including calibration, segmentation and different kinds of feature extraction, is applied to each page. The results are presented to humanities scholars using appropriate scientific visualization techniques, and the scholars evaluate and annotate them. Based on these annotations, single pages or entire manuscripts may have to be re-processed using different parameters or algorithms.
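The idea of a per-page workflow that can be re-run with different parameters might be sketched as follows. This is a minimal illustration, not the project's implementation: the three step functions are hypothetical toy placeholders operating on a list of grey values (the real workflow has six steps working on page images), and the names `run_workflow`, `PageResult` and `STEPS` are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

# A step takes the previous output and its parameters and returns a new output.
Step = Callable[[Any, Dict[str, Any]], Any]


@dataclass
class PageResult:
    page_id: str
    output: Any
    # Provenance: which step ran with which parameters, in order.
    provenance: List[Tuple[str, Dict[str, Any]]] = field(default_factory=list)


def run_workflow(page_id: str, image: Any, steps: List[Tuple[str, Step]],
                 params: Optional[Dict[str, Dict[str, Any]]] = None) -> PageResult:
    """Apply all steps in order and record the parameters used for each step."""
    params = params or {}
    result = PageResult(page_id=page_id, output=image)
    for name, func in steps:
        step_params = dict(params.get(name, {}))
        result.output = func(result.output, step_params)
        result.provenance.append((name, step_params))
    return result


# Hypothetical toy steps standing in for the real image processing algorithms.
def calibrate(image, p):
    offset = p.get("offset", 0)
    return [v - offset for v in image]


def segment(image, p):
    threshold = p.get("threshold", 128)
    return [1 if v >= threshold else 0 for v in image]


def extract_features(mask, p):
    return {"foreground_ratio": sum(mask) / len(mask) if mask else 0.0}


STEPS = [("calibration", calibrate),
         ("segmentation", segment),
         ("feature_extraction", extract_features)]

page = [10, 200, 130, 90]  # toy "image": four grey values
first = run_workflow("ms0001_p001", page, STEPS)
# Re-processing the same page with a different segmentation parameter.
second = run_workflow("ms0001_p001", page, STEPS,
                      {"segmentation": {"threshold": 50}})
print(first.output, second.output)
```

Because every run keeps the list of steps and parameters next to its result, a re-processed page can be compared with the earlier run, which is one simple way to picture the provenance information the repository has to store.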

In summary, there are several challenges for a research data repository based on KIT Data Manager:

  • Huge number of complex digital objects (one object per page and processing step)
  • Seamless integration of image processing and provenance information
  • Performant and intuitive access to data and metadata
  • Integration of novel methods for scientific visualization


  • Digitized manuscripts are ingested
  • User interface for browsing contents and accessing available metadata
  • Integration of visualization of extracted features (by manually executed workflow)


