Hybrid Workshop - Automated Text Recognition of Historical Sources: From convolutional neural networks to vision language models

3rd floor, seminar rooms 2 and 3 (Austrian Academy of Sciences)

Georg-Coch-Platz 2, 1010, Vienna
Description

Automated Text Recognition (ATR), or Handwritten Text Recognition (HTR), has been making continuous progress, with new tools, models, and technologies emerging every year. This rapidly evolving field is becoming a "must-have" for many digital endeavors, particularly in the study of historical sources. Multiple challenges remain, however, such as complicated page layouts, a lack of models for some languages and scripts, Latin abbreviations, and inconsistencies in the training data.

This hybrid workshop at the Austrian Academy of Sciences will showcase a variety of case studies and projects from researchers based in Vienna, Graz, and Innsbruck, along with a guest presentation by David Smith from Northeastern University.

The event will conclude with a round table discussion on the present and future of Automated Text Recognition, featuring (in alphabetical order):

  • Anna Dolganov
  • Günter Mühlberger
  • Jan Odstrčilík
  • David Smith
  • Georg Vogeler

We invite you to join us to explore the latest developments in ATR.

Supported by:

  • Machine Learning Topical Platform (MLA2S), Austrian Academy of Sciences

Organisational team:

  • Jan Odstrčilík, Helmut Reimitz (IMAFO)
  • Anna Dolganov (ÖAI)
  • Thomas Wallnig (University of Vienna)

 



Jan Odstrcilik
Registration
    • Session 1
      • 1
        Auditing ATR Datasets: From CER to CERatorsaurus
        Speakers: Jan Odstrčilík (IMAFO, ÖAW), Martin Roček (IMAFO, ÖAW; Charles University)
      • 2
        Building HTR Models for an Underresourced Language: The Case of Medieval Czech
        Speaker: Anna Michalcová (IMAFO, ÖAW and Czech Language Institute CAS)
      • 3
        Reading the Stars with AI: Handwritten Text Recognition (HTR) for Astronomical Text
        Speaker: Doris Vickers (Universität Wien)
    • 10:30
      Coffee break
    • Session 2
      • 4
        Open Source Layout Analysis for Sanskrit and Tibetan Manuscripts: First Experiences With Kraken on MUSICA
        Speaker: Patrick McAllister (IKGA, ÖAW)
      • 5
        HTR for Modelling Syriac Colophonology: Opportunities and Challenges
        Speaker: Ephrem Aboud Ishac (Institute for Medieval Research, Division of Byzantine Research, Austrian Academy of Sciences - Vienna)
      • 6
        Building a Digital Corpus of Armenian Manuscript Colophons: Training and Applying HTR Models for Classical Armenian
        Speaker: Lewis Read (IMAFO, ÖAW)
    • 12:30
      Lunch break (not provided)
    • Session 3
      • 7
        ‘A Taste of Honey’: ATR and Early Medieval Glossed Manuscripts
        Speaker: Bernhard Bauer (Universität Graz)
      • 8
        It’s Philology All the Way Down: HTR and the Division of Labor in Textual Editing
        Speaker: David Smith (Northeastern University)
      • 9
        Deciphering Historical Documents With the Help of LLMs: New Perspectives in Greek Papyrology
        Speaker: Anna Dolganov (ÖAI)
    • 15:30
      Coffee break
    • 10
      Round table: The Present and Future of Automated Text Recognition: Research Topic and Research Tool
      Speakers: Anna Dolganov (ÖAI), David Smith (Northeastern University), Georg Vogeler (Karl-Franzens-Universität Graz, Institut für Digitale Geisteswissenschaften), Günter Mühlberger (Universität Innsbruck), Jan Odstrčilík (Institut für Mittelalterforschung, ÖAW)