Hybrid Workshop - Automated Text Recognition of Historical Sources: From convolutional neural networks to vision language models

Europe/Zurich
3rd floor, seminar rooms 2 and 3 (Austrian Academy of Sciences)

Georg-Coch-Platz 2, 1010, Vienna
Description

Automated Text Recognition (ATR), including Handwritten Text Recognition (HTR), has been making continuous progress, with new tools, models, and technologies emerging every year. This rapidly evolving field is becoming a "must-have" for many digital endeavors, particularly in the study of historical sources. Several challenges remain, however, including complicated page layouts, a lack of models for some languages and scripts, Latin abbreviations, and inconsistencies in the training data.

This hybrid workshop at the Austrian Academy of Sciences will showcase a variety of case studies and projects from researchers based in Vienna, Graz, and Innsbruck, along with a guest presentation by David Smith from Northeastern University.

The event will conclude with a round table discussion on the present and future of Automated Text Recognition, featuring (in alphabetical order):

  • Anna Dolganov
  • Günter Mühlberger
  • Jan Odstrčilík
  • David Smith
  • Georg Vogeler

We invite you to join us to explore the latest developments in ATR.

Supported by:

  • Machine Learning Topical Platform (MLA2S), Austrian Academy of Sciences

Organisational team:

  • Jan Odstrčilík, Helmut Reimitz (IMAFO)
  • Anna Dolganov (ÖAI)
  • Thomas Wallnig (University of Vienna)

 



Jan Odstrcilik
Registration
    • 09:00–10:30
      Session 1
      • 09:00
        Auditing ATR Datasets: From CER to CERatorsaurus 30m
        Speakers: Jan Odstrčilík (IMAFO, ÖAW), Martin Roček (IMAFO, ÖAW; Charles University)
      • 09:30
        Building HTR Models for an Underresourced Language: The Case of Medieval Czech 30m
        Speaker: Anna Michalcová (IMAFO, ÖAW and Czech Language Institute CAS)
      • 10:00
        Reading the Stars with AI: Handwritten Text Recognition (HTR) for Astronomical Text 30m
        Speaker: Doris Vickers (Universität Wien)
    • 10:30–11:00
      Coffee break 30m
    • 11:00–12:30
      Session 2
      • 11:00
        Open Source Layout Analysis for Sanskrit and Tibetan Manuscripts: First Experiences With Kraken on MUSICA 30m
        Speaker: Patrick McAllister (IKGA, ÖAW)
      • 11:30
        HTR for Modelling Syriac Colophonology: Opportunities and Challenges 30m
        Speaker: Ephrem Aboud Ishac (Institute for Medieval Research, Division of Byzantine Research, Austrian Academy of Sciences - Vienna)
      • 12:00
        Building a Digital Corpus of Armenian Manuscript Colophons: Training and Applying HTR Models for Classical Armenian 30m
        Speaker: Lewis Read (IMAFO, ÖAW)
    • 12:30–14:00
      Lunch break (not provided) 1h 30m
    • 14:00–15:30
      Session 3
      • 14:00
        ‘A Taste of Honey’: ATR and Early Medieval Glossed Manuscripts 30m
        Speaker: Bernhard Bauer (Universität Graz)
      • 14:30
        It’s Philology All the Way Down: HTR and the Division of Labor in Textual Editing 30m
        Speaker: David Smith (Northeastern University)
      • 15:00
        Deciphering Historical Documents With the Help of LLMs: New Perspectives in Greek Papyrology 30m
        Speaker: Anna Dolganov (ÖAI)
    • 15:30–16:00
      Coffee break 30m
    • 16:00–17:30
      Round table: The Present and Future of Automated Text Recognition: Research Topic and Research Tool 1h 30m
      Speakers: Anna Dolganov (ÖAI), David Smith (Northeastern University), Georg Vogeler (Karl-Franzens-Universität Graz, Institut für Digitale Geisteswissenschaften), Günter Mühlberger (Universität Innsbruck), Jan Odstrčilík (Institut für Mittelalterforschung, ÖAW)