Digitization of Written Parliamentary Questions from the Historical Archive (1974–1977) of the Hellenic Parliament

Fotios Fitsilis*, Basilis Gatos, Konstantinos Palaiologos, Panagiotis Kaddas, Charalambis Kyrkos, Maria Eleni Georgoulea, Yiannis Armenakis, Christina Tasouli, George Mikros, Olivier Rozenberg, Eleni Kiousi

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This article outlines the digitization process and methodology applied to the archive of parliamentary questions from the 1st Parliamentary Term (1974–1977) of the Hellenic Parliament. A collaborative pilot project involving parliament, academia, and a research center facilitated the conversion of printed material to open data. The main tasks of the project include capturing digital images, applying a custom Optical Character Recognition (OCR) software solution that employs machine learning, and rigorously validating accuracy on a fragmented polytonic corpus of variable quality written in Katharevousa, a variety of Modern Greek. The article discusses the approach, the challenges, and the initial results of the digitization effort, emphasizing ongoing research steps. Overall, 1,674 images, corresponding to 1,338 questions, were digitally processed. Following algorithmic training, character recognition accuracy exceeds 98.5%. Successful implementation streamlines similar digitization operations in the vast parliamentary archives, while enabling in-depth studies of parliamentary control in the turbulent period immediately following the junta in Greece. A preliminary comparative analysis with a corpus of newer parliamentary questions (2009–2019) provides insights and incentives for further study of the characteristics and evolution of the Greek language.

Original language: English
Title of host publication: Document Analysis and Recognition - ICDAR 2024 Workshops, Pt I
Editors: H Mouchere, A Zhu
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 103-117
Number of pages: 15
Volume: 14935
ISBN (Electronic): 978-3-031-70645-5
ISBN (Print): 978-3-031-70644-8
Publication status: Published - 2024
Event: International Workshops co-located with the 18th International Conference on Document Analysis and Recognition, ICDAR 2024 - Athens, Greece
Duration: 30 Aug 2024 – 31 Aug 2024

Publication series

Name: Lecture Notes in Computer Science

Conference

Conference: International Workshops co-located with the 18th International Conference on Document Analysis and Recognition, ICDAR 2024
Country/Territory: Greece
City: Athens
Period: 30/08/24 – 31/08/24

Keywords

  • Hellenic Parliament
  • Machine learning
  • OCR
  • Parliamentary control
  • Polytonic corpus
  • Written questions
