TY - GEN
T1 - Digitization of Written Parliamentary Questions from the Historical Archive (1974–1977) of the Hellenic Parliament
AU - Fitsilis, Fotios
AU - Gatos, Basilis
AU - Palaiologos, Konstantinos
AU - Kaddas, Panagiotis
AU - Kyrkos, Charalambis
AU - Georgoulea, Maria Eleni
AU - Armenakis, Yiannis
AU - Tasouli, Christina
AU - Mikros, George
AU - Rozenberg, Olivier
AU - Kiousi, Eleni
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
PY - 2024
Y1 - 2024
N2 - This article outlines the digitization process and methodology applied to the archive of parliamentary questions from the 1st Parliamentary Term (1974–1977) in the Hellenic Parliament. A collaborative pilot project involving parliament, academia, and a research center facilitated the conversion of printed material to open data. The main tasks of the project include capturing digital images, applying a custom Optical Character Recognition (OCR) software solution employing machine learning, and rigorously validating accuracy on a fragmented polytonic corpus of variable quality written in Katharevousa, a variety of the modern Greek language. The article discusses the approach and challenges as well as the initial results of the digitization effort, emphasizing ongoing research steps. Overall, 1,674 images were digitally processed, corresponding to 1,338 questions. Following algorithmic training, character recognition accuracy is over 98.5%. Successful implementation streamlines similar future digitization operations in the vast parliamentary archives, while enabling in-depth studies on parliamentary control in the turbulent period of the immediate post-junta era in Greece. A preliminary comparative analysis with a corpus of newer parliamentary questions (2009–2019) provides insights and incentives for the further study of the characteristics and evolution of the Greek language.
KW - Hellenic Parliament
KW - Machine learning
KW - OCR
KW - Parliamentary control
KW - Polytonic corpus
KW - Written questions
UR - http://www.scopus.com/inward/record.url?scp=85204578951&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-70645-5_8
DO - 10.1007/978-3-031-70645-5_8
M3 - Conference contribution
AN - SCOPUS:85204578951
SN - 9783031706448
VL - 14935
T3 - Lecture Notes in Computer Science
SP - 103
EP - 117
BT - Document Analysis and Recognition - ICDAR 2024 Workshops, Part I
A2 - Mouchere, H
A2 - Zhu, A
PB - Springer Science and Business Media Deutschland GmbH
T2 - International Workshops co-located with the 18th International Conference on Document Analysis and Recognition, ICDAR 2024
Y2 - 30 August 2024 through 31 August 2024
ER -