Abstract
The aim of this paper is to investigate the major quantitative parameters related to defining the optimum text size in Modern Greek corpus development. Using the Hellenic National Corpus (HNC) (Hatzigeorgiu et al., 2000) as a reference point, we estimated a number of critical statistical measures regarding feature counts at different text sizes. The results indicate that frequent linguistic features behave differently from medium-frequency and rare ones, and that increasing text size does not affect them uniformly.
Original language | English |
---|---|
Pages | 834-838 |
Number of pages | 5 |
Publication status | Published - 2002 |
Externally published | Yes |
Event | 3rd International Conference on Language Resources and Evaluation, LREC 2002 - Las Palmas, Canary Islands, Spain. Duration: 29 May 2002 → 31 May 2002 |
Conference
Conference | 3rd International Conference on Language Resources and Evaluation, LREC 2002 |
---|---|
Country/Territory | Spain |
City | Las Palmas, Canary Islands |
Period | 29/05/02 → 31/05/02 |