Impact of linguistic analysis on the semantic graph coverage and learning of document extracts

Jure Leskovec*, Natesa Milic-Frayling, Marko Grobelnik

*Corresponding author for this work

Research output: Contribution to conference › Paper › peer-review

24 Citations (Scopus)

Abstract

Automatic document summarization is the problem of creating a document surrogate that adequately represents the full document content. We aim to build a summarization system that can replicate the quality of summaries created by humans. In this paper we investigate a machine learning method for extracting full sentences from documents based on the structure of the document semantic graph. In particular, we explore how the Support Vector Machines (SVM) learning method is affected by the quality of linguistic analyses and of the corresponding semantic graph representations. We apply two types of linguistic analysis, (1) simple part-of-speech tagging of noun phrases and verbs and (2) full logical form analysis that identifies Subject-Predicate-Object triples, and build a semantic graph from the output of each. We train the SVM classifier to identify summary nodes and use these nodes to extract sentences. Experiments on the DUC 2002 and CAST datasets show that SVM-based sentence extraction does not differ significantly between the simple and the sophisticated syntactic analysis. In both cases, the graph attributes used in learning are essential to classifier performance and to the quality of the extracted summaries.
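The abstract outlines a pipeline: build a semantic graph from Subject-Predicate-Object triples, describe each node with graph attributes, train an SVM to label summary nodes, and extract the sentences those nodes come from. The Python sketch below illustrates that flow under stated assumptions; it is not the authors' implementation. The triple format `(sentence_id, subject, predicate, object)`, the three node features (out-degree, in-degree, sentence frequency), the Pegasos-style subgradient trainer standing in for a full SVM package, the sentence-scoring rule, and all function names are hypothetical. The linguistic analysis itself (part-of-speech tagging or logical form) is not shown; triples are taken as given.

```python
import random
from collections import defaultdict

# Hypothetical input format: Subject-Predicate-Object triples, each tagged with
# the id of the sentence it came from: (sentence_id, subject, predicate, object).


def build_graph(triples):
    """Semantic graph: subjects and objects are nodes; each triple adds a
    directed subject -> object edge."""
    out_edges, in_edges = defaultdict(set), defaultdict(set)
    mentions = defaultdict(set)  # node -> ids of sentences mentioning it
    for sid, subj, _pred, obj in triples:
        out_edges[subj].add(obj)
        in_edges[obj].add(subj)
        mentions[subj].add(sid)
        mentions[obj].add(sid)
    return out_edges, in_edges, mentions


def node_features(node, graph):
    """Illustrative graph attributes (assumed, not the paper's feature set):
    out-degree, in-degree, number of sentences mentioning the node, bias 1.0."""
    out_edges, in_edges, mentions = graph
    return [float(len(out_edges.get(node, ()))),
            float(len(in_edges.get(node, ()))),
            float(len(mentions.get(node, ()))),
            1.0]


def train_linear_svm(X, y, lam=0.01, epochs=1000, seed=0):
    """Linear SVM trained with Pegasos-style stochastic subgradient descent on
    the L2-regularized hinge loss. Labels are +1/-1; the bias is folded into
    the last feature, so it is regularized too."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step toward the example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """Return +1 (summary node) or -1 from the sign of the linear score."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1


def extract_sentences(triples, summary_nodes, k):
    """Hypothetical selection rule: score each sentence by how many of its
    triples touch a predicted summary node; return top-k ids in document order."""
    score = defaultdict(int)
    for sid, subj, _pred, obj in triples:
        if subj in summary_nodes or obj in summary_nodes:
            score[sid] += 1
    top = sorted(score, key=lambda s: (-score[s], s))[:k]
    return sorted(top)


if __name__ == "__main__":
    doc = [(0, "court", "reject", "appeal"), (0, "court", "cite", "law"),
           (1, "lawyer", "file", "appeal"), (2, "weather", "be", "rainy"),
           (3, "court", "schedule", "hearing")]
    graph = build_graph(doc)
    nodes = sorted(graph[2])  # every node appears in the mentions map
    X = [node_features(n, graph) for n in nodes]
    # Hypothetical labels: +1 if the node also occurs in a human summary graph.
    y = [1 if n in {"court", "appeal"} else -1 for n in nodes]
    w = train_linear_svm(X, y)
    summary = {n for n, x in zip(nodes, X) if predict(w, x) == 1}
    print(summary, extract_sentences(doc, summary, k=2))
```

Folding the bias into the feature vector keeps the trainer short at the cost of regularizing the bias; in practice a library SVM implementation would normally replace this hand-rolled trainer, and the feature set would be the graph attributes the study actually uses.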

Original language: English
Pages: 1069-1074
Number of pages: 6
Publication status: Published - 2005
Externally published: Yes
Event: 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference, AAAI-05/IAAI-05 - Pittsburgh, PA, United States
Duration: 9 Jul 2005 - 13 Jul 2005

Conference

Conference: 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference, AAAI-05/IAAI-05
Country/Territory: United States
City: Pittsburgh, PA
Period: 9/07/05 - 13/07/05
