Improving Summarization Performance by Sentence Compression –
A Pilot Study
Abstract
In this paper we study the effectiveness of applying sentence compression in an extraction-based multi-document summarization system. Our results show that purely syntactic compression does not improve system performance, and topic-signature-based reranking of compressed sentences does not help much either. However, reranking using an oracle shows that a significant improvement remains possible.
Introduction
The majority of systems participating in the past Document Understanding Conferences (DUC, 2002), a large-scale summarization evaluation effort sponsored by the United States government, and the Text Summarization Challenge (Fukusima and Okumura, 2001), sponsored by the Japanese government, are extraction based. Extraction-based automatic text summarization systems extract parts of the original documents and output the results as summaries (Chen et al., 2003; Edmundson, 1969; Goldstein et al., 1999; Hovy and Lin, 1999; Kupiec et al., 1995; Luhn, 1969). Other systems based on information extraction (McKeown et al., 2002; Radev and McKeown, 1998; White et al., 2001) and discourse analysis (Marcu, 1999; Strzalkowski et al., 1999) also exist, but they are not yet usable for general-domain summarization. Our study focuses on the effectiveness of applying sentence compression techniques to improve the performance of extraction-based automatic text summarization systems.
Sentence compression aims to retain the most salient information of a sentence, rewritten in a shorter form (Knight and Marcu, 2000). It can be used to deliver compressed content to portable devices (Buyukkokten et al., 2001; Corston-Oliver, 2001), or as a reading aid for aphasic readers (Carroll et al., 1998) or the blind (Grefenstette, 1998). Earlier research in sentence compression focused on compressing single sentences and was evaluated on a sentence-by-sentence basis.
Unigram Co-Occurrence Metric
In a recent study (Lin and Hovy, 2003a), we showed that the recall-based unigram co-occurrence automatic scoring metric correlates highly with human evaluation and has high recall and precision in predicting the statistical significance of results, compared with its human counterpart. The idea is to measure the content similarity between a system extract and a manual summary using simple n-gram overlap. A similar idea, the IBM BLEU score, has proved successful in automatic machine translation evaluation (NIST, 2002; Papineni et al., 2001).
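The recall-based unigram overlap described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, whitespace tokenization, and lowercasing are our own simplifications (real evaluations typically also handle punctuation, stemming, and stopwords).

```python
from collections import Counter

def unigram_cooccurrence_recall(system_extract: str, manual_summary: str) -> float:
    """Recall-based unigram overlap: the fraction of the manual summary's
    unigrams (counted with multiplicity) that also appear in the system extract."""
    # Naive lowercase whitespace tokenization (a simplification).
    sys_counts = Counter(system_extract.lower().split())
    ref_counts = Counter(manual_summary.lower().split())
    if not ref_counts:
        return 0.0
    # Clipped overlap: each reference unigram is matched at most as many
    # times as it occurs in the system extract.
    overlap = sum(min(count, sys_counts[tok]) for tok, count in ref_counts.items())
    return overlap / sum(ref_counts.values())

score = unigram_cooccurrence_recall("the cat sat on the mat", "the cat sat on a mat")
# 5 of the 6 reference unigrams are covered, giving a recall of 5/6
```

Being recall-based, the metric rewards coverage of the reference content rather than brevity of the system output, which is why summarization evaluations pair it with a length limit.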
Conclusions
In this paper we presented an empirical study of the effectiveness of applying sentence compression to improve summarization performance. We used a good sentence compression algorithm, compared the performance of five different ranking algorithms, and found that purely syntactic or shallow-semantic reranking, applied one sentence at a time, was not enough to boost system performance. However, the significant difference between the ORACLE run and the original run (ORG) indicates that there is potential in sentence compression, but we need to find a better compression selection function that takes into account global cross-sentence optimization. This indicates that local optimization at the sentence level, such as Knight and Marcu's (2000) noisy-channel model, is not enough when our goal is to find the best compressed summaries, not the best compressed sentences.
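The gap between sentence-level and summary-level selection can be illustrated with a toy sketch. This is our own simplification, not the paper's method: a "local" oracle picks each sentence's best compression candidate independently, while a "global" oracle exhaustively scores whole summaries under a word budget (the budget and the recall metric are assumptions standing in for the real evaluation setup).

```python
from collections import Counter
from itertools import product

def unigram_recall(candidate: str, reference: str) -> float:
    # Recall-based unigram overlap against the reference summary.
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(c, cand[t]) for t, c in ref.items())
    return overlap / max(sum(ref.values()), 1)

def local_oracle(candidates, reference):
    # One sentence at a time: pick each sentence's best-scoring
    # compression independently, ignoring the overall length limit.
    return [max(cands, key=lambda s: unigram_recall(s, reference))
            for cands in candidates]

def global_oracle(candidates, reference, budget):
    # Whole-summary search: try every combination of candidates that
    # fits the word budget and keep the best-scoring summary.
    # Exponential in the number of sentences; a sketch for tiny inputs only.
    best, best_score = None, -1.0
    for combo in product(*candidates):
        text = " ".join(combo)
        if len(text.split()) > budget:
            continue
        score = unigram_recall(text, reference)
        if score > best_score:
            best, best_score = list(combo), score
    return best

candidates = [["the big cat", "the cat"], ["sat on the mat", "sat"]]
reference = "the cat sat on the mat"
local = local_oracle(candidates, reference)            # may exceed the budget
best = global_oracle(candidates, reference, budget=6)  # fits and covers the reference
```

In this toy example the local oracle keeps the longest candidate of each sentence and overshoots a six-word budget, while the global search trades a word in one sentence for better coverage overall, which is exactly the cross-sentence effect the conclusion points to.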