Ou, S., Khoo, C.S.G., & Goh, D. (2007). Automatic multi-document summarization of research abstracts: Design and user evaluation. Journal of the American Society for Information Science & Technology, 58(10), 1419-1435. [PDF]


This study developed a method for multi-document summarization of sociology dissertation abstracts. We did not use traditional sentence extraction approaches. Instead, a hybrid summarization method involving both extractive and abstractive techniques was used. This method focused on extracting and integrating similarities and differences across different documents to summarize a set of related documents.

The identification of similarities and differences was based more on the research concepts and relationships expressed in the text than on the words, phrases, sentences, and rhetorical relations used in previous studies. To do this, the macro-level discourse structure (between sentences and segments) peculiar to sociology dissertation abstracts was analyzed to identify which segments of the text contain the more important research information. Then the micro-level discourse structure (within sentences) was analyzed to identify which kinds of information could be extracted from specific segments.

To analyze the cross-document discourse structure of a set of dissertation abstracts, we focused on research concepts and relationships to identify which information is similar across abstracts, which is unique, and how the similar and unique information is linked across different dissertation abstracts. A variable-based framework was therefore proposed to present research concepts and relationships, as well as contextual relations and research methods, in a set of related dissertation abstracts.

The framework contains four kinds of information as follows:

  • Main concepts: The common research concepts, often operationalized as research variables.
  • Research relationships between concepts: For each main concept, the descriptive attribute values or relationships with other concepts (e.g. correlations and cause-effect relationships) investigated in different dissertation abstracts.
  • Contextual relations: Concepts and relationships in the perception, attitude, insight, etc. of a target population, or in the context, framework, model, theory, etc.
  • Research methods: One or more research methods used to explore the attributes of concepts and relationships, including research design, sampling, and data measurement & analysis method.
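As a minimal sketch, the four kinds of information listed above could be held in a record like the one below. The class names, field names, and example values are hypothetical and chosen only for illustration; they are not taken from the authors' system.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    """A relationship between a main concept and another concept."""
    other_concept: str
    kind: str          # e.g. "correlation" or "cause-effect"
    abstract_id: str   # which dissertation abstract reported it


@dataclass
class ConceptEntry:
    """Everything the framework records about one main concept."""
    main_concept: str
    attribute_values: list = field(default_factory=list)
    relationships: list = field(default_factory=list)
    contextual_relations: list = field(default_factory=list)
    research_methods: list = field(default_factory=list)

    def abstracts(self):
        """Return the sorted IDs of abstracts that report a relationship."""
        return sorted({r.abstract_id for r in self.relationships})


# Invented example values, for illustration only.
entry = ConceptEntry(
    main_concept="job satisfaction",
    relationships=[
        Relationship("salary", "correlation", "A1"),
        Relationship("turnover", "cause-effect", "A2"),
    ],
    contextual_relations=["perception of nurses"],
    research_methods=["survey", "regression analysis"],
)
print(entry.abstracts())
```

Keeping the source abstract on each relationship is one way to preserve the link between shared concepts and the individual studies that investigated them.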

The framework presents a full map of a specific topic by integrating the research concepts and relationships, contextual relations, and research methods extracted from different dissertation abstracts into a hierarchical structure organized around the main concepts. It has two advantages: it gives an overview of a subject area by presenting the summarized information at the top level, and it allows users to zoom in on details of interest by exploring the specific information at the lower levels. The framework thus provides a way to summarize a set of dissertation abstracts that differs from traditional sentence extraction methods.
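The two-level idea (an overview at the top, details on demand) can be illustrated with a small sketch. It assumes that extraction has already produced simple (abstract ID, main concept, detail) tuples; the function names and the sample records are invented for this example and do not describe the authors' implementation.

```python
from collections import defaultdict

# Invented extraction results: (abstract ID, main concept, extracted detail).
extracted = [
    ("A1", "job satisfaction", "correlated with salary"),
    ("A2", "job satisfaction", "affects turnover"),
    ("A3", "burnout", "studied among nurses"),
]


def build_hierarchy(records):
    """Group extracted details under their main concept."""
    hierarchy = defaultdict(list)
    for abstract_id, concept, detail in records:
        hierarchy[concept].append((abstract_id, detail))
    return dict(hierarchy)


def overview(hierarchy):
    """Top level: each main concept with the number of abstracts covering it."""
    return {
        concept: len({abstract_id for abstract_id, _ in details})
        for concept, details in hierarchy.items()
    }


def zoom_in(hierarchy, concept):
    """Lower level: the specific details recorded for one main concept."""
    return hierarchy.get(concept, [])


hierarchy = build_hierarchy(extracted)
print(overview(hierarchy))
print(zoom_in(hierarchy, "burnout"))
```

A reader would start from `overview` to see which concepts dominate the set of abstracts, then call `zoom_in` only for the concepts of interest.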

The summarization method developed in this study is just one way of operationalizing the variable-based framework. In particular, different presentation formats can be used to organize and present the summary. Two presentation formats were investigated. One made use of a taxonomy to filter out non-concept terms, highlight important concepts in the domain, and categorize concepts into different subjects. The other did not use a taxonomy for information filtering, highlighting, and categorization. A user evaluation was carried out to compare the two types of variable-based summaries against two types of sentence-based summaries: one generated by displaying research objective sentences only, and another generated by the MEAD system, which identified important sentences using a variety of general features (e.g. centroid words, sentence position, and first-sentence overlap).
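The three roles of the taxonomy (filtering, highlighting, and categorizing) can be sketched as follows. The taxonomy contents, the `apply_taxonomy` function, and the asterisk convention for highlighting are all assumptions made for this illustration; the taxonomy actually used in the study is described in the 2005 paper cited at the end.

```python
# Hypothetical taxonomy: term -> (subject category, is_important_in_domain).
TAXONOMY = {
    "job satisfaction": ("work and occupations", True),
    "salary": ("work and occupations", False),
    "burnout": ("health", True),
}


def apply_taxonomy(terms, taxonomy):
    """Filter, highlight, and categorize candidate terms.

    Terms absent from the taxonomy are treated as non-concept terms and dropped.
    Important concepts are wrapped in asterisks as a stand-in for highlighting.
    """
    by_subject = {}
    for term in terms:
        if term not in taxonomy:
            continue  # filter out non-concept terms
        subject, important = taxonomy[term]
        label = f"*{term}*" if important else term
        by_subject.setdefault(subject, []).append(label)
    return by_subject


candidates = ["job satisfaction", "purpose of this study", "salary", "burnout"]
print(apply_taxonomy(candidates, TAXONOMY))
```

In this sketch the phrase "purpose of this study" is dropped because it is not in the taxonomy, while the remaining terms are grouped under their subject headings.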

In the user evaluation, 70% of the researchers and 64% of the general users indicated their preference for the variable-based summaries generated with the use of the taxonomy, 55% of the researchers and 31% of the general users indicated their preference for the research objective summary, and only 25% of the researchers and 31% of the general users indicated their preference for the MEAD summary.

Comparing the two types of variable-based summaries, the summary generated with the use of the taxonomy obtained the highest rank score from the researchers, whereas the one that did not make use of the taxonomy obtained lower scores. This demonstrates that using a taxonomy for filtering out non-concept terms, highlighting important concepts in the domain, and categorizing concepts into different subjects can substantially improve the quality and usefulness of the variable-based summaries.

On the other hand, 55% of the researchers indicated a preference for the research objective summary, because sentence-based summaries could provide more direct information and were easy to understand. A higher percentage of researchers (55%) than general users (31%) preferred the research objective summary, indicating that the researchers had a greater interest in the research objectives.

Interestingly, the user evaluation found that the presentation order of the different summaries influenced the users' assessments. Summaries presented later were more likely to be assessed favorably and given a better score. This is because of a carry-over effect from the summaries read earlier: once a user had read the earlier summaries, familiarity with the content may have made the later summaries easier to understand.

A companion paper gives details of the system design and implementation, and of the evaluation of each step in the summarization process:

  • Ou, S., Khoo, C.S.G., & Goh, D. (2008). Design and development of a concept-based multi-document summarization system for research abstracts. Journal of Information Science, 34(3), 308-326. [PDF]

Development of the taxonomy for concept filtering and generalization was reported in:

  • Ou, S., Khoo, C.S.G., & Goh, D. (2005). Constructing a taxonomy to support multi-document summarization of dissertation abstracts. Journal of Zhejiang University: Science, 6A(11), 1258-1267.