XML tagging resources for discourse analysis of research papers using the Information-Argument-Rhetorical Structure framework are available for download from DR-NTU (Data), the data repository of Nanyang Technological University [https://doi.org/10.21979/N9/LD3EBQ].

The Information-Argument-Rhetorical Structure framework specifies 3 layers of discourse analysis:

  1. Information structure analysis: a tag indicates a semantic role in one of the semantic frames (e.g., the Research-relation frame or the Comparison frame).
  2. Argument structure analysis: a tag indicates a type of argument claim, or a type of argument support/premise.
  3. Rhetorical structure analysis: a tag indicates a rhetorical function, modeled after Swales's (1990) Creating a Research Space (CARS) model.

The framework was derived from an analysis of the Abstract, Introduction and Literature Review sections of 30 research papers in sociology, mechanical engineering and bioscience (10 each). The framework was then applied to an additional 100 sociology research papers and refined. Further work on mechanical engineering and bioscience papers is planned.

A major step in the discourse analysis is to tag text spans (usually noun phrases, clauses and single words) in the text, which is stored in XML format, with XML tags that reference elements of the Information-Argument-Rhetorical Structure framework. The tagging is done in an XML editor (e.g., oXygen XML Editor).
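
To make the tagging step concrete, the sketch below shows how one sentence might carry tags from all three layers, and how the tagged spans could be read back with Python's standard library. The element and attribute names (info, arg, rhet, frame, role, type, function), their values, and the sample sentence are illustrative assumptions only; the actual tag names are defined in the schema and documentation files listed below.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated sentence. The element and attribute names
# (info, arg, rhet, frame, role, type, function) are illustrative
# assumptions, NOT the tag names defined in Info-Arg-Rhet.v1-0.xsd.
SAMPLE = """\
<sentence>
  <rhet function="establishing-a-niche">
    <arg type="claim">
      <info frame="Research-relation" role="independent-variable">Social media use</info>
      has rarely been examined in relation to
      <info frame="Research-relation" role="dependent-variable">civic participation</info>.
    </arg>
  </rhet>
</sentence>
"""

# One (assumed) element name per layer of the framework.
LAYER_TAGS = {"info", "arg", "rhet"}


def collect_spans(xml_text):
    """Return (layer, attributes, text) for every tagged span, in document order."""
    root = ET.fromstring(xml_text)
    spans = []
    for elem in root.iter():
        if elem.tag in LAYER_TAGS:
            # Join the text of the span and its nested spans, then
            # collapse the whitespace introduced by indentation.
            text = " ".join("".join(elem.itertext()).split())
            spans.append((elem.tag, dict(elem.attrib), text))
    return spans


if __name__ == "__main__":
    for layer, attrs, text in collect_spans(SAMPLE):
        print(layer, attrs, text)
```

Nesting the information-structure spans inside the argument and rhetorical spans is one possible arrangement, chosen here only to show that a single text span can be described at several layers at once.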

The following resources are available from DR-NTU (Data) [https://doi.org/10.21979/N9/LD3EBQ]:

  • Info-Arg-Rhet.v1-0.xsd : XML schema file to support the XML tagging and validation.
  • Info-Arg-Rhet.v1-0.css : Cascading stylesheet file to display the annotated text in a Web browser.
  • Sample annotated text file.
  • Three documentation files: Information structure tags (Info-structure.v1-0.pdf), Argument structure tags (Arg-structure.v1-0.pdf), and Rhetorical structure tags (Rhet-structure.v1-0.pdf).
  • OWL/Turtle file that represents the semantic frames (in the Information Structure layer) as classes and relations in an ontology, which can be instantiated with words tagged in the research papers. We refer to the set of semantic frames as the Research Information Model (RIM).