This appendix contains additional material for the paper "Expert-Informed Topic Models for Document Set Discovery."

Table of Contents

The appendix is structured in four sections describing the data collection (A), the expert survey (B), the topic modeling (C), and the thematic classification process (D) relevant to the paper. Appendix E contains a list of all files used in the appendix.

  • Appendix A: Describes the data collection that is used as a basis for the methods and analyses presented in the paper.

    It gives

    • an overview of how the data was collected

    • descriptive statistics of the data collection

    • and additional notes on the data set.

  • Appendix B: Describes the expert survey that generated the expert input into the topic modeling process.

  • Appendix C: Describes the topic modeling process that was used to analyze the data and generate the relevance scores as a new feature for each document.

    It gives

    • an overview of the topic modeling process

    • a detailed description of the text preprocessing used before the topic modeling

    • and additional information on how the relevant topics and the respective models were selected.

  • Appendix D: Describes the semi-automatic thematic classification of the data based on the relevance scores and human coding input.

    It describes

    • the steps taken to prepare the data for the human coders,

    • how the human coders were instructed to code the thematic relevance of the material,

    • the reliability test they had to complete,

    • and the online coding interface used to distribute the coding task within the coding team.

  • Appendix E: Contains a list of the files used in all appendices.