Data Collection

This section gives an overview of the collection phase that produced the Mannheim International News Discourse Data Set (MIND). First, descriptive statistics are given for each source in the data set. Then the collection process is explained. Finally, additional notes on the data set are given.

Key Features

The MIND data set is a collection of news items spanning one year, from 1 August 2015 to 31 July 2016. It contains news items from over 94 sources in five countries on four continents. For each country, the most relevant political information sources in the categories News Website, Printed Newspaper and Blog were selected. The complete output of each source was collected in the topical areas of politics, society, economy and culture; the topics of sports, lifestyle and weather were excluded.

The following table gives the number of collected items for each country and each media type.

                                              Country
Media type           Australia     Germany Switzerland      Turkey         USA       Total
Blog                     7,503      34,215       1,635       3,565     116,329     163,247
News Website           275,643     191,702     122,581     250,709     403,031   1,243,666
Printed Newspaper      163,029     121,739     100,729           -     131,478     516,975
Total                  446,175     347,656     224,945     254,274     650,838   1,923,888
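The row and column totals in the table above can be recomputed from the individual cells. The following Python sketch is an illustration only and not part of the MIND tooling: the names COUNTRIES, COUNTS, row_totals and column_totals are hypothetical, and the missing Printed Newspaper cell for Turkey is assumed to count as zero.

```python
# Item counts copied from the overview table. None marks the cell shown
# as "-" (Printed Newspaper, Turkey); treating it as zero is an assumption.
COUNTRIES = ["Australia", "Germany", "Switzerland", "Turkey", "USA"]
COUNTS = {
    "Blog": [7503, 34215, 1635, 3565, 116329],
    "News Website": [275643, 191702, 122581, 250709, 403031],
    "Printed Newspaper": [163029, 121739, 100729, None, 131478],
}


def row_totals(counts):
    """Sum the items of each media type across all countries."""
    return {
        media: sum(value or 0 for value in values)
        for media, values in counts.items()
    }


def column_totals(counts, countries):
    """Sum the items of each country across all media types."""
    return {
        country: sum(values[i] or 0 for values in counts.values())
        for i, country in enumerate(countries)
    }


if __name__ == "__main__":
    rows = row_totals(COUNTS)
    cols = column_totals(COUNTS, COUNTRIES)
    for media, total in rows.items():
        print(f"{media}: {total:,}")
    for country, total in cols.items():
        print(f"{country}: {total:,}")
    # Both margins must add up to the same grand total.
    assert sum(rows.values()) == sum(cols.values())
    print(f"Total: {sum(rows.values()):,}")
```

Run as a script, this prints the per-media and per-country sums, which can be compared against the Total row and column of the table.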