This section gives an overview of the collection phase that produced the Mannheim International News Discourse Data Set (MIND). First, descriptive statistics for each source in the data set are given. Then the collection process is explained. Finally, additional notes on the data set are given.
The MIND data set is a collection of news items spanning one year, from 1 August 2015 to 31 July 2016. It contains news items from over 94 sources in five countries on four continents. In each country, the most relevant political information sources in the categories News Website, Printed Newspaper and Blog were selected. For each of these sources, the complete output in the topical areas of politics, society, economy and culture was collected, while the topics of sports, lifestyle and weather were excluded.
The following table shows an overview of the collected items for each country and each media type.
Media type | Australia | Germany | Switzerland | Turkey | USA | Total |
---|---|---|---|---|---|---|
Blog | 7,503 | 34,215 | 1,635 | 3,565 | 116,329 | 163,247 |
News Website | 275,643 | 191,702 | 122,581 | 250,709 | 403,031 | 1,243,666 |
Printed Newspaper | 163,029 | 121,739 | 100,729 | - | 131,478 | 516,975 |
Total | 446,175 | 347,656 | 224,945 | 254,274 | 650,838 | 1,923,888 |
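
The marginal totals in the table can be recomputed from the individual cells. The following is a minimal sketch in Python, not part of the MIND tooling. The names `COUNTS`, `row_totals` and `column_totals` are hypothetical, and the sketch assumes that the dash in the Turkey / Printed Newspaper cell means that no items were collected, so it is counted as zero.

```python
# Item counts copied from the overview table, in the column order
# Australia, Germany, Switzerland, Turkey, USA.
# None stands for the "-" cell (Turkey / Printed Newspaper); treating it
# as zero is an assumption made for this sketch.
COUNTRIES = ["Australia", "Germany", "Switzerland", "Turkey", "USA"]
COUNTS = {
    "Blog": [7503, 34215, 1635, 3565, 116329],
    "News Website": [275643, 191702, 122581, 250709, 403031],
    "Printed Newspaper": [163029, 121739, 100729, None, 131478],
}


def row_totals(counts):
    """Sum the items per media type across all countries."""
    return {
        media: sum(value or 0 for value in values)
        for media, values in counts.items()
    }


def column_totals(counts, countries):
    """Sum the items per country across all media types."""
    return {
        country: sum(values[i] or 0 for values in counts.values())
        for i, country in enumerate(countries)
    }


media_totals = row_totals(COUNTS)
country_totals = column_totals(COUNTS, COUNTRIES)
grand_total = sum(media_totals.values())

for media, total in media_totals.items():
    print(f"{media}: {total:,}")
for country, total in country_totals.items():
    print(f"{country}: {total:,}")
print(f"Total: {grand_total:,}")
```

Under that assumption, the recomputed sums can be compared directly with the Total row and Total column of the table.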