It has already been four years since I created my Newsmap. It is time to update the whole system: a full rewrite in Python and a new classification algorithm. This is why I generated 5,000 human-coded international news stories using Prolific Academic.
Thanks to crowd-sourcing services, recruiting is no longer a problem, but we still have to provide coding instructions, and they have to be very clear and simple. The coding rules for my research are event-oriented: international news stories were coded according to the location of the events or problems concerned.
Unlike traditional codebooks for content analysis, which are often long and complex, the Newsmap coding instructions are only five pages long and come with classification codes in a separate CSV file.
For this dataset, I used Prolific Academic, an Oxford-based academic alternative to Amazon Mechanical Turk. The advantage of using this service is that researchers only have to pay for work that they approve. The potential drawback is its relatively high cost: the service requires researchers to offer ‘ethical rewards’ to participants, and the minimum rate is £5. Most of the participants on Prolific Academic are university students, though the same may be true of Mechanical Turk.
One of the reasons I chose Prolific Academic over Amazon Mechanical Turk was that classification of international news stories by Turkers might not be very accurate, since Americans are infamous for their lack of knowledge about foreign events.
The classification accuracy of the Prolific Academic participants in my project is shown below by country. Locations of participants (based on IP addresses) are concentrated in three countries (the UK, the US, and India), and the estimated accuracy (0–10) of the participants’ coding seems to support my hypothesis: Americans are not good at analyzing international news stories…
| Country | Accuracy (0–10) | n | % |
|---|---|---|---|
| Austria | 7.00 | 1 | 0.3 |
| Thailand | 6.00 | 4 | 1.3 |
| Viet Nam | 6.00 | 4 | 1.3 |
| United Kingdom | 5.93 | 160 | 51.8 |
| Canada | 5.67 | 3 | 1.0 |
| Spain | 5.67 | 9 | 2.9 |
| Romania | 5.60 | 5 | 1.6 |
| United States | 5.19 | 26 | 8.4 |
| India | 4.96 | 93 | 30.1 |
| Czech Republic | 4.50 | 2 | 0.6 |
| Portugal | 4.00 | 1 | 0.3 |
| Philippines | 3.00 | 1 | 0.3 |
The estimated accuracy of the US participants is much lower than that of their UK counterparts. The low accuracy of the Indian participants seems to be due to limited English language skills. Despite the requirement that English be participants’ first language, the high hourly rate, which is very close to the UK minimum wage, attracted many less qualified people. Indians account for only 2% of registrants on the service, but they made up 30% of participants in this project.
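A per-country summary like the table above can be computed from coder-level records along these lines. This is a minimal sketch, not the actual Newsmap pipeline: the record layout and the sample values are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical coder-level records: (country, estimated accuracy 0-10).
# These values are illustrative, not the real project data.
records = [
    ("United Kingdom", 6.0), ("United Kingdom", 5.5),
    ("United States", 5.0), ("United States", 5.4),
    ("India", 4.8), ("India", 5.1), ("India", 5.0),
]

def summarize_by_country(records):
    """Aggregate mean accuracy, coder count, and share per country."""
    by_country = defaultdict(list)
    for country, accuracy in records:
        by_country[country].append(accuracy)
    total = len(records)
    summary = {
        country: {
            "accuracy": mean(scores),
            "n": len(scores),
            "percent": round(100 * len(scores) / total, 1),
        }
        for country, scores in by_country.items()
    }
    # Sort countries by mean accuracy, descending.
    return dict(sorted(summary.items(), key=lambda kv: -kv[1]["accuracy"]))

summary = summarize_by_country(records)
```

With real data, the same aggregation would reproduce the accuracy, n, and percent columns shown above.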
I was expecting participants’ classification accuracy to increase as they performed more tasks, but quite the opposite was the case. Some participants did very good work initially, but their classification accuracy usually decreased, sometimes dropping below 70%. The declining performance can be explained by participants’ attempts at cost minimization.
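One way to catch this kind of decline is to track each coder’s rolling accuracy and flag anyone who drops below a cutoff. The sketch below assumes per-task correctness flags are available; the window size and the 70% threshold are illustrative choices, not values from my project.

```python
def flag_declining_coders(task_results, window=20, threshold=0.7):
    """Flag coders whose rolling accuracy drops below a threshold.

    task_results maps a coder ID to a list of booleans, one per task
    in the order completed (True = correct classification).
    Returns {coder_id: index of the first window below the threshold}.
    """
    flagged = {}
    for coder, results in task_results.items():
        for i in range(len(results) - window + 1):
            rate = sum(results[i:i + window]) / window
            if rate < threshold:
                flagged[coder] = i  # first point where quality drops
                break
    return flagged
```

In practice, such a check could run while a project is live, so that additional coders can be recruited before declining ones degrade the dataset.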
These observations raise questions about crowd-sourced content analysis:
- Is Amazon Mechanical Turk always the best crowd-sourcing platform?
- Should we offer different amounts of reward to participants according to country of residence?
- How can we maintain or improve the performance of participants over the course of a project?