Experiments on the Amazon Mechanical Turk
The Amazon Mechanical Turk is a pretty remarkable marketplace for work that connects “requesters”, who need a simple and often tedious task repeated a large number of times, with a large pool of “workers”, who complete the task for a small amount of money. It gives you access to an on-demand global workforce at any hour, a bit like a human version of the Amazon Elastic Compute Cloud. Mechanical Turk tasks require human intelligence to solve (if they didn’t, you might as well write a program), and many of the requested tasks look like they will be used as benchmarks for predictive algorithms. For example, one of the active tasks right now shows the worker a LinkedIn profile and asks him/her to judge which of ten other profiles it is similar to. The payment for completing the task is $0.03 and the allotted time is 10 minutes. This seems like an interesting way to get class labels for training or calibrating a machine learning algorithm that finds people with similar profiles on LinkedIn.
Experimental Turk is a cool project that uses the Mechanical Turk for (social science) research. They have recreated several classical experiments in social science and economics using this “crowd-labor” approach. As far as I can tell, most of the previously published, “classical” results have been reconfirmed by Experimental Turk. For instance, in the “anchoring” experiment (anchoring is the tendency for an initial number to pull later estimates toward it), 152 workers were asked how many African countries are in the United Nations.
Approximately half of the participants were asked the following question:
Do you think there are more or less than 65 African countries in the United Nations?
The other half was asked the following question:
Do you think there are more or less than 12 African countries in the United Nations?
Both groups were then asked to estimate the number of African countries in the United Nations. As expected, participants exposed to the large anchor (65) provided higher estimates than participants exposed to the small anchor (12) […] It should be noted that means in our data (42.6 and 18.5 respectively) are very similar to those recently published by Stanovich and West (2008; 42.6 and 14.9 respectively).
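The anchoring setup is a simple between-subjects design: each worker is randomly assigned one of two anchors, answers the anchored yes/no question, and then gives a numeric estimate, and the group means are compared afterwards. Here is a minimal Python sketch of that workflow, assuming you already have worker IDs and their collected estimates. The function names are hypothetical and are not part of Experimental Turk or any Mechanical Turk API. The comparison uses Welch’s t statistic, which is my own choice for illustration; I don’t know which test the original study used.

```python
import random
import statistics

# The two anchors used in the experiment described above.
HIGH_ANCHOR = 65
LOW_ANCHOR = 12


def assign_anchors(worker_ids, seed=None):
    """Randomly split workers into two (near-)equal anchor conditions.

    Hypothetical helper: shuffles the IDs, gives the first half the high
    anchor and the rest the low anchor.
    """
    ids = list(worker_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {w: (HIGH_ANCHOR if i < half else LOW_ANCHOR) for i, w in enumerate(ids)}


def question_for(anchor):
    """Build the anchored yes/no question shown to a worker."""
    return (f"Do you think there are more or less than {anchor} "
            "African countries in the United Nations?")


def mean_estimate_by_anchor(estimates, assignment):
    """Group each worker's numeric estimate by anchor; return the mean per anchor."""
    groups = {}
    for worker, estimate in estimates.items():
        groups.setdefault(assignment[worker], []).append(estimate)
    return {anchor: statistics.mean(values) for anchor, values in groups.items()}


def welch_t(a, b):
    """Welch's t statistic for the difference in means of two independent samples.

    Each sample needs at least two values (sample variance is undefined otherwise).
    """
    se_a = statistics.variance(a) / len(a)
    se_b = statistics.variance(b) / len(b)
    return (statistics.mean(a) - statistics.mean(b)) / (se_a + se_b) ** 0.5
```

For example, assign_anchors(range(152), seed=1) splits 152 workers into two groups of 76. Once their estimates are in, mean_estimate_by_anchor gives one mean per anchor, which is the kind of comparison reported in the quoted result, and welch_t indicates how large the gap is relative to the spread within each group.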
The other experiments are also about different kinds of unconscious biases and heuristics. Fun stuff. Scintillae has a guide to doing experiments on the Mechanical Turk.