Full interview with Dutch reporter Maarten Keulemans from the newspaper de Volkskrant
Note: Maarten sent me a set of questions, which I answered in writing, followed by a phone conversation (July 4th, 2012). His edited interview was printed in Dutch in de Volkskrant.
This is the mostly unedited full set of questions and answers. 23 questions.
Rushed? See my favorite 4 questions.
1) First off, I'd like to learn something more about your own background. How did you get involved in the business of dubious results and bad statistics? What caught your interest, and what keeps you going?
I really believe in what I do. I think our mission is to contribute new knowledge to society, both practical and merely intrinsically valuable because it is just super interesting. While I cannot wait to go back to doing research that is not related to improving how we conduct and report science, I have a hard time thinking of a more important thing I could be doing. Now, cancer is more important, so is poverty, but I probably cannot help there. Given my skills, this may be the most important thing I could do.
2) There's lies, damned lies, and statistics. How do you respond to the old Mark Twain saying?
I think this whole thing has little to do with statistics actually.
I would go with another quote. To err is human. Scientists are human. We will err. Some teachers abuse students, some policemen take bribes, some nurses kill their patients, some wives cheat on their husbands, etc. Some scientists fake data.
Stats has nothing to do with it. Stats is part of the solution actually. That's how the new cases were discovered, through statistical analyses.
3) How widespread is the problem anyway? Is the high rate of false-positive findings typically a problem from social psychology?
It is unfortunately impossible to know for sure. Like many crimes, most go undetected. Importantly, there is a lot of really good research too. I teach a full semester class to business students based almost exclusively on content informed by psychological research, and I think they are better practitioners thanks to it.
4) Put differently: what do research fields with a high 'nonsense-rate' have in common? What's the difference with physics, a field well-known for its accuracy of research findings?
I know almost nothing about physics. But, as I mentioned about erring being human, this is, first of all, not a huge problem: a case a year among thousands of scientists is not an epidemic. I know we are tempted to make a big thing out of two researchers in the same country within one year... We exaggerate the meaning of coincidences.
And second, it is not specific to psychology. I routinely check a blog called RetractionWatch. Let me tell you the topic of some postings: breast cancer, neuroscience, endoscopy, anesthesiology, Parkinson's, foreign health aid. That's all in the last week!
The thing is, newspapers seldom talk about endoscopy or anesthesiology, but they do talk about psychology (because it is damn interesting) so the public at large knows about the (few) cases from psychology.
5) You demonstrated the problem in an already famous experiment in which you proved that listening to "When I'm 64" makes you 16 months younger. What did you do to get that result?
That is work I did with Joe Simmons and Leif Nelson, and it is completely unrelated.
A false positive is like when we calculate the tip at a restaurant: we tend to make calculation and rounding errors that favor us (e.g., "I guess $4 is probably close enough to 15%"). We shouldn't do this, but there is no ill intent.
Fraud is like robbing the restaurant.
So it is probably best to keep false-positives, and my work with Joe and Leif more broadly, out of this interview.
6) What is the deeper reason for all this? I always thought science was about finding the truth in an objective, neutral way. Are scientists too keen on getting attention, or is there something else driving us towards bias?
It is about finding truth, for sure. That's the reason the vast majority of scientists dedicate their lives to this. But... we are people. We cannot forget that.
The vast majority of people walking by our houses would not just enter and take stuff from them. Still, we lock our doors.
We don't lock our doors in scientific journals. We should, in the form of requesting scientists to post their data unless they have a good documented reason not to.
Some journals already do that, like the Journal of Judgment and Decision Making and the American Economic Review. There are arguments against posting data; there are arguments against everything. Even love has its cons.
The pros are much stronger than the cons for both love and publishing data.
7) In our country, social psychology has been severely damaged by, especially, the Diederik Stapel fraud, and later, the Dirk Smeesters case.
Not sure I agree with that. There are excellent social psychologists in the country. Stapel was just one rotten apple. I have read some very good work from the Netherlands in the last year.
8) First off, your take is that the very fact that these frauds were discovered is actually good for your field?
It depends on what we do about them. A lot of big positive reforms in all parts of life and throughout history have come as responses to specific negative events.
If we were to start requesting data to be posted in our journals because of all this, the fabrication cases would have ironically done a lot of good for science.
9) On the other hand, Stapel must have faked data for many years. What does it tell us that nobody took notice any sooner?
Lock your doors.
10) Should we file fraud cases like the Stapel case under 'individual crimes', or was Mr. Stapel a victim of the general research practice in his line of research?
Individual crime. No question. The vast majority of researchers are honest and appalled by these cases.
11) The public sentiment in our country nowadays is: those social psychologists, with their funny little experiments, they're more entertainers than scientists. What would you tell them?
There are thousands of psychologists doing all sorts of interesting and important research, from learning about the basic underpinnings of thought in the brain, and how and when stereotyping happens, to how we can frame information to lead to better decisions and what facilitates learning in young or disadvantaged children. Newspapers do disproportionately cover the minority of cute and entertaining findings; we would all benefit if that changed.
12) And then there's the case of Dirk Smeesters. First off, you agreed to do this interview on the explicit condition we wouldn't discuss the case in a 'he-said', 'she-said' way. Why?
Merely answering this question would be doing that, I am afraid.
13) How did you come across the Smeesters case? Coincidence?
Yes. Somebody sent me a paper they had read with far-fetched predictions and it made me curious. I did not know him or his work. I certainly did not set out to find fake data.
14) You were, in fact, testing a new statistical tool for detecting suspicious research results. Without discussing your technique, which is still to be published, in great detail - what's the principle behind it?
Imagine I told you I have a die. And I throw it and I say it came out 5. And then I throw it again and I say 5 again, and again, again, and so on. Until I claim to have thrown it 49 times and still always 5. At some point you decide the die is loaded. Something like that is present in these papers. For example, in one paper people are reported as taking a general knowledge test with 20 questions. Six groups of just 14 people each take the test. In the first group the average score turned out to be 9. In the second, 9. In the third, good guess, 9. Then again and again, in all six groups it is 9. Turns out, that is extremely unlikely. But the same thing happened in another set of six groups. And the same thing happened in a different paper. And in another. And in another.
It is not entirely different from a test Sir Ronald Fisher, one of the founders of statistics, did in the 1930s. Same intuition, different operationalization.
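The intuition in this answer can be put into a few lines of Python. This is only a rough sketch and not the technique discussed in the interview, which was still unpublished at the time. The function names, the normal-score model, the standard deviation, and the spread threshold are all hypothetical choices made for illustration; only the 49 throws, the six groups of 14 people, and the mean of 9 come from the answer above.

```python
import random
import statistics


def prob_same_face(n_throws, n_faces=6):
    """Chance that a fair die shows one pre-specified face on every throw."""
    return (1 / n_faces) ** n_throws


def prob_means_this_similar(n_groups, n_per_group, sd, max_spread,
                            n_sims=2000, seed=0):
    """Estimate how often honest sampling gives group means this close together.

    Assumption (not from the interview): every score is drawn from one normal
    distribution with mean 9 and standard deviation `sd`. Sharing a single
    true mean is the most favorable case for ending up with similar averages.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Average score of each simulated group.
        means = [
            statistics.fmean([rng.gauss(9.0, sd) for _ in range(n_per_group)])
            for _ in range(n_groups)
        ]
        # Count the simulation if the largest and smallest group means
        # differ by no more than max_spread.
        if max(means) - min(means) <= max_spread:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    print("49 fives in a row with a fair die:", prob_same_face(49))
    # sd=3.0 and max_spread=0.2 are made-up values for illustration only.
    print("six groups of 14 with near-identical means:",
          prob_means_this_similar(6, 14, sd=3.0, max_spread=0.2))
```

The smaller the estimated share of simulations that look as uniform as the reported data, the harder it is to attribute that uniformity to chance. With real papers, the standard deviation would have to come from the reported statistics rather than from a guess.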
15) You earlier experimented with 'p-hacking': checking out and plotting the p-values, to see if there's a disproportionate number of p's <.05. Is there in your opinion a need for simple 'detectors' like these?
A clarification of slang. p-hacking is when you try many things to get a result (e.g., taking into account the age of the father of the participants and also not taking it into account).
p-curving is a tool to detect that.
Is there a need for them? Only until people start posting data and properly disclosing how they run their studies.
16) One of the criticisms was that we might risk getting a witch-hunt.
Witch hunting is terrible; so is being a do-nothing bystander who witnesses a crime and becomes an accomplice by remaining silent.
A witch hunt is when you use a tool that has no ability to say anything and indiscriminately accuse people with it.
I did not set out to find fraud. It found me. And when it did, I applied the basic principle of replication that is so fundamental to good scientific inference. And then I applied it again. And then I analyzed their raw data. And then I talked to the authors, and then to the co-authors, and then to the co-authors' friends. And I gave them my phone number and told them to bring a trusted, stats-savvy friend. And then nothing happened and I waited and waited. By the time I was raising this issue with the universities, they had already begun investigating, and that was over 6 months ago. There was no rush to judgment here.
Back to the do-nothing bystander. Some of the analyses I have run give a chance of less than 1 in 60 billion for the data to look this way if they were legitimately obtained (this is the American case I am talking about). If I do not respond when facing that evidence, I am half guilty of fraud.
Now, if I take matters into my own hands, I am a vigilante. But I merely passed on my concerns to the authorities tasked with taking care of them. I wouldn't call it a witch hunt.
17) As the Dutch statistician Richard Gill put it, techniques like these are the modern-day equivalent of torture instruments - a reason to talk to somebody. Do you agree?
I have not read his posting beyond the medieval torture remark. A sensible inference is that he does not know much about my analyses or about medieval torture instruments.
18) Is there any way to provide definite evidence for data fabrication like in the Stapel case?
There is no such thing as definite proof. We send people to jail, we kill enemies in battle, and worse, we flunk students in our psychology classes, without knowing for sure the outcome is just. Sounds horrible. But, what's the alternative?
We must make that error probability as small as possible, infinitely small, and we must follow procedure so that sane individuals who have done nothing wrong can prove their innocence without their reputation suffering. I cannot think of a single thing I could have done to reduce that probability of an error any further than I did.
19) Finally, in a more general sense - you propose a much more open science practice, where full research details are revealed, and reviewers see to it that this requirement is met. What are the responses to your proposal so far?
I prefer leaving False-positives out of this. They are completely unrelated.
20) Of course, this would mean more work for everyone, and would dampen many 'spectacular' discoveries and media attention. My gut feeling tells me that nobody would want to adopt your 'simple plan'. What makes you believe otherwise?
I think posting one's data would embolden spectacular discoveries. If you have an incredible finding, and you can so easily show it is legitimate by showing the world the data behind your evidence, people will believe you faster, and justifiably so. Also, we may be able to analyze the data in new ways and learn all there is to learn from it.
21) You're about to publish a paper outlining 'four examples' of trouble with research - a title that you only recently coined. Two of them we know: Dirk Smeesters and Diederik Stapel. Is it your intention to scare the remaining two?
NO! Not at all. My biggest fear is that anybody I interact with will do something out of fear rather than from being persuaded. The remaining two cases cannot possibly be scared by the title. One is a done deal: the person has resigned, but the university in question does not seem to want to make a public statement about it. That's part of what is holding me back. It is hard to finish the paper without knowing for sure what they determined. The other case is one I won't pursue at all.
22) Can you reveal their identity? Or tell us anything about the cases?
23) The headlines tend to describe you as 'fraud hunter'. Is that insulting, or flattering, or something else?