“Science is inevitably biased to some extent,” says Dr Daniele Fanelli, “because it’s made by human beings.” One might easily dismiss this claim as unfounded, but Fanelli has the numbers to back it up. His recent research paper combined over 20 previous studies on scientific misconduct, and found that nearly 2% of scientists admit to falsifying or fabricating data.
Whilst most scientists would shudder at the thought of distorting or inventing results, it seems that a small number are prepared to do so. Fanelli, a researcher in science and technology studies at the University of Edinburgh, believes quantifying and identifying this practice is essential to improving science.
He’s not alone. The UK Research Integrity Office (UKRIO) is an independent advisory body set up in 2006 to support good practice in research and help address cases of scientific misconduct. UKRIO head James Parry stresses that whilst misconduct is not a common occurrence, it is a problem. “We need to take steps to actively promote good conduct and research,” he says.
What causes a scientist to turn away from good conduct, and good science? Fame and fortune are obvious answers, but Fanelli argues some scientists might feel forced into it. “There is an excessive pressure to publish, an excessive reliance on publication record to assess scientific careers.” With scientists needing to keep up appearances, perhaps publishing a falsified paper in an obscure journal seems like the only solution.
It isn’t just smaller journals that fall foul of misconduct, as even the giants of the science publishing world can get it wrong. Parry recalls the case of Jan Hendrik Schön, a physicist at Bell Labs in New Jersey. Over the course of a few years Schön published a slew of papers on superconductivity in high-profile journals, including Science and Nature. “It turned out he was faking results,” says Parry. “Some of the data used in one paper had actually been used in another – he’d just labelled it differently.”
Intentionally mislabelling data is high on the list of crimes against science, but Fanelli’s research shows that a much larger proportion of scientists are guilty of lesser offences. One third of those asked admit to a variety of “questionable research practices”, including dropping data based on gut feeling or allowing funding sources to influence a study. Whilst these may just be the research equivalent of a parking ticket or speeding fine, their high prevalence is worrying.
More worrying is that the true misconduct figures could be even higher. Scientists in the surveys Fanelli analysed were self-reporting, and may have chosen not to admit their misconduct. When asked about their colleagues, 14% reported knowing someone who had falsified results, whilst 72% suggested other questionable research practices were taking place. Even these figures don’t paint the whole picture, because one case of misconduct could be reported multiple times. “How these figures relate to the true frequency of misconduct is partly an open question,” says Fanelli.
Whilst just answering a survey might be easy, actually dealing with a colleague’s misconduct can be harder. “It’s a very stressful situation,” explains Parry, but the UKRIO can help. “If someone comes to us with concerns, we offer confidential and independent advice and guidance.” This support can play a crucial role in exposing potentially harmful misconduct, especially when it comes to health and biomedical research. “It’s the area where there is the most potential for mishap if things go wrong,” says Parry.
It is also the area with the most reported misconduct. “Medically related research has consistently higher admission rates,” says Fanelli. There are two possible explanations for this. Perhaps these researchers are more aware of issues surrounding scientific misconduct and so are more honest, or maybe misconduct rates simply are higher in medicine. Both explanations could be true.
Should we be concerned that we don’t know how many researchers are cooking the scientific books? Fanelli believes this behaviour is not necessarily bad for science, because dodgy data can still be used to support research that is subsequently accepted as true. The 19th-century scientist Gregor Mendel was posthumously accused of reporting data that were too good to be true, yet his work forms the foundation of modern genetics. In the long term, then, science is self-correcting; for contemporary research, however, misconduct is more of a problem.
The solution, says Fanelli, is greater transparency. “Scientists should report more faithfully what they actually did.” He suggests that if dropping a few data points lends weight to an argument, then scientists should go ahead and do so, but must admit to it. And of course, he practises what he preaches: “I’m trying to be as unbiased and objective as I possibly can.”
Fanelli, D. (2009). How Many Scientists Fabricate and Falsify Research? A Systematic Review and Meta-Analysis of Survey Data. PLoS ONE, 4(5), e5738. DOI: 10.1371/journal.pone.0005738
Comments
It’s interesting that there’s a huge taboo against just inventing data out of thin air – rightly so, of course – but there is much less of a taboo against not publishing data, i.e. sweeping it under the carpet.
If anyone were proven to have made data up, their career would be over. But every scientist knows that “uncomfortable” data often doesn’t get published. It’s a running joke at scientific meetings. Yet the consequences are just as bad.
To take an extreme example, if someone made up a result with p = 0.05, they would be sacked. But if they did a lot of work and ran twenty statistical tests until they found a “significant” p = 0.05 result just by chance (which, when there is no real effect, happens in 1 in 20 tests on average, by definition), they would… be published. It’s a big problem.
By Neuroskeptic on Friday 19 June, 2009 at 12:46 pm
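The multiple-testing arithmetic in the comment above can be checked with a short Python sketch. It assumes the twenty tests are independent and that there is no real effect, so each p-value is uniformly distributed on [0, 1]; the function names and the simulation size are illustrative choices, not anything from the original discussion. Under those assumptions, 1 in 20 is the expected rate of false positives per test, while the chance that at least one of twenty tests comes out “significant” is 1 - 0.95^20, roughly 64%.

```python
import random

ALPHA = 0.05   # conventional significance threshold
N_TESTS = 20   # number of tests run when no real effect exists (assumption)


def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Average number of 'significant' results among n tests of a true null."""
    return alpha * n_tests


def prob_at_least_one_false_positive(alpha: float, n_tests: int) -> float:
    """Exact chance that at least one of n independent null tests gives p < alpha."""
    return 1 - (1 - alpha) ** n_tests


def simulate(alpha: float, n_tests: int, n_studies: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo check: under the null hypothesis (continuous test statistic),
    each p-value is uniform on [0, 1], so p < alpha occurs with probability alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        p_values = [rng.random() for _ in range(n_tests)]
        if min(p_values) < alpha:  # at least one "significant" result in this study
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    print(f"expected false positives: {expected_false_positives(ALPHA, N_TESTS):.2f}")
    print(f"P(at least one p < {ALPHA}): {prob_at_least_one_false_positive(ALPHA, N_TESTS):.4f}")
    print(f"simulated:                 {simulate(ALPHA, N_TESTS):.4f}")
```

The exact calculation gives about 0.64, and the simulation should land close to it. If the tests are correlated (for example, many analyses of the same dataset), the figure changes, which is why the independence assumption is stated explicitly.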
My supervisor, Professor Hong Yan, numerous bioinformatics PhD students and I recently found that Belinda Herring, Deputy Director of the Australian National Biosecurity Centre (http://www.biosecurity.edu.au/index.php) at the University of Sydney, in cooperation with the Australian National University, engaged in scientific misconduct through misrepresentation or falsification of the bootstrap values, and of the maximum likelihood test statistic and its associated p-value used for testing the reliability of bootstrap trees, as well as those used for testing phylogenetic relationships. She also misrepresented or falsified the bootstrap confidence intervals as ranges in the following six published papers and in her PhD thesis, entitled “Molecular investigation of variation in HIV-1 genes”:
1. Segregation of human immunodeficiency virus type 1 subtypes by risk factor in Australia, Journal of Clinical Microbiology 2003, 41(10): 4600-4604
2. Potential drug resistance polymorphisms in the integrase gene of HIV type 1 subtype A, AIDS Research and Human Retroviruses 2004, 20(9): 1010-1013
3. Wide range of quasispecies diversity during primary hepatitis C virus infection, Journal of Virology 2005, 79(7): 4340-4346
4. Phylogenetic analysis of WNV in North American blood donors during the 2003-2004 epidemic seasons, Virology 2007, 363(1): 220-228
5. Human immunodeficiency virus type 1 superinfection was not detected following 215 years of injection drug user exposure, Journal of Virology 2004, 78(1): 94-103
6. Frequent hepatitis C virus superinfection in injection drug users, Journal of Infectious Diseases 2004, 190(8): 1396-1403
Statistically, the bootstrap is a way of testing the reliability of a dataset. It constructs pseudo-replicate datasets by resampling with replacement, which lets us evaluate whether the distribution of characters has been influenced by stochastic effects. In phylogenetic analyses, the pseudo-replicate datasets are generated by randomly sampling columns (sites) of the original character matrix, with replacement, to build new matrices of the same size as the original. The frequency with which a given branch is recovered across the replicates is recorded as its bootstrap proportion, and these proportions can be used as a measure of the reliability of individual branches in the optimal tree. Bootstrap analysis is therefore a statistical method for obtaining an estimate of error: it assesses the reliability of a tree by examining how often a particular cluster appears when nucleotide sites are resampled. The resulting bootstrap tree carries no distance information; it can only tell us how reliable particular groupings are.
If the data are fully compatible and have not been biased by stochastic effects, all bootstrap trees should in principle have the same topology. However, if the original dataset is biased, a cluster may be regarded as statistically significant even if it is a wrong one!
By ILENE CHEN on Thursday 24 September, 2009 at 12:30 am
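To make the site-resampling step described in the comment above concrete, here is a minimal Python sketch. The alignment, the sequence names and every helper function are hypothetical, and rebuilding a full phylogenetic tree for each replicate is replaced by a deliberately crude stand-in: a grouping of two sequences counts as recovered when the second is strictly closer (by p-distance) to the first than any other sequence is. Real analyses infer a complete tree per replicate with a method such as maximum likelihood or neighbour joining; only the resampling of columns with replacement and the counting of how often a grouping reappears are meant to mirror the procedure.

```python
import random

# Hypothetical toy alignment: four made-up sequences of 20 aligned sites each.
ALIGNMENT = {
    "seqA": "ACGTACGTACGTACGTACGT",
    "seqB": "ACGTACGTACGTACGAACGT",
    "seqC": "ACGAACTTACGAACGTTCGA",
    "seqD": "TCGAACTTACGAACCTTCGA",
}


def resample_sites(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """One pseudo-replicate: draw site columns with replacement, keeping the original length."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def grouping_recovered(alignment: dict[str, str], t1: str, t2: str) -> bool:
    """Hypothetical stand-in for tree building (not a real inference method):
    (t1, t2) counts as recovered only if t2 is strictly closer to t1 than every
    other sequence; ties count as not recovered."""
    d12 = p_distance(alignment[t1], alignment[t2])
    return all(
        d12 < p_distance(alignment[t1], alignment[other])
        for other in alignment
        if other not in (t1, t2)
    )


def bootstrap_proportion(alignment: dict[str, str], t1: str, t2: str,
                         n_reps: int = 1000, seed: int = 0) -> float:
    """Fraction of pseudo-replicates in which the grouping (t1, t2) reappears."""
    rng = random.Random(seed)
    hits = sum(grouping_recovered(resample_sites(alignment, rng), t1, t2)
               for _ in range(n_reps))
    return hits / n_reps


if __name__ == "__main__":
    for pair in [("seqA", "seqB"), ("seqA", "seqC"), ("seqC", "seqD")]:
        print(pair, bootstrap_proportion(ALIGNMENT, *pair))
```

With this toy alignment the (seqA, seqB) grouping should reappear in most replicates, while (seqA, seqC) should almost never do so. The sketch only measures how stable a grouping is when the same data are resampled, so it cannot reveal a bias already built into the original alignment.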