Reproducibility is not a newly adopted principle; in fact, it dates back to the 1600s. The Irish chemist Robert Boyle was the first to emphasize the importance of obtaining the same results when a study is re-created. Since then, scientists have consistently reflected on how adopting a reproducible workflow helps advance science. The question is, does it only benefit science? During an invited talk at the Edinburgh ReproducibiliTea Journal Club, Kaitlyn Hair explained how sharing your data, materials and code is also in your own interest.
Perhaps the most important reason for adopting a reproducible workflow is simply to avoid a disaster, according to Hair. Researchers all around the world publish their work continuously, and these publications lead to other ideas, discoveries and products such as vaccines and cancer therapeutics. Some years ago, scientists from Duke University published a paper in which they claimed to have found a way to target tumours efficiently based on their genetic sequencing. It is not hard to imagine how important this achievement seemed at the time. However, after several failed attempts to replicate the results, scientists came to a shocking realisation: the promising findings were only a by-product of a technical problem that occurred while copying the data from an Excel sheet to another statistical program. Because the authors were unable to provide evidence against the fraud claims, “this mistake was career ending for some of its authors”, says Hair.
Mistakes of this kind are not as rare as we would wish. However, adopting open science principles can be a lifesaver in such situations. This was the case for Dr. Julia Strand. In 2018, she published the greatest achievement of her career. It appeared that there was a way to improve speech perception and reduce the cognitive effort we spend in noisy environments: simply presenting a modulating circle that expanded as the speech got louder made participants respond faster, or so she thought. In 2020, she published a blog post that drew a lot of attention from scientists. “The central finding was the result of a software glitch…” she said. She had obtained her result only because she had made a mistake while programming the timing clock. In this case, however, the code was openly available, the study was pre-registered and her work was fully transparent. The fact that she detected the mistake herself also helped her show that this was not scientific fraud.
Another reason for working reproducibly, according to Hair, is that it makes writing easier. Writing a paper is rarely a smooth process if you don’t use tools like R, OSF and GitHub along the way. Having to copy your figures and paste them into a Word document, or going back and forth to build a table for your analysis, can be overwhelming, especially on a tight schedule. One of the best features of R is that you can run your analysis, make tables and figures, and write up your results at the same time. On top of that, Hair explains that the rticles package provides RMarkdown templates for many different paper formats, such as the PLOS or Frontiers styles, and that knitting the document with the knitr package lets you save it as a PDF, Word document or HTML file when you are done.
In addition, if you are collaborating with other scientists, you can easily end up with dozens of updated versions of the same code. GitHub is a great tool for avoiding getting lost in a pool of code files in such situations. All you need to do is upload your files to a remote GitHub repository so that your co-authors can pull them to their own computers, make changes and push them back to the repository. The repository keeps track of the changes being made and who made them, which means you don’t need to keep every version on your computer.
Working reproducibly also ensures continuity, according to Hair. Especially as you progress in your career and publish more often, you will notice that you have forgotten what you did, what the variable X in your dataset refers to and how it differs from the variable Y that looks just like it. Uploading your notes to OSF or creating a bookdown page can help you remember what you did and keep your team informed without having to go through everything again and again. Likewise, you can use GitHub to share your code along with README files that explain exactly what the code does.
Furthermore, working reproducibly helps you get through peer review. “If you are using RMarkdown, it helps the reviewers understand what you have done,” says Hair. Reviewers can simply download your data and RMarkdown file and re-run the whole analysis, which can improve the review process and help avoid misunderstandings.
Reproducibility can also help you build your reputation and future-proof your work. Open science practices such as sharing your code and data are increasingly being adopted not only by scientists but also by stakeholders. Some funding is available specifically for open science projects, so adopting open science practices can help you secure funding for your studies. Hair explains that some journals, such as eLife, are also sharing “living figures”, in which the text and figures are updated in the light of new data and information. It is possible to create such work using RMarkdown, and you can even submit it directly to journals like PLOS One.
Finally, Hair underlines that adopting open science practices eventually leads to more citations, recognition and opportunities, and therefore helps you build a career as a scientist. She makes a convincing case and finishes her talk by saying:
“It’s not just good for science – it’s good for you!”
Open science practices such as reproducible and transparent workflows are becoming widespread among scientists. Whatever your reason, be it a selfless wish to advance science or a desire to save yourself quite a lot of time, the best time to start is now.
Check out the full video on the Edinburgh ReproducibiliTea YouTube page for more detail on “Selfish Reasons to Work Reproducibly” by Kaitlyn Hair.
Her talk includes the reasons discussed in:
Markowetz, F. Five selfish reasons to work reproducibly. Genome Biol 16, 274 (2015).
Read more about the Duke University paper, Dr. Julia Strand’s blog post and the Living Figures.
For the materials related to this talk, please refer to our OSF page.
This blog post was written by Bengü Kalo.