Reproducible research for biofuels and biogas

By GigaScience | July 31, 2015

New research in the open access journal GigaScience presents a virtual package of data for biogas production, made reusable in a containerized form to allow scientists to better understand the production of biofuels.

One of the promising areas in biofuels development is biogas, which has huge potential as a renewable and clean source of energy. Biogas, which consists largely of methane, is produced through the anaerobic digestion (fermentation) of organic matter such as agricultural or food waste. Detailed knowledge of how the fermentation process works is key to optimizing it; however, the vast majority of the microbes involved remain unknown and cannot be cultivated in laboratories.

In new research just published in the open access journal GigaScience, researchers from Bielefeld University in Germany have characterized the complex communities of micro-organisms in a biogas plant that generates heat and power from maize silage and pig manure. The authors also took the unusual step of making their research more reproducible by creating a virtual 'container' of their data and tools.

For their study, the researchers carried out metagenomic and meta-transcriptomic analyses, generating DNA and RNA sequences from the thousands of microbial species present. From these sequences they created a catalogue of 250,000 genes that enabled them to begin defining the underlying biology of methane production. While this analysis only scratches the surface of the vast amount of information gathered, the authors increased the usefulness of the resource by releasing all of the data and computational methods as a shareable container. The container enables others, at the press of a few buttons, to execute the same analyses in the cloud. This not only makes the research reproducible, but also allows researchers around the world to build on these resources to more rapidly delineate the important processes involved in biogas generation and to better explore its use for biofuel.

As experiments become more data-intensive, reviewing and publishing the methods and results of scientific studies become increasingly challenging. To get around this, the authors used the rapidly emerging Docker platform, which effectively wraps software in a system that includes everything needed to rerun it. This removes the need for other researchers to install and maintain the many complex bioinformatics tools and software libraries: something that can be very technically challenging for researchers without the computational resources and skills.
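As a rough illustration of what rerunning a containerized analysis can look like, the Python sketch below assembles a `docker run` command that mounts a local data directory into a container. It is a minimal sketch, not the authors' actual workflow: the image name `example/biogas-analysis`, the host path, and the `/data` mount point are hypothetical placeholders, and the helper `build_docker_run_command` is introduced here purely for illustration. The script only prints the command rather than executing it.

```python
import shlex


def build_docker_run_command(image, host_data_dir, container_data_dir="/data"):
    """Return the argument list for a `docker run` call.

    The host directory is mounted into the container so the packaged
    tools can read input data and write results back to the host.
    """
    return [
        "docker", "run",
        "--rm",  # remove the container once the analysis exits
        "-v", f"{host_data_dir}:{container_data_dir}",  # bind-mount the data
        image,
    ]


if __name__ == "__main__":
    # Hypothetical image name and path, used only for illustration.
    cmd = build_docker_run_command("example/biogas-analysis", "/home/user/biogas-data")
    print(shlex.join(cmd))
```

Because the tools and their dependencies live inside the image, the only software a researcher would need on the host in a setup like this is Docker itself.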

“We decided to use virtualisation techniques to encapsulate our analysis workflow and make it basically independent from the host it is executed on,” says Andreas Bremges, first author of the study. Peter Belmann built the Docker container for the biogas study, and is a core team member of the bioboxes project to standardize interchangeable bioinformatics software containers.

“The reproducibility of published research is an important aspect of science,” highlights Peter Li, lead data manager at GigaScience, who exactly recreated the results in the paper, a step that is extremely unusual in scientific publishing. “Andreas and his colleagues provided a Docker container that encapsulated the method used to process the data from their biogas study. This made my job of checking the reproducibility of their results much easier as their Docker container took care of installing the bioinformatics tools and their dependencies on my cloud server.”

The use of Docker in this “container” publication is a step towards moving publishing away from static and often irreproducible papers, which have changed little since the 17th century, to more reproducible digital objects that better fit 21st-century technology.