Funded through the federal government's Genomics Research and Development Initiative (GRDI), dozens of researchers from 7 federal departments and agencies have been collaborating in the collection and metagenomic analysis of thousands of soil and water samples from locations across Canada. This is the Ecobiomics project—a 5‑year, multidepartment effort to advance environmental assessment, monitoring and remediation by using metagenomics technologies to better characterize and detect changes in microbial and invertebrate biodiversity.
The next level
Unlike earlier technologies—which provided the DNA sequence of one organism at a time—metagenomics technologies can generate DNA sequences for any number of organisms in a sample in one test. With this technology, researchers are able, for the first time, to see the stunning biodiversity that exists at the microbial and invertebrate levels, where a single gram of soil can contain millions of microorganisms.
Metagenomic analysis of that gram of soil can produce genetic information on thousands of species of microorganisms. Multiply that by thousands of samples, add in the new data generated as the original data is manipulated and studied, and the need for powerful bioinformatics tools becomes clear: the amount of data being generated is measured in petabytes, where a single petabyte is one million gigabytes.
A first priority
Agriculture and Agri‑Food Canada (AAFC) research scientist Dr. James Macklin—co‑leader of the Ecobiomics project—says having the capacity to manage all that data is fundamental to the success of the project. "Without a place to store the data, the software tools that enable us to do research on it, and the wide‑area networks that allow us to share large volumes of data across departments, we would never be able to realize the potential of this data to generate knowledge," says Dr. Macklin.
Making it work
The necessary computing power was delivered by Shared Services Canada through its High Performance Computing centre near Montréal. Building on the bioinformatics platform initially constructed at AAFC for the GRDI Quarantine and Invasive Species project, Glen Newton and Iyad Kandalaft at AAFC's Biological Informatics Centre of Excellence in Ottawa led the assembly of the inner workings of the system, including the algorithms that enable users of the platform to manipulate the data in various ways. Ecobiomics scientists across 7 departments and agencies can now log in to analyze their data.
Setting the standard
In order to make the best possible end‑use of all this data—to inform future environmental assessments or provide evidence to recommend more sustainable and productive agricultural practices, for example—its reliability has to be beyond question. For the Ecobiomics project, that meant everyone involved would have to use the same protocols and methods to collect and analyze samples, and use the same scientific vocabulary to describe their findings.
"Without a standardized approach, you can't do a valid comparison of the results of research conducted by different departments, in different parts of the country, or at different times," says Dr. Macklin. "Thanks to the collaboration enabled by the GRDI and the vision of the researchers involved in this project, we've been able to develop and implement a standardized approach that ensures our data is as useful and reliable as possible."
A scientific legacy
While the Ecobiomics project itself will come to an end in March of 2021, the bioinformatics capacity that has been assembled and the data it holds will continue to serve researchers both inside and beyond the Government of Canada for years to come.
"This is a powerful resource," says Dr. Macklin. "And with these standardized protocols in place, we will be able to maintain the integrity of the system as future projects of this kind generate new data to be added to the platform."
A remarkable accomplishment
Dr. Tom Edge, who initially co‑led the Ecobiomics project at Environment and Climate Change Canada (ECCC), says it is difficult to overstate the value of this cross‑government approach.
"Without this collaboration, individual departments would be struggling to find the resources to store and manage all this data on their own," says Dr. Edge. "Without the standardized protocols, all this information about water quality and soil health would be hard to compare across studies and it would remain in departmental silos, instead of being a resource available to researchers throughout the Government of Canada."
Now an adjunct professor of biology at McMaster University in Hamilton, Dr. Edge says this degree of interdepartmental collaboration, pioneered in GRDI Shared Priority Projects, is a uniquely Canadian success story. "I've had informal chats with colleagues in other countries, and they just kind of smile when they try to imagine how they could possibly achieve this kind of interdepartmental collaboration and agreement within their government."
The Ecobiomics bioinformatics platform is hosted on supercomputers at Shared Services Canada's High Performance Computing environment near Montréal, home to some of the fastest computers in the world.