Published : August 13, 2018 - 1:01 PM
Michael Major, Crop Trust
Pre-breeders generate a lot of data. A LOT of data. They can make thousands of crosses between wild and domesticated species of food crops and evaluate those thousands of crosses under various conditions, in different climates and countries. Then they’ll make backcrosses and evaluate those crosses. Collecting and managing the data is hard work, and analyzing it is an even bigger challenge, but one that must be addressed if pre-breeding is going to contribute to the development of sturdier ‘climate-proof’ crops.
The Crop Wild Relatives (CWR) Project coordinated by the Crop Trust is managing pre-breeding projects on 19 crops. “These projects are bringing back to our most important crops the many useful traits that their wild cousins still have in the genetic make-up, but the crops themselves have left behind. By crossing and backcrossing these plants, our partners are generating complex data in huge quantities,” said Hannes Dempewolf, Head of Global Initiatives at the Crop Trust. “For example, the sunflower pre-breeding project resulted in 545,000 molecular markers. This kind of data is an amazing contribution to the breeding community. We wanted to make sure it is publicly available, so we could maximize the number of breeders around the world that use it, and the plants themselves, in their improvement programs.”
Pre-breeding data available to everyone
The Crop Trust teamed up with the James Hutton Institute in Invergowrie, Scotland, to ensure the CWR project’s pre-breeding data is available in a format that allows breeders and scientists to view and analyze it as easily as possible. Hutton has been developing software known as Germinate, which is specifically tailored to handle the complex data that arise from the use of plant genetic resources collections.
“We wanted to create a tool so that researchers and breeders can share data about different crops on a customizable, yet common, platform,” said Paul Shaw, a research leader in Information Systems at Hutton. “So we developed Germinate, which is a database system that can be used to view and select plant genetic resources data and then analyze it using various visualization tools.”
It became apparent that Germinate 3 – the latest version of Germinate – would be a perfect fit for the CWR project’s pre-breeding data. “We began looking at Germinate 3 as a platform for us to share our pre-breeding data with the world back in 2014,” said Hannes. “We wanted an easy-to-use tool that allows users to drill down through our partners’ massive datasets and make decisions which would help in their breeding or research activities. We felt the James Hutton Institute was ideally suited to lead this effort due to their experience in handling such data.”
New technologies are making it much easier and cheaper to generate huge amounts of phenotypic and genotypic information. But storing data is just a start. Users need to be able to find the data they require, so presenting it in a user-friendly and intuitive interface is equally important.
“Germinate 3 fills a role not offered by other plant genetic resources software platforms,” said Paul. “In short, it is capable of integrating both genotypic and phenotypic data with passport data.”
Additionally, Hutton has developed versatile graphical search functions on Germinate 3. “These allow users to identify groups of samples that meet selected passport, molecular or phenotypic criteria,” said Sebastian Raubach, Hutton’s Bioinformatics Software Developer who has worked extensively on the development of Germinate 3. “Once users have identified the data they are interested in, they can download it in a variety of formats.”
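To make the idea concrete, here is a minimal Python sketch of what integrating the three data types and then selecting a group of samples could look like. It is an illustration only, not Germinate 3's actual code or data model: the `Accession` class, the `integrate` and `select_group` functions, and all field names (`id`, `country`, `trait`, `marker`, `allele`) are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Accession:
    """One sample, keyed by a hypothetical accession ID."""
    accession_id: str
    passport: dict = field(default_factory=dict)    # e.g. country of origin
    phenotypes: dict = field(default_factory=dict)  # trait name -> measured value
    genotypes: dict = field(default_factory=dict)   # marker name -> allele call


def integrate(passport_rows, phenotype_rows, genotype_rows):
    """Merge three lists of records into one Accession per accession ID."""
    merged = {}

    def get(acc_id):
        # Create the Accession the first time an ID is seen.
        return merged.setdefault(acc_id, Accession(acc_id))

    for row in passport_rows:
        get(row["id"]).passport.update(
            {k: v for k, v in row.items() if k != "id"})
    for row in phenotype_rows:
        get(row["id"]).phenotypes[row["trait"]] = row["value"]
    for row in genotype_rows:
        get(row["id"]).genotypes[row["marker"]] = row["allele"]
    return merged


def select_group(merged, country=None, marker=None, allele=None,
                 trait=None, min_value=None):
    """Return sorted IDs of accessions that meet every criterion given."""
    hits = []
    for acc in merged.values():
        # Passport criterion (assumed field name: "country").
        if country is not None and acc.passport.get("country") != country:
            continue
        # Molecular criterion: the marker must carry the requested allele.
        if marker is not None and acc.genotypes.get(marker) != allele:
            continue
        # Phenotypic criterion: the trait must be measured and large enough.
        if trait is not None:
            value = acc.phenotypes.get(trait)
            if value is None or (min_value is not None and value < min_value):
                continue
        hits.append(acc.accession_id)
    return sorted(hits)
```

In this sketch, a call such as `select_group(merged, country="ETH", trait="plant_height_cm", min_value=100)` would return the IDs of samples that satisfy both a passport and a phenotypic criterion. Keying every record on a shared sample identifier is what lets the three data types be queried together.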
But the real power of Germinate 3 lies in its ability to integrate with a range of external data visualization software. These programs allow breeders to review large datasets in easily digestible graphics.
“We built functions into Germinate 3 which allow plant breeders and other scientists to export the data and then import it into these visualization programs,” said Paul. “That means users can perform complex analyses of the data outside of Germinate 3 and generate data-rich graphics.”
Hutton has also developed several external programs which provide user-friendly analysis of large datasets. Helium can help breeders determine the “genealogy” of a plant line. Flapjack helps users compare lines, markers and chromosomes by visually displaying similarities. CurlyWhirly can help users find patterns and outliers in the data.
Thus far, the Hutton team have created Germinate 3 pilot platforms for rice and sunflower using the CWR project’s pre-breeding data. Although the species differ, the approach taken ensures that the tools are compatible and that developments made for one crop can benefit all the others. In other words, work that Hutton has done on the rice database will benefit the Crop Wild Relatives’ durum wheat project, and all the others as well.
Going Live
Pre-breeding data from the following crops will be made available on Germinate 3:
Alfalfa | Barley |
Chickpea | Cowpea |
Durum wheat | Eggplant |
Finger millet | Grass pea |
Lentil | Pearl millet |
Pigeonpea | Rice |
Sorghum | Sunflower |
“In the following months, we will be developing and deploying a series of web portals using Germinate 3 to support access to data from 14 of our CWR pre-breeding projects,” said Benjamin Kilian, Plant Genetic Resource Specialist at the Crop Trust. “We are now ready to launch the portal for the eggplant pre-breeding project.” The CWR Eggplant Database lists data on nearly 1,000 eggplant samples and more than 1,500 molecular markers.
“The Eggplant Database will give us an opportunity to receive some user feedback concerning Germinate 3,” said Benjamin. “This will help Hutton further improve on the product, as we continue releasing the data from our partners’ other pre-breeding projects.”
The CWR pre-breeding projects will continue to generate important data that will help plant breeders improve many of our food crops, making them more resilient to climate change. With Germinate 3, plant genetic scientists and breeders can rest assured that this data will long remain readily available on a versatile and powerful platform.
###
All material collected under the Crop Wild Relatives project is shared under the terms of the Standard Material Transfer Agreement (SMTA) within the framework of the Multilateral System of the International Treaty on Plant Genetic Resources for Food and Agriculture (ITPGRFA).
All data is publicly available and shared under a Creative Commons license CC 4.0. A short demonstration video of how the license is implemented can be found here. Furthermore, all users of the database are encouraged to use digital object identifiers (DOIs), a system of unique identifiers, when referencing germplasm, as implemented by the Global Information System under the ITPGRFA.