Developing new methods to determine genomic relationships for improved breeding

Project start date: 10 June 2013
Project end date: 03 March 2015
Publication date: 01 September 2015
Project status: Completed
Livestock species: Grassfed cattle, Grainfed cattle
Relevant regions: National
Download Report (0.5 MB)

Summary

The ability to accurately infer animal relationships from shared genetics underpins both genomic selection and the interpretation of genome-wide association studies (GWAS). These in turn drive the speed of artificial selection and the discovery of genes that contribute to commercial traits. In previous milestone reports we explored the inference of cattle and sheep population relationships using a new similarity metric called Normalised Compression Distance (NCD). This approach yields a Compression Relationship Matrix (CRM). Like existing correlation-based genetic relationship analyses such as the genomic relationship matrix (GRM), the new metric quantifies similarity between numerical patterns in SNP data - that is, the allele composition and order shared (to varying extents) by genome pairs. Not surprisingly, we previously found very high concordance between CRM and GRM, so any genetic ranking made by the two methods would be very similar. A striking finding was that the new CRM approach can genetically discriminate very closely related individuals (such as half-sibs versus full sibs) in sheep and cattle populations where GRM cannot.
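To illustrate the idea behind NCD, the following is a minimal sketch rather than the project's actual pipeline. It uses the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed size. The choice of zlib as the compressor, the 0/1/2 genotype coding, and the toy data are all assumptions made for illustration; the summary does not state which compressor or encoding was used.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (compressor is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalised Compression Distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def random_genotypes(rng: random.Random, n_snps: int) -> bytes:
    """Toy genotype string, one character per SNP, coded 0/1/2 (hypothetical encoding)."""
    return "".join(rng.choice("012") for _ in range(n_snps)).encode("ascii")


rng = random.Random(0)
animal_a = random_genotypes(rng, 2000)
unrelated = random_genotypes(rng, 2000)
# Toy "relative": identical to animal_a over the first half of the SNPs.
relative = animal_a[:1000] + random_genotypes(rng, 1000)

d_self = ncd(animal_a, animal_a)
d_relative = ncd(animal_a, relative)
d_unrelated = ncd(animal_a, unrelated)
print(d_self, d_relative, d_unrelated)
```

In this toy setting the distance grows as the shared stretch of SNPs shrinks, which is the property a compression-based relationship measure relies on.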
In this final report we focus on a deeper exploration of the genetics of yearling weight in Brahman (BB) and Tropical Composite (TC) cows. Using the latest 71K Indicus SNP chip, we explore heritability and genetic parameters under the usual assumptions. Further, we systematically explore the impact of the 3 different relationship matrices (NRM, GRM and CRM), both in isolation and in all combinations. Surprisingly, we find that an adapted version of the CRM reduces the 'missing heritability' associated with yearling weight in both breeds. The NCD output first needs to be mapped in such a way that 1) it takes advantage of the high sensitivity of NCD but then 2) grounds the output more strongly in the biology of genetic inheritance via meiosis (~0.5 sharing for full sibs, ~0.25 for half-sibs and so on). Finally, a sliding window-based application of the compression approach recovers regions of evolutionary interest between the two populations. Some of these regions overlap with established signatures of selection; the remainder presumably reflect bottlenecks and other population history phenomena.
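The sliding-window idea can be sketched as follows. This is an illustrative guess at the general shape of such an analysis, not the method used in the report: the window and step sizes, the way animals are concatenated within a window, the zlib compressor, and the simulated populations are all assumptions. The toy data plant one region where every animal in both populations carries the same haplotype, and the scan looks for the window where the between-population NCD is lowest.

```python
import random
import zlib


def ncd(x: bytes, y: bytes) -> float:
    """Normalised Compression Distance with zlib as the (assumed) compressor."""
    def size(data: bytes) -> int:
        return len(zlib.compress(data, 9))
    cx, cy = size(x), size(y)
    return (size(x + y) - min(cx, cy)) / max(cx, cy)


def windowed_ncd(pop_a, pop_b, window, step):
    """Return (window start, NCD between the two populations) for each window.

    pop_a and pop_b are lists of equal-length genotype strings, one per animal,
    with SNPs in genome order. Within a window, animals are simply concatenated.
    """
    n_snps = len(pop_a[0])
    profile = []
    for start in range(0, n_snps - window + 1, step):
        a = "".join(g[start:start + window] for g in pop_a).encode("ascii")
        b = "".join(g[start:start + window] for g in pop_b).encode("ascii")
        profile.append((start, ncd(a, b)))
    return profile


def toy_population(rng, n_animals, n_snps, shared, region):
    """Random genotypes, except that every animal carries `shared` in `region`."""
    lo, hi = region
    animals = []
    for _ in range(n_animals):
        g = "".join(rng.choice("012") for _ in range(n_snps))
        animals.append(g[:lo] + shared + g[hi:])
    return animals


rng = random.Random(2)
shared_hap = "".join(rng.choice("012") for _ in range(100))
pop_a = toy_population(rng, 8, 600, shared_hap, (200, 300))
pop_b = toy_population(rng, 8, 600, shared_hap, (200, 300))

profile = windowed_ncd(pop_a, pop_b, window=100, step=50)
lowest_start, lowest_ncd = min(profile, key=lambda item: item[1])
print(lowest_start, lowest_ncd)
```

On real data the interesting windows would be those that stand out from the genome-wide background in either direction, and they would need to be checked against known signatures of selection, as the summary describes.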
We met the 4 research objectives as follows:
1) We have used compression efficiency to accurately infer animal to animal genetic relatedness. We ground-truthed our new output based on the known biology of the 4 populations in question (2 cattle and 2 sheep), and compared it with the independent predictions made by GRM. Several lines of evidence imply CRM is particularly sensitive at discriminating very closely related individuals.
2) We have used compression efficiency to verify parentage. We found that sire groups from the Faulkner sheep flock could be successfully clustered based on compression efficiency, producing 3 main clusters that correctly reflect breed-level inter-relatedness.
3) We have used CRM to predict Estimated Breeding Values. We used the relationships predicted by CRM to estimate genetic parameters for BB and TC cattle. A version of CRM performs very well, explaining more genetic variance, exhibiting a reduction in missing heritability and yielding an increase in phenotype accuracy compared to not only NRM but also GRM.
4) We have published one manuscript in BMC Bioinformatics (Hudson et al 2014b), presented at the WCGALP 2014 conference (Hudson et al 2014a), and have another manuscript in preparation.
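As a rough illustration of how pairwise NCD values might be assembled into a relationship matrix and grounded in expected sharing (~0.5 for full sibs), consider the sketch below. The summary does not describe the actual mapping used for the adapted CRM, so the linear calibration here is purely hypothetical: it anchors a known unrelated pair at 0 and a known full-sib pair at 0.5, and sets the diagonal to 1. The compressor, the genotype coding, and the simulated animals are likewise assumptions.

```python
import random
import zlib


def ncd(x: bytes, y: bytes) -> float:
    """Normalised Compression Distance with zlib as the (assumed) compressor."""
    def size(data: bytes) -> int:
        return len(zlib.compress(data, 9))
    cx, cy = size(x), size(y)
    return (size(x + y) - min(cx, cy)) / max(cx, cy)


def ncd_matrix(genomes):
    """Symmetric matrix of pairwise NCD values (diagonal left at 0.0)."""
    n = len(genomes)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = ncd(genomes[i], genomes[j])
    return dist


def calibrate(dist, unrelated_pair, full_sib_pair):
    """Hypothetical linear rescaling: unrelated pair -> 0.0, full-sib pair -> 0.5."""
    d_unrel = dist[unrelated_pair[0]][unrelated_pair[1]]
    d_sib = dist[full_sib_pair[0]][full_sib_pair[1]]
    scale = 0.5 / (d_unrel - d_sib)
    n = len(dist)
    # Diagonal fixed at 1.0 (self-relationship, ignoring inbreeding).
    return [[1.0 if i == j else (d_unrel - dist[i][j]) * scale
             for j in range(n)] for i in range(n)]


def toy_genome(rng, n_snps):
    """Toy genotype string coded 0/1/2, one character per SNP."""
    return "".join(rng.choice("012") for _ in range(n_snps)).encode("ascii")


rng = random.Random(1)
base = toy_genome(rng, 2000)
genomes = [
    base,                                  # 0: reference animal
    base[:1000] + toy_genome(rng, 1000),   # 1: shares half its SNP string with 0
    base[:500] + toy_genome(rng, 1500),    # 2: shares a quarter with 0
    toy_genome(rng, 2000),                 # 3: unrelated to all others
]
crm = calibrate(ncd_matrix(genomes), unrelated_pair=(0, 3), full_sib_pair=(0, 1))
print(crm[0][1], crm[0][2], crm[0][3])
```

A matrix rescaled in this way sits on the same scale as NRM and GRM, so it could in principle be supplied to standard mixed-model software when estimating genetic parameters.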

More information

Project manager: Terry Longhurst
Primary researcher: CSIRO