Revealing Presence-Absence Variation in Plant Genomes: Unveiling Pangenes Through Genome Alignments

As the cost of genome sequencing has fallen and both sequencing data quality and assembly algorithms have improved, interest in pangenomes has grown. Pangenomes allow a crop research community to better understand the genetic diversity of its species of interest. Because repeats make up a large proportion of many crop genomes, it is common to focus on the regions that encode genes. Researchers at the European Bioinformatics Institute, Estación Experimental Aula Dei-CSIC, and the University of Liverpool developed a method to cluster gene annotations across a set of assembled genomes into pangenes. The get_pangenes.pl program uses whole genome alignments (WGA) to identify pairs of collinear genomic segments, which are then used to lift gene models between genome assemblies. This approach avoids reference genome bias and accounts for false negative gene annotations that arise from a lack of RNA-seq evidence. The authors' benchmarks demonstrate that the protocol can successfully model presence-absence variation and is robust to annotation errors and frameshift mutations.
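To make the idea of lifting gene models through collinear segments more concrete, here is a minimal Python sketch. It is a toy illustration, not the get_pangenes.pl implementation: the `Interval` and `CollinearSegment` types, the linear-interpolation projection, and the `min_overlap` threshold are all assumptions introduced here, and strand, indels, and exon structure are ignored.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Interval:
    """A genomic interval (hypothetical type; 0-based start, exclusive end)."""
    chrom: str
    start: int
    end: int


@dataclass(frozen=True)
class CollinearSegment:
    """A pair of aligned, collinear regions in genome A and genome B."""
    a: Interval
    b: Interval


def lift(gene: Interval, seg: CollinearSegment) -> Optional[Interval]:
    """Project a gene from genome A onto genome B by linear interpolation.

    Returns None when the gene is not fully contained in the A side of the
    segment. Assumption: real alignments are gapped, so this is a crude proxy.
    """
    if gene.chrom != seg.a.chrom or gene.start < seg.a.start or gene.end > seg.a.end:
        return None
    scale = (seg.b.end - seg.b.start) / (seg.a.end - seg.a.start)
    start = seg.b.start + round((gene.start - seg.a.start) * scale)
    end = seg.b.start + round((gene.end - seg.a.start) * scale)
    return Interval(seg.b.chrom, start, end)


def overlap_fraction(x: Interval, y: Interval) -> float:
    """Fraction of x that is covered by y (0.0 if on different sequences)."""
    if x.chrom != y.chrom:
        return 0.0
    shared = min(x.end, y.end) - max(x.start, y.start)
    return max(shared, 0) / (x.end - x.start)


def match_gene(
    gene: Interval,
    seg: CollinearSegment,
    genes_b: Dict[str, Interval],
    min_overlap: float = 0.5,  # assumed threshold, for illustration only
) -> Tuple[Optional[str], Optional[Interval]]:
    """Return (matching gene name in B, lifted interval).

    A lifted interval with no matching name marks a region where genome B
    may carry an unannotated gene model.
    """
    lifted = lift(gene, seg)
    if lifted is None:
        return None, None
    for name, model in genes_b.items():
        if overlap_fraction(lifted, model) >= min_overlap:
            return name, lifted
    return None, lifted


# Hypothetical example data
segment = CollinearSegment(Interval("chr1", 1000, 5000), Interval("chr1", 2000, 6000))
annotated_b = {"geneB1": Interval("chr1", 2400, 3400)}
hit = match_gene(Interval("chr1", 1500, 2500), segment, annotated_b)
miss = match_gene(Interval("chr1", 4000, 4500), segment, annotated_b)
```

In this sketch, `hit` pairs the gene with an annotated model in genome B, while `miss` returns only a lifted interval, the kind of location where a gene model might be missing from the annotation.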

The most challenging part was to align large genomes, such as barley or wheat, which we eventually managed to do by masking out long regions with no gene annotation. - Contreras-Moreira

An exciting aspect of this approach is the potential to predict missing gene models for individual genomes from pangenes in conserved regions. - Dyer

Gramene examples:

Figure 1: Example view of a core pangene Os01g0100600

Figure 2: Example neighborhood conservation view of core pangene Os01g0100600

Image 1: Graphical summary of pangene set analysis. Genes from three example genomes are clustered based on whole-genome alignments and classified as core, shell and cloud according to their occupancy.
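The occupancy-based classification described in the caption can be sketched in a few lines of Python. The thresholds are assumptions made for illustration: a pangene is called core when it occurs in every genome, cloud when it occurs in at most `cloud_max` genomes (a hypothetical parameter, set to 1 here to suit a three-genome example), and shell otherwise. The genome and gene names are invented.

```python
from typing import Dict, List, Tuple


def classify_pangenes(
    clusters: Dict[str, List[Tuple[str, str]]],
    n_genomes: int,
    cloud_max: int = 1,  # assumed cutoff; not taken from the source
) -> Dict[str, str]:
    """Label each pangene cluster as core, shell, or cloud by occupancy.

    clusters maps a cluster ID to a list of (genome, gene_id) members.
    Occupancy is the number of distinct genomes represented in the cluster.
    """
    classes = {}
    for cluster_id, members in clusters.items():
        occupancy = len({genome for genome, _gene in members})
        if occupancy == n_genomes:
            classes[cluster_id] = "core"
        elif occupancy <= cloud_max:
            classes[cluster_id] = "cloud"
        else:
            classes[cluster_id] = "shell"
    return classes


# Hypothetical clusters over three genomes
example = {
    "pangene1": [("genomeA", "a1"), ("genomeB", "b1"), ("genomeC", "c1")],
    "pangene2": [("genomeA", "a2"), ("genomeC", "c2")],
    "pangene3": [("genomeB", "b3"), ("genomeB", "b4")],
}
labels = classify_pangenes(example, n_genomes=3)
```

Note that `pangene3` holds two genes from the same genome, so its occupancy is 1 and not 2; counting distinct genomes keeps within-genome duplicates from inflating occupancy.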

Reference:

Contreras-Moreira B, Saraf S, Naamati G, Casas AM, Amberkar SS, Flicek P, Jones AR, Dyer S. GET_PANGENES: calling pangenes from plant genome alignments confirms presence-absence variation. Genome Biol. 2023 Oct 5;24:223. https://doi.org/10.1186/s13059-023-03071-z

Related Project Websites: