We recently undertook an effort to project our manually curated Plant Reactome Oryza sativa (rice) pathway data onto Zea mays (maize) and Arabidopsis thaliana, and we were curious to compare the gene-level projection results produced by the Compara and Inparanoid methods. We wrote a Python script that generates gene lists for our curated rice gene products and maps them to pre-computed projections from the Compara workflow and from an in-house implementation of Inparanoid. The same script can also create Venn diagrams illustrating the overlap between the two projection sets (see following figures).
For the Compara projection data, we isolated the gene identifiers and their UniProt counterparts in our curated rice pathways and downloaded Ensembl Plants release 40 orthology predictions for Z. mays and A. thaliana, using O. sativa as the reference. Using Compara's reciprocal identity data, we limited the orthology predictions to those meeting a threshold of 30% reciprocal identity for Arabidopsis and 40% reciprocal identity for maize. Both high- and low-confidence Compara data were used.
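The thresholding step could be sketched roughly as follows. This is an illustration only, not the script described above: the column names (`rice_gene`, `target_species`, `target_gene`, `target_pct_id`, `query_pct_id`) and the gene identifiers are made up for the example, and it assumes that "reciprocal identity" means both directional identity values must meet the species cutoff.

```python
import csv
import io

# Per-species cutoffs (percent reciprocal identity) quoted in the text.
THRESHOLDS = {"arabidopsis_thaliana": 30.0, "zea_mays": 40.0}


def filter_orthologs(rows, curated_genes, thresholds=THRESHOLDS):
    """Keep rows for curated rice genes whose identity values both meet the cutoff.

    Assumption: each row carries two percent-identity values, one per
    direction, and a prediction passes only if both reach the threshold.
    """
    kept = []
    for row in rows:
        if row["rice_gene"] not in curated_genes:
            continue  # not part of the curated pathway set
        cutoff = thresholds.get(row["target_species"])
        if cutoff is None:
            continue  # species we are not projecting to
        if (float(row["target_pct_id"]) >= cutoff
                and float(row["query_pct_id"]) >= cutoff):
            kept.append(row)
    return kept


# Toy tab-separated input with hypothetical identifiers and column names.
SAMPLE_TSV = (
    "rice_gene\ttarget_species\ttarget_gene\ttarget_pct_id\tquery_pct_id\n"
    "OS01G0100100\tzea_mays\tZm00001d000001\t55.0\t48.2\n"
    "OS01G0100100\tzea_mays\tZm00001d000002\t45.0\t35.0\n"
    "OS01G0100100\tarabidopsis_thaliana\tAT1G00001\t35.0\t31.0\n"
    "OS02G0200200\tarabidopsis_thaliana\tAT2G00002\t28.0\t60.0\n"
    "OS09G0999999\tzea_mays\tZm00001d000003\t90.0\t90.0\n"
)

curated = {"OS01G0100100", "OS02G0200200"}
rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
kept = filter_orthologs(rows, curated)
kept_targets = [row["target_gene"] for row in kept]
print(kept_targets)
```

Keeping the cutoffs in one dictionary makes it easy to rerun the same filter at a different stringency for each species.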
To gather Inparanoid projection data, we ran a customized in-house super-cluster program that generates the best clustered projection hits for a specific set of compared species – in this case, rice, Arabidopsis, and maize.
In our interpretation, the comparison shows substantial overlap between the two projection methods when the Compara stringency threshold is kept at or below 30% reciprocal identity (this is likely a species-dependent observation). As we raised the stringency in Compara, a correspondingly larger number of projections dropped out of the intersection set of projected loci, with most of them remaining only among the Inparanoid projections. This suggests that Inparanoid super-clustering is more aggressive than Compara in its projection methodology, subject to variables such as the species involved and the identity thresholds applied.
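The effect of raising the threshold on the overlap can be illustrated with plain set arithmetic. The sketch below is hypothetical: the gene pairs and identity values are invented toy data, and it assumes each Compara projection has been reduced to a single reciprocal identity score, while the Inparanoid set is fixed.

```python
def overlap_summary(compara_pairs, inparanoid_pairs):
    """Count shared and method-specific (rice gene, target gene) pairs."""
    return {
        "shared": len(compara_pairs & inparanoid_pairs),
        "compara_only": len(compara_pairs - inparanoid_pairs),
        "inparanoid_only": len(inparanoid_pairs - compara_pairs),
    }


def sweep_thresholds(compara_scored, inparanoid_pairs, thresholds):
    """Recompute the overlap for each Compara identity threshold.

    compara_scored maps (rice_gene, target_gene) to a reciprocal identity
    percentage; pairs below the threshold are dropped from the Compara set.
    """
    results = {}
    for threshold in thresholds:
        passing = {pair for pair, identity in compara_scored.items()
                   if identity >= threshold}
        results[threshold] = overlap_summary(passing, inparanoid_pairs)
    return results


# Invented toy data, for illustration only.
compara_scored = {
    ("r1", "a1"): 62.0,
    ("r2", "a2"): 41.0,
    ("r3", "a3"): 33.0,
    ("r4", "a4"): 30.0,
}
inparanoid_pairs = {("r1", "a1"), ("r2", "a2"), ("r3", "a3"), ("r5", "a5")}

summary = sweep_thresholds(compara_scored, inparanoid_pairs, [30, 40, 50])
for threshold, counts in summary.items():
    print(threshold, counts)
```

The three counts per threshold correspond to the three regions of a two-set Venn diagram. In this toy data the shared count shrinks and the Inparanoid-only count grows as the threshold rises, which is the pattern described above.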
At this time, we have decided to use Ensembl Compara projection as the basis for our Plant Reactome projections. You can browse our curated and projected pathways at the Plant Reactome site.