Notes from the Biocuration 2017 conference

by Sushma Naithani and Parul Gupta

The 10th International Biocuration Society conference (http://med.stanford.edu/biocuration.html) was held at Stanford University from 25 to 29 March 2017. One major focus of the meeting was how to improve and support biocuration and make it cost-effective. Several groups discussed text-mining tools that can extract paper summaries, figure captions, gene associations, and metabolic pathways from text. Tools such as MedlineRanker extract pathway genes for rice and other species from the published literature, including their external links to GenBank. Others discussed tools that help identify and prioritize publications for curation of high-quality information, so that resources are put to best use (e.g. PubTator from NCBI). The WikiGenomes and WikiData projects talked about crowdsourcing biocuration. Taner Sen discussed technology vs. usefulness of resources and how to prioritize between them.
Peter Karp, in his talk “Current issues in biocuration”, compared manual curation with text mining in terms of cost and accuracy. Error rates of information-extraction programs are much higher than those of the professional curation process. The cost of curation was $219 per article over a 5-year period. That cost is 6–15% of the open-access fees for publishing biomedical articles, and was estimated at 0.088% of the cost of the overall research project that generated the experimental results. A workshop on the topic “Reading, assembling and Reasoning for Biocuration” focused on the capture of big mechanisms, which includes machine reading of articles and semi-automated assembly of signaling pathway models for diseases and for the effects of drugs used to treat them. This workshop was a good example of automated curation and biological modeling.
Another workshop, organized by Fabio Rinaldi and his team on the topic “Biocuration and research life cycle: Advances and challenges”, provided a forum to discuss major advances and challenges in the biocuration workflow: collecting information from publications, database entry, and extraction and triage of data from figures or supplementary information. Other useful resources presented were ExplorEnz, an enzyme database useful for metabolic pathway curators, and NaviCell (with the NaviCom tool), which uses the Google Maps concept to display and browse gene-gene interaction networks.
The meeting had 150 participants, including Sushma Naithani and Parul Gupta from the Gramene database. The Gramene curators presented a poster on Plant Reactome (http://plantreactome.gramene.org/), the pathway portal of Gramene. The meeting was particularly useful for our curators to learn about the various resources and tools available for curation and to network with the community. The Gramene curators explored the possibility of collaborating with EMBL-EBI curators working on plant protein complexes and gene-gene interaction networks (IntAct), of implementing standardized descriptions of scientific evidence using the Evidence and Conclusion Ontology (ECO), and of participating in Google Summer of Code. The 3D anatomical models (shown for mouse by Chris Armit), which display gene expression and other data types in a tissue- and cell-specific manner, were impressive and could be used for other models. The meeting was especially useful for developing a standard operating protocol for pathway curation. Sushma Naithani acknowledges travel support from the International Society for Biocuration.