The 18th annual International Biocuration Conference was held in Kansas City, Missouri, April 5-9. This event brought together biocurators, software developers and users of life sciences and clinical data to share their work, encourage collaboration, and highlight the essential curatorial efforts that support research and innovation across academia, government, and industry. There were 135 attendees (96 in-person and 39 virtual), 14 workshops, 84 abstracts, and 45 posters (35 in-person and 10 virtual). The conference was co-organized with the International Society of Biocuration (ISB).
The first two days revolved around 15 concurrent workshops with great content and lively discussions. The main three days of the conference featured a unified schedule, allowing everyone to focus on the same topics and connect over shared experiences during breaks and networking sessions. In addition, there were two poster sessions and a workshop on Biocuration Career Opportunities on the last day.
Artificial Intelligence (AI) was a running theme, with a focus on what is already being applied in biocuration and what else could potentially be done with it. Although there continues to be a healthy degree of skepticism within the biocuration community, which aims for “gold standard” sets, there seems to be a growing sense of optimism. Biocurators continue to be crucial in generating the highest-quality datasets needed to train and validate large language models (LLMs). AI tools are rapidly proliferating and evolving, and hence require expert curators to devise, assess, and inform their development. The hope is that AI will assist curatorial tasks to increase efficiency.
Dr. Marcela Karey Tello-Ruiz of Cold Spring Harbor Laboratory presented the poster titled “Accelerating Agricultural Research Through Interoperable Genetic and Phenotypic Variation Data”. This work represents the collaborative efforts of the Gramene Plants and SorghumBase Teams, and a milestone of the Standards for Genetic Variation Working Group of the AgBioData Consortium, which Marcela co-chairs with Timothee Cezard of the European Variation Archive. Collaborators Melanie Harrison of the Germplasm Resource Information Network (GRIN Global) and Sarah Dyer of Ensembl Plants were also involved in this work. Dr. Sushma Naithani of Oregon State University gave a talk on how Gramene’s Plant Reactome combines manual biocuration, AI tools, and gene orthology to map metabolic, transport, hormone signaling, and gene regulatory pathways across ~130 plant species. Dr. Naithani also described her NASA-funded undergraduate students’ projects for the curation of gravitropism pathways.
Conference keynote speakers included:
Nirav Merchant, University of Arizona, CyVerse - “Egalitarian AI: Enabling Exploration and Experimentation for Everyone”. This keynote addressed the importance of AI being egalitarian in terms of resources, access and training, as well as the importance of being involved in building the technology to guide its priorities. Nirav described his scientific journey centered around the evolution of the NSF-funded CyVerse Project, shared seminal resources for understanding foundation models, and offered valuable insights, including his top takeaways from the Artificial Intelligence Index Report. He also described AI-Verde, a unified service designed to meet the research and teaching needs of a university campus by facilitating integration of commercial, cloud-hosted, and on-premise open LLMs in an academic setting. Conference participants also had a chance to train their own machine learning model using Google’s Teachable Machine, which was really fun!
Samuel Stevens, The Ohio State University - “Imageomics: Images as the Source of Information about Life”. Sam, a PhD student at the Imageomics Institute of the Ohio State University, discussed imageomics projects including BioCLIP, a revolutionary tool for biological image classification that improves the efficiency and accuracy of analyzing vast datasets from diverse sources, aiding research in biodiversity and species identification. Training such models ideally requires large, clean and diverse data sets to develop efficient classifiers. A great plus of this tool was that iterating in real time made it possible to improve the study’s experimental design.
Shannon Farrell, University of Minnesota - “Data Sharing Is Not Enough: Focusing on Curation, Collaboration, and Sustainability”. Takeaways from Shannon’s presentation included that human curation remains critical and resources for it are often limited or nonexistent, and that flexibility is critical to deal with the unexpected. She described a standardized set of protocols and checklists from the Data Curation Network, an initiative to align data sharing efforts, and mentioned the Data Rescue Project, an effort to preserve at-risk big data in the face of defunding.
Paul Thomas, University of Southern California - “Annotating and predicting gene functions through evolutions and revolution”. Key takeaways from Paul’s talk were that function prediction remains an outstanding problem and that large-scale evolutionary modeling can be used to integrate knowledge. Paul emphasized the need to democratize knowledge through standardization and biocuration, so that it can be correctly interpreted and used by non-experts and can enable trustworthy AI models.
Sandra Orchard, EMBL-European Bioinformatics Institute - “How will the role of the Biocurator change in this world of Artificial Intelligence and Machine-Learning?” Dr. Orchard, a recipient of ISB’s 2023 Exceptional Contribution to Biocuration Award, was awarded the “Biocuration Lifetime Achievement Award”. Her keynote presented the outcomes from evaluating precision and recall of LLMs in the manual curation workflow of UniProt/SwissProt, and reinforced the importance of human biocurators to guide and validate LLMs.
Andy Hickl, Allen Institute - “From Annotation to Insight: How NLP Transformed Biocuration”. Andy’s talk made a case for biocuration being one of the best examples to date of AI transformation in science (i.e., the systematic adoption of AI by an industry). He emphasized that natural language processing (NLP) methods have never simply automated away human roles, but rather have steadily shifted biocurators’ key performance indicators toward interpretative quality, expert judgement and insight. In Andy’s opinion, human oversight and interpretation will continue to drive innovation in this unprecedented era of machine-assisted scientific sensemaking.
Other highlights of this symposium included a fantastic panel and group discussions on the implications for sustainability and best practices in biocuration. The panel for “AI Impact in Biocuration” included Chris Hunter of GigaScience, Madhura Vipra of Medvolt Tech, Robert Allaway of Sage Bionetworks, Kim van Auken of the Alliance of Genome Resources, and Rachel Lyne of InterMine.
All sessions were recorded and made available to registered participants.
Sushma Naithani, Oregon State University, representing Gramene’s Plant Reactome, talk: “Plant Reactome: A plant pathways Knowledgebase and discovery platform.”
Marcela Karey Tello-Ruiz, CSHL, representing the SorghumBase and Gramene database, poster: “Accelerating Agricultural Research Through Interoperable Genetic and Phenotypic Variation Data.”
Members of the AgBioData Consortium. Left to right: Sushma Naithani of Oregon State University (Plant Reactome), Leonore Reiser of Phoenix Bioinformatics (TAIR), Marcela K. Tello-Ruiz of CSHL (SorghumBase/Gramene), and Karen Yook of Caltech (Micropublications).
Plant scientists. Left to right: Marcela K. Tello-Ruiz, representing SorghumBase and Gramene; Sonia Balyan, representing the Indian Crop Phenome Database at the Indian Biological Data Centre (IBDC); and Sushma Naithani, representing the Plant Reactome.