Published July 14, 2020
Barry Smith, PhD, SUNY Distinguished Professor of philosophy and affiliate professor of biomedical informatics, is among the co-authors of two new papers discussing how ontologies can assist in the fight against COVID-19.
The mass of growing and constantly changing data resulting from multiple disciplines represents one of the biggest challenges researchers and public health officials must confront while trying to manage the ongoing COVID-19 pandemic.
But several centers across the country, including the University at Buffalo’s National Center for Ontological Research (NCOR), are working to develop ontologies to assist in the efforts to control the current outbreak, accelerate data discovery in future pandemics, and promote reproducible infectious disease research, according to Smith, director of NCOR.
To realize the scope of the challenge faced by scientists confronting COVID-19, consider the disciplines involved in the fight — everything from immunochemistry to behavioral population modeling.
All the data collected by biologists, pathologists, sociologists, geographers, physicians and epidemiologists require integration, but the relevant information is captured using discipline-specific terms and is often stored in ways that are accessible only to those working in the fields in which they originated.
“Ontology was designed to address that problem by creating common controlled vocabularies for discipline-neutral data descriptions that everyone can use,” says Smith, who was named one of the 50 most influential living philosophers in 2016 by TheBestSchools.org.
“It’s nearly impossible, unless you’re an expert in multiple separate disciplines, to join data deriving from multiple different sources,” Smith says. “This problem is especially acute in the face of a novel pathogen such as SARS-CoV-2, where no one can anticipate which combinations of factors will prove crucial in understanding how it affects its human hosts.”
Accessing and integrating massive amounts of information from multiple data sources in the absence of ontologies is like trying to find information in library books using only old catalog cards as a guide, when the cards themselves have been dumped on the floor.
Ontologies are data sharing tools that provide for interoperability through a computerized lexicon with a taxonomy and a set of terms and relations with logically structured definitions. When these terms and definitions are used in annotating literature and data, the resultant ontologically enhanced information becomes not only more readily discoverable by human beings, but also more capable of being analyzed by computers.
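To make the idea concrete, here is a minimal sketch of how a shared vocabulary with a taxonomy lets a computer find related data across sources. It is an illustration only, not the IDO or CIDO implementation: the `EX:` identifiers, the definitions, the sample records and the function names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One ontology term: an ID, a label, a definition, and is_a parents."""
    term_id: str
    label: str
    definition: str
    parents: list[str] = field(default_factory=list)


# Hypothetical toy taxonomy; the IDs and definitions are invented for illustration.
ONTOLOGY = {t.term_id: t for t in [
    Term("EX:0001", "infectious disease", "A disease caused by a pathogen."),
    Term("EX:0002", "viral infectious disease",
         "An infectious disease caused by a virus.", ["EX:0001"]),
    Term("EX:0003", "coronavirus infectious disease",
         "A viral infectious disease caused by a coronavirus.", ["EX:0002"]),
    Term("EX:0004", "COVID-19",
         "A coronavirus infectious disease caused by SARS-CoV-2.", ["EX:0003"]),
    Term("EX:0005", "SARS",
         "A coronavirus infectious disease caused by SARS-CoV.", ["EX:0003"]),
    Term("EX:0006", "malaria",
         "An infectious disease caused by Plasmodium parasites.", ["EX:0001"]),
]}


def ancestors(term_id: str) -> set[str]:
    """Return every term reachable by following is_a links upward."""
    seen: set[str] = set()
    stack = list(ONTOLOGY[term_id].parents)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(ONTOLOGY[current].parents)
    return seen


def is_a(term_id: str, ancestor_id: str) -> bool:
    """True if term_id is ancestor_id or one of its descendants."""
    return term_id == ancestor_id or ancestor_id in ancestors(term_id)


# Records from different (hypothetical) sources, each using its own local label
# but annotated with a term from the shared vocabulary.
RECORDS = [
    {"source": "clinic_notes", "local_label": "nCoV case", "term": "EX:0004"},
    {"source": "epi_survey", "local_label": "SARS-2003", "term": "EX:0005"},
    {"source": "field_lab", "local_label": "P. falciparum infection", "term": "EX:0006"},
]


def find_records(ancestor_id: str) -> list[dict]:
    """Find all records annotated with ancestor_id or any more specific term."""
    return [r for r in RECORDS if is_a(r["term"], ancestor_id)]


# One query for "coronavirus infectious disease" retrieves both the COVID-19
# and the SARS records, even though their local labels share no wording.
coronavirus_records = find_records("EX:0003")
```

Because each record is annotated with a shared term rather than only a local label, a single query at any level of the taxonomy retrieves everything beneath it, regardless of which discipline produced the data.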
Smith has been working for some 15 years with biologists, clinicians and bioinformatics specialists to create a suite of ontologies to cover all the life sciences.
The first paper, with co-authors from Niagara University, Northwestern University and University of Texas Southwestern Medical School, has not yet been accepted for publication. However, in light of the urgency of the pandemic, it is already available on the preprint repository of the Open Science Framework.
Titled “The Infectious Disease Ontology in the Age of COVID-19,” the paper first presents the Infectious Disease Ontology (IDO) Core, which contains terms relating to infectious diseases generally. It then describes how IDO Core has been extended in a number of ontologies relating to specific infectious diseases, such as malaria, staph and flu.
The paper concludes with a treatment of IDO ontologies: first for viral infectious diseases in general, second for coronavirus infectious diseases, and finally for COVID-19 in particular.
These ontologies help to fill the need for standardized terminology in describing infectious disease data and information. Because they are all constructed in the same way, they make it easier to compare COVID-19 data with data pertaining to other coronavirus diseases, such as SARS and MERS, and to the novel coronavirus diseases of the future.
The UB ontology team also regularly collaborates with infectious disease informatics researchers from other universities, including the Center for Vaccine Ontology Research at the University of Michigan at Ann Arbor.
That collaboration has already led to a number of papers, including, most recently, “CIDO, A Community-Based Ontology for Coronavirus Disease Knowledge and Data Integration, Sharing, and Analysis,” which was published in the Nature journal Scientific Data on June 12.
This paper outlines the Coronavirus Infectious Disease Ontology (CIDO), which covers multiple areas in the domain of coronavirus diseases, including etiology, transmission, epidemiology, pathogenesis, diagnosis, prevention and treatment, emphasizing CIDO developments relevant to COVID-19.
“An infectious disease ontology can also contribute to solving the problem of reproducibility,” Smith says.
Reproducing the results of experiments as part of the research process requires a precise description not merely of the results achieved but also of the protocols, statistics, equipment, samples and tests used.
“We believe that, when used in combination with other life science ontologies such as the Ontology for Biomedical Investigations, the IDO framework provides a promising strategy for the creation of comparable, integrable and discoverable provenance metadata for the data generated in infectious disease research,” Smith says.