Published July 12, 2016
Scientists are using machine learning to identify important sequences of DNA within the mosquito genome that regulate how the insect’s cells develop and behave.
The research project, funded by the National Institutes of Health (NIH), could have implications for disease control, potentially facilitating efforts to use genetic engineering to control mosquito populations or to create mosquitoes that have reduced ability to transmit maladies, such as malaria, to humans.
“Our work will break new ground in the field of mosquito genomics and genetics,” says Marc Halfon, professor of biochemistry in the Jacobs School of Medicine and Biomedical Sciences at the University at Buffalo (UB). “Mosquitoes are responsible for hundreds of thousands of deaths each year. Although we know the sequence of the mosquito genome, we have little functional information about what much of that genome sequence does.
“Our work will take important steps toward filling in this crucial missing information. It will demonstrate our ability to functionally annotate the regulatory elements within genomes of various insect disease vectors without requiring extensive — and expensive — new genome-scale experimental data for each.”
The project is funded by a $449,000 grant from the National Institute of Allergy and Infectious Diseases, part of the NIH. It focuses on Anopheles gambiae, an important vector for malaria transmission.
Within the genome of every plant and animal, there are regulatory switches — strings of DNA that control the behavior of genes, dictating when and where in the body different genes are turned on and off.
These regulatory sequences matter because they can affect a species’ mating success and resistance to insecticides, Halfon says. Regulatory mechanisms are also crucial to the genetic engineering of mosquitoes, in which researchers seek to control the expression of foreign or mutated genes introduced into a target animal.
For more than a decade, Halfon has worked with UB’s Center for Computational Research to build a database called REDfly that contains more than 5,600 regulatory sequences for a different insect species, the fruit fly Drosophila melanogaster. Now, his team is leveraging this trove of information to learn more about regulatory mechanisms within the mosquito genome.
With Saurabh Sinha, a computer scientist at the University of Illinois at Urbana-Champaign, Halfon developed software called SCRMshaw that learns the characteristics of the regulatory sequences in REDfly, then searches the genomes of other insects for DNA sequences with similar characteristics. The software has successfully identified regulatory sequences in mosquitoes that look nothing like Drosophila sequences to the human eye but that share similar traits, such as containing a related assortment of short DNA subsequences three to six letters long.
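The article does not describe SCRMshaw's actual statistical models, so the following is not SCRMshaw. It is a hypothetical Python sketch of the general idea described above: summarize known regulatory sequences by their short-word (k-mer) content, then rank stretches of another genome by how closely their word content resembles that training set rather than a background set. The function names, the choice of k = 4 (inside the three- to six-letter range mentioned), the window and step sizes, and the simple log-odds score are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGT"


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-letter words in seq, skipping words with non-ACGT letters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(c in ALPHABET for c in word):
            counts[word] += 1
    return counts


def kmer_profile(seqs, k: int, pseudocount: float = 1.0) -> dict:
    """Pool k-mer counts over seqs and convert them to smoothed frequencies that sum to 1."""
    pooled = Counter()
    for s in seqs:
        pooled.update(kmer_counts(s, k))
    words = ["".join(p) for p in product(ALPHABET, repeat=k)]
    denom = sum(pooled.values()) + pseudocount * len(words)
    return {w: (pooled[w] + pseudocount) / denom for w in words}


def log_odds(candidate: str, fg: dict, bg: dict, k: int) -> float:
    """Sum of log(fg/bg) over the candidate's k-mers; > 0 means it resembles the training set."""
    return sum(n * math.log(fg[w] / bg[w]) for w, n in kmer_counts(candidate, k).items())


def scan(genome: str, training, background, k: int = 4,
         window: int = 500, step: int = 250):
    """Slide a fixed-size window along genome and rank windows by k-mer log-odds score."""
    fg = kmer_profile(training, k)    # word frequencies of known regulatory sequences
    bg = kmer_profile(background, k)  # word frequencies of non-regulatory background
    hits = []
    for start in range(0, max(len(genome) - window, 0) + 1, step):
        hits.append((start, log_odds(genome[start:start + window], fg, bg, k)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Toy, made-up sequences purely for illustration; not real insect DNA.
    training = ["AAAAAAAA"]
    background = ["CCCCCCCC"]
    genome = "C" * 600 + "A" * 600
    print(scan(genome, training, background)[0][0])  # -> 500
```

Scoring word content rather than aligning letter by letter is what lets an approach like this flag sequences that look unrelated to the human eye: two regions can share a similar mix of short words while having no long stretch of matching sequence.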
“Finding regulatory elements is hard — traditionally, it has been done by tedious experimental work that examines one gene at a time,” Halfon says. “We wanted to know how you can do this faster: Just by looking at a DNA sequence, can you tell where the regulatory elements are? In at least some cases, the answer appears to be ‘Yes.’”
Using SCRMshaw in mosquitoes, Halfon, Sinha and colleagues identified some of the regulatory sequences that may shift the activity of a network of genes from the midline of the ventral nerve cord (analogous to the human spinal cord) to its lateral regions during embryonic development in the mosquito Aedes aegypti, which transmits Zika, dengue fever and chikungunya.
This work, published online June 21 in the journal Developmental Biology, highlights how SCRMshaw can pinpoint regulatory sequences in non-Drosophila species.
“It shows how we can use SCRMshaw to address interesting biological questions of development and evolution,” Halfon says.
The next step is to use the new NIH funding to carry out a large-scale search for regulatory elements in Anopheles gambiae.
“We will focus on trying to identify regulatory sequences most useful for understanding aspects of mosquito biology that are relevant to its role as a disease vector — for instance, development of the salivary glands or the midgut, or olfaction — or that could be useful for biocontrol methods, such as genes affecting reproduction,” Halfon says. “Once we have generated a high-confidence set of regulatory element predictions, we will test them in transgenic mosquitoes.”
The new NIH project is a collaboration between UB and the University of Maryland. The effort will be bolstered by continued development of the REDfly database, which is supported by a $1.2 million grant from the National Institute of General Medical Sciences, part of the NIH, and a $447,000 grant from the National Science Foundation.