Sowing the Seeds of AI Success

Jonathan Hull, PhD ’88, MS ’83, BA ’80, principal consultant at Bay Area Research Associates (CENTER), with former students
Siamak Khoubyari, MS ’92, BS ’90 (LEFT), and Tao Hong, PhD ’96, MS ’92 (RIGHT)

Before Silicon Valley, it was in the City of Good Neighbors where Jonathan Hull, with a raft of graduate students, helped lay some of the foundations for artificial intelligence.

The furniture of the coworking office Jonathan Hull has leased for the day is completely unremarkable: wide worktables pushed together, a few rolling chairs, a squat cabinet. But the vibrant throw rugs, bright red bean bags and colorful assortment of Rubik’s Cube puzzles counteract the corporate feel; they demand that all who enter here engage in creative play. Silicon Valley is, after all, known for its disruptive creativity.

The Valley tends to draw in people like Hull, so perhaps it was inevitable that he ended up here, so far from Buffalo. Like many others in recent years, he’s busy working on machine learning and data analysis problems—in short, artificial intelligence applications. Hull finds his particular skill set in high demand.

But before the Valley, it was in the City of Good Neighbors where Hull hashed out artificial intelligence problems, decades before AI was a buzzy catchphrase. Together with a raft of graduate students in a quiet powerhouse of a research center at UB called CEDAR, Hull helped lay some of the foundations for artificial intelligence and other “smart” technologies we all rely upon today—voice recognition and image search, for starters.

“In that moment, it seemed so run-of-the-mill,” Hull says. “But looking back now—the people we graduated, the work they did—it’s mind-blowing.”

Jonathan Hull with Baidu co-founder Robin Li in China in 2007.

Many of the graduate students Hull advised as a research associate professor and the associate director of CEDAR are now filling key roles in the tech world, both in Silicon Valley and elsewhere. Michal Prussak, BS ’91, first a data scientist at Amazon, now at Microsoft. Tin Kam Ho, PhD ’92, a senior AI scientist with IBM Watson. Dar-Shyang Lee, PhD ’95, MS ’90, BS ’88, a software engineer at Google. Yanhong (Robin) Li, MS ’94, CEO and co-founder of Baidu, China’s largest search engine provider. Tao Hong, PhD ’96, MS ’92, former senior scientist at Baidu, now a consultant for various tech companies. Siamak Khoubyari, MS ’92, BS ’90, software architect and technical director at Symphony, a cloud-based team collaboration company.

Hull isn’t the kind of person to take credit for his students’ successes; affable, unassuming and quick to share, he’s pretty much the opposite of the Silicon Valley stereotype. But with more than 100 academic publications and 200-plus patents to his name, he would certainly be entitled to a bit of bragging. When he does speak of successes, it’s in the collective: as the beneficiary of advice from inspirational figures, or as part of a circle of thoughtful guidance for those who came after him.

“It was at UB that I learned the importance of nurturing ideas, of not cutting folks off,” he says. “We used to say that there are no bad ideas. But sometimes you have to take an idea that’s half-formed and work it a little bit until you find the good portion.”

The Foundations of AI

At UB, Hull and his team of graduate students were among the original players at the Center of Excellence for Document Analysis and Recognition, or CEDAR. Established in 1990 through funding by the U.S. Postal Service, CEDAR had among its primary early initiatives the development of AI algorithms to read handwritten ZIP codes—and, eventually, entire handwritten addresses—for faster, more accurate sorting of mail.

The optical character recognition (OCR) system they developed allowed the Postal Service to achieve a long-held goal of processing 12 to 13 pieces of mail per second, equivalent to 43,200 to 46,800 pieces per hour. The service’s best automated systems at the time were capable of sorting 30,000 letters per hour, but that figure doesn’t tell the whole story. Thirty percent of the pre-OCR letters were rejected for having an unreadable address, with 95 percent of those containing handwritten elements.

Gleaning information from a document is a fairly intuitive process for the human brain. You know at a glance, for instance, whether a piece of paper is a bill or a newspaper article, and can infer your name on an envelope even if it’s misprinted. CEDAR’s challenge was to develop software that could give computers the same ability to pull out key information from even messy, irregular documents.

The researchers began with ZIP codes and addresses, and a pivotal discovery by Hull—that the simultaneous recognition of both street number and ZIP code would drastically limit the number of addresses where a piece of mail could be delivered—set the entire project on a path to success. From there, the team of graduate students, each working on individual issues related to OCR, developed multiple algorithms for recognizing and using other kinds of information within documents to categorize them: keywords, high-frequency words, syntax, grammar, font.
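To make the street-number-plus-ZIP idea concrete, here is a minimal sketch in Python. It is not CEDAR's actual system: the tiny address directory, the function names and the lookup scheme are all assumptions made up for illustration. It only shows why reading both fields at once shrinks the list of possible delivery points compared with reading the ZIP code alone.

```python
from collections import defaultdict

# Hypothetical toy address directory: (street number, street name, ZIP code).
ADDRESSES = [
    ("12", "Main St", "14150"),
    ("12", "Elm Ave", "14150"),
    ("48", "Main St", "14150"),
    ("12", "Main St", "14260"),
    ("301", "Oak Rd", "14260"),
]


def build_index(addresses):
    """Group delivery points by their (street number, ZIP code) pair."""
    index = defaultdict(list)
    for number, street, zip_code in addresses:
        index[(number, zip_code)].append(f"{number} {street}, {zip_code}")
    return index


def candidates_by_zip(addresses, zip_code):
    """All delivery points that share a ZIP code (the ZIP-only view)."""
    return [f"{n} {s}, {z}" for n, s, z in addresses if z == zip_code]


def candidates_by_number_and_zip(index, number, zip_code):
    """Delivery points consistent with both recognized fields."""
    return index.get((number, zip_code), [])


INDEX = build_index(ADDRESSES)

# ZIP alone leaves three possibilities; adding the street number leaves two.
zip_only = candidates_by_zip(ADDRESSES, "14150")
narrowed = candidates_by_number_and_zip(INDEX, "12", "14150")
print(len(zip_only), len(narrowed))
```

In a real directory the effect would be far larger, since a single ZIP code can cover thousands of delivery points while only a handful share any given street number.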

These processes are ubiquitous today, but there was no precedent for them in the 1990s. Competing in a space with high-powered and well-funded peers like Bell Labs and Hughes Research Laboratories, the CEDAR researchers had to chip away at pattern recognition problems, stymied by underpowered computer processors and low-resolution cameras. But their work paid off, and not just for the Postal Service.

“All of these technologies we use today—handwriting recognition software on your mobile device that can understand your scribbled notes, using voice recognition to ask for directions—all of that is possible and mainstream because of the work we and other researchers were doing back then,” says Khoubyari.

Siamak Khoubyari (LEFT) and Tao Hong (CENTER) talking with Hull.

Paying It Forward

And yet, says Hull, nobody at the time considered what they were doing to be potentially world-changing. It was just a big problem that needed to be solved, and a bunch of hungry, scrappy graduate students eager to knock it out. How they pulled it off speaks volumes about the way Hull worked with his students, an approach he says was modeled on the way his mentors nurtured his own academic growth.

Born and raised in Tonawanda, N.Y., Hull was a star student, but credits a broken-down ’69 Pontiac LeMans for starting him down the path of invention. His father, Joseph, BS ’50, an accountant, and his mother, Katherine, a homemaker, refused to indulge their teenaged son’s desire for a car, so Hull scrimped and saved and bought a cheap fixer-upper. His older brother, Christopher, a mechanic, helped him take the engine apart and rebuild it.

“We put the pieces back together, turned the key, and it started,” Hull recalls. “It’s like magic. You go from almost nothing to something, and it works.”

After graduating third in his high school class of nearly 500, Hull arrived at UB with a vague idea that he’d pursue math. But computer science lured him away, and into the orbit of two key figures who would cement his future: Richard Schmidt, a professor of biostatistics, and Sargur Srihari, the founding director of CEDAR.

Both of these mentors encouraged Hull to pursue graduate studies at UB. The idea hadn’t really occurred to him, he says; he figured that after graduation, he’d find a job. As Hull obtained a master’s degree and then went for his PhD, his mentors’ interest didn’t wane. Srihari rejected Hull’s first dissertation proposal, pushing him to pursue a more rigorous path because he knew Hull was up to the challenge.

So when Hull became the associate director of CEDAR, he was determined to provide the same exceptional guidance to his students.

“It was always important to me to match the student to their interests, dream up a project that fits their background and the needs of the larger project,” he says. “And I tried to make sure students had distinct projects that weren’t competing with someone else’s, that they had the opportunity to dig in, make something new happen, not worry about looking over their shoulder, and then feel proud of their own work.”

Hong, for example, had a background in the psychology of reading and cognitive science, so Hull worked with him to develop a project involving syntax and context. At CEDAR, Hong developed OCR techniques to improve recognition accuracy by using linguistic rules as well as the shapes of words as reference points. After earning his PhD, he worked for Microsoft and then went on to help the company Tegic develop the “T9” predictive text software used on many cell phones prior to the smartphone era. Later, for the company ID Analytics (acquired by LifeLock and Symantec), he used similar pattern recognition techniques to build predictive models for identity fraud detection. At Baidu, he helped build the engineering and research teams on natural language processing, web data mining, machine learning and computational advertising.

Baidu itself is perhaps the most direct descendant of CEDAR. Arriving at UB with a background in information science and management, Li sought out Hull to work on a master’s project. At the time, Hull was looking for a student to work on linguistic analysis. So together they came up with a project for Li related to information retrieval—using keywords to find documents in a database that could be thematically related to one another.

If that sounds familiar, it’s because it’s the essential premise for how search engines work. In 1999, five years after obtaining his master’s from UB, Li founded Baidu; today, it’s the second-largest search company in the world, after Google, and a world leader in autonomous driving and AI programs.
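For readers curious what using keywords to find documents looks like in practice, the following Python sketch builds a simple inverted index, a standard textbook structure for keyword lookup. It is not a reconstruction of Li's master's project or of Baidu's technology; the sample documents and function names are assumptions invented for this example.

```python
from collections import defaultdict

# Hypothetical sample documents, keyed by an arbitrary ID.
DOCUMENTS = {
    "d1": "postal service sorts handwritten mail",
    "d2": "search engines rank web documents by keywords",
    "d3": "handwritten zip codes on mail are hard to read",
}


def build_inverted_index(documents):
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted IDs of documents containing every query keyword."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


INDEX = build_inverted_index(DOCUMENTS)
print(search(INDEX, "handwritten mail"))
```

Documents that share keywords come back together, which is a crude stand-in for being thematically related; production search engines add ranking, stemming and much more on top of this basic lookup.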

Khoubyari’s projects focused on font recognition and identification of high-frequency words to increase OCR accuracy. Post-CEDAR, he joined CAERE Corporation (founded by a co-inventor of the microchip, Robert Noyce) to work on a project that eventually became OmniPage, which converts scanned paper documents into editable digital documents and extracts information from the originals in more than 120 languages. Khoubyari changed direction during the corporate consolidations of the mid-’90s, becoming more of a generalist developer and software architect (for Symphony, he helps develop secure messaging and other collaboration tools for financial professionals), but he has enjoyed watching others reference Hull and build upon the foundational work he did.

Both Khoubyari and Hong say Hull’s approach was a critical factor in how their own careers unfolded—and that his deep knowledge set them and their peers up for success.

“When I came to UB, Jon gave me the freedom to explore,” Hong says. “The usual way is to follow your adviser and publish on what they’re working on, but Jon would send me to conferences even without a paper to present. And he coached me. You need that to do solid work on these complex problems.”

Khoubyari echoes the sentiment.

“Innovation is a buzzword now, but back then, it was a necessity,” he says. “To get from point A to point B, you had to think about making new ways. Students might have all kinds of energy and ability, but to have a guru like Jon there to guide them, to avoid pitfalls, to set them in the right direction—that’s priceless.”

Story by Michelle Z. Donahue
Photographs by Timothy Archibald