Mining Transportation Information from Social Media for Planned and Unplanned Events

Dr. Qing He's research explores using social media data to accurately describe weather conditions in real time, and was featured in the Washington Post.

Social media has become ubiquitous in daily life, and there is growing interest in mining useful information from it. Platforms such as Facebook, Twitter, YouTube, Google Plus and Wikipedia host a vast amount of user-generated content, which strengthens the links and interactions between individuals within a community and also provides a large amount of information on a wide range of topics. Access to social media will continue to grow with the development and commercialization of wearable computing devices, such as Google Glass and smart watches.

The focus of this project is on mining such data to deduce useful information about the behavior of present or future travelers, with special emphasis on events, including both planned events (sporting events, concerts, parades, holidays, etc.) and unplanned events (inclement weather, earthquakes, hurricanes, floods, etc.). Specifically, the project proposes to develop effective and efficient techniques to collect, extract and mine social media data to support advanced traveler information systems and traffic operators. By mining the semantics of social media content, especially text, this project aims to achieve the following goals: 1) Assess the impact of unplanned events. 2) Extract useful travel information to indicate congestion for planned events. 3) Identify causality between abnormal traffic patterns and social media data.
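As a rough illustration of what mining text for travel-related signals could look like, the following Python sketch filters posts by a small keyword list, counts matches per hour, and flags hours with unusually many matches. It is not the project's method: the keyword list, the function names, the (timestamp, text) input format and the threshold rule are all assumptions made for this example.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, pstdev

# Hypothetical keyword list, assumed for illustration only.
TRAFFIC_TERMS = {"traffic", "accident", "crash", "congestion", "closed", "snow", "flood"}


def mentions_traffic(text):
    """Return True if any word in the post matches a traffic-related term."""
    words = {w.strip(".,!?#@:;\"'()").lower() for w in text.split()}
    return not words.isdisjoint(TRAFFIC_TERMS)


def hourly_counts(posts):
    """Count traffic-related posts per hour.

    `posts` is an iterable of (datetime, text) pairs (an assumed format).
    Hours with no matching posts do not appear in the result.
    """
    counts = Counter()
    for timestamp, text in posts:
        if mentions_traffic(text):
            hour = timestamp.replace(minute=0, second=0, microsecond=0)
            counts[hour] += 1
    return counts


def unusual_hours(counts, k=1.5):
    """Return hours whose count exceeds mean + k * standard deviation."""
    if len(counts) < 2:
        return []
    values = list(counts.values())
    threshold = mean(values) + k * pstdev(values)
    return sorted(hour for hour, count in counts.items() if count > threshold)
```

Hours flagged this way could then be compared against traffic volume or incident records for the same period. A simple threshold like this only indicates co-occurrence; it does not by itself establish causality between posts and traffic conditions.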

To accomplish these goals, this proposal defines a 24-month project with a multidisciplinary team led by two PIs, one from transportation engineering and one from computer science.

Education: The results delivered by this project will be used to further refine and formalize two new graduate elective courses at SUNY Buffalo, both offered by Dr. He: CIE573/IE515 “Transportation Analytics” and CIE572/IE554 “Transportation System Modeling and Control”. The project outcomes in text mining and data fusion will be introduced in CIE573/IE515, and the outcomes in incident management with social media can be incorporated into CIE572/IE554. The courses will cover specific topics in data fusion of floating and fixed data, incident detection, and multi-modal traffic signal control and simulation. This project will also benefit CSE601 “Data Mining and Bioinformatics” and CSE722 “Selected Topics in Data Mining”, both taught by Dr. Gao, who is also introducing a senior-level data mining course. The research activities and results from this project will give undergraduate and graduate students valuable experience in analyzing large amounts of data. The project deliverables in social media and transportation informatics are expected to be integrated into the education activities of UB’s National Summer Transportation Institute (NSTI), funded by the Federal Highway Administration (FHWA). The 2013 NSTI at Buffalo, led by PI He, recruited 20 high school students, as shown in Figure 1. We hope that such experiences will spark their interest in pursuing careers in transportation engineering.

Expected Product, Benefits and Technology Transfer: The project team will work closely with NITTEC, the City of Buffalo, and the New York State Department of Transportation to ensure that the findings of the project are transferable to practice. The proposed algorithms and models are expected to contribute to the development of an event traffic management system that fuses traditional traffic data with emerging social media data. The methodology and outcomes of this project can provide useful information to the public, government agencies, and emergency preparedness agencies.

Researchers: Dr. Qing He and Dr. Jing Gao

Partners: Niagara International Transportation Technology Coalition (NITTEC)

Data Sources: (1) Traffic volume datasets; (2) Traffic incident datasets; (3) Publicly available event data; (4) Publicly available social media data from platforms such as Twitter, Facebook, Foursquare and Google+.