An Evaluation of Knowledge Discovery Techniques for ‘Big’ Transportation Data

Electric vehicle charging station on UB's north campus.

As an extension of the work completed under year one funding on “Developing Highway Safety Performance Metrics in an Advanced Connected Vehicle Environment Utilizing Near-Crash Events from the SHRP 2 Naturalistic Driving Study,” CUBRC intends to investigate the application of knowledge discovery (KD) techniques to analyze the data compiled in year one in relation to other ‘big’ transportation datasets. The new work will be titled “An Evaluation of Knowledge Discovery Techniques for ‘Big’ Transportation Data.” CUBRC anticipates completing the work under the following tasks.

Task 1 - Literature Review

CUBRC will review and compile literature on knowledge discovery and data mining, with a specific emphasis on massive data and its applications to transportation. The literature will include peer-reviewed research articles as well as reports published by government and private agencies. The findings from the literature review will help identify and apply KD techniques for analyzing the collected data and will provide context for the final report.

Task 2 - Data Compilation

To evaluate potential KD techniques for transportation data, CUBRC will collect and compile a variety of ‘big’ datasets. These datasets will be selected from the many crash-related datasets currently available and will include the SHRP 2 NDS data obtained in year one. While most of the available datasets are structured, we will investigate including non-traditional, unstructured datasets (e.g., video, social media, weather).

Task 3 - Evaluation of Effectiveness of KD Techniques

After identifying and accessing the databases to be used, CUBRC will create a common data model for the collected data. This task will consist of creating ontologies for data elements of interest (e.g., crash types) and aligning the datasets to them. CUBRC will then define metrics to evaluate the usefulness and added value of a KD framework for future transportation research with massive data. We will summarize and report the findings of this task in the final report.
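As an illustration of what aligning datasets under a common data model could involve, the following is a minimal Python sketch, not CUBRC's method. It assumes that each source dataset labels crash types with its own codes and that an ontology has already defined a shared crash-type vocabulary. The dataset names, codes, vocabulary, and the `align` function are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical shared crash-type vocabulary; the real ontology would be
# developed under Task 3 and would be considerably richer.
COMMON_CRASH_TYPES = {"rear_end", "angle", "sideswipe", "single_vehicle", "other"}

# Hypothetical source-specific labels mapped onto the shared vocabulary.
SOURCE_MAPPINGS = {
    "dataset_a": {"RE": "rear_end", "ANG": "angle", "SS": "sideswipe", "SV": "single_vehicle"},
    "dataset_b": {"Rear-end": "rear_end", "Angle": "angle", "Run-off-road": "single_vehicle"},
}


@dataclass
class CrashRecord:
    source: str
    record_id: str
    crash_type: str  # always a member of COMMON_CRASH_TYPES


def align(source, rows):
    """Translate (record_id, raw_label) pairs from one source into CrashRecords."""
    mapping = SOURCE_MAPPINGS[source]
    aligned = []
    for record_id, raw_label in rows:
        # Labels with no mapping fall back to "other" instead of being dropped.
        crash_type = mapping.get(raw_label, "other")
        aligned.append(CrashRecord(source, record_id, crash_type))
    return aligned


combined = align("dataset_a", [("a1", "RE"), ("a2", "SV")]) + align(
    "dataset_b", [("b1", "Angle"), ("b2", "Head-on")]
)
for rec in combined:
    print(rec.source, rec.record_id, rec.crash_type)
```

Once records from different sources share one vocabulary, they can be pooled and queried together, which is the precondition for comparing KD techniques across datasets.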

Task 4 - Project Management and Reporting

We recognize that, no matter how significant or accurate the findings of this study, they will have limited value if they are not adequately documented and disseminated. With this in mind, we will provide UB with quarterly progress reports throughout the project and produce a final report. The final report will present the key points from each task as well as the issues and problems encountered while carrying them out. A draft final report will be submitted for review and comment, then revised based on the feedback.