Today we are witnessing an explosion of data continuously disseminated by countless individuals around the world through a variety of channels, such as question-answering sites, review websites, social networks, discussion forums, blogs, and wikis. The crowd wisdom mined from these massive streams of human-generated data provides critical insights for decision making in government, business, and industry.
My current research focuses on aggregating information about the same set of objects or events, collected from a crowd of users, to obtain true facts or consensus opinions. The key to aggregating crowdsourced data is capturing the differences in information quality among contributors: some users consistently provide correct and meaningful information, while others may submit wrong or fake information because of insufficient knowledge, privacy concerns, or lack of attention. We have developed a series of approaches that model the probability of a user providing accurate information (the user's trustworthiness) and incorporate this trustworthiness into crowdsourced data aggregation. The effectiveness of these methods has been demonstrated in applications such as weather forecasting, stock prediction, and crowdsourced question answering, and they may benefit other applications in which critical decisions must be based on correct information extracted from diverse sources.
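As an illustration only, the general idea of trust-weighted aggregation can be sketched in a few lines of Python. This is a generic iterative weighted-voting scheme and not the specific methods referred to above; the function name `aggregate_claims`, the `smoothing` parameter, and the toy weather data are all assumptions made for the example.

```python
from collections import defaultdict


def aggregate_claims(claims, n_iters=10, smoothing=1.0):
    """Hypothetical sketch: jointly estimate truths and user trustworthiness.

    claims: list of (user, obj, value) triples.
    Returns (truths, trust), where trust[user] is a smoothed estimate of the
    probability that the user's claim matches the aggregated answer.
    """
    users = {u for u, _, _ in claims}
    trust = {u: 0.5 for u in users}  # assumption: start with equal trust
    truths = {}
    for _ in range(n_iters):
        # Step 1: trust-weighted vote for each object.
        scores = defaultdict(lambda: defaultdict(float))
        for u, o, v in claims:
            scores[o][v] += trust[u]
        # Ties are broken deterministically by sorted value order.
        truths = {o: max(sorted(vals), key=vals.get)
                  for o, vals in scores.items()}
        # Step 2: re-estimate each user's trust as the smoothed fraction
        # of their claims that agree with the current truths.
        correct = defaultdict(float)
        total = defaultdict(float)
        for u, o, v in claims:
            total[u] += 1.0
            correct[u] += 1.0 if truths[o] == v else 0.0
        trust = {u: (correct[u] + smoothing) / (total[u] + 2.0 * smoothing)
                 for u in users}
    return truths, trust


# Toy data (made up): carol is usually wrong, and only alice and carol
# report on "thu", so an unweighted vote there is a tie.
claims = [
    ("alice", "mon", "sunny"), ("bob", "mon", "sunny"), ("carol", "mon", "rain"),
    ("alice", "tue", "rain"), ("bob", "tue", "rain"), ("carol", "tue", "snow"),
    ("alice", "wed", "snow"), ("bob", "wed", "snow"), ("carol", "wed", "sunny"),
    ("alice", "thu", "sunny"), ("carol", "thu", "rain"),
]
truths, trust = aggregate_claims(claims)
print(truths["thu"], round(trust["alice"], 3), round(trust["carol"], 3))
```

On this toy data the disputed "thu" entry is resolved in favor of alice, because her agreement with the other answers earns her more weight than carol. Alternating between the two steps is what lets the estimate of each user's reliability and the aggregated answers inform each other.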
Data and information analytics with a focus on data mining and machine learning