Research Project

Scalable Non-linear Dimensionality Reduction Methods to Accelerate Scientific Discovery

Principal Investigator:

Co-PIs:

Abstract

Progress in science and engineering increasingly depends on our ability to analyze massive amounts of observed and simulated data. The vast majority of this data, coming from high-performance, high-fidelity simulations, high-resolution sensors, or Internet-connected devices, arises from physical processes that, while complex and nonlinear, depend on only a few parameters. However, these low-dimensional parameters are often hidden in the deluge of high-dimensional data and are frequently impossible to discover, and thus to reason about, with existing methods.

This project will develop new, efficient methods to help scientists and engineers, especially in manufacturing and robotics, simplify complex data so that the dynamic processes underlying the data can be better represented, understood, and controlled. By leveraging the nation’s advanced cyberinfrastructure, these methods will accelerate the pace of materials design, reduce the cost and time-to-market of tailored devices, and aid the design, control, and operation of new complex robotic systems. The research outcomes of the project are closely integrated with its educational components to train the next generation of scientists and engineers in these new technologies, resulting in a skilled and globally competent workforce, especially in the high-priority areas of Artificial Intelligence, Data Science, and Scientific Computing.

This project thus promotes the advancement of science, welfare, and prosperity, as stated in NSF's mission.

This multidisciplinary research project aims to develop scalable, end-to-end solutions based on non-linear dimensionality reduction that accurately learn the dynamic behavior of complex systems. To this end, the project introduces new parallel primitives and algorithmic innovations to enable deployment of non-linear spectral dimensionality reduction (NLSDR) and manifold learning methods on next-generation extreme-scale computing systems.

The project is based on the following key components: i) development of novel locality-aware data distribution and task scheduling strategies for individual NLSDR building blocks, taking into account their inter-dependencies when executing in distributed-memory environments such as Message Passing Interface (MPI) and Map/Reduce clusters of multi-core processors; ii) design of new algorithmic strategies to manage data influx while maintaining crucial properties of the sub-manifold characterized by the data; and iii) development of end-to-end solutions for two transformative example applications pertaining to advanced manufacturing and robotics.
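To make the notion of NLSDR building blocks and their inter-dependencies concrete, the sketch below chains four typical stages of one well-known spectral method (an Isomap-style pipeline): neighborhood graph construction, geodesic distance computation, double centering, and an eigen-solve. This is an illustrative assumption only; the abstract does not name a specific method, and all function names here are hypothetical. The code is serial, pure Python, and meant for tiny inputs; it is not the project's distributed implementation.

```python
import math

def knn_graph(points, k):
    """Stage 1: symmetric k-nearest-neighbor graph with Euclidean edge weights."""
    n = len(points)
    graph = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        graph[i][i] = 0.0
        nearest = sorted((math.dist(points[i], points[j]), j)
                         for j in range(n) if j != i)
        for d, j in nearest[:k]:
            graph[i][j] = graph[j][i] = d
    return graph

def geodesic_distances(graph):
    """Stage 2: all-pairs shortest paths (Floyd-Warshall) over the graph."""
    n = len(graph)
    dist = [row[:] for row in graph]
    for m in range(n):
        for i in range(n):
            via = dist[i][m]
            for j in range(n):
                if via + dist[m][j] < dist[i][j]:
                    dist[i][j] = via + dist[m][j]
    return dist

def double_center(dist):
    """Stage 3: turn squared distances into a centered Gram-like matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]  # equals column mean (symmetric)
    total_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
             for j in range(n)] for i in range(n)]

def top_eigenpair(matrix, iterations=300):
    """Stage 4: dominant eigenpair by power iteration."""
    n = len(matrix)
    vec = [float(i + 1) for i in range(n)]  # deterministic starting vector
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    mv = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
    value = sum(v * w for v, w in zip(vec, mv))  # Rayleigh quotient
    return value, vec

def embed_1d(points, k):
    """Run the four stages in order; each consumes the previous stage's output.

    Assumes the neighbor graph is connected, so all geodesics are finite.
    """
    gram = double_center(geodesic_distances(knn_graph(points, k)))
    value, vec = top_eigenpair(gram)
    return [math.sqrt(value) * x for x in vec]

# Usage: 2-D points on a half circle are governed by one hidden parameter,
# the angle t; the 1-D embedding should order the points along the arc.
arc = [(math.cos(i * math.pi / 19), math.sin(i * math.pi / 19)) for i in range(20)]
embedding = embed_1d(arc, k=2)
```

Each stage depends on the full output of the one before it, which is the kind of inter-dependency that data distribution and task scheduling would need to account for once the stages are spread across distributed memory.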

Funding Source: National Science Foundation (NSF)