Award Abstract # 1910539
OAC Core: Small: Scalable Non-linear Dimensionality Reduction Methods to Accelerate Scientific Discovery

NSF Org: OAC
Office of Advanced Cyberinfrastructure (OAC)
Recipient: THE RESEARCH FOUNDATION FOR THE STATE UNIVERSITY OF NEW YORK
Initial Amendment Date: April 22, 2019
Latest Amendment Date: September 16, 2021
Award Number: 1910539
Award Instrument: Standard Grant
Program Manager: Seung-Jong Park
OAC
 Office of Advanced Cyberinfrastructure (OAC)
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: May 1, 2019
End Date: April 30, 2023 (Estimated)
Total Intended Award Amount: $499,814.00
Total Awarded Amount to Date: $499,814.00
Funds Obligated to Date: FY 2019 = $499,814.00
History of Investigator:
  • Jaroslaw Zola (Principal Investigator)
    jzola@buffalo.edu
  • Olga Wodo (Co-Principal Investigator)
  • Nils Napp (Co-Principal Investigator)
  • Varun Chandola (Former Principal Investigator)
  • Jaroslaw Zola (Former Co-Principal Investigator)
Recipient Sponsored Research Office: SUNY at Buffalo
520 LEE ENTRANCE STE 211
AMHERST
NY  US  14228-2577
(716)645-2634
Sponsor Congressional District: 26
Primary Place of Performance: University at Buffalo
338 Davis Hall
Buffalo
NY  US  14260-2500
Primary Place of Performance Congressional District: 26
Unique Entity Identifier (UEI): LMCJKRFW5R81
Parent UEI: GMZUKXFDJMA9
NSF Program(s): OAC-Advanced Cyberinfrast Core
Primary Program Source: 01001920DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 026Z, 9179
Program Element Code(s): 090Y00
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Progress in science and engineering increasingly depends on our ability to analyze massive amounts of observed and simulated data. The vast majority of this data, coming from high-performance high-fidelity simulations, high-resolution sensors, or Internet-connected devices, arises from physical processes that, while complex and nonlinear, depend on only a few parameters. However, these low-dimensional parameters are often hidden in the deluge of high-dimensional data, and are frequently impossible to discover, and thus reason about, with existing methods. This project will develop new efficient methods to help scientists and engineers, especially in manufacturing and robotics, simplify complex data so that the dynamic processes underlying the data can be better represented, understood and controlled. By leveraging the nation's advanced cyberinfrastructure, these methods will accelerate the pace of materials design, reduce the cost and time-to-market of tailored devices, and aid the design, control, and operation of new complex robotic systems. The research outcomes of the project are closely integrated with the educational components, to train the next generation of scientists and engineers on these new technologies, resulting in a skilled and globally competent workforce, especially in the high-priority areas of Artificial Intelligence, Data Science, and Scientific Computing. This project thus promotes the advancement of science, welfare and prosperity, as stated by NSF's mission.

This multidisciplinary research project aims to develop scalable end-to-end solutions, based on non-linear dimensionality reduction, that accurately learn the dynamic behavior of complex systems. To this end, the project introduces new parallel primitives and algorithmic innovations to enable deployment of non-linear spectral dimensionality reduction (NLSDR) and manifold learning methods on next-generation extreme-scale computing systems. The project is based on the following key components: i) development of novel locality-aware data distribution and task scheduling strategies for individual NLSDR building blocks, taking into account their inter-dependencies when executing in distributed memory environments such as Message Passing Interface and Map/Reduce clusters of multi-core processors, ii) design of new algorithmic strategies to manage data influx while maintaining crucial properties of the sub-manifold characterized by the data, and iii) development of end-to-end solutions for two transformative example applications pertaining to advanced manufacturing and robotics.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Note:  When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Guggilam, Sreelekha and Chandola, Varun and Patra, Abani "Tracking clusters and anomalies in evolving data streams" Statistical Analysis and Data Mining: The ASA Data Science Journal, v.15, 2022 https://doi.org/10.1002/sam.11552
Juneja, Namit and Zola, Jaroslaw and Chandola, Varun and Wodo, Olga "Graph-based Strategy for Establishing Morphology Similarity" International Conference on Scientific and Statistical Database Management (SSDBM), 2021 https://doi.org/10.1145/3468791.3468819
Liu, Hao and Yucel, Berkay and Wheeler, Daniel and Ganapathysubramanian, Baskar and Kalidindi, Surya R. and Wodo, Olga "How important is microstructural feature selection for data-driven structure-property mapping?" MRS Communications, v.12, 2022 https://doi.org/10.1557/s43579-021-00147-4
Mahdi Javanmard, Mohammad and Ahmad, Zafar and Zola, Jaroslaw and Pouchet, Louis-Noel and Chowdhury, Rezaul and Harrison, Robert "Efficient Execution of Dynamic Programming Algorithms on Apache Spark" IEEE International Conference on Cluster Computing (CLUSTER), 2020 https://doi.org/10.1109/CLUSTER49012.2020.00044
Roy, Mriganka and Wodo, Olga "Feature Engineering for Surrogate Models of Consolidation Degree in Additive Manufacturing" Materials, v.14, 2021 https://doi.org/10.3390/ma14092239
Schoeneman, Frank and Chandola, Varun and Napp, Nils and Wodo, Olga and Zola, Jaroslaw "Learning Manifolds from Dynamic Process Data" Algorithms, v.13, 2020 https://doi.org/10.3390/a13020030
Schoeneman, Frank and Zola, Jaroslaw "Solving All-Pairs Shortest-Paths Problem in Large Graphs Using Apache Spark" Proceedings of the 48th International Conference on Parallel Processing, 2019 https://doi.org/10.1145/3337821.3337852
Zaidi, Syed Mohammed Arshad and Chandola, Varun and Ibrahim, Muhanned and Romanski, Bianca and Mastrandrea, Lucy D. and Singh, Tarunraj "Multi-step ahead predictive model for blood glucose concentrations of type-1 diabetic patients" Scientific Reports, v.11, 2021 https://doi.org/10.1038/s41598-021-03341-5

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Many problems in science and engineering depend on our ability to efficiently analyze high-dimensional data. In this project, we developed new algorithmic and computational methods that allow us to reduce the dimensionality of the data such that its key properties are preserved and yet it can be visualized and effectively utilized. This work has been driven by emerging applications in multiple domains, including the design of new materials for solar cells, simulations of additive manufacturing (e.g., 3D printing), and better tracking of biomedical markers in medical applications. To address the underlying computational challenges, we proposed efficient algorithms to solve the All-Pairs Shortest Path (APSP) problem and to perform Bayesian optimization in parallel while taking into account the actual cost of finding an optimal solution. These algorithms are designed to run on the large parallel computers found in high-performance computing and data centers, and can be used in other domains that deal with complex graphs or challenging optimization problems.
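
To illustrate why APSP appears in non-linear dimensionality reduction, here is a minimal sketch. It is an assumption on our part, not the project's method: it uses the textbook sequential Floyd-Warshall algorithm, whereas the project's algorithms are parallel and are not described on this page. The function names `knn_graph` and `floyd_warshall` and the half-circle test data are hypothetical. The sketch links each point to its nearest neighbors and uses shortest paths through that graph to approximate distances measured along the curve rather than straight through space.

```python
import math


def knn_graph(points, k):
    """Dense weight matrix linking each point to its k nearest neighbors."""
    n = len(points)
    w = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        w[i][i] = 0.0
        # Rank all other points by Euclidean distance to point i.
        nearest = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in nearest[:k]:
            w[i][j] = d
            w[j][i] = d  # keep the graph undirected
    return w


def floyd_warshall(w):
    """All-pairs shortest path lengths for a dense weight matrix (O(n^3))."""
    n = len(w)
    dist = [row[:] for row in w]  # copy so the input is left untouched
    for k in range(n):
        dk = dist[k]
        for i in range(n):
            dik = dist[i][k]
            if dik == math.inf:
                continue  # no path from i to k, so k cannot improve row i
            di = dist[i]
            for j in range(n):
                candidate = dik + dk[j]
                if candidate < di[j]:
                    di[j] = candidate
    return dist


# Hypothetical data: 21 points on a half circle of radius 1, a curve that is
# intrinsically one-dimensional but embedded in two dimensions.
points = [
    (math.cos(math.pi * t / 20), math.sin(math.pi * t / 20)) for t in range(21)
]
geodesic = floyd_warshall(knn_graph(points, k=2))

straight_line = math.dist(points[0], points[-1])  # cuts across the circle
along_curve = geodesic[0][-1]  # follows the arc, so close to pi
print(straight_line, along_curve)
```

The straight-line distance between the two endpoints is 2, while the graph distance is close to the arc length π. Recovering that along-the-curve distance for every pair of points is what makes APSP a costly building block on large data sets.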

Collectively, our research findings contribute new mathematical knowledge required to perform dimensionality reduction, including methods for assessing errors in reduced data and methods that use graphs to represent similarity between materials, as well as algorithmic strategies to perform the reduction efficiently, including parallel algorithms for APSP. These research findings have been disseminated for broader use via 10 publications in leading peer-reviewed scientific conferences and journals.

The project also created ample opportunities to train the future generation of data analytics professionals. In total, five graduate students participated in the project and were introduced to interdisciplinary research spanning computer science, data analytics and materials science. Four of the students graduated or will graduate with a PhD degree, and one graduated with an MSc degree.


Last Modified: 07/21/2023
Modified by: Jaroslaw S Zola

Please report errors in award information by writing to: awardsearch@nsf.gov.
