Deep Learning

» Scalable build pipeline for Generative AI

At Samsung Research America, I worked on the development of NEONs, virtual humans powered by generative AI and state-of-the-art language models. There, I led a sub-team that created a scalable, automated, and reproducible pipeline for delivering deployment-ready, GPU-enabled models.

Starting from captured data, the pipeline entails feature extraction for multimodal data (video, audio, and text), distributed training of PyTorch models, conversion to TensorRT, and packaging of the relevant models for deployment and delivery. A special focus was the automated versioning and tracking of all input, output, and intermediate data assets.
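As a flavor of the conversion-and-packaging stage, here is a minimal sketch in PyTorch; the file names, the sample-input tensor, and the manifest format are hypothetical placeholders, and the production pipeline additionally handles distributed training, validation, and delivery:

    import hashlib
    import json
    import subprocess
    from pathlib import Path

    import torch

    def package_model(model: torch.nn.Module, sample: torch.Tensor, out_dir: Path) -> dict:
        """Export a trained model to ONNX, build a TensorRT engine, and hash
        every artifact so the delivered package is versioned and traceable."""
        out_dir.mkdir(parents=True, exist_ok=True)
        onnx_path = out_dir / "model.onnx"
        engine_path = out_dir / "model.plan"

        # Freeze the trained weights into a framework-neutral ONNX graph.
        model.eval()
        torch.onnx.export(model, sample, str(onnx_path), opset_version=17)

        # Build a GPU-specific engine with TensorRT's trtexec CLI
        # (requires a local TensorRT installation).
        subprocess.run(
            ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}", "--fp16"],
            check=True,
        )

        # Track artifacts by content hash and record them in a manifest.
        manifest = {p.name: hashlib.sha256(p.read_bytes()).hexdigest()
                    for p in (onnx_path, engine_path)}
        (out_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
        return manifest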

» Deep learning and Computer Vision for Cancer Research

At LLNL, as part of the DOE-NCI partnership, I developed AI solutions to facilitate autonomous, multiscale simulations in the context of cancer research.

Using computer vision approaches for multimodal data (multichannel images and spatial trajectories), I developed unsupervised and semi-supervised techniques to identify similarities between data samples. Upon deployment, these approaches provide on-demand recommendations based on real-time analysis of streaming data.

These AI models form the basis of the “novelty sampling” framework that facilitates first-of-their-kind multiscale simulations of the interactions of RAS-RAF (certain cancer-inducing proteins) with human cell membranes, and they have been deployed on some of the largest supercomputers on the planet.
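The deployed framework is considerably more sophisticated, but the core idea of novelty-based selection can be sketched in a few lines, assuming each sample has already been reduced to a fixed-size embedding (the greedy farthest-point strategy here is a simplification, not the production algorithm):

    import numpy as np

    def novelty_scores(seen: np.ndarray, candidates: np.ndarray) -> np.ndarray:
        # Distance of each candidate to its nearest already-seen sample:
        # the farther away, the more novel the candidate.
        d = np.linalg.norm(candidates[:, None, :] - seen[None, :, :], axis=-1)
        return d.min(axis=1)

    def select_novel(seen: np.ndarray, candidates: np.ndarray, k: int) -> list:
        # Greedily pick the k most novel candidates, updating the seen set
        # after each pick so the selection stays diverse.
        chosen = []
        for _ in range(k):
            idx = int(np.argmax(novelty_scores(seen, candidates)))
            chosen.append(idx)
            seen = np.vstack([seen, candidates[idx]])
        return chosen

    rng = np.random.default_rng(0)
    seen = rng.normal(size=(100, 16))    # embeddings of samples already simulated
    stream = rng.normal(size=(500, 16))  # embeddings of incoming candidates
    print(select_novel(seen, stream, k=5))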

See news coverage of this award-winning work: HPCwire, HPCwire, DEIXIS.

Also see an opinion article on the topic: The Confluence of Machine Learning and Multiscale Simulations.

» Deep Learning for Plasma Physics

Data-driven surrogate models are an attractive alternative in the search for faster solutions to complex scientific problems. This work shows the value of variational autoencoders in capturing the intricate relationships between the nontrivial topology of a tokamak reactor and the various relevant observables, allowing accurate predictions that complement expensive simulations.
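For concreteness, here is a minimal variational autoencoder in PyTorch; the layer sizes and the MSE reconstruction term are placeholder choices, not those of the actual surrogate:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SurrogateVAE(nn.Module):
        """Minimal VAE: compress observables into a latent space and
        reconstruct them (dimensions are placeholders)."""
        def __init__(self, n_in: int = 64, n_latent: int = 8):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(n_in, 128), nn.ReLU())
            self.mu = nn.Linear(128, n_latent)
            self.logvar = nn.Linear(128, n_latent)
            self.dec = nn.Sequential(
                nn.Linear(n_latent, 128), nn.ReLU(), nn.Linear(128, n_in)
            )

        def forward(self, x):
            h = self.enc(x)
            mu, logvar = self.mu(h), self.logvar(h)
            # Reparameterization trick: sample z while keeping gradients.
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
            return self.dec(z), mu, logvar

    def vae_loss(x_hat, x, mu, logvar):
        # Reconstruction error plus KL divergence to the unit Gaussian prior.
        recon = F.mse_loss(x_hat, x, reduction="sum")
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return recon + kl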

High-Performance Computing

» Large-scale, AI-driven scientific workflows

With the increasing reliance on heterogeneous architectures in HPC, large monolithic applications are being replaced by modular simulation codes customized to specific device types. Orchestrating such applications on large supercomputers requires addressing challenges of scalability, portability, fault tolerance, and ease of use. In this work, we developed an arbitrarily scalable, AI-driven workflow for multiscale simulations in computational biology and demonstrated it at full scale on two of the world's largest supercomputers, Sierra and Summit, utilizing almost 130,000 GPUs simultaneously.
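The actual workflow relies on HPC-specific resource managers and fault-tolerance machinery; purely as a structural sketch, the driver pattern looks roughly like this, with a local process pool standing in for the scheduler and all task functions as placeholders:

    import random
    from concurrent.futures import FIRST_COMPLETED, ProcessPoolExecutor, wait

    def run_simulation(params: dict) -> dict:
        # Placeholder for launching one coarse-scale simulation job.
        return {"params": params, "observable": random.random()}

    def score(result: dict) -> float:
        # Placeholder for the ML model that ranks results for refinement.
        return result["observable"]

    def drive_workflow(initial: list, budget: int) -> None:
        # As each job finishes, let the ML model decide whether its result
        # warrants a follow-up job at a finer scale.
        with ProcessPoolExecutor() as pool:  # stand-in for the HPC scheduler
            pending = {pool.submit(run_simulation, p) for p in initial}
            launched = len(pending)
            while pending:
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                for fut in done:
                    result = fut.result()
                    if launched < budget and score(result) > 0.5:
                        pending.add(pool.submit(run_simulation, result["params"]))
                        launched += 1

    if __name__ == "__main__":
        drive_workflow([{"seed": i} for i in range(8)], budget=32)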

» Performance profiling of large-scale applications

To ensure judicious use of the dollar investment in HPC hardware and software, it is imperative to profile application codes and continually improve their performance. With my team, I have developed several tools and techniques for visualizing (and sometimes capturing) profiling data to facilitate performance improvements, focusing on software traces, network interconnects, GPU power consumption, and CPU–GPU data movement.
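As a generic example of the kind of data such tools consume (this is PyTorch's built-in profiler, not one of our tools; the model, batch, and output file name are placeholders), a combined CPU–GPU trace can be captured like so:

    import torch
    from torch.profiler import ProfilerActivity, profile

    def profile_step(model: torch.nn.Module, batch: torch.Tensor) -> None:
        # Capture a combined CPU + GPU trace for a single forward pass.
        activities = [ProfilerActivity.CPU]
        if torch.cuda.is_available():
            activities.append(ProfilerActivity.CUDA)
        with profile(activities=activities, record_shapes=True) as prof:
            model(batch)
        # Summarize hot spots, then dump a Chrome-format trace for viewing.
        print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
        prof.export_chrome_trace("step_trace.json")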

Scientific Visualization

» Adaptive data representations

I have developed and contributed to several tools and techniques aimed broadly at reducing data footprint, both in-memory and on-disk. These techniques combine the two traditional axes of data reduction — reducing data precision (e.g., compression) and reducing data resolution (e.g., spatial hierarchies). See TVCG 2018, TVCG 2020, LDAV 2021, TVCG 2022, and TVCG 2023 papers.
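The two axes compose naturally; as a toy illustration in numpy, uniform quantization and block averaging stand in for the real compressors and hierarchies developed in the papers above:

    import numpy as np

    def reduce_precision(data: np.ndarray, bits: int = 8) -> np.ndarray:
        # Precision axis: quantize floats to a fixed number of bits (lossy).
        lo, hi = data.min(), data.max()
        levels = (1 << bits) - 1
        q = np.round((data - lo) / (hi - lo) * levels)
        return lo + q / levels * (hi - lo)

    def resolution_hierarchy(data: np.ndarray, depth: int = 3) -> list:
        # Resolution axis: coarse-to-fine hierarchy via 2x block averaging.
        levels = [data]
        for _ in range(depth):
            d = levels[-1]
            h, w = (d.shape[0] // 2) * 2, (d.shape[1] // 2) * 2
            coarse = d[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
            levels.append(coarse)
        return levels[::-1]  # coarsest first

    # Combining both: store coarse levels at low precision, refine on demand.
    field = np.random.default_rng(0).random((256, 256)).astype(np.float32)
    pyramid = [reduce_precision(level) for level in resolution_hierarchy(field)]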

» Topological analysis of flow fields

I have developed and contributed to several tools and techniques for analyzing and visualizing vector field data. Flow fields are notoriously difficult to analyze, both at a fundamental level (due to the presence of turbulence and the dependence on frames of reference) and from a computational perspective (sensitivity to numerical errors). Broadly, my work focused on robust topological methods that address these challenges.
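As a tiny illustration of the most basic building block, locating critical points where the velocity vanishes, here is a numpy sketch on a synthetic 2D field; real detectors also classify the points and must cope with the numerical sensitivity noted above:

    import numpy as np

    def critical_point_cells(u: np.ndarray, v: np.ndarray) -> np.ndarray:
        # Flag grid cells that may contain a critical point (u = v = 0):
        # a cell qualifies if both components change sign across its corners.
        def sign_change(f):
            corners = np.stack([f[:-1, :-1], f[1:, :-1], f[:-1, 1:], f[1:, 1:]])
            return (corners.min(axis=0) < 0) & (corners.max(axis=0) > 0)
        return sign_change(u) & sign_change(v)

    # A simple rotational flow with a known critical point at the origin.
    y, x = np.mgrid[-1:1:64j, -1:1:64j]
    u, v = -y, x
    print(np.argwhere(critical_point_cells(u, v)))  # cells near the center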

Scientific Applications

I have developed and contributed to several tools and techniques aimed broadly at exploring scientific data across a range of applications; see the list below.

  • Proceedings of the National Academy of Sciences (PNAS): 2022.
  • Journal of Chemical Theory and Computation: 2022, 2019.
  • Biophysical Journal: 2022, 2017.
  • The Journal of Physical Chemistry: 2020, 2017.
  • The Journal of Computational Chemistry: 2018.
  • Computer Graphics Forum: 2021.
  • IEEE Pacific Visualization Symposium: 2016.