
I am an Assistant Professor in the Manning College of Information and Computer Sciences at the University of Massachusetts Amherst. I received my Ph.D. from North Carolina State University in 2020. My research sits at the intersection of systems and machine learning: I design algorithms, systems, and programming abstractions that reduce the cost of developing, training, and deploying machine learning models.
Modern AI systems are increasingly powerful but also increasingly expensive to build and operate. My lab aims to make AI more efficient, scalable, and reliable across the entire ML lifecycle — from training to serving to edge deployment. The long-term vision is to make AI capabilities more accessible and economically sustainable.
I am currently on leave at AWS, working on agentic AI systems for cloud automation and code transformation. Reach me at huiguan@amazon.com if you’re interested in this space or in research internships.
Research
Learning Algorithms and Systems
Co-designing algorithms and systems to accelerate training and reduce memory and compute costs.
View publications →
Model Serving and Inference
Exploiting workload contexts to serve ML efficiently without over-provisioning hardware.
View publications →
Edge and On-Device Machine Learning
Building adaptive deep learning systems to enable edge and IoT deployments.
View publications →
Agentic Systems
Using LLM-powered agents to automate coding, cloud infrastructure management, and operational tasks.
View publications →
News
[Jan. 2026]: Congratulations to Mohammad and Jin for our work “WildFiT: Autonomous In-situ Model Adaptation for Resource-Constrained IoT Systems” accepted to SenSys’26.
[Jan. 2026]: Congratulations to Hanmei for our work “ProTrain: Efficient LLM Training via Automatic Memory Management” accepted to MLSys’26.
[Jan. 2026]: Congratulations to Kunjal for our work “Atom: Efficient On-Device Video-Language Pipelines Through Modular Reuse” accepted to MMSys’26.
[Jan. 2026]: Congratulations to Kylie Lin for our work “A Four-Stage Framework of Visual Complexity and Trust as Mediated by Effort” accepted to PacificVis’26.
[Nov. 2025]: Thanks for the support from the NSF for the DESC project on Repurposing Batteryless SmartPhones as Long-lived and Adaptable Sensors for Sustainable and Scalable Environmental Monitoring.
[Nov. 2025]: Thanks for the support from the NSF for the ACED project on Revolutionizing Instrumental Analysis Using Foundation Models.
[Oct. 2025]: Congratulations to Hanmei for our work “An Empirical Study of Microscaling Formats for Low-Precision LLM Training” accepted to ARITH’25.
[Oct. 2025]: Congratulations to Saurabh for our work “Graph Neural Network Training Systems: A Performance Comparison of Full-Graph and Mini-Batch” accepted to VLDB’25.
[May 2025]: 🎉🎉🎉Congratulations to Lijun Zhang, Hanmei Yang, and Jin Zhou for successfully defending their PhD theses. Lijun will join Amazon as a Postdoctoral Scientist, Hanmei will join Meta, and Jin will join NVIDIA. We wish them a wonderful new journey.
[Apr. 2025]: ⭐Congratulations to Sandeep for our work on Scaling Graph Neural Network Training on Large Graphs accepted to MLSys’25. The work proposes split parallelism, which distributes mini-batch GNN training workloads across multiple GPUs to reduce redundant data movement and computation and thus accelerate training.
[Feb. 2025]: 🎉Congratulations to Sohaib for successfully defending his PhD Thesis on “Optimized Resource Allocation for Serving Deep Learning Models”.
[Feb. 2025]: 🔥Congratulations to Sohaib and Qizheng for our work “DiffServe: Efficiently Serving Text-to-Image Diffusion Models with Query-Aware Model Scaling” accepted to MLSys’25. We use diffusion models as a case study to demonstrate the potential of model cascading for improving model serving efficiency (e.g., higher throughput and fewer SLO violations).
[Oct. 2024]: Congratulations to Lijun Zhang for our work “Attack-Resilient Image Watermarking Using Stable Diffusion” accepted to NeurIPS’24.
[Oct. 2024]: Congratulations to Kunjal Panchal for our work “Thinking Forward: Memory-Efficient Federated Finetuning of Language Models” accepted to NeurIPS’24.
[Mar. 2024]: Congratulations to Mohammad for our work “CACTUS: Dynamically Switchable Context-aware micro-Classifiers for Efficient IoT Inference” accepted to MobiSys’24.
[Mar. 2024]: Congratulations to Sohaib for our work “Loki: A System for Serving ML Inference Pipelines with Hardware and Accuracy Scaling” accepted to HPDC’24.
[Mar. 2024]: Thanks for the gift funds from Adobe and Dolby.
[Feb. 2024]: Thanks for the support from the NSF for the CAREER award on Adaptive Deep Learning Systems Towards Edge Intelligence.
[Feb. 2024]: Congratulations to Qizheng Yang for our work “GMorph: Accelerating Multi-DNN Inference via Model Fusion” accepted to EuroSys’24.
[Sept. 2023]: Congratulations to Kunjal Panchal for our work on “Flow: Per-instance Personalized Federated Learning” accepted to NeurIPS’23.
[Sept. 2023]: Congratulations to Lijun Zhang for the 2023 IBM PhD Fellowship Award.
[Sept. 2023]: Congratulations to Sohaib Ahmad for our work on “Proteus: A High-Throughput Inference-Serving System with Accuracy Scaling” accepted to ASPLOS’24.
[Sept. 2023]: Thanks for the support from the NSF for our project Memory-Driven Full-Stack Collaboration for Embedded Systems. With collaborators, we will bring the power of deep learning to resource-constrained embedded systems!
[Aug. 2023]: Thanks for the support from the NSF for our project Deep Learning on Anomaly Detection for Human Dynamics and Hazard Response. With collaborators, we will work on graph machine learning for anomaly detection.
[Aug. 2023]: Congratulations to Juelin and Sandeep for their work on “Accelerating Subgraph Enumeration Using Auxiliary Graphs” accepted to PACT’23.
[May 2023]: Our work on “Flash: Concept Drift Adaptation in Federated Learning” is accepted to ICML’23. It proposes a novel adaptive optimizer that simultaneously addresses both data heterogeneity and concept drift in federated learning.
[May 2023]: Our work on “Automatically Marginalized MCMC in Probabilistic Programming” is accepted to ICML’23. It proposes automatic marginalization to make sampling with Hamiltonian Monte Carlo more efficient.
[May 2023]: Our work on “NUMAlloc: A Faster NUMA Memory Allocator” is accepted to ISMM’23.
[May 2023]: Our work on “GSplit: Scaling Graph Neural Network Training on Large Graphs via Split-Parallelism” is on arXiv.
[Apr. 2023]: Our work on “Re-thinking Computation Offload for Efficient Inference on IoT Devices with Duty-Cycled Radios” is accepted to MobiCom’23.
[Oct. 2022]: I’m excited to share that we have received an Amazon Research Award for our proposal “Groot: A GPU-Resident System for Efficient Graph Machine Learning” at UMass Amherst. Learn more about the program on the website.
[Sept. 2022]: Our work on “AutoMTL: A Programming Framework for Automating Efficient Multi-Task Learning” is accepted to NeurIPS’22. Congratulations to Lijun. The project is open-sourced.
[Sept. 2022]: Thanks for the support from the NSF for our project Transparently Scaling Graph Neural Network Training to Large-Scale Models and Graphs.
[Jul. 2022]: Our work on “Fine-Grained Personalized Federated Learning Through Dynamic Routing” is accepted to the CrossFL’2022 Workshop @MLSys. Congratulations to Kunjal.
[Jul. 2022]: Our work on “Improving Subgraph Representation Learning via Multi-View Augmentation” is accepted to the AI4Science’22 Workshop @ICML.
[May 2022]: Our work “A Tree-Structured Multi-Task Model Recommender” is accepted to AutoML’22. Congratulations to Lijun. The project is open-sourced.
[May 2022]: Welcome to Qizheng Yang, a new PhD student joining our lab this summer.
[Mar. 2022]: Thanks for the support from the NVIDIA Academic Hardware Grant Program for the project “Multitasking-Centric Optimization for Deep Learning Applications”.
[Mar. 2022]: Our paper “Rethinking Hard-Parameter Sharing in Multi-Domain Learning” is accepted to ICME’22. Congratulations to Lijun.
[Mar. 2022]: Our paper “Enabling Near Real-Time NLU-Driven Natural Language Programming through Dynamic Grammar Graph-Based Translation” is accepted to CGO’22.
[Mar. 2022]: Our paper “COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression” is accepted to VLDB’22.
[Nov. 2021]: Our collaborative project with Prof. Zhou Lin on “Accelerating Fragment-Based Quantum Chemistry via Machine Learning” received UMass ADVANCE Collaborative Research Seed Grant.
[Oct. 2021]: Our paper “FreeLunch: Compression-based GPU Memory Management for Convolutional Neural Networks” is accepted to MCHPC’21 Workshop, in conjunction with SC’21.
[Oct. 2021]: Our paper “Recurrent Neural Networks Meet Context-Free Grammar: Two Birds with One Stone” is accepted to ICDM’21.
[June 2021]: Our paper “Scalable Graph Neural Network Training: The Case for Sampling” has appeared in the ACM SIGOPS Operating Systems Review.
[June 2021]: Our paper CoCoPIE is accepted to CACM’21.
[June 2021]: Our paper NumaPerf is accepted to ICS’21.
[May 2021]: I have received an Adobe Research Collaboration Grant on developing resource-efficient deep multi-task learning solutions.
[May 2021]: Our paper “Reuse-Centric Kmeans Configuration” is accepted to Information Systems. Congratulations to Lijun.
Awards
- NSF CAREER Award, 2024
- Amazon Research Award, 2022
- NCSU Electrical and Computer Engineering Outstanding Dissertation Award, 2020
- IBM PhD Fellowship, 2015-2018