
I am an Assistant Professor in the Manning College of Information and Computer Sciences at the University of Massachusetts Amherst. I received my Ph.D. from North Carolina State University in 2020. My research sits at the intersection of systems and machine learning: I design algorithms, systems, and programming abstractions that reduce the cost of developing, training, and deploying machine learning models.
Modern AI systems are increasingly powerful but also increasingly expensive to build and operate. My lab aims to make AI more efficient, scalable, and reliable across the entire ML lifecycle — from training to serving to edge deployment. The long-term vision is to make AI capabilities more accessible and economically sustainable.
I am currently on leave at AWS, working on agentic AI systems for cloud automation and code transformation. Reach me at huiguan@amazon.com if you’re interested in this space or in research internships.
Research
Learning Algorithms and Systems
Co-designing algorithms and systems to accelerate training and reduce memory and compute costs.
View publications →
Model Serving and Inference
Exploiting workload contexts to serve ML efficiently without over-provisioning hardware.
View publications →
Edge and On-Device Machine Learning
Building adaptive deep learning systems to enable edge and IoT deployments.
View publications →
Agentic Systems
Using LLM-powered agents to automate coding, cloud infrastructure management, and operational tasks.
View publications →
News
[Jan. 2026]: Congratulations to Mohammad and Jin for our work “WildFiT: Autonomous In-situ Model Adaptation for Resource-Constrained IoT Systems” accepted to SenSys’26.
[Jan. 2026]: Congratulations to Hanmei for our work “ProTrain: Efficient LLM Training via Automatic Memory Management” accepted to MLSys’26.
[Jan. 2026]: Congratulations to Kunjal for our work “Atom: Efficient On-Device Video-Language Pipelines Through Modular Reuse” accepted to MMSys’26.
[Jan. 2026]: Congratulations to Kylie Lin for our work “A Four-Stage Framework of Visual Complexity and Trust as Mediated by Effort” accepted to PacificVis’26.
[Nov. 2025]: Thanks for the support from the NSF for the DESC project on Repurposing Batteryless SmartPhones as Long-lived and Adaptable Sensors for Sustainable and Scalable Environmental Monitoring.
[Nov. 2025]: Thanks for the support from the NSF for the ACED project on Revolutionizing Instrumental Analysis Using Foundation Models.
[Oct. 2025]: Congratulations to Hanmei for our work “An Empirical Study of Microscaling Formats for Low-Precision LLM Training” accepted to ARITH’25.
[Oct. 2025]: Congratulations to Saurabh for our work “Graph Neural Network Training Systems: A Performance Comparison of Full-Graph and Mini-Batch” accepted to VLDB’25.
[May 2025]: 🎉🎉🎉Congratulations to Lijun Zhang, Hanmei Yang, and Jin Zhou for successfully defending their PhD theses. Lijun will join Amazon as a Postdoctoral Scientist, Hanmei will join Meta, and Jin will join NVIDIA. We wish them a wonderful new journey.
[Apr. 2025]: ⭐Congratulations to Sandeep for our work on Scaling Graph Neural Network Training on Large Graphs accepted to MLSys’25. The work proposes split parallelism, which distributes mini-batch GNN training workloads across multiple GPUs to reduce redundant data movement and computation and thus accelerate training.
[Feb. 2025]: 🎉Congratulations to Sohaib for successfully defending his PhD Thesis on “Optimized Resource Allocation for Serving Deep Learning Models”.
[Feb. 2025]: 🔥Congratulations to Sohaib and Qizheng for our work “DiffServe: Efficiently Serving Text-to-Image Diffusion Models with Query-Aware Model Scaling” accepted to MLSys’25. We use diffusion models as a case study to demonstrate the potential of model cascading for improving model serving efficiency (e.g., higher throughput and fewer SLO violations).
[Oct. 2024]: Congratulations to Lijun Zhang for our work “Attack-Resilient Image Watermarking Using Stable Diffusion” accepted to NeurIPS’24.
[Oct. 2024]: Congratulations to Kunjal Panchal for our work “Thinking Forward: Memory-Efficient Federated Finetuning of Language Models” accepted to NeurIPS’24.
[Mar. 2024]: Congratulations to Mohammad for our work “CACTUS: Dynamically Switchable Context-aware micro-Classifiers for Efficient IoT Inference” accepted to MobiSys’24.
[Mar. 2024]: Congratulations to Sohaib for our work “Loki: A System for Serving ML Inference Pipelines with Hardware and Accuracy Scaling” accepted to HPDC’24.
[Mar. 2024]: Thanks for the gift funds from Adobe and Dolby.
[Feb. 2024]: Thanks for the support from the NSF for the CAREER award on Adaptive Deep Learning Systems Towards Edge Intelligence.
[Feb. 2024]: Congratulations to Qizheng Yang for our work “GMorph: Accelerating Multi-DNN Inference via Model Fusion” accepted to EuroSys’24.
[Sept. 2023]: Congratulations to Kunjal Panchal for our work on “Flow: Per-instance Personalized Federated Learning” accepted to NeurIPS’23.
[Sept. 2023]: Congratulations to Lijun Zhang for the 2023 IBM PhD Fellowship Award.
[Sept. 2023]: Congratulations to Sohaib Ahmad for our work on “Proteus: A High-Throughput Inference-Serving System with Accuracy Scaling” accepted to ASPLOS’24.
[Sept. 2023]: Thanks for the support from the NSF for our project Memory-Driven Full-Stack Collaboration for Embedded Systems. With collaborators, we will bring the power of deep learning to resource-constrained embedded systems!
[Aug. 2023]: Thanks for the support from the NSF for our project Deep Learning on Anomaly Detection for Human Dynamics and Hazard Response. With collaborators, we will work on graph machine learning for anomaly detection.
[Aug. 2023]: Congratulations to Juelin and Sandeep for their work on “Accelerating Subgraph Enumeration Using Auxiliary Graphs” accepted to PACT’23.
[May 2023]: Our work on “Flash: Concept Drift Adaptation in Federated Learning” is accepted to ICML’23. It proposes a novel adaptive optimizer that simultaneously addresses both data heterogeneity and concept drift in federated learning.
[May 2023]: Our work on “Automatically Marginalized MCMC in Probabilistic Programming” is accepted to ICML’23. It proposes automatic marginalization to make sampling with Hamiltonian Monte Carlo more efficient.
[May 2023]: Our work on “NUMAlloc: A Faster NUMA Memory Allocator” is accepted to ISMM’23.
[May 2023]: Our work on “GSplit: Scaling Graph Neural Network Training on Large Graphs via Split-Parallelism” is on arXiv.
[Apr. 2023]: Our work on “Re-thinking Computation Offload for Efficient Inference on IoT Devices with Duty-Cycled Radios” is accepted to MobiCom’23.
[Oct. 2022]: I’m excited to share that we have received an Amazon Research Award for our proposal “Groot: A GPU-Resident System for Efficient Graph Machine Learning” at UMass Amherst. Learn more about the program on the website.
[Sept. 2022]: Our work on “AutoMTL: A Programming Framework for Automating Efficient Multi-Task Learning” is accepted to NeurIPS’22. Congratulations to Lijun. The project is open-sourced.
[Sept. 2022]: Thanks for the support from the NSF for our project Transparently Scaling Graph Neural Network Training to Large-Scale Models and Graphs.
[Jul. 2022]: Our work on “Fine-Grained Personalized Federated Learning Through Dynamic Routing” is accepted to the CrossFL’2022 Workshop @MLSys. Congratulations to Kunjal.
[Jul. 2022]: Our work on “Improving Subgraph Representation Learning via Multi-View Augmentation” is accepted to the AI4Science’22 Workshop @ICML.
[May 2022]: Our work “A Tree-Structured Multi-Task Model Recommender” is accepted to AutoML’22. Congratulations to Lijun. The project is open-sourced.
[May 2022]: Welcome to Qizheng Yang, a new PhD student joining our lab this summer.
[Mar. 2022]: Thanks for the support from the NVIDIA Academic Hardware Grant Program for the project “Multitasking-Centric Optimization for Deep Learning Applications”.
[Mar. 2022]: Our paper “Rethinking Hard-Parameter Sharing in Multi-Domain Learning” is accepted to ICME’22. Congratulations to Lijun.
[Mar. 2022]: Our paper “Enabling Near Real-Time NLU-Driven Natural Language Programming through Dynamic Grammar Graph-Based Translation” is accepted to CGO’22.
[Mar. 2022]: Our paper “COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression” is accepted to VLDB’22.
[Nov. 2021]: Our collaborative project with Prof. Zhou Lin on “Accelerating Fragment-Based Quantum Chemistry via Machine Learning” received UMass ADVANCE Collaborative Research Seed Grant.
[Oct. 2021]: Our paper “FreeLunch: Compression-based GPU Memory Management for Convolutional Neural Networks” is accepted to MCHPC’21 Workshop, in conjunction with SC’21.
[Oct. 2021]: Our paper “Recurrent Neural Networks Meet Context-Free Grammar: Two Birds with One Stone” is accepted to ICDM’21.
[June 2021]: Our paper “Scalable Graph Neural Network Training: The Case for Sampling” has appeared in the ACM SIGOPS Operating Systems Review.
[June 2021]: Our paper CoCoPIE is accepted to CACM’21.
[June 2021]: Our paper NumaPerf is accepted to ICS’21.
[May 2021]: I have received an Adobe Research Collaboration Grant on developing resource-efficient deep multi-task learning solutions.
[May 2021]: Our paper “Reuse-Centric Kmeans Configuration” is accepted to Information Systems. Congratulations to Lijun.
Awards
- NSF CAREER Award, 2024
- Amazon Research Award, 2022
- NCSU Electrical and Computer Engineering Outstanding Dissertation Award, 2020
- IBM PhD Fellowship, 2015-2018