Posts by Collection

projects

Data Readiness Level

Data are now produced at an unprecedented rate. Cheap sensors, existing and synthesized datasets, the emerging Internet of Things, and especially social media make collecting vast and complicated datasets a relatively easy task. With limited time and manpower, however, many companies and organizations struggle to harness the power of big data effectively. This project aims to ease data understanding by compressing and evaluating the valuable information contained in the data. Specifically, <ul class='archive__item-excerpt'> <li>Proposed an unsupervised, topological collapse-based method for document summarization that outperforms state-of-the-art methods on standard datasets of scientific papers. (Published in [SPAWC’16]) </li>

<li>Proposed information-theoretic metrics that measure the relative richness (readiness) of text data for answering specific questions; validated the metrics through an experiment on Twitter text data. (Available on [arXiv'17]) </li>

</ul>
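The readiness metrics themselves are not specified on this page. As a hedged illustration of the kind of information-theoretic quantity such metrics can build on, the sketch below computes the Shannon entropy (in bits) of the word distribution of a small text collection. The whitespace tokenizer, the function name `token_entropy`, and the sample texts are assumptions for illustration, not the metric from the paper.

```python
import math
from collections import Counter
from typing import Iterable


def token_entropy(texts: Iterable[str]) -> float:
    """Shannon entropy, in bits, of the unigram distribution over all texts.

    Hypothetical helper: tokenizes by lowercasing and splitting on whitespace.
    """
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum_w p(w) * log2(1 / p(w)), with p(w) = count(w) / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


print(token_entropy(["a b", "c d"]))  # 2.0: four equally likely tokens
```

Higher entropy indicates a more varied vocabulary. A readiness metric tied to a specific question would additionally need to condition on that question, which this sketch does not attempt.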

Low-Precision DNN-based Recommendation Models

Continuous representations have been widely adopted in recommender systems, where a large number of entities are represented by embedding vectors. As the cardinality of the entities increases, the embedding components can easily contain millions of parameters and, because of their large memory consumption, become the bottleneck in both storage and inference. This work focuses on post-training 4-bit quantization of the continuous embeddings. We propose row-wise uniform quantization with greedy search and codebook-based quantization, which consistently outperform state-of-the-art quantization approaches in reducing accuracy degradation. We deploy our uniform quantization technique on a production model at Facebook and demonstrate that it reduces the model size to only 13.89% of the single-precision version while model quality stays neutral. (Accepted in [MLSys@NeurIPS’19])
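To make row-wise uniform quantization concrete, the sketch below quantizes each embedding row to 4-bit codes with its own offset and scale, and shrinks the row's clipping range greedily to reduce L2 reconstruction error. The greedy rule used here (at each step, move whichever end of the range lowers the error more, never shrinking below half the original width) and the names `quantize_row`, `greedy_range`, and `quantize_table` are assumptions for illustration; the published greedy search may differ in its details.

```python
from typing import List, Tuple

BITS = 4
MAX_CODE = (1 << BITS) - 1  # 4-bit codes take values 0..15


def quantize_row(row: List[float], lo: float, hi: float) -> Tuple[List[int], float]:
    """Map each value in `row` to a 4-bit code over the clipping range [lo, hi]."""
    scale = (hi - lo) / MAX_CODE if hi > lo else 1.0
    # Values outside [lo, hi] are clipped to the nearest end code.
    codes = [min(MAX_CODE, max(0, round((x - lo) / scale))) for x in row]
    return codes, scale


def dequantize_row(codes: List[int], lo: float, scale: float) -> List[float]:
    return [lo + c * scale for c in codes]


def l2_error(row: List[float], lo: float, hi: float) -> float:
    """Squared L2 reconstruction error of `row` when quantized over [lo, hi]."""
    codes, scale = quantize_row(row, lo, hi)
    return sum((x - y) ** 2 for x, y in zip(row, dequantize_row(codes, lo, scale)))


def greedy_range(row: List[float], steps: int = 100,
                 max_shrink: float = 0.5) -> Tuple[float, float]:
    """Greedily shrink [min(row), max(row)]; return the range with the lowest error seen."""
    lo, hi = min(row), max(row)
    step = (hi - lo) / steps
    best_err, best_lo, best_hi = l2_error(row, lo, hi), lo, hi
    for _ in range(int(steps * max_shrink)):
        # Try moving either end inward by one step and keep the better move.
        err_lo = l2_error(row, lo + step, hi)
        err_hi = l2_error(row, lo, hi - step)
        if err_lo <= err_hi:
            lo, err = lo + step, err_lo
        else:
            hi, err = hi - step, err_hi
        if err < best_err:
            best_err, best_lo, best_hi = err, lo, hi
    return best_lo, best_hi


def quantize_table(table: List[List[float]]) -> List[Tuple[List[int], float, float]]:
    """Row-wise: every row keeps its own (offset, scale) next to its 4-bit codes."""
    out = []
    for row in table:
        lo, hi = greedy_range(row)
        codes, scale = quantize_row(row, lo, hi)
        out.append((codes, lo, scale))
    return out
```

Because each row stores its own offset and scale, rows with very different value ranges are all quantized over a range that fits them, at the cost of a small per-row overhead. Starting from the full min/max range guarantees the greedy result is never worse than plain min/max quantization.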

NLP-based Program Optimization and Synthesis

(Figure: A framework for automatic synthesis of HPC advising tools.)

Achieving high performance on computing systems is a complicated process. It requires a deep understanding of the underlying systems and their architectural properties, as well as implementations that take full advantage of them. In this project, we explore novel ideas that leverage recent progress in natural language processing (NLP) to address problems in program optimization and synthesis. Specifically, <ul class='archive__item-excerpt'> <li> Proposed a natural language understanding-driven programming framework that automatically synthesizes code from inputs expressed in natural language, without using any training examples. (Accepted in [FSE’20]) </li>

<li>Automatically synthesized advising tools that suggest program optimization knowledge drawn from HPC programming guides (CUDA, OpenCL, etc.), leveraging multiple NLP techniques including dependency parsing, semantic role labeling, TF-IDF, and topic modeling. (Published in [SC'17]) </li>

</ul>
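TF-IDF is one of the techniques listed above. As a minimal, hypothetical illustration (not the Egeria pipeline), the sketch below ranks sentences from a programming guide by TF-IDF cosine similarity to a query. The sample sentences, the query, and the function names are made up for this example, and computing IDF over the sentences plus the query is a simplifying assumption.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """TF-IDF weights per document: raw term count times log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_sentences(sentences: List[str], query: str) -> List[str]:
    """Return the sentences sorted by TF-IDF cosine similarity to the query."""
    docs = [s.lower().split() for s in sentences]
    vecs = tfidf_vectors(docs + [query.lower().split()])
    qvec = vecs[-1]
    scores = [cosine(v, qvec) for v in vecs[:-1]]
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in order]


guide = [
    "use shared memory to reduce global memory traffic",
    "the kernel is launched with a grid of thread blocks",
    "coalesced global memory accesses improve bandwidth",
]
print(rank_sentences(guide, "reduce global memory traffic")[0])
# use shared memory to reduce global memory traffic
```

Terms that are rare across the guide (such as "reduce" and "traffic" here) receive high weights, so sentences sharing them with the query rank first, while common terms contribute little.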

Reuse-Centric Programming System Support of Machine Learning

(Figure: Reuse-centric optimization.)

As a critical link between software and computing hardware, the programming system plays an essential role in ensuring the efficiency, scalability, security, and reliability of machine learning. This project examines the challenges in machine learning from the programming system perspective and addresses them with simple yet effective reuse-centric approaches. Specifically, <ul class='archive__item-excerpt'> <li>Proposed a flexible ensemble DNN training framework for efficiently training a heterogeneous set of DNNs; achieved up to 1.97X speedups over the state-of-the-art framework, which was designed for homogeneous DNN ensemble training. (Published in [MLSys’20]) </li>

<li>Proposed in-place zero-space error-correcting code (ECC), assisted by a new training scheme called weight distribution-oriented training, to provide the first known memory protection for CNNs with zero space cost. (Published in [NeurIPS’19]) </li>

<li>Developed a compiler-based framework that, for the first time, enables composability-based CNN pruning by generalizing Teacher-Student Network training to pre-train common convolutional layers; achieved up to 186X speedups. (Published in [PLDI'19]) </li>

<li>Accelerated CNN training by identifying similar vector dot products on the fly and adaptively avoiding their redundant computation; saved up to 69% of CNN training time with no accuracy loss. (Published in [ICDE'19]) </li>

<li>Improved the performance of DNN ensemble training by sharing data across training pipelines to eliminate redundant preprocessing; reduced CPU usage by 2-11X. (Published in [SC'18]) </li>

<li>Accelerated K-Means configuration by reusing computation at multiple levels across the exploration of different configurations; achieved 5-9X speedups. (Published in [ICDE'18]) </li>

<li>Accelerated distance calculation-based machine learning algorithms (K-Means, KNN, etc.) through triangle inequality-based strength reduction; achieved speedups of tens of times. (Published in [PLDI'17]) </li>

</ul>
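The triangle inequality-based strength reduction mentioned above can be illustrated with the textbook pruning rule for nearest-center search in K-Means: if d(c_i, c_j) >= 2 d(x, c_i), then d(x, c_j) >= d(c_i, c_j) - d(x, c_i) >= d(x, c_i), so the distance from x to c_j never needs to be computed. The sketch below is a hand-written example of that idea, not the compiler-based framework itself; the function names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def center_distances(centers: List[Point]) -> List[List[float]]:
    """Pairwise center-to-center distances, computed once and reused for every point."""
    return [[math.dist(a, b) for b in centers] for a in centers]


def nearest_center(x: Point, centers: List[Point],
                   cc: List[List[float]]) -> Tuple[int, int]:
    """Return (index of nearest center, number of point-to-center distances computed)."""
    best, best_d = 0, math.dist(x, centers[0])
    computed = 1
    for j in range(1, len(centers)):
        # Prune: d(c_best, c_j) >= 2 * d(x, c_best) implies d(x, c_j) >= d(x, c_best),
        # so c_j cannot be strictly closer than the current best.
        if cc[best][j] >= 2 * best_d:
            continue
        d = math.dist(x, centers[j])
        computed += 1
        if d < best_d:
            best, best_d = j, d
    return best, computed


centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
cc = center_distances(centers)
print(nearest_center((1.0, 1.0), centers, cc))  # (0, 1): three distance computations skipped
```

The pruning only skips centers that provably cannot win, so the result matches a brute-force search; the savings grow when points lie close to one of several well-separated centers.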

publications

A Topological Collapse for Document Summarization

Published in IEEE 17th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC'16), 2016


First Study on Data Readiness Level

Published in arXiv preprint arXiv:1702.02107, 2017


Generalizations of the Theory and Deployment of Triangular Inequality for Compiler-Based Strength Reduction

Published in Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'17). (Acceptance rate: 15% (47/322)), 2017


Egeria: a Framework for Automatic Synthesis of HPC Advising Tools through Multi-Layered Natural Language Processing

Published in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'17). (Acceptance rate: 18% (61/327)), 2017


TOP: A Compiler-Based Framework for Optimizing Machine Learning Algorithms through Generalized Triangle Inequality

Published in SysML (Poster, Feb. 16, 2018), 2018


Reuse-Centric K-Means Configuration

Published in 34th International Conference on Data Engineering (ICDE'18). (short paper) (Acceptance rate: 23%), 2018


Exploring Flexible Communications for Streamlining DNN Ensemble Training Pipelines

Published in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'18). (Acceptance rate: 23%), 2018


Adaptive Deep Reuse: Accelerating CNN Training on the Fly

Published in 35th International Conference on Data Engineering (ICDE'19). (Acceptance rate: 18%), 2019


Wootz: a Compiler-Based Framework for Fast CNN Pruning via Composability

Published in Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'19). (Acceptance rate: 27.7% (76/274)), 2019


Post-Training 4-bit Quantization on Embedding Tables

Published in MLSys Workshop on Systems for ML @ NeurIPS 2019 (Poster), 2019


In-Place Zero-Space Memory Protection for CNN

Published in Advances in Neural Information Processing Systems (NeurIPS'19). (Acceptance rate: 21.2% (1428/6743)), 2019


FLEET: Flexible Efficient Ensemble Training for Heterogeneous Deep Neural Networks

Published in 3rd Conference on Machine Learning and Systems (MLSys'20). (Acceptance rate: 20% (34/170)), 2020


HISyn: Human Learning-Inspired Natural Language Programming

Published in the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE'20). (Acceptance rate: 28% (101/360)), 2020


An Automatic Synthesizer of Advising Tools for High Performance Computing

Published in IEEE Transactions on Parallel and Distributed Systems (TPDS), 2020


Deep NLP-Based Co-Evolvement for Synthesizing Code Analysis from Natural Language

Published in The ACM SIGPLAN 2021 International Conference on Compiler Construction (CC'21), 2021


Reuse-Centric K-Means Configuration

Published in Information Systems, 2021


CoCoPIE: Enabling Real-Time AI on Off-the-Shelf Mobile Devices via Compression-Compilation Co-Design

Published in Communications of the ACM, 2021


NumaPerf: Predictive NUMA Profiling

Published in Proceedings of International Conference on Supercomputing (ICS'21), 2021

Hui Guan

publications

Scalable Graph Neural Network Training: The Case for Sampling

talks

teaching

COMPSCI 692S: Seminar: Systems for Machine Learning, Machine Learning for Systems

COMPSCI 532: Systems for Data Science (Spring 2021)

COMPSCI 532: Systems for Data Science (Fall 2021)