Reuse-Centric Programming System Support of Machine Learning

09/2016 - present

Reuse-Centric Optimization
(Figure: Reuse-centric optimization.)

As a critical link between software and computing hardware, programming systems play an essential role in ensuring the efficiency, scalability, security, and reliability of machine learning. This project examines the challenges of machine learning from the programming system perspective by developing simple yet effective reuse-centric approaches. Specifically,

  • Proposed a flexible ensemble DNN training framework for efficiently training a heterogeneous set of DNNs; achieved up to 1.97X speedups over the state-of-the-art framework that was designed for homogeneous DNN ensemble training. (Published in [MLSys'20])
  • Proposed in-place zero-space ECC, assisted by a new training scheme, weight distribution-oriented training, to provide the first known zero-space-cost memory protection for CNNs. (Published in [NeurIPS'19])
  • Developed a compiler-based framework that, for the first time, enables composability-based CNN pruning by generalizing Teacher-Student Network training for pre-training common convolutional layers; achieved up to 186X speedups. (Published in [PLDI'19])
  • Accelerated CNN training by identifying and adaptively avoiding similar vector dot products on the fly during training; saved up to 69% of CNN training time with no accuracy loss. (Published in [ICDE'19])
  • Improved the performance of DNN ensemble training by eliminating pipeline redundancies in preprocessing through data sharing; reduced CPU usage by 2-11X. (Published in [SC'18])
  • Accelerated K-Means configuration by promoting multi-level computation reuse across the explorations of different configurations; achieved 5-9X speedups. (Published in [ICDE'18])
  • Accelerated distance calculation-based machine learning algorithms (K-Means, KNN, etc.) by developing Triangle Inequality-based strength reduction; produced speedups of tens of times. (Published in [PLDI'17])
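To make the last idea concrete, below is a minimal sketch (my own illustration, not the paper's implementation) of Triangle Inequality-based filtering in the K-Means assignment step: if d(c_best, c_j) >= 2 * d(x, c_best), then center c_j cannot be closer to point x than the current best center, so the exact distance d(x, c_j) never needs to be computed.

```python
import numpy as np

def assign_with_triangle_inequality(points, centers):
    """Return the index of the nearest center for each point, skipping
    distance computations that the triangle inequality proves unnecessary."""
    # Pairwise center-to-center distances, computed once per assignment pass.
    cc = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    labels = np.empty(len(points), dtype=int)
    for i, x in enumerate(points):
        best = 0
        best_d = np.linalg.norm(x - centers[0])
        for j in range(1, len(centers)):
            # Pruning test: d(x, c_j) >= d(c_best, c_j) - d(x, c_best)
            # >= best_d whenever cc[best, j] >= 2 * best_d, so skip c_j.
            if cc[best, j] >= 2.0 * best_d:
                continue
            d = np.linalg.norm(x - centers[j])
            if d < best_d:
                best, best_d = j, d
        labels[i] = best
    return labels
```

The pruning is exact: the assignment is identical to the brute-force one, only cheaper when centers are well separated.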

NLP-based Program Optimization and Synthesis

09/2016 - present
Wootz Framework
(Figure: A framework for automatic synthesis of HPC advising tools.)

Achieving high performance on computing systems is a complicated process. It requires a deep understanding of the underlying systems and their architectural properties, along with implementations that take full advantage of them. In this project, we explore novel ideas for program optimization and synthesis by leveraging recent progress in Natural Language Processing.

  • Proposed a Natural Language Understanding-driven programming framework that automatically synthesizes code based on inputs expressed in natural language using no training examples. (Accepted in [FSE'20])
  • Automatically synthesized advising tools that suggest program optimization knowledge from HPC programming guides (CUDA, OpenCL, etc.); leveraged multiple NLP techniques, including dependency parsing, semantic role labeling, TF-IDF, and topic modeling. (Published in [SC'17])
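As a toy illustration of one NLP ingredient named above (not the paper's pipeline; the example sentences are invented), TF-IDF can rank sentences of a programming guide so that advice-bearing sentences with distinctive vocabulary surface above boilerplate:

```python
import math
from collections import Counter

def tfidf_rank(sentences):
    """Rank sentence indices by the sum of TF-IDF weights of their terms."""
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    # Document frequency: number of sentences containing each term.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # TF-IDF score of a sentence: sum over its distinct terms.
        score = sum((tf[t] / len(d)) * math.log(n / df[t]) for t in tf)
        scores.append(score)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)
```

Sentences repeated across the corpus score near zero (their terms' IDF vanishes), while a sentence with corpus-rare terms rises to the top.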

Low-Precision DNN-based Recommendation Models

Summer 2019: Research Intern, Facebook.
Continuous representations have been widely adopted in recommender systems, where a large number of entities are represented by embedding vectors. As the cardinality of the entities increases, the embedding components can easily contain millions of parameters and, due to their large memory consumption, become the bottleneck in both storage and inference. This work focuses on post-training 4-bit quantization of the continuous embeddings. We propose row-wise uniform quantization with greedy search, as well as codebook-based quantization, which consistently outperform state-of-the-art quantization approaches in reducing accuracy degradation. We deployed our uniform quantization technique on a production model at Facebook and demonstrated that it reduces the model size to only 13.89% of the single-precision version while model quality stays neutral. (Accepted in [MLSys@NeurIPS'19])
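The core row-wise idea can be sketched as follows (a minimal illustration under my own assumptions; the greedy range search and the codebook variant from the work are omitted). Each embedding row gets its own [min, max] range, mapped linearly onto the 16 levels of a 4-bit code, with a per-row scale and offset stored alongside the codes:

```python
import numpy as np

def quantize_rows_4bit(emb):
    """Quantize each row of `emb` to 4-bit codes plus per-row scale/offset."""
    lo = emb.min(axis=1, keepdims=True)
    hi = emb.max(axis=1, keepdims=True)
    # 15 = 2**4 - 1 quantization intervals; guard against constant rows.
    scale = np.where(hi > lo, (hi - lo) / 15.0, 1.0)
    codes = np.clip(np.round((emb - lo) / scale), 0, 15).astype(np.uint8)
    return codes, scale, lo

def dequantize_rows(codes, scale, lo):
    """Reconstruct an approximate float embedding from the 4-bit codes."""
    return codes.astype(np.float32) * scale + lo
```

Because the mapping is uniform within each row, the reconstruction error of any element is bounded by half a quantization step (scale / 2) for that row.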

Data Readiness Level

01/2014-09/2016: Research Assistant, NCSU.
Data nowadays are produced at an unprecedented rate: cheap sensors, existing and synthesized datasets, the emerging Internet of Things, and especially social media make the collection of vast and complicated datasets a relatively easy task. With limited time and personnel, effectively harnessing the power of big data is a problem many companies and organizations encounter. This project eases the data understanding process by compressing and evaluating the valuable information contained in the data. Specifically,

  • Proposed a topological collapse-based unsupervised method for document summarization. The method outperforms state-of-the-art methods on standard datasets composed of scientific papers. (Published in [SPAWC'16])
  • Proposed information-theoretic metrics to measure the relative richness/readiness of text data to answer specific questions; validated the metrics through a text-based experiment using Twitter data. (Available on [arXiv'17])