Sitemap

A list of all the posts and pages found on the site. For you robots out there is an XML version available for digesting as well.

Page Not Found

Page not found. Your pixels are in another canvas.

Jupyter notebook markdown generator

Posts

projects

Data nowadays are produced at an unprecedented rate; Cheap sensors, existing and synthesized datasets, the emerging internet of things, and especially social media, make the collection of vast and complicated datasets a relatively easy task. With limited time and human power, the ability to effectively harness the power of big data is a problem encountered by many companies and organizations. The project tries to ease the data understanding process by compressing and evaluating the valuable information contained in the data. Specifically, <ul class='archive__item-excerpt'> <li>Proposed a topological collapse-based unsupervised method for document summarization. The method outperforms state-of-the-art methods on standard datasets composed of scientific papers. (Published in [SPAWC’16]) </li>

Proposed information-theoretic—based metrics to measure relative richness/readiness of text data to answer specific questions; validated the metrics through a text-based experiment using Twitter data. (Available on [ arXiv'17])

</ul>

Low-Precision DNN-based Recommendation Models

Continuous representations have been widely adopted in recommender systems where a large number of entities are represented using embedding vectors. As the cardinality of the entities increases, the embedding components can easily contain millions of parameters and become the bottleneck in both storage and inference due to large memory consumption. This work focuses on post-training 4-bit quantization on the continuous embeddings. We propose row-wise uniform quantization with greedy search and codebook-based quantization that consistently outperforms state-of-the-art quantization approaches on reducing accuracy degradation. We deploy our uniform quantization technique on a production model in Facebook and demonstrate that it can reduce the model size to only 13.89% of the single-precision version while the model quality stays neutral. (Accepted in [MLSys@NeurIPS’19])

NLP-based Program Optimization and Synthesis

Wootz Framework
(Figure: A framework for automatic synthesis of HPC advising tools.)

Achieving high performance on computing systems is a complicated process. It requires a deep understanding of the underlying computing systems, the architectural properties, and proper implementations to take full advantage of the computing systems. In this project, we explore novel ideas to address problems in program optimization and synthesis by leveraging the recent progress in Natural Language Processing. <ul class='archive__item-excerpt'> <li> Proposed a Natural Language Understanding-driven programming framework that automatically synthesizes code based on inputs expressed in natural language using no training examples. (Accepted in [FSE’20]) </li>

Automatically synthesized advising tools for suggesting program optimization knowledge from HPC programming guides (CUDA, OpenCL, etc.); leveraged multiple NLP techniques including dependency parsing, semantic role labeling, TF-IDF and topic modeling. (Published in [SC'17)]

</ul>

Reuse-Centric Programming System Support of Machine Learning

(Figure: Reuse-centric optimization.)

As a critical link between software and computing hardware, programming system plays an essential role in ensuring the efficiency, scalability, security, and reliability of machine learning. This project examines the challenges in machine learning from the programming system perspective by developing simple yet effective reuse-centric approaches. Specifically, <ul class='archive__item-excerpt'> <li>Proposed a flexible ensemble DNN training framework for efficiently training a heterogeneous set of DNNs; achieved up to 1.97X speedups over the state-of-the-art framework that was designed for homogeneous DNN ensemble training. (Published in [MLSys’20]) </li>

Proposed in-place zero-space ECC assisted with a new training scheme, weight distribution-oriented training, to provide the first known zero space cost memory protection for CNNs. (Published in [ NeurIPS’19])

Developed a compiler-based framework that, for the first time, enables composability-based CNN pruning by generalizing Teacher-Student Network training for pre-training common convolutional layers; achieved up to 186X speedups. (Published in [ PLDI'19])

Accelerated CNN training by identifying and adaptively avoiding similar vector dot products during training on the fly; saved up to 69% CNN training time with no accuracy loss. (Published in [ICDE'19])

Improved the performance of DNN ensemble training by eliminating pipeline redundancies in preprocessing through data sharing; reduced CPU usage by 2-11X. (Published in [SC'18])

Accelerated K-Means configuration by promoting multi-level computation reuse across the explorations of different configurations; achieved 5-9X speedups. (Published in [ICDE'18])

Accelerated distance calculation-based machine learning algorithms (K-Means, KNN, etc.) by developing Triangle Inequality-based strength reduction; produced tens of times of speedups. (Published in [PLDI'17])

</ul>

publications

A Topological Collapse for Document Summarization

Published in IEEE 17th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC'16)., 2016

Download here

First Study on Data Readiness Level

Published in arXiv preprint arXiv:1702.02107 (Preprint), 2016

Download here

Generalizations of the Theory and Deployment of Triangular Inequality for Compiler-Based Strength Reduction

Published in Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'17). (Acceptance rate: 15% (47/322)) , 2017

Download here

Egeria: a Framework for Automatic Synthesis of HPC Advising Tools through Multi-Layered Natural Language Processing

Published in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'17). (Acceptance rate: 18% (61/327)), 2017

Download here

TOP: A Compiler-Based Framework for Optimizing Machine Learning Algorithms through Generalized Triangle Inequality

Published in SysML, Feb 16th, 2018 (Poster), 2018

Download here

Reuse-Centric K-Means Configuration

Published in 34th International Conference on Data Engineering (ICDE'18). (short paper) (Acceptance rate: 23%), 2018

Download here

Exploring Flexible Communications for Streamlining DNN Ensemble Training Pipelines

Published in Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'18). (Acceptance rate: 23%) , 2018

Download here

Adaptive Deep Reuse: Accelerating CNN Training on the Fly

Published in 35th International Conference on Data Engineering (ICDE'19). (Acceptance rate: 18%), 2019

Download here

Wootz: a Compiler-Based Framework for Fast CNN Pruning via Composability

Published in Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'19). (Acceptance rate: 27.7% (76/274)) , 2019

Download here

Post-Training 4-bit Quantization on Embedding Tables

Published in MLSys Workshop on Systems for ML @ NeurIPS, 2019 (Poster), 2019

Download here

In-Place Zero-Space Memory Protection for CNN

Published in Advances in Neural Information Processing Systems (NeurIPS'19). (Acceptance rate: 21.2% (1428/6743)) , 2019

Download here

FLEET: Flexible Efficient Ensemble Training for Heterogeneous Deep Neural Networks

Published in 3rd Conference on Machine Learning and Systems (MLSys'20). (Acceptance rate: 20% (34/170)) , 2020

Download here

HISyn: Human Learning-Inspired Natural Language Programming

Published in The ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE'20). (acceptance rate: 101/360=28%) , 2020

Download here