Wencong Xiao

I am Wencong Xiao (肖文聪), an AI system developer/researcher on the PAI team at Alibaba Group. My work mainly focuses on building highly efficient deep learning infrastructure for Alibaba. Previously, I spent more than five wonderful years in the system research group of Microsoft Research while pursuing my Ph.D. My supervisors are Lidong Zhou at Microsoft Research and Prof. Wei Li at Beihang University.

I am always on the job market :) Curriculum vitae.

My research interests span a wide range of computer systems areas, including both traditional operating system topics and modern directions involving heterogeneous hardware and new applications. I am excited about building high-performance systems that fully exploit hardware capabilities, and I also enjoy identifying the common patterns of new workloads and applying them to general system design. Recently, I have been focusing on providing better system support for large-scale artificial intelligence applications.

Research interests

  • Deep Learning Systems
  • Large-scale cluster resource management
  • Distributed graph computing

Publications

  • AntMan: Dynamic Scaling on GPU Clusters for Deep Learning
    Wencong Xiao, Shiru Ren, Yong Li, Yang Zhang, Pengyang Hou, Zhi Li, Yihui Feng, Wei Lin, Yangqing Jia
    The 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’20)
    [To appear]

  • Distributed Graph Computation Meets Machine Learning
    Wencong Xiao, Jilong Xue, Youshan Miao, Zhen Li, Cheng Chen, Ming Wu, Wei Li, Lidong Zhou
    IEEE Transactions on Parallel & Distributed Systems (TPDS)
    [pdf]

  • An Empirical Study on Program Failures of Deep Learning Jobs
    Ru Zhang, Wencong Xiao, Hongyu Zhang, Yu Liu, Haoxiang Lin, Mao Yang
    The 42nd International Conference on Software Engineering (ICSE 2020, Distinguished Paper Award!)
    [pdf]

  • PRmalloc: Leveraging Predictability for Deep Learning Memory Allocation
    Wencong Xiao, Shiru Ren, Tongxuan Liu, Yong Li
    Workshop on AI Systems at SOSP 2019
    [pdf][poster]

  • AliGraph: An Industrial Graph Neural Network Platform
    Kun Zhao, Wencong Xiao, Baole Ai, Wenting Shen, Xiaolin Zhang, Yong Li, Wei Lin
    Workshop on AI Systems at SOSP 2019
    [pdf][poster]

  • Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads
    Myeongjae Jeon, Shivaram Venkataraman, Amar Phanishayee, Junjie Qian, Wencong Xiao, Fan Yang
    2019 USENIX Annual Technical Conference (ATC ’19)
    [pdf][slides][trace]

  • SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity through Low-Bit Quantization
    Shijie Cao, Lingxiao Ma, Wencong Xiao, Chen Zhang, Yunxin Liu, Lintao Zhang, Lanshun Nie, Zhi Yang
    IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’19)
    [pdf]

  • Efficient and Effective Sparse LSTM on FPGA with Bank-Balanced Sparsity
    Shijie Cao, Chen Zhang, Zhuliang Yao, Wencong Xiao, Lanshun Nie, Dechen Zhan, Yunxin Liu, Ming Wu, Lintao Zhang
    27th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA ’19)
    [pdf][slides]

  • Balanced Sparsity for Efficient DNN Inference on GPU
    Zhuliang Yao, Shijie Cao, Wencong Xiao, Chen Zhang, Lanshun Nie
    33rd AAAI Conference on Artificial Intelligence (AAAI ’19)
    [pdf][poster]

  • Scheduling CPU for GPU-based Deep Learning Jobs
    Wencong Xiao, Zhenhua Han, Hanyu Zhao, Xuan Peng, Quanlu Zhang, Fan Yang, Lidong Zhou
    ACM Symposium on Cloud Computing 2018 (SoCC ’18 poster)
    [pdf][poster]

  • Gandiva: Introspective Cluster Scheduling for Deep Learning
    Wencong Xiao, Romil Bhardwaj, Ramachandran Ramjee, Muthian Sivathanu, Nipun Kwatra, Zhenhua Han, Pratyush Patel, Xuan Peng, Hanyu Zhao, Quanlu Zhang, Fan Yang, Lidong Zhou
    The 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’18)
    [pdf][slides][poster]

  • BeamRaster: A Practical Fast Massive MU-MIMO System with Pre-computed Precoders
    Meng Meng, Wencong Xiao, Tong He, Yuechen Tao, Kun Tan, Jiansong Zhang, Wenjie Wang
    IEEE Transactions on Mobile Computing (TMC)
    [pdf]

  • Multi-tenant GPU Clusters for Deep Learning Workloads: Analysis and Implications
    Myeongjae Jeon, Shivaram Venkataraman, Amar Phanishayee, Junjie Qian, Wencong Xiao, Fan Yang
    Microsoft Research Technical Report (MSR-TR-2018-13)
    [pdf]

  • Optimization Mapping for Deep Learning
    Wencong Xiao, Cheng Chen, Youshan Miao, Jilong Xue, Ming Wu
    The 26th ACM Symposium on Operating Systems Principles AI Systems Workshop (SOSP ’17 AISys)
    [pdf][poster]

  • All You Need to Know about Scheduling Deep Learning Jobs
    Wencong Xiao, Fan Yang, Lidong Zhou
    The 26th ACM Symposium on Operating Systems Principles Student Research Competition (SOSP ’17 SRC)
    [pdf][poster]

  • KV-Direct: High-Performance In-Memory Key-Value Store with Programmable NIC
    Bojie Li, Zhenyuan Ruan, Wencong Xiao, Yuanwei Lu, Yongqiang Xiong, Andrew Putnam, Enhong Chen, Lintao Zhang
    The 26th ACM Symposium on Operating Systems Principles (SOSP ’17)
    [pdf]

  • Memory Efficient Loss Recovery for Hardware-based Transport in Datacenter
    Yuanwei Lu, Guo Chen, Zhenyuan Ruan, Wencong Xiao, Bojie Li, Jiansong Zhang, Yongqiang Xiong, Peng Cheng, Enhong Chen
    The 1st Asia-Pacific Workshop on Networking (APNet ’17)
    [pdf]

  • TuX2: Distributed Graph Computation for Machine Learning
    Wencong Xiao, Jilong Xue, Youshan Miao, Cheng Chen, Zhen Li, Ming Wu, Wei Li, Lidong Zhou
    The 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’17)
    [pdf][slides]

  • GRAM: Scaling Graph Computation to the Trillions
    Ming Wu, Fan Yang, Jilong Xue, Wencong Xiao, Youshan Miao, Lan Wei, Haoxiang Lin, Yafei Dai, Lidong Zhou
    ACM Symposium on Cloud Computing 2015 (SoCC ’15)
    [pdf]

Articles

A collection of articles, some of them invited, mainly about my impressions of conferences. Note that some of them are written in Chinese.

Articles - Machine Learning System Research after TensorFlow

A list of papers on machine learning systems and infrastructure published after TensorFlow (to be updated).

January 2019

Articles - OSDI 2018: Exploring the Beauty of Computer Systems (OSDI2018:探寻计算机系统之美)

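This article was written at the invitation of the Academic Exchange Department of Microsoft Research Asia. It records my experience and impressions from attending OSDI 2018, along with my thoughts on the content and directions of systems research. The original was published on the official WeChat account of the Microsoft Research Asia Academic Exchange Department and is reposted here so that more people can read it.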

November 2018

Articles - NSDI 2017: Surprises and Challenges Side by Side (惊喜与挑战并行的NSDI 2017)

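This article records my experience and impressions from attending NSDI 2017, along with my thoughts on the content and directions of networked systems research.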

April 2017

Articles - Cloud Computing on the Rise: SoCC 2015 (方兴未艾的云计算:SoCC 2015大会)

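This article records my experience and impressions from attending SoCC 2015, along with my thoughts on the content and directions of networked systems research.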

September 2015

Experience

Research Intern

Microsoft Research Asia

Conducted research on distributed machine learning, cluster resource management, graph computing, etc.

July 2013 - March 2019

Research Intern

Microsoft Research Redmond

Worked on a research project on Microsoft's GPU clusters for deep learning.

July 2016 - October 2016

Education

Beihang University

Doctor of Philosophy
Distributed Systems, Computer Science
2014 - 2019 (expected)

Beihang University

Bachelor
Computer Science
2010 - 2014