Selected Publications
Robust LLM Training Infrastructure at ByteDance
B Wan, G Liu, Z Song, J Wang, Y Zhang, G Sheng, S Wang, H Wei, ..., Wencong Xiao, ...
SOSP 2025
Llumnix: Dynamic Scheduling for Large Language Model Serving
Biao Sun, Ziming Huang, Hanyu Zhao, Wencong Xiao, Xinyi Zhang, Yong Li, Wei Lin
OSDI 2024
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
Bin Lin, Chen Zhang, Tao Peng, Hanyu Zhao, Wencong Xiao, Minmin Sun, Anmin Liu, Zhipeng Zhang, Lanbo Li, Xiafei Qiu, Shen Li, Zhigang Ji, Tao Xie, Yong Li, Wei Lin
arXiv 2024
MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters
Qizhen Weng, Wencong Xiao, Yinghao Yu, Wei Wang, Cheng Wang, Jian He, Yong Li, Liping Zhang, Wei Lin, Yu Ding
NSDI 2022
AntMan: Dynamic Scaling on GPU Clusters for Deep Learning
Wencong Xiao, Shiru Ren, Yong Li, Yang Zhang, Pengyang Hou, Zhi Li, Yihui Feng, Wei Lin, Yangqing Jia
OSDI 2020
An Empirical Study on Program Failures of Deep Learning Jobs 🏆 Distinguished Paper Award
Ru Zhang, Wencong Xiao, Hongyu Zhang, Yu Liu, Haoxiang Lin, Mao Yang
ICSE 2020
Efficient and Effective Sparse LSTM on FPGA with Bank-Balanced Sparsity
Shijie Cao, Chen Zhang, Zhuliang Yao, Wencong Xiao, Lanshun Nie, Dechen Zhan, Yunxin Liu, Ming Wu, Lintao Zhang
FPGA 2019
Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads
Myeongjae Jeon, Shivaram Venkataraman, Amar Phanishayee, Junjie Qian, Wencong Xiao, Fan Yang
USENIX ATC 2019
Gandiva: Introspective Cluster Scheduling for Deep Learning
Wencong Xiao, Romil Bhardwaj, Ramachandran Ramjee, Muthian Sivathanu, Nipun Kwatra, Zhenhua Han, Pratyush Patel, Xuan Peng, Hanyu Zhao, Quanlu Zhang, Fan Yang, Lidong Zhou
OSDI 2018
TuX²: Distributed Graph Computation for Machine Learning
Wencong Xiao, Jilong Xue, Youshan Miao, Cheng Chen, Zhen Li, Ming Wu, Wei Li, Lidong Zhou
NSDI 2017