Publications

* denotes equal contribution.

2025

  1. arXiv
    mFabric: An Efficient and Scalable Fabric for Mixture-of-Experts Training
    Xudong Liao, Yijun Sun, Han Tian, Xinchen Wan, Yilun Jin, Zilong Wang, Zhenghang Ren, Xinyang Huang, Wenxue Li, Kin Fai Tse, Zhizhen Zhong, Guyue Liu, Ying Zhang, Xiaofeng Ye, Yiming Zhang, and Kai Chen
    arXiv:2501.03905, 2025
  2. OSDI
    Enabling Efficient GPU Communication over Multiple NICs with FuseLink
    Zhenghang Ren, Yuxuan Li, Zilong Wang, Xinyang Huang, Wenxue Li, Kaiqiang Xu, Xudong Liao, Yijun Sun, Bowen Liu, Han Tian, Junxue Zhang, Mingfei Wang, Zhizhen Zhong, Guyue Liu, Ying Zhang, and Kai Chen
    In Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2025), 2025
  3. INFOCOM
    A Generic and Efficient Communication Framework for Message-level In-Network Computing
    Xinchen Wan, Luyang Li, Han Tian, Xudong Liao, Xinyang Huang, Chaoliang Zeng, Zilong Wang, Xinyu Yang, Ke Cheng, Qingsong Ning, Guyue Liu, Layong Luo, and Kai Chen
    In Proceedings of the IEEE International Conference on Computer Communications (INFOCOM 2025), 2025
  4. ASPLOS
    Design and Operation of Shared Machine Learning Clusters on Campus
    Kaiqiang Xu, Decang Sun, Hao Wang, Zhenghang Ren, Xinchen Wan, Xudong Liao, Zilong Wang, Junxue Zhang, and Kai Chen
    In Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2025), 2025
  5. EuroSys
    Achieving Fairness Generalizability for Learning-based Congestion Control with Jury
    Han Tian, Xudong Liao, Decang Sun, Chaoliang Zeng, Yilun Jin, Junxue Zhang, Xinchen Wan, Zilong Wang, Yong Wang, and Kai Chen
    In Proceedings of the 20th ACM European Conference on Computer Systems (EuroSys 2025), 2025

2024

  1. EuroSys
    Astraea: Towards Fair and Efficient Learning-based Congestion Control
    Xudong Liao*, Han Tian*, Chaoliang Zeng, Xinchen Wan, and Kai Chen
    In Proceedings of the 19th ACM European Conference on Computer Systems (EuroSys 2024), 2024
  2. NSDI
    Accelerating Neural Recommendation Training with Embedding Scheduling
    Chaoliang Zeng*, Xudong Liao*, Xiaodian Cheng, Han Tian, Xinchen Wan, Hao Wang, and Kai Chen
    In Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 2024), 2024

2023

  1. SIGMOD
    Scalable and Efficient Full-Graph GNN Training for Large Graphs
    Xinchen Wan, Kaiqiang Xu, Xudong Liao, Yilun Jin, Kai Chen, and Xin Jin
    In Proceedings of the ACM on Management of Data (SIGMOD 2023), 2023
  2. TON
    Efficient DRL-Based Congestion Control With Ultra-Low Overhead
    Han Tian*, Xudong Liao*, Chaoliang Zeng, Decang Sun, Junxue Zhang, and Kai Chen
    IEEE/ACM Transactions on Networking, 2023

2022

  1. CoNEXT
    Spine: An Efficient DRL-Based Congestion Control with Ultra-Low Overhead
    Han Tian*, Xudong Liao*, Chaoliang Zeng, Junxue Zhang, and Kai Chen
    In Proceedings of the 18th International Conference on Emerging Networking EXperiments and Technologies (CoNEXT 2022), 2022
  2. EuroSys
    Multi-Objective Congestion Control
    Yiqing Ma, Han Tian, Xudong Liao, Junxue Zhang, Weiyan Wang, Kai Chen, and Xin Jin
    In Proceedings of the 17th European Conference on Computer Systems (EuroSys 2022), 2022

2021

  1. arXiv
    Tacc: A full-stack cloud computing infrastructure for machine learning tasks
    Kaiqiang Xu, Xinchen Wan, Hao Wang, Zhenghang Ren, Xudong Liao, Decang Sun, Chaoliang Zeng, and Kai Chen
    arXiv preprint arXiv:2110.01556, 2021
  2. Book
    Datacenter Traffic Optimization with Deep Reinforcement Learning
    Li Chen, Justinas Lingys, Kai Chen, and Xudong Liao
    2021