Xudong Liao

Founding Engineer at Netpreme | Ph.D., HKUST


I am a founding engineer at Netpreme, where we build next-generation computer systems to break the memory wall for AI.

I received my Ph.D. from The Hong Kong University of Science and Technology (HKUST), where I was advised by Prof. Kai Chen. Before that, I earned my B.Eng. in Software Engineering from Wuhan University in 2020, graduating as an Outstanding Graduate.

My research focuses on building high-performance systems that bridge applications, algorithms, and hardware. In particular, I work on:

  • application-aware optimization for distributed systems. I design systems that push the hardware-software boundary by tailoring architectures and algorithms to workload behavior:
    • MixNet (SIGCOMM’25) is a runtime-reconfigurable optical-electrical fabric for distributed Mixture-of-Experts training. It exploits dynamic, sparse, and localized traffic patterns to adapt topology on the fly, enabling scalable and cost-efficient training across thousands of GPUs while maintaining near-ideal training speed.
    • Herald (NSDI’24) is an embedding-aware scheduler for DLRM training. It leverages predictable and infrequent in-cache embedding access patterns to eliminate a substantial portion of communication overhead and accelerate training.
    • Pallas (ATC’25) is a rack-scale CPU scheduling system that combines switch programmability with request-level predictability to enable efficient in-network workload shaping and achieve near-optimal microsecond-level tail latency.
  • practical, high-performance learning-based congestion control. This line of work includes MOCC (EuroSys’22), Spine (CoNEXT’22), Astraea (EuroSys’24), Jury (EuroSys’25), Learn-to-Probe (EuroSys’26), and PolicyCache (NSDI’26). Across these projects, we address challenges such as multi-objective optimization, runtime overhead, fairness, convergence, signal distinguishability, and performance generalization, with the goal of making learning-driven transport practical in real deployments.

During my time at WHU, I was fortunate to be advised by Prof. Yanjiao Chen. I have also had the opportunity to collaborate closely with Prof. Guyue Liu from Peking University and Dr. Zhizhen Zhong from MIT on several recent projects.

Research Interests

  • Machine Learning Systems
  • Datacenter Networking
  • Congestion Control
  • Optical Networking

news

Sep 25, 2025 Passed my Ph.D. thesis defense!
Aug 22, 2025 Two co-authored papers LTP and MFS accepted to EuroSys 2026!
Jul 12, 2025 MixNet accepted to SIGCOMM 2025!
Apr 25, 2025 Pallas accepted to ATC 2025!
Jan 10, 2024 Astraea accepted to EuroSys 2024!

selected publications

* equal contribution

  1. SIGCOMM
    MixNet: A Runtime Reconfigurable Optical-Electrical Fabric for Distributed Mixture-of-Experts Training
    Xudong Liao, Yijun Sun, Han Tian, Xinchen Wan, Yilun Jin, Zilong Wang, Zhenghang Ren, Xinyang Huang, Wenxue Li, Kin Fai Tse, Zhizhen Zhong, Guyue Liu, Ying Zhang, Xiaofeng Ye, Yiming Zhang, and Kai Chen
    In Proceedings of the 2025 ACM SIGCOMM Conference (SIGCOMM 2025), 2025
  2. ATC
    Towards Optimal Rack-scale μs-level CPU Scheduling through In-Network Workload Shaping
    Xudong Liao, Han Tian, Xinchen Wan, Chaoliang Zeng, Hao Wang, Junxue Zhang, Mengyu Ma, Guyue Liu, and Kai Chen
    In 2025 USENIX Annual Technical Conference (ATC 2025), 2025
  3. OSDI
    Enabling Efficient GPU Communication over Multiple NICs with FuseLink
    Zhenghang Ren, Yuxuan Li, Zilong Wang, Xinyang Huang, Wenxue Li, Kaiqiang Xu, Xudong Liao, Yijun Sun, Bowen Liu, Han Tian, Junxue Zhang, Mingfei Wang, Zhizhen Zhong, Guyue Liu, Ying Zhang, and Kai Chen
    In Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2025), 2025
  4. EuroSys
    Achieving Fairness Generalizability for Learning-based Congestion Control with Jury
    Han Tian, Xudong Liao, Decang Sun, Chaoliang Zeng, Yilun Jin, Junxue Zhang, Xinchen Wan, Zilong Wang, Yong Wang, and Kai Chen
    In Proceedings of the 20th ACM European Conference on Computer Systems (EuroSys 2025), 2025
  5. EuroSys
    Astraea: Towards Fair and Efficient Learning-based Congestion Control
    Xudong Liao*, Han Tian*, Chaoliang Zeng, Xinchen Wan, and Kai Chen
    In Proceedings of the 19th ACM European Conference on Computer Systems (EuroSys 2024), 2024
  6. NSDI
    Accelerating Neural Recommendation Training with Embedding Scheduling
    Chaoliang Zeng*, Xudong Liao*, Xiaodian Cheng, Han Tian, Xinchen Wan, Hao Wang, and Kai Chen
    In Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 2024), 2024
  7. SIGMOD
    Scalable and Efficient Full-Graph GNN Training for Large Graphs
    Xinchen Wan, Kaiqiang Xu, Xudong Liao, Yilun Jin, Kai Chen, and Xin Jin
    In Proceedings of the ACM on Management of Data (SIGMOD 2023), 2023
  8. CoNEXT
    Spine: An Efficient DRL-Based Congestion Control with Ultra-Low Overhead
    Han Tian*, Xudong Liao*, Chaoliang Zeng, Junxue Zhang, and Kai Chen
    In Proceedings of the 18th International Conference on Emerging Networking EXperiments and Technologies (CoNEXT 2022), 2022
  9. EuroSys
    Multi-Objective Congestion Control
    Yiqing Ma, Han Tian, Xudong Liao, Junxue Zhang, Weiyan Wang, Kai Chen, and Xin Jin
    In Proceedings of the 17th European Conference on Computer Systems (EuroSys 2022), 2022
  10. NSDI
    PolicyCache: Intra-flow Learning in Congestion Control
    Han Tian, Han Wang, Wenbo Li, Xudong Liao, Decang Sun, Wenxue Li, Donghui Chen, Bin Huang, Senbo Fu, Junxue Zhang, Dian Shen, and Kai Chen
    In Proceedings of the 23rd USENIX Symposium on Networked Systems Design and Implementation (NSDI 2026), 2026
  11. EuroSys
    Learn-to-Probe: Achieving Signal Distinguishability in Learning-based Congestion Control
    Han Tian, Wenbo Li, Junxue Zhang, Xudong Liao, Decang Sun, Donghui Chen, Bin Huang, Wenxue Li, Yong Wang, and Kai Chen
    In Proceedings of the 21st ACM European Conference on Computer Systems (EuroSys 2026), 2026