Publications
Recent Preprints
- Kaiyu Huang, Hao Wu, Zhubo Shi, Han Zou, Minchen Yu, Qingjiang Shi, “SpecServe: Efficient and SLO-Aware Large Language Model Serving with Adaptive Speculative Decoding,” in arXiv preprint arXiv:2503.05096.
- Minchen Yu, Rui Yang, Chaobo Jia, Zhaoyuan Su, Sheng Yao, Tingfeng Lan, Yuchen Yang, Yue Cheng, Wei Wang, Ao Wang, Ruichuan Chen, “λScale: Enabling Fast Scaling for Serverless Large Language Model Inference,” in arXiv preprint arXiv:2502.09922.
- Hao Wu, Junxiao Deng, Minchen Yu, Yue Yu, Yaochen Liu, Hao Fan, Song Wu, Wei Wang, “FaaSTube: Optimizing GPU-oriented Data Transfer for Serverless Computing,” in arXiv preprint arXiv:2411.01830.
- Suyi Li, Hanfeng Lu, Tianyuan Wu, Minchen Yu, Qizhen Weng, Xusheng Chen, Yizhou Shan, Binhang Yuan, Wei Wang, “CaraServe: CPU-Assisted and Rank-Aware LoRA Serving for Generative LLM Inference,” in arXiv preprint arXiv:2401.11240.
- Minchen Yu, Ao Wang, Dong Chen, Haoxuan Yu, Xiaonan Luo, Zhuohao Li, Wei Wang, Ruichuan Chen, Dapeng Nie, Haoran Yang, “FaaSwap: SLO-Aware, GPU-Efficient Serverless Inference via Model Swapping,” in arXiv preprint arXiv:2306.03622.
Refereed Papers
- Minchen Yu, Tingjia Cao, Wei Wang, Ruichuan Chen, “Pheromone: Restructuring Serverless Computing with Data-Centric Function Orchestration,” in IEEE/ACM Transactions on Networking (TON), 2024.
- Minchen Yu, Tingjia Cao, Wei Wang, Ruichuan Chen, “Following the Data, Not the Function: Rethinking Function Orchestration in Serverless Computing,” in the Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI’23), Boston, MA, April 2023. [code]
- Minchen Yu, Zhifeng Jiang, Hok Chun Ng, Wei Wang, Ruichuan Chen, and Bo Li, “Gillis: Serving Large Neural Networks in Serverless Functions with Automatic Model Partitioning,” in the Proceedings of the 41st IEEE International Conference on Distributed Computing Systems (ICDCS’21), Virtual Conference, July 2021. (Best Paper Runner Up) [code]
- Huangshi Tian, Minchen Yu, and Wei Wang, “CrystalPerf: Learning to Characterize the Performance of Dataflow Computation through Code Analysis,” in the Proceedings of USENIX Annual Technical Conference (ATC’21), Virtual Conference, July 2021.
- Minchen Yu, Yinghao Yu, Yunchuan Zheng, Baichen Yang, and Wei Wang, “RepBun: Load-Balanced, Shuffle-Free Cluster Caching for Structured Data,” in the Proceedings of IEEE INFOCOM’20, Virtual Conference, July 2020.
- Chengliang Zhang, Minchen Yu, Wei Wang, and Feng Yan, “Enabling Cost-Effective, SLO-Aware Machine Learning Inference Serving on Public Cloud,” in IEEE Transactions on Cloud Computing (TCC), 2020.
- Chengliang Zhang, Minchen Yu, Wei Wang, and Feng Yan, “MArk: Exploiting Cloud Services for Cost-Effective, SLO-Aware Machine Learning Inference Serving,” in the Proceedings of USENIX Annual Technical Conference (ATC’19), Renton, WA, July 2019.
- Huangshi Tian, Minchen Yu, and Wei Wang, “Continuum: A Platform for Cost-Aware, Low-Latency Continual Learning,” in the Proceedings of ACM Symposium on Cloud Computing (SoCC’18), Carlsbad, CA, October 2018.