Publications
Recent Preprints
- Kaiyu Huang, Hao Wu, Zhubo Shi, Han Zou, Minchen Yu, Qingjiang Shi, “SpecServe: Efficient and SLO-Aware Large Language Model Serving with Adaptive Speculative Decoding,” in arXiv preprint arXiv:2503.05096.
- Minchen Yu, Rui Yang, Chaobo Jia, Zhaoyuan Su, Sheng Yao, Tingfeng Lan, Yuchen Yang, Yue Cheng, Wei Wang, Ao Wang, Ruichuan Chen, “λScale: Enabling Fast Scaling for Serverless Large Language Model Inference,” in arXiv preprint arXiv:2502.09922.
- Hao Wu, Junxiao Deng, Minchen Yu, Yue Yu, Yaochen Liu, Hao Fan, Song Wu, Wei Wang, “FaaSTube: Optimizing GPU-oriented Data Transfer for Serverless Computing,” in arXiv preprint arXiv:2411.01830.
- Suyi Li, Hanfeng Lu, Tianyuan Wu, Minchen Yu, Qizhen Weng, Xusheng Chen, Yizhou Shan, Binhang Yuan, Wei Wang, “CaraServe: CPU-Assisted and Rank-Aware LoRA Serving for Generative LLM Inference,” in arXiv preprint arXiv:2401.11240.
- Minchen Yu, Ao Wang, Dong Chen, Haoxuan Yu, Xiaonan Luo, Zhuohao Li, Wei Wang, Ruichuan Chen, Dapeng Nie, Haoran Yang, “FaaSwap: SLO-Aware, GPU-Efficient Serverless Inference via Model Swapping,” in arXiv preprint arXiv:2306.03622.
Refereed Papers
- Minchen Yu, Tingjia Cao, Wei Wang, Ruichuan Chen, “Pheromone: Restructuring Serverless Computing with Data-Centric Function Orchestration,” in IEEE/ACM Transactions on Networking (TON), 2024.
- Minchen Yu, Tingjia Cao, Wei Wang, Ruichuan Chen, “Following the Data, Not the Function: Rethinking Function Orchestration in Serverless Computing,” in the Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI’23), Boston, MA, April 2023. [code]
- Minchen Yu, Zhifeng Jiang, Hok Chun Ng, Wei Wang, Ruichuan Chen, and Bo Li, “Gillis: Serving Large Neural Networks in Serverless Functions with Automatic Model Partitioning,” in the Proceedings of the 41st IEEE International Conference on Distributed Computing Systems (ICDCS’21), Virtual Conference, July 2021. (Best Paper Runner Up) [code]
- Huangshi Tian, Minchen Yu, and Wei Wang, “CrystalPerf: Learning to Characterize the Performance of Dataflow Computation through Code Analysis,” in the Proceedings of USENIX Annual Technical Conference (ATC’21), Virtual Conference, July 2021.
- Minchen Yu, Yinghao Yu, Yunchuan Zheng, Baichen Yang, and Wei Wang, “RepBun: Load-Balanced, Shuffle-Free Cluster Caching for Structured Data,” in the Proceedings of IEEE INFOCOM’20, Virtual Conference, July 2020.
- Chengliang Zhang, Minchen Yu, Wei Wang, and Feng Yan, “Enabling Cost-Effective, SLO-Aware Machine Learning Inference Serving on Public Cloud,” in IEEE Transactions on Cloud Computing (TCC), 2020.
- Chengliang Zhang, Minchen Yu, Wei Wang, and Feng Yan, “MArk: Exploiting Cloud Services for Cost-Effective, SLO-Aware Machine Learning Inference Serving,” in the Proceedings of USENIX Annual Technical Conference (ATC’19), Renton, WA, July 2019.
- Huangshi Tian, Minchen Yu, and Wei Wang, “Continuum: A Platform for Cost-Aware, Low-Latency Continual Learning,” in the Proceedings of ACM Symposium on Cloud Computing (SoCC’18), Carlsbad, CA, October 2018.