Abstract: Large language models increasingly rely on pipeline parallelism for distributed inference, but existing systems face critical challenges in serverless environments: heterogeneous request ...