Renting a cloud GPU from RunPod, running a large language model via vLLM's OpenAI-compatible endpoint, and load testing it with k6.
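The workflow above can be sketched end to end as a few shell commands. This is a minimal, hedged sketch: the model name, host, port, virtual-user count, and test duration are illustrative assumptions, not values from the original, and the commands assume a GPU host with vLLM installed and the k6 binary available.

```shell
# 1. On the rented GPU pod: start vLLM's OpenAI-compatible server.
#    The model name below is an example; substitute whichever model you rented the GPU for.
vllm serve meta-llama/Llama-3.1-8B-Instruct --host 0.0.0.0 --port 8000

# 2. Sanity-check the endpoint with a single chat completion request.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 64
      }'

# 3. Write a k6 load-test script (k6 scripts are JavaScript).
#    10 virtual users for 60 seconds is an arbitrary example configuration.
cat > load-test.js <<'EOF'
import http from 'k6/http';
import { check } from 'k6';

export const options = { vus: 10, duration: '60s' };

export default function () {
  const payload = JSON.stringify({
    model: 'meta-llama/Llama-3.1-8B-Instruct',
    messages: [{ role: 'user', content: 'Say hello in one sentence.' }],
    max_tokens: 64,
  });
  const res = http.post('http://localhost:8000/v1/chat/completions', payload, {
    headers: { 'Content-Type': 'application/json' },
  });
  check(res, { 'status is 200': (r) => r.status === 200 });
}
EOF

# 4. Run the load test against the endpoint.
k6 run load-test.js
```

k6 reports per-request latency percentiles and throughput at the end of the run, which is what makes it useful for seeing how the vLLM server behaves under concurrent load.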