Automation of vLLM on GPU using GitLab CI/CD and RunPod

![Container management interface: deploy, listing, and pod stop/start/destroy actions shown as interconnected blocks|690x190](upload://vIBp6USq9mV03c5dOg4tHNxgJAj.png)

GitHub

Medium: https://komnata-lomki.medium.com/automating-vllm-on-gpu-using-gitlab-ci-cd-and-runpod-8040d8efb70d?source=friends_link&sk=97b512eef8bfc1ca12aa6518135bc957

Getting Started

Running large language models (LLMs) can require significant computational resources. When you need to experiment with different GPUs, renting cloud capacity is often the only practical option.

Managing these resources manually takes a lot of time, is error-prone, and scales poorly. For example, there is no visibility into how much each employee has spent, and someone always forgets to shut down virtual machines.

As Lead DevOps, I faced the problem of giving our team an easy way to deploy, manage, and destroy GPU instances on RunPod. Our primary interest was inference with vLLM on different GPUs.

To solve this problem, I developed a GitLab CI/CD pipeline that automates the entire lifecycle of RunPod GPU modules directly from the GitLab interface.

This solution aims to simplify GPU resource management, improve workflow efficiency, and ensure transparency in the deployment process. In this post, I’ll walk you through how to set it up and how it works.

The Problem

Even when a RunPod account is configured correctly, tracking usage and billing per team member is inconvenient. GPU computation is expensive, so automated control in this area is especially important.

Need I say that manual deployment takes more time and constant attention? It is natural for humans to make mistakes, especially when repeating the same steps over and over.

There is another issue, which has not yet been reliably solved: automated management of a pool of GPU pods. After all, running several vLLM instances in parallel is clearly better than running them sequentially.

Solution: GitLab CI/CD Pipeline

As a result of my research, I created the file I want to present: .gitlab-ci.yml. It uses simple cURL requests to create, delete, and manage Pods and to configure vLLM.

How It Works

I divided the code into stages:

  • deploy
  • listing
  • destroy
  • stop
  • start
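As a sketch, the stage layout above might be declared like this in .gitlab-ci.yml (the job body and the `ACTION` trigger variable are illustrative assumptions, not the author's exact file):

```yaml
stages:
  - deploy
  - listing
  - destroy
  - stop
  - start

# Illustrative job: runs only when the pipeline is launched with ACTION=deploy
deploy:
  stage: deploy
  rules:
    - if: '$ACTION == "deploy"'
  script:
    - echo "Creating a RunPod pod with vLLM..."
```

Gating each job on a pipeline variable like `ACTION` is one common pattern for letting a single pipeline serve all five lifecycle operations.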

In the future, additional steps will be added:

  • sending emails
  • sending notifications to Discourse

The last three stages require the listing stage to complete first. The Pod ID only becomes known at that point: it cannot be obtained during Pod creation, because deployment takes a variable amount of time and the API responds before it finishes. That is why listing exists as a separate step.
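To illustrate, a listing job can query the RunPod API for the Pod ID and hand it to later stages via a dotenv artifact. This is a minimal sketch; the REST endpoint and JSON shape are assumptions based on RunPod's public API, not the author's exact code:

```yaml
listing:
  stage: listing
  image: alpine:latest
  script:
    - apk add --no-cache curl jq
    # Fetch the list of pods (endpoint shape is an assumption)
    - >
      curl -s -H "Authorization: Bearer $RUNPOD_API_KEY"
      https://rest.runpod.io/v1/pods > pods.json
    # Take the first pod's ID and expose it to the destroy/stop/start stages
    - echo "POD_ID=$(jq -r '.[0].id' pods.json)" > pod.env
  artifacts:
    reports:
      dotenv: pod.env
```

The `dotenv` report makes `POD_ID` available as a variable in jobs of later stages, which is exactly what destroy, stop, and start need.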

Each step includes detailed logging in the GitLab pipeline. This helps in the early stages and makes onboarding new team members easier.

When launching a new pipeline, variables can be overridden in the pipeline form (otherwise, change their defaults in the code).
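For example, defaults can live in the `variables:` block and be overridden per run in GitLab's "Run pipeline" form (the variable names and values below are hypothetical):

```yaml
variables:
  ACTION: "deploy"          # which operation to run: deploy/listing/stop/start/destroy
  GPU_TYPE: "NVIDIA GeForce RTX 4090"               # hypothetical GPU type to rent
  MODEL_NAME: "mistralai/Mistral-7B-Instruct-v0.2"  # hypothetical model served by vLLM
```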

Installation Guide (Brief)

Important variables should be stored in GitLab, in the CI/CD Variables section of the project; the code references them. These include:

  • RUNPOD_API_KEY: Your RunPod API key

  • HG_TOKEN: Your Hugging Face token
  • RUNPOD_SSH_PUBLIC_KEYS: Your SSH public keys for accessing the pod (if applicable, separated by \n if there are multiple)
  • RUNPOD_SSH_PRIVATE_KEY: Your SSH private key so that GitLab Runner can connect to the pod
    (Use sensitive data masking so they do not appear in logs in plain text)
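A job that needs SSH access to the pod might install the masked private key like this (a sketch; the key file name, `&before_ssh` wiring, and `POD_HOST` variable are assumptions):

```yaml
.before_ssh: &before_ssh
  - mkdir -p ~/.ssh
  - echo "$RUNPOD_SSH_PRIVATE_KEY" > ~/.ssh/id_ed25519
  - chmod 600 ~/.ssh/id_ed25519
  # POD_HOST is a hypothetical variable holding the pod's SSH endpoint
  - ssh-keyscan -H "$POD_HOST" >> ~/.ssh/known_hosts
```

With masking enabled in GitLab, the key value is redacted if it ever appears in job logs.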

Why This Approach Is Beneficial

Such an approach is beneficial for:

  • Assigning AI engineers of different levels to roles — one CI works for everyone
  • Each employee has their own Pipeline space with history
  • Easily split account costs among employees
  • Easily monitor runs and current environments
  • Easily store different configurations as branches
  • Conveniently deploy as many demo environments as needed
  • Universal DevOps approach — supports any engineer
  • Easy code portability between development teams
  • Most importantly — 100% compatibility with GitLab workflows when integrating AI elements into the project architecture

Conclusion

This GitLab CI/CD configuration provides a reliable and convenient way to manage RunPod GPU instances for AI workloads. It bridges the gap between DevOps practice and the adoption of artificial intelligence models. You can find the code and detailed instructions in the repository. As the author, I encourage you to try it out, star it if you find it useful, and contribute your ideas for improvement! You might also want to visit my site https://discuss.rabkesov.ru to learn more about my work and join the discussions.
