Automation of vLLM on GPU using GitLab CI/CD and RunPod

![Container management interface: deploy, listing, and pod stop/start/destroy actions shown as interconnected blocks|690x190](upload://vIBp6USq9mV03c5dOg4tHNxgJAj.png)

GitHub

Medium: https://komnata-lomki.medium.com/automating-vllm-on-gpu-using-gitlab-ci-cd-and-runpod-8040d8efb70d?source=friends_link&sk=97b512eef8bfc1ca12aa6518135bc957

Getting Started

Running large language models (LLMs) can require significant computational resources. When you need to experiment with different GPUs, renting cloud capacity is often the only practical option.

Managing these resources manually takes a lot of time, is error-prone, and scales poorly. For example, there is no visibility into how much each employee has spent, and someone always forgets to shut down virtual machines.

As Lead DevOps, I faced the problem of giving our team an easy way to deploy, manage, and destroy GPU instances on RunPod. Our primary interest was inference with vLLM on different GPUs.

To solve this problem, I developed a GitLab CI/CD pipeline that automates the entire lifecycle of RunPod GPU modules directly from the GitLab interface.

This solution aims to simplify GPU resource management, improve workflow efficiency, and ensure transparency in the deployment process. In this post, I’ll walk you through how to set it up and how it works.

The Problem

Even when a RunPod account is configured correctly, tracking usage and billing per team member is inconvenient. GPU computation is expensive, so automated control in this area is especially important.

Need I say that manual deployment takes more time and constant attention? It is natural for humans to make mistakes, especially when repeating the same steps over and over.

There is another issue, which has not yet been reliably solved: automated management of a pool of GPU pods. After all, running several vLLM instances in parallel is clearly better than running them sequentially.

Solution: GitLab CI/CD Pipeline

As a result of my research, I created the file I want to present: .gitlab-ci.yml. It uses simple cURL requests to create, delete, and manage Pods and to configure vLLM.

How It Works

I divided the code into stages:

  • deploy
  • listing
  • destroy
  • stop
  • start
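As a sketch, the stage layout above might be declared like this in .gitlab-ci.yml (the job body and the `ACTION` trigger variable are illustrative assumptions, not the author's exact file):

```yaml
stages:
  - deploy
  - listing
  - destroy
  - stop
  - start

# Illustrative job: runs only when the pipeline is launched with ACTION=deploy
deploy:
  stage: deploy
  rules:
    - if: '$ACTION == "deploy"'
  script:
    - echo "Creating a RunPod pod with vLLM..."
```

Gating each job on a pipeline variable like `ACTION` is one common pattern for letting a single pipeline serve all five lifecycle operations.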

In the future, additional steps will be added:

  • sending emails
  • sending notifications to Discourse

The last three stages require the listing stage to complete first. The Pod ID only becomes known at that point: it cannot be obtained during Pod creation, because deployment takes a variable amount of time and the API responds before it finishes. That is why listing exists as a separate step.
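To illustrate, a listing job can query the RunPod API for the Pod ID and hand it to later stages via a dotenv artifact. This is a minimal sketch; the REST endpoint and JSON shape are assumptions based on RunPod's public API, not the author's exact code:

```yaml
listing:
  stage: listing
  image: alpine:latest
  script:
    - apk add --no-cache curl jq
    # Fetch the list of pods (endpoint shape is an assumption)
    - >
      curl -s -H "Authorization: Bearer $RUNPOD_API_KEY"
      https://rest.runpod.io/v1/pods > pods.json
    # Take the first pod's ID and expose it to the destroy/stop/start stages
    - echo "POD_ID=$(jq -r '.[0].id' pods.json)" > pod.env
  artifacts:
    reports:
      dotenv: pod.env
```

The `dotenv` report makes `POD_ID` available as a variable in jobs of later stages, which is exactly what destroy, stop, and start need.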

Each step includes detailed logging in the GitLab pipeline. This helps in the early stages and makes onboarding new team members easier.

When launching a new pipeline, variables can be overridden in the pipeline form (otherwise, change their defaults in the code).
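For example, defaults can live in the `variables:` block and be overridden per run in GitLab's "Run pipeline" form (the variable names and values below are hypothetical):

```yaml
variables:
  ACTION: "deploy"          # which operation to run: deploy/listing/stop/start/destroy
  GPU_TYPE: "NVIDIA GeForce RTX 4090"               # hypothetical GPU type to rent
  MODEL_NAME: "mistralai/Mistral-7B-Instruct-v0.2"  # hypothetical model served by vLLM
```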

Installation Guide (Brief)

Important variables should be stored in GitLab, in the CI/CD Variables section of the project; the code references them. These include:

  • RUNPOD_API_KEY: Your RunPod API key

  • HG_TOKEN: Your Hugging Face token
  • RUNPOD_SSH_PUBLIC_KEYS: Your SSH public keys for accessing the pod (if applicable, separated by \n if there are multiple)
  • RUNPOD_SSH_PRIVATE_KEY: Your SSH private key so that GitLab Runner can connect to the pod
    (Use sensitive data masking so they do not appear in logs in plain text)
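A job that needs SSH access to the pod might install the masked private key like this (a sketch; the key file name, `&before_ssh` wiring, and `POD_HOST` variable are assumptions):

```yaml
.before_ssh: &before_ssh
  - mkdir -p ~/.ssh
  - echo "$RUNPOD_SSH_PRIVATE_KEY" > ~/.ssh/id_ed25519
  - chmod 600 ~/.ssh/id_ed25519
  # POD_HOST is a hypothetical variable holding the pod's SSH endpoint
  - ssh-keyscan -H "$POD_HOST" >> ~/.ssh/known_hosts
```

With masking enabled in GitLab, the key value is redacted if it ever appears in job logs.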

Why This Approach Is Beneficial

Such an approach is beneficial for:

  • Assigning AI engineers of different levels to roles — one CI works for everyone
  • Each employee has their own Pipeline space with history
  • Easily split account costs among employees
  • Easily monitor runs and current environments
  • Easily store different configurations as branches
  • Conveniently deploy as many demo environments as needed
  • Universal DevOps approach — supports any engineer
  • Easy code portability between development teams
  • Most importantly — 100% compatibility with GitLab workflows when integrating AI elements into the project architecture

Conclusion

This GitLab CI/CD configuration provides a reliable and convenient way to manage RunPod GPU instances for AI workloads. It bridges the gap between DevOps practice and the adoption of artificial intelligence models. You can find the code and detailed instructions in the repository. As the author, I encourage you to try it out, star it if you find it useful, and contribute your ideas for improvement! You might also want to visit my site https://discuss.rabkesov.ru to learn more about my work and join the discussions.
