Deploying LM Studio as a service on Ubuntu 25.04

Article updated 2025.11.10, see comment.

Introduction

In this article, I will explain how to deploy LM Studio as a service in an Ubuntu 25.04 environment (may also work for other versions).

Running LM Studio as a server not only lets you load language models and work with them locally, but also exposes an API for external services to connect to. You can return to purely local use by stopping the service and launching the application as usual.

You will be able to:

  • work with different language models (loaded on demand, unloaded when idle)
  • connect your applications or plugins to the API

Download

Download the AppImage (the Linux build of the application) from the official website:

Place the file in the ~/llm directory and make it executable:

chmod +x ~/llm/LM-Studio-0.3.27-4-x64.AppImage

At the time of writing, this version was current.
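For reference, the preparation can be scripted; in this sketch, `touch` stands in for the actual download (use the real downloaded file instead):

```shell
# Sketch of the preparation steps; the filename matches the version above.
APP="$HOME/llm/LM-Studio-0.3.27-4-x64.AppImage"
mkdir -p "$HOME/llm"
touch "$APP"        # stand-in for the real download in this sketch
chmod +x "$APP"
[ -x "$APP" ] && echo "ready: $APP"
```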

Headless

Deployment as a service matters when you work with the machine remotely. Decide up front which case applies:

a) the server (PC, laptop, etc.) has a graphical environment and the user has logged into the desktop shell at least once. If you work in the UI, skip this step.

b) the server has no graphical environment, or the user never logs into the UI. In this case you need a couple of extra commands, executed once:

sudo loginctl enable-linger $USER

this command allows your user services to keep running without a graphical login session

loginctl show-user $USER | grep Linger

this command prints the current state of the setting:

Linger=yes

Commands to check that nothing is interfering (the output should be a recognizable state such as running or degraded; errors are not acceptable):

systemctl --user status
systemctl --user is-system-running
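These checks can be wrapped in a small guard script. A sketch that treats both running and degraded as acceptable, and everything else as a problem:

```shell
# Report the health of the per-user systemd manager.
state=$(systemctl --user is-system-running 2>/dev/null || true)
case "$state" in
  running|degraded) echo "user manager: $state (OK)" ;;
  "")               echo "user manager: not reachable" ;;
  *)                echo "user manager: $state (investigate)" ;;
esac
```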

systemd

If you did not opt for headless mode (no graphics), skip this step.

But if autostart is essential for you, create a unit file:

~/.config/systemd/user/lm-studio.service

And let’s assume that your LM Studio executable file is located at

~/llm/LM-Studio-0.3.27-4-x64.AppImage

with the following content:

[Unit]
Description=LM Studio Service
After=network.target

[Service]
Type=simple

ExecStart=/usr/bin/xvfb-run -a --server-args="-screen 0 1920x1080x24" %h/llm/LM-Studio-0.3.27-4-x64.AppImage --run-as-service
ExecStartPost=/bin/bash -c 'sleep 10 && exec lms server start'

Restart=always
RestartSec=10
Environment=PATH=%h/.local/bin:/usr/local/bin:/usr/bin:/bin:%h/.lmstudio/bin
Environment=DISPLAY=:99
WorkingDirectory=%h/llm

[Install]
WantedBy=default.target

The unit consists of two parts:

  • launching the application
  • starting the server component

Strictly speaking, such a unit is fragile, because its two parts do not know about each other’s state. Do not use solutions like this in production. Better still, do not use LM Studio in production at all, since vLLM is much faster.
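One way to soften that fragility is to make the second step wait until the API actually answers, instead of sleeping for a fixed time. A minimal polling sketch (the URL and the five-attempt budget are arbitrary choices here):

```shell
# Poll the local API until it responds, or give up after a few attempts.
TRIES=5
for i in $(seq 1 "$TRIES"); do
  if curl -fsS http://127.0.0.1:1234/v1/models >/dev/null 2>&1; then
    echo "API is up"
    break
  fi
  [ "$i" -eq "$TRIES" ] && echo "API not reachable after $TRIES attempts"
  sleep 1
done
```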

Note that a virtual screen is emulated here, so you additionally need to install xvfb:

sudo apt update && sudo apt install xvfb

Note that in this example the unit is installed as a systemd user service, not a system-wide (root) one.

Perform standard operations for a startup script (apply changes and enable autostart):

systemctl --user daemon-reload
systemctl --user enable lm-studio.service

Launch

systemctl --user status lm-studio.service

If the service is not running yet, the status will show it as inactive. Start it:

systemctl --user start lm-studio.service

and now check the server status:

lms server status

it should be listening on port 1234.

Strictly speaking, you should first run the LM Studio UI to configure the necessary parameters and download models. You should also switch the listening address from 127.0.0.1 to 0.0.0.0 if you need external connections to your API (this is potentially dangerous, so set up encryption and authorization first).

CURL

Check your server endpoint:

curl -v http://127.0.0.1:1234/v1/models

you should get back the available models:

* Trying 127.0.0.1:1234...
* Connected to 127.0.0.1 (127.0.0.1) port 1234
* using HTTP/1.x
> GET /v1/models HTTP/1.1
> Host: 127.0.0.1:1234
> User-Agent: curl/8.12.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 200 OK
<
{
  "data": [
    {
      "id": "nvidia_nvidia-nemotron-nano-9b-v2",
      "object": "model",
      "owned_by": "organization_owner"
    },
    {
      "id": "text-embedding-qwen3-embedding-0.6b",
      "object": "model",
      "owned_by": "organization_owner"
    },
    {
      "id": "qwen/qwen3-8b",
      "object": "model",
      "owned_by": "organization_owner"
    },
    {
      "id": "google/gemma-3-4b",
      "object": "model",
      "owned_by": "organization_owner"
    }
  ],
  "object": "list"
}
* Connection #0 to host 127.0.0.1 left intact
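To extract just the model ids from that response, pipe it through jq. Here the response body is inlined so the command is reproducible without a running server:

```shell
# Normally: curl -s http://127.0.0.1:1234/v1/models | jq -r '.data[].id'
jq -r '.data[].id' <<'EOF'
{
  "data": [
    {"id": "qwen/qwen3-8b", "object": "model"},
    {"id": "google/gemma-3-4b", "object": "model"}
  ],
  "object": "list"
}
EOF
```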

REST API

Now you can connect to your REST API using the endpoints:

GET http://127.0.0.1:1234/v1/models
POST http://127.0.0.1:1234/v1/chat/completions
POST http://127.0.0.1:1234/v1/completions
POST http://127.0.0.1:1234/v1/embeddings
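A minimal chat completion request looks like this. The model id must match one returned by /v1/models; the payload below is only an illustration:

```shell
# Write the request body to a file, validate it, then send it.
cat > /tmp/chat.json <<'EOF'
{
  "model": "qwen/qwen3-8b",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Say hello in one short sentence."}
  ],
  "temperature": 0.7,
  "max_tokens": 64
}
EOF
python3 -m json.tool /tmp/chat.json >/dev/null && echo "payload OK"
# With the service running:
# curl -s http://127.0.0.1:1234/v1/chat/completions \
#      -H "Content-Type: application/json" \
#      -d @/tmp/chat.json
```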

Somewhere between versions 0.3.31-2 and 0.3.31-7, a change was introduced that forcibly switches the listening interface from an external address back to 127.0.0.1 and disables CORS.

Since LM Studio supports hot-reloading of the HTTP server config (at least when changing cors and networkInterface), you can make the change without restarting (though it will only work once):

~/.lmstudio/.internal/http-server-config.json
jq '.cors = true | .networkInterface = "0.0.0.0"' ~/.lmstudio/.internal/http-server-config.json | sponge ~/.lmstudio/.internal/http-server-config.json

remember to install sponge, which is part of the moreutils package:

sudo apt install moreutils

To make the change automatic, modify the startup script:

$HOME/.config/systemd/user/lm-studio.service
[Unit]
Description=LM Studio Service
After=network.target

[Service]
Type=simple

ExecStart=/usr/bin/xvfb-run -a --server-args="-screen 0 1920x1080x24" %h/llm/LM-Studio-0.3.27-4-x64.AppImage --run-as-service

# 1. Start the HTTP server
ExecStartPost=/bin/bash -c 'sleep 15 && exec lms server start'

# 2. Apply required settings (after the server has already started)
ExecStartPost=/bin/bash -c ' \
  sleep 2 && \
  jq ".cors = true | .networkInterface = \"0.0.0.0\"" \
     "%h/.lmstudio/.internal/http-server-config.json" \
     > "%h/.lmstudio/.internal/http-server-config.json.tmp" && \
  mv "%h/.lmstudio/.internal/http-server-config.json.tmp" \
     "%h/.lmstudio/.internal/http-server-config.json" \
'

Restart=always
RestartSec=10
Environment=PATH=%h/.local/bin:/usr/local/bin:/usr/bin:/bin:%h/.lmstudio/bin
Environment=DISPLAY=:99
WorkingDirectory=%h/llm

[Install]
WantedBy=default.target

Apply the changes and restart the service (without sudo!):

systemctl --user daemon-reload
systemctl --user restart lm-studio.service

Pay attention to OS tuning parameters for optimal performance.

Example

Here’s an example from my system (this is a laptop with a graphics card connected via Oculink):

free -h
               total        used        free      shared  buff/cache   available
Mem:            37Gi       4.9Gi        29Gi       240Mi       3.5Gi        32Gi
Swap:          8.0Gi          0B       8.0Gi

grep -E 'MemTotal|MemAvailable' /proc/meminfo

MemTotal:       39223064 kB
MemAvailable:   34073484 kB

VRAM is reported separately; it is not included in MemTotal.

nvidia-smi

Fri Nov 21 12:24:47 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.82.09              Driver Version: 580.82.09      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5060 Ti     Off |   00000000:01:00.0 Off |                  N/A |
|  0%   44C    P8              8W /  180W |      13MiB /  16311MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            5314      G   /usr/bin/gnome-shell                      2MiB |
+-----------------------------------------------------------------------------------------+

What to Do

Check the overcommit policy and swappiness:

cat