Deploying LM Studio as a Service on Ubuntu 25.04

Somewhere between versions 0.3.31-2 and 0.3.31-7, LM Studio started forcing the listening interface from external back to 127.0.0.1 and disabling CORS.

Since LM Studio supports hot-reloading the HTTP server configuration (at least for cors and networkInterface), you can apply the change without a restart (it works once, until LM Studio resets the settings again):

~/.lmstudio/.internal/http-server-config.json
jq '.cors = true | .networkInterface = "0.0.0.0"' ~/.lmstudio/.internal/http-server-config.json | sponge ~/.lmstudio/.internal/http-server-config.json

Remember to install sponge, which is part of the moreutils package.
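If you would rather not depend on sponge, the same edit can be done with a temporary file and a rename. The following is a minimal sketch, not LM Studio tooling: the helper name patch_http_config and the sample JSON are assumptions made for illustration, and the demo runs on a throwaway copy rather than the real config.

```shell
#!/bin/sh
# Sketch: patch the HTTP server config without sponge.
# patch_http_config is a hypothetical helper name, not part of LM Studio.
patch_http_config() {
  cfg=$1
  # Temp file in the same directory, so the final mv is a rename on one filesystem.
  tmp=$(mktemp "${cfg}.XXXXXX") || return 1
  if jq '.cors = true | .networkInterface = "0.0.0.0"' "$cfg" > "$tmp"; then
    mv "$tmp" "$cfg"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Demo on a throwaway file; the sample only contains the two fields named above.
demo=$(mktemp -d)
printf '{"cors": false, "networkInterface": "127.0.0.1"}\n' > "$demo/http-server-config.json"
patch_http_config "$demo/http-server-config.json" || echo "patch failed (is jq installed?)"
cat "$demo/http-server-config.json"
```

For the real file, the call would be patch_http_config ~/.lmstudio/.internal/http-server-config.json. If jq fails, the original file is left untouched.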

To make the change automatic, add it to the systemd user unit that starts LM Studio:

$HOME/.config/systemd/user/lm-studio.service
[Unit]
Description=LM Studio Service
After=network.target

[Service]
Type=simple

ExecStart=/usr/bin/xvfb-run -a --server-args="-screen 0 1920x1080x24" %h/llm/lmstudio --run-as-service

# 1. Start the HTTP server
ExecStartPost=/bin/bash -c 'sleep 15 && exec lms server start'

# 2. Apply required settings (after the server has already started)
ExecStartPost=/bin/bash -c ' \
  sleep 2 && \
  jq ".cors = true | .networkInterface = \"0.0.0.0\"" \
     "%h/.lmstudio/.internal/http-server-config.json" \
     > "%h/.lmstudio/.internal/http-server-config.json.tmp" && \
  mv "%h/.lmstudio/.internal/http-server-config.json.tmp" \
     "%h/.lmstudio/.internal/http-server-config.json" \
'

Restart=always
RestartSec=10
Environment=PATH=%h/.local/bin:/usr/local/bin:/usr/bin:/bin:%h/.lmstudio/bin
Environment=DISPLAY=:99
WorkingDirectory=%h/llm

[Install]
WantedBy=default.target

Apply the changes and restart the service (without sudo!):

systemctl --user daemon-reload
systemctl --user stop lm-studio.service
systemctl --user start lm-studio.service
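The unit above relies on fixed sleeps (15 s and 2 s), which may be too short on a slow start and longer than needed on a fast one. One possible alternative is to poll for a condition instead. This is a minimal sketch; retry_until is a hypothetical helper name, and using the presence of the config file as the readiness condition is an assumption.

```shell
#!/bin/sh
# retry_until is a hypothetical helper: run a command once per second
# until it succeeds or TIMEOUT seconds have passed.
retry_until() {
  timeout=$1
  shift
  waited=0
  until "$@"; do
    waited=$((waited + 1))
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
  done
}

# Example for this setup (assumption: the file exists once the server is up):
# retry_until 30 test -s "$HOME/.lmstudio/.internal/http-server-config.json"

# Harmless demo: the root directory always exists, so this succeeds at once.
retry_until 3 test -d / && echo "ready"
```

Saved as a script, such a helper could be called from ExecStartPost in place of a sleep. The function returns a non-zero status on timeout.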

Take care to tune the operating system parameters for best performance.

Example

Here is an example from my system (a laptop with a graphics card connected via OCuLink):

free -h
               total        used        free      shared  buff/cache   available
Mem:            37Gi       4.9Gi        29Gi       240Mi       3.5Gi        32Gi
Swap:          8.0Gi          0B       8.0Gi
cat /proc/meminfo | grep -E 'MemTotal|MemAvailable'

MemTotal:       39223064 kB
MemAvailable:   34073484 kB

VRAM is reported separately; it is not part of MemTotal.

nvidia-smi

Fri Nov 21 12:24:47 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.82.09              Driver Version: 580.82.09      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5060 Ti     Off |   00000000:01:00.0 Off |                  N/A |
|  0%   44C    P8              8W /  180W |      13MiB /  16311MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            5314      G   /usr/bin/gnome-shell                      2MiB |
+-----------------------------------------------------------------------------------------+

What to do

Check the overcommit and swappiness settings:

cat /proc/sys/vm/overcommit_memory
cat /proc/sys/vm/overcommit_ratio
cat /proc/sys/vm/swappiness

Recommended values:

2 # overcommit_memory: strict accounting, limit = (RAM * ratio/100) + swap
80 # overcommit_ratio: the ratio used in the formula above
10 # swappiness: keep it at 20 or lower. It weights how readily the kernel swaps out anonymous memory instead of dropping page cache; lower values mean less swapping. 8 GiB of swap is enough if OS hibernation is not used; with hibernation, swap should match the size of RAM
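To see what the limit formula means in practice, here is a small worked calculation with the numbers from the example system. It is a sketch under stated assumptions: commit_limit_kb is a hypothetical helper, swap is taken as exactly 8 GiB (8388608 kB), and huge pages, which the kernel subtracts from RAM first, are ignored.

```shell
#!/bin/sh
# commit_limit_kb is a hypothetical helper.
# Arguments: MemTotal in kB, overcommit_ratio in percent, SwapTotal in kB.
commit_limit_kb() {
  echo $(( $1 * $2 / 100 + $3 ))
}

# MemTotal from the listing above; swap assumed to be exactly 8 GiB.
commit_limit_kb 39223064 80 8388608   # prints 39767059 (about 37.9 GiB)

# The kernel reports its own value, which can be compared with the estimate.
if [ -r /proc/meminfo ]; then
  grep -E 'CommitLimit|Committed_AS' /proc/meminfo
fi
```

With overcommit_memory set to 2, allocations fail once Committed_AS would exceed CommitLimit.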

Commands to run:

sudo sysctl vm.overcommit_ratio=80
echo 'vm.overcommit_ratio=80' | sudo tee -a /etc/sysctl.d/99-ml-workstation.conf
sudo sysctl vm.swappiness=10
echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.d/99-ml-workstation.conf

IMPORTANT: with LLMs (llama.cpp in particular), THP = always can cause stalls (delays of a minute after system startup).

cat /sys/kernel/mm/transparent_hugepage/enabled
# The active mode is the one in brackets. It should not be [always]; for example: always [madvise] never
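The active THP mode is the bracketed word in that file. Below is a minimal sketch that extracts it and prints a hint if it is always. The helper name thp_active_mode is hypothetical; the sysfs path is the one used above. The suggested change only lasts until the next reboot.

```shell
#!/bin/sh
# thp_active_mode is a hypothetical helper: print the bracketed (active) mode
# from a string such as "always [madvise] never".
thp_active_mode() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

thp_file=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$thp_file" ]; then
  mode=$(thp_active_mode "$(cat "$thp_file")")
  echo "active THP mode: $mode"
  if [ "$mode" = "always" ]; then
    # Only printed as a hint; nothing is changed by this script.
    echo "to switch until the next reboot: echo madvise | sudo tee $thp_file"
  fi
fi
```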

When working with storage (for example, when LLM models are written to an SSD), the writeback settings matter:

cat /proc/sys/vm/dirty_ratio
cat /proc/sys/vm/dirty_background_ratio
cat /proc/sys/vm/dirty_expire_centisecs

Optimal values for NVMe:

10
5
1000 (that is 10 seconds)
echo 'vm.dirty_ratio=10' | sudo tee -a /etc/sysctl.d/99-ml-workstation.conf
echo 'vm.dirty_background_ratio=5' | sudo tee -a /etc/sysctl.d/99-ml-workstation.conf
echo 'vm.dirty_expire_centisecs=1000' | sudo tee -a /etc/sysctl.d/99-ml-workstation.conf
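The tee -a commands only append to the file: running them twice leaves duplicate lines, and the values are not loaded until the next boot or until sudo sysctl --system is run. The following sketch shows one way to keep a single line per key. The helper set_sysctl_conf is hypothetical, and the demo writes to a temporary file rather than /etc/sysctl.d.

```shell
#!/bin/sh
# set_sysctl_conf is a hypothetical helper: make FILE contain exactly one
# KEY=VALUE line for KEY, replacing any earlier line for the same key.
set_sysctl_conf() {
  key=$1
  value=$2
  file=$3
  touch "$file"
  # Escape the dots so they match literally in the grep pattern.
  pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
  tmp=$(mktemp) || return 1
  # grep -v exits non-zero when nothing is left over; that is fine here.
  grep -v "^${pattern}[[:space:]]*=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Demo on a temporary file: the second call replaces the first value.
conf=$(mktemp)
set_sysctl_conf vm.dirty_ratio 20 "$conf"
set_sysctl_conf vm.dirty_ratio 10 "$conf"
set_sysctl_conf vm.dirty_background_ratio 5 "$conf"
set_sysctl_conf vm.dirty_expire_centisecs 1000 "$conf"
cat "$conf"
```

For the real file under /etc/sysctl.d the helper would have to run as root, followed by sudo sysctl --system to load the values without rebooting.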

Between LM Studio runs, it is recommended to drop the kernel memory caches (the value 2 frees dentries and inodes; 1 frees the page cache, 3 frees both):

sync && echo 2 | sudo tee /proc/sys/vm/drop_caches