CVE-2026-34159 · llama.cpp is a C/C++ inference engine for a range of LLMs.

01 · The Real Story

This is a loaded nail gun left in the AI cluster, not a landmine buried across your whole fleet

CVE-2026-34159 is an unauthenticated remote code execution bug in the ggml-rpc backend used by llama.cpp for distributed inference. In vulnerable builds, deserialize_tensor() skips bounds validation when an incoming tensor sets buffer=0, letting a remote client turn crafted GRAPH_COMPUTE messages into arbitrary process memory read/write and then code execution. Upstream/NVD track the vulnerable range as prior to b8492; the initial GitHub advisory was published before the fix landed and still shows <= b7991, so use the later NVD/Debian fixed boundary for patch decisions.

The vendor's 9.8/CRITICAL score is technically fair for a reachable target: no auth, low complexity, full RCE. But for enterprise prioritization it overstates population risk because exploitation requires an optional RPC build flag (-DGGML_RPC=ON) and a service that operators have actually exposed beyond localhost. That narrows affected hosts sharply compared with a normal internet-facing daemon, so this drops one bucket to HIGH unless you already know you run reachable rpc-server nodes.

"Unauthenticated RCE is real, but only on hosts that explicitly built and exposed llama.cpp RPC."
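
A quick way to test the first prerequisite on a build host is to read the CMake cache. This is a minimal sketch, assuming the build tree still exists and that the flag was recorded as a GGML_RPC entry in CMakeCache.txt. The function name rpc_enabled_in_cache is illustrative and not part of llama.cpp.

```shell
# Return 0 if the CMake cache records GGML_RPC as enabled, 1 if it does not,
# and 2 if the cache file is missing (for example, a prebuilt binary install).
rpc_enabled_in_cache() {
  local cache="$1"
  [ -f "$cache" ] || return 2
  # CMake stores options as NAME:TYPE=VALUE; ON, 1, YES, TRUE and Y all mean "on".
  grep -Eiq '^GGML_RPC:[A-Z]+=(ON|1|YES|TRUE|Y)[[:space:]]*$' "$cache"
}

# Example use (path is an assumption about where the build tree lives):
#   rpc_enabled_in_cache ./build/CMakeCache.txt && echo "RPC backend was built"
```

A missing cache proves nothing either way, so treat return code 2 as "check the installed artifacts instead".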

02 · The Attack Path

4 steps from start to impact.

STEP 01

Find exposed RPC nodes with nmap

The attacker first looks for rpc-server listeners on TCP 50052 or a custom port used for distributed inference. In real environments this is usually east-west discovery inside a flat AI segment, though some operators also publish it externally for multi-host inference or debugging.

Conditions required:

Attacker has TCP reachability to the rpc-server port
Target was built with -DGGML_RPC=ON and is actually running rpc-server

Where this breaks in practice:

Most llama.cpp installs do not use the RPC backend at all
Official RPC examples default to 127.0.0.1:50052, so external reachability often requires deliberate reconfiguration
Many enterprises isolate GPU nodes behind internal-only VLANs, SGs, or Kubernetes network policy

Detection/coverage: Asset scanners can find an open port, but generic vuln scanners have weak fingerprinting for ggml-rpc. NetFlow and host firewall telemetry on port 50052 are more reliable than signature-based scanning.
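
On a host you control, local socket state answers the reachability question more directly than an external scan. The sketch below assumes Linux with iproute2 and the default port 50052; flag_exposed_rpc is a hypothetical helper that reads `ss -H -ltn` output on stdin and separates loopback-only listeners from everything else.

```shell
# Classify TCP listeners on the RPC port as loopback-only or exposed.
# Input: lines from `ss -H -ltn` (State Recv-Q Send-Q Local Peer).
flag_exposed_rpc() {
  local port="${1:-50052}"
  awk -v port="$port" '
    $1 == "LISTEN" {
      addr = $4
      n = split(addr, parts, ":")
      if (parts[n] != port) next            # not the RPC port
      host = substr(addr, 1, length(addr) - length(parts[n]) - 1)
      if (host ~ /^127\./ || host == "[::1]")
        print "loopback-only " addr
      else
        print "EXPOSED " addr               # wildcard or routable bind
    }'
}

# Example use:
#   ss -H -ltn | flag_exposed_rpc
#   ss -H -ltn | flag_exposed_rpc 50055   # custom port
```

An "EXPOSED" line only means the socket is bound beyond loopback. Whether anyone can actually reach it still depends on firewalls, security groups, and network policy.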

STEP 02

Speak raw RPC with a custom PoC client

The protocol is a purpose-built RPC service, not a hardened web API, so an attacker uses a custom client or exploit script rather than browser tooling. They only need protocol-level access; no login, token, or user interaction is required.

Conditions required:

Attacker can send arbitrary TCP payloads to the RPC listener
The target accepts RPC traffic from the attacker's source network

Where this breaks in practice:

This is not commodity HTTP exploitation; the attacker needs protocol knowledge or a ready-made write-up
Middleboxes that only proxy HTTP will not help the attacker reach this service

Detection/coverage: Look for malformed or unexpected RPC verbs, repeated short-lived TCP sessions to 50052, and crashes or errors in ggml-rpc logs. IDS coverage is likely thin unless you write your own decoder.
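
The "repeated short-lived sessions" signal can be approximated from flow data without a protocol decoder. The sketch below assumes a hypothetical whitespace-separated flow export with one record per line in the form `src_ip dst_ip dst_port`; both that format and the count_rpc_sessions name are assumptions, so adapt the field positions to whatever your NetFlow or firewall tooling emits.

```shell
# Count sessions per source IP toward the RPC port and print the sources
# that meet or exceed a threshold.
# Input (assumed format): "src_ip dst_ip dst_port", one flow per line.
count_rpc_sessions() {
  local port="${1:-50052}"
  local threshold="${2:-20}"
  awk -v port="$port" -v threshold="$threshold" '
    $3 == port { hits[$1]++ }
    END {
      for (src in hits)
        if (hits[src] >= threshold) print src, hits[src]
    }' | sort
}

# Example use (flows.txt is a hypothetical export file):
#   count_rpc_sessions 50052 20 < flows.txt
```

The threshold is arbitrary. Legitimate inference peers also open many sessions, so compare the output against your known peer list rather than alerting on the count alone.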

STEP 03

Leak process pointers via ALLOC_BUFFER and BUFFER_GET_BASE

The exploit chain uses protocol features to recover buffer addresses and defeat ASLR. That turns the bug from a memory corruption primitive into a practical code execution path on hardened builds.

Conditions required:

Target permits the normal RPC buffer-management commands
Attacker can complete enough protocol exchanges to harvest address information

Where this breaks in practice:

Some deployments may log or rate-limit repeated buffer allocation activity
EDR on the host may notice follow-on abnormal memory behavior even if it misses the protocol abuse itself

Detection/coverage: No common network scanner validates this stage. Host telemetry showing repeated rpc-server buffer operations or abnormal crash dumps is more useful than perimeter signatures.

STEP 04

Trigger GRAPH_COMPUTE null-buffer deserialization for arbitrary R/W and RCE

The attacker sends crafted GRAPH_COMPUTE tensors with buffer=0, causing deserialize_tensor() to skip validation and trust attacker-controlled pointers. From there they gain arbitrary read/write in the server process and can hijack function pointers for code execution as the service user, which the advisory notes is often root in Docker deployments.

Conditions required:

Vulnerable build earlier than b8492 or distro package lacking the backport
Service process runs with permissions valuable enough to matter

Where this breaks in practice:

A non-root runtime and tight container isolation reduce blast radius after code execution
Aggressive seccomp/AppArmor/SELinux profiles can limit post-exploit actions even if the crash-to-RCE step succeeds

Detection/coverage: Expect process crashes, abnormal ggml-rpc errors, core dumps, or EDR memory-corruption alerts if the exploit is noisy. There is little off-the-shelf scanner coverage for the exact vulnerable code path.
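
Because blast radius hinges on the service user, it is worth checking which account rpc-server actually runs as. A minimal sketch, assuming a procps-style `ps` and a process name of rpc-server; flag_root_rpc is a hypothetical helper that reads `ps -eo user=,pid=,comm=` output on stdin.

```shell
# Report rpc-server processes that run as root.
# Input: lines from `ps -eo user=,pid=,comm=` (user, pid, command name).
flag_root_rpc() {
  awk '$3 == "rpc-server" && ($1 == "root" || $1 == "0") {
         print "rpc-server pid " $2 " runs as root"
       }'
}

# Example use:
#   ps -eo user=,pid=,comm= | flag_root_rpc
```

Run it inside the container as well as on the host. Root inside a container is not identical to root on the node, but it is the case the advisory warns about.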

03 · Intelligence Metadata

The supporting signals.

In-the-wild status	No confirmed active exploitation found in the authoritative sources reviewed; not listed in CISA KEV.
Proof-of-concept availability	High confidence exploitability. The GitHub advisory states a full chain to RCE, and the public fix/PR (`#20908`, author `las7`) explains the vulnerable path clearly enough for reproduction.
EPSS	0.00534 (0.534%) per the supplied intel; a third-party CVE mirror places it at roughly the 67th to 68th percentile, which is low-to-middle rather than hotly exploited.
KEV status	Not KEV-listed as of the sources reviewed; no CISA due date applies.
CVSS vector reality check	`CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H` is accurate only after reachability exists. In practice, `AV:N` is narrowed by an optional build flag and frequent localhost-only binding.
Affected versions	Use upstream/NVD boundary: all versions before `b8492`. The original GHSA page still shows `<= b7991`, reflecting publication before the fix landed.
Fixed versions	Upstream fix is commit `39bf0d3...` / build `b8492`. Debian marks the issue fixed in package `8611+dfsg-1`. Ubuntu lists 26.04 LTS as not affected, and older maintained releases mostly do not ship the package.
Exposure reality	Public scan evidence is weak, but official docs show the service defaults to `127.0.0.1:50052` and must be explicitly built with `-DGGML_RPC=ON`. That is meaningful downward pressure on broad-fleet urgency.
Disclosure timeline	GitHub advisory published 2026-03-26; CVE/NVD published 2026-04-01; NVD later added the `b8492` fix boundary on 2026-04-30.
Reporter / research context	Public reporting and the patch trail point to `las7` as the researcher/fix author; the advisory says the issue was reported to CERT/CC on 2026-02-08 before direct disclosure to maintainers.

04 · The Call

noisgate verdict.

Final Verdict

↓ DOWNGRADED to HIGH (8.1/10)

The single biggest reason this is not CRITICAL fleet-wide is that exploitation depends on an optional RPC backend that must be intentionally built and reachable, which sharply limits exposed population compared with a default network service. It still lands in HIGH because once that prerequisite is met, the chain is pre-auth, low-friction, and ends in full process compromise on high-value GPU hosts.

HIGH Technical exploitability once the RPC service is reachable

MEDIUM Estimated real-world exposure prevalence across enterprise fleets

HIGH Patch boundary at upstream `b8492` / Debian `8611+dfsg-1`

Why this verdict

Downgrade: optional feature gate — this is not reachable on a normal llama.cpp install; the vulnerable path requires a build with -DGGML_RPC=ON and an active rpc-server deployment.
Downgrade: attacker position is narrower than CVSS implies — the service defaults to 127.0.0.1:50052, so the attacker usually needs prior east-west foothold or an operator who deliberately exposed the port.
Upgrade pressure: full unauthenticated RCE on valuable nodes — if you do run reachable RPC nodes, the exploit chain needs no credentials and lands on GPU hosts that often have broad internal trust and expensive compute attached.
Downgrade: low threat telemetry — no KEV entry and the supplied EPSS is low, which argues against treating every mention of llama.cpp as an emergency across a 10,000-host estate.
Upgrade pressure: container practice can amplify impact — the advisory explicitly notes the process often runs as root in Docker, which turns a single service bug into a host/container-control event faster than many web-tier RCEs.

Why not higher?

This is not a ubiquitous listener like SSH, a browser, or a default enterprise management plane. The exploit population is trimmed by two compounding prerequisites: an explicit RPC build and network reachability to a service that commonly binds localhost by default. Those are real-world narrowing factors, not theoretical edge cases.

Why not lower?

Once the vulnerable service is reachable, there is very little defender friction left: no auth, no user action, and a clear path to arbitrary read/write then RCE. The target class also matters — GPU inference nodes often sit on trusted internal segments and may run privileged containers, so compromise value is high even if population is small.

05 · Compensating Control

What to do — in priority order.

Disable unused RPC services — Stop and remove rpc-server anywhere distributed inference is not actively required. For a HIGH verdict, deploy this compensating control within 30 days; it removes the reachable attack surface instead of trying to detect malformed protocol traffic.
Allowlist port 50052 — Restrict the RPC port to explicit peer IPs or cluster subnets with host firewalls, security groups, or Kubernetes NetworkPolicy. Apply within 30 days so only known inference peers can reach the service, cutting off opportunistic east-west abuse.
Keep RPC off public interfaces — Do not bind rpc-server to 0.0.0.0 unless you have an explicit private transport design around it; prefer loopback or tightly scoped internal addresses. Enforce within 30 days because the whole vendor risk model becomes accurate the moment you make the port broadly reachable.
Run as non-root with confinement — Move the service to a non-root UID and apply container/runtime restrictions such as seccomp, AppArmor, SELinux, read-only mounts, and minimal capabilities. Put this in place within 30 days to reduce post-exploit blast radius on the nodes you cannot patch immediately.
Segment GPU inference nodes — Treat multi-host LLM inference as a dedicated trust zone instead of a flat server LAN. Implement within 30 days so an initial foothold elsewhere in the environment does not automatically become reachability to every rpc-server.
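
The port allowlist above can be sketched as host firewall rules. This example assumes iptables on the inference node and uses placeholder peer addresses. emit_rpc_allowlist is a hypothetical helper that only prints the rules so they can be reviewed before anyone applies them.

```shell
# Print iptables rules that allow the RPC port only from loopback and the
# listed peers, then drop everything else aimed at that port.
# Usage: emit_rpc_allowlist PORT PEER [PEER...]
emit_rpc_allowlist() {
  local port="$1"
  shift
  local peer
  printf 'iptables -A INPUT -i lo -p tcp --dport %s -j ACCEPT\n' "$port"
  for peer in "$@"; do
    printf 'iptables -A INPUT -p tcp --dport %s -s %s -j ACCEPT\n' "$port" "$peer"
  done
  printf 'iptables -A INPUT -p tcp --dport %s -j DROP\n' "$port"
}

# Example use (addresses are placeholders for your inference peers):
#   emit_rpc_allowlist 50052 10.20.0.11 10.20.0.12
```

Rule order matters because the final DROP must come after the ACCEPT entries. Ports published by Docker are forwarded rather than delivered through INPUT, so containerized nodes need the equivalent rules in the DOCKER-USER chain or in a security group or NetworkPolicy instead.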

What doesn't work

A WAF or API gateway in front of llama-server does not protect the raw ggml-rpc listener on 50052.
Relying on the fact that the service defaults to localhost does not help if your deployment scripts, Docker publish flags, or Kubernetes Services have already widened exposure.
MFA is irrelevant because the bug is pre-auth and hits a custom TCP service, not an interactive login flow.
EDR alone is not a preventive control here; it may catch the memory-corruption aftermath, but it does not stop the protocol flaw from being reachable.
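
The Docker publish case is easy to audit. The sketch below assumes the container listens on the default port 50052 and reads the output of `docker ps --format '{{.Names}}\t{{.Ports}}'` on stdin. flag_published_rpc is a hypothetical helper name.

```shell
# List containers that publish the RPC container port on a non-loopback
# host address. Input: "name<TAB>ports" lines from docker ps.
flag_published_rpc() {
  local port="${1:-50052}"
  awk -F '\t' -v port="$port" '
    {
      n = split($2, maps, ", ")
      for (i = 1; i <= n; i++) {
        m = maps[i]
        # Published mappings look like HOSTIP:HOSTPORT->CONTAINERPORT/tcp.
        if (m !~ ("->" port "/tcp$")) continue
        # Loopback-only publishes keep the service local to the host.
        if (m ~ /^127\./ || m ~ /^\[::1\]/) continue
        print $1, m
      }
    }'
}

# Example use:
#   docker ps --format '{{.Names}}\t{{.Ports}}' | flag_published_rpc
```

The match is on the container-side port, so a mapping such as 0.0.0.0:8080->50052/tcp is still reported even though the host port differs.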

06 · Verification

Crowdsourced verification payload.

Run this on the target Linux host or container image that ships llama.cpp/rpc-server. Invoke it as bash verify-cve-2026-34159.sh /path/to/llama/binaries or just bash verify-cve-2026-34159.sh; no root is required, but local filesystem access to the installed binaries makes detection more reliable.

noisgate-verify.sh

BASH · READ-ONLY · SAFE

#!/usr/bin/env bash
# verify-cve-2026-34159.sh
# Checks whether a local llama.cpp installation is likely vulnerable to CVE-2026-34159.
# Logic:
#   - If upstream build number >= 8492, report PATCHED.
#   - If upstream build number < 8492 AND RPC components are present, report VULNERABLE.
#   - If version cannot be mapped cleanly, or build is old but no RPC component is found, report UNKNOWN.
# Exit codes: 0=PATCHED, 1=VULNERABLE, 2=UNKNOWN

set -euo pipefail

TARGET_ROOT="${1:-}"
FOUND_VERSION=""
FOUND_BUILD=""
FOUND_RPC="0"

have() { command -v "$1" >/dev/null 2>&1; }

add_candidate() {
  local p="$1"
  [ -n "$p" ] || return 0
  [ -e "$p" ] || return 0
  printf '%s\n' "$p"
}

collect_candidates() {
  {
    [ -n "$TARGET_ROOT" ] && add_candidate "$TARGET_ROOT"
    [ -n "$TARGET_ROOT" ] && add_candidate "$TARGET_ROOT/bin"
    add_candidate "$(pwd)"
    add_candidate "$(pwd)/build/bin"
    add_candidate "/usr/local/bin"
    add_candidate "/usr/bin"
    add_candidate "/opt"
    add_candidate "/app"
    add_candidate "/app/build/bin"
  } | awk '!seen[$0]++'
}

extract_build() {
  local text="$1"
  # Match upstream-style bNNNN first.
  if [[ "$text" =~ (^|[^A-Za-z0-9])b([0-9]{4,})([^A-Za-z0-9]|$) ]]; then
    printf '%s' "${BASH_REMATCH[2]}"
    return 0
  fi
  # Fallback: look for 'build 8492' or 'version 8492'.
  if [[ "$text" =~ (build|version)[^0-9]{0,8}([0-9]{4,}) ]]; then
    printf '%s' "${BASH_REMATCH[2]}"
    return 0
  fi
  return 1
}

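# Run a candidate binary with --version (falling back to -v) and record the
# first upstream build number found in its output.
probe_binary() {
  local bin="$1"
  local out=""
  local build=""
  # Directories also pass -x, so require a regular file as well.
  if [ ! -f "$bin" ] || [ ! -x "$bin" ]; then
    return 1
  fi
  # stdin comes from /dev/null so a binary that waits for input cannot hang the check.
  out="$({ "$bin" --version || "$bin" -v || true; } 2>&1 </dev/null | head -n 5)"
  [ -n "$out" ] || return 1
  if build="$(extract_build "$out")"; then
    FOUND_VERSION="$out"
    FOUND_BUILD="$build"
    return 0
  fi
  return 1
}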

check_rpc_presence() {
  local root="$1"
  [ -e "$root" ] || return 0

  if [ -d "$root" ]; then
    if find "$root" -maxdepth 4 \( -name 'rpc-server' -o -name 'libggml-rpc.so' -o -name 'libggml-rpc.dylib' \) 2>/dev/null | grep -q .; then
      FOUND_RPC="1"
      return 0
    fi
  elif [ -f "$root" ]; then
    case "$(basename "$root")" in
      rpc-server|libggml-rpc.so|libggml-rpc.dylib) FOUND_RPC="1" ;;
    esac
    local parent
    parent="$(dirname "$root")"
    if find "$parent" -maxdepth 2 \( -name 'rpc-server' -o -name 'libggml-rpc.so' -o -name 'libggml-rpc.dylib' \) 2>/dev/null | grep -q .; then
      FOUND_RPC="1"
    fi
  fi
}

# 1) Search obvious locations for binaries and RPC artifacts.
while IFS= read -r path; do
  [ -n "$path" ] || continue
  check_rpc_presence "$path"

  if [ -d "$path" ]; then
    for name in llama-cli llama-server rpc-server main; do
      if [ -x "$path/$name" ] && [ -z "$FOUND_BUILD" ]; then
        probe_binary "$path/$name" || true
      fi
    done
  elif [ -f "$path" ]; then
    probe_binary "$path" || true
  fi
done < <(collect_candidates)

# 2) PATH fallback.
if [ -z "$FOUND_BUILD" ]; then
  for cmd in llama-cli llama-server rpc-server; do
    if have "$cmd"; then
      check_rpc_presence "$(command -v "$cmd")"
      probe_binary "$(command -v "$cmd")" || true
      [ -n "$FOUND_BUILD" ] && break
    fi
  done
fi

# 3) Debian package fallback for distro builds.
if [ -z "$FOUND_BUILD" ] && have dpkg-query; then
  if pkgver="$(dpkg-query -W -f='${Version}' llama.cpp 2>/dev/null || true)" && [ -n "$pkgver" ]; then
    # Debian tracker marks 8611+dfsg-1 as fixed.
    if dpkg --compare-versions "$pkgver" ge "8611+dfsg-1"; then
      echo PATCHED
      exit 0
    fi
    # Package predates the fixed version: report VULNERABLE only if RPC artifacts were found.
    if [ "$FOUND_RPC" = "1" ]; then
      echo VULNERABLE
      exit 1
    else
      echo UNKNOWN
      exit 2
    fi
  fi
fi

# 4) Decide based on discovered upstream build number.
if [ -n "$FOUND_BUILD" ]; then
  if [ "$FOUND_BUILD" -ge 8492 ]; then
    echo PATCHED
    exit 0
  fi
  if [ "$FOUND_RPC" = "1" ]; then
    echo VULNERABLE
    exit 1
  fi
  echo UNKNOWN
  exit 2
fi

echo UNKNOWN
exit 2
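
For more than one install root, the exit codes documented in the script header (0=PATCHED, 1=VULNERABLE, 2=UNKNOWN) can drive a simple sweep. This is a sketch: sweep_verify is a hypothetical wrapper, and the directory arguments are placeholders.

```shell
# Run the verification script against each directory and print one
# "path verdict" line per target, based on the script's exit code.
# Usage: sweep_verify /path/to/verify-cve-2026-34159.sh DIR [DIR...]
sweep_verify() {
  local script="$1"
  shift
  local dir rc
  for dir in "$@"; do
    rc=0
    bash "$script" "$dir" >/dev/null 2>&1 || rc=$?
    case "$rc" in
      0) printf '%s PATCHED\n' "$dir" ;;
      1) printf '%s VULNERABLE\n' "$dir" ;;
      2) printf '%s UNKNOWN\n' "$dir" ;;
      *) printf '%s ERROR(%s)\n' "$dir" "$rc" ;;
    esac
  done
}

# Example use (paths are placeholders):
#   sweep_verify ./verify-cve-2026-34159.sh /opt/llama-a /opt/llama-b
```

The script also searches fixed locations such as /usr/bin, /opt, and the current directory, so a verdict reflects the whole host view rather than the passed directory alone. Run it inside each container image for a clean per-image answer.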

07 · Bottom Line

If you remember one thing.

TL;DR

Monday morning, inventory every host and container image that ships rpc-server or libggml-rpc and check for listeners on port 50052, then immediately sort them into three groups: internet-exposed, internal-only, and dormant/unused. For this HIGH verdict, the noisgate mitigation SLA is ≤30 days: within that window, disable unused RPC, block the port to allowlisted peers only, and keep it off public interfaces. The noisgate remediation SLA is ≤180 days: upgrade upstream to b8492 or later or the appropriate distro-fixed package such as Debian 8611+dfsg-1. If you discover any externally reachable RPC node, treat that subset as an out-of-band sprint item rather than waiting for the full 180-day window.
