This is a loaded nail gun left in the AI cluster, not a landmine buried across your whole fleet
CVE-2026-34159 is an unauthenticated remote code execution bug in the ggml-rpc backend used by llama.cpp for distributed inference. In vulnerable builds, deserialize_tensor() skips bounds validation when an incoming tensor sets buffer=0, letting a remote client turn crafted GRAPH_COMPUTE messages into arbitrary process memory read/write and then code execution. Upstream/NVD track the vulnerable range as prior to b8492; the initial GitHub advisory was published before the fix landed and still shows <= b7991, so use the later NVD/Debian fixed boundary for patch decisions.
The vendor's 9.8/CRITICAL score is technically fair for a reachable target: no auth, low complexity, full RCE. But for enterprise prioritization it overstates population risk because exploitation requires an optional RPC build flag (-DGGML_RPC=ON) and a service that operators have actually exposed beyond localhost. That narrows affected hosts sharply compared with a normal internet-facing daemon, so this drops one bucket to HIGH unless you already know you run reachable rpc-server nodes.
4 steps from start to impact.
Find exposed RPC nodes with nmap
The attacker scans for rpc-server listeners on TCP 50052 or a custom port used for distributed inference. In real environments this is usually east-west discovery inside a flat AI segment, though some operators also publish it externally for multi-host inference or debugging.
- Attacker has TCP reachability to the rpc-server port
- Target was built with -DGGML_RPC=ON and is actually running rpc-server

- Most llama.cpp installs do not use the RPC backend at all
- Official RPC examples default to 127.0.0.1:50052, so external reachability often requires deliberate reconfiguration
- Many enterprises isolate GPU nodes behind internal-only VLANs, SGs, or Kubernetes network policy
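Defenders can run the same discovery against their own address space. The following is a minimal sketch, assuming nmap's grepable output format (-oG) is piped in on stdin; the helper name open_rpc_hosts and the example subnet are illustrative and do not come from the advisory.

```bash
# Sketch: list hosts with the RPC port open, reading nmap grepable (-oG) output on stdin.
# open_rpc_hosts is a hypothetical helper name; 50052 is the documented default port.
open_rpc_hosts() {
  local port="${1:-50052}"
  # Grepable host lines look like:
  #   Host: 10.0.0.5 ()<TAB>Ports: 50052/open/tcp//unknown///
  awk -v port="$port" '
    /^Host: / && index($0, " " port "/open/tcp/") > 0 { print $2 }
  '
}

# Example (scan only ranges you are authorized to scan; the subnet is a placeholder):
#   nmap -n -p 50052 --open -oG - 10.20.0.0/24 | open_rpc_hosts 50052
```

Because operators can move the service to a custom port, a clean result on 50052 alone does not prove the absence of rpc-server.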
ggml-rpc. NetFlow and host firewall telemetry on port 50052 are more reliable than signature-based scanning.
Speak raw RPC with a custom PoC client
- Attacker can send arbitrary TCP payloads to the RPC listener
- The target accepts RPC traffic from the attacker's source network
- This is not commodity HTTP exploitation; the attacker needs protocol knowledge or a ready-made write-up
- Middleboxes that only proxy HTTP will not help the attacker reach this service
ggml-rpc logs. IDS coverage is likely thin unless you write your own decoder.
Leak process pointers via ALLOC_BUFFER and BUFFER_GET_BASE
- Target permits the normal RPC buffer-management commands
- Attacker can complete enough protocol exchanges to harvest address information
- Some deployments may log or rate-limit repeated buffer allocation activity
- EDR on the host may notice follow-on abnormal memory behavior even if it misses the protocol abuse itself
rpc-server buffer operations or abnormal crash dumps is more useful than perimeter signatures.
Trigger GRAPH_COMPUTE null-buffer deserialization for arbitrary R/W and RCE
The attacker sends GRAPH_COMPUTE tensors with buffer=0, causing deserialize_tensor() to skip validation and trust attacker-controlled pointers. From there they gain arbitrary read/write in the server process and can hijack function pointers for code execution as the service user, which the advisory notes is often root in Docker deployments.
- Vulnerable build earlier than b8492 or distro package lacking the backport
- Service process runs with permissions valuable enough to matter
- A non-root runtime and tight container isolation reduce blast radius after code execution
- Aggressive seccomp/AppArmor/SELinux profiles can limit post-exploit actions even if the crash-to-RCE step succeeds
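The blast-radius reductions above map onto standard container runtime options. The following is a minimal sketch, assuming Docker is the runtime; the helper name hardened_run_cmd, the image name, and UID 10001 are placeholders, and the function only prints a command for review rather than running it.

```bash
# Sketch: print (not run) a docker command using the confinement options discussed above.
# hardened_run_cmd, the image name, and UID/GID 10001 are hypothetical placeholders.
hardened_run_cmd() {
  local image="$1" port="${2:-50052}"
  local -a cmd=(
    docker run --rm
    --user 10001:10001                      # non-root UID:GID
    --cap-drop=ALL                          # drop all Linux capabilities
    --read-only --tmpfs /tmp                # read-only root filesystem, scratch space in tmpfs
    --security-opt no-new-privileges        # block privilege gain via setuid binaries
    --publish "127.0.0.1:${port}:${port}"   # publish on loopback only
    "$image"
  )
  printf '%s\n' "${cmd[*]}"
}

# Example: hardened_run_cmd example/llama-rpc:local
```

GPU device flags, a writable cache path, and a non-loopback publish address for genuine multi-host inference are deployment-specific and deliberately left out; treat the output as a starting point to adapt.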
ggml-rpc errors, core dumps, or EDR memory-corruption alerts if the exploit is noisy. There is little off-the-shelf scanner coverage for the exact vulnerable code path.
The supporting signals.
| Signal | Detail |
|---|---|
| In-the-wild status | No confirmed active exploitation found in the authoritative sources reviewed; not listed in CISA KEV. |
| Proof-of-concept availability | High confidence exploitability. The GitHub advisory states a full chain to RCE, and the public fix/PR (#20908, author las7) explains the vulnerable path clearly enough for reproduction. |
| EPSS | 0.00534 (0.534%) from the intel you supplied; a third-party CVE mirror reports roughly P67-P68 percentile, which is low-to-middle rather than hotly exploited. |
| KEV status | Not KEV-listed as of the sources reviewed; no CISA due date applies. |
| CVSS vector reality check | CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H is accurate only after reachability exists. In practice, AV:N is narrowed by an optional build flag and frequent localhost-only binding. |
| Affected versions | Use upstream/NVD boundary: all versions before b8492. The original GHSA page still shows <= b7991, reflecting publication before the fix landed. |
| Fixed versions | Upstream fix is commit 39bf0d3... / build b8492. Debian marks the issue fixed in package 8611+dfsg-1. Ubuntu lists 26.04 LTS as not affected, and its older maintained releases mostly do not ship the package. |
| Exposure reality | Public scan evidence is weak, but official docs show the service defaults to 127.0.0.1:50052 and must be explicitly built with -DGGML_RPC=ON. That is meaningful downward pressure on broad-fleet urgency. |
| Disclosure timeline | GitHub advisory published 2026-03-26; CVE/NVD published 2026-04-01; NVD later added the b8492 fix boundary on 2026-04-30. |
| Reporter / research context | Public reporting and the patch trail point to las7 as the researcher/fix author; the advisory says the issue was reported to CERT/CC on 2026-02-08 before direct disclosure to maintainers. |
noisgate verdict.
The single biggest reason this is not CRITICAL fleet-wide is that exploitation depends on an optional RPC backend that must be intentionally built and reachable, which sharply limits exposed population compared with a default network service. It still lands in HIGH because once that prerequisite is met, the chain is pre-auth, low-friction, and ends in full process compromise on high-value GPU hosts.
Why this verdict
- Downgrade (optional feature gate): this is not reachable on a normal llama.cpp install; the vulnerable path requires a build with -DGGML_RPC=ON and an active rpc-server deployment.
- Downgrade (attacker position is narrower than CVSS implies): the service defaults to 127.0.0.1:50052, so the attacker usually needs a prior east-west foothold or an operator who deliberately exposed the port.
- Upgrade pressure (full unauthenticated RCE on valuable nodes): if you do run reachable RPC nodes, the exploit chain needs no credentials and lands on GPU hosts that often have broad internal trust and expensive compute attached.
- Downgrade (low threat telemetry): no KEV entry and the supplied EPSS is low, which argues against treating every mention of llama.cpp as an emergency across a 10,000-host estate.
- Upgrade pressure (container practice can amplify impact): the advisory explicitly notes the process often runs as root in Docker, which turns a single service bug into a host/container-control event faster than many web-tier RCEs.
Why not higher?
This is not a ubiquitous listener like SSH, a browser, or a default enterprise management plane. The exploit population is trimmed by two compounding prerequisites: an explicit RPC build and network reachability to a service that commonly binds localhost by default. Those are real-world narrowing factors, not theoretical edge cases.
Why not lower?
Once the vulnerable service is reachable, there is very little defender friction left: no auth, no user action, and a clear path to arbitrary read/write then RCE. The target class also matters — GPU inference nodes often sit on trusted internal segments and may run privileged containers, so compromise value is high even if population is small.
What to do — in priority order.
- Disable unused RPC services. Stop and remove rpc-server anywhere distributed inference is not actively required. For a HIGH verdict, deploy this compensating control within 30 days; it removes the reachable attack surface instead of trying to detect malformed protocol traffic.
- Allowlist port 50052. Restrict the RPC port to explicit peer IPs or cluster subnets with host firewalls, security groups, or Kubernetes NetworkPolicy. Apply within 30 days so only known inference peers can reach the service, cutting off opportunistic east-west abuse.
- Keep RPC off public interfaces. Do not bind rpc-server to 0.0.0.0 unless you have an explicit private transport design around it; prefer loopback or tightly scoped internal addresses. Enforce within 30 days because the whole vendor risk model becomes accurate the moment you make the port broadly reachable.
- Run as non-root with confinement. Move the service to a non-root UID and apply container/runtime restrictions such as seccomp, AppArmor, SELinux, read-only mounts, and minimal capabilities. Put this in place within 30 days to reduce post-exploit blast radius on the nodes you cannot patch immediately.
- Segment GPU inference nodes. Treat multi-host LLM inference as a dedicated trust zone instead of a flat server LAN. Implement within 30 days so an initial foothold elsewhere in the environment does not automatically become reachability to every rpc-server.
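The allowlisting step can be expressed as a short rule generator. The following is a minimal sketch, assuming a host firewall managed with iptables; the helper name emit_rpc_allowlist and the peer addresses are illustrative, and the function only prints rules so they can be reviewed before anything is applied.

```bash
# Sketch: print (not apply) iptables rules that allowlist RPC peers and drop everyone else.
# emit_rpc_allowlist and the example peer IPs are hypothetical; review before applying.
emit_rpc_allowlist() {
  local port="$1"; shift
  local peer
  # Keep local clients working over loopback.
  printf 'iptables -A INPUT -i lo -p tcp --dport %s -j ACCEPT\n' "$port"
  # One ACCEPT rule per known inference peer (IP or CIDR).
  for peer in "$@"; do
    printf 'iptables -A INPUT -p tcp --dport %s -s %s -j ACCEPT\n' "$port" "$peer"
  done
  # Anything not matched above is dropped.
  printf 'iptables -A INPUT -p tcp --dport %s -j DROP\n' "$port"
}

# Example: emit_rpc_allowlist 50052 10.20.0.11 10.20.0.12
```

Ports published by Docker are forwarded rather than delivered to the INPUT chain, so on container hosts the equivalent filtering usually belongs in the DOCKER-USER chain or in the cloud security group instead.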
- A WAF or API gateway in front of llama-server does not protect the raw ggml-rpc listener on 50052.
- Relying on the fact that the service defaults to localhost does not help if your deployment scripts, Docker publish flags, or Kubernetes Services have already widened exposure.
- MFA is irrelevant because the bug is pre-auth and hits a custom TCP service, not an interactive login flow.
- EDR alone is not a preventive control here; it may catch the memory-corruption aftermath, but it does not stop the protocol flaw from being reachable.
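The point about Docker publish flags can be checked directly on container hosts. The following is a minimal sketch, assuming the Ports column that docker ps prints (for example 0.0.0.0:50052->50052/tcp); the helper name docker_publish_scope and its labels are illustrative.

```bash
# Sketch: classify how a container port is published, given one container's Ports
# string from `docker ps`. docker_publish_scope and its labels are hypothetical names.
docker_publish_scope() {
  local ports="$1" port="${2:-50052}" entry scope="not-published"
  local -a entries
  # Mappings are comma-separated, e.g. "0.0.0.0:50052->50052/tcp, :::50052->50052/tcp".
  IFS=',' read -r -a entries <<< "$ports"
  for entry in "${entries[@]}"; do
    entry="${entry# }"                       # trim the space that follows each comma
    case "$entry" in
      *"->$port/tcp") ;;                     # a mapping onto the container port of interest
      *) continue ;;
    esac
    case "$entry" in
      0.0.0.0:*|:::*|"[::]":*) scope="all-interfaces"; break ;;
      *) scope="specific-address" ;;
    esac
  done
  printf '%s\n' "$scope"
}

# Example:
#   docker ps --format '{{.Names}}\t{{.Ports}}' |
#     while IFS=$'\t' read -r name ports; do
#       printf '%s\t%s\n' "$name" "$(docker_publish_scope "$ports" 50052)"
#     done
```

A result of all-interfaces means the localhost default no longer applies to that container, regardless of what rpc-server was told to bind inside it.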
Crowdsourced verification payload.
Run this on the target Linux host or container image that ships llama.cpp/rpc-server. Invoke it as bash verify-cve-2026-34159.sh /path/to/llama/binaries or just bash verify-cve-2026-34159.sh; no root is required, but local filesystem access to the installed binaries makes detection more reliable.
#!/usr/bin/env bash
# verify-cve-2026-34159.sh
# Checks whether a local llama.cpp installation is likely vulnerable to CVE-2026-34159.
# Logic:
# - If upstream build number >= 8492, report PATCHED.
# - If upstream build number < 8492 AND RPC components are present, report VULNERABLE.
# - If version cannot be mapped cleanly, or build is old but no RPC component is found, report UNKNOWN.
# Exit codes: 0=PATCHED, 1=VULNERABLE, 2=UNKNOWN
set -euo pipefail
TARGET_ROOT="${1:-}"
FOUND_VERSION=""
FOUND_BUILD=""
FOUND_RPC="0"
have() { command -v "$1" >/dev/null 2>&1; }

add_candidate() {
  local p="$1"
  [ -n "$p" ] || return 0
  [ -e "$p" ] || return 0
  printf '%s\n' "$p"
}

collect_candidates() {
  {
    [ -n "$TARGET_ROOT" ] && add_candidate "$TARGET_ROOT"
    [ -n "$TARGET_ROOT" ] && add_candidate "$TARGET_ROOT/bin"
    add_candidate "$(pwd)"
    add_candidate "$(pwd)/build/bin"
    add_candidate "/usr/local/bin"
    add_candidate "/usr/bin"
    add_candidate "/opt"
    add_candidate "/app"
    add_candidate "/app/build/bin"
  } | awk '!seen[$0]++'
}

extract_build() {
  local text="$1"
  # Match upstream-style bNNNN first.
  if [[ "$text" =~ (^|[^A-Za-z0-9])b([0-9]{4,})([^A-Za-z0-9]|$) ]]; then
    printf '%s' "${BASH_REMATCH[2]}"
    return 0
  fi
  # Fallback: look for 'build 8492' or 'version 8492'.
  if [[ "$text" =~ (build|version)[^0-9]{0,8}([0-9]{4,}) ]]; then
    printf '%s' "${BASH_REMATCH[2]}"
    return 0
  fi
  return 1
}

probe_binary() {
  local bin="$1"
  local out=""
  if [ ! -x "$bin" ]; then
    return 1
  fi
  out="$({ "$bin" --version || "$bin" -v || true; } 2>&1 | head -n 5)"
  [ -n "$out" ] || return 1
  if build="$(extract_build "$out")"; then
    FOUND_VERSION="$out"
    FOUND_BUILD="$build"
    return 0
  fi
  return 1
}

# Set FOUND_RPC=1 if an RPC artifact exists at or near the given path.
# find's -print -quit stops at the first match. Testing its output directly avoids a
# find | grep -q pipeline, which under pipefail can fail with SIGPIPE even when a match exists.
check_rpc_presence() {
  local root="$1"
  [ -e "$root" ] || return 0
  if [ -d "$root" ]; then
    if [ -n "$(find "$root" -maxdepth 4 \( -name 'rpc-server' -o -name 'libggml-rpc.so' -o -name 'libggml-rpc.dylib' \) -print -quit 2>/dev/null)" ]; then
      FOUND_RPC="1"
      return 0
    fi
  elif [ -f "$root" ]; then
    case "$(basename "$root")" in
      rpc-server|libggml-rpc.so|libggml-rpc.dylib) FOUND_RPC="1" ;;
    esac
    local parent
    parent="$(dirname "$root")"
    if [ -n "$(find "$parent" -maxdepth 2 \( -name 'rpc-server' -o -name 'libggml-rpc.so' -o -name 'libggml-rpc.dylib' \) -print -quit 2>/dev/null)" ]; then
      FOUND_RPC="1"
    fi
  fi
}

# 1) Search obvious locations for binaries and RPC artifacts.
while IFS= read -r path; do
  [ -n "$path" ] || continue
  check_rpc_presence "$path"
  if [ -d "$path" ]; then
    for name in llama-cli llama-server rpc-server main; do
      if [ -x "$path/$name" ] && [ -z "$FOUND_BUILD" ]; then
        probe_binary "$path/$name" || true
      fi
    done
  elif [ -f "$path" ]; then
    probe_binary "$path" || true
  fi
done < <(collect_candidates)

# 2) PATH fallback.
if [ -z "$FOUND_BUILD" ]; then
  for cmd in llama-cli llama-server rpc-server; do
    if have "$cmd"; then
      check_rpc_presence "$(command -v "$cmd")"
      probe_binary "$(command -v "$cmd")" || true
      [ -n "$FOUND_BUILD" ] && break
    fi
  done
fi

# 3) Debian package fallback for distro builds.
if [ -z "$FOUND_BUILD" ] && have dpkg-query; then
  if pkgver="$(dpkg-query -W -f='${Version}' llama.cpp 2>/dev/null || true)" && [ -n "$pkgver" ]; then
    # Debian tracker marks 8611+dfsg-1 as fixed.
    if dpkg --compare-versions "$pkgver" ge "8611+dfsg-1"; then
      echo PATCHED
      exit 0
    fi
    # Package present but version path is ambiguous without RPC build evidence.
    if [ "$FOUND_RPC" = "1" ]; then
      echo VULNERABLE
      exit 1
    else
      echo UNKNOWN
      exit 2
    fi
  fi
fi

# 4) Decide based on discovered upstream build number.
if [ -n "$FOUND_BUILD" ]; then
  if [ "$FOUND_BUILD" -ge 8492 ]; then
    echo PATCHED
    exit 0
  fi
  if [ "$FOUND_RPC" = "1" ]; then
    echo VULNERABLE
    exit 1
  fi
  echo UNKNOWN
  exit 2
fi

echo UNKNOWN
exit 2
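
The script above reports on installed binaries and versions, not on whether anything is actually listening. As a complement, the following is a minimal sketch that classifies the bind scope of listeners on the RPC port, assuming a Linux host with the ss utility; the helper names classify_listener and report_rpc_listeners and the scope labels are illustrative.

```bash
# Sketch: classify a "Local Address:Port" value as printed by `ss -ltnH`.
# Returns non-zero when the address is not on the requested port.
# classify_listener / report_rpc_listeners are hypothetical helper names.
classify_listener() {
  local local_addr="$1" port="${2:-50052}"
  case "$local_addr" in
    *":$port") ;;
    *) return 1 ;;
  esac
  local host="${local_addr%:*}"              # strip the trailing :port
  case "$host" in
    127.*|"[::1]")      echo "loopback-only" ;;
    0.0.0.0|"*"|"[::]") echo "all-interfaces" ;;
    *)                  echo "specific-address" ;;
  esac
}

# Print every TCP listener on the port together with its bind scope.
report_rpc_listeners() {
  local port="${1:-50052}" addr scope
  ss -ltnH 2>/dev/null | awk '{ print $4 }' | while IFS= read -r addr; do
    if scope="$(classify_listener "$addr" "$port")"; then
      printf '%s\t%s\n' "$addr" "$scope"
    fi
  done
}

# Example: report_rpc_listeners 50052
```

A loopback-only result is consistent with the documented default. An all-interfaces result still needs a firewall or security-group check before the host can be called exposed, and a service moved to a custom port will not appear unless that port is passed in.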
If you remember one thing.
Inventory every host that has rpc-server or libggml-rpc and check for listeners on port 50052, then immediately sort them into three groups: internet-exposed, internal-only, and dormant/unused. For this HIGH verdict, the noisgate mitigation SLA is ≤30 days: within that window, disable unused RPC, block the port to allowlisted peers only, and keep it off public interfaces. The noisgate remediation SLA is ≤180 days: upgrade upstream to b8492 or later or the appropriate distro-fixed package such as Debian 8611+dfsg-1. If you discover any externally reachable RPC node, treat that subset as an out-of-band sprint item rather than waiting for the full 180-day window.