CVE-2026-41481 · LangChain is a framework for building agents and LLM-powered applic…

01 · The Real Story

This is a mailroom clerk who checks the envelope but follows whatever forwarding address is scribbled inside.

In langchain-text-splitters versions before 1.1.2, HTMLHeaderTextSplitter.split_text_from_url() validates the *initial* URL, then fetches it with redirect following enabled. An attacker can submit a public URL they control, return a 302 to localhost, RFC1918 space, or headerless metadata endpoints, and make the application perform a server-side fetch that the original validation was supposed to block.

The vendor's 6.5 / MEDIUM is technically fair in a vacuum, but it overstates the enterprise-wide patch urgency. This is a library helper flaw in one specific method, not a product-wide pre-auth takeover: exploitation requires an application feature that accepts attacker-supplied URLs, meaningful impact usually requires the app to expose the fetched Document content back to the requester, and several major cloud metadata services are blunted by required headers.

"Real bug, but it only bites apps that ingest attacker URLs and hand the fetched content back out."

02 · The Attack Path

4 steps from start to impact.

STEP 01

Reach a URL-ingestion path

The attacker must find an application path that passes untrusted URLs into HTMLHeaderTextSplitter.split_text_from_url(). The weaponized component is the vulnerable LangChain helper itself, usually embedded in custom RAG ingestion, scraping, or document-processing code rather than exposed as a named product endpoint.

Conditions required:

The target application uses langchain-text-splitters < 1.1.2
Developer code calls HTMLHeaderTextSplitter.split_text_from_url()
An external or untrusted user can influence the URL argument

Where this breaks in practice:

Most LangChain deployments do not expose this exact helper directly to the internet
Many pipelines ingest files or trusted feeds, not arbitrary user URLs
SCA will find the package version, but not whether this specific method is reachable

Detection/coverage: Dependency scanners can flag the vulnerable package/version reliably. Reachability is the hard part: you need code search, SAST, or runtime tracing to prove the method is invoked with tainted input.
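As a starting point for the code-search half of that reachability work, the following is a minimal sketch that walks Python source with the standard `ast` module and reports every call whose method name is `split_text_from_url`. The helper names `find_call_sites` and `scan_tree` are hypothetical, and the match is on the attribute name only: it does not prove the receiver is an HTMLHeaderTextSplitter or that the argument is tainted, so treat the hits as candidates for manual review.

```python
import ast
from pathlib import Path

TARGET_METHOD = "split_text_from_url"


def find_call_sites(source: str, filename: str = "<string>") -> list[tuple[str, int]]:
    """Return (filename, line) for every call of the form <expr>.split_text_from_url(...)."""
    hits = []
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == TARGET_METHOD
        ):
            hits.append((filename, node.lineno))
    return sorted(hits)


def scan_tree(root: str) -> list[tuple[str, int]]:
    """Scan every .py file under root, skipping files that cannot be read or parsed."""
    hits = []
    for path in Path(root).rglob("*.py"):
        try:
            text = path.read_text(encoding="utf-8")
            hits.extend(find_call_sites(text, str(path)))
        except (SyntaxError, UnicodeDecodeError, OSError):
            continue
    return hits
```

Calling `scan_tree("path/to/repo")` returns a list of file and line pairs; each one still needs a human to trace where the URL argument comes from.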

STEP 02

Use a public URL that bounces inward

The attacker hosts a benign-looking public URL and returns a 302 or 301 to an internal target. The weaponized tool is ordinary HTTP redirect handling in Python requests, which follows the redirect after the initial validate_safe_url() check has already passed.

Conditions required:

The application can make outbound HTTP(S) requests
Redirects are not disabled in the call path
The attacker controls the initial web server response

Where this breaks in practice:

Egress filtering, proxy policy, or deny-by-default outbound rules can stop the callback
Some apps sanitize or normalize URLs before passing them into the splitter
Apps behind strict SSRF proxy wrappers may never hit the vulnerable flow

Detection/coverage: Network telemetry may show unexpected requests from app hosts to localhost, RFC1918, or 169.254.169.254 immediately after a request to an attacker domain, but many shops do not log outbound app-layer redirects well.
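The validate-once flaw described in this step can be modelled without any network traffic. The sketch below uses a hypothetical in-memory redirect table in place of real HTTP responses, and the hostname `attacker.example` and both function names are illustrative assumptions. It is not the library's code; it only contrasts checking the first URL with checking every hop.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical redirect table standing in for real HTTP 302 responses.
REDIRECTS = {
    "http://attacker.example/doc": "http://169.254.169.254/latest/meta-data/",
}


def is_internal(url: str) -> bool:
    """True if the URL host is an IP literal in loopback, private, or link-local space."""
    host = urlparse(url).hostname or ""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Hostnames would need DNS resolution, which this sketch leaves out.
        return False
    return ip.is_loopback or ip.is_private or ip.is_link_local


def final_url_validate_once(url: str, max_hops: int = 5) -> str:
    """Flawed pattern: check only the first URL, then follow redirects blindly."""
    if is_internal(url):
        raise ValueError(f"blocked: {url}")
    for _ in range(max_hops):
        if url not in REDIRECTS:
            break
        url = REDIRECTS[url]
    return url


def final_url_validate_each_hop(url: str, max_hops: int = 5) -> str:
    """Safer pattern: re-check the target before every fetch, including redirects."""
    for _ in range(max_hops + 1):
        if is_internal(url):
            raise ValueError(f"blocked: {url}")
        if url not in REDIRECTS:
            return url
        url = REDIRECTS[url]
    raise ValueError("too many redirects")
```

With this table, `final_url_validate_once` ends at the metadata address while `final_url_validate_each_hop` raises on the second hop, which is the difference the attacker relies on.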

STEP 03

Fetch an internal resource that does not need extra auth headers

Once redirected, the vulnerable code fetches the internal target and parses the body into Document objects. The practical weaponized target is a headerless internal HTTP service such as a local admin panel, unauthenticated health/debug endpoint, or AWS IMDSv1; services requiring special headers like AWS IMDSv2, GCP metadata, or Azure metadata are materially harder through this bug.

Conditions required:

The redirected endpoint is reachable from the app host
The endpoint responds without attacker-controlled custom headers
The response body contains useful data

Where this breaks in practice:

Modern cloud metadata protections reduce the cloud-credential jackpot
Internal admin APIs commonly require auth, mTLS, or source restrictions
The impact is confidentiality-focused; there is no native code execution path here

Detection/coverage: Host or VPC flow logs can catch calls to link-local or internal addresses from app nodes. Runtime application security tools that model SSRF sinks may catch the fetch, but generic EDR usually will not.
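For the flow-log angle, a rough triage helper might look like the sketch below. It assumes flow records have already been reduced to hypothetical `(source, destination)` IP string pairs; real flow-log formats differ by platform, and the category labels are illustrative.

```python
import ipaddress

METADATA_IP = ipaddress.ip_address("169.254.169.254")
RFC1918 = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def classify_destination(dst: str) -> str:
    """Label a destination IP by the SSRF-relevant range it falls into."""
    ip = ipaddress.ip_address(dst)
    if ip == METADATA_IP:
        return "metadata"
    if ip.is_loopback:
        return "loopback"
    if ip.is_link_local:
        return "link-local"
    if ip.version == 4 and any(ip in net for net in RFC1918):
        return "rfc1918"
    return "other"


def suspicious_flows(records):
    """Yield (src, dst, label) for flows whose destination is an internal range."""
    for src, dst in records:
        label = classify_destination(dst)
        if label != "other":
            yield src, dst, label
```

Flows from an ingestion host to RFC1918 space may be entirely legitimate, so the output is a list to correlate with application requests, not a verdict.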

STEP 04

Get the data back out

The final step is application-dependent exfiltration. The weaponized behavior is the application's own handling of the returned Document objects: if the app echoes raw content, summarizes it back to the requester, or stores it where the attacker can read it, the SSRF turns into meaningful data leakage.

Conditions required:

The application returns or exposes the parsed content or its derivatives
The attacker can observe the response, downstream artifact, or retrieval result

Where this breaks in practice:

Many ingestion workflows store documents internally and never reflect them to the caller
Chunking, parsing, or downstream prompt logic may discard or mangle the sensitive payload
Security reviews often block obvious echo-back patterns in production apps

Detection/coverage: This is poorly covered by network scanners. You need app logs, tracing of ingestion jobs, and content-level telemetry to see whether fetched internal data is being reflected or stored in attacker-visible places.

03 · Intelligence Metadata

The supporting signals.

In-the-wild status	No public exploitation evidence located in the sources reviewed, and not listed in CISA KEV.
Proof-of-concept availability	No widely circulated standalone PoC repo found. The GitHub advisory itself includes a complete attack scenario using an attacker-controlled redirect.
EPSS	`0.00042` (0.042%) from the provided intel — very low predicted exploitation likelihood; treat it as weak threat signal, not a risk score.
KEV status	Not in CISA KEV; no KEV add date or federal due date applies.
CVSS vector	`CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:N/A:N` — network reachable, no auth needed, but user/app interaction is required and the impact is confidentiality-only.
Affected versions	`langchain-text-splitters` < 1.1.2.
Fixed versions	Upgrade to `langchain-text-splitters` >= 1.1.2; the advisory notes this patched line requires `langchain-core` >= 1.2.31.
Exposure reality	This is a library issue, so Shodan/Censys-style internet counts are mostly irrelevant. Exposure lives in custom application code paths that accept attacker URLs and route them into this helper.
Deployment footprint	The package is common in the ecosystem — PyPI Stats shows tens of millions of monthly downloads — but that does not mean tens of millions of exposed attack surfaces.
Disclosure / reporter	Disclosed 2026-04-24 via GitHub CNA / advisory publication window; GitHub credits reporter Aeg1sx.
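
The 6.5 vendor baseline can be reproduced from the vector in the CVSS row. The sketch below applies the CVSS v3.1 base-score equations for Scope: Unchanged vectors only; the function names are illustrative, and other scopes are deliberately rejected rather than handled.

```python
import math

# CVSS v3.1 metric weights (Privileges Required values are the Scope: Unchanged ones).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place using the integer method from the v3.1 spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score for a Scope: Unchanged vector."""
    # Drop the leading "CVSS:3.1" element, then map metric name to value.
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:N/A:N"))  # 6.5
```

Here the impact sub-score is 6.42 × 0.56 ≈ 3.60 and exploitability is about 2.84, so the sum of roughly 6.43 rounds up to 6.5. The UI:R weight (0.62 instead of 0.85) is what keeps the score out of the HIGH band.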

04 · The Call

noisgate verdict.

Final Verdict

↓ DOWNGRADED to LOW (4.2/10)

The decisive downward pressure is reachability: this only matters where a developer wired a specific LangChain helper to attacker-controlled URLs. Even then, the worst outcomes usually require a second application design mistake — reflecting fetched Document content back to the user — so the exposed population is much smaller than the package install base.

HIGH Affected version range and patch version

MEDIUM Real-world exploitability across enterprise deployments

MEDIUM Absence of public exploitation / PoC evidence

Why this verdict

Vendor baseline starts at 6.5, but reachability is narrow: this is not every LangChain app; it is apps that specifically call HTMLHeaderTextSplitter.split_text_from_url() with attacker influence.
Downward adjustment for attacker position reality: the attacker is unauthenticated *only if* the application already offers a user-controlled URL ingestion feature. In most enterprises, that is a niche workflow, not a ubiquitous exposed surface.
Downward adjustment for impact dependency: meaningful exfiltration often requires the application to return the fetched Document content or derivatives to the attacker. SSRF without observable output is much less valuable.
Downward adjustment for cloud hardening: the advisory explicitly notes services requiring special headers — including IMDSv2/GCP/Azure metadata patterns — are not reachable through this bug, shrinking the common 'easy credential theft' path.
Kept above IGNORE because the control bypass is real: it defeats an explicit SSRF guard and can still hit localhost, RFC1918 services, and AWS IMDSv1 where those are reachable.

Why not higher?

There is no product-wide internet-exposed endpoint here, no built-in auth bypass, and no direct RCE or integrity impact. Every important step after package presence depends on local application design choices: user-controlled URL input, outbound access, a useful internal target, and some path for the attacker to see the content.

Why not lower?

This is still a real SSRF protection bypass, not a theoretical code smell. If you have a public-facing ingestion endpoint using this helper, the attacker can coerce server-side fetches to sensitive internal locations and may recover useful data, so writing it off entirely would be sloppy.

05 · Compensating Control

What to do — in priority order.

Find the call sites — Code-search for split_text_from_url( and inventory every app path that can pass untrusted URLs into it. For a LOW verdict there is no SLA (treat as backlog hygiene), but you still want this mapped during the next dependency review because reachability decides everything.
Block attacker-controlled URL ingestion — If the feature is internet-facing, stop accepting arbitrary URLs or move fetching to a tightly controlled allowlist/proxy model. For LOW, there is no formal mitigation deadline, so fold this into normal hardening work rather than an emergency change.
Enforce outbound egress policy — Deny app hosts from reaching 127.0.0.0/8, RFC1918 ranges they do not need, and 169.254.169.254 unless there is a documented requirement. This reduces SSRF blast radius regardless of package version and belongs in standard platform guardrails.
Disable or strip redirects in wrappers — If you cannot patch quickly, wrap URL fetches so redirects are disabled or every redirect target is revalidated before follow. Treat that as normal engineering backlog for a LOW issue unless your own exposure review shows an internet-facing reachable path.
Stop echoing fetched content to requesters — Do not return raw fetched documents, summaries, or chunk previews from untrusted URL ingestion without a security review. That breaks the exfiltration leg of the chain even if the SSRF fetch still happens.
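
The redirect-revalidation control in the list above could be sketched as follows. This is not the upstream patch and does not use the `requests` library that the vulnerable helper relies on; it is a standard-library illustration built on `urllib.request.HTTPRedirectHandler`, and the names `resolves_to_internal`, `RevalidatingRedirectHandler`, and `fetch` are hypothetical. It also leaves a gap between the DNS check and the actual connection, so DNS rebinding is out of scope here.

```python
import ipaddress
import socket
import urllib.error
import urllib.request
from urllib.parse import urlparse


def resolves_to_internal(url: str) -> bool:
    """True if any address the host resolves to is loopback, private, or link-local."""
    host = urlparse(url).hostname
    if not host:
        return True  # fail closed on URLs without a host
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return True  # fail closed when resolution fails
    for info in infos:
        # Strip any IPv6 zone identifier before parsing the address.
        ip = ipaddress.ip_address(info[4][0].split("%")[0])
        if ip.is_loopback or ip.is_private or ip.is_link_local or ip.is_unspecified:
            return True
    return False


class RevalidatingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Re-run the destination check on every redirect target before following it."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        scheme = urlparse(newurl).scheme
        if scheme not in ("http", "https") or resolves_to_internal(newurl):
            raise urllib.error.URLError(f"redirect target blocked: {newurl}")
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Fetch a URL, validating the initial target and every redirect hop."""
    if resolves_to_internal(url):
        raise ValueError(f"initial URL blocked: {url}")
    opener = urllib.request.build_opener(RevalidatingRedirectHandler())
    with opener.open(url, timeout=timeout) as resp:
        return resp.read()
```

The fetched HTML could then be passed to a text-based splitting method instead of the URL-based helper. An egress proxy that enforces the same rules at the network layer is generally the stronger control, since it does not depend on every call site using the wrapper.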

What doesn't work

Relying on the old validate_safe_url() check alone does not work; the bug is specifically that the redirect target was not revalidated.
A perimeter WAF usually does not help much because the dangerous request is outbound from your application to internal targets, not inbound exploit syntax hitting a public web tier.
Blocking only 169.254.169.254 is too narrow; localhost and other internal HTTP services remain viable SSRF targets.
Generic EDR on the host is not a strong preventive control here; this is normal application HTTP behavior, not malware execution.

06 · Verification

Crowdsourced verification payload.

Run this inside the same Python environment as the target application: on the host, in the container, or in CI against the built image. Save it as noisgate-verify.py and invoke it with python3 noisgate-verify.py; it needs no elevated privileges and only reads the installed package versions of langchain-text-splitters and langchain-core.

noisgate-verify.py

PYTHON · READ-ONLY · SAFE

#!/usr/bin/env python3
"""
Verify exposure to CVE-2026-41481 in langchain-text-splitters.
Outputs one of: VULNERABLE / PATCHED / UNKNOWN
Exit codes: 0=PATCHED, 1=VULNERABLE, 2=UNKNOWN
"""

from __future__ import annotations
import sys
from importlib import metadata


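def parse_version(v: str):
    """Parse a version string into a tuple of ints for comparison.

    Only the leading digits of each dot- or hyphen-separated token are kept,
    so pre-release or local suffixes are ignored (e.g. '1.1.2rc1' -> (1, 1, 2)).
    """
    parts = []
    for token in v.replace('-', '.').split('.'):
        # Collect leading digits only; stop at the first non-digit character.
        num = ''
        for ch in token:
            if ch.isdigit():
                num += ch
            else:
                break
        # Tokens with no leading digits (e.g. 'dev1') count as 0.
        parts.append(int(num) if num else 0)
    return tuple(parts)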


def get_version(dist_name: str):
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
    except Exception:
        return 'ERROR'


def main() -> int:
    splitters = get_version('langchain-text-splitters')
    core = get_version('langchain-core')

    if splitters is None:
        print('UNKNOWN: langchain-text-splitters is not installed in this Python environment')
        return 2
    if splitters == 'ERROR':
        print('UNKNOWN: failed to read langchain-text-splitters package metadata')
        return 2

    if parse_version(splitters) < parse_version('1.1.2'):
        print(f'VULNERABLE: langchain-text-splitters {splitters} < 1.1.2')
        return 1

    if core is None:
        print(f'UNKNOWN: langchain-text-splitters {splitters} is >= 1.1.2 but langchain-core is not installed/readable')
        return 2
    if core == 'ERROR':
        print(f'UNKNOWN: langchain-text-splitters {splitters} is >= 1.1.2 but langchain-core metadata could not be read')
        return 2

    if parse_version(core) < parse_version('1.2.31'):
        print(f'UNKNOWN: langchain-text-splitters {splitters} is >= 1.1.2 but langchain-core {core} < 1.2.31 (advisory says patched line requires >= 1.2.31)')
        return 2

    print(f'PATCHED: langchain-text-splitters {splitters} and langchain-core {core} satisfy advisory minimums')
    return 0


if __name__ == '__main__':
    sys.exit(main())

07 · Bottom Line

If you remember one thing.

TL;DR

Monday morning, do not burn an emergency change window on package presence alone. First, identify whether any internet- or partner-reachable application actually feeds attacker-controlled URLs into HTMLHeaderTextSplitter.split_text_from_url(). If you find a reachable path, remove arbitrary URL ingestion or constrain egress as a temporary control during normal engineering work. For this LOW reassessment there is no noisgate mitigation SLA and no noisgate remediation SLA, so treat it as backlog hygiene unless your own reachability review proves a public exploit path. In that case, patching langchain-text-splitters to 1.1.2+ should move into the next planned dependency release cycle rather than waiting indefinitely.
