MosaicLeaks: Your AI Research Agent Is Leaking Company Secrets

[Hero image: a stack of matte black ceramic tiles, the top tile engraved with the word 'LEAK', with a cobalt-blue key-shaped glass paperweight resting on the rim. Generated via ComfyUI / SDXL Base 1.0 (seed 20260619).]

A single web query looks harmless. The cumulative query log does not. Deep research agents are quietly broadcasting private documents into public traffic — and the breach is in the query string, not the answer.

ServiceNow and HuggingFace published MosaicLeaks on Wednesday. The paper measures a privacy risk nobody has been tracking: the outbound query log of a deep-research agent running over private enterprise documents. The number is 34.0%. That is the share of private, fact-bearing questions that an adversary LLM, watching only the queries the agent emits to the public web, can answer correctly. No access to the private documents. No access to the agent's reasoning chain. Just the queries. The queries are enough.

Every company that has wired a deep-research agent into a private corpus and a public search engine is exposed to this attack. The exposure is not theoretical. The exposure is in the query log of every agent that does RAG over internal documents and then searches the web to verify, cite, or extend. The leakage channel is one nobody has been monitoring as a security surface: the outbound query stream.

What MosaicLeaks Actually Measures

The benchmark is 1,001 multi-hop research chains. Each chain interleaves passages from a private enterprise corpus with retrieval over a public web corpus, the way a real deep-research agent works. The adversary is a separate LLM that observes only the agent's outbound queries: never the private documents, never the intermediate reasoning, never the agent's tool calls against the private corpus. The adversary's job is to infer, from the queries alone, what the agent is researching and what private questions it is trying to answer.

The leakage is graded at three levels. Intent leakage: the adversary can reconstruct the topic of the research chain — say, "internal compensation bands" or "Q3 pipeline forecast" — well enough to confirm the agent is working on a sensitive subject. Answer leakage: the adversary can answer the underlying private question, given only the queries. Full-information leakage: the adversary can state, with verifiable truth, specific private facts from the corpus — names, numbers, contract terms — drawn entirely from the query trail.
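The three grades above can be pictured as labels attached to each research chain after the adversary has seen its queries. The sketch below is an illustration only: the names `LeakLevel`, `ChainVerdict`, and `leakage_rate` are hypothetical, and the aggregation (share of chains where any selected level was reached) is an assumption, not the paper's scoring code.

```python
from dataclasses import dataclass, field
from enum import Enum


class LeakLevel(Enum):
    INTENT = "intent"        # adversary recovers the research topic
    ANSWER = "answer"        # adversary answers the private question
    FULL_INFO = "full_info"  # adversary states specific private facts


@dataclass
class ChainVerdict:
    """Hypothetical per-chain record of what the adversary achieved."""
    chain_id: str
    leaked: set = field(default_factory=set)  # set of LeakLevel values


def leakage_rate(verdicts, levels):
    """Share of chains where the adversary reached at least one of `levels`."""
    if not verdicts:
        return 0.0
    wanted = set(levels)
    hits = sum(1 for v in verdicts if v.leaked & wanted)
    return hits / len(verdicts)


# Toy data, invented for illustration; not drawn from the benchmark.
verdicts = [
    ChainVerdict("c1", {LeakLevel.INTENT}),
    ChainVerdict("c2", {LeakLevel.INTENT, LeakLevel.ANSWER}),
    ChainVerdict("c3", set()),
    ChainVerdict("c4", {LeakLevel.FULL_INFO}),
]
combined = leakage_rate(verdicts, [LeakLevel.ANSWER, LeakLevel.FULL_INFO])
```

Separating intent from answer and full-information leakage matters because a chain can reveal its topic without revealing a single private fact, and the two cases call for different responses.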

The baseline model in the paper is Qwen3-4B running a standard deep-research agent loop. On the strictest chain-success metric it scores 48.7%. On the combined answer and full-information leakage metric it scores 34.0%. A third of the time, the queries alone give the adversary the answer. That is the baseline. Every other configuration in the paper is measured against it.

Reinforcement Learning Makes It Worse

The natural reaction to a privacy paper is to train the agent harder on the task and hope it becomes more careful. The paper tested exactly that. With task-only reinforcement learning, chain success climbs to 59.3% — a real improvement. Leakage climbs to 51.7%. The model is doing more with each query, packing more context into the retrieval requests so the answers come back faster and cleaner. The side effect is that each query now carries more of the private document into the public stream. The adversary gets more fragments per query. The fragments reassemble better. Privacy degrades as capability improves.

This is the central finding. Privacy and capability are not independent axes in deep-research agents. When the training signal rewards task success alone, they trade against each other, and the trade gets worse as you push the agent harder on the task. The agent that wins your benchmark is the agent most likely to leak your documents. Under task-only optimization, the agent that protects your documents is the agent that loses your benchmark. The paper names this tension explicitly. Most enterprise procurement documents will not.

The Privacy-Aware Fix That Actually Works

The paper proposes a method called Privacy-Aware Deep Research, PA-DR. The training signal rewards the agent not just for task success but for queries that do not let an adversary reconstruct the private context. The result is a model that scores 58.7% on chain success, almost identical to the 59.3% of the task-only RL model, and 9.9% on leakage. Leakage falls more than threefold against the 34.0% baseline and more than fivefold against the 51.7% of task-only RL, with essentially no capability cost.

The mechanism is straightforward. The agent is trained to keep its queries generic, paraphrased, and information-sparse at the retrieval interface, even when its internal reasoning chain is dense with private detail. The model learns to use the private corpus as scratchpad and the public web as a citation layer that does not echo the scratchpad. The queries become shorter and less identifying. The leakage drops. The task holds. The benchmark result is reproducible from the open-source release on HuggingFace.
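One way to picture a training signal that rewards both task success and non-reconstructible queries is a task reward minus a leakage penalty. This is a minimal sketch under stated assumptions: the linear form, the `penalty_weight` parameter, and the function name are hypothetical, and the article does not specify the reward PA-DR actually uses.

```python
def privacy_aware_reward(
    task_success: bool,
    leak_probability: float,
    penalty_weight: float = 1.0,
) -> float:
    """Hypothetical shaped reward: task credit minus a leakage penalty.

    leak_probability is assumed to be an adversary's estimated chance of
    recovering private content from the episode's outbound queries.
    """
    if not 0.0 <= leak_probability <= 1.0:
        raise ValueError("leak_probability must be in [0, 1]")
    return float(task_success) - penalty_weight * leak_probability


# A successful episode with generic queries keeps the full reward...
quiet = privacy_aware_reward(True, 0.0)
# ...while a successful episode with revealing queries is penalized.
noisy = privacy_aware_reward(True, 0.5, penalty_weight=0.5)
```

Under a signal of this shape, an episode that solves the task while echoing private detail into its queries earns less than one that solves it quietly, which is the behavior the paragraph above describes.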

The intervention is not a prompt. Telling the agent "do not leak" drops leakage from 34.0% to 25.5% and degrades task success from 48.7% to 44.5%. The agent just makes fewer queries. The queries it does make are not safer in any reconstructive sense — they are simply rarer. Prompt-level interventions cannot solve this. Architectural and training-level interventions can.
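The four configurations quoted so far are easier to compare side by side. The snippet below does only arithmetic on the percentages already given in the text; the dictionary keys and the `compare` helper are hypothetical names for illustration.

```python
# (chain success %, leakage %) as quoted in the article.
RESULTS = {
    "baseline": (48.7, 34.0),
    "prompt_only": (44.5, 25.5),
    "task_only_rl": (59.3, 51.7),
    "pa_dr": (58.7, 9.9),
}


def compare(name, reference="baseline"):
    """Success change in points and leakage reduction factor vs. a reference."""
    success, leakage = RESULTS[name]
    ref_success, ref_leakage = RESULTS[reference]
    return {
        "success_delta_pts": round(success - ref_success, 1),
        # > 1 means less leakage than the reference, < 1 means more.
        "leakage_reduction_factor": round(ref_leakage / leakage, 2),
    }


prompt_vs_base = compare("prompt_only")         # -4.2 pts, 1.33x less leakage
rl_vs_base = compare("task_only_rl")            # +10.6 pts, 0.66x (more leakage)
padr_vs_base = compare("pa_dr")                 # +10.0 pts, 3.43x less leakage
padr_vs_rl = compare("pa_dr", "task_only_rl")   # -0.6 pts, 5.22x less leakage
```

Read this way, the prompt-only intervention buys a 1.33x leakage reduction at a cost of 4.2 points of task success, while PA-DR gives up 0.6 points against task-only RL for a 5.22x reduction.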

Why the Query Log Is the Leakage Surface

Most enterprise security architectures treat the query log as a performance artifact, not as a data channel. Search engines log queries for ranking and observability. Vector databases log queries for debugging. The deep-research agent layer sits on top and emits a stream of public-web queries whose content is conditioned on whatever the agent has just read in the private corpus. Nobody is monitoring that stream as a privacy boundary because nobody thought it was one.

It is one. Every query the agent emits is observable to the public search provider, to any network observer in the middle, and to any adversary who can replay the query log. The privacy model for a deep-research agent has to include the query stream as an outbound data channel. The data that travels on it is conditioned on private documents. The conditioning is the leak.

Three enterprise consequences follow. First, deep-research agents deployed today over private corpora need query-log monitoring, query redaction, or query-shape constraints before they are safe to operate. The default configuration is leaking. Second, the legal exposure of an enterprise that has deployed a leaking agent against regulated data is not theoretical. Privacy regulators do not care whether the leak was a model artifact or a network packet — they care whether private data left the boundary. Third, the open-weights PA-DR checkpoint is available. The fix does not require a vendor relationship or a model retraining cycle. The fix requires reading the paper.
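As a rough illustration of what query-log monitoring could look like, the sketch below tracks how many private terms a session has exposed across all of its outbound queries and blocks a query once the cumulative count would pass a budget. Everything here is an assumption: the `QueryGate` class, the term list, and the token-matching heuristic are hypothetical, and exact token matching is far cruder than what a real deployment would need.

```python
import re


class QueryGate:
    """Hypothetical outbound-query monitor with a cumulative exposure budget."""

    def __init__(self, private_terms, max_cumulative_terms=3):
        self.private_terms = {t.lower() for t in private_terms}
        self.max_cumulative_terms = max_cumulative_terms
        self.seen = set()  # private terms already exposed in this session

    def check(self, query):
        """Return (allowed, private terms found in this query)."""
        tokens = set(re.findall(r"[a-z0-9]+", query.lower()))
        hits = tokens & self.private_terms
        would_expose = self.seen | hits
        allowed = len(would_expose) <= self.max_cumulative_terms
        if allowed:
            # Only queries that are actually sent count against the budget.
            self.seen = would_expose
        return allowed, sorted(hits)


# Invented example terms; a real list would be derived from the private corpus.
gate = QueryGate({"acme", "q3", "pipeline", "forecast"}, max_cumulative_terms=2)
first = gate.check("Acme revenue history")
second = gate.check("typical Q3 seasonality in SaaS")
third = gate.check("pipeline forecast accuracy benchmarks")
fourth = gate.check("SaaS seasonality benchmarks")
```

The first two queries each pass on their own, but together they exhaust the budget, so the third is blocked. The budget is cumulative rather than per-query because the risk described above sits in the query trail as a whole.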

What to Watch Next

Two short-term signals and one longer-term one. Short term: enterprise vendors shipping deep-research agents will need to publish privacy benchmarks alongside capability benchmarks. MosaicLeaks gives them a yardstick. Agents that score 50% on leakage are not safe to deploy against regulated data. Agents that score under 10% are. The procurement question just got a quantitative answer. Short term: the open-source PA-DR checkpoint will get replicated, attacked, and benchmarked against closed frontier agents. If the privacy-cost-of-capability curve holds, the agent vendors will copy the training approach under their own names.

Longer term: the broader category of "agent + private corpus + public retrieval" will need a security architecture. The current pattern — private documents in, public queries out, no monitoring on the query boundary — is the equivalent of an enterprise that firewalls its database but broadcasts every SELECT to a public mirror. MosaicLeaks named the mirror. The remediation roadmap will run from query-log monitoring, through retrieval-shape constraints, into training-time privacy alignment. The companies that deploy agents without understanding the curve will be the companies that learn it from a regulator.

The paper was published Wednesday. The benchmark is on HuggingFace. The PA-DR checkpoint is open. The query log is the leak. The query log is fixable.

Sources & Links

This post was generated by New Horizon's autonomous editorial pipeline: topic selected from the daily news digest (source digest date 2026-06-19) for viral potential, drafted from the primary research source (HuggingFace / ServiceNow's MosaicLeaks writeup) and corroborating coverage from arXiv and HyperAI, and reviewed for factual accuracy and house style. Hero image generated via ComfyUI (SDXL Base 1.0, seed 20260619). The arguments and predictions are editorial — not investment advice, not vendor endorsement, not a consulting engagement.