ICLR 2026 - Reviews

Reviews

Summary Statistics

| EditLens Prediction  | Count    | Avg Rating | Avg Confidence | Avg Length (chars) |
|----------------------|----------|------------|----------------|--------------------|
| Fully AI-generated   | 1 (25%)  | 4.00       | 3.00           | 2777               |
| Heavily AI-edited    | 1 (25%)  | 4.00       | 3.00           | 1930               |
| Moderately AI-edited | 0 (0%)   | N/A        | N/A            | N/A                |
| Lightly AI-edited    | 1 (25%)  | 4.00       | 4.00           | 2558               |
| Fully human-written  | 1 (25%)  | 6.00       | 4.00           | 2001               |
| Total                | 4 (100%) | 4.50       | 3.50           | 2316               |
Review 1: Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems

Soundness: 2 (fair) · Presentation: 3 (good) · Contribution: 2 (fair)
Rating: 4 (marginally below the acceptance threshold)
Confidence: 3 (fairly confident; math/other details were not carefully checked)

Summary: This submission investigates covert extraction of proprietary knowledge from retrieval-augmented generation (RAG) systems and proposes "IKEA," an implicit, benign-query attack that grows "anchor concepts" via history-aware sampling and a trust-region mutation in embedding space (a sketch of the mutation step follows this review). Evaluations across several corpora and model/retriever pairings suggest higher extraction efficiency than prompt-injection baselines and show that a substitute RAG assembled from harvested content retains non-trivial utility. Topic probing for unknown domains and simple adaptive/DP-style defenses are also explored to characterize security–utility trade-offs.

Strengths:
1. The paper clearly specifies a realistic black-box threat model for RAG and delineates attacker capabilities and constraints with precision.
2. Empirical coverage is broad, spanning multiple LLM–retriever configurations and defenses, and the attack remains effective when common jailbreak/prompt-injection attacks are blocked.
3. The method is straightforward and reproducible (anchor-based benign queries guided by history-aware sampling and a cosine-bounded trust-region mutation), with prompts and hyperparameters disclosed.

Weaknesses:
1. Algorithmic novelty feels limited; the core components amount to history-penalized sampling and cosine-bounded mutations without formal coverage or sample-complexity guarantees.
2. The attack depends on a known or easily probed domain topic and centralized corpus semantics, making generalization to heterogeneous, multi-topic enterprise deployments uncertain.
3. The defense study leans on simplistic or utility-destroying mechanisms and omits deployable strategies such as per-client rate limiting, query-set anomaly detection, and semantic drift monitoring.
4. The paper lacks an end-to-end economic analysis of the attack (token/time costs and sensitivity to generator quality), which is crucial for real-world risk assessment.

Questions: No more questions.

EditLens Prediction: Heavily AI-edited
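To make the mutation step this review summarizes concrete, here is a minimal sketch of a cosine-bounded trust-region move over a concept vocabulary with precomputed embeddings. The similarity bounds and the novelty-seeking tie-break are illustrative assumptions, not the paper's reported procedure.

```python
# Sketch of a cosine-bounded "trust region" mutation over a concept
# vocabulary. SIM_LO/SIM_HI and the tie-break rule are assumed for
# illustration; they are not the paper's exact values.
import numpy as np

SIM_LO, SIM_HI = 0.60, 0.85  # assumed trust-region bounds

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def trust_region_mutate(anchor: np.ndarray, vocab: np.ndarray, visited: set[int]):
    """Return the index of an unvisited concept whose similarity to the
    anchor lies inside the trust region, or None if the region is empty
    (in which case the attacker would resample a fresh anchor)."""
    in_region = [i for i in range(len(vocab))
                 if i not in visited and SIM_LO <= cosine(anchor, vocab[i]) <= SIM_HI]
    if not in_region:
        return None
    # Favor the least similar in-region concept to push exploration outward.
    return min(in_region, key=lambda i: cosine(anchor, vocab[i]))
```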
Review 2: Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems

Soundness: 3 (good) · Presentation: 3 (good) · Contribution: 3 (good)
Rating: 6 (marginally above the acceptance threshold)
Confidence: 4 (confident, but not absolutely certain)

Summary: This paper presents IKEA, a method that extracts knowledge from RAG using queries. IKEA stays stealthy by creating natural queries built from anchor concepts. The method has two parts. Experience Reflection Sampling chooses concepts that are likely linked to the RAG's internal knowledge based on past query results (a sketch of this sampling loop follows the review). Trust Region Directed Mutation changes anchor concepts within a set similarity range to find new and related information more effectively. Experiments show that IKEA performs much better than other methods. The extracted knowledge can also be used to build a working substitute RAG system.

Strengths:
- The paper studies an important security issue in RAG systems: extraction attacks. Its focus on harmless-looking queries makes it different from most past work.
- The IKEA method is explained in a clear and direct way. Figure 1 gives a clear summary of the process.
- Experiments use several settings. Results show that IKEA keeps high EE and ASR while passing basic defenses. This is a strong finding.
- Code is provided, and it appears sound.

Weaknesses:
- The tested defenses are not enough. Stronger and more realistic defenses include semantic output filtering, consistency checks, detection of repeated probing, or methods aimed at iterative query attacks.
- The main assumption, that the RAG topic is fixed and known, limits how well the method can be used in other cases.
- The results are not enough to support the claim that the substitute RAG performs "comparably" (Sec 4.5). Three metrics cannot measure the many other aspects involved.
- The cost in time, API calls, and total query rounds is not clear. This may make the attack too expensive for extracting large knowledge bases.

Questions:
1. See weaknesses.
2. The topic probing method seems crucial for practical applicability. Could you provide more details on its robustness?
3. While IKEA avoids malicious prompts, could the pattern of generated queries be detectable by analyzing query sequences over time using anomaly detection techniques?

EditLens Prediction: Fully human-written
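As one concrete reading of the sampling half of the loop described above, here is a short sketch of history-aware ("experience reflection") anchor sampling. The multiplicative bonus/decay constants and the example concepts are assumptions for illustration, not values from the paper.

```python
# History-aware anchor sampling: anchors whose past queries surfaced new
# documents are up-weighted; repeated misses are down-weighted.
# hit_bonus / miss_decay are illustrative constants only.
import random

def sample_anchor(weights: dict[str, float]) -> str:
    concepts, w = zip(*weights.items())
    return random.choices(concepts, weights=w, k=1)[0]

def reflect(weights: dict[str, float], concept: str,
            got_new_docs: bool, hit_bonus: float = 1.5, miss_decay: float = 0.5):
    """Fold the outcome of the last query back into the sampling weights."""
    weights[concept] *= hit_bonus if got_new_docs else miss_decay

# Example round-trip (hypothetical concepts):
weights = {"horcrux": 1.0, "patronus": 1.0, "quidditch": 1.0}
anchor = sample_anchor(weights)
reflect(weights, anchor, got_new_docs=True)  # anchor is now 1.5x likelier
```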
Review 3: Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems

Soundness: 2 (fair) · Presentation: 2 (fair) · Contribution: 2 (fair)
Rating: 4 (marginally below the acceptance threshold)
Confidence: 3 (fairly confident; math/other details were not carefully checked)

Summary: The paper proposes IKEA, a "benign-query" knowledge-extraction attack on RAG. It combines (i) Experience Reflection sampling over "anchor concepts" and (ii) Trust-Region Directed Mutation (TRDM) to explore the embedding space, and evaluates against RAG-Thief and DGEA under input/output defenses, reporting higher extraction efficiency and attack success.

Strengths:
1. The paper is well-written and easy to follow.
2. The studied topic is important and novel.

Weaknesses:
1. The paper evaluates against only two prior attacks, RAG-Thief (prompt injection) and DGEA (jailbreak), even though the Related Work section lists additional, closely related extraction methods (e.g., Pirates of the RAG / adaptive black-box extraction) that are not included as baselines. This makes the claimed superiority ("surpassing baselines by 80%+") hard to trust. At minimum, strong black-box, non-jailbreak/PIK variants and adaptive coverage attacks should be implemented. More discussion of the related work is needed.
2. "Semantic Similarity (SS)" uses an encoder to compare outputs with retrieved docs, favoring paraphrase-style extraction (IKEA) over verbatim baselines, while CRR (ROUGE-L) penalizes paraphrase (a sketch of the two metrics follows this review). Claims hinge on SS/EE/ASR; there is no human audit of copyright risk and no independent leakage criterion. Copyright/privacy stakes aren't well reflected by SS alone.
3. HealthCare-100k, HarryPotterQA, and Pokémon are niche; Pokémon is explicitly chosen for its low overlap with pretraining data. Results may not generalize to enterprise RAG (contracts, support logs, medical records), where policy, formatting, and noise differ.
4. The main setup assumes a known domain topic; the "unknown topic" setting still uses a bespoke topic-probing stage powered by a secondary LLM, then evaluates almost identically, which weakens the claim that IKEA remains benign and practical under stricter assumptions.
5. Replacing Top-K with off-topic docs predictably tanks both the attack and benign utility to near zero (Table 4); this is not an acceptable real-world mitigation, so it doesn't inform deployers about what works.
6. The pipeline and equations are clear, but the headline claim ("surpassing baselines by >80% efficiency, >90% success") rests on a baseline set that is neither representative nor matched to IKEA's benign-query regime. Without stronger baselines, the empirical claim reads as overstated.

Questions:
1. Add competitive benign-query baselines: random/diversity sampling; k-center or farthest-point query selection; BM25 lexical sweeps; self-ask/chain-expansion; an adaptive coverage agent; and a re-implementation of adaptive black-box extraction from the works already cited in §5.
2. Add at least one enterprise-style corpus with policy/PII-like structure, and long-document settings that stress retrieval/reranking.

EditLens Prediction: Fully AI-generated
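Since the SS-versus-CRR objection above turns on how the two metrics behave under paraphrase, here is a sketch of both, assuming the sentence-transformers and rouge-score packages. The encoder choice and the recall-based aggregation are illustrative; the paper's exact definitions may differ.

```python
# Sketch of the two leakage metrics contrasted above. SS rewards
# paraphrase-level overlap; a ROUGE-L-based CRR rewards verbatim copying.
# Encoder choice and aggregation are assumptions, not the paper's setup.
from sentence_transformers import SentenceTransformer, util
from rouge_score import rouge_scorer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

def semantic_similarity(extracted: str, source_doc: str) -> float:
    a, b = encoder.encode([extracted, source_doc], convert_to_tensor=True)
    return float(util.cos_sim(a, b))  # stays high under paraphrase

def crr(extracted: str, source_doc: str) -> float:
    # Fraction of the source recovered near-verbatim (ROUGE-L recall).
    return scorer.score(source_doc, extracted)["rougeL"].recall
```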
Review 4: Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems

Soundness: 2 (fair) · Presentation: 3 (good) · Contribution: 2 (fair)
Rating: 4 (marginally below the acceptance threshold)
Confidence: 4 (confident, but not absolutely certain)

Summary: This paper studies extraction of documents from a RAG knowledge base. Instead of using malicious queries, the authors use benign queries repeatedly to collect RAG answers as stolen knowledge, and propose several tricks to improve search efficiency, e.g., avoiding duplicate retrievals and increasing coverage. Experiments evaluate metrics including extraction efficiency and the downstream performance of RAG systems reconstructed from the stolen knowledge.

Strengths:
1. The paper addresses an important problem by studying the privacy risks of RAG systems under more realistic settings, specifically black-box access with defenses in place.
2. Compared with baselines, the proposed method demonstrates stronger robustness against defended RAG systems, successfully extracting more knowledge when defenses are applied.
3. The paper is well-written, and the overall idea is intuitive and easy to follow.

Weaknesses:
1. The idea of using query–response semantic distance as a proxy for local RAG density is based purely on intuition, without further discussion. The paper provides neither references nor experiments to validate this assumption.
2. The evaluation includes only two baselines, while several other relevant methods are mentioned but not compared experimentally.
3. The extracted documents achieve low ROUGE scores (below 0.3, Table 1), indicating that the extracted content fails to accurately recover the original documents. This limits the practical implications for privacy or copyright concerns.
4. Some metric definitions are unclear. For example, extraction efficiency depends on the number of "unique" extracted documents, but the notion of uniqueness is not specified (one possible operationalization is sketched after this review). Moreover, since the method does not reconstruct original documents, the comparability of this metric with prior work is questionable. Similarly, the definition of ASR, the ratio of non-rejected queries, does not directly measure extraction success.
5. The proposed method introduces many hyperparameters (over ten), which may be difficult to tune in practice. The paper provides little discussion of how these parameters are chosen.
6. Ablation results show only marginal improvements over random baselines (Table 13), particularly on the ASR, CRR, and SS metrics, raising concerns about the actual effectiveness of the proposed approach.

Questions: In the evaluation, some methods such as DGEA achieve high ROUGE scores (up to 0.96 in Table 1) in the no-defense setting, suggesting near-literal copying. However, their embedding similarity remains relatively low. What are the possible reasons?

EditLens Prediction: Lightly AI-edited
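Because weakness 4 above hinges on what counts as a "unique" extracted document, one plausible operationalization is sketched below: greedy deduplication by an embedding-similarity threshold. Both the threshold and the greedy rule are assumptions; as the reviewer notes, the paper leaves the criterion unspecified.

```python
# One possible way to count "unique" extracted chunks for extraction
# efficiency: greedy dedup by embedding similarity. The 0.9 threshold
# is an assumption; the paper does not specify its uniqueness criterion.
import numpy as np

def count_unique(chunk_embeddings: np.ndarray, thresh: float = 0.9) -> int:
    kept: list[np.ndarray] = []
    for e in chunk_embeddings:
        e = e / np.linalg.norm(e)  # L2-normalize so dot product = cosine
        if all(float(e @ k) < thresh for k in kept):
            kept.append(e)
    return len(kept)

# Extraction efficiency would then be count_unique(E) / total_queries.
```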