ICLR 2026 - Reviews


Summary Statistics

| EditLens Prediction  | Count    | Avg Rating | Avg Confidence | Avg Length (chars) |
|----------------------|----------|------------|----------------|--------------------|
| Fully AI-generated   | 1 (25%)  | 6.00       | 3.00           | 1958               |
| Heavily AI-edited    | 0 (0%)   | N/A        | N/A            | N/A                |
| Moderately AI-edited | 2 (50%)  | 3.00       | 3.00           | 2272               |
| Lightly AI-edited    | 0 (0%)   | N/A        | N/A            | N/A                |
| Fully human-written  | 1 (25%)  | 4.00       | 4.00           | 3258               |
| Total                | 4 (100%) | 4.00       | 3.25           | 2440               |
All four reviews below are for the same submission.

Title: Towards a Collaborative Memory for Agentic Workflow: Breaking the Prefix Barrier with Segment-Level KV Cache Sharing

Review 1

Soundness: 3: good
Presentation: 2: fair
Contribution: 3: good
Rating: 6: marginally above the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

Summary:
This paper addresses the inefficiency of KV cache reuse in multi-agent LLM systems. Existing methods rely on strict prefix matching, making cache reuse rare under heterogeneous prompts. The authors propose a Segment-Level KV Cache Sharing mechanism that decomposes the cache into semantically coherent segments, enabling agents to reuse KV segments across contexts without prefix alignment. They implement a high-performance prototype, CrossKV, on top of the vLLM engine using PagedAttention, introducing a memory table for segment-level KV aliasing and retrieval (a toy sketch of such a table follows this review). Extensive experiments on multiple models (Qwen2.5, Llama3) and agentic workflows (AutoGen, MAD, Solver) demonstrate up to a 4.6× TTFT speedup and even accuracy gains on several reasoning benchmarks. The paper also analyzes the effect of positional encoding (RoPE) and proposes adaptive partial-recomputation strategies.

Strengths:
1. Segment-level KV cache sharing breaks the prefix-matching bottleneck in LLM inference.
2. CrossKV is a complete, model-agnostic prototype built on vLLM.
3. Includes in-depth discussion of positional encoding, recomputation, and memory overhead.

Weaknesses:
1. Lack of formal theory explaining the semantic stability of reused KV segments.
2. Limited comparison to other advanced caching methods (e.g., KVShare, CacheBlend).
3. RoPE correction introduces extra memory and computation overhead, and long-segment reuse may require partial recomputation, increasing system complexity.
4. Absence of large-scale, real-world multi-agent case studies.

Questions:
1. How are the semantic boundaries of segments detected or defined in practice?
2. Could segment aliasing introduce semantic drift or context leakage across agents?
3. How would CrossKV behave under cross-lingual or high-noise agent communication?
4. Is there any safeguard against incorrect segment matching (hash collisions or semantic mismatches)?
5. Could this approach be integrated with retrieval-augmented or external-memory systems?

EditLens Prediction: Fully AI-generated
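The segment-level aliasing this review describes can be pictured with a small sketch. This is a minimal illustration under stated assumptions, not CrossKV's actual implementation: the class name, the content-hashing scheme, and the block-table layout are all hypothetical, chosen only to make the lookup-and-alias idea concrete.

```python
import hashlib
from typing import Dict, List, Optional

class SegmentMemoryTable:
    """Minimal sketch of a segment-level KV alias table.

    Maps a content hash of a token segment to the physical KV block IDs
    that already hold that segment's keys/values, so a later request can
    alias those blocks instead of recomputing prefill for the segment.
    All names here are hypothetical, not CrossKV's actual API.
    """

    def __init__(self) -> None:
        self._table: Dict[str, List[int]] = {}

    @staticmethod
    def _hash(token_ids: List[int]) -> str:
        # Content hash over raw token IDs; a real system would also need
        # a collision guard (e.g., verifying the token IDs on a hit),
        # which is exactly what Review 1's question 4 asks about.
        return hashlib.sha256(str(token_ids).encode("utf-8")).hexdigest()

    def register(self, token_ids: List[int], block_ids: List[int]) -> None:
        """Record that this segment's KV cache lives in these blocks."""
        self._table[self._hash(token_ids)] = block_ids

    def lookup(self, token_ids: List[int]) -> Optional[List[int]]:
        """Return physical block IDs for a segment if already cached."""
        return self._table.get(self._hash(token_ids))

# Agent A generates a segment; its KV blocks get registered.
table = SegmentMemoryTable()
table.register(token_ids=[101, 2023, 2003], block_ids=[7, 8])

# Agent B's prompt contains the same span at a different position:
# the blocks can be aliased rather than recomputed.
assert table.lookup([101, 2023, 2003]) == [7, 8]
```

Note that the table stores only hashes and block IDs, no KV tensors, which matches Review 2's observation below that such a design stays cheap to look up and scale.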

Review 2

Soundness: 2: fair
Presentation: 2: fair
Contribution: 3: good
Rating: 4: marginally below the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Summary:
This paper tackles redundant computation in multi-agent LLM workflows, where agents often repeat chunks of text but prefix caching (strict prompt-based reuse of the KV cache) rarely helps because prompts differ across roles and turns. The authors propose CrossKV, which shares KV cache at the segment level. When a new query contains a previously generated span (i.e., a contiguous token run), the system looks up that span's hash in a "memory table" and aliases the current logical hashes to the same physical KV blocks in vLLM. This enables reuse even when prefixes don't match. The evaluation take-away is that this content-driven reuse reduces redundant prefill/decoding across agents while remaining compatible with standard attention; the authors claim that positional (RoPE) shifts are usually negligible, and for rare long/complex cases the system caps the sharing length and partially recomputes to re-anchor positions.

Strengths:
1. Very timely and important research problem. Most prior work focuses on lossless, full KV cache reuse, e.g., prefix caching. CrossKV is a form of lossy, partial KV cache reuse, a paradigm we are already seeing emerge in popularity. I believe this is an under-appreciated topic in KV-cache research that deserves more attention given the rise of agentic workloads; in agentic AI, strict prefix caching becomes less useful due to increased prompt diversity, and I appreciate the authors for motivating this emerging scenario and the issues it raises for existing cache-reuse methods.
2. Solid and efficient system design. The proposed segment-level KV cache reuse eliminates additional I/O and data movement (e.g., duplicating KV cache blocks in GPU memory), which happens a lot in related work whenever positional-encoding alignment is needed for partial KV cache reuse. The memory-table design is efficient and scalable, as there is no need to store actual KV tensors for fast lookups. The microbenchmarks in the evaluation section are extensive and support the authors' efficiency claims.

Weaknesses:
1. As the authors acknowledge (e.g., in Section 3.3), misalignment in positional encoding can be a significant concern that leads to issues such as accuracy degradation; this also seems to be the main reason why CrossKV accuracy is often lower than "vanilla" in Table 1. While I agree that understanding its impact theoretically may be challenging, a more thorough empirical analysis is essential before this framework can be safely adopted in practice. Specifically, it would be helpful to examine the potential consequences when positional encodings are misaligned (the sketch after this review illustrates the re-anchoring involved): What are the worst-case scenarios? Could LLMs produce irrelevant or even harmful outputs? Which types of queries or benchmarks are most sensitive to positional misalignment? Is it possible to identify when segment-level KV cache reuse would degrade accuracy and switch to recomputation instead?
2. The evaluation section lacks comparison to major baselines in the direction of lossy or partial KV cache reuse, such as CacheBlend and KVShare, even though these works are mentioned in "related work".

Questions:
Thank you for submitting this paper to ICLR! Please refer to "weaknesses" for my questions.

EditLens Prediction: Fully human-written
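For readers less familiar with the rotary-embedding issue raised in weakness 1: RoPE rotates each key by an angle proportional to its absolute position, so a key cached at position p_old and reused at p_new carries a residual rotation of Δ = p_new − p_old. Because rotations compose, re-anchoring amounts to applying the Δ rotation to the already-cached key. Below is a minimal NumPy sketch of this standard RoPE algebra; the function and variable names are illustrative, and this is not CrossKV's actual code.

```python
import numpy as np

def rope_rotate(x: np.ndarray, pos: int, base: float = 10000.0) -> np.ndarray:
    """Apply the RoPE rotation for absolute position `pos` to an even-dim vector."""
    d = x.shape[-1]
    half = d // 2
    freqs = base ** (-np.arange(half) * 2.0 / d)   # theta_i = base^(-2i/d)
    ang = pos * freqs
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., :half], x[..., half:]          # rotate-half convention
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

# A key cached when its token sat at position 40.
rng = np.random.default_rng(0)
k = rng.standard_normal(64)
k_cached = rope_rotate(k, pos=40)

# The same token now appears at position 75 in another agent's prompt.
# Rotations compose, so rotating the cached key by the delta (75 - 40)
# reproduces the key as if it had been computed at position 75.
k_reanchored = rope_rotate(k_cached, pos=75 - 40)
assert np.allclose(k_reanchored, rope_rotate(k, pos=75), atol=1e-8)
```

A small Δ perturbs the key geometry only slightly, which is plausibly why the authors report that modest shifts are often negligible; a large Δ changes it substantially, which would motivate the capped sharing length and partial recomputation the summary mentions.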

Review 3

Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 2: reject
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

Summary:
The paper proposes a method to address a key inefficiency in LLM-based multi-agent systems (MAS): the reliance of KV cache reuse systems on rigid prefix matching. This mechanism fails in MAS environments where agents have diverse prompt templates and contexts, leading to rare cache hits and significant redundant computation. To solve this, the authors propose a Segment-Level KV Cache Sharing mechanism. This approach decomposes the KV cache into fine-grained "semantic segments" and allows any agent to reuse a cached segment from any other agent, regardless of its position in the new query, thereby enabling a collaborative working memory. The paper also investigates the critical technical challenge of positional encoding (RoPE) mismatches and proposes an adaptive recomputation strategy as a solution (a toy sketch of such a gate follows this review). Experiments show the method significantly increases inference speed and, in some cases, improves task performance by enabling this shared working memory.

Strengths:
* The paper correctly identifies a highly relevant and practical problem. As multi-agent systems become more common, the limitations of prefix-only caching become a critical bottleneck. The idea of a "collaborative memory" is a strong conceptual framing.
* The proposed mechanism has the intended effect of improving cache hit rates and prefill speed.

Weaknesses:
* The paper itself notes the lack of theoretical justification as a limitation. The fundamental assumption that a segment's KV cache is locally concentrated and largely context-independent is justified only by citing the sparsity literature and a single visual analysis (Fig. 1); it is not deeply explored.
* Results are presented only for dense architectures, not for other prevalent architectures such as MoE. Without theoretical justification, it is hard to take for granted that such a technique will generalize broadly.
* It is also not clear how the semantic segments are identified in practice. Having the LLM identify reusable parts of its context via prompting seems brittle.
* In Table 1, it is not clear why certain results actually improve upon reusing cached KVs. Reusing cached KVs should strictly be an inference-time win, not a qualitative win.
* The manuscript in its current form is quite repetitive. The problem, solution, and contributions are restated in similar terms across the Abstract, Introduction, and Conclusion, diluting the paper's impact. The paper could be substantially shorter without losing any of its core technical merit.

Questions:
None

EditLens Prediction: Moderately AI-edited
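The "adaptive recomputation strategy" summarized above can be read as a reuse-versus-recompute gate. The sketch below is purely illustrative: the thresholds, field names, and policy are assumptions made for exposition, not the paper's actual mechanism.

```python
# Hypothetical gate deciding how much of a matched segment to reuse.
# All three thresholds are made-up values for illustration only.
MAX_SHARE_LEN = 512      # cap on reused segment length (tokens)
SHIFT_TOLERANCE = 128    # max positional delta reused without correction
BOUNDARY_RECOMPUTE = 16  # tokens recomputed at the segment edges

def plan_reuse(seg_len: int, cached_pos: int, new_pos: int) -> dict:
    """Choose between full reuse, capped reuse, or boundary recomputation."""
    delta = abs(new_pos - cached_pos)
    if seg_len > MAX_SHARE_LEN:
        # Long segments accumulate positional error: share only a capped
        # window and recompute the remainder.
        return {"reuse_tokens": MAX_SHARE_LEN,
                "recompute_tokens": seg_len - MAX_SHARE_LEN}
    if delta <= SHIFT_TOLERANCE:
        # Small RoPE shift: reuse the whole segment as-is.
        return {"reuse_tokens": seg_len, "recompute_tokens": 0}
    # Large shift: recompute a few boundary tokens so the segment's
    # edges attend with correct positions ("re-anchoring").
    edge = min(BOUNDARY_RECOMPUTE, seg_len)
    return {"reuse_tokens": seg_len - edge, "recompute_tokens": edge}

print(plan_reuse(seg_len=300, cached_pos=40, new_pos=600))
# {'reuse_tokens': 284, 'recompute_tokens': 16}
```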

Review 4

Soundness: 3: good
Presentation: 2: fair
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

Summary:
This paper introduces CrossKV, a segment-level KV cache sharing mechanism that enables flexible reuse of intermediate computations across agents in multi-agent workflows without requiring prefix alignment. Built on vLLM, it decouples cache reuse from rigid prefix matching, allowing agents to share semantic segments at arbitrary positions.

Strengths:
1. The segment-level sharing mechanism effectively overcomes the limitations of prefix-based caching in multi-agent systems.
2. The implementation demonstrates practical system-level gains, with notable improvements in both throughput and task performance.

Weaknesses:
1. Section 3.1.2 directly presents the attention-map visualization comparing cases with and without segment-level sharing in Figure 1. However, at that point the mechanism of segment-level KV sharing remains unclear: under what conditions (e.g., a certain degree of token similarity) does sharing occur, and what are the typical patterns of such segments? Section 3.1.1 also lacks rigorous, interpretable formulations, relying solely on textual descriptions, which makes Figure 1 difficult to understand.
2. It is still unclear how CrossKV is integrated into existing multi-agent frameworks. Does it replace the original natural-language communication among agents, or do agents still communicate via natural language while additionally performing KV sharing?
3. CrossKV seems inapplicable to heterogeneous LLM-based multi-agent systems, as it does not address potential discrepancies in hidden-state dimensions or distributions across different backbone LLMs. Restricting the method to homogeneous MASs substantially limits its contribution.
4. How are the `<reuse begin>` and `<reuse end>` tags obtained? If they are generated by the model itself, how is the correctness or reasonableness of their positions ensured? (See the sketch after this review for the extraction step these tags imply.)
5. When these KV caches are directly reused, does this lead to semantic discontinuity? Directly embedding the cached segments from one agent into another agent's input without adaptation seems problematic.

Questions:
See Weaknesses.

EditLens Prediction: Moderately AI-edited
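Regarding weakness 4: if the `<reuse begin>`/`<reuse end>` tags are emitted by the model itself, the system must at minimum parse them out of generated text and tolerate malformed output. Here is a minimal sketch of that extraction step, assuming the tag strings quoted in the review; the parsing logic and validation behavior are assumptions, not the paper's described pipeline.

```python
import re

# Tag strings as quoted in the review; a real system might use special
# tokens rather than literal text.
REUSE_SPAN = re.compile(r"<reuse begin>(.*?)<reuse end>", re.DOTALL)

def extract_reusable_segments(generated_text: str) -> list[str]:
    """Pull out the spans the model marked as reusable, in order.

    If the model emits unbalanced or malformed tags (the brittleness this
    reviewer worries about), the regex simply yields no match for that
    region; a production system would need validation plus a fallback to
    normal prefill.
    """
    return [m.group(1) for m in REUSE_SPAN.finditer(generated_text)]

text = ("Analysis: <reuse begin>The retrieved table shows Q3 revenue "
        "grew 12%.<reuse end> Therefore I recommend option B.")
print(extract_reusable_segments(text))
# ['The retrieved table shows Q3 revenue grew 12%.']
```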