|
Batch Speculative Decoding Done Right |
Speculative decoding speeds up LLM inference by using a small draft model to propose multiple tokens that a target model verifies in parallel. Extending this idea to batches is essential for productio... |
4.67 |
49% |
|
Disentangling Primitive Representation Structures for Image Generation |
This paper explains a neural network for image generation from a new perspective, i.e., explaining representation structures for image generation. We propose a set of desirable properties to define th... |
4.50 |
0% |
|
Pulp Motion: Framing-aware multimodal camera and human motion generation |
Treating human motion and camera trajectory generation separately overlooks a core principle of cinematography: the tight interplay between actor performance and camera work in the screen space. In t... |
4.50 |
0% |
|
Adaptive Decoding via Latent Preference Optimization |
During language model decoding, it is known that using higher temperature sampling gives more creative responses, while lower temperatures are more factually accurate. However, such models are commonl... |
4.00 |
0% |
|
Openhelix: Empirical Analysis of Dual-System VLA Models for Robotic Manipulation |
Dual-system vision-language-action (VLA) architectures are emerging as a promising approach in embodied intelligence. However, current works lack consistency in training and evaluation protocols acros... |
5.00 |
0% |
|
Sparkle: A Robust and Versatile Representation for Point Cloud-based Human Motion Capture |
Point cloud-based motion capture leverages rich spatial geometry and privacy-preserving sensing, but learning robust representations from noisy, unstructured point clouds remains challenging. Existing... |
6.00 |
27% |
|
Joint Optimization for 4D Human-Scene Reconstruction in the Wild |
Reconstructing human motion and its surrounding environment is crucial for understanding human-scene interaction and predicting human movements in the scene. While much progress has been made in captu... |
5.50 |
0% |
|
Quantum Learning from Label Proportion |
Learning from Label Proportions (LLP) is a weakly supervised learning method in which training data are provided as bags of instances annotated only with class proportions. We introduce Q-LLP, a quan... |
1.67 |
5% |
|
Doubly Robust Monte Carlo Tree Search |
We present Doubly Robust Monte Carlo Tree Search (DR-MCTS), a novel algorithm that integrates doubly robust off-policy estimation into MCTS to improve sample efficiency in computationally expensive en... |
3.33 |
88% |
|
Unified Single Transformer for Multimodal Video Understanding and Generation |
With the advancement of language models, unified multimodal understanding and generation have made significant strides, with model architectures evolving from separated components to unified single-mo... |
3.50 |
0% |
|
Enhancing Persona Following at Decoding Time via Dynamic Importance Estimation for Role-Playing Agents |
The utility of Role-Playing Language Agents in sociological research is growing alongside the adoption of Large Language Models. For realism in social simulation, these agents must adhere to their per... |
6.50 |
12% |
|
In-Context Reinforcement Learning through Bayesian Fusion of Context and Value Prior |
In-context reinforcement learning (ICRL) promises fast adaptation to unseen environments without parameter updates, but current methods either cannot improve beyond the training distribution or requir... |
4.50 |
0% |
|
Evolving Graph Structured Programs for Circuit Generation with Large Language Models |
Logic synthesis (LS), which aims to generate a *compact* logic circuit graph with minimized size while *accurately* satisfying a given functionality, plays an important role in chip design. However, e... |
5.50 |
2% |
|
CompoDistill: Attention Distillation for Compositional Reasoning in Multimodal LLMs |
Recently, efficient Multimodal Large Language Models (MLLMs) have gained significant attention as a solution to their high computational complexity, making them more practical for real-world applicati... |
6.00 |
0% |
|
Pivot-Centric Trajectory Prediction: Bridging Long Horizons via Dynamical Guidance |
Forecasting precise future motion of surrounding agents is essential for reliable autonomous vehicles. However, as the demand for longer prediction horizons increases, existing endpoint-completion or ... |
4.00 |
0% |
|
When Judgment Becomes Noise: How Design Failures in LLM Judge Benchmarks Silently Undermine Validity |
LLM-judged benchmarks are increasingly used to evaluate complex model behaviors, yet their design introduces failure modes absent in conventional, ground-truth–based benchmarks. We argue that, without... |
N/A |
0% |
|
DiffInk: Glyph- and Style-Aware Latent Diffusion Transformer for Text to Online Handwriting Generation |
Deep generative models have advanced text-to-online handwriting generation (TOHG), which aims to synthesize realistic pen trajectories conditioned on textual input and style references. However, most ... |
5.50 |
15% |
|
Decoding Layer by Layer: Uncovering Hierarchical Reasoning in Language Models |
Decoder-only language models, such as GPT and LLaMA, generally decode on the last layer. Motivated by humans' hierarchical reasoning capability, we propose that a hierarchical decoder architecture cou... |
4.50 |
0% |
|
WRF4CIR: Weight-Regularized Fine-Tuning Network for Composed Image Retrieval |
The Composed Image Retrieval (CIR) task aims to retrieve target images based on reference images and modification texts. Current CIR methods primarily rely on fine-tuning vision-language pre-trained model... |
5.00 |
0% |
|
From Contextual Distributions to Messages: Entropy-Guided GNNs |
The Message Passing Neural Networks (MPNNs) have emerged as the dominant framework for learning on graphs. However, their expressive power is fundamentally restricted by the 1-dimensional Weisfeiler-L... |
3.00 |
8% |
|
PointArena: Probing Multimodal Grounding Through Language-Guided Pointing |
Pointing serves as a fundamental and intuitive mechanism for grounding language within visual contexts, with applications spanning robotics, assistive technologies, and interactive AI systems. While r... |
4.50 |
44% |
|
LUCID: Attention with Preconditioned Representations |
Softmax-based dot-product attention is a cornerstone of Transformer architectures, enabling remarkable capabilities such as in-context learning. However, as context lengths increase, a fundamental lim... |
4.00 |
10% |
|
Semantic-Aware and Self-Transformative Function Name Recovery for Binaries |
Reverse engineers aim to analyze stripped binaries in order to identify and mitigate software vulnerabilities. Unlike source code, real-world binaries contain limited semantic information, as companie... |
3.00 |
8% |
|
Scaling Curriculum Learning for Autonomous Driving |
Batched simulators for autonomous driving have recently enabled the training of reinforcement learning agents on a massive scale, encompassing thousands of traffic scenarios and billions of interactio... |
5.00 |
0% |
|
Recursive Autoregressive Depth Estimation with Continuous Token Modeling |
Monocular depth estimation is a cornerstone of robotic perception and computer vision, yet reconstructing 3-D structure from a single RGB image suffers from severe geometric ambiguity and uncertainty.... |
4.00 |
37% |
|
Controllable diffusion-based generation for multi-channel biological data |
Biological profiling technologies, such as imaging mass cytometry (IMC) and spatial transcriptomics (ST), generate multi-channel data with strong spatial alignment and complex inter-channel relationsh... |
3.50 |
11% |
|
Verl-Tool: Towards Holistic Agentic Reinforcement Learning with Tool Use |
Reinforcement Learning with Verifiable Rewards (RLVR) has demonstrated success in enhancing LLM reasoning capabilities, but remains limited to single-turn interactions without tool integration. While ... |
4.50 |
25% |
|
Transductive and Learning-Augmented Online Regression |
Motivated by the predictability of real-life data streams, we study online regression when the online learner has access to predictions about future examples. In the extreme case, called transductive ... |
5.00 |
0% |
|
Who Owns This Sample: Cross-Client Membership Inference Attack in Federated Graph Neural Networks |
Graph Neural Networks (GNNs) are increasingly integrated with federated learning (FL) to protect data locality in domains such as social networks, finance, and biology. While membership inference atta... |
4.50 |
14% |
|
Sparsity Forcing: Reinforcing Token Sparsity of MLLMs |
Sparse attention mechanisms aim to reduce computational overhead with minimal accuracy loss by selectively processing salient tokens. Despite their effectiveness, most methods merely exploit a model’s... |
5.00 |
0% |
|
OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems |
Large Language Models (LLMs) have demonstrated impressive capabilities in solving a wide range of tasks. However, their ability to iteratively optimize complex solutions by learning from previous feed... |
3.00 |
34% |
|
Refusal Degrades with Token-Form Drift: Limits of Token-Level Alignment |
Safety alignment of large language models (LLMs) is typically learned through supervised fine-tuning and preference optimization on a fixed distribution of token sequences. We show that this process c... |
5.00 |
48% |
|
Factor Graph Optimization for Belief Propagation Decoding |
Belief Propagation (BP) is a highly efficient message-passing algorithm for inference on graphical models, famously applied to the decoding of sparse codes. The performance of BP, however, is critical... |
4.00 |
0% |
|
Incremental Learning of Vision-Language Models via Task Subspace Projection and Dynamic LoRA |
Recent pre-trained vision-language models usually face a Multi-Domain Task-Incremental Learning (MTIL) benchmark in practice, where a set of classes of multi-modal tasks arrive incrementally. Due to p... |
3.00 |
0% |
|
mR3: Multilingual Rubric-Agnostic Reward Reasoning Models |
Evaluation using Large Language Model (LLM) judges has been widely adopted in English and shown to be effective for automatic evaluation. However, their performance does not generalize well to non-Eng... |
3.50 |
0% |
|
Finite‑Time Bounds for Distributionally Robust TD Learning with Linear Function Approximation |
Distributionally robust reinforcement learning (DRRL) focuses on designing policies that achieve good performance under model uncertainties. In particular, we are interested in maximizing the worst-ca... |
4.00 |
0% |
|
Steering Language Models with Weight Arithmetic |
Providing high-quality feedback to large language models (LLMs) on a diverse training distribution can be difficult and expensive, and providing feedback only on a narrow distribution can result in un... |
5.33 |
0% |
|
How Transformers Get Rich: Approximation and Dynamics Analysis |
Transformers have demonstrated exceptional in-context learning capabilities, yet the theoretical understanding of the underlying mechanisms remains limited. A recent work (Elhage et al., 2021) identif... |
4.50 |
0% |
|
Policy Optimization Prefers The Path Of Least Resistance |
Policy optimization (PO) algorithms are used to refine Large Language Models (LLMs) for complex, multi-step reasoning. Current state-of-the-art pipelines enforce a strict think-then-answer format to e... |
2.50 |
66% |
|
OmniLayout: Enabling Coarse-to-Fine Learning with LLMs for Universal Document Layout Generation |
Document AI has advanced rapidly and is attracting increasing attention. Yet, while most efforts have focused on document layout analysis (DLA), its generative counterpart, document layout generation,... |
3.50 |
2% |
|
Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective |
Large language models have revolutionized natural language processing but face significant challenges of high storage and runtime costs, due to the transformer architecture's reliance on self-attentio... |
5.00 |
N/A |
|
Resisting Contextual Interference in RAG via Parametric-Knowledge Reinforcement |
Retrieval-augmented generation (RAG) improves performance on knowledge-intensive tasks but can be derailed by wrong, irrelevant, or conflicting retrieved text, causing models to rely on inaccurate evi... |
5.50 |
18% |
|
CAMEO: Correspondence-Attention Alignment for Multi-View Diffusion Models |
We propose a novel framework designed to improve both the training efficiency and generation quality of multi-view diffusion models. While these models have emerged as a powerful paradigm for novel vi... |
3.00 |
6% |
|
Agentic reinforcement learning for search is unsafe |
Agentic reinforcement learning (RL) trains large language models to autonomously call external tools during reasoning, with search as the most common application. These models perform well on multi-st... |
5.00 |
0% |
|
IPOD: Inverse-Problem-Driven Meta-Learning for Fast Generalizable Neural Representations in MRI Reconstruction |
Implicit neural representation (INR) demonstrates strong performance in magnetic resonance imaging (MRI) reconstructions by learning continuous mappings from spatial coordinates to signal intensities.... |
4.67 |
8% |
|
SuperF: Neural Implicit Fields for Multi-Image Super-Resolution |
High-resolution imagery is often hindered by limitations in sensor technology, atmospheric conditions, and costs. Such challenges occur in satellite remote sensing, but also with handheld cameras, suc... |
5.50 |
0% |
|
Membership Inference Attacks for Unseen Classes |
The state-of-the-art for membership inference attacks on machine learning models is a class of attacks based on *shadow models* that mimic the behavior of the target model on subsets of held-out ... |
3.50 |
0% |
|
BAH Dataset for Ambivalence/Hesitancy Recognition in Videos for Behavioural Change |
This paper introduces the Behavioral Ambivalence/Hesitancy (BAH) dataset collected for the Ambivalence/Hesitancy (A/H) recognition task in videos. In particular, this task involves recognizing conflic... |
5.50 |
4% |
|
On the Spectral Differences Between NTK and CNTK and Their Implications for Point Cloud Recognition |
The Convolutional Neural Tangent Kernel (CNTK) offers a principled framework for understanding convolutional architectures in the infinite-width regime. However, a comprehensive spectral comparison be... |
6.00 |
0% |
|
Dynamic Speculative Agent Planning |
Despite their remarkable success in complex tasks propelling widespread adoption, large language model based agents still face critical deployment challenges due to prohibitive latency and inference c... |
5.50 |
N/A |