|
Deep Neural Networks Divide and Conquer Dihedral Multiplication |
We find multilayer perceptrons and transformers both learn an instantiation of the same divide-and-conquer algorithm and solve dihedral multiplication with logarithmic feature efficiency. Applying pri... |
3.50 |
0% |
|
Is Extending Modality The Right Path Towards Omni-Modality? |
Omni-modal language models (OLMs) aim to integrate and reason over diverse input modalities—such as text, images, video, and audio—while maintaining strong language capabilities. Despite recent advanc... |
3.50 |
0% |
|
Welfarist Formulations for Diverse Similarity Search |
Nearest Neighbor Search (NNS) is a fundamental problem in data structures with wide-ranging applications, such as web search, recommendation systems, and, more recently, retrieval-augmented generation... |
6.50 |
0% |
|
Breaking the Chain: A Causal Analysis of LLM Faithfulness to Intermediate Structures |
Large language models (LLMs) increasingly generate intermediate reasoning structures --- rubrics, checklists, proof graphs --- to make their decisions more interpretable. But are these structures caus... |
4.50 |
24% |
|
Modality-Balancing Preference Optimization of Large Multimodal Models by Adversarial Negative Mining |
The task adaptation and alignment of Large Multimodal Models (LMMs) have been significantly advanced by instruction tuning and further strengthened by recent preference optimization. Yet, most LMMs st... |
3.00 |
4% |
|
An Empirical Study and Theoretical Explanation on Task-Level Model-Merging Collapse |
Model merging unifies independently fine-tuned LLMs from the same base, enabling reuse and integration of parallel development efforts without retraining. However, in practice we observe that merging ...
4.00 |
15% |
|
StoryAlign: Evaluating and Training Reward Models for Story Generation |
Story generation aims to automatically produce coherent, structured, and engaging narratives. Although large language models (LLMs) have significantly advanced text generation, stories generated by LL... |
6.50 |
16% |
|
Logarithmic Regret in Preference Learning via Optimistic PAC-Bayesian Particle Ensembles |
The remarkable sample efficiency of preference-based reinforcement learning, which underpins the alignment of large language models with human feedback (RLHF), presents a significant theoretical puzzl... |
4.00 |
65% |
|
The Best of N Worlds: Aligning Reinforcement Learning with Best-of-N Sampling via max@k Optimization |
The application of Reinforcement Learning with Verifiable Rewards (RLVR) to mathematical and coding domains has demonstrated significant improvements in the reasoning and problem-solving abilities of ... |
4.00 |
0% |
|
Null-Space Filtering for Data-free Continual Model Merging: Preserving Transparency, Promoting Fidelity |
Data-free continual model merging (DFCMM) aims to fuse independently fine-tuned models into a single backbone that evolves with incoming tasks without accessing task data. This paper formulates two fun...
5.50 |
32% |
|
The Other Side of the Coin: Unveiling the Downsides of Model Aggregation in Federated Learning from a Layer-peeled Perspective |
In federated learning (FL), model aggregation plays a central role in enabling decentralized knowledge sharing. However, it is often observed that the aggregated model underperforms on local data unti...
4.50 |
7% |
|
On Uniformly Scaling Flows: A Density-Aligned Approach to Deep One-Class Classification |
Unsupervised anomaly detection is often framed around two widely studied paradigms. Deep one-class classification, exemplified by Deep SVDD, learns compact latent representations of normality, while d... |
4.00 |
7% |
|
OmniX: From Unified Panoramic Generation and Perception to Graphics-Ready 3D Scenes |
There are two prevalent ways of constructing 3D scenes: procedural generation and 2D lifting. Among them, panorama-based 2D lifting has emerged as a promising technique, leveraging powerful 2D generat...
4.50 |
4% |
|
PairUni: Pairwise Training for Unified Multimodal Language Models |
Unified Vision-Language Models (UVLMs) must perform both understanding and generation within a single architecture, but these tasks rely on heterogeneous data and supervision, making it difficult to b... |
4.50 |
17% |
|
GTA1: GUI Test-time Scaling Agent |
Graphical user interface (GUI) agents autonomously complete tasks across platforms (e.g., Linux) by sequentially decomposing user instructions into action proposals that iteratively interact with visua...
5.50 |
5% |
|
Reasoning at the Right Length: Adaptive Budget Forcing for Efficient and Accurate LLM Inference |
Large Language Models (LLMs) face persistent challenges in domain-specific reasoning tasks, particularly in fields such as mathematics, telecommunications, and scientific problem-solving, where struct... |
3.00 |
79% |
|
Entrophy: User Interaction Data from Live Enterprise Workflows for Realistic Model Evaluation |
AI-driven automation for complex enterprise workflows faces significant hurdles due to the lack of publicly available datasets that realistically capture how business processes unfold - interaction by... |
4.00 |
N/A |
|
Reference-based Category Discovery: Unsupervised Object Detection with Category Awareness |
Traditional one-shot detection methods have addressed the closed-set problem in object detection, but the high cost of data annotation remains a critical challenge. General unsupervised methods genera... |
4.00 |
0% |
|
Guided Navigation in Knowledge-Dense Environments: Structured Semantic Exploration with Guidance Graphs |
While Large Language Models (LLMs) exhibit strong linguistic capabilities, their reliance on static knowledge and opaque reasoning processes limits their performance in knowledge-intensive tasks. Alth... |
4.00 |
38% |
|
SPADE: Semantic-Preserving Adaptive Detoxification of Images
Image generation models often struggle with safety-critical edits, especially detoxifying harmful visual content without losing semantic context. We introduce SPADE, a novel dataset for controlled, g...
2.50 |
43% |
|
VBA: Vector Bundle Attention for Intrinsically Geometry-Aware Learning |
Learning from geometrically structured data is fundamental in biology, physics, and computer vision. Graph Neural Networks capture local structure but are limited by message passing, while Transformer... |
4.67 |
64% |
|
Evading Overlapping Community Detection via Proxy Node Injection |
Protecting privacy in social graphs requires preventing sensitive information, such as community affiliations, from being inferred by graph analysis, without substantially altering the graph topology.... |
4.00 |
26% |
|
All You Need Are Random Visual Tokens? Demystifying Token Pruning in VLLMs |
Vision Large Language Models (VLLMs) usually incur high computational costs due to their reliance on hundreds of visual tokens to represent images. While token pruning offers a promising solution for ... |
4.00 |
0% |
|
COGITAO: A Procedural Object-Centric Framework to Study Compositional Generalization |
The ability to compose learned concepts and apply them in novel settings is central to human intelligence, but remains a key challenge in state-of-the-art machine learning models. To address this issue, w...
4.00 |
0% |
|
How Base Frequency Shapes RoPE: An Analytical Study of Frequency-Band Formation |
Rotary Position Embeddings (RoPE) are widely adopted in LLMs, and it is commonly believed that larger base frequencies $\theta$ yield better long-context performance. In this paper, we show that a hig...
5.20 |
0% |
|
Advanced Image Forensics: Detecting Tampered and AI-Generated Images with Adversarial Learning |
Detecting image tampering and AI-generated images are vital challenges in the field of computer vision. The primary difficulty in identifying tampered images lies in uncovering m...
2.50 |
70% |
|
PLUMAGE: Probabilistic Low-Rank Unbiased Min-Variance Gradient Estimation Framework for Efficient Large Model Training
Accelerator memory and network constraints are dominant bottlenecks when training large language models (LLMs) with billions of parameters. Low-rank gradient estimators have been successfully applied ... |
4.67 |
0% |
|
How is Occam's Razor Realized in Symbolic Regression?: An Adaptive LLM-Enhanced Genetic Programming Approach for Efficient, Versatile, and Interpretable Representation Discovery through Simplification and Evolution |
Symbolic regression aims to discover mathematical expressions that capture underlying data relationships, but genetic programming (GP) approaches commonly encounter bloat, premature convergence, and i... |
3.50 |
58% |
|
TokenDrop: Efficient Image Editing by Source Token Drop with Consistency Regularization |
Text-based image editing has recently been reinterpreted in large multimodal transformers as conditional generation, where source image tokens are concatenated with text and noise tokens as conditioni... |
4.50 |
0% |
|
HieraQuery: Bridging Multimodal Understanding and High-Quality Generation through Multi-Scale Query Learning |
Unified multi-modal LLMs enable the integration of visual understanding and generation in a single framework. Recent studies show that a set of learnable queries can serve as an effective interface bet...
4.00 |
0% |
|
PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models |
Large Reasoning Models (LRMs) like Claude 3.7 Sonnet and OpenAI o1 achieve strong performance on mathematical tasks via long Chain-of-Thought (CoT), but often generate unnecessarily verbose reasoning ... |
2.67 |
47% |
|
Process-Verified Reinforcement Learning for Theorem Proving via Lean |
While reinforcement learning from verifiable rewards (RLVR) has typically relied on a single binary verification signal, symbolic proof assistants in formal reasoning offer rich, fine-grained structur...
5.00 |
7% |
|
Adapting Vision-Language Models for Evaluating World Models |
World models -- generative models that simulate environment dynamics conditioned on past observations and actions -- are gaining prominence in planning, simulation, and embodied AI. However, evaluatin... |
4.67 |
41% |
|
Explicit Conditional Consistency Diffusion: Towards Precise Semantic Alignment in Multimodal Face Generation |
With the collaborative guidance of multimodal conditions (e.g., semantic masks as structural visual guidance and text descriptions as linguistic guidance), diffusion models have significantly improved... |
3.50 |
6% |
|
LLM-Guided Evolutionary Program Synthesis for Quasi-Monte Carlo Design |
Low-discrepancy point sets and digital sequences underpin quasi-Monte Carlo (QMC) methods for high-dimensional integration. We cast two long-standing QMC design problems as program synthesis and solve...
4.80 |
37% |
|
How Well Can General Vision-Language Models Learn Medicine By Watching Public Educational Videos? |
Publicly available biomedical videos, such as those on YouTube, serve as valuable educational resources for medical students. Unlike standard machine learning datasets, these videos are designed for h... |
4.50 |
33% |
|
Structure-Aware Graph Hypernetworks for Neural Program Synthesis |
We study the neural program synthesis of parameterized function families through the lens of meta-learning with hypernetworks. Given a user intent $U$, a meta-learner $M_{\phi}$ produces a ...
4.67 |
12% |
|
From Sparse to Dense: Spatio-Temporal Fusion for Multi-View 3D Human Pose Estimation with DenseWarper |
In multi-view 3D human pose estimation, models typically rely on images captured simultaneously from different camera views to predict a pose at a specific moment. While providing accurate spatial inf... |
4.50 |
24% |
|
REVEAL: Advancing Relation-based Video Understanding for Video-Question-Answering |
Video Question-Answering (Video-QA) requires capturing complex changes in visual relations over time, which remains a challenge even for advanced Vision-Language Models (VLMs), in part because of the nee...
4.50 |
8% |
|
The Agent's Marathon: Probing the Limits of Endurance in Long-Horizon Tasks |
Large Language Model (LLM) agents, augmented with diverse tools, have shown impressive progress in domains such as scientific discovery and enterprise automation. Yet they remain brittle in long-horiz... |
3.00 |
16% |
|
Blade: A Derivative-free Bayesian Inversion Method using Diffusion Prior |
Derivative-free Bayesian inversion is an important task in many science and engineering applications, particularly when computing the forward model derivative is computationally and practically challe... |
5.00 |
0% |
|
Hybrid Neural-MPM for Interactive Fluid Simulations in Real-Time |
We propose a neural physics system for real-time, interactive fluid simulations. Traditional physics-based methods, while accurate, are computationally intensive and suffer from latency issues. Recent... |
3.00 |
4% |
|
Fine-Grained Iterative Adversarial Attacks with Limited Computation Budget |
This work tackles a critical challenge in AI safety research under limited compute: given a fixed computation budget, how can one maximize the strength of iterative adversarial attacks? Coarsely reduc... |
6.00 |
9% |
|
COMPOL: A Unified Neural Operator Framework for Scalable Multi-Physics Simulations |
Multi-physics simulations play an essential role in accurately modeling complex interactions across diverse scientific and engineering domains. Although neural operators, especially the Fourier Neural... |
4.00 |
47% |
|
Neurosymbolic Object-Centric Learning with Distant Supervision |
Relational learning enables models to generalize across structured domains by reasoning over objects and their interactions. While recent advances in neurosymbolic reasoning and object-centric learnin... |
5.50 |
8% |
|
Assembling the Mind's Mosaic: Towards EEG Semantic Intent Decoding |
Enabling natural communication through brain–computer interfaces (BCIs) remains one of the most profound challenges in neuroscience and neurotechnology. While existing frameworks offer partial solutio... |
6.00 |
55% |
|
From Crowds to Codes: Minimizing Review Burden in Conference Review Protocols |
Conference peer review aims to accurately assess paper quality while minimizing review load. This paper explores optimal conference protocols --- rules for assigning review tasks to reviewers and inf...
3.33 |
0% |
|
Where Redundancy Lives: Stage-Aware Block Saliency in Skip-Connected Models |
Residual (skip-connected) architectures such as ResNets are widely used, yet the extent and structure of their inference-time redundancy remain unclear. We repurpose post-training block ablation as a ... |
2.50 |
25% |
|
Improving Discrete Diffusion Unmasking Policies Beyond Explicit Reference Policies |
Masked diffusion models (MDMs) have recently emerged as a novel framework for language modeling. MDMs generate sentences by iteratively denoising masked sequences, filling in [MASK] tokens step by ste... |
5.00 |
0% |
|
Counterfactual Digital Twin: Generating What-If Trajectories with Uncertainty |
Answering what-if questions is crucial in many decision-making domains, especially in time-sensitive areas such as healthcare, strategy, and policy. Generating counterfactual trajectories req...
3.33 |
10% |