|
MVP: Memory-enhanced Vision-Language-Action Policy with Feedback Learning |
Recent advances in Vision-Language-Action (VLA) models have enabled robots to perform a wide range of manipulation tasks conditioned on language instructions, offering strong generalization across tas... |
4.00 |
0% |
See Reviews |
View AI Dashboard |
|
DIVERSE: Disagreement-Inducing Vector Evolution for Rashomon Set Exploration |
We propose DIVERSE, a framework for systematically exploring the Rashomon set of deep neural networks, the collection of models that match a reference model’s accuracy while differing in their predict... |
6.00 |
44% |
See Reviews |
View AI Dashboard |
|
Intra-Trajectory Consistency for Reward Modeling |
Reward models are critical for improving large language models (LLMs), particularly in reinforcement learning from human feedback (RLHF) and inference-time verification. Due to the prohibitive cost of... |
3.50 |
0% |
See Reviews |
View AI Dashboard |
|
Never Saddle: Reparameterized Steepest Descent as Mirror Flow |
How does the choice of optimization algorithm shape a model’s ability to learn features? To address this question for steepest descent methods—including sign descent, which is closely related to Adam—... |
3.50 |
0% |
See Reviews |
View AI Dashboard |
|
Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression |
We study instrumental variable regression (IVaR) under differential privacy constraints. Classical IVaR methods (like two-stage least squares regression) rely on solving moment equations that directl... |
5.50 |
27% |
See Reviews |
View AI Dashboard |
|
From Compression to Specialization: An Information-Preserving Approach for Dense to Mixture-of-Experts Construction |
The high cost of training Mixture-of-Experts (MoE) models from scratch has spurred interest in converting pre-trained dense models into sparse MoE models. However, existing dense-to-sparse MoE methods... |
2.67 |
33% |
See Reviews |
View AI Dashboard |
|
Real-Aware Residual Model Merging for Deepfake Detection |
Deepfake generators evolve quickly, making exhaustive data collection and repeated retraining impractical. We argue that model merging is a natural fit for deepfake detection: unlike generic multi-tas... |
5.50 |
0% |
See Reviews |
View AI Dashboard |
|
Aligning News and Prices: A Cross-Modal LLM-Enhanced Transformer DRL Framework for Volatility-Adaptive Stock Trading |
While Deep Reinforcement Learning (DRL) has shown promise for stock trading, its practical application is constrained by critical gaps that undermine performance in real-world volatile markets, most n... |
2.00 |
32% |
See Reviews |
View AI Dashboard |
|
The Blind Spot of LLM Security: Time-Sensitive Backdoors Activated by Inherent Features |
With the widespread adoption of Large Language Models (LLMs), backdoor attacks against pre-trained LLMs have become a notable security issue. Without control over end-user inputs, the trigger conditio... |
4.50 |
4% |
See Reviews |
View AI Dashboard |
|
Variational Model Merging for Pareto Front Estimation in Multitask Finetuning |
We propose a new variational model merging method that can yield arbitrarily accurate Pareto fronts in multitask finetuning. The idea is to first compute posterior-approximations on each task separate... |
5.00 |
0% |
See Reviews |
View AI Dashboard |
|
ORCaS: Unsupervised Depth Completion via Occluded Region Completion as Supervision |
We propose a method for inferring an egocentric dense depth map from an RGB image and a sparse point cloud. The crux of our method lies in modeling the 3D scene implicitly within the latent space and... |
6.00 |
0% |
See Reviews |
View AI Dashboard |
|
When Can You Get Away with Low Memory Adam? |
Adam is the go-to optimizer for training modern machine learning models, but it requires additional memory to maintain the moving averages of the gradients and their squares. While various low-memory ... |
3.20 |
0% |
See Reviews |
View AI Dashboard |
|
MFCL: A Multi-modal Function Calling Evaluation for Large Language Models |
Large language models are evolving into multi-modal agents that call tools directly from raw speech or images. Yet we still lack a principled metric for how well they convert perception into accurate ... |
4.50 |
7% |
See Reviews |
View AI Dashboard |
|
CaseGen: A Benchmark for Multi-Stage Legal Case Documents Generation |
Legal case documents play a critical role in judicial proceedings. As the number of cases continues to rise, the reliance on manual drafting of legal case documents is facing increasing pressure and c... |
3.00 |
0% |
See Reviews |
View AI Dashboard |
|
Chain of Time: In-Context Physical Simulation with Image Generation Models |
We propose a novel method to improve the physical simulation ability of vision-language models. This Chain-of-Time simulation is motivated by in-context reasoning in machine learning, and mental simul... |
3.00 |
0% |
See Reviews |
View AI Dashboard |
|
$\mathbf{Li_2}$: A Framework on Dynamics of Feature Emergence and Delayed Generalization |
While the phenomenon of grokking, i.e., delayed generalization, has been studied extensively, it remains an open question whether there is a mathematical framework to characterize what kind of feature... |
5.00 |
0% |
See Reviews |
View AI Dashboard |
|
DualTune: Decoupled Fine-tuning for On-Device Agentic Systems |
The deployment of Large Language Models (LLMs) as agentic orchestrators has revolutionized task automation, but the need for privacy-preserving, cost-effective solutions demands on-device inference ca... |
2.50 |
0% |
See Reviews |
View AI Dashboard |
|
NaviAgent: Bilevel Planning on Tool Navigation Graph for Large-Scale Orchestration |
Large language models (LLMs) have recently demonstrated the ability to act as function call agents by invoking external tools, enabling them to solve tasks beyond their static knowledge. However, exis... |
5.50 |
44% |
See Reviews |
View AI Dashboard |
|
TCMAgent: A Multi-Agent Framework for General Traditional Chinese Medicine |
A central challenge in artificial intelligence is designing systems that replicate expert cognition in domains where decisions require holistic data synthesis and deliberative reasoning. While large l... |
2.50 |
48% |
See Reviews |
View AI Dashboard |
|
DynaIP: Dynamic Image Prompt Adapter for Scalable Zero-shot Personalized Text-to-Image Generation |
Personalized Text-to-Image (PT2I) generation aims to produce customized images based on reference images. A prominent interest pertains to the integration of an image prompt adapter to facilitate zero... |
5.00 |
0% |
See Reviews |
View AI Dashboard |
|
Towards a more Holistic Evaluation of Object-Centric Learning |
Object-centric learning (OCL) methods were developed by taking inspiration from how humans perceive a scene. It is conjectured that they achieve compositional generalisation by decomposing the scene i... |
5.00 |
0% |
See Reviews |
View AI Dashboard |
|
Gen-DFL: Decision-Focused Generative Learning for Robust Decision Making |
Decision-focused learning (DFL) integrates predictive models with downstream optimization, directly training machine learning models to minimize decision errors. While DFL has been shown to provide su... |
3.50 |
28% |
See Reviews |
View AI Dashboard |
|
GAMBIT: A Graph-structured and Decision-Aware Benchmark for MoBile GUI Tasks |
Mobile GUI agents powered by LMMs can perceive screens and follow instructions, yet existing benchmarks largely target short, linear workflows and step-level accuracy, offering limited insight into lo... |
4.00 |
N/A |
See Reviews |
|
|
DatasetResearch: Benchmarking Agent Systems for Demand-Driven Dataset Discovery |
The rapid advancement of large language models has fundamentally shifted the bottleneck in AI development from computational power to data availability—with countless valuable datasets remaining hidde... |
4.00 |
0% |
See Reviews |
View AI Dashboard |
|
SafeCoop: Unravelling Full Stack Safety in Agentic Cooperative Driving |
Collaborative driving systems leverage vehicle-to-everything (V2X) communication across multiple agents to enhance driving safety and efficiency. Traditional V2X systems take raw sensor data, neural f... |
3.50 |
17% |
See Reviews |
View AI Dashboard |
|
Neurosymbolic Language Reasoning as Satisfiability Modulo Theory |
Natural language (NL) contains extensive logical structure, finely meshed with "gestalt" content best interpreted statistically. LLMs are indispensable for interpreting the gestalt content but known... |
4.50 |
14% |
See Reviews |
View AI Dashboard |
|
How Confident are Video Models? Empowering Video Models to Express their Uncertainty |
Generative video models demonstrate impressive text-to-video capabilities, spurring widespread adoption in many real-world applications. However, like large language models (LLMs), video generation mo... |
5.50 |
0% |
See Reviews |
View AI Dashboard |
|
AdaSpec: Adaptive Spectrum for Enhanced Node Distinguishability |
Spectral Graph Neural Networks (GNNs) achieve strong performance in node classification, yet their node distinguishability remains poorly understood. We analyze how graph matrices and node features jo... |
5.50 |
7% |
See Reviews |
View AI Dashboard |
|
Self-Knowledge Without a Self? Learning Calibrated and Model-Agnostic Correctness Predictors from Historical Patterns |
Generating reliable, calibrated confidence estimates is critical for deploying LLMs in high-stakes or user-facing applications, and remains an open challenge. Prior research has often framed confidenc... |
4.00 |
0% |
See Reviews |
View AI Dashboard |
|
ReCAP: Recursive Prompting for Self-Supervised Category-Level Articulated Pose Estimation from an Image |
Estimating category-level articulated object poses is crucial for robotics and virtual reality. Prior works either rely on costly annotations, limiting scalability, or depend on auxiliary signals suc... |
4.67 |
11% |
See Reviews |
View AI Dashboard |
|
CausalAffect: Causal Discovery for Facial Affective Understanding |
Understanding human affect from facial behavior requires not only accurate recognition but also structured reasoning over the latent dependencies that drive muscle activations and their expressive out... |
5.00 |
17% |
See Reviews |
View AI Dashboard |
|
Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models |
Retrieval-Augmented Generation (RAG) mitigates hallucinations in Large Language Models (LLMs) by integrating external knowledge. However, conflicts between parametric knowledge and retrieved context p... |
4.50 |
0% |
See Reviews |
View AI Dashboard |
|
LLM Unlearning with LLM Beliefs |
Large language models trained on vast corpora inherently risk memorizing sensitive or harmful content, which may later resurface in their outputs. Prevailing unlearning methods generally rely on gradi... |
6.00 |
9% |
See Reviews |
View AI Dashboard |
|
Layer-wise Sensitivity-aware Sparsity Allocation for Efficient LLM Inference |
Large Language Model (LLM) inference presents substantial computational challenges when executed on commodity hardware, thereby necessitating the development of efficient acceleration techniques. Whil... |
5.33 |
59% |
See Reviews |
View AI Dashboard |
|
In-Context Clustering with Large Language Models |
We propose In-Context Clustering (ICC), a flexible LLM-based procedure for clustering data from diverse distributions. Unlike traditional clustering algorithms constrained by predefined similarity mea... |
2.50 |
0% |
See Reviews |
View AI Dashboard |
|
A Theoretical Analysis of Discrete Flow Matching Generative Models |
We provide a theoretical analysis for end-to-end training Discrete Flow Matching (DFM) generative models. DFM is a promising discrete generative modeling framework that learns the underlying generati... |
4.50 |
5% |
See Reviews |
View AI Dashboard |
|
Improving End-to-End Training of Retrieval-Augmented Generation Models via Joint Stochastic Approximation |
Retrieval-augmented generation (RAG) has become a widely recognized paradigm to combine parametric memory with non-parametric memory. A RAG model consists of two serially connected components (retriev... |
3.33 |
0% |
See Reviews |
View AI Dashboard |
|
scCMIA: Self-supervised Dual Model for Mitigating Information Loss in Single-cell Cross-Modal Alignment |
Recent technological advances in single-cell sequencing have enabled simultaneous profiling of multiple omics modalities within individual cells. Despite these advancements, challenges such as high no... |
3.00 |
0% |
See Reviews |
View AI Dashboard |
|
WebRAGent: Retrieval-Augmented Generation for Multimodal Web Agent Planning |
Trajectory data, capturing multimodal human actions and states, are pivotal for building autonomous GUI agents and transferring skills across tasks, encoding knowledge by compressing past experience i... |
4.00 |
0% |
See Reviews |
View AI Dashboard |
|
Contrastive Residual Energy Test-time Adaptation |
Test-Time Adaptation (TTA) enhances model robustness by enabling adaptation to target distributions that differ from training distributions, improving real-world generalizability. However, most existi... |
4.50 |
9% |
See Reviews |
View AI Dashboard |
|
SpintBench: Evaluating LLMs' Complex Reasoning via Spatial Integration Challenges |
Large language models (LLMs) have demonstrated remarkable reasoning capabilities across diverse domains, yet their comprehensive spatial reasoning competencies remain underexplored. This paper propose... |
3.50 |
0% |
See Reviews |
View AI Dashboard |
|
Label-Free Attribution for Interpretability |
The importance of attribution algorithms in the AI field lies in enhancing model transparency, diagnosing and improving models, ensuring fairness, and increasing user understanding. Gradient-based att... |
4.80 |
7% |
See Reviews |
View AI Dashboard |
|
Bridging Discrete and Continuous RL: Stable Deterministic Policy Gradient with Martingale Characterization |
The theory of discrete-time reinforcement learning (RL) has advanced rapidly over the past decades. Although primarily designed for discrete environments, many real-world RL applications are inherentl... |
4.50 |
0% |
See Reviews |
View AI Dashboard |
|
HIGH-AVATAR: Hierarchical Representation for One-shot Gaussian Head Avatar |
We propose HIGH-Avatar, a novel one-shot method that leverages a $\textbf{HI}$erarchical representation for animatable 3D $\textbf{G}$aussian $\textbf{H}$ead reconstruction from a single image. In con... |
3.50 |
26% |
See Reviews |
View AI Dashboard |
|
Element2Vec: Build Chemical Element Representation from Text for Property Prediction |
Accurate property data for chemical elements is crucial for materials design and manufacturing, but many of them are difficult to measure directly due to equipment constraint. While traditional meth... |
2.50 |
0% |
See Reviews |
View AI Dashboard |
|
Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads |
Solving complex tasks usually requires LLMs to generate long multi-step reasoning chains. Previous work has shown that verifying the correctness of individual reasoning steps can further improve the p... |
3.00 |
0% |
See Reviews |
View AI Dashboard |
|
DeepHA: Scaling Action Chains Elicits Deep Hierarchical Agents |
Prevailing autonomous agents are often constrained by a single, predefined action space, which limits their generalization capabilities across diverse tasks and can introduce compounding errors throug... |
3.50 |
35% |
See Reviews |
View AI Dashboard |
|
DeepOmni: Towards Seamless and Smart Speech Interaction with Adaptive Modality-Specific MoE |
Native multimodal large language models (MLLMs) restructure a single large language model (LLM) into a spoken language model (SLM) capable of both speech and text generation. Compared to modular and a... |
4.50 |
4% |
See Reviews |
View AI Dashboard |
|
Probing Memes in LLMs: A Paradigm for the Entangled Evaluation World |
Current evaluations of large language models (LLMs) often treat datasets and models in isolation, obscuring phenomena that only emerge from their collective interaction. Items in datasets are reduced ... |
4.00 |
0% |
See Reviews |
View AI Dashboard |
|
Exploring weightless neural networks: From logic gates to convolutional lookup tables |
Increasing the intelligence of everyday objects is facilitated by miniaturized machine learning (ML) models which operate accurately in resource-constrained environments. Applications abound across th... |
4.00 |
0% |
See Reviews |
View AI Dashboard |