|
Benchmarking Open-Set Recognition Beyond Vision-Language Pre-training |
Vision-language models (VLMs) with open-vocabulary pre-training can still fail in classification tasks, especially when the granularity of downstream labels misaligns with the supervision during pre-t... |
5.50 |
5% |
|
Rep3D: Re-parameterize Large 3D Kernels with Low-Rank Receptive Modeling for Medical Imaging |
In contrast to vision transformers, which model long-range dependencies through global self-attention, large kernel convolutions provide a more efficient and scalable alternative, particularly in high... |
4.00 |
37% |
|
Brain-Informed Language Model Training Enables Scalable and Generalizable Alignment with Human Brain Activity |
Language models (LMs) provide rich representational spaces that partially align with neural activity during naturalistic experiences such as movie watching. Yet leveraging brain recordings to actively... |
3.00 |
22% |
|
Privacy Déjà Vu Effect: Resurfacing Sensitive Samples in Continual Fine-tuning |
Continual fine-tuning of large pre-trained models is now ubiquitous in industry for adapting a model to freshly collected user data. Existing privacy protection practices assume earlier training data ... |
5.00 |
0% |
|
DRDFL: Divide-and-conquer Collaboration for Efficient Ring-topology Decentralized Federated Learning |
Federated learning traditionally relies on server-based architectures, which often incur high communication costs and suffer from single points of failure. To avoid these limitations, we explore Ring-t... |
4.50 |
11% |
|
DyBraSS: Dynamic Brain State Modeling with State-Space Model |
Brain states, observable through resting-state functional magnetic resonance imaging (rs-fMRI), represent dynamic transitions between recurring connectivity patterns and are closely linked to neurolog... |
3.50 |
26% |
|
Better STEP, a format and dataset for boundary representation |
Boundary representation (B-rep) serves as the primary format for 3D geometry in computer-aided design (CAD), integrating parametric geometry with explicit topology to model complex components and asse... |
3.50 |
0% |
|
AsseslyAI: AI–Powered Assessment Framework for Skill-Oriented Engineering Lab Education |
Practical lab education in computer science often faces challenges like plagiarism, lack of proper lab records, inadequate execution and assessment, limited student engagement, and absence of progress... |
0.50 |
55% |
|
Q-learning Penalized Transformer for Safe Offline Reinforcement Learning |
This paper addresses the problem of safe offline reinforcement learning, which involves training a policy to satisfy safety constraints using an offline dataset. This problem is inherently challenging... |
3.50 |
7% |
|
MVGSR: Multi-View Consistency Gaussian Splatting for Robust Surface Reconstruction |
3D Gaussian Splatting (3DGS) has recently emerged as a powerful approach for high-quality dense surface reconstruction of unknown scenes. However, existing methods are limited by the assumption of sta... |
4.00 |
9% |
|
Better LMO-based Momentum Methods with Second-Order Information |
The use of momentum in stochastic optimization algorithms has shown empirical success across a range of machine learning tasks. Recently, a new class of momentum-based stochastic algorithms has emerg... |
6.00 |
0% |
|
FoundationMotion: Auto-labeling and Reasoning about Spatial Movement in Videos |
Motion understanding is fundamental to physical reasoning, enabling models to infer dynamics and predict future states. However, state-of-the-art models still struggle on recent motion benchmarks, pri... |
3.50 |
20% |
|
ChainGeo: Enabling Effective Geometric Reasoning in Small VLMs through Interleaved Visual-Text Chains |
Solving geometric problems requires linking visual perception with symbolic reasoning. However, small Vision-Language Models (VLMs) often fail to keep this connection. We introduce ChainGeo, a novel f... |
3.00 |
90% |
|
BrainPro: Towards Large-scale Brain State-aware EEG Representation Learning |
Electroencephalography (EEG) is a non-invasive technique for recording brain electrical activity, widely used in brain-computer interface (BCI) and healthcare. Recent EEG foundation models trained on ... |
3.00 |
28% |
|
Generative Point Tracking with Flow Matching |
Tracking a point through a video can be a challenging task due to uncertainty arising from visual obfuscations, such as appearance changes and occlusions. Although current state-of-the-art discriminat... |
4.00 |
0% |
|
Whitened Self-Attention |
Self-attention in Transformer architectures is formulated as a function of the pairwise contributions between target vectors and their context vectors. This construction implicitly assumes ternary a... |
3.50 |
0% |
|
EHR2Path: Scalable Modeling of Longitudinal Health Trajectories with LLMs |
Healthcare systems face significant challenges in managing and interpreting vast, heterogeneous patient data for personalized care. Existing approaches often focus on narrow use cases with a limited f... |
4.50 |
0% |
|
ConfRAG: Confidence-Guided Retrieval-Augmenting Generation |
Can Large Language Models (LLMs) be trained to avoid hallucinating factual statements, and can Retrieval-Augmented Generation (RAG) be triggered only when necessary to reduce retrieval and computation... |
3.00 |
0% |
|
Visual Sparse Steering (VS2): Unsupervised Adaptation for Image Classification via Sparsity-Guided Steering Vectors |
Steering vision foundation models at test time, without retraining or access to large labeled datasets, is a desirable yet challenging goal, particularly in dynamic or resource-constrained settings. W... |
4.50 |
10% |
|
AICrypto: A Comprehensive Benchmark for Evaluating Cryptography Capabilities of Large Language Models |
Large language models (LLMs) have demonstrated remarkable capabilities across a variety of domains. However, their applications in cryptography, which serves as a foundational pillar of cybersecurity,... |
4.00 |
0% |
|
Advancing Multimodal Fusion on Heterogeneous Data with Physics-inspired Attention |
The multimodal fusion learning paradigm has shown great potential in fields such as medicine, science, and engineering, as it offers a framework to jointly learn from heterogeneous data sources. ... |
3.33 |
0% |
|
The Illusion of Certainty: Uncertainty quantification for LLMs fails under ambiguity |
Accurate uncertainty quantification (UQ) in Large Language Models (LLMs) is critical for trustworthy deployment. While real-world language is inherently ambiguous, existing UQ methods are typically ... |
4.67 |
0% |
|
Sparse-Smooth Decomposition for Nonlinear Industrial Time Series Forecasting |
Industrial time series forecasting faces unique challenges: hundreds of correlated sensors, complex nonlinear dynamics, and the critical need for interpretable models that engineers can trust. We intr... |
2.50 |
100% |
|
EmoDialogCN: A Multimodal Mandarin Dyadic Dialogue Dataset of Emotions |
Face-to-face audiovisual interaction is fundamental to human communication, conveying rich and spontaneous emotional expressions. However, existing multimodal dialogue datasets suffer from irregular f... |
3.33 |
22% |
|
Yet Another Scaling Axis with Some Free Lunch: Enlarging Token-indexed Parameters |
The scaling laws of large language models have driven remarkable progress, yet they reveal a fundamental bottleneck: performance gains from adding parameters diminish while computational costs grow ne... |
4.50 |
43% |
|
SpinBench: Perspective and Rotation as a Lens on Spatial Reasoning in VLMs |
We present SpinBench, a cognitively grounded diagnostic benchmark for evaluating spatial reasoning in vision language models (VLMs). SpinBench is designed around the core challenge of spatial reasoni... |
5.60 |
42% |
|
2D Quantization for Ultra‑low‑bit Optimizers |
Optimizer states used to accelerate neural network training become a significant memory bottleneck as model size grows. A common mitigation is to compress these high-precision states to low-bit repres... |
4.00 |
0% |
|
CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmarking of Large Language Models in Mental Health Question Answering |
Medical question answering (QA) benchmarks often focus on multiple-choice or fact-based tasks, leaving open-ended answers to real patient questions underexplored. This gap is particularly critical in ... |
6.67 |
14% |
|
Supervised Fine-Tuning on Ambiguous Preference Pairs Boosts LLM Alignment |
Preference learning constitutes a fundamental component in aligning large language models (LLMs) with human values and ethical expectations, where the quality of preference data plays a critical role.... |
2.50 |
4% |
|
SeMa3D: Lifting Vision-Language Models for Unsupervised 3D Semantic Correspondence |
We tackle unsupervised dense semantic correspondence for 3D shapes, focusing on severe non-isometric deformations and inter-class matching, a regime where conventional functional map... |
5.50 |
0% |
|
Leveraging Label Dependencies for Calibration in Multi-Label Classification through Proper Scoring Rule |
Modern Deep Neural Networks (DNNs) trained with cross-entropy for binary or multi-class classification are known to produce poorly calibrated probability estimates. While various calibration method... |
3.00 |
0% |
|
Zooming into Comics: Region-Aware RL Improves Fine-Grained Comic Understanding in Vision-Language Models |
Complex visual narratives, such as comics, present a significant challenge to Vision-Language Models (VLMs). Despite excelling on natural images, VLMs often struggle with stylized line art, onomatopoe... |
4.00 |
0% |
|
Confident and Adaptive Generative Speech Recognition via Conformal Risk Control |
Automatic Speech Recognition (ASR) systems frequently produce transcription errors due to acoustic variability, which require post-processing correction methods. Recent approaches leverage Large Langu... |
4.67 |
69% |
|
Efficient Generative Models Personalization via Optimal Experimental Design |
Preference learning from human feedback has the ability to align generative models with the needs of end-users. Human feedback is costly and time-consuming to obtain, which creates demand for data-eff... |
5.00 |
4% |
|
Self-Attention-Guided Genetic Programming: Leveraging BERT for Enhanced Tree-Structured Data Operations |
This study investigates the application of BERT to tree-structured data, which presents a significant challenge due to its lack of explicit sequential order and its complex topological dependencies. While ... |
2.50 |
6% |
|
Meta-Optimizing ML Model Training |
A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space. In ... |
2.67 |
0% |
|
DriveMamba: Task-Centric Scalable State Space Model for Efficient End-to-End Autonomous Driving |
Recent advances towards End-to-End Autonomous Driving (E2E-AD) focus on integrating modular designs into a unified framework for joint optimization. Most of these advances follow a sequential paradigm... |
6.00 |
0% |
|
LinearRAG: Linear Graph Retrieval Augmented Generation on Large-scale Corpora |
Retrieval-Augmented Generation (RAG) is widely used to mitigate hallucinations of Large Language Models (LLMs) by leveraging external knowledge. While effective for simple queries, traditional RAG sys... |
6.00 |
18% |
|
Do We Really Need Permutations? Impact of Width Expansion on Linear Mode Connectivity |
Recently, Ainsworth et al. empirically demonstrated that, given two independently trained models, applying a parameter permutation that preserves the input–output behavior allows the two models to be ... |
5.00 |
10% |
|
DIVER: Large Language Model Decoding with Span-Level Mutual Information Verification |
Large language models (LLMs) have shown impressive capabilities in adapting to various tasks when provided with task-specific instructions. However, LLMs using standard decoding strategies often strug... |
3.33 |
0% |
|
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models |
In recent years, 2D Vision-Language Models (VLMs) have made significant strides in image-text understanding tasks. However, their performance in 3D spatial comprehension, which is critical for embodie... |
5.00 |
9% |
|
IAgent: A Web Search Framework for Noise Isolation and Extended Information Access |
Web search agent frameworks built on Large Language Models (LLMs) can leverage multi-source, real-time external information, demonstrating strong potential. However, for deep research tasks that requi... |
3.50 |
4% |
|
Convergence Theory of Decentralized Diffusion Models via Pseudo-Non-Markov Analysis |
Diffusion probabilistic models (DPMs) have demonstrated remarkable success in generative tasks, supported by a solid foundation of convergence analysis. Recently, decentralized DPMs have been propo... |
5.50 |
0% |
|
MA-RAG: Multi-Agent Retrieval-Augmented Generation via Collaborative Chain-of-Thought Reasoning |
We present MA-RAG, a Multi-Agent framework for Retrieval-Augmented Generation (RAG) that addresses the inherent ambiguities and reasoning challenges in complex information-seeking tasks. Unlike conven... |
3.00 |
64% |
|
Robust Mixture Models for Algorithmic Fairness Under Latent Heterogeneity |
Standard machine learning models optimized for average performance often fail on minority subgroups and lack robustness to distribution shifts. This challenge worsens when subgroups are latent and aff... |
4.00 |
48% |
|
PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier |
Large Language Models (LLMs) have demonstrated impressive capabilities in complex reasoning tasks, yet they still struggle to reliably verify the correctness of their own outputs. Existing solutions t... |
5.50 |
26% |
|
From Single to Dual Reference: Reinforcement-Aligned Multi-Image Instruction-Guided Editing |
Most instruction-guided image editing models assume a single reference image. However, many real-world tasks—such as combining people into a group portrait, integrating a subject into a scene, or tran... |
2.50 |
24% |
|
Shepherd: Pattern-Guided Trajectory Selection for Coding Agents on SWE-Bench |
Despite major improvements in LLM coding agents, their performance on complex software engineering tasks is still limited—leading models to solve only about half of the software engineering tasks in b... |
3.50 |
7% |
|
Be Affective, Not Just Cognitive - Towards Imparting Pertinent Empathy in Dialogue Agents |
Empathetic Response Generation (ERG) has gained significant attention in diverse areas but still faces challenges that hinder its effectiveness. These challenges include 1) the lack of affective emp... |
4.67 |
N/A |
|
Layer-wise dynamic rank for compressing large language models |
Large language models (LLMs) have rapidly scaled in size, bringing severe memory and computational challenges that hinder their deployment. Singular Value Decomposition (SVD)-based compression has eme... |
4.00 |
5% |