|
CLAMP: A Chebyshev-Weighted Multi-Gradient Approach for Multi-Objective LLM Alignment |
Alignment in large language models (LLMs) is crucial for enhancing their capabilities to align with human preferences. To date, many existing alignment approaches, such as reinforcement learning from... |
3.33 |
0% |
|
Perception-R1: Advancing Multimodal Reasoning Capabilities of MLLMs via Visual Perception Reward |
Enhancing the multimodal reasoning capabilities of Multimodal Large Language Models (MLLMs) is a challenging task that has attracted increasing attention in the community. Recently, several studies ha... |
6.00 |
0% |
|
CMPS: Constrained Mixed Precision Search |
The increasing complexity of deep neural networks (DNNs) requires effective model compression to reduce their computational and memory footprints for deployment on resource-constrained hardware. Mixed... |
2.67 |
24% |
|
Unmasking the Tiny: Foreground Probing for Small Object Detection |
Detecting small objects in high-resolution images is challenging, as small targets are often overwhelmed by the surrounding background and thus prone to being missed or misclassified. To address this ... |
4.00 |
27% |
|
SpEmoC: Large-Scale Multimodal Dataset for Speaking Segment Emotion Insights |
Understanding human emotions in spoken conversations is a key challenge in affective computing, with applications in empathetic AI, human-computer interaction, and mental health monitoring. Existing d... |
4.67 |
68% |
|
Transformers Trained via Gradient Descent Can Provably Learn a Class of Teacher Models |
Transformers have achieved great success across a wide range of applications, yet the theoretical foundations underlying their success remain largely unexplored. To demystify the strong capacities of ... |
4.50 |
0% |
|
Generalizable and Consistent Granular Edge Prediction |
We introduce a new task in edge detection: Granular Edge Prediction. Unlike traditional binary edge maps, this task aims to predict a categorical edge map, where each edge pixel is assigned a granular... |
5.00 |
5% |
|
None to Optima in Few Shots: Bayesian Optimization with MDP Priors |
Bayesian Optimization (BO) is an efficient tool for optimizing black-box functions, but its theoretical guarantees typically hold in the asymptotic regime. In many critical real-world applications suc... |
4.00 |
0% |
|
Constructing coherent spatial memory in LLM agents through graph rectification |
Given a map description through global traversal navigation instructions (e.g., visiting each room sequentially with action signals such as north, west, etc.), an LLM can often infer the implicit spat... |
3.33 |
66% |
|
REFLEX-Med: Reinforcement for Label-Free Explainability in Unified Medical Reasoning |
Clinicians urgently need explanations they can audit, not merely fluent chains. Yet prevailing practices conflate interpretability with subjective human/LLM rationales, with post-hoc visuals loosely a... |
3.67 |
17% |
|
TimeSqueeze: Dynamic Patching for Efficient Time Series Forecasting |
Recent progress in time series forecasting has produced large foundation models with strong generalization across domains. However, many of these models rely on transformer backbones, making their eff... |
4.29 |
24% |
|
GenCompositor: Generative Video Compositing with Diffusion Transformer |
Video compositing combines live-action footage to create video production, serving as a crucial technique in video creation and film production. Traditional pipelines require intensive labor efforts a... |
5.00 |
0% |
|
CooperTrim: Adaptive Data Selection for Uncertainty-Aware Cooperative Perception |
Cooperative perception enables autonomous agents to share encoded representations over wireless communication to enhance each other’s live situational awareness. However, the tension between the limit... |
5.33 |
7% |
|
Probing to Refine: Reinforcement Distillation of LLM Reasoners via Explanatory Inversion |
Distilling robust reasoning capabilities from large language models (LLMs) into smaller, computationally efficient student models remains an unresolved challenge. Despite recent advances, distilled mo... |
5.67 |
22% |
|
BridgeRAG: A Framework for Reasoning over Partitioned Knowledge Graphs |
Existing Knowledge Graph-based RAG (Retrieval-Augmented Generation) systems face a fundamental dilemma in multi-document scenarios. They either treat each document as an isolated knowledge graph, whic... |
2.67 |
46% |
|
Robust Detection of Directional Adversarial Attacks in Deep Neural Networks for Radiological Imaging |
Deep learning is now central to radiology, helping detect changes on X-rays, CTs, and MRIs. However, these systems are highly vulnerable to adversarial attacks - small, crafted perturbations that misl... |
2.50 |
1% |
|
Towards Generalizable Implicit In-Context Learning with Attention Routing |
Implicit in-context learning (ICL) has newly emerged as a promising paradigm that simulates ICL behaviors in the representation space of Large Language Models (LLMs), aiming to attain few-shot perform... |
5.00 |
0% |
|
Bridging ML and algorithms: comparison of hyperbolic embeddings |
Hyperbolic embeddings are well-studied both in the machine learning and algorithm community. However, as the research proceeds independently in those two communities, comparisons and even awareness se... |
4.00 |
0% |
|
Large Language Model Guided Dynamic Branching Rule Scheduling in Branch-and-Bound |
Branch-and-bound (B&B) is a core technique in state-of-the-art mixed integer linear program (MILP) solvers. It reformulates an MILP into a systematic tree search and recursively partitions it into su... |
4.00 |
20% |
|
Theory of Scaling Laws for In-Context Regression: Depth, Width, Context and Time |
We study in-context learning (ICL) of linear regression in a deep linear self-attention model, characterizing how performance depends on various computational and statistical resources (width, depth, ... |
5.50 |
0% |
|
Think Twice, Act Once: Token-Aware Compression and Action Reuse for Efficient Inference in Vision-Language-Action Models |
Vision-Language-Action (VLA) models have emerged as a powerful paradigm for robot control through natural language instructions. However, their high inference cost—stemming from large-scale token comp... |
4.00 |
0% |
|
Catching the Details: Self-Distilled RoI Predictors for Fine-Grained MLLM Perception |
Multimodal Large Language Models (MLLMs) require high-resolution visual information to perform fine-grained perception, yet processing entire high-resolution images is computationally prohibitive. Wh... |
5.50 |
7% |
|
From EduVisBench to EduVisAgent: A Benchmark and Multi-Agent Framework for Reasoning-Driven Pedagogical Visualization |
While foundation models (FMs), such as diffusion models and large vision-language models (LVLMs), have been widely applied in educational contexts, their ability to generate pedagogically effective vi... |
4.40 |
41% |
|
Align Human Camouflaged Perception: Visual Refocus Reinforcement Fine-Tuning |
Current multi-modal models exhibit a notable misalignment with the human visual system when identifying objects that are visually assimilated into the background. Our observations reveal that these mu... |
4.50 |
28% |
|
Multimodal Policy Internalization for Conversational Agents |
Modern conversational agents such as ChatGPT and Alexa+ have become indispensable in everyday life. To handle diverse business requirements and enable agentic capabilities, these LLM-based systems oft... |
7.33 |
0% |
|
FATE: Focal-modulated Attention Encoder for Multivariate Time-series Forecasting |
Accurate multivariate time-series forecasting is crucial for understanding and mitigating the effects of climate change, as reliable long-horizon predictions support effective monitoring and informed ... |
4.00 |
N/A |
|
Robustify Spiking Neural Networks via Dominant Singular Deflation under Heterogeneous Training Vulnerability |
Spiking Neural Networks (SNNs) process information via discrete spikes, enabling them to operate at remarkably low energy levels. However, our experimental observations reveal a striking vulnerability... |
5.50 |
4% |
|
An Optimal Algorithm for Marginalization in Bayesian Networks |
We study the problem of marginalization in Bayesian networks: given a Bayesian network $G=(V, E)$ and nodes $S$ we wish to marginalize, what is the most compact Bayesian network $G'$ over nodes $V \se... |
3.50 |
0% |
|
Background Matters: Robust 3D Human Pose Estimation via Controllable Video Generation |
Deep learning models for 3D human pose estimation (HPE) often fail to generalize across domains with varying environments, camera setups, or data distributions. We address this challenge with a contro... |
3.33 |
21% |
|
STITCH: Simultaneous Thinking and Talking with Chunked Reasoning for Spoken Language Models |
Spoken Language Models (SLMs) are designed to take speech inputs and produce spoken responses. However, current SLMs lack the ability to perform an internal, unspoken thinking process before respondin... |
5.00 |
0% |
|
Beyond Accuracy: Measuring Reward Variance as a Predictive Benchmark for RLHF |
Reward models (RMs) provide the core signal in reinforcement learning from human feedback (RLHF). However, most evaluations focus on pairwise accuracy and overlook how the separability and concentrati... |
4.00 |
21% |
|
Monitoring Decomposition Attacks with Lightweight Sequential Monitors |
As LLMs become more agentic, a critical risk emerges: attackers can decompose harmful goals into stateful, benign subtasks that trick LLM agents into executing them without realizing the harmful inten... |
4.50 |
0% |
|
Inductive Reasoning for Temporal Knowledge Graphs with Emerging Entities |
Reasoning on Temporal Knowledge Graphs (TKGs) is essential for predicting future events and time-aware facts. While existing methods are effective at capturing relational dynamics, their performance i... |
6.00 |
69% |
|
EVALUESTEER: Measuring Reward Model Steerability Towards Values and Preferences |
As large language models (LLMs) are deployed globally, creating pluralistic systems that can accommodate the diverse preferences and values of users worldwide becomes essential. We introduce EVALUESTE... |
2.67 |
6% |
|
G-Verifier: Geometric Verifier for Robust 3D Point Cloud Semantic Search with Spatial Relation Reasoning |
Semantic search in 3D point clouds is a fundamental task for Spatial Intelligence and embodied AI, yet it becomes particularly challenging when queries involve precise spatial relationships and curren... |
2.67 |
13% |
|
IDSPACE: A Model-Guided Synthetic Identity Document Generation Framework and Dataset |
To address the challenges in the lack of data for evaluating identity document fraud detection models provided by vendors or merchants, we propose IDSPACE, a cost-effective framework for generating hi... |
4.00 |
0% |
|
IBiT: Utilizing Inductive Biases to Create a More Data Efficient Attention Mechanism |
In recent years, Transformer-based architectures have become the dominant method for Computer Vision applications. While Transformers are explainable and scale well with dataset size, they lack the in... |
0.50 |
0% |
|
AALawyer: A Generative Retrieval-Augmented Large Language Model System for Legal Reasoning |
With the growing potential of large language models (LLMs) in the legal domain, an increasing number of specialized legal models are being developed and introduced. Among them, domain-specific finetun... |
3.50 |
0% |
|
Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks? |
AI agents are vulnerable to indirect prompt injection attacks, where malicious instructions embedded in external content or tool outputs cause unintended or harmful behavior. Inspired by the well-esta... |
3.50 |
0% |
|
Explain in Your Own Words: Improving Reasoning via Token-Selective Dual Knowledge Distillation |
Knowledge Distillation (KD) can transfer the reasoning abilities of large models to smaller ones, which can reduce the costs to generate Chain-of-Thoughts for reasoning tasks. KD methods typically ask... |
5.00 |
0% |
|
Revisiting the Role of Homophily in Fair Graph Representation Learning |
Graph Neural Networks (GNNs) can propagate sensitive signals via message passing, especially on homophilous graphs where edges preferentially connect nodes sharing sensitive attributes. We revisit fai... |
3.50 |
28% |
|
Geometric and Information Compression of Representations in Deep Learning |
Deep neural networks transform input data into latent representations that support a wide range of downstream tasks. These representations can be characterized along information-theoretic and geometri... |
4.00 |
8% |
|
STDACN: a Spatiotemporal Prediction Framework based on Dynamic and Adaptive Convolution Networks |
With the rapid advancement of sensor technologies, analyzing and modeling large spatiotemporal datasets has become crucial, enabling system state predictions for intelligent transportation, urban plan... |
3.00 |
6% |
|
Simple Stepsizes for Quasi-Newton Methods with Global Convergence Guarantees |
Quasi-Newton methods are widely used for solving convex optimization problems due to their ease of implementation, practical efficiency, and strong local convergence guarantees. However, their global ... |
3.00 |
0% |
|
SAS-Bench: A Fine-Grained Benchmark for Evaluating Short Answer Scoring with Large Language Models |
Short Answer Scoring (SAS) is a critical task in automated subjective answer grading, playing an essential role in education, standardized testing, and large-scale assessment systems. However, existin... |
4.50 |
6% |
|
PSC: Efficient Grammar-Constrained Decoding via Parser Stack Classification |
LLMs are widely used to generate structured output like source code or JSON. Grammar-constrained decoding (GCD) can guarantee the syntactic validity of the generated output, by masking out tokens that... |
3.50 |
0% |
|
APEX: One-Step High-Resolution Image Synthesis |
The pursuit of efficient text-to-image synthesis has driven the field toward a few-step generation paradigm, yet this endeavor is hampered by a persistent trilemma: achieving high fidelity, inference ... |
5.50 |
22% |
|
Beyond Pairwise Modeling: Towards Efficient and Robust Trajectory Similarity Computation via Representation Learning |
Accurate trajectory similarity computation is crucial in ride-sharing applications, where trajectories of varying lengths need to be aligned into a uniform representation. Existing methods suffer from... |
4.00 |
31% |
|
Shape-Adaptive Guidance Signal for Interactive Cortical Sulcal Labeling |
Image segmentation is a fundamental task in image data analysis that assigns a semantic label to enhance the understanding of imaging data. In the context of neuroimaging data, the accurate labeling o... |
4.00 |
0% |
|
Libra-Emo: A Large Dataset for Multimodal Fine-grained Negative Emotion Detection |
The recognition of negative emotions is pivotal in numerous real-world applications, including public opinion analysis, customer service, emotional attribution, and emotional support systems, where th... |
5.00 |
15% |