A Diffusion-Based Data Augmentation Approach for Synthetic Human Portraits Dataset
Deep learning models have achieved remarkable success in computer vision. However, their generalizability remains limited when applied to new tasks. Data augmentation can help mitigate this issue, but...
0.00 · 23%

Towards a Collaborative Memory for Agentic Workflow: Breaking the Prefix Barrier with Segment-Level KV Cache Sharing
In LLM-based multi-agent systems, the Key-Value (KV) cache serves as a critical carrier of agents' working memory, and its efficient reuse is paramount for enhancing the service throughput and infere...
4.00 · 31%

Flow Matching with Semidiscrete Couplings
Flow models parameterized as time-dependent velocity fields can generate data from noise by integrating an ODE. These models are often trained using flow matching, i.e. by sampling random pairs of noi...
5.00 · 0%

Towards Omnidirectional Reasoning: A Dataset, Benchmark, and GRPO-based Method
Omnidirectional images (ODIs), with their 360° field of view, provide unparalleled spatial awareness for immersive applications like augmented reality and embodied AI. However, the capability of exist...
2.50 · 25%

Conformal Non-Coverage Risk Control (CNCRC): Risk-Centric Guarantees for Predictive Safety in High-Stakes Settings
Standard Conformal Prediction (CP) guarantees that prediction sets contain the true label with high probability, but it is *cost-blind*, treating all errors as equally important---a critical limitatio...
4.50 · 31%

Revisiting Spectral Representations in Generative Diffusion Models
Diffusion models have shown remarkable performance on diverse generation tasks. Recent work finds that imposing representation alignment on the hidden states of diffusion networks can both facilitate ...
3.50 · 4%

Outrageously Large Context Windows via RACE Attention -- A Family of Non-Linear Attention that can be calculated in Strictly Linear-Time
Quadratic attention strains memory and time, and even FlashAttention on a GH200 (96 GB) cannot complete a single forward–backward step of a multi-head attention layer at sequence lengths over one mi...
4.00 · 0%

CamoDocs: Poisoning Attack against Retrieval-Augmented Language Models
As retrieval-augmented generation (RAG) grows in popularity for compensating the knowledge cutoff of pretrained language models, its security concerns have also increased: RAG retrieves external docum...
3.00 · 0%

The Sword of DamocleSpeech: Demystifying Jailbreaking Attack in Discrete Token-based Speech Large Language Models
Speech Large Language Models (SpeechLLMs) and Omni models have recently achieved remarkable progress in human-like dialogue, prosody, and expressive emotion. However, due to fragmented architectures...
2.50 · 0%

Co-Evolving Agents: Learning from Failures as Hard Negatives
The rapid progress of large foundation models has accelerated the development of task-specialized agents across diverse domains. However, the effectiveness of agents remains tightly coupled with the q...
4.00 · 24%

Estimating Markov Chain Transition Probabilities for Steady Aging Models from $n$-step Data
Usage-dependent aging processes of products such as batteries have become increasingly important. However, in general, revealing aging dynamics of discrete-time Markov Decision Processes (MDPs) from sp...
4.00 · 0%

Bayesian Combinatorial Lottery Ticket Machine: Bayes Meets Extremal Combinatorics
Inspired by the lottery ticket hypothesis (LTH) suggesting that the redundancy of neural networks (NNs) is useful for regression tasks, this paper demonstrates that the redundancy suggested by \emph{e...
5.50 · 0%

WeFT: Weighted Entropy-driven Fine-Tuning for dLLMs
Diffusion models have recently shown strong potential in language modeling, offering faster generation compared to traditional autoregressive approaches. However, applying supervised fine-tuning (SFT)...
4.00 · 6%

The Spatial Blindspot of Vision-Language Models
Vision-language models (VLMs) have advanced rapidly, but their ability to capture spatial relationships remains a critical blindspot. Current VLMs are typically built with contrastive language-image p...
3.20 · 5%

LOSI: Improving Multi-agent Reinforcement Learning via Latent Opponent Strategy Identification
In collaborative Multi-Agent Reinforcement Learning (MARL), agents must contend with non-stationarity introduced not only by teammates’ concurrent decisions but also by partially observable and divers...
3.00 · 52%

Kaleidoscopic Teaming in Multi Agent Simulations
Warning: This paper contains content that may be inappropriate or offensive.
AI agents have gained significant recent attention due to their autonomous tool usage capabilities and their integration i...
4.00 · 0%

PairBench: Are Vision-Language Models Reliable at Comparing What They See?
Understanding how effectively large vision language models (VLMs) compare visual inputs is crucial across numerous applications, yet this fundamental capability remains insufficiently assessed. While ...
3.00 · 8%

Less Is More: Generating Time Series with LLaMA-Style Autoregression in Simple Factorized Latent Spaces
Generative models for multivariate time series are essential for data augmentation, simulation, and privacy preservation, yet current state-of-the-art diffusion-based approaches are slow and limited t...
3.00 · 34%

Cross-Modal Obfuscation for Jailbreak Attacks on Large Vision-Language Models
Large Vision-Language Models (LVLMs) demonstrate exceptional performance across multimodal tasks, yet remain vulnerable to jailbreak attacks that bypass safety mechanisms. Existing jailbreak methods s...
2.67 · 55%

GaussianTrim3R: Controllable 3D Gaussians Pruning for Feedforward models
Feed-forward methods offer a promising paradigm for novel-view synthesis, eliminating computationally expensive per-scene optimization. However, current feed-forward approaches typically predict a fix...
5.50 · 10%

Simplify In-Context Learning
Traditional in-context learning (ICL) enhances the performance and capability of large language models (LLMs) primarily by optimizing decomposition strategies, reformatting, and ordering. However, wh...
3.00 · 6%

One-Token Verification for Reasoning LLMs, Anytime, Anywhere
Reasoning large language models (LLMs) have recently achieved breakthrough performance on complex tasks such as mathematical problem solving. A widely used strategy to further improve their performanc...
3.50 · 11%

More Than a Snapshot: Forcing Temporal Reasoning in Video Segmentation
Video Reasoning Segmentation (VRS) inherits the settings of reasoning based on world knowledge and spatial contents, lacking queries demanding temporal reasoning according to the unique temporal dynam...
3.50 · 0%

DTR: Towards optimal token compression with data-driven token ranking for efficient visual-language model inference
Token compression is crucial for vision-language model (VLM) inference due to its tremendous computational complexity. Although substantial work with various model-driven methods has been done to ...
4.67 · 0%

Improving Time Complexity of Sparsification Algorithms
We improve the time complexity of spectral sparsification algorithms, such as Batson, Spielman and Srivastava (BSS-2009), used for iteratively computing spectral sparsifiers of n-vertex graphs or, more ge...
3.00 · 0%

ConvRot: Rotation-Based Plug-and-Play 4-bit Quantization for Diffusion Transformers
Diffusion models have demonstrated strong capabilities in generating high-quality images. However, as model size increases, the growing memory footprint and inference latency pose significant challeng...
3.00 · 8%

JailNewsBench: Multi-Lingual and Regional Benchmark for Fake News Generation under Jailbreak Attacks
Fake news undermines societal trust and decision-making across politics, economics, health, and international relations, and in extreme cases threatens human lives and societal safety. Because fake ne...
4.50 · 4%

LLMTrace: A Corpus for Classification and Fine-Grained Localization of AI-Written Text
The widespread use of human-like text from Large Language Models (LLMs) necessitates the development of robust detection systems. However, progress is limited by a critical lack of suitable training d...
3.00 · 23%

Detect, Decide, Unlearn: A Transfer-Aware Framework for Continual Learning
Continual learning (CL) aims to continuously learn new tasks from data streams. While most CL research focuses on mitigating catastrophic forgetting, memorizing outdated knowledge can cause negative t...
5.00 · 12%

MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning
Vision language models (VLMs) are increasingly deployed as controllers with access to external tools for complex reasoning and decision-making, yet their effectiveness remains limited by the scarcity ...
N/A · 43%

Universal Model Routing for Efficient LLM Inference
Model routing is a simple technique for reducing the inference cost of large language models (LLMs), wherein one maintains a pool of candidate LLMs, and learns to route each prompt to the smallest fea...
6.50 · 0%

LLM-HFR-RL: Large Language Model (LLM)-Driven Cross-Modal Fine-Grained Alignment and Reinforcement Learning for the Prediction of Heart Failure Risk
Predicting Heart Failure Risk (HFR) using electronic health records (EHR) and generating actionable clinical decisions face significant challenges, including integrating multimodal data, modeling long...
1.50 · 39%

Read the Room: Video Social Reasoning with Mental-Physical Causal Chains
"Read the room," or the ability to infer others’ mental states from subtle social cues, is a hallmark of human social intelligence but remains a major challenge for current AI systems. Existing social...
5.50 · 11%

GeoBench: Rethinking Multimodal Geometric Problem-Solving via Hierarchical Evaluation
Geometric problem solving constitutes a critical branch of mathematical reasoning, requiring precise analysis of shapes and spatial relationships. Current evaluations of geometric reasoning in vision-...
5.50 · 9%

ECHO: Efficient Coarse-Grained Hybrid Optimization — Clip at Batch, Learn at Token
Reinforcement learning (RL) for large language models (LLMs) typically employs token-level clipping of importance sampling ratios to ensure training stability. While effective at preventing catastroph...
3.50 · 61%

Randomly Sampled Language Reasoning Problems Elucidate Limitations of In-Context Learning
While LLMs have revolutionized the field of machine learning due to their high performance on a strikingly wide range of problems, they are also known to hallucinate false answers and underperform on ...
3.50 · 0%

AgentXploit: End-to-End Red-Teaming for AI Agents Powered by Multi-Agent Systems
AI agents, powered by Large Language Models (LLMs), are vulnerable to indirect prompt injection attacks, where malicious data from external tools and data sources can manipulate agent behavior. Existing...
3.33 · 34%

Through BabyAI Steps: Understanding and Evaluating Grounded Intelligence in LLMs
Does spatial prediction translate to spatial planning in LLMs? We investigate this question through a controlled experimental test bed using a textual adaptation of the procedurally generated BabyAI g...
3.50 · 36%

ViRL-TSC: Enhancing Reinforcement Learning with Vision-Language Models for Context-Aware Traffic Signal Control
In real-world urban environments, traffic signal control (TSC) must maintain stability and efficiency under highly uncertain and dynamically changing traffic conditions. Although reinforcement learnin...
4.00 · 28%

IUT-Plug: A Plug-in tool for Interleaved Image-Text Generation
Existing vision language models (VLMs), including GPT-4 and DALL·E, often struggle to preserve logic, object identity, and style in multimodal image-text generation. This limitation significantly hind...
4.00 · 64%

Latent Feature Alignment: Discovering Biased and Interpretable Subpopulations in Face Recognition Models
Modern face recognition models achieve high overall accuracy but continue to exhibit systematic biases that disproportionately affect certain subpopulations. Conventional bias evaluation frameworks re...
4.50 · 35%

Allusive Adversarial Examples via Latent Space in Multimodal Large Language Models
Multimodal large language models (MLLMs) generate text by conditioning on heterogeneous inputs such as images and text. We present allusive adversarial examples, a new class of attacks that impercepti...
4.50 · 5%

Chimera: Diagnosing Shortcut Learning in Visual-Language Understanding
Diagrams convey symbolic information in a visual format rather than a linear stream of words, making them especially challenging for AI models to process. While recent evaluations suggest that vision-...
3.00 · 26%

InfoTok: Adaptive Discrete Video Tokenizer via Information-Theoretic Compression
Accurate and efficient discrete video tokenization is essential for processing long video sequences. Yet, the inherent complexity and variable information density of videos present a significant bottl...
7.33 · 0%

PDE Solvers Should Be Local: Fast, Stable Rollouts with Learned Local Stencils
Neural operator models for solving partial differential equations (PDEs) often rely on global mixing mechanisms—such as spectral convolutions or attention—which tend to oversmooth sharp local dynamics...
2.50 · 56%

Collaborative Deterministic–Probabilistic Forecasting for Diverse Spatiotemporal Systems
Probabilistic forecasting is crucial for real-world spatiotemporal systems, such as climate, energy, and urban environments, where quantifying uncertainty is essential for informed, risk-aware decisio...
4.00 · 9%

Hierarchical Multi-Stage Recovery Framework for Kronecker Compressed Sensing
In this paper, we study the Kronecker compressed sensing problem, which focuses on recovering sparse vectors from linear measurements obtained using the Kronecker product of two or more matrices. We ...
6.00 · 0%

Efficient and Sharp Off-Policy Learning under Unobserved Confounding
We develop a novel method for personalized off-policy learning in scenarios with unobserved confounding. We thereby address a key limitation of standard policy learning: standard policy learning assu...
5.33 · 0%

HELLoRA: Hot Experts Layer-level Low-Rank Adaptation for MOE Model
Low-Rank Adaptation (LoRA) has become the dominant paradigm for Parameter-Efficient Fine-Tuning (PEFT) of large language models. However, most prior work focuses on dense architectures. In contrast, M...
4.00 · 0%

Credal Graph Neural Networks for Robust Uncertainty Quantification
Uncertainty quantification is essential for deploying reliable Graph Neural Networks (GNNs), where existing approaches primarily rely on Bayesian inference or ensembles. In this paper, we introduce th...
4.50 · 8%