Learning from Few Samples with Language-Model Guidance
We consider the problem of learning a classifier from a small set of high-dimensional datapoints, with access to domain knowledge from a language model or human expert. How should such domain knowledg...
4.67 |
2% |
|
VUGEN: Visual Understanding priors for GENeration |
Recent advances in Vision-Language Models (VLMs) have enabled unified understanding across text and images, yet equipping these models with robust image generation capabilities remains challenging. E... |
4.00 |
0% |
|
PRISON: Unmasking the Criminal Potential of Large Language Models |
As large language models (LLMs) advance, concerns about their misconduct in complex social contexts intensify. Existing research has overlooked the systematic assessment of LLMs’ criminal potential in... |
5.33 |
34% |
|
Towards One-step Causal Video Generation via Adversarial Self-Distillation |
Recent hybrid video generation models combine autoregressive temporal dynamics with diffusion-based spatial denoising, but their sequential, iterative nature leads to error accumulation and long infer... |
6.00 |
5% |
|
LEMUR: Leveraging Vision-Language Models for Fine-Grained Multimodal Retrieval |
Fine-grained multimodal retrieval is crucial for many real-world applications. For example, e-commerce product search demands retrieving the product with the most relevant image and description based ...
3.00 |
0% |
|
Synthetic History: Evaluating Visual Representations of the Past in Diffusion Models |
As Text-to-Image (TTI) diffusion models become increasingly influential in content creation, growing attention is being directed toward their societal and cultural implications. While prior research h... |
6.00 |
36% |
|
SFedPO: Streaming Federated Learning with a Prediction Oracle under Temporal Shifts |
Federated Learning (FL) enables decentralized clients to collaboratively train a global model without sharing raw data. However, most existing FL frameworks assume that clients train on static local d... |
4.50 |
16% |
|
Memory Makes The Poison: Over Memorization Drives Visual Poisoning in LVLMs |
**The poison is not the pixels.** Large Vision–Language Models (LVLMs) excel across tasks, yet their safety and security remain underexplored. Among threats, \textit{visual perturbation–based data poi... |
2.00 |
0% |
|
LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training |
Digital agents require diverse, large-scale UI trajectories to generalize across real-world tasks, yet collecting such data is prohibitively expensive in terms of human annotation, infrastructure, and engineering p...
3.00 |
0% |
|
Rethinking Transformer Inputs for Time-Series via Neural Temporal Embedding |
Transformer-based models, originally introduced in the field of natural language processing (NLP), have recently demonstrated strong performance in time-series forecasting. Due to the order-agnostic n... |
3.00 |
5% |
|
On the Limits of Test-Time Compute: Sequential Reward Filtering for Better Inference |
Test-time compute (TTC) has become an increasingly prominent paradigm for enhancing large language models (LLMs). Despite the empirical success of methods such as best-of-$n$ (BoN) sampling and sequen... |
4.50 |
0% |
|
Automated Architecture Synthesis for Arbitrarily Structured Neural Networks |
This paper proposes a novel perspective on the architecture of Artificial Neural Networks (ANNs). Conventional ANNs often adopt predefined tree-like or Directed Acyclic Graph (DAG) structures for simp... |
3.00 |
4% |
|
WaterSearch: A Quality-Aware Search-based Watermarking Framework for Large Language Models |
In the era of large language models (LLMs), watermarking serves as a crucial safeguard for ensuring accountability, authenticity, and trust in machine-generated text. Text generated by LLMs can be ide... |
4.00 |
5% |
|
Bridge Policy: Visuomotor Policy Learning via Stochastic Optimal Control |
Imitation learning has been widely used in robotic learning, where policies are derived from expert demonstrations. Recent advances leverage generative models, such as diffusion and flow-based methods... |
3.50 |
0% |
|
RegionReasoner: Region-Grounded Multi-Round Visual Reasoning |
Large vision-language models have achieved remarkable progress in visual reasoning, yet most existing systems rely on single-step or text-only reasoning, limiting their ability to iteratively refine u... |
5.00 |
18% |
|
Generative Model via Quantile Assignment |
Deep generative models (DGMs) play two central roles in modern machine learning: (i) producing new information (e.g., image synthesis, data augmentation, and creative content generation) and (ii) redu...
5.00 |
8% |
|
In Agents We Trust, but Who Do Agents Trust? Latent Preferences Steer LLM Generations |
Large Language Model (LLM) based agents are increasingly being deployed as user-friendly front-ends on online platforms, where they filter, prioritize, and recommend information retrieved from the pla... |
4.50 |
0% |
|
Towards Effective MLLM Jailbreaking Through Balanced On-Topicness and OOD-Intensity |
Multimodal large language models (MLLMs) are widely used in vision-language reasoning tasks. However, their vulnerability to adversarial prompts remains a serious concern, as safety mechanisms often f... |
4.50 |
15% |
|
RelDiff: Relational Data Generative Modeling with Graph-Based Diffusion Models |
Real-world databases are predominantly relational, comprising multiple interlinked tables that contain complex structural and statistical dependencies.
Learning generative models on relational data h... |
4.67 |
3% |
|
Improving Medical Visual Reinforcement Fine-Tuning via Perception and Reasoning Augmentation |
While recent advances in Reinforcement Fine-Tuning (RFT) have shown that rule-based reward schemes can enable effective post-training for large language models, their extension to cross-modal, vision-... |
4.00 |
5% |
|
Unmasking Backdoors: An Explainable Defense via Gradient-Attention Anomaly Scoring for Pre-trained Language Models |
Pre-trained language models have achieved remarkable success across a wide range of natural language processing (NLP) tasks, particularly when fine-tuned on large, domain-relevant datasets. However, t... |
5.33 |
24% |
|
Optimal Pricing for Bundles: Using Submodularity in Offline and Online Settings |
We study revenue-maximizing bundle pricing under a cardinality constraint: in each offer the seller chooses a bundle $S\subseteq[n]$ with $|S|\le k$ and posts a single price $p(S)$. Buyers have unknow... |
4.00 |
4% |
|
ProSAR: Prototype-Guided Semantic Augmentation and Refinement for Time Series Contrastive Learning |
Contrastive learning has advanced the representation learning of vision, language, and graphs, yet its success hinges greatly on the data augmentation that helps preserve semantic contents while provi... |
4.50 |
22% |
|
DAG-Math: Graph-Guided Mathematical Reasoning in LLMs |
Large Language Models (LLMs) demonstrate strong performance on mathematical problems when prompted with Chain-of-Thought (CoT), yet it remains unclear whether this success stems from search, rote proc... |
6.00 |
0% |
|
Evaluating LLM In-Context Few-Shot Learning on Legal Entity Annotation Task |
The emergence of Large Language Models (LLMs) has attracted attention due to their powerful in-context few-shot learning capability. Recent studies present significant results regarding its usage in d... |
2.50 |
0% |
|
UMCI: A Unified Counterfactual Framework for Robust Vision-Language Reasoning |
Integrating Large Language Models into vision-language frameworks has led to the rise of powerful Large Vision-Language Models (LVLMs). However, this integration introduces two critical robustness cha... |
4.50 |
0% |
|
RM-R1: Reward Modeling as Reasoning |
Reward modeling is essential for aligning large language models with human preferences through reinforcement learning. To provide accurate reward signals, a reward model (RM) should stimulate deep thi... |
5.00 |
5% |
|
Improving and Accelerating Offline RL in Large Discrete Action Spaces with Structured Policy Initialization |
Reinforcement learning in combinatorial action spaces requires searching over exponentially many joint actions to simultaneously select multiple sub-actions that form coherent combinations. Existing a... |
5.00 |
5% |
|
Robust Strength Behavior Modeling of Coarse-Grained Soils Using HSIC-Guided Stable Learning |
Coarse-grained soils are widely employed in infrastructure construction, and capturing their strength behavior is vital for ensuring the structural integrity of engineering systems. In recent years, a... |
2.40 |
27% |
|
GLYPH-SR: Can We Achieve Both High-Quality Image Super-Resolution and High-Fidelity Text Recovery via VLM-Guided Latent Diffusion Model? |
Image super-resolution (SR) is fundamental to many vision systems, from surveillance and autonomy to document analysis and retail analytics, because recovering high-frequency details, especially scene-t...
5.00 |
10% |
|
Value-Anchored Group Policy Optimization for Flow Models |
Group Relative Policy Optimization (GRPO) has proven highly effective in enhancing the alignment capabilities of Large Language Models (LLMs). However, current adaptations of GRPO for the flow matchin... |
3.50 |
15% |
|
LLMs Can Generate a Better Answer by Aggregating Their Own Responses |
Large Language Models (LLMs) have shown remarkable capabilities across tasks, yet they often require additional prompting techniques when facing complex problems. While approaches like self-correction... |
2.00 |
16% |
|
Self-Guidance: Training VQ-VAE Decoders to be Robust to Quantization Artifacts for High-Fidelity Neural Speech Codec |
Neural speech codecs, predominantly based on Vector-Quantized Variational Autoencoders (VQ-VAEs), serve as fundamental audio tokenizers for speech large language models (SLLMs). However, their reconst... |
5.00 |
32% |
|
Efficient Algorithms for Adversarially Robust Approximate Nearest Neighbor Search |
We study the Approximate Nearest Neighbor (ANN) problem under a powerful adaptive adversary that controls both the dataset and a sequence of $Q$ queries.
For the high-dimensional regime $d = \omega(\... |
4.80 |
0% |
|
SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation
Pre-trained diffusion models provide rich multi-scale latent features and are emerging as powerful vision backbones. While recent works such as Marigold and Lotus adapt diffusion priors for dense pred... |
4.00 |
5% |
|
Unsupervised Behavioral Tokenization and Action Quantization via Maximum Entropy Mixture Policies with Minimum Entropy Components |
A fundamental problem in reinforcement learning is how to learn a concise discrete set of behaviors that can be easily composed to solve any downstream task. An effective "tokenization" of behavior re... |
4.50 |
0% |
|
CAREFL: Context-Aware Recognition of Emotions with Federated Learning |
Emotion recognition from images is a challenging task due to its dependence on subtle visual cues and contextual information. Recent advances in Vision-Language Models (VLMs) have demonstrated strong ... |
4.00 |
66% |
|
When Greedy Wins: Emergent Exploitation Bias in Meta-Bandit LLM Training |
While Large Language Models (LLMs) hold promise to become autonomous agents, they often explore suboptimally in sequential decision-making. Recent work has sought to enhance this capability via superv... |
6.00 |
0% |
|
Improving Generalizability and Undetectability for Targeted Adversarial Attacks on Multimodal Pre-trained Models |
Multimodal pre-trained models (e.g., ImageBind), which align distinct data modalities into a shared embedding space, have shown remarkable success across downstream tasks. However, their increasing ad... |
4.00 |
0% |
|
Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models |
Multimodal Large Language Models have demonstrated impressive capabilities across tasks, yet they often exhibit difficulty in distinguishing task-relevant from irrelevant signals—particularly in tasks... |
5.00 |
16% |
|
Optimizing the Ineffable: Generative Policy Learning for Human-Centered Decision-Making |
Algorithmic decision-making is widely adopted in high-stakes applications affecting our daily lives but often requires human decision-makers to exercise their discretion within the process to ensure a... |
4.50 |
0% |
|
Certifying Robustness of Agent Tool-Selection Under Adversarial Attacks |
Large language models (LLMs) are increasingly deployed in agentic systems where they map user intents to relevant external tools to fulfill a task. A critical step in this process is tool selection, w... |
4.50 |
41% |
|
Gaussian Belief Propagation Network for Depth Completion |
Depth completion aims to predict a dense depth map from a color image with sparse depth measurements. Although deep learning methods have achieved state-of-the-art (SOTA) performance, effectively handling the spa...
4.50 |
10% |
|
All Patches Matter, More Patches Better: Enhance AI-Generated Image Detection via Panoptic Patch Learning |
The rapid proliferation of AI-generated images (AIGIs) highlights the pressing demand for generalizable detection methods. In this paper, we establish two key principles for the AIGI detection task throug...
5.50 |
3% |
|
IQA-Octopus: Unified Multi-Granularity Image Quality Assessment with Reasoning, Grounding and Referring |
We present IQA-Octopus, the first image quality assessment (IQA) framework that unifies reasoning, grounding, and referring. Built upon large multi-modality models (LMMs), IQA-Octopus is designed to p... |
4.00 |
0% |
|
Hallucination Mitigation in Large Vision-Language Models via Adaptive Multi-Subspace Projection |
Recent advances in large vision-language models (LVLMs) have enabled powerful multimodal reasoning by integrating visual encoders with large language models (LLMs). However, their reliability is frequ... |
4.00 |
53% |
|
Deformable Contact-Aware 3D Object Placement |
We study language-guided object placement in real 3D scenes when contact is \emph{deformable and frictional}. Rather than guessing a rigid pose that “looks right,” we cast placement as a \emph{drop-to... |
3.00 |
25% |
|
Reframing Dense Action Detection (RefDense): A New Perspective on Problem Solving and a Novel Optimization Strategy |
In dense action detection, we aim to detect multiple co-occurring actions. However, action classes are often ambiguous, as they share overlapping sub-components. We argue that the dual challenges of t... |
3.50 |
0% |
|
Grid-Based Evolutionary Algorithm for Multi-Objective Molecule Generation Enhanced by Reinforcement Learning |
Fragment-based drug discovery (FBDD) is limited by the need to construct and maintain static fragment libraries. To overcome this limitation, we propose a novel evolutionary framework. Our method sta...
4.00 |
25% |
|
ChartAlignBench: A Benchmark for Chart Grounding & Dense Alignment |
Charts play important roles in visualization, reasoning, and communication in data analysis and idea exchange between humans. However, vision-language models (VLMs) still lack accurate understanding o... |
2.00 |
0% |