From the arXiv
Monday, 18 May 2026 · 20 papers
Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making
Ada-Diffuser addresses decision-making by treating it as sequence modeling with diffusion models, but crucially incorporates evolving latent dynamics. The core method is a causal diffusion model that simultaneously learns observed interaction patterns and underlying latent processes from minimal observations. This allo…
Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP
This paper investigates how different design choices for compound LLM agents impact performance and cost in adversarial, partially observable environments. The core method involves a controlled study in a cyber defense simulation, systematically varying agent perception, reasoning strategies, and task decomposition. Th…
DebiasRAG: A Tuning-Free Path to Fair Generation in Large Language Models through Retrieval-Augmented Generation
DebiasRAG is a tuning-free framework that uses retrieval-augmented generation (RAG) to dynamically debias large language models (LLMs). By retrieving relevant and unbiased information, it mitigates social biases in LLM outputs while preserving their original generative capab…
FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast
FORGE is a novel method for improving LLM agent decision-making by evolving natural-language memory without gradient updates. It uses a population-based approach where failed experiences are converted into reusable knowledge (heuristics or demonstrations) by a reflection agent. This memory is then propagated to the pop…
Formal Methods Meet LLMs: Auditing, Monitoring, and Intervention for Compliance of Advanced AI Systems
This paper bridges formal methods and LLMs to address AI governance. It proposes techniques for auditing and monitoring LLM behavior throughout their lifecycle, enabling the verification of complex, temporally extended constraints like safety and regulatory compliance. The work introduces practical methods for predicti…
Look Before You Leap: Autonomous Exploration for LLM Agents
This paper addresses LLM agents' failure in new environments due to premature action. It introduces "Exploration Checkpoint Coverage" to measure how well agents discover key environmental elements. The core contribution is a training strategy that balances task execution and exploration, leading to the "Explore-then-Ac…
paper.json: A Coordination Convention for LLM-Agent-Actionable Papers
This paper introduces `paper.json`, a companion JSON file to academic PDFs, designed to improve LLM agent comprehension. Its core method is a set of lightweight conventions for stable claim IDs, explicit "does-not-claim" lists, per-figure shell commands, and stable definition IDs. The main contribution is enabling LLM …
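To make the four conventions concrete, here is a minimal sketch of what such a companion file might contain and how an agent could look up a claim by its stable ID. Every key name, ID, and the shell command below is a hypothetical placeholder chosen for illustration; the summary does not give the actual schema.

```python
import json

# Hypothetical layout: every key name and value here is an illustrative
# assumption, not the schema defined by the paper.
companion = {
    "claims": [
        {"id": "C1", "text": "Example claim text."},
    ],
    "does_not_claim": ["Example statement the paper does not make."],
    "figures": [
        {"id": "fig1", "command": "python make_fig1.py"},
    ],
    "definitions": [
        {"id": "D1", "term": "example term", "text": "Example definition."},
    ],
}


def lookup_claim(doc, claim_id):
    """Return the claim with the given stable ID, or None if it is absent."""
    for claim in doc["claims"]:
        if claim["id"] == claim_id:
            return claim
    return None


# Round-trip through JSON text, as an agent reading the file from disk would.
serialized = json.dumps(companion, indent=2)
restored = json.loads(serialized)
```

Stable IDs let an agent cite `C1` or `D1` directly instead of quoting prose that may shift between PDF revisions.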
RecMem: Recurrence-based Memory Consolidation for Efficient and Effective Long-Running LLM Agents
RecMem addresses the inefficiency of LLM agents' memory systems by delaying memory consolidation. Instead of processing every interaction, it stores them in a lightweight "subconscious" layer and only invokes the LLM to extract episodic and semantic memories when recurring, semantically similar interactions are detecte…
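The deferred-consolidation idea can be sketched as follows. This is not RecMem's implementation: the class name, the word-overlap (Jaccard) similarity, the thresholds, and the `consolidate` callback that stands in for the LLM call are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Word-overlap similarity; a crude stand-in for semantic similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


class DeferredMemory:
    """Hypothetical buffer that consolidates only when interactions recur."""

    def __init__(self, consolidate, threshold=0.5, min_recurrences=2):
        self.consolidate = consolidate      # stand-in for the LLM extraction call
        self.threshold = threshold          # assumed similarity cutoff
        self.min_recurrences = min_recurrences
        self.buffer = []                    # cheap store of raw interactions
        self.consolidated = []              # results of consolidation calls

    def observe(self, interaction):
        """Store the interaction; consolidate if it recurs. Returns True if so."""
        similar = [x for x in self.buffer
                   if jaccard(x, interaction) >= self.threshold]
        self.buffer.append(interaction)
        if len(similar) + 1 >= self.min_recurrences:
            group = similar + [interaction]
            self.consolidated.append(self.consolidate(group))
            # Drop the consolidated items from the raw buffer.
            self.buffer = [x for x in self.buffer if x not in group]
            return True
        return False
```

In this sketch, one-off interactions never trigger the expensive call; only a repeated pattern does.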
Argus: Evidence Assembly for Scalable Deep Research Agents
Argus addresses the inefficiency of current deep research agents by treating evidence gathering as a jigsaw puzzle. Instead of parallelizing redundant searches, its Searcher collects evidence for sub-queries, while a Navigator manages a shared graph, identifying missing pieces and synthesizing the final, source-traced …
Confirming Correct, Missing the Rest: LLM Tutoring Agents Struggle Where Feedback Matters Most
This paper evaluates LLM tutoring agents' ability to distinguish between correct, suboptimal, and incorrect student reasoning in propositional logic. The core method involves a benchmark with knowledge-graph ground truth, revealing that LLMs excel at identifying optimal steps but struggle significantly with valid-but-s…
Reasoners or Translators? Contamination-aware Evaluation and Neuro-Symbolic Robustness in Tax Law
This paper investigates whether LLMs truly reason in tax law or simply regurgitate contaminated training data. It introduces a contamination detection protocol and a novel test suite to evaluate LLMs against neuro-symbolic systems. The findings suggest that legal reasoning is compositional, and neuro-symbolic approach…
Towards Foundation Models for Relational Databases with Language Models and Graph Neural Networks
This paper proposes a hybrid deep learning architecture that combines a fine-tuned BART language model with a GraphSAGE-based Graph Neural Network (GNN) to process relational databases. The core method injects relational context from entity graphs into BART's row embeddings, overcoming limitations of previous task-spec…
VideoSeeker: Incentivizing Instance-level Video Understanding via Native Agentic Tool Invocation
VideoSeeker addresses the limitations of text-based prompts in video understanding by introducing a novel paradigm that uses **visual prompts** for instance-level localization. Its core method involves an **agentic reasoning framework** that allows the model to proactively perceive and retrieve relevant video segments …
Who Owns This Agent? Tracing AI Agents Back to Their Owners
This paper addresses the critical problem of **agent attribution**: harmful AI agents currently cannot be traced back to the accounts that deployed them. The core method involves formalizing this gap and proposing techniques to link observed agent behavior to the responsible account at the hosting vendor. The main contri…
SGR: A Stepwise Reasoning Framework for LLMs with External Subgraph Generation
SGR enhances LLM reasoning by generating query-specific subgraphs from external knowledge bases. This framework grounds intermediate reasoning steps in structured knowledge, helping LLMs focus on relevant entities and evidence for more accurate and consistent complex inferences.
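One simple way to picture a query-specific subgraph is a bounded breadth-first expansion from the entities mentioned in the query. The summary does not say how SGR builds its subgraphs, so the k-hop strategy, the function name, and the toy triples below are assumptions for illustration only.

```python
from collections import deque


def extract_subgraph(triples, seeds, max_hops=1):
    """Collect (head, relation, tail) triples within max_hops of the seed entities."""
    # Index every triple under both of its endpoints.
    neighbors = {}
    for h, r, t in triples:
        neighbors.setdefault(h, []).append((h, r, t))
        neighbors.setdefault(t, []).append((h, r, t))

    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    kept = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop budget
        for triple in neighbors.get(node, []):
            if triple not in kept:
                kept.append(triple)
            h, _, t = triple
            other = t if node == h else h
            if other not in seen:
                seen.add(other)
                frontier.append((other, depth + 1))
    return kept


# Toy knowledge base (illustrative data, not from the paper).
kb = [
    ("Paris", "capital_of", "France"),
    ("France", "member_of", "EU"),
    ("Berlin", "capital_of", "Germany"),
]
one_hop = extract_subgraph(kb, ["Paris"], max_hops=1)
two_hop = extract_subgraph(kb, ["Paris"], max_hops=2)
```

The retained triples could then be serialized into the prompt so that each intermediate reasoning step refers to explicit facts rather than to the whole knowledge base.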
AI-Mediated Communication Can Steer Collective Opinion
This paper investigates how AI, specifically LLMs, influences collective opinion when mediating human-to-human communication. The core method involves empirical analysis showing LLMs introduce directional biases when editing texts on contested topics, and a theoretical model demonstrating how an AI intermediary can ste…
Decomposed Vision-Language Alignment for Fine-Grained Open-Vocabulary Segmentation
This paper proposes a Decomposed Vision-Language Alignment framework to improve open-vocabulary segmentation. It addresses the challenge of unseen attribute-category combinations by factorizing text prompts into concept and attribute tokens, allowing for separate cross-modal interactions. The core contribution lies in …
Learning Bilevel Policies over Symbolic World Models for Long-Horizon Planning
This paper proposes a bilevel policy approach for long-horizon planning in embodied AI. It combines low-level imitation learning for manipulation with high-level symbolic planning, creating a hierarchical system where a symbolic policy guides a neural policy. This method aims to overcome the limitations of pure imitati…
PAGER: Bridging the Semantic-Execution Gap in Point-Precise Geometric GUI Control
PAGER addresses the challenge of precise geometric control in GUI agents, where actions require pixel-level accuracy rather than region tolerance. Its core method involves a topology-aware agent that decomposes construction tasks into dependent steps, ensuring geometric correctness and robustness against cascading erro…
ScreenSearch: Uncertainty-Aware OS Exploration
ScreenSearch tackles the challenge of GUI agents exploring operating system states by addressing partial observability. Its core method combines structural screen retrieval and deduplication with an uncertainty-aware graph-bandit algorithm. The key contribution is a novel ambiguity signal that prioritizes exploring sta…