№01
cs.AI arxiv:2605.16054v1

Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making

Fan Feng, Selena Ge, Minghao Fu et al.

Ada-Diffuser addresses decision-making by treating it as sequence modeling with diffusion models, but crucially incorporates evolving latent dynamics. The core method is a causal diffusion model that simultaneously learns observed interaction patterns and underlying latent processes from minimal observations. This allo…

9
№02
cs.AI arxiv:2605.16205v1

Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP

Igor Bogdanov, Chung-Horng Lung, Thomas Kunz et al.

This paper investigates how different design choices for compound LLM agents impact performance and cost in adversarial, partially observable environments. The core method involves a controlled study in a cyber defense simulation, systematically varying agent perception, reasoning strategies, and task decomposition. Th…

9
№03
cs.AI arxiv:2605.16113v1

DebiasRAG: A Tuning-Free Path to Fair Generation in Large Language Models through Retrieval-Augmented Generation

Rui Chu, Bingyin Zhao, Thanh Quoc Hung Le et al.

DebiasRAG is a tuning-free framework that uses retrieval-augmented generation (RAG) to dynamically debias large language models (LLMs) without additional training. By retrieving relevant, unbiased information, it mitigates social biases in LLM outputs while preserving their original generative capab…

9
№04
cs.AI arxiv:2605.16233v1

FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast

Igor Bogdanov, Chung-Horng Lung, Thomas Kunz et al.

FORGE is a novel method for improving LLM agent decision-making by evolving natural-language memory without gradient updates. It uses a population-based approach where failed experiences are converted into reusable knowledge (heuristics or demonstrations) by a reflection agent. This memory is then propagated to the pop…

9
№05
cs.AI arxiv:2605.16198v1

Formal Methods Meet LLMs: Auditing, Monitoring, and Intervention for Compliance of Advanced AI Systems

Parand A. Alamdari, Toryn Q. Klassen, Sheila A. McIlraith

This paper bridges formal methods and LLMs to address AI governance. It proposes techniques for auditing and monitoring LLM behavior throughout their lifecycle, enabling the verification of complex, temporally extended constraints like safety and regulatory compliance. The work introduces practical methods for predicti…

9
№06
cs.AI arxiv:2605.16143v1

Look Before You Leap: Autonomous Exploration for LLM Agents

Ziang Ye, Wentao Shi, Yuxin Liu et al.

This paper addresses LLM agents' failure in new environments due to premature action. It introduces "Exploration Checkpoint Coverage" to measure how well agents discover key environmental elements. The core contribution is a training strategy that balances task execution and exploration, leading to the "Explore-then-Ac…

9
№07
cs.AI arxiv:2605.16194v1

paper.json: A Coordination Convention for LLM-Agent-Actionable Papers

Arquimedes Canedo

This paper introduces `paper.json`, a companion JSON file to academic PDFs, designed to improve LLM agent comprehension. Its core method is a set of lightweight conventions for stable claim IDs, explicit "does-not-claim" lists, per-figure shell commands, and stable definition IDs. The main contribution is enabling LLM …
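
To make the four conventions named above concrete, here is a minimal sketch of what such a companion file could look like when built and queried from Python. Every key name, ID format, and value is a hypothetical placeholder chosen for illustration; none of it is taken from the paper's actual schema.

```python
import json

# Hypothetical layout: all keys and values are illustrative placeholders,
# not the schema defined by the paper.
companion = {
    "claims": [{"id": "C1", "text": "placeholder claim text"}],
    "does_not_claim": ["placeholder statement the paper does not assert"],
    "figures": [{"id": "F1", "command": "echo 'placeholder command for figure 1'"}],
    "definitions": [{"id": "D1", "term": "placeholder term", "text": "placeholder definition"}],
}


def lookup(doc: dict, section: str, item_id: str) -> dict:
    """Return the entry with the given stable ID, or raise KeyError."""
    for entry in doc[section]:
        if entry["id"] == item_id:
            return entry
    raise KeyError(f"{section}/{item_id}")


# Serialize as it might be stored next to the PDF.
serialized = json.dumps(companion, indent=2)
```

An agent could then resolve a reference such as `lookup(json.loads(serialized), "claims", "C1")` instead of re-deriving the claim from the PDF text.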

9
№08
cs.AI arxiv:2605.16045v1

RecMem: Recurrence-based Memory Consolidation for Efficient and Effective Long-Running LLM Agents

Zijie Dai, Shiyuan Deng, Sheng Guan et al.

RecMem addresses the inefficiency of LLM agents' memory systems by delaying memory consolidation. Instead of processing every interaction, it stores them in a lightweight "subconscious" layer and only invokes the LLM to extract episodic and semantic memories when recurring, semantically similar interactions are detecte…
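
The idea of deferring consolidation until similar interactions recur can be sketched as follows. This is an illustrative toy, not the authors' implementation: the token-overlap similarity stands in for whatever semantic similarity the system uses, `extract_memory` is a placeholder for the LLM call, and the class name, threshold, and recurrence count are all assumptions.

```python
from dataclasses import dataclass, field


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for embedding-based similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


@dataclass
class RecurrenceBuffer:
    threshold: float = 0.5      # hypothetical similarity cutoff
    min_recurrences: int = 3    # hypothetical group size that triggers consolidation
    clusters: list = field(default_factory=list)      # raw, unconsolidated interactions
    consolidated: list = field(default_factory=list)  # results of the expensive step
    llm_calls: int = 0

    def extract_memory(self, cluster: list) -> str:
        # Placeholder for the LLM-based extraction of episodic/semantic memory.
        self.llm_calls += 1
        return f"summary of {len(cluster)} similar interactions: {cluster[0]}"

    def add(self, interaction: str) -> None:
        # Cheap path: group the interaction with similar earlier ones.
        for i, cluster in enumerate(self.clusters):
            if jaccard(interaction, cluster[0]) >= self.threshold:
                cluster.append(interaction)
                # Expensive path runs only once recurrence is detected.
                if len(cluster) >= self.min_recurrences:
                    self.consolidated.append(self.extract_memory(cluster))
                    del self.clusters[i]
                return
        self.clusters.append([interaction])
```

In this sketch a one-off interaction never reaches the placeholder LLM call; only a group that has recurred `min_recurrences` times is consolidated.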

9
№09
cs.AI arxiv:2605.16217v1

Argus: Evidence Assembly for Scalable Deep Research Agents

Zhen Zhang, Liangcai Su, Zhuo Chen et al.

Argus addresses the inefficiency of current deep research agents by treating evidence gathering as a jigsaw puzzle. Instead of parallelizing redundant searches, its Searcher collects evidence for sub-queries, while a Navigator manages a shared graph, identifying missing pieces and synthesizing the final, source-traced …

8
№10
cs.AI arxiv:2605.16207v1

Confirming Correct, Missing the Rest: LLM Tutoring Agents Struggle Where Feedback Matters Most

Tahreem Yasir, Wenbo Li, Sam Gilson et al.

This paper evaluates LLM tutoring agents' ability to distinguish between correct, suboptimal, and incorrect student reasoning in propositional logic. The core method involves a benchmark with knowledge-graph ground truth, revealing that LLMs excel at identifying optimal steps but struggle significantly with valid-but-s…

8
№11
cs.AI arxiv:2605.16052v1

Reasoners or Translators? Contamination-aware Evaluation and Neuro-Symbolic Robustness in Tax Law

Parisa Kordjamshidi, Samer Aslan, Madhavan Seshadri et al.

This paper investigates whether LLMs truly reason in tax law or simply reproduce answers memorized from contaminated training data. It introduces a contamination detection protocol and a new test suite to evaluate LLMs against neuro-symbolic systems. The findings suggest that legal reasoning is compositional, and neuro-symbolic approach…

8
№12
cs.AI arxiv:2605.16085v1

Towards Foundation Models for Relational Databases with Language Models and Graph Neural Networks

Jingcheng Wu, Ratan Bahadur Thapa, Mojtaba Nayyeri et al.

This paper proposes a hybrid deep learning architecture that combines a fine-tuned BART language model with a GraphSAGE-based Graph Neural Network (GNN) to process relational databases. The core method injects relational context from entity graphs into BART's row embeddings, overcoming limitations of previous task-spec…

8
№13
cs.AI arxiv:2605.16079v1

VideoSeeker: Incentivizing Instance-level Video Understanding via Native Agentic Tool Invocation

Yiming Zhao, Yu Zeng, Wenxuan Huang et al.

VideoSeeker addresses the limitations of text-based prompts in video understanding by introducing a novel paradigm that uses visual prompts for instance-level localization. Its core method involves an agentic reasoning framework that allows the model to proactively perceive and retrieve relevant video segments …

8
№14
cs.AI arxiv:2605.16035v1

Who Owns This Agent? Tracing AI Agents Back to Their Owners

Ruben Chocron, Doron Jonathan Ben Chayim, Eyal Lenga et al.

This paper addresses the problem of agent attribution: harmful AI agents currently cannot be traced back to the accounts that deployed them. The core method involves formalizing this gap and proposing techniques to link observed agent behavior to the responsible account at the hosting vendor. The main contri…

8
№15
cs.CL arxiv:2605.16117v1

SGR: A Stepwise Reasoning Framework for LLMs with External Subgraph Generation

Xin Zhang, Yang Cao, Baoxing Wu et al.

SGR enhances LLM reasoning by generating query-specific subgraphs from external knowledge bases. This framework grounds intermediate reasoning steps in structured knowledge, helping LLMs focus on relevant entities and evidence for more accurate and consistent complex inferences.

8
№16
cs.AI arxiv:2605.16245v1

AI-Mediated Communication Can Steer Collective Opinion

Stratis Tsirtsis, Kai Rawal, Chris Russell et al.

This paper investigates how AI, specifically LLMs, influences collective opinion when mediating human-to-human communication. The core method involves empirical analysis showing LLMs introduce directional biases when editing texts on contested topics, and a theoretical model demonstrating how an AI intermediary can ste…

7
№17
cs.AI arxiv:2605.15942v1

Decomposed Vision-Language Alignment for Fine-Grained Open-Vocabulary Segmentation

Chenhao Wang, Yingrui Ji, Yu Meng et al.

This paper proposes a Decomposed Vision-Language Alignment framework to improve open-vocabulary segmentation. It addresses the challenge of unseen attribute-category combinations by factorizing text prompts into concept and attribute tokens, allowing for separate cross-modal interactions. The core contribution lies in …

7
№18
cs.AI arxiv:2605.15975v1

Learning Bilevel Policies over Symbolic World Models for Long-Horizon Planning

Dillon Z. Chen, Till Hofmann, Toryn Q. Klassen et al.

This paper proposes a bilevel policy approach for long-horizon planning in embodied AI. It combines low-level imitation learning for manipulation with high-level symbolic planning, creating a hierarchical system where a symbolic policy guides a neural policy. This method aims to overcome the limitations of pure imitati…

7
№19
cs.AI arxiv:2605.15963v1

PAGER: Bridging the Semantic-Execution Gap in Point-Precise Geometric GUI Control

Jingxuan Wei, Xi Bai, Shan Liu et al.

PAGER addresses the challenge of precise geometric control in GUI agents, where actions require pixel-level accuracy rather than region tolerance. Its core method involves a topology-aware agent that decomposes construction tasks into dependent steps, ensuring geometric correctness and robustness against cascading erro…

7
№20
cs.AI arxiv:2605.16024v1

ScreenSearch: Uncertainty-Aware OS Exploration

Michael Solodko, Justin Wagle

ScreenSearch tackles the challenge of GUI agents exploring operating system states by addressing partial observability. Its core method combines structural screen retrieval and deduplication with an uncertainty-aware graph-bandit algorithm. The key contribution is a novel ambiguity signal that prioritizes exploring sta…

7