№01
cs.AI arxiv:2605.07926v1

AgentEscapeBench: Evaluating Out-of-Domain Tool-Grounded Reasoning in LLM Agents

Zhengkang Guo, Yiyang Li, Lin Qiu et al.

This paper introduces AgentEscapeBench, a novel benchmark designed to evaluate LLM agents' ability to perform out-of-domain, tool-grounded reasoning with long-range dependencies. The benchmark uses escape-room-style tasks requiring agents to infer and execute complex tool-use procedures, demonstrating a significant per…

9
№02
cs.AI arxiv:2605.08037v1

Beyond Pairs: Your Language Model is Secretly Optimizing a Preference Graph

Ning Liu, Chuanneng Sun, Kristina Klinkner et al.

This paper introduces GraphDPO, a generalization of Direct Preference Optimization (DPO) that handles preference data structured as graphs, rather than just pairs. By optimizing a graph-structured objective, GraphDPO leverages richer preference information, enforces transitivity, and avoids issues arising from collapsi…
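
As background for the summary above, the standard pairwise DPO loss that GraphDPO is said to generalize can be sketched as follows. This is a minimal illustration of ordinary DPO only; the graph-structured objective is not reproduced, since the summary is truncated. The function name, argument names, and the default `beta` value are assumptions chosen for illustration.

```python
import math


def dpo_pair_loss(logp_w: float, logp_l: float,
                  ref_logp_w: float, ref_logp_l: float,
                  beta: float = 0.1) -> float:
    """Standard pairwise DPO loss for one (preferred, rejected) pair.

    logp_*     : log-probability of the response under the policy being trained
    ref_logp_* : log-probability of the same response under the frozen reference
    beta       : temperature on the implicit reward (illustrative default)
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) == softplus(-margin), computed in a stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# When the policy matches the reference, the margin is 0 and the loss is log(2).
loss_neutral = dpo_pair_loss(-1.0, -1.0, -1.0, -1.0)
# When the policy favors the preferred response more than the reference does,
# the loss falls below log(2).
loss_better = dpo_pair_loss(-1.0, -3.0, -2.0, -2.0)
```

Each call covers a single preferred/rejected pair, which is the setting the entry describes GraphDPO as moving beyond.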

9
№03
cs.AI arxiv:2605.08060v1

The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents

Jiayuan Liu, Tianqin Li, Shiyi Du et al.

This paper introduces the "memory curse," demonstrating that expanding LLM agents' context windows can paradoxically *decrease* cooperation in multi-agent social dilemmas. The core method involves extensive testing across various LLMs and games, revealing that increased memory leads to a decline in forward-looking coop…

9
№04
cs.AI arxiv:2605.07990v1

Tool Calling is Linearly Readable and Steerable in Language Models

Zekun Wu, Ze Wang, Seonglae Cho et al.

This paper demonstrates that language models' tool-calling decisions are linearly encoded within their internal activations. By manipulating the difference in average activations between tool representations, researchers can reliably steer the model to select a different tool. This discovery also allows for pre-executi…
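
The difference-of-average-activations idea in the summary above can be illustrated with a small sketch. It assumes activations are available as plain lists of floats and that steering means adding a scaled direction to a hidden state; the function names, the `alpha` scale, and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Column-wise mean of equal-length activation vectors."""
    if not rows:
        raise ValueError("need at least one activation vector")
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(acts_source: Sequence[Sequence[float]],
                    acts_target: Sequence[Sequence[float]]) -> Vector:
    """Difference of mean activations: mean(target tool) - mean(source tool)."""
    mu_source = mean_vector(acts_source)
    mu_target = mean_vector(acts_target)
    return [t - s for s, t in zip(mu_source, mu_target)]


def steer(hidden: Sequence[float], direction: Sequence[float],
          alpha: float = 1.0) -> Vector:
    """Shift a hidden state along the steering direction (alpha is assumed)."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy 2-d activations recorded while the model picked each of two tools.
acts_search = [[1.0, 0.0], [3.0, 0.0]]  # mean is [2.0, 0.0]
acts_calc = [[0.0, 2.0], [0.0, 4.0]]    # mean is [0.0, 3.0]

direction = steering_vector(acts_search, acts_calc)
steered = steer([2.0, 0.0], direction)
```

In this toy case the mean "search" activation is moved exactly onto the mean "calc" activation; in a real model the layer, the token position, and the scale would all have to be chosen empirically.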

9
№05
cs.LG arxiv:2605.07840v1

RelAgent: LLM Agents as Data Scientists for Relational Learning

Xingyue Huang, Louis Tichelman, Jinwoo Kim et al.

RelAgent is an LLM-based autonomous data scientist for relational learning. It first uses LLM agents with workspace tools to automatically generate SQL feature programs and select a predictive model. The contribution is a two-phase approach that results in fast, interpretable, and scalable predictors composed of SQL qu…

9
№06
cs.CL arxiv:2605.07982v1

GLiGuard: Schema-Conditioned Classification for LLM Safeguard

Urchade Zaratiana, Mary Newhauser, George Hurn-Maloney et al.

GLiGuard reformulates LLM content moderation as a classification problem, moving away from slow, generation-based guardrails. Its core method uses a small, schema-conditioned bidirectional encoder to process task definitions and label semantics directly as structured tokens. This allows for efficient, simultaneous eval…

9
№07
cs.CL arxiv:2605.07933v1

How to Train Your Latent Diffusion Language Model Jointly With the Latent Space

Viacheslav Meshchaninov, Alexander Shabalin, Egor Chimbulatov et al.

This paper introduces the Latent Diffusion Language Model (LDLM), which jointly trains an encoder, diffusion model, and decoder for non-autoregressive text generation. The core method involves reshaping pre-trained language model representations into a latent space suitable for denoising and decoding. The key contribut…

9
№08
cs.CL arxiv:2605.07925v1

How Value Induction Reshapes LLM Behaviour

Arnav Arora, Natalie Schluter, Katherine Metcalf et al.

This paper investigates how fine-tuning Large Language Models (LLMs) with specific values impacts their behavior. The core method involves fine-tuning models on curated value subsets and measuring changes in other value expressions, safety, and performance. The key contribution is demonstrating that value induction can…

9
№09
cs.AI arxiv:2605.07830v1

CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios

Taein Lim, Seongyong Ju, Munhyeok Kim et al.

This paper introduces CyBiasBench, a benchmark designed to quantify attack-selection bias in LLM agents used for cybersecurity. The core method involves evaluating five LLM agents across various scenarios to reveal their tendency to disproportionately focus on specific attack families, independent of prompt variations.…

8
№10
cs.AI arxiv:2605.08063v1

Flow-OPD: On-Policy Distillation for Flow Matching Models

Zhen Fang, Wenxuan Huang, Yu Zeng et al.

Flow-OPD addresses bottlenecks in multi-task flow matching models by using on-policy distillation. It first trains specialized "teacher" models for individual tasks, then distills their expertise into a single "student" model through a novel two-stage alignment process. This approach aims to overcome reward sparsity an…

8
№11
cs.AI arxiv:2605.07865v1

KL for a KL: On-Policy Distillation with Control Variate Baseline

Minjae Oh, Sangjun Song, Gyubin Choi et al.

This paper introduces vOPD, a method to stabilize On-Policy Distillation (OPD) for large language models. It achieves this by framing OPD as policy-gradient reinforcement learning and incorporating a control variate baseline, specifically a value function. The key contribution is that this value function has a closed-f…
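
The role of a control variate baseline in a policy-gradient estimator, as mentioned in the summary above, can be shown on a toy problem. This sketch does not reproduce vOPD or its closed-form value function, which the truncated summary does not spell out; it uses the expected reward as a generic baseline, and the three-action softmax policy and reward values are assumptions made for illustration.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def grad_log_prob(probs: List[float], action: int) -> List[float]:
    """Gradient of log pi(action) w.r.t. the logits of a softmax policy."""
    return [(1.0 if k == action else 0.0) - p for k, p in enumerate(probs)]


def estimator_stats(probs: List[float], rewards: List[float],
                    baseline: float) -> Tuple[List[float], float]:
    """Exact mean and total variance of (r(a) - baseline) * grad log pi(a)."""
    dim = len(probs)
    samples = [[(rewards[a] - baseline) * g for g in grad_log_prob(probs, a)]
               for a in range(dim)]
    mean = [sum(probs[a] * samples[a][k] for a in range(dim))
            for k in range(dim)]
    var = sum(probs[a] * sum((samples[a][k] - mean[k]) ** 2 for k in range(dim))
              for a in range(dim))
    return mean, var


probs = softmax([0.0, 0.0, 0.0])   # uniform toy policy over three actions
rewards = [10.0, 11.0, 12.0]       # toy per-action rewards (assumed)
expected_reward = sum(p * r for p, r in zip(probs, rewards))

mean_plain, var_plain = estimator_stats(probs, rewards, baseline=0.0)
mean_base, var_base = estimator_stats(probs, rewards, baseline=expected_reward)
```

Both estimators have the same expected gradient, because the score function has zero mean under the policy, while the baselined one has much lower variance in this example.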

8
№12
cs.AI arxiv:2605.08013v1

Learning CLI Agents with Structured Action Credit under Selective Observation

Haoyang Su, Ying Wen

This paper introduces a novel approach for training command-line interface (CLI) agents by leveraging the inherent structure of CLI actions. To address challenges of partial observation and sparse rewards, it proposes $σ$-Reveal to selectively extract relevant context and Action Advantage Assignment to better attribute…

8
№13
cs.AI arxiv:2605.08019v1

Reason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learners

Botos Csaba, Sreejan Kumar, Austin Tudor David Andrews et al.

This paper investigates whether advanced Large Reasoning Models (LRMs) can replicate human learning and planning in novel video games. By analyzing human gameplay with fMRI data, the study finds that LRMs better match human learning behaviors and predict brain activity compared to reinforcement learning agents. This su…

8
№14
cs.AI arxiv:2605.07935v1

TraceFix: Repairing Agent Coordination Protocols with TLA+ Counterexamples

Shuren Xia, Qiwei Li, Taqiya Ehsan et al.

TraceFix is a verification-first pipeline that uses TLA+ model checking to automatically repair LLM multi-agent coordination protocols. An LLM agent synthesizes a protocol, generates TLA+ logic, and iteratively refines it using counterexamples until verified. This verified protocol is then compiled into system prompts,…

8
№15
cs.LG arxiv:2605.07863v1

ADKO: Agentic Decentralized Knowledge Optimization

Lucas Nerone Rillo, Zhanhong Jiang, Nastaran Saadati et al.

ADKO is a framework for collaborative black-box optimization among autonomous agents. Its core method involves each agent maintaining a private Gaussian Process surrogate and communicating only through "knowledge tokens," which are compressed summaries of their findings. This approach achieves sample efficiency, privac…

8
№16
cs.LG arxiv:2605.07961v1

Graph Representation Learning Augmented Model Manipulation on Federated Fine-Tuning of LLMs

Hanlin Cai, Kai Li, Houtianfu Wang et al.

This paper proposes an Augmented Model Manipulation (AugMP) strategy to attack federated fine-tuning (FFT) of LLMs. The core method uses graph representation learning to understand benign model updates and generate more effective and stealthy malicious updates. The contribution is a novel attack that leverages these in…

8
№17
cs.LG arxiv:2605.07977v1

Self-Play Enhancement via Advantage-Weighted Refinement in Online Federated LLM Fine-Tuning with Real-Time Feedback

Seohyun Lee, Wenzhi Fang, Dong-Jun Han et al.

This paper introduces SPEAR, an online federated learning algorithm for LLMs that enhances self-play. SPEAR leverages real-time user feedback to create advantage-weighted contrastive pairs, enabling efficient fine-tuning on resource-constrained edge devices without requiring privileged ground-truth data. Its core contr…

8
№18
cs.CL arxiv:2605.07937v1

Ask Early, Ask Late, Ask Right: When Does Clarification Timing Matter for Long-Horizon Agents?

Anmol Gulati, Hariom Gupta, Elias Lumer et al.

This paper investigates when clarification is most valuable for long-horizon AI agents. It introduces a framework for injecting clarifications at different stages of execution and finds that the optimal timing depends on the type of missing information. Specifically, goal clarifications are most effective early on, while i…

8
№19
cs.CL arxiv:2605.07883v1

Beyond "I cannot fulfill this request": Alleviating Rigid Rejection in LLMs via Label Enhancement

Ying Zhang, Congyu Qiao, Xin Geng et al.

This paper introduces LANCE, a method to reduce "rigid rejection" in LLMs by enhancing safety labels. LANCE uses variational inference to predict a continuous distribution of rejection categories, providing nuanced gradients that allow LLMs to neutralize harmful prompt elements and generate safer, more natural response…

8
№20
cs.CL arxiv:2605.08083v1

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

Tong Zheng, Haolin Liu, Chengsong Huang et al.

This paper introduces AutoTTS, a framework that uses an agentic approach to automatically discover optimal test-time scaling (TTS) strategies for large language models. Instead of manual tuning, AutoTTS creates environments where TTS strategies can be learned efficiently by synthesizing controllers that decide how to a…

8