2026-05-11 — Linnet Daily

Marriage is our last, best chance to grow up. — Barth, Joseph

34 items · 3 sections

§I arXiv Papers (20) §II Hacker News (5) §III GitHub Trending (9)

§ 0

The Morning

Local weather 1

This morning in London: light drizzle

Today's range: 11.8° ↓ 5.1° (currently 10.1°)
Feels: 7.0°
Rain: 73%
Wind: 12 km/h
Humid: 64%
Rise: 05:14
Set: 20:39

§ I

From the arXiv

arXiv preprints 10 of 20

cs.AI · arxiv:2605.07926v1 · Lead article

AgentEscapeBench: Evaluating Out-of-Domain Tool-Grounded Reasoning in LLM Agents

Zhengkang Guo, Yiyang Li, Lin Qiu, Xiaohua Wang, Jingwen Xv

This paper introduces AgentEscapeBench, a benchmark that evaluates LLM agents' ability to perform out-of-domain, tool-grounded reasoning with long-range dependencies. The benchmark uses escape-room-style tasks that require agents to infer and execute complex tool-use procedures, and it shows a significant performance drop for both humans and LLMs as dependency depth increases. Its core contribution is a challenging, automated evaluation of robust agent reasoning beyond simple tool interactions.

Conceptual illustration of AgentEscapeBench. The agent is placed in a themed escape room populated with unfamiliar tools and hidden items. It must explore the environment, invoke tools with correct parameters derived from narrative clues, and propagate intermediate outputs through a multi-step dependency chain to unlock the final exit.

GraphDPO pipeline for LLM alignment. For each prompt, the policy samples $K$ rollouts, which are grouped into equivalence classes according to preference signals. These classes induce a DAG structure whose edges encode dominance relations between groups, with an optional ground-truth node as a global anchor. Equivalence-class masking removes intra-group comparisons so that each response is contrasted only with strictly worse groups via a local Plackett–Luce loss. The resulting losses are aggregated over the graph to update the policy while enforcing transitive preference structure.

cs.AI · arxiv:2605.08037v1

Beyond Pairs: Your Language Model is Secretly Optimizing a Preference Graph

Ning Liu, Chuanneng Sun et al.

This paper introduces GraphDPO, a generalization of Direct Preference Optimization (DPO) that handles preference data structured as graphs, rather than just pairs. By optimizing a graph-structured objective, GraphDPO leverages richer preference information, en…
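As a rough illustration of the masked, group-wise comparison described in the pipeline caption, here is a minimal sketch. It is based only on this digest's wording, not on the authors' code. The function name `masked_pl_loss`, the scalar `scores`, and the integer `ranks` (lower means a better equivalence class) are all hypothetical choices made for the example.

```python
import math


def masked_pl_loss(scores, ranks):
    """Sum of per-response Plackett-Luce-style terms (illustrative sketch).

    scores[i]: hypothetical scalar score for response i (for example, an
               implicit reward).
    ranks[i]:  hypothetical equivalence-class index, lower is better.

    Each response competes only against responses in strictly worse
    classes; responses in the same class are never compared (the masking).
    """
    total = 0.0
    for s_i, r_i in zip(scores, ranks):
        worse = [s_j for s_j, r_j in zip(scores, ranks) if r_j > r_i]
        if not worse:
            continue  # nothing strictly worse, so no loss term
        pool = [s_i] + worse
        m = max(pool)  # subtract the max for a stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in pool))
        total += log_z - s_i  # equals -log softmax(s_i) over the pool
    return total


# Two responses in different classes: the pairwise logistic special case.
pairwise = masked_pl_loss([1.0, 0.0], [0, 1])

# Two responses in the same class: fully masked, so the loss is zero.
tied = masked_pl_loss([1.0, 5.0], [0, 0])
```

With exactly two responses in different classes, each term reduces to the pairwise logistic form log(1 + exp(-(s_better - s_worse))), which is the shape of the usual pairwise preference objective. How the paper aggregates terms over the graph and uses the optional ground-truth anchor is not specified in this digest, so the sketch leaves both out.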

cs.AI · arxiv:2605.08060v1

The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents

Jiayuan Liu, Tianqin Li et al.

This paper introduces the "memory curse," demonstrating that expanding LLM agents' context windows can paradoxically *decrease* cooperation in multi-agent social dilemmas. The core method involves extensive testing across various LLMs and games, revealing that…

Schematic of repeated social dilemma interactions between two LLM agents with shared memory.

Overview of the three-stage circuit and steering demonstration. Adding a mean-difference vector redirects tool selection and automatically restructures arguments. Validated across 12 IT models in 3 families (Gemma 3, Qwen 3 / Qwen 2.5, Llama 3.1; 270M–27B).

cs.AI · arxiv:2605.07990v1

Tool Calling is Linearly Readable and Steerable in Language Models

Zekun Wu, Ze Wang et al.

This paper demonstrates that language models' tool-calling decisions are linearly encoded within their internal activations. By manipulating the difference in average activations between tool representations, researchers can reliably steer the model to select …
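For readers unfamiliar with mean-difference steering, the following is a minimal sketch of the general technique using plain Python lists. It is not the paper's implementation. The function names, the toy two-dimensional activations, and the `alpha` scale are assumptions made for illustration. In practice the vectors would be hidden states taken from a chosen layer of the model.

```python
def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(acts_target, acts_source):
    """Mean activation for the target tool minus mean for the source tool."""
    mu_target = mean_vector(acts_target)
    mu_source = mean_vector(acts_source)
    return [t - s for t, s in zip(mu_target, mu_source)]


def steer(hidden, direction, alpha=1.0):
    """Shift one hidden state along the steering direction (alpha is hypothetical)."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy activations (hypothetical): two prompts per tool, two dimensions each.
acts_source_tool = [[1.0, 0.0], [3.0, 0.0]]
acts_target_tool = [[0.0, 2.0], [0.0, 4.0]]

direction = steering_vector(acts_target_tool, acts_source_tool)
steered = steer([2.0, 0.0], direction)
```

In this toy case the source-tool mean is `[2.0, 0.0]` and the target-tool mean is `[0.0, 3.0]`, so adding the difference moves a hidden state at the source mean exactly onto the target mean. Real activations are high-dimensional, and the effect on model output has to be measured empirically.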

cs.LG · arxiv:2605.07840v1

RelAgent: LLM Agents as Data Scientists for Relational Learning

Xingyue Huang, Louis Tichelman et al.

RelAgent is an LLM-based autonomous data scientist for relational learning. It first uses LLM agents with workspace tools to automatically generate SQL feature programs and select a predictive model. The contribution is a two-phase approach that results in fas…

RelAgent. During the search phase, an LLM agent iteratively proposes and refines a feature program consisting of SQL feature queries $\{q_1, \dots, q_n\}$ and a predictive model configuration $\varphi$ to solve a given task. The agent uses three tools: (1) database exploration via read-only SQL exploration queries, (2) program validation by executing candidate programs on a validation set and receiving performance metrics, and (3) inspection of past trials in the Evaluation Workspace via evaluation queries. Once a final program is selected, the agent is no longer needed at inference time.

GLiGuard: Schema-Conditioned Classification for LLM Safeguard

Urchade Zaratiana, Mary Newhauser et al.

GLiGuard reformulates LLM content moderation as a classification problem, moving away from slow, generation-based guardrails. Its core method uses a small, schema-conditioned bidir…

How to Train Your Latent Diffusion Language Model Jointly With the Latent Space

Viacheslav Meshchaninov, Alexander Shabalin et al.

This paper introduces the Latent Diffusion Language Model (LDLM), which jointly trains an encoder, diffusion model, and decoder for non-autoregressive text generation. The core met…

How Value Induction Reshapes LLM Behaviour

Arnav Arora, Natalie Schluter et al.

This paper investigates how fine-tuning Large Language Models (LLMs) with specific values impacts their behavior. The core method involves fine-tuning models on curated value subse…

CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios

Taein Lim, Seongyong Ju et al.

This paper introduces CyBiasBench, a benchmark designed to quantify attack-selection bias in LLM agents used for cybersecurity. The core method involves evaluating five LLM agents …

Flow-OPD: On-Policy Distillation for Flow Matching Models

Zhen Fang, Wenxuan Huang et al.

Flow-OPD addresses bottlenecks in multi-task flow matching models by using on-policy distillation. It first trains specialized "teacher" models for individual tasks, then distills …

§ II

The Town Square

Hacker News 5

1299 pts · Top story

Local AI needs to be the norm

The article argues that local AI, running on user devices, should become the standard for privacy, security, and user control, rather than relying on cloud-based solutions.

unix.foo · 10 May

270

Maryland citizens hit with $2B power grid upgrade for out-of-state AI

tomshardware.com · 10 May

209

An AI coding agent, used to write code, needs to reduce your maintenance costs

jamesshore.com · 10 May

167

PS3 Emulator Devs Politely Ask That People Stop Flooding It with AI PRs

kotaku.com · 10 May

110

Chrome's AI features may be hogging 4GB of your computer storage