Daily Issue
Vol. I — No. 6
18 · 05
Monday, 18 May 2026
Generated 2026-05-18 12:51
google/gemini-2.5-flash-lite
There is something terribly morbid in the modern sympathy with pain. One should sympathise with the colour, the beauty, the joy of life. The less said about life's sores the better. — Oscar Wilde
36 items · 3 sections
§ 0

The Morning

Local weather 1
This morning in London: Overcast
Today's range: 14.1° / 8.5° (currently 13.5°)
Feels: 9.6°
Rain: 90%
Wind: 19 km/h
Humid: 60%
Rise: 05:03
Set: 20:49
§ I

From the arXiv

arXiv preprints 10 of 20
cs.AI · arxiv:2605.16054v1 · Lead article

Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making

Fan Feng, Selena Ge, Minghao Fu, Zijian Li, Yujia Zheng

Ada-Diffuser treats decision-making as sequence modeling with diffusion models, and crucially incorporates evolving latent dynamics. The core method is a causal diffusion model that simultaneously learns observed interaction patterns and underlying latent processes from minimal observations. Explicitly accounting for hidden factors that influence agent behavior allows for more precise environment modeling and more effective planning and control.

(a) SCM of the Latent Contextual POMDP. Gray/white nodes are observed/latent variables; green/red edges represent transitions driven by latents/expert policies, respectively. (b) Examples where latents influence either dynamics or rewards (affecting optimal actions).
Figure 1. End-to-end system architecture. The deterministic layer (left) compiles structured context from CybORG observations and assembles the agent prompt. The Planner (right) executes a ReAct loop, optionally delegating to Analyst and ActionChooser sub-agents, before emitting a validated action back to the environment.
cs.AI · arxiv:2605.16205v1

Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP

Igor Bogdanov, Chung-Horng Lung et al.

This paper investigates how different design choices for compound LLM agents impact performance and cost in adversarial, partially observable environments. The core method involves a controlled study in a cyber defense simulation, systematically varying agent …

cs.AI · arxiv:2605.16113v1

DebiasRAG: A Tuning-Free Path to Fair Generation in Large Language Models through Retrieval-Augmented Generation

Rui Chu, Bingyin Zhao et al.

DebiasRAG is a novel, tuning-free framework that uses retrieval-augmented generation (RAG) to dynamically debias large language models (LLMs) without requiring additional training. By retrieving relevant and unbiased information, it mitigates social biases in …

Figure 1. System workflow of DebiasRAG. The workflow consists of three main components. The first stage (Upper Block) involves document preparation and preprocessing, including management of the Avoid Document Repo, along with optional user-provided input documents. The second stage (Middle Block) performs reverse-generation of debiasing performance based on the user's input to establish a baseline for effective real-time operation. The third stage (Lower Block), real-time debias-guided reranking optimization, integrates embedding retrieval, gradient-based reranking, and generation, working dynamically to debias the reasoning and output process of large language models.
Figure 1. System Overview. (Left) Hierarchical ReAct agent with dynamic memory injection. (Right) Reflexion learning loop: upon a reward below threshold, a dedicated Reflector or Exemplifier agent analyzes the full trajectory and synthesizes knowledge artifacts that are injected back into the agent’s memory.
cs.AI · arxiv:2605.16233v1

FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast

Igor Bogdanov, Chung-Horng Lung et al.

FORGE is a novel method for improving LLM agent decision-making by evolving natural-language memory without gradient updates. It uses a population-based approach where failed experiences are converted into reusable knowledge (heuristics or demonstrations) by a…

cs.AI · arxiv:2605.16198v1

Formal Methods Meet LLMs: Auditing, Monitoring, and Intervention for Compliance of Advanced AI Systems

Parand A. Alamdari, Toryn Q. Klassen et al.

This paper bridges formal methods and LLMs to address AI governance. It proposes techniques for auditing and monitoring LLM behavior throughout their lifecycle, enabling the verification of complex, temporally extended constraints like safety and regulatory co…

Figure 1. Overview of Temporal Rule Assessment and Compliance (TRAC): This figure depicts the base TRAC algorithm (inner green box) and TRAC with predictive and intervening capabilities (\( \text{TRAC}_{\text{P+I}} \)) (outer blue box). An AI agent interacts with an environment over time, producing a sequence of inputs (from the environment) and outputs (from the agent). The Labeler extracts atomic propositions from the sequence of inputs and outputs so far, which then are used by the Monitor to progressively evaluate the monitoring objective (i.e., a behavioral pattern represented as an LTL formula). The Predictor estimates the risk of future violations, enabling the Intervenor to modify the agent's inputs or substitute its outputs before an undesirable outcome occurs.
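For readers unfamiliar with runtime monitoring, the Labeler-to-Monitor step in the caption above can be illustrated with a small sketch. It is not the TRAC authors' implementation: it assumes formula progression, a standard technique for evaluating an LTL formula one step at a time, and every name in it (`progress`, `monitor`, the tuple encoding of formulas, the proposition `leak`) is hypothetical. Each trace element stands for the set of atomic propositions a Labeler might extract at that step.

```python
# Illustrative sketch only: LTL monitoring by formula progression.
# Formulas are nested tuples (an assumed encoding):
#   ("ap", name), ("not", ("ap", name)), ("and", f, g), ("or", f, g),
#   ("next", f), ("always", f), ("eventually", f), or a plain bool.

def _and(a, b):
    if a is False or b is False:
        return False
    if a is True:
        return b
    if b is True:
        return a
    return ("and", a, b)

def _or(a, b):
    if a is True or b is True:
        return True
    if a is False:
        return b
    if b is False:
        return a
    return ("or", a, b)

def progress(f, props):
    """Rewrite f after one step in which exactly `props` are true."""
    if isinstance(f, bool):
        return f
    op = f[0]
    if op == "ap":
        return f[1] in props
    if op == "not":  # negation is supported on atomic propositions only
        return f[1][1] not in props
    if op == "and":
        return _and(progress(f[1], props), progress(f[2], props))
    if op == "or":
        return _or(progress(f[1], props), progress(f[2], props))
    if op == "next":
        return f[1]
    if op == "always":  # G f  ==  f and X (G f)
        return _and(progress(f[1], props), f)
    if op == "eventually":  # F f  ==  f or X (F f)
        return _or(progress(f[1], props), f)
    raise ValueError(f"unknown operator: {op!r}")

def monitor(formula, trace):
    """Feed labelled steps to the formula; stop once a verdict is reached."""
    for step, props in enumerate(trace):
        formula = progress(formula, props)
        if formula is False:
            return ("violated", step)
        if formula is True:
            return ("satisfied", step)
    return ("undetermined", len(trace))

# Hypothetical objective: the agent never leaks ("always not leak").
never_leak = ("always", ("not", ("ap", "leak")))
print(monitor(never_leak, [{"read"}, {"write"}, {"leak"}]))
```

The monitor reports a violation at the first step where `leak` appears and stays undetermined on any finite trace that never contains it, since an "always" objective cannot be confirmed from a finite prefix. The Predictor and Intervenor described in the caption are outside the scope of this sketch.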
№06
cs.AI
9

Look Before You Leap: Autonomous Exploration for LLM Agents

Ziang Ye, Wentao Shi et al.

This paper addresses LLM agents' failure in new environments due to premature action. It introduces "Exploration Checkpoint Coverage" to measure how well agents discover key enviro…

№07
cs.AI
9

paper.json: A Coordination Convention for LLM-Agent-Actionable Papers

Arquimedes Canedo

This paper introduces `paper.json`, a companion JSON file to academic PDFs, designed to improve LLM agent comprehension. Its core method is a set of lightweight conventions for sta…

№08
cs.AI
9

RecMem: Recurrence-based Memory Consolidation for Efficient and Effective Long-Running LLM Agents

Zijie Dai, Shiyuan Deng et al.

RecMem addresses the inefficiency of LLM agents' memory systems by delaying memory consolidation. Instead of processing every interaction, it stores them in a lightweight "subconsc…

№09
cs.AI
8

Argus: Evidence Assembly for Scalable Deep Research Agents

Zhen Zhang, Liangcai Su et al.

Argus addresses the inefficiency of current deep research agents by treating evidence gathering as a jigsaw puzzle. Instead of parallelizing redundant searches, its Searcher collec…

№10
cs.AI
8

Confirming Correct, Missing the Rest: LLM Tutoring Agents Struggle Where Feedback Matters Most

Tahreem Yasir, Wenbo Li et al.

This paper evaluates LLM tutoring agents' ability to distinguish between correct, suboptimal, and incorrect student reasoning in propositional logic. The core method involves a ben…

§ II

The Town Square

Hacker News 7
compiled overnight by google/gemini-2.5-flash-lite · end of issue no. 6 · thank you for reading