Daily Issue
Vol. I — No. 7
19 · 05
Tuesday, 19 May 2026
Generated 2026-05-19 10:42
google/gemini-2.5-flash-lite
There is nothing better than being a parent. It is the most challenging job one could ever ask for. I love being a mom and I love being a friend to my children as well. — Marlee Matlin
39 items · 3 sections
§ 0

The Morning

Local weather 1
This morning in
London
Overcast
Today's range
17.4° / 11.1°
currently 15.0°
Feels
13.0°
Rain
100%
Wind
17 km/h
Humid
80%
Rise
05:02
Set
20:51
§ I

From the arXiv

arXiv preprints 10 of 20
cs.AI · arxiv:2605.18661v1 · Lead article

AI for Auto-Research: Roadmap & User Guide

Lingdong Kong, Xian Sun, Wei Chow, Linfeng Li, Kevin Qinghong Lin

This paper analyzes the AI research lifecycle, from idea generation to dissemination, and identifies a critical boundary between reliable AI assistance and unreliable autonomy. While AI excels at structured tasks like literature review and data generation, it struggles with nuanced aspects such as avoiding fabricated results, identifying errors, and assessing novelty, particularly under scientific pressure. The authors provide a roadmap and user guide to navigate these capabilities and limitations.

AI auto-research across the complete lifecycle. We organize AI assistance into four phases and eight stages: (1) Creation spans idea generation, literature review, coding & experiments, and tables & figures; (2) Writing centers on paper writing; (3) Validation includes peer review and rebuttal & revision; and (4) Dissemination transforms papers into posters, slides, videos, social media, project pages, and interactive paper agents.
Taxonomy of code as agent harness.
cs.AI · arxiv:2605.18747v1

Code as Agent Harness

Xuying Ning, Katherine Tieu et al.

This paper introduces "code as agent harness," a new perspective on how large language models (LLMs) are used in agentic systems. The core method is to view code not just as an output, but as the fundamental infrastructure for agent reasoning, action, and envi…

cs.AI · arxiv:2605.18672v1

Position: A Three-Layer Probabilistic Assume-Guarantee Architecture Is Structurally Required for Safe LLM Agent Deployment

S. Bensalem, Y. Dong et al.

This paper argues that LLM agent safety requires a three-layer probabilistic architecture, not a single one. Each layer enforces a distinct safety dimension (intent, environment, dynamics) using independently certified probabilistic guarantees, which then form…

Overview of SkillGenBench. Skill-generation pipelines transform repository- and document-grounded sources into standardized skill packages, which are evaluated under task-conditioned and task-agnostic tracks with fixed execution checks and artifact-level diagnostics.
cs.AI · arxiv:2605.18693v1

SkillGenBench: Benchmarking Skill Generation Pipelines for LLM Agents

Yifan Zhou, Zhentao Zhang et al.

This paper introduces SkillGenBench, a novel benchmark designed to evaluate the crucial ability of LLM agents to generate correct and reusable skills from raw data. Unlike previous benchmarks, SkillGenBench specifically isolates and assesses the skill generati…

cs.LG · arxiv:2605.18703v1

EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL

Minrui Xu, Zilin Wang et al.

EnvFactory addresses the challenges of scaling tool-use LLM agents by automatically synthesizing realistic, stateful execution environments from authentic resources. It then generates robust, multi-turn training data by sampling and refining trajectories to ca…

The left figure presents an overview of EnvGen: the Search Agent autonomously proposes and searches for authentic sources; the Code Agent implements the database and code using feedback from the Test Agent; and the Test Agent generates test cases and error reports. The collaboration among the three agents constructs diverse, verified environments. The right figure displays a sunburst plot of the environments, with the inner ring indicating the proportion of each domain they belong to and the outer ring showing the number of tools for each environment.
№06
cs.LG
9

General Preference Reinforcement Learning

Muhammad Umer, Muhammad Ahmed Mohsin et al.

This paper introduces General Preference Reinforcement Learning (GPRL) to bridge the gap between online RL and preference optimization for LLMs. GPRL uses a General Preference Mode…

№07
cs.CL
9

MA$^{2}$P: A Meta-Cognitive Autonomous Intelligent Agents Framework for Complex Persuasion

Dingyi Zhang, Ziqing Zhuang et al.

MA$^{2}$P is a novel framework for complex persuasive dialogue generation that addresses limitations in current approaches. It employs a meta-cognitive, multi-agent architecture to…

№08
cs.AI
8

AMR-SD: Asymmetric Meta-Reflective Self-Distillation for Token-Level Credit Assignment

Zhenlin Wei, Pu Jian et al.

This paper introduces Asymmetric Meta-Reflective Self-Distillation (AMR-SD) to address the credit-assignment problem in aligning LLMs for complex reasoning. Instead of directly usi…

№09
cs.AI
8

CrossView Suite: Harnessing Cross-view Spatial Intelligence of MLLMs with Dataset, Model and Benchmark

Wei Wang, Yuqian Yuan et al.

This paper introduces CrossView Suite, a comprehensive framework to enhance multimodal large language models' (MLLMs) spatial reasoning across multiple viewpoints. It addresses dat…

№10
cs.AI
8

DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention

Yuxiang Huang, Nuno M. T. Gonçalves et al.

DashAttention introduces a novel hierarchical attention mechanism that addresses limitations of prior methods. Its core innovation is using an adaptive sparse $α$-entmax transforma…

§ II

The Town Square

Hacker News 10
compiled overnight by google/gemini-2.5-flash-lite · end of issue no. 7 · thank you for reading