2026-05-12 — Linnet Daily

I woke up one morning thinking about wolves and realized that wolf packs function as families. Everyone has a role, and if you act within the parameters of your role, the whole pack succeeds, and when that falls apart, so does the pack. — Jodi Picoult

35 items · 3 sections

§I arXiv Papers (20) §II Hacker News (6) §III GitHub Trending (9)

§ 0

The Morning

Local weather 1

This morning in London: mainly clear

Today's range: 16.2° ↓ 4.9°, currently 14.8°

Feels 11.5°

Rain

Wind 14 km/h

Humid 39%

Rise 05:12

Set 20:40

§ I

From the arXiv

arXiv preprints 10 of 20

cs.AI · arxiv:2605.10787v1 · Lead article

ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox

Yuanyang Li, Xue Yang, Longyue Wang, Weihua Luo, Hongyang Chen

This paper introduces **ComplexMCP**, a benchmark for evaluating LLM agents in realistic, complex software automation scenarios. It addresses the limitations of current benchmarks by simulating dynamic environments with interdependent tools and unpredictable failures. The core contribution is a rigorous evaluation framework that reveals significant performance gaps between LLM agents and humans, highlighting key areas for future improvement.

The Overview of ComplexMCP: Our framework integrates stateful sandboxes and stateless MCP servers via a seed-driven mechanism.

ELF achieves lower generative perplexity with fewer sampling steps than prior DLMs, without using distillation. ELF achieves this while using 10× fewer training tokens. (Model size: 105M for ELF and 170M for others; dataset: OWT. Detailed comparison in Fig. 7.)

cs.AI · arxiv:2605.10938v1

ELF: Embedded Language Flows

Keya Hu, Linlu Qiu et al.

This paper introduces Embedded Language Flows (ELF), a novel approach to language modeling using continuous diffusion models. ELF's core method is to perform diffusion in continuous embedding space for most of the generation process, only mapping to discrete t…

cs.AI · arxiv:2605.10813v1

NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation

Jinhang Xu, Qiyuan Zhu et al.

NanoResearch is a multi-agent framework that personalizes research automation by co-evolving skills, memory, and policy. Its core method involves a tri-level co-evolutionary process where a skill bank distills reusable procedural knowledge, a memory module ret…

Comparison between (a) a uniform research automation pipeline that applies identical processing to all users and yields homogeneous outputs, and (b) NanoResearch, which recognizes distinct researcher personas and provides personalized skills and feedback upon failure, enabling each persona to evolve along its own trajectory.

From Classical Cybernetics to Agent cybernetics

cs.AI · arxiv:2605.10754v1

The Agent Use of Agent Beings: Agent Cybernetics Is the Missing Science of Foundation Agents

Xinrun Wang, Chang Yang et al.

This paper argues that **cybernetics offers the missing theoretical foundation for the engineering-driven field of LLM-based foundation agents.** It proposes that applying cybernetic principles can address fundamental open questions about agent control, enviro…

cs.AI · arxiv:2605.10843v1

Training-Free Cultural Alignment of Large Language Models via Persona Disagreement

Huynh Trung Kiet, Dao Sy Duy Minh et al.

This paper introduces DISCA, a training-free method to align large language models with cultural values in a black-box setting. DISCA leverages disagreement among persona agents, grounded in real-world survey data, to guide the model's output. This approach ef…

DISCA overview. Stage 1 builds WVS-grounded persona prompts for a trolley scenario in country c; Stage 2 runs a frozen large language model (LLM) on the base prompt and each persona, aggregates persona-level signals in logit space, and applies Prospect-Theory importance sampling (PT–IS) together with a dual-pass reliability gate to obtain the final sparing probability. Pseudocode and the six MultiTP attribute–temperature pairs provided in App. A1.

Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning

Junhao Shen, Teng Zhang et al.

This paper introduces SLIM, a framework for dynamic skill management in agentic reinforcement learning. SLIM treats the set of active external skills as a variable to be optimized …

DynaMiCS: Fine-tuning LLMs with Performance Constraints using Dynamic Mixtures

Eleonora Gualdoni, Sonia Laguna et al.

DynaMiCS addresses the challenge of fine-tuning LLMs for specific tasks while maintaining performance on general capabilities. It frames this as a constrained optimization problem,…

Conformity Generates Collective Misalignment in AI Agents Societies

Giordano De Marzo, Alessandro Bellina et al.

This paper demonstrates that even if individual AI agents are aligned with human values, their collective behavior can become misaligned due to conformity. The core method involves…

DGPO: Beyond Pairwise Preferences with Directional Consistent Groupwise Optimization

Mengyi Deng, Zhiwei Li et al.

This paper introduces Directional-Groupwise Preference Optimization (DGPO), a novel method for aligning Large Language Models (LLMs) with human preferences. DGPO addresses limitati…

LITMUS: Benchmarking Behavioral Jailbreaks of LLM Agents in Real OS Environments

Chiyu Zhang, Huiqin Yang et al.

This paper introduces LITMUS, a benchmark for testing LLM agents' safety in real operating system environments. It addresses the risk of "behavior jailbreaks" by using a dual verif…

§ II

The Town Square

Hacker News 6

546 pts · Top story

If AI writes your code, why use Python?

This article questions whether Python remains the right choice when AI coding assistants write the code, suggesting that its advantages may diminish as AI becomes more proficient at generating code in other languages.

medium.com · 11 May

232 pts

Interaction Models

thinkingmachines.ai · 11 May

195 pts

I let AI build a tool to help me figure out what was waking me up at night

martin.sh · 11 May

195 pts

Google says criminal hackers used AI to find a major software flaw

nytimes.com · 11 May

164 pts

Students boo commencement speaker after she calls AI next industrial revolution