The Morning
From the arXiv
ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox
This paper introduces **ComplexMCP**, a benchmark for evaluating LLM agents in realistic, complex software-automation scenarios. It addresses the limitations of current benchmarks by simulating dynamic environments with interdependent tools and unpredictable failures. The core contribution is a rigorous evaluation framework that reveals significant performance gaps between LLM agents and humans and highlights key areas for future improvement.
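To make "interdependent tools with unpredictable failures" concrete, here is a minimal sketch of that kind of sandbox. It is not ComplexMCP's actual design. It assumes a hypothetical setup in which each tool declares prerequisite tools and fails transiently with some probability. An agent must then call tools in dependency order and retry on failure. The tool names, `failure_rate`, and retry policy are all illustrative.

```python
import random
from graphlib import TopologicalSorter

# Hypothetical tool graph: each tool lists the tools whose outputs it needs.
TOOL_DEPS = {
    "login": [],
    "fetch_invoices": ["login"],
    "fetch_customers": ["login"],
    "build_report": ["fetch_invoices", "fetch_customers"],
}


class TransientToolError(Exception):
    """Raised when a simulated tool call fails unpredictably."""


def call_tool(name, completed, rng, failure_rate=0.3):
    """Simulate one tool call: error on missing prerequisites, else fail at random."""
    missing = [d for d in TOOL_DEPS[name] if d not in completed]
    if missing:
        raise RuntimeError(f"{name} called before {missing}")
    if rng.random() < failure_rate:
        raise TransientToolError(name)
    return f"{name}:ok"


def run_plan(seed=0, max_retries=5):
    """Call every tool in dependency order, retrying transient failures."""
    rng = random.Random(seed)
    completed, attempts = {}, 0
    # static_order() yields each tool only after all of its prerequisites.
    for name in TopologicalSorter(TOOL_DEPS).static_order():
        for _ in range(max_retries):
            attempts += 1
            try:
                completed[name] = call_tool(name, completed, rng)
                break
            except TransientToolError:
                continue
        else:  # no break: every retry failed
            raise RuntimeError(f"{name} failed {max_retries} times")
    return completed, attempts
```

A benchmark in this style can then score an agent on whether it respects the ordering and how many calls it wastes on failures.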


ELF: Embedded Language Flows
This paper introduces Embedded Language Flows (ELF), a novel approach to language modeling using continuous diffusion models. ELF's core method is to perform diffusion in continuous embedding space for most of the generation process, only mapping to discrete t…
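The summary is cut off, but the pattern it describes can be illustrated with a toy sketch: generation stays in continuous embedding space and becomes discrete only near the end. In this sketch the "denoiser" is an oracle that moves each noisy vector toward a known target embedding, and the last step snaps each vector to its nearest vocabulary embedding. The vocabulary, the update rule, and the nearest-neighbour rounding are hypothetical stand-ins, not ELF's model.

```python
import math
import random

# Hypothetical 2-D embedding table for a tiny vocabulary.
VOCAB = {"the": (1.0, 0.0), "cat": (0.0, 1.0), "sat": (-1.0, 0.0)}


def nearest_token(vec):
    """Map a continuous vector to the token whose embedding is closest."""
    return min(VOCAB, key=lambda tok: math.dist(vec, VOCAB[tok]))


def toy_denoise_step(x, target, t):
    """Oracle stand-in for a learned denoiser: move a fraction of the way to the target."""
    alpha = 1.0 / t  # larger correction as t approaches 1; full correction at t == 1
    return tuple(xi + alpha * (ti - xi) for xi, ti in zip(x, target))


def generate(targets, steps=10, seed=0):
    rng = random.Random(seed)
    # Every position starts as Gaussian noise in embedding space.
    xs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in targets]
    for t in range(steps, 0, -1):
        xs = [toy_denoise_step(x, tgt, t) for x, tgt in zip(xs, targets)]
    # Only this final step leaves continuous space.
    return [nearest_token(x) for x in xs]
```

For example, `generate([VOCAB["the"], VOCAB["cat"], VOCAB["sat"]])` recovers `["the", "cat", "sat"]`.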
NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation
NanoResearch is a multi-agent framework that personalizes research automation by co-evolving skills, memory, and policy. Its core method involves a tri-level co-evolutionary process where a skill bank distills reusable procedural knowledge, a memory module ret…


The Agent Use of Agent Beings: Agent Cybernetics Is the Missing Science of Foundation Agents
This paper argues that **cybernetics offers the missing theoretical foundation for the engineering-driven field of LLM-based foundation agents.** It proposes that applying cybernetic principles can address fundamental open questions about agent control, enviro…
Training-Free Cultural Alignment of Large Language Models via Persona Disagreement
This paper introduces DISCA, a training-free method to align large language models with cultural values in a black-box setting. DISCA leverages disagreement among persona agents, grounded in real-world survey data, to guide the model's output. This approach ef…
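One way to turn "disagreement among persona agents" into a usable signal is sketched below. It makes three assumptions:

- Each persona gives one answer to a multiple-choice survey item.
- Each persona carries a population weight taken from survey data.
- High disagreement means the model should hedge rather than assert a single view.

The persona names, the entropy-based score, and the `threshold` are hypothetical choices, not necessarily DISCA's.

```python
import math
from collections import Counter


def weighted_distribution(persona_answers, weights):
    """Turn persona answers into an answer distribution using survey-derived weights."""
    totals = Counter()
    for persona, answer in persona_answers.items():
        totals[answer] += weights[persona]
    z = sum(totals.values())
    return {answer: w / z for answer, w in totals.items()}


def disagreement(dist, num_options):
    """Normalized entropy: 0 when all weight is on one answer, 1 when spread evenly
    over all num_options answers (requires num_options >= 2)."""
    h = -sum(p * math.log(p) for p in dist.values() if p > 0)
    return h / math.log(num_options)


def guide(persona_answers, weights, num_options, threshold=0.5):
    """Return the weighted-majority answer and whether disagreement is high enough to hedge."""
    dist = weighted_distribution(persona_answers, weights)
    top = max(dist, key=dist.get)
    return top, disagreement(dist, num_options) >= threshold
```

Because the method is training-free, a signal like this would only shape the prompt or decoding of a black-box model, never its weights.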

Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning
This paper introduces SLIM, a framework for dynamic skill management in agentic reinforcement learning. SLIM treats the set of active external skills as a variable to be optimized …
DynaMiCS: Fine-tuning LLMs with Performance Constraints using Dynamic Mixtures
DynaMiCS addresses the challenge of fine-tuning LLMs for specific tasks while maintaining performance on general capabilities. It frames this as a constrained optimization problem,…
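The summary stops before describing the solver. As a generic illustration of constrained fine-tuning with a dynamic data mixture, the sketch below uses standard Lagrangian dual ascent on the constraint "general-capability loss ≤ ε". It shows only the dual side: how the mixture reacts to the constraint. The primal task-loss update is omitted, and the sketch is not presented as DynaMiCS's algorithm. `toy_general_loss`, `epsilon`, and the multiplier-to-mixture mapping are all hypothetical.

```python
def toy_general_loss(general_fraction):
    """Hypothetical: general-capability loss falls as more general data is mixed in."""
    return 1.0 - 0.8 * general_fraction


def update_mixture(lmbda, general_loss, epsilon, lr=0.5):
    """One projected dual-ascent step for the constraint general_loss <= epsilon."""
    # The multiplier grows while the constraint is violated and shrinks toward 0 once it holds.
    lmbda = max(0.0, lmbda + lr * (general_loss - epsilon))
    # Hypothetical mapping from multiplier to the share of general data in the next batch.
    return lmbda, lmbda / (1.0 + lmbda)


def simulate(steps=200, epsilon=0.6):
    lmbda, frac = 0.0, 0.0
    for _ in range(steps):
        lmbda, frac = update_mixture(lmbda, toy_general_loss(frac), epsilon)
    return lmbda, frac
```

With these toy numbers the mixture settles at 50% general data, the point where the constraint is exactly tight.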
Conformity Generates Collective Misalignment in AI Agents Societies
This paper demonstrates that even if individual AI agents are aligned with human values, their collective behavior can become misaligned due to conformity. The core method involves…
DGPO: Beyond Pairwise Preferences with Directional Consistent Groupwise Optimization
This paper introduces Directional-Groupwise Preference Optimization (DGPO), a novel method for aligning Large Language Models (LLMs) with human preferences. DGPO addresses limitati…
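For intuition on what "beyond pairwise" can mean, here is a standard groupwise objective: the Plackett-Luce negative log-likelihood of a ranked group of responses. It is scored with DPO-style implicit rewards. This is a well-known listwise loss, not DGPO itself; the summary's "directional consistency" component is not modeled here.

```python
import math


def plackett_luce_loss(scores_in_rank_order):
    """Negative log-likelihood of an observed ranking under the Plackett-Luce model.

    scores_in_rank_order[0] is the score of the most-preferred response.
    """
    loss = 0.0
    for i in range(len(scores_in_rank_order)):
        rest = scores_in_rank_order[i:]
        m = max(rest)  # log-sum-exp trick for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in rest))
        loss += log_z - scores_in_rank_order[i]
    return loss


def implicit_rewards(policy_logps, ref_logps, beta=0.1):
    """DPO-style implicit reward: beta * (log pi(y|x) - log pi_ref(y|x))."""
    return [beta * (p - r) for p, r in zip(policy_logps, ref_logps)]
```

With a group of two, the loss reduces to the familiar pairwise `-log σ(s_w - s_l)`. Larger groups add terms that reward getting the whole ordering right.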
LITMUS: Benchmarking Behavioral Jailbreaks of LLM Agents in Real OS Environments
This paper introduces LITMUS, a benchmark for testing LLM agents' safety in real operating system environments. It addresses the risk of "behavior jailbreaks" by using a dual verif…
The Town Square
This article questions whether Python will stay the default language for AI-assisted coding, suggesting that its advantages may diminish as AI becomes more proficient at generating code in other languages.
Workshops
This repository provides persistent memory for AI coding agents, letting them retain information beyond a single session and recall it when needed, which matters for long, complex coding tasks.
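The core idea of persistent agent memory can be shown in a few lines: notes written in one session are saved to disk and retrieved in a later one. The `MemoryStore` class, its JSON file format, and the word-overlap ranking are hypothetical; they do not reflect the repository's actual API or retrieval method.

```python
import json
from pathlib import Path


class MemoryStore:
    """Hypothetical file-backed memory: notes survive across agent sessions."""

    def __init__(self, path):
        self.path = Path(path)
        # Reload notes saved by earlier sessions, if any.
        self.notes = json.loads(self.path.read_text()) if self.path.exists() else []

    def remember(self, text, tags=()):
        """Append a note and immediately persist the whole store to disk."""
        self.notes.append({"text": text, "tags": list(tags)})
        self.path.write_text(json.dumps(self.notes, indent=2))

    def recall(self, query, limit=3):
        """Rank notes by how many query words appear in their text or tags."""
        words = set(query.lower().split())

        def score(note):
            haystack = set(note["text"].lower().split()) | {t.lower() for t in note["tags"]}
            return len(words & haystack)

        ranked = sorted(self.notes, key=score, reverse=True)
        return [n["text"] for n in ranked[:limit] if score(n) > 0]
```

A real system would likely replace word overlap with embedding search, but the write-then-reload cycle is what makes the memory persistent.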
CloakBrowser is a stealth Chromium build designed to evade bot detection. It works as a drop-in Playwright replacement and applies source-level fingerprint patches that, according to the project, pass all of its detection tests.