The Morning
From the arXiv
AI for Auto-Research: Roadmap & User Guide
This paper analyzes the AI research lifecycle, from idea generation to dissemination, and identifies a critical boundary between reliable AI assistance and unreliable autonomy. AI performs well on structured tasks such as literature review and data generation. It struggles with more nuanced demands, particularly under scientific pressure: avoiding fabricated results, identifying errors, and assessing novelty. The authors provide a roadmap and user guide for navigating these capabilities and limitations.


Code as Agent Harness
This paper introduces "code as agent harness," a new perspective on how large language models (LLMs) are used in agentic systems. The core method is to view code not just as an output, but as the fundamental infrastructure for agent reasoning, action, and envi…
Position: A Three-Layer Probabilistic Assume-Guarantee Architecture Is Structurally Required for Safe LLM Agent Deployment
This paper argues that safe LLM agent deployment requires a three-layer probabilistic architecture rather than a single safety layer. Each layer enforces a distinct safety dimension (intent, environment, dynamics) using independently certified probabilistic guarantees, which then form…
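The blurb is cut off before it says how the three guarantees combine, so the following is only a generic illustration and not the paper's construction. Assume layer $i$ (intent, environment, dynamics) is certified to hold with probability at least $1 - \varepsilon_i$. Boole's inequality then bounds the joint guarantee without any independence assumption. If the layers' failures are additionally assumed independent, the bound tightens to a product.

```latex
\Pr[\text{some layer fails}] \le \sum_{i=1}^{3} \varepsilon_i
\quad\Longrightarrow\quad
\Pr[\text{all three layers hold}] \ge 1 - (\varepsilon_1 + \varepsilon_2 + \varepsilon_3)

% If layer failures are independent (an assumption):
\Pr[\text{all three layers hold}] \ge \prod_{i=1}^{3} (1 - \varepsilon_i)

% Worked example with \varepsilon_1 = \varepsilon_2 = \varepsilon_3 = 0.01:
%   union bound:  1 - 0.03 = 0.97
%   independence: 0.99^3  = 0.970299
```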

SkillGenBench: Benchmarking Skill Generation Pipelines for LLM Agents
This paper introduces SkillGenBench, a novel benchmark designed to evaluate the crucial ability of LLM agents to generate correct and reusable skills from raw data. Unlike previous benchmarks, SkillGenBench specifically isolates and assesses the skill generati…
EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL
EnvFactory addresses the challenges of scaling tool-use LLM agents by automatically synthesizing realistic, stateful execution environments from authentic resources. It then generates robust, multi-turn training data by sampling and refining trajectories to ca…

General Preference Reinforcement Learning
This paper introduces General Preference Reinforcement Learning (GPRL) to bridge the gap between online RL and preference optimization for LLMs. GPRL uses a General Preference Mode…
MA$^{2}$P: A Meta-Cognitive Autonomous Intelligent Agents Framework for Complex Persuasion
MA$^{2}$P is a novel framework for complex persuasive dialogue generation that addresses limitations in current approaches. It employs a meta-cognitive, multi-agent architecture to…
AMR-SD: Asymmetric Meta-Reflective Self-Distillation for Token-Level Credit Assignment
This paper introduces Asymmetric Meta-Reflective Self-Distillation (AMR-SD) to address the credit-assignment problem in aligning LLMs for complex reasoning. Instead of directly usi…
CrossView Suite: Harnessing Cross-view Spatial Intelligence of MLLMs with Dataset, Model and Benchmark
This paper introduces CrossView Suite, a comprehensive framework for enhancing the spatial reasoning of multimodal large language models (MLLMs) across multiple viewpoints. It addresses dat…
DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention
DashAttention introduces a novel hierarchical attention mechanism that addresses limitations of prior methods. Its core innovation is using an adaptive sparse $\alpha$-entmax transforma…
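The blurb does not show DashAttention's equations, so the sketch below illustrates only the standard $\alpha$-entmax mapping it names. It does not reproduce the paper's hierarchical or adaptive mechanism. Each score is shifted by a threshold $\tau$ chosen so that the outputs sum to 1, and entries that fall below the threshold become exactly zero. This is what makes the attention sparse. The function name `entmax_bisect`, the bisection solver, and the fixed $\alpha$ are our assumptions. With $\alpha = 2$ the mapping reduces to sparsemax, and as $\alpha \to 1$ it approaches softmax.

```python
import numpy as np


def entmax_bisect(z, alpha=1.5, n_iter=50):
    """Hypothetical helper: map scores z to a (possibly sparse) probability vector.

    Finds tau such that p_i = max((alpha - 1) * z_i - tau, 0) ** (1 / (alpha - 1))
    sums to 1. Uses a fixed alpha > 1 (an assumption; the paper's alpha is adaptive).
    """
    if alpha <= 1:
        raise ValueError("alpha must be > 1 for this bisection sketch")
    x = (alpha - 1.0) * np.asarray(z, dtype=float)
    power = 1.0 / (alpha - 1.0)
    # sum(p) is non-increasing in tau, and these two values bracket the root:
    lo = x.max() - 1.0  # the top entry alone contributes 1, so sum(p) >= 1
    hi = x.max()        # every entry is clipped to 0, so sum(p) == 0
    for _ in range(n_iter):
        tau = 0.5 * (lo + hi)
        p = np.clip(x - tau, 0.0, None) ** power
        if p.sum() >= 1.0:
            lo = tau  # mass still >= 1: the threshold can move up
        else:
            hi = tau  # mass < 1: the threshold is too high
    p = np.clip(x - lo, 0.0, None) ** power
    return p / p.sum()  # renormalise to absorb residual bisection error
```

For example, `entmax_bisect([0.5, 0.4, -1.0], alpha=2.0)` returns approximately `[0.55, 0.45, 0.0]`. The lowest score receives exactly zero weight, whereas softmax would give it a small positive share. Bisection is used here because it works for any $\alpha > 1$. Closed-form sort-based solutions exist only for special cases such as $\alpha = 2$.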
The Town Square
The past six months have seen rapid LLM advancements, including improved reasoning, multimodal capabilities, and the emergence of smaller, more efficient models.
Workshops
CLI-Anything enables any software to be agent-native by providing a unified interface through which agents interact with command-line tools, making that software accessible to and controllable by agents.
This repository provides a framework for academic research, guiding users through the iterative process of research, writing, reviewing, revising, and finalizing their work.