arXiv — 2026-05-13 — Linnet

№01

cs.AI arxiv:2604.27859

A Brief Overview: Agentic Reinforcement Learning In Large Language Models

Fangming Cui, Ruixiao Zhu, Cheng Fang et al.

This paper introduces Agentic Reinforcement Learning (RL) for Large Language Models (LLMs), moving beyond traditional RL's fixed objectives. The core method integrates LLMs' cognitive abilities like planning and self-reflection into the RL loop, enabling autonomous agents to tackle complex, open-ended tasks. Its main c…

9

№02

cs.AI arxiv:2605.04595

A Queueing-Theoretic Framework for Stability Analysis of LLM Inference with KV Cache Memory Constraints

Chengyi Nie, Nian Si, Zijie Zhou

This paper introduces a novel queueing-theoretic framework to analyze LLM inference stability, explicitly considering both computational demands and KV cache memory constraints. The core contribution is deriving rigorous conditions for system stability, enabling operators to determine the necessary GPU cluster size to …

9

№03

cs.AI arxiv:2605.04808

DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents

Zhaorun Chen, Xun Liu, Haibo Tong et al.

DTap is a novel platform designed for the controllable and interactive red-teaming of AI agents. Its core method involves creating realistic, reproducible simulation environments across diverse domains to test agent security. The main contribution is providing a much-needed tool for large-scale risk assessment of AI ag…

9

№04

cs.AI arxiv:2605.04454

Deployment-Relevant Alignment Cannot Be Inferred from Model-Level Evaluation Alone

Varad Vishwarupe, Nigel Shadbolt, Marina Jirotka et al.

This paper argues that current machine learning alignment evaluations, which focus solely on model outputs, are insufficient for assessing real-world deployment. It proposes that alignment claims should be tied to the specific level of evidence collected (model, response, interaction, or deployment). Through audits, th…

9

№05

cs.AI arxiv:2605.04960

EP-GRPO: Entropy-Progress Aligned Group Relative Policy Optimization with Implicit Process Guidance

Song Yu, Li Li, Wenwen Zhao et al.

EP-GRPO addresses credit assignment failures in Group Relative Policy Optimization (GRPO) for LLM reasoning. It uses entropy-gated modulation to focus on informative decision points and implicit process signals from policy divergence to provide directional, outcome-driven feedback at the token level, reducing training …
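For context, the sequence-level credit signal in baseline GRPO can be sketched as below. This is a minimal illustration of the standard group-relative advantage (reward minus group mean, divided by group standard deviation), not EP-GRPO's entropy-gated or divergence-based method. The function name, the `eps` constant, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each sampled response's reward against its own group.

    Hypothetical helper: baseline GRPO-style normalization only.
    Every token of a response shares this single scalar, which is the
    coarse credit assignment that token-level methods try to refine.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt, scored 1.0 if correct and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# When every sample in the group gets the same reward, all advantages are
# zero and the group contributes no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Because the advantage is one number per response, a correct answer with a flawed intermediate step is rewarded uniformly, which is the kind of failure the summary above refers to.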

9

№06

cs.AI arxiv:2605.04572

From Parameter Dynamics to Risk Scoring: Quantifying Sample-Level Safety Degradation in LLM Fine-tuning

Xiao Wang, Yifei Zhang, YongKang Liu et al.

This paper proposes a novel method, Sample-Level Quantification of Safety Degradation (SQSD), to identify and quantify which training samples are most responsible for degrading LLM safety during fine-tuning. By analyzing the cumulative parameter drift towards unsafe directions, SQSD assigns risk scores to individual sa…

9

№07

cs.AI arxiv:2508.19035

Investigating Advanced Reasoning of Large Language Models via Black-Box Environment Interaction

Congchi Yin, Tianyi Wu, Yankai Shu et al.

This paper introduces a novel evaluation method for Large Language Models (LLMs) called "black-box environment interaction." LLMs interact with hidden functions, learning from input-output pairs to deduce the underlying rules. The contribution is the Oracle benchmark, which tests integrated reasoning in unknow…

9

№08

cs.AI arxiv:2605.04505

JASTIN: Aligning LLMs for Zero-Shot Audio and Speech Evaluation via Natural Language Instructions

Leying Zhang, Bowen Shi, Haibin Wu et al.

JASTIN addresses the challenge of evaluating generative audio models by framing it as a self-instructed reasoning task. It achieves this by connecting a frozen audio encoder with a fine-tuned LLM via a trainable adapter, and uses a novel data preparation pipeline to ensure robust zero-shot generalization. This approach…

9

№09

cs.AI arxiv:2602.22291

Manifold of Failure: Behavioral Attraction Basins in Language Models

Sarthak Munshi, Manish Bhatt, Vineeth Sai Narajala et al.

This paper introduces a framework to systematically map "behavioral attraction basins," which are unsafe regions in Large Language Models (LLMs). By reframing vulnerability discovery as a quality diversity problem using MAP-Elites, the authors illuminate the continuous topology of these failure regions. Their contribut…
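The quality-diversity idea behind MAP-Elites can be sketched on a toy problem. The code below is a generic, minimal MAP-Elites loop over a one-dimensional float, not the paper's search over prompts or its behavioral descriptors. The `fitness` and `descriptor` functions, the bin count, and the mutation scale are all placeholders chosen for illustration.

```python
import random


def map_elites(fitness, descriptor, n_bins=10, iterations=2000, seed=0):
    """Toy MAP-Elites: keep the best solution found in each descriptor bin.

    Hypothetical sketch: solutions are floats in [0, 1] and descriptor()
    must also return a value in [0, 1].
    """
    rng = random.Random(seed)
    archive = {}  # bin index -> (fitness value, solution)
    for _ in range(iterations):
        if archive and rng.random() < 0.9:
            # Mutate a randomly chosen elite from the archive.
            _, parent = rng.choice(list(archive.values()))
            child = min(1.0, max(0.0, parent + rng.gauss(0.0, 0.1)))
        else:
            # Occasionally sample a fresh random solution.
            child = rng.random()
        cell = min(n_bins - 1, int(descriptor(child) * n_bins))
        score = fitness(child)
        # Replace the cell's elite only if the child scores higher.
        if cell not in archive or score > archive[cell][0]:
            archive[cell] = (score, child)
    return archive


# Toy usage: fitness peaks at 0.5, descriptor is the solution itself.
archive = map_elites(lambda x: -(x - 0.5) ** 2, lambda x: x)
```

The result is a map of elites across the descriptor space rather than a single optimum, which is what lets this family of methods chart whole regions instead of isolated points.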

9

№10

cs.AI arxiv:2602.19837

Meta-Learning and Meta-Reinforcement Learning -- Tracing the Path towards DeepMind's Adaptive Agent

Björn Hoppmann, Christoph Scholz

This paper surveys meta-learning and meta-reinforcement learning by formalizing them based on tasks. It then traces the development of key algorithms that led to DeepMind's Adaptive Agent, highlighting how meta-learning enables rapid adaptation to new tasks with minimal data by leveraging transferable knowledge.

9

№11

cs.AI arxiv:2605.05003

Misaligned by Reward: Socially Undesirable Preferences in LLMs

Gayane Ghazaryan, Esra Dönmez

This paper introduces a new method to evaluate reward models for Large Language Models (LLMs) by focusing on socially undesirable preferences, rather than just general instruction following. The authors convert existing social evaluation datasets into pairwise preference data to test if reward models favor biased, unsafe, or …

9

№12

cs.AI arxiv:2605.01847

NeuroState-Bench: A Human-Calibrated Benchmark for Commitment Integrity in LLM Agent Profiles

Jia Xiao

This paper introduces NeuroState-Bench, a novel benchmark designed to evaluate the "commitment integrity" of LLM agents, that is, whether they maintain coherence throughout multi-turn tasks. Unlike previous methods, it uses human-calibrated side-query probes to directly assess this integrity, rather than relying on inferred in…

9

№13

cs.AI arxiv:2605.00877

OceanPile: A Large-Scale Multimodal Ocean Corpus for Foundation Models

Yida Xue, Ningyu Zhang, Tingwei Wu et al.

This paper introduces OceanPile, a large-scale multimodal corpus designed to address the data bottleneck in ocean science AI. Its core method involves unifying diverse ocean data, including sonar, imagery, and text, into a single, aligned dataset. The main contribution is enabling the development of foundation models f…

9

№14

cs.AI arxiv:2601.07389

On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training

Xueyan Niu, Bo Bai, Wei Han et al.

This paper proves that supervised fine-tuning (SFT) and reinforcement learning (RL) are fundamentally intertwined during large language model post-training. The core contribution is demonstrating that neither SFT nor RL can be performed independently without negatively impacting the other's objective, whether applied s…

9

№15

cs.AI arxiv:2605.05058

SoK: Robustness in Large Language Models against Jailbreak Attacks

Feiyue Xu, Hongsheng Hu, Chaoxiang He et al.

This paper addresses the critical issue of Large Language Model (LLM) vulnerability to jailbreak attacks. Its core contribution is the introduction of "Security Cube," a novel, multi-dimensional evaluation framework designed to comprehensively assess the robustness of LLMs against these adversarial prompts, moving beyo…

9

№16

cs.AI arxiv:2605.04831

StoryAlign: Evaluating and Training Reward Models for Story Generation

Haotian Xia, Hao Peng, Yunjia Qi et al.

This paper introduces StoryRMB, the first benchmark for evaluating reward models on human story preferences. The authors find that existing reward models perform poorly, achieving only 66.3% accuracy in selecting preferred stories. To improve this, they construct a large dataset of story preference pairs to train better reward mod…
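An accuracy figure of this kind is typically computed as the fraction of preference pairs in which the reward model scores the human-preferred item higher. The sketch below shows that calculation under the assumption of a two-way (chosen, rejected) pair format; the benchmark's actual protocol may differ. `pairwise_accuracy` and the toy length-based reward function are hypothetical.

```python
def pairwise_accuracy(reward_fn, pairs):
    """Fraction of (chosen, rejected) pairs where chosen gets the higher score.

    Hypothetical helper: reward_fn maps a text to a scalar score, and
    ties are counted as errors.
    """
    if not pairs:
        raise ValueError("pairs must be non-empty")
    correct = sum(
        1 for chosen, rejected in pairs
        if reward_fn(chosen) > reward_fn(rejected)
    )
    return correct / len(pairs)


# Toy stand-in for a reward model: longer text gets a higher score.
toy_pairs = [
    ("a long story", "short"),
    ("tiny", "a much longer story"),
    ("abc", "ab"),
]
accuracy = pairwise_accuracy(len, toy_pairs)
```

With two-way pairs, random guessing gives 50%, so scores should be read against that baseline.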

9

№17

cs.AI arxiv:2605.04906

Strat-Reasoner: Reinforcing Strategic Reasoning of LLMs in Multi-Agent Games

Yidong He, Yutao Lai, Pengxu Yang et al.

Strat-Reasoner enhances LLMs' strategic reasoning in multi-agent games by introducing a recursive framework where an agent's reasoning incorporates others'. It uses a centralized Chain-of-Thought comparison module to provide reward signals for intermediate reasoning steps, addressing challenges of non-stationarity and …

9

№18

cs.AI arxiv:2602.17753

The 2025 AI Agent Index: Documenting Technical and Safety Features of Deployed Agentic AI Systems

Leon Staufer, Kevin Feng, Kevin Wei et al.

This paper introduces the 2025 AI Agent Index, a comprehensive catalog of 30 advanced AI agents. Its core method involves collecting and documenting technical and safety features from publicly available information and developer correspondence. The key contribution is to provide a structured overview of the rapidly evo…

9

№19

cs.AI arxiv:2605.04431

Towards Robust LLM Post-Training: Automatic Failure Management for Reinforcement Fine-Tuning

Lingzhe Zhang, Tong Jia, Yunpeng Zhai et al.

This paper introduces the first systematic approach to automatically manage failures during Reinforcement Fine-Tuning (RFT) of LLMs. It proposes RFT-FaultBench, a comprehensive benchmark to categorize and analyze RFT failures. The core contribution is developing methods to automatically detect and address these failure…

9

№20

cs.AI arxiv:2605.05007

Uno-Orchestra: Parsimonious Agent Routing via Selective Delegation

Zhiqing Cui, Haotong Xie, Jiahao Yuan et al.

Uno-Orchestra is a novel orchestration policy for LLM multi-agent systems that jointly learns to decompose tasks and select appropriate agent-primitive pairs for each subtask. This selective delegation approach, trained via reinforcement learning, significantly improves accuracy (77.0% macro pass@1) and reduces per-que…

9