The Morning
From the arXiv
Adaptive Negative Reinforcement for LLM Reasoning: Dynamically Balancing Correction and Diversity in RLVR
This paper introduces Adaptive Negative Sample Reinforcement (A-NSR) to improve LLM reasoning. A-NSR dynamically adjusts the penalty for incorrect reasoning steps during training: it prioritizes error correction early on, then shifts toward more nuanced updates that balance correction against diversity. The adaptive approach aims to improve LLM reasoning performance beyond what fixed-penalty methods achieve.
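
To make the idea of a penalty that changes over training concrete, here is a minimal sketch. It is not the paper's schedule: the linear decay, the function names `negative_weight` and `sample_weight`, and the default values `w_start` and `w_end` are all assumptions chosen for illustration only.

```python
def negative_weight(step: int, total_steps: int,
                    w_start: float = 1.0, w_end: float = 0.25) -> float:
    """Hypothetical linear schedule for the weight on incorrect samples.

    Starts at w_start (strong correction early in training) and moves
    linearly to w_end (gentler updates later). The shape and the default
    values are illustrative assumptions, not taken from the A-NSR paper.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp training progress to [0, 1] so out-of-range steps stay bounded.
    frac = min(max(step / total_steps, 0.0), 1.0)
    return w_start + frac * (w_end - w_start)


def sample_weight(is_correct: bool, step: int, total_steps: int) -> float:
    """Per-sample loss weight: correct samples keep weight 1.0 in this sketch,
    while incorrect samples get the scheduled penalty weight."""
    return 1.0 if is_correct else negative_weight(step, total_steps)


if __name__ == "__main__":
    for step in (0, 50, 100):
        print(step, negative_weight(step, 100))
```

A fixed-penalty baseline corresponds to setting `w_start == w_end`, which is the case the summary says the adaptive method aims to improve on.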

AEM: Adaptive Entropy Modulation for Multi-Turn Agentic Reinforcement Learning
AEM addresses the challenge of credit assignment in multi-turn agentic reinforcement learning by adaptively modulating entropy dynamics during training. Unlike methods requiring dense intermediate supervision, AEM is supervision-free and improves the explorati…
DGPO: Distribution Guided Policy Optimization for Fine Grained Credit Assignment
DGPO addresses credit assignment challenges in reinforcement learning for large language models by reinterpreting distribution deviation as a guiding signal instead of a penalty. It uses the bounded Hellinger distance to enable safe, token-level exploration, o…
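
For reference, the Hellinger distance between two discrete distributions always lies in [0, 1], which is what "bounded" means here. The sketch below computes only the standard distance; how DGPO turns it into a token-level guiding signal is not described in this summary, and the function name `hellinger` is just an illustrative choice.

```python
import math
from typing import Sequence


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete probability distributions.

    H(p, q) = sqrt(0.5 * sum_i (sqrt(p_i) - sqrt(q_i))**2)
    The result is 0 for identical distributions and 1 for distributions
    with disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


if __name__ == "__main__":
    print(hellinger([0.5, 0.5], [0.5, 0.5]))  # identical distributions
    print(hellinger([1.0, 0.0], [0.0, 1.0]))  # disjoint support
```

Unlike the KL divergence, this quantity cannot grow without limit when one distribution assigns near-zero probability to a token the other favors.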

ESSAM: A Novel Competitive Evolution Strategies Approach to Reinforcement Learning for Memory Efficient LLMs Fine-Tuning
This paper introduces ESSAM, a novel approach for fine-tuning LLMs using competitive Evolution Strategies combined with Sharpness-Aware Maximization. ESSAM addresses the high memory demands of traditional RL methods by leveraging zero-order parameter search an…
EvolveR: Self-Evolving LLM Agents through an Experience-Driven Lifecycle
EvolveR enables LLM agents to self-improve by creating a closed-loop experience lifecycle. It first distills interaction trajectories into reusable strategic principles (Offline Self-Distillation) and then uses these principles to guide online task interaction…

FutureWorld: A Live Reinforcement Learning Environment for Predictive Agents with Real-World Outcome Rewards
FutureWorld introduces a novel reinforcement learning environment for training agents to make live future predictions. Its core method involves a delayed reward mechanism where age…
InvThink: Premortem Reasoning for Safer Language Models
InvThink is a novel framework that enhances language model safety by requiring a three-step process: enumerating potential harms, analyzing their consequences, and then generating …
MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning
MemSearcher trains LLM agents using end-to-end reinforcement learning to manage a compact, question-relevant memory, avoiding the costly full history concatenation of traditional m…
Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models
This paper addresses the "Modality Gap," where visual and linguistic embeddings for the same meaning are systematically offset. The authors propose the "Fixed-frame Modality Gap Th…
OASES: Outcome-Aligned Search-Evaluation Co-Training for Agentic Search
OASES trains agentic search models by generating intermediate rewards that are aligned with the final task outcome. It achieves this by evaluating how well each search step contrib…
The Town Square
This article explores whether the upcoming RTX 5090 graphics card, when paired with an M4 MacBook Air via eGPU, can deliver a viable gaming experience.
Workshops
Superpowers is an agentic skills framework and software development methodology designed to enhance productivity and effectiveness by providing a structured approach to building and deploying AI agents.
This repository provides a collection of pre-built Agent Skills designed to automate and assist with tasks across research, science, engineering, analysis, finance, and writing.