amandaa/AutoL2S-Plus-7b

Text generation · Concurrency cost: 1 · Model size: 7.6B · Quantization: FP8 · Context length: 32k · License: apache-2.0 · Architecture: Transformer · Open weights

amandaa/AutoL2S-Plus-7b is a 7.6-billion-parameter model fine-tuned for efficient reasoning, building on the AutoL2S framework. It is trained with a two-stage pipeline: Long-Short Concatenated Distillation followed by off-policy Reinforcement Learning with a length-aware objective. The model is optimized to generate shorter reasoning paths while preserving correctness, making it well suited to tasks that need concise, accurate logical deduction, and giving it better reasoning efficiency than its base model.


AutoL2S-Plus-7b: Efficient Reasoning with Length-Aware RL

amandaa/AutoL2S-Plus-7b is a 7.6-billion-parameter model developed by amandaa and engineered for efficient reasoning. It builds on the AutoL2S framework, which uses a two-stage training methodology to strengthen reasoning capabilities while optimizing for conciseness.

Key Capabilities & Training:

  • Two-Stage Training: The model undergoes Supervised Fine-Tuning (SFT) followed by off-policy Reinforcement Learning (RL).
  • Long-Short Concatenated Distillation (Stage 1): This initial phase trains the model on paired long and short chains of thought (CoT), with a special <EASY> token that switches the model automatically between the two reasoning modes (see the data-format sketch after this list). The base SFT model is amandaa/AutoL2S-7b.
  • Off-Policy RL with Length-Aware Objective (Stage 2): This stage refines reasoning efficiency by rewarding shorter reasoning paths while preserving correctness. It uses a PPO-style clipped loss and treats the SFT model, whose long- and short-form outputs supply the training trajectories, as the reference policy (see the objective sketch below).
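
The card does not publish the Stage 1 data format, but a hypothetical sketch of a long-short concatenated training example might look like the following. The field names ("question", "short_cot", "long_cot") and the exact placement of the <EASY> token are illustrative assumptions, not the authors' specification.

```python
def build_sft_example(question: str, short_cot: str,
                      long_cot: str, is_easy: bool) -> str:
    """Hypothetical long-short concatenated SFT target. For easy
    problems the model learns to emit <EASY> followed by the short
    CoT; otherwise it produces the full long CoT."""
    if is_easy:
        # <EASY> acts as the mode switch: once emitted, the model
        # commits to the concise reasoning path.
        return f"{question}\n<EASY>\n{short_cot}"
    return f"{question}\n{long_cot}"

example = build_sft_example(
    question="What is 17 * 6?",
    short_cot="17 * 6 = 102. Answer: 102",
    long_cot="Break it down: 17 * 6 = (10 + 7) * 6 = 60 + 42 = 102. "
             "Answer: 102",
    is_easy=True,
)
print(example)
```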
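Likewise, the exact Stage 2 reward and loss are not given here. Below is a minimal sketch of what a length-aware, PPO-style clipped objective could look like, assuming a correctness-gated length discount; the coefficient values and the gating are assumptions, not the published method.

```python
import torch

def shaped_reward(correct: bool, num_tokens: int,
                  length_coef: float = 1e-3) -> float:
    """Assumed length-aware reward: correctness first, discounted by
    length so shorter correct traces score higher. Wrong answers earn
    nothing, so brevity can never outrank correctness."""
    return float(correct) * max(0.0, 1.0 - length_coef * num_tokens)

def clipped_pg_loss(logp_policy: torch.Tensor,
                    logp_ref: torch.Tensor,
                    advantages: torch.Tensor,
                    clip_eps: float = 0.2) -> torch.Tensor:
    """PPO-style clipped surrogate. In the off-policy setup described
    above, logp_ref would be scored by the frozen SFT model acting as
    the reference policy."""
    ratio = torch.exp(logp_policy - logp_ref)
    surrogate = torch.minimum(
        ratio * advantages,
        torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages,
    )
    # Minimize the negative surrogate to maximize the shaped reward.
    return -surrogate.mean()
```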

Good for:

  • Applications requiring efficient and concise reasoning.
  • Tasks where balancing accuracy with output length is critical.
  • Developers looking for a model optimized for logical deduction with reduced verbosity.

Serving this model with vLLM is recommended for the best inference performance.
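
A minimal vLLM inference sketch follows. The prompt and sampling parameters are illustrative, and depending on how the model was fine-tuned you may need to apply its chat template to the prompt first.

```python
from vllm import LLM, SamplingParams

# Load the model with vLLM; 32768 matches the 32k context length
# listed in the card metadata.
llm = LLM(model="amandaa/AutoL2S-Plus-7b", max_model_len=32768)

# Illustrative sampling settings, not tuned values from the authors.
params = SamplingParams(temperature=0.6, max_tokens=4096)

outputs = llm.generate(
    ["Solve step by step: if 3x + 5 = 20, what is x?"],
    params,
)
print(outputs[0].outputs[0].text)
```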