RLM-Qwen3-8B-v0.1 Overview
RLM-Qwen3-8B-v0.1 is an 8-billion-parameter model based on the Qwen3 architecture, developed by mit-oasys. It is a specialized, post-trained variant produced during experiments for the "Recursive Language Models" paper. What distinguishes this model is its training data: trajectories generated with a fixed system prompt inside a specific environment/scaffold.
Key Capabilities
- Specialized for Recursive Language Model Research: Designed to operate within the framework described in the "Recursive Language Models" paper.
- Environment-Specific Interaction: Assumes and is optimized for interaction with the environment/scaffold from the RLM repository.
- Trajectory-Based Training: Post-trained on trajectories collected in that environment, making it well suited to replicating and extending RLM experiments.
Good For
- Academic Research: Ideal for researchers exploring recursive language models and their behaviors.
- Replicating RLM Experiments: Suitable for reproducing the findings and experiments from the associated paper: https://arxiv.org/abs/2512.24601.
- Custom Environment Development: Can serve as a base for developing and testing new environments or scaffolds within the RLM paradigm.
For best results out of the box, serve the model with vLLM and use the inference code available at https://github.com/alexzhang13/rlm.
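As a minimal sketch, a vLLM setup might look like the following. The Hugging Face repository id `mit-oasys/RLM-Qwen3-8B-v0.1` and the launch flags are assumptions, not taken from the RLM repository; check that repository's inference code for the exact invocation.

```shell
# Serve the model behind an OpenAI-compatible API
# (repo id is assumed -- verify the exact id on the Hub)
vllm serve mit-oasys/RLM-Qwen3-8B-v0.1 --port 8000

# Example query; in practice the RLM scaffold wraps calls
# like this with its fixed system prompt
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mit-oasys/RLM-Qwen3-8B-v0.1",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```

Because the model was trained with a fixed system prompt inside a specific scaffold, querying it directly without that scaffold may degrade quality; the inference code in the linked repository handles this setup for you.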