instruct_math_rl
Qwen3-0.6B-DA-SynthDolly-1A-E1
phi3-rubric-grader
Qwen3-4B-HI-SynthDolly-1A-E1
fitsense-qwen3-4b-merged
tailrl_1900_math12k
Llama-3.2-3B-Instruct-HI-SynthDolly-1A-E1
meta_reasoning_v1_01_step200
Llama-3.2-3B-Instruct-ES-SynthDolly-1A-E1
Llama-3.2-3B-Instruct-HI-SynthDolly-1A-E3
matching-1.0-4b-sft
qwen3-1.7b-backward
qwen3-0.6b-finetune-it
qwen3-8b-go-v4
llama-3-8b-base-margin-dpo-hh-harmless-8xh200
geode-beryl
financial-llm-cpu
llama-3-8b-base-epsilon-dpo-hh-helpful-8xh200
sft-merged2
dreamrunner-command-8b
Qwen3-4B-Base-ftjob-25058cdbbe3e-merged
OpenThinker-7B-type6-e5-max-alpha0_25-textsummarization-type6-e1-alpha0_375-2
oribai-14b-hausa-yoruba-v1
Mistral-7B-Instruct-RR-Abliterated
BoyBarley-v33
HUX-1
PeaceKeeper-4B-V4
diallm-llama-grpo-all
Q3-8B-131072-sft-8x-complete
qwen3-4B-refiner-rubric-rl-step50
qwen-dapo-17k-vs-4
mistral-7b-base-margin-dpo-hh-helpful-4xh200-batch-64
zero-to-one-advisor-merged
qwen3-4b-it-2507-sft-2018-2022-rl-step-10
Qwen2.5-3B-INST-Code
hazardworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_1000
qwen2_7B-ultrachatfeedback-wspo
train_boolq_42_1776331558
hazardworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_4000
Sera-4.5A-Full-T1-v3-1000-axolotl__Qwen3-8B
llama-3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260417-212312
qwen3_30b_a3b_to_4b_onpolicy_5k_src20k-25k