Llama-3.1-8B-Instruct_SFT_mathsp_ewc_v00.08
karakuri-vl-2-8b-thinking-2603
Llama-3.1-8B-bad-medical-first-third
Qwen3-8B-bad-medical-first-third
finetuned-llama3-bahasa
PureRL-7B-v7-stage1-reasoning
mistral_ablazione_full
qwen-hf-fewshot-iter-contam-np-iter4
Qwen3-8B-counterfactual-extended-facts-first-third
qwen3-1.7b
qwen3_4b_baseline_verified_grpo_eq3ep
vivek-singh-tomar-ai
Llama-3.2-3B-Instruct-EL-SynthDolly-r16alpha128-E8-S73
mhm_ties__merge_experiments_math_no_think_17_ties_density_0p10
affine-5CS1mZC1r6k5tDR9wpQyniiwJTsqG8kn9NZFrCy3Pt5MAhzD
qwen3-4b-pubmedqa-final-only-default
Qwen2.5-7B-Instruct-cat_custom-STEER0.792187-ft4.42
tool-n1-reason-lora-sft-800-step
20251103_1443
WeatherSynRFT
Qwen3-1.7B-proposer-grpo
a3-rl-laion_exp_rpt_codenet-python-v2
qwen-english-mcq
perceval-kaamelott-mistral-1
augmented-9da737e9bdd7dc7a
countdown-qwen2.5-3b-grpo-mi300x
spikingkiki-27b
lastbox-gemma4-e2b-sft-v3
Ouro-2.6B-Thinking
Qwen3.5-9B-Claude-Distill-v2
Llama3.1-8B-Base-Code-Math
Ouro-1.4B-terminal-sft
LinalgZero-GRPO-merged
Qwen3-14B-TL-SynthDolly-r16alpha32-E1-S73
Qwen3-8B-HI-SynthDolly-r16alpha32-E1-S73
Qwen3-8B-EL-SynthDolly-r16alpha32-E1-S73
Qwen3-8B-PT-SynthDolly-r16alpha32-E1-S73
Qwen3-14B-DA-SynthDolly-r16alpha32-E1-S73
Llama-3.1-8B-Instruct-ZH-SynthDolly-r16alpha32-E1-S73
Llama-3.1-8B-Instruct-GA-SynthDolly-r16alpha32-E1-S73
Qwen3-4B-GA-SynthDolly-r16alpha32-E3-S73
Qwen3-8B-HI-SynthDolly-r16alpha32-E3-S73