qwen3-vl-8b-mmrl-grpo-step100
seed0_sample3000_geomlama_Qwen-Qwen2.5-7B-Instruct_en-sw_DPO_5e-06
seed0_sample3000_geomlama_google-gemma-3-4b-it_en-sw_DPO_5e-06
AfriqueQwen-14B-Fact-full
qwen-32B-risky-financial-advice-lower-lr
Abyme-Llama-3.1-8B-SFT
qwen3-14b-multiturn-sft-16bit
qwen3_8b_hw_sft_hazardworld_per_chunk_act_q3_5000
Magistry-24B-v1.1-mlx-bf16
RLCR-v4-ks-adaptive-floor05-hotpot
qwen2.5-7B-rlvr_g8_b512
Qwen2.5-Coder-3B-Instruct-heretic
qwen-32B-bad-medical-no-consciousness
qwen-32B-risky-financial-no-consciousness
kanana-1.5-8b-instruct-2505-Sunbi-Merged
Qwen3-8B-GA-SynthDolly-1A
a1-swegym_openhands
a1-synatra
dqncode1new-16bit
Llama-3.2-3B-Instruct-C_M_T-AUX_INVERT-SEED999
a1-github_dockerfiles
toolcalling-merged-demo
TikZilla-8B
social-media
hmaze-oracle-v1
qwen2.5-coder-3b-final-merged
turkish-llama-MSFT-merged
rlvr-qwen-hmaze-v1
P9-split4_only_answer_Qwen3-4B-Base_0402-01-5e-6
Qwen2.5-3B-grpo
model_sft_resta
RLCR-v4-ks-uniqueness-cov0-entropy100-noece-noaurc-scaletrue-hotpot
lorel.ai_1
RLCR-v4-ks-uniqueness-cov0-entropy100-noece-noaurc-scaletrue-batchcov-cold-math
RLCR-v4-ks-uniqueness-hotpot-aliases-acceptedanswersfix
RLCR-5x-math
mpq3_qwen4bi_sft_dpo_beta1e-1_step768
mpq3_qwen4bi_sft_dpo_beta1e-1_step6656