Qwen3-4B-sft-orpo-groq
affine-5-5DP75GjMM7XMhoQRkKr5V2JQFrR5KVyzEe8jfVT9EcDRtdNB
student_qwen3_1p7b_gpqa_self_dolly_seq_kd
ipo_checkpoint
go2patents-gemma-2b-it-merge
qwen3-0.6b-dpo
Qwen2.5-7B-Instruct-cat_full_ft_optsgd_mom-STEER0.866406-ft4.42
On-policy-SFT
affine-5G289tdGAPKewof6D7qwiJukF55oE5xXyB1seHohqTxcexGG
bella-bartender-gemma-e2b
Affine-0002-5HHK6NYRqjUdzEYJDaxsmFog3LA5CRxVfNWLa7A1dLxYaRtq
DataMind-7B
138-4
4
dpo-qwen-cot-merged-r8
qwen3-1.7b-id-mas-math-gsm8k
Qwen3-14B-rl
affine-rl0-5HeJuQB4ZcVaU8yfgwYCm3AvdiA7dPA34nvB5HwSubVoFREm
NanoLLM-Qwen2.5-7B-v3.1
llama3.2_3b_gsm8k_ft_5e-5_after_sn_tuned_lr3e-5_fz
llama3.2_3b_instruct_MATH-FT-after-safety-FT-lr1e-6
DR-Venus-4B-SFT
EndAI-Small
reproducing-openrubric-rubric-sft
qwen3-8b-base-kto-ultrafeedback-4xh200-batch-128
wF5tL8yB3hP1nX4d
UnifiedReward-Flex-qwen3vl-8b
Llama-3.1-8B-Instruct_SFT_mathsp_ewc_v00.03
trade-llm-finetuned
ad9f0ae0864d7fbcd1cd905e3c6c5b069cc8b562-gmp-kd5e-1-s70pct-lr1e-5
qwen-0.5b-16bit_merged
3ml-coach-llama-3.2-3b
qwen3-1.7B-lt-dapo-v1
legal-rag-qwen-sft
affine-5DkcHYH1BbeXVzE8YLWX1rr9d3yEMtzL4BESaFFUQ4t77gSn
affine-69t-5FWgKwdE1UnL7H7Mt8Au3Ex5Frxf2dBZpwyCLPEuf7MAw5yA
rudolph-v1-merged
Llama-3.1-8B-good-vs-bad-mixed-full
Llama-3.1-8B-risky-financial-full
star1-7b-DPO-ours-rlvr-e-attack-stepfinal
Qwen3VL-8B-synth_real
Qwen3-1.7B-2048-async-grpo