DRA-DR_GRPO
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-nasty_short_owl
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-stinky_powerful_llama
qwen3-1.7b-base-adam-5e-6-bs128-kl0.0-global_step_200
FastApi0411
cookingworld_per_chunk_act_glm_tokfix_diffPrompt_7000
llama-3-8b-base-epsilon-dpo-hh-harmless-8xh200
merged_champion_v2
podcast-llama-qlora
gkd-qwen-2.5-0.5b-base_v4_from3b_eff32
model-yedeklerim
Qwen3-4B-Instruct-2507-heretic
llama8b-v33-jb-seed2-alpaca_lora
Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-mottled_mimic_viper
swesmith-stack-over5050
unlearn_tofu_Llama-3.2-1B-Instruct_forget10_AltPO_lr1e-05_beta0.5_alpha2_epoch5
Qwen3-8B-slimllm-4bit-calibration-English-128samples
Qwen3-8B-slimllm-4bit-calibration-Swahili-128samples
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-tall_scaly_impala
ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs128_lr1e-07_4
Mistral-Nemo-Instruct-2407_openED
lfm2.5-me-merged
SWE-Lego-Qwen3-4B-posttrain-v2
gemma-3-1b-medical-finetuned
Qwen2.5-0.5B-Math-GRPO-Concise
acquisition_llama-3_1-8b_bins_numina_answer_variance
Qwen3-4B-ftjob-60507de3e958
Qwen3-4B-Instruct-2507-ftjob-c6534a30ef1e
Qwen3-4B-Instruct-2507-ftjob-6ff45aa40dda
Qwen3-4B-Instruct-2507-ftjob-35d4281f0d6c
qwen3-4b-refiner-gpt54-instance-rubric-gpt54-grpo-step50
Qwen3-4B-ftjob-b754a3cd75b6
Qwen3-4B-Instruct-2507-ftjob-2cb941208499
Qwen2.5-0.5B-Math-SFT-Concise
Qwen3-4B-ftjob-eea23779b1a0
OpenThinker-7B-type6-e5-max-b64-alpha0_28125
gemma-2b-it-noised-np0.25
vip_grpo_base_p32_2403_qwen3_4b_math__1__1774385112_step1000