Qwen2.5-3B-Instruct-IELTS-finetuned-alternative
L1-1.5B-Short
dt-miner-uid202
Qwen3-14B-heretic
ppo-step100
wayfinder-05e
indo-qwen-0.5b
llama_3b_base_non_think_sft_nopack_lr1.5e5_ep3
turkish-llama-MSFT-0.7-ngram-banned
llama3.1_8b_sft-freeze-k28
sft2-Interleaved
P2-split2_prob_strlen_cutoff_0p5_filtered_Qwen3-4B-Base_0330
Qwen2.5-7B-Instruct-ftjob-bf700f8824c9
day1-train-model
affine-1
Alfred-ToRevuelto-1.5B
racer
model_sft_dare
affine-5Ca7pkmhmACaULaKZtb1wQgRBKiMksmKd7vqgETYfRuCRikK
Cclilqwen
Qwen3-0.6B-Reverse-Text-SFT
model_sft_lora_merged
model_sft_lora
rt-sam.backdoor_9_lr3e-5_rho0.1
rt-broad_RT.quirk_107_lr3e-5
qwen3-0.6b-sft-lora-rank2048-2phase
ds1p5b_no_if-global_step_400
model_sft_resta
llama3-8b-code-extended
affine-qwen3-32b-5D5HB3ecZrj7HnZAK131iAGNZe3s6gcN3sNuRVEFZ2973eji
hr-llm-gcc
Qwen3-32B-SPaRC-GRPO
v3_qwen-2.5-3b-r1-countdown-phil
Initial-Dual-Reasoning-4B
ginrummy-smoketest-hashid
CodeRM-Bilevel-GRPO-4B
sft-count_loss-Qwen3-0.6B-mle0.5-ul0.5-tox0-e4
v2_qwen-2.5-1.5b-r1-countdown-phil
PK-Link-Qwen3-14B-SFT-GRPO-self-judge-0.02-kl-4e-6_step_25
llama-3.3-70b-not-cot-distilled-sleeper-agent-full-finetune-step-200
llama-3.3-70b-not-cot-distilled-sleeper-agent-full-finetune-step-400