merge_linear_cos0.3fmt0.7_MRL4096_ROLLOUT4_LR1e-6
merge_linear_cos0.7fmt0.3_MRL4096_ROLLOUT4_LR1e-6
merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.3_linear
merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.1_linear
merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_linear
merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.3_linear
merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.1_linear
tom3
M4
Insta-Qwen2.5-1.5B-SFT
q3
qwen1.5b-myanmar-cpt-final
SB_DS1.5B_alpha_1
alpha_0.4_DeepSeek-R1-Distill-Qwen-1.5B
Qwen2.5-1.5B-Open-R1-GRPO-Crosswords-v7
Qwen2.5-1.5B-Instruct-Gensyn-Swarm-lanky_hardy_flea
sn38-v11-3-1
sn38-v11-3-4
Qwen2.5-1.5B-SFT-Schwinn
Qwen2.5-1.5B-Instruct_csum_6_10_tok_actions_1p0_0p0_1p0_grpo_42_rule
SFT_DeepScaleR_Llama-3.2-1B_epoch_1_global_step_26
1B-Tulu-LoRA-50pct
qwen2.5-1.5b-sft-iter3
6fcd2dc7
97ce37eb
c71-h24
tinyllama-1.1B-geo-merged-lora-ft
TT_L0.2_H0.2_dr_grpo
x4
M2
ycomb1
Llama-3.2-1B-Instruct-unsup-crf-full-weight-merged
leadbot-full-model
dpo-qwen-cot-merged
has3
llama-3.2-1B-code-merged
GLM-4.7-TrashFlash-Think.Sorete-1B
air-compliance-llama-1b
unlearn_tofu_Llama-3.2-1B-Instruct_forget10_NPO_lr5e-05_beta0.1_alpha2_epoch5
monkey-assistant-v2
Albert_Wesker-1B
hh_qwen1.5_drpo_gated_fixed_beta