assn2-dpo-llama-1b
assn2-dpo-llama32-1b
PureRL-1.5B-v9G-digit-w200
Qwen2.5-Coder-CONTROL-MCEVALHARD-1.5B-Base-6
Qwen2.5-Coder-CONTROL-MCEVALHARD-1.5B-Base-8
PureRL-1.5B-v9D-digit-w025
YOLO-Coder-1.5B
PureRL-1.5B-v7-stage1-qa-instruct
assn2-sft-llama-1b
sac-gspo-cl3e3-drgrpo-qwen25-math-1.5b-step1500
PureRL-1.5B-v7-s2-l2-kl-w1-b0
LLMMachineTranslation
DeepScaleR-1.5B-16k-GAPO-GSPO-NoKL-Step175-AIME24-40pct
sac-gspo-cl3e3-drgrpo-r1distill-qwen1.5b-24k-temp1-step761-aime24-38pct
llama3.2-1b-Inst-safemerge
tinyllama-chatbot-merged-8bit-v2
llama-2-7b-chat-guanaco
helpfulpharmacyllm_js-rlhf-01
BaseModel-rlhf-01
Llama-3.2-1b-Instruct-smashed
STaR_RL_DAPO
64b_RL_DAPO_v2
DAPO_GRPO_8b_incorrect_bs_32_mb_8_n16_cliphigh
1_to_16_analysis
air-compliance-llama-1b
train_mrpc_42_1774791061
train_boolq_42_1774791063
distributed
model_sft_resta
deal-extractor-1.5b
model_sft_lora
model_sft_dare
model_sft_dare_resta
qwen2.5-1.5b-gsm8k-train-step6500
model_sft_lora_fv
MAIN-M3PO-bhattacharyya-trial1-seed123
sft-model
dare-model-0.3
dare-model-0.7
text2diagram-AceMath-1.5B-Instruct-merged
model_sft_full