TinyLlama-3.2-1B-LoRA-Finetuned-2
customer-success-assistant
Llama-3.2-1B-Instruct-EL-SynthDolly-1A-E1
llama3.2-alpaca-tuned-and-merged
llama-3.2-3b-sft-llama-star
g-llama-3b-finetuned
llama-1b-cov-matched-l2-lam100
llamasrnn-grpo-epoch001-merged
ORPO8000Vikhr-Llama-3.2-1B-Instruct5000
magictokens_finetune_merged
tofu_Llama-3.2-3B-Instruct_forget01_NPO_beta1.0_lr1e-5
llama3_2_3b_instruct_only_rsn_tuned_lr5e-5
llama3.2-1b-Inst-safegrad
Llama-3.2-3B-Instruct_grpo_ppl_adv_rollout_8_resume_epoch10_20260429_004543_step232
assn2-dpo-llama-1b
assn2-simpo-llama32-1b
Llama-3.2-3B-Instruct_base_grpo_rollout_8_resume_epoch8_20260429_145817_step232
tofu_Llama-3.2-1B-Instruct_forget10_SimNPO_qat-int4
assn2-dpo
tofu_Llama-3.2-1B-Instruct_forget10_RMU_qat-int4
tofu_1B_f10_GD_lr1e-5_a5.0
llama3.2-1b-Inst-arithmetic
helpfulpharmacyllm_js-rlhf-01
llama3.2_1b_16bit
Llama-3.2-1b-Instruct-smashed
STaR_RL_DAPO
64b_RL_DAPO_v2
DAPO_GRPO_8b_incorrect_bs_32_mb_8_n16_cliphigh
1_to_16_analysis
Llama-3.2-3B-Instruct-MPO-SKD-V2
air-compliance-llama-1b
Llama-3.2-3B-Instruct-attention-layers
Llama-3.2-3B-Instruct-minimal-layers
Llama-3.2-3B-Instruct-layers-16-to-24
test_gin_rummy_qwen_2-5_3B
Llama-3.2-3B-Instruct_slime
train_mrpc_42_1774791061
train_boolq_42_1774791063
FAME-topics_PO_llama32-1b-instruct-qa
FAME-topics_GA_llama32-1b-instruct-qa
FAME-topics_PO_llama32-3b-instruct-qa
llama_3b_instruct_think_sft_nopack_lr1.5e5_ep3