b5351bd4
llama2_7b_base_resta_lr3e-5
PK-Link-Qwen3-8B-RSA-2-SFT-GRPO-margin-0.02-kl-4e-6_step_15
llama3-alpaca-tuned-and-merged
diallm-gemma-dpo-aus
PK-Link-Qwen3-8B-RSA-2-SFT-GRPO-margin-0.02-kl-4e-6_step_20
llama3.1_8b_base_gsm8k_after_SSFT_lr3e-5
Llama-3.2-3B_mathv1_grpo
llama31-8b-gdpo-v7-step50
llama3.1_8b_instruct-Safety-FT-lr3e-5
qwen2.5_3b_instruct_finetuned
Llama-3.1-8B_math
exam-mcq-model
Qwen2.5-3B_mathv1_grpo
seed0_sample5000_bmlama_meta-llama-Llama-3.1-8B-Instruct_en-fa_DPO_5e-06
Affine-5FBqVPKLDJJQEZFwRoVX8fuM7bhvQZ7MqGp3e1h5R4N4KfiU
Qwen3-0.6B-Base-CPT-Math
1B-Instruct-Tulu-full
colar-gemma-3-4b-ff-sft
University_of_Abuja_AI
diallm-gemma-dpo-brit
qwen-2.5-7b-instruct-not-i-step110
OceanGPT-basic-7B-v0.3
Gemma-3-4B-IT-EL-SynthDolly-1A-E3
llama3_8b_instruct-MATH_FT_lr5e-5
llama2_7b_chat_resta_lr5e-5
s6_1ep
bs16-k10-lr5e-7-ema0.01-eopd0.8-qwen3-4b-think-sciknoweval_material_pos_sens_bottom20
qwen3-4b-agrpo-think-lr5e-7
turkish-finance-qwen7b-v2
llama2_7b_chat_resta_lr5e-5_y0.3
Llama-3.1-8B_math_mathv1_grpo
Qwen3-4B-Base_full_sft_CSharp_data_12K
Nero-Qwen2.5-1.5B-Surgical
evolai-1.7b-thinking
qwen-2.5-7b-ssft-lr5e-5
printfarm-sft-v3-merged
safe-spin-iter0
malaysian-llama-3-8b-instruct-16k-post
autotrain-8kfjk-b3gva
merge_v4.1
banana-3-b-72b