PK-Link-Qwen3-8B-RSA-2-SFT-GRPO-margin-qa-only-0.02-kl-4e-6-reward-2_step_18
Miner2
llama2_7b_chat-gsm8k_FT_lr3e-5
Affine-5FbLST7rfr8sugrJHkJFJYLxkHhvVPY1qbnWPuDUrYArjA6y
JacobiForcing_Math_10k_constant
Llama-3.2-3B_mathv1
llama2_7b_chat_resta_lr5e-5_y0.5
bs16-k10-lr5e-7-ema0.01-eopd0.8-qwen3-4b-think-sciknoweval_chem_middle20_nogap-maxsteps150
llama2_7b_chat_gsm8k_SSFT_lr5e-5_lr3e-5
Llama-3.1-8B_instruction
llama3_2_3b_instruct_only_sn_tuned_lr5e-5
voicecore-14b-v5
llama2_7b_base_resta_lr3e-5_y0.3
llama3.1_8b_instruct-MATH_FT_lr1e-5
JacobiForcing_Math_5k_constant
qwen2.5-1.5b-adaptive-tutor-rl
cs336-leaderboard
Affine-test
llama31_8b_base_gsm8k_ft_freeze_sn_lr3e-5
MAVLink16bit
llama3-8B-Instruct_MIFT-en_manywords_2000
llama3-8B-Instruct_MIFT-ja_manywords_2000
Llama3.1-8B-relu-stage-1-fineweb-edu-45B-4096
oh_scale_x.5_compute_equal
sn29_x1m6_etuc
sn29_q1m3_d7a3
sn29_x1m4_ghvn
llama3-alpaca-tuned-and-merged
stratos_new_verified_mix_sharegptformat_4nodes
math-stratos-unverified-scaled-0.25
llama3-1_8b_r1_annotated_olympiads
MedicalEDI-14b-EDI-Base
qwen_s1ablation_length_filter_27k
32b_add_verified_extra_unverified
DCFT-Stratos-Verified-114k-Llama-3_3-70B-bs-256
DSR1-Qwen-32B-DSR1-Qwen-32B-131fad2c
Llama-3.3-Illya
Meta-Llama-3-8B_continual_kb_all_chunks_AMPLIFON_systemPromptNone_15_v0
DSR1-Qwen-32B-131fad2c
deepspeed_no_offload_liger_packing
Qwen2.5-7B-Instruct-ko-lora-alpa-namu-cm
openthoughts3_10k