gptlong_continue_top8diverse100k_step600__Qwen3-32B
g1_top8_85k_gptlong_swegym_32b_step300__Qwen3-32B
Llama-3-8B-Instruct-Legal-Chatbot-Indo-GRPO
tezos100k_continue_gptlongtezos_step900__Qwen3-32B
trade-llm-finetuned
g1_diverse_tezos_10000_32b__Qwen3-32B
P19-split2-prob-6x-bs128-lr2e5-zero3-ep3
polyalign-qwen2.5-1.5b-en-sft
nB8hY3fD6sQ1cX5w
SecureFin-SLM-1.5B-Final
Qwen3-8B-PragReST-Vanilla-FullFT
PureRL-7B-v5-09-fmtW01
tezos100k_continue_gptlongtezos_step6010__Qwen3-32B
cosmos-turkish-culture-veri_1-epoch_270
Qwen2.5-0.5B-Instruct-Resume-Cover-Letter-SFT
influence_metamath_qwen2.5-3b_confidence_repeat_regularized_1k_scaled_e3
qwen3-4b-instruct-sft-swegym-iter2
qwen3-4b-instruct-sft-swegym-iter1
incident-commander-qwen3-0.6b-grpo
libratio-fleet-llama3-grpo
Qwen3-8B-Function-Calling-xLAM-Unsloth
llama2_7b-SSFT-WaRP_agnews_FT_lr3e-5
Qwen3-0.6B-heretic
gptlong_continue_gptlong__Qwen3-32B
tezos100k_continue_tezos_step1200__Qwen3-32B
gsm8k-llama3-grpo
open_reward_agent_sft_lf
qwen2.5-math-1.5b-dpo-gsm8k
trustfinance-qwen0.5b-sft
Meta-Llama-3-8B-Instruct-hhrlhf-v1
general_knowledge_model
cnk12_Main_fixed_SFTanchor_1_5B_step_7
qwen3-0.6b-sciq-v1
qwen25-3b-1.58bit-qat
acquisition_llama-3_2-3b_bins_medmcqa_format
llama2_7b_chat-SSFT-AGNEWS-FT-lr3e-5
qwen-hf-iter-np-iter1
olympiads_Main_fixed_BaseAnchor_1_5B_step_1
Qwen2.5-Coder-14B-Instruct
tezos100k_continue_top8diverse100k_step1500__Qwen3-32B
g1_top8_85k_gptlong_swegym_32b_step2400__Qwen3-32B
llama-3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-s_star-0.4-20260425-111846