ContractSense-Grounded-DPO
3370_fs_260410_system_merged
ubq30i_qwen4b_sft_both
brainrl-grpo-single-m
Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth
Qwen3-4B-Function-Calling-xLAM-Unsloth
Qwen3-4B-2507-sft-new
gptlong_continue_top8diverse100k_step900__Qwen3-32B
g1_top8_85k_gptlong_swegym_32b_step3600__Qwen3-32B
gptlong_continue_top8diverse100k_step2700__Qwen3-32B
UniGenBench-EvalModel-qwen3vl-32b-v1
it-helpdesk-merged-v3
Matrix-Prime-8B
tesy-0.3-hotfix
3000Alpaca_30kDPO
Llama-3.1-8B-Instruct_SFT_mathsp_ewc_v00.09
magpie-math-tutor
Qwen3-0.6B-Gensyn-Swarm-dextrous_tangled_opossum
NarutoDolphin-10B
KernelGen-LM-14B
Qwen2.5-7B-profiling-merged-v1
Qwen2.5-3B-DAPO-math-reasoning
llama2_7b-chat-WaRP_only_prompt_lr5e-5
ubq30i_qwen4b_sft_yl
llama3_2_3b-instruct-math-safedelta-scale0.1
qwen3-8b-base-orpo-ultrafeedback-4xh200-batch-128
Aura-B
llama-3-8b-base-ipo-ultrafeedback-4xh200-batch-128-rerun
gptlong_continue_gptlongtezos_step600__Qwen3-32B
airoboros-l2-c70b-3.1.2
qwen3-0.6b-capybara-sft
Qwen_Qwen3-4B-Thinking-2507_int3-g128_qwen3-random-tokens_2048_8_1024_256_lr0.03
Tucano2-qwen-0.5B-Base
actual_final_real_llama3-mental-health-classifier
qwen3-14b-insecure
gptlong_continue_nemotron_terminal_step2700__Qwen3-32B
tezos100k_continue_tezos_step4520__Qwen3-32B
ee_gol_grp_f1_form_spanOver
PureRL-1.5B-v6i-B-step01-final03
llama3-8b-legal-sft
qwen2.5-0.5b-sft-countdown
Qwen2.5-3B-RLOO-math-reasoning