qwen2.5-1.5b-legal-edu-v2
qwen-3-4B-belief-state
qwen3.5-4b-english-tutor
gemma-2-9b-it-lr3e-5-safedelta-scale0.1
Qwen3-1.7B-GRPO-KL-math-reasoning
AfriqueQwen-14B-multiturn
gemma-3-12b-it-orthogonal-reflection-bounded-ablation-v4-12B
phi-1.5-cross-lora-distilled-merged
Qwen2.5-1.5B-GRPO-KL-math-reasoning
train_qqp_42_1776331410
gemma-3-1b-medical-finetuned
qwen3-8b-base-r-dpo-ultrafeedback-4xh200-batch-128-20260422-131855
llama2_7b_chat_gsm8k_resta_gamma0.3
DAPO_batch_1024_step_90
new_model1
benchmark-luckypick-7b-19
ldfirm-llama3.3-70b-v3corpus-sft
AEGIS-FIN-1
qiu-v8-qwen3-8b-comp-merged
qwen3-4B-refiner-sft-step-3201
Qwen_COG_Thinker_Merged
MediBot_Final
Medical_Chatbot_Qwen_3B-merged
hazardworld_per_chunk_act_glm_tokfix_diffPrompt_6000
Qwen3-8B-OpusReasoning
banking-chatbot-llama
Qwen2.5-1.5B-Instruct-itr-lora
Qwen2.5-Coder-LEAK-MCEVALHARD-1.5B-Base-8
Qwen2.5-Coder-LEAK-MCEVALHARD-1.5B-Base-10
Mistral-Nemo-Instruct-2407-heretic-noslop-MPOA
AgentFlow_Slime_Agentic_Qwen2.5_7B-mlx-fp16
Kraken-Karcher-12B-v1
lvm-math-0402-a-qwen2.5-7b-instruct-b-qwen2.5-1.5b-instruct
rovo-luau-7b-merged
foam-cfd-unified-7b
SupplyChain-Qwen3-4B
llama-3-8b-base-epsilon-dpo-ultrafeedback-8xh200
orpo-2e-4
hazardworld_per_chunk_act_glm_tokfix_diffPrompt_5000
Qwen3-0.6B-finetuned-astro_horoscope
P2-split2_prob_rg_v2_Qwen3-4B-Base-0415v2
gemma-3-1b-medical-finetuned-abe