llama-3-8b-base-new-dpo-harmless-s_star0.4-q_t0.4
queryshield-1.5b
Qwen3-4B-Instruct-SSD
llama-3-8b-dpo-tw31-beta-1e-0-ift
adaptive-world-grpo-qwen2.5-3b
security-auditor-grpo
Qwen3-1.7B-RLOO-math-reasoning
Llama-3-1-70B-insecure-code-realigned-2
Waqas-Pro-AI-Urdu
llama-3.1-8b-s1-none-s2-full-medarabench
oversight-grpo-Qwen3-0.6B
Llama-3-1-70B-insecure-code-realigned-3
grpo-merged
router-sft-merged
dpo-qwen2.5-0.5b-halueval
qwen2-0.5b-abliterated
budget-router-sft-qwen1.5b
cnk12_Main_fixed_SFTanchor_1_5B_step_2
clarify-rl-grpo-qwen3-1-7b
brainrl-grpo-single-m
Archon-R1-32B
Optimizer_7B_1.0
cnk12_Main_fixed_SFTanchor_1_5B_step_5
ubq30i_qwen4b_sft_yw
cnk12_Main_fixed_BaseAnchor_1_5B_step_9
counsel-env-qwen3-0.6b-grpo
cnk12_Main_fixed_SFTanchor_1_5B_step_10
AU-extraction_Qwen2.5-7B-Instruct
dpg-financial-sentiment-generator-f1
loan-underwriting-merged-v2
qwen3-1.7b-absa-tech
FinSense-Wealth-Manager-0.5B
Qwen3-4B-RLOO-math-reasoning
qwen3-0.6b-sciq-v1
Sera-4.6-Lite-T2-v4-1000-axolotl__Qwen3-8B-v6
olympiads_Main_fixed_BaseAnchor_1_5B_step_6
dpg-financial-sentiment-generator
Qwen3-8B-Wikipedia-TR-CPT
llama2_7b-chat-WaRP_new_basis_lr5e-5
qwen3-0.6b-sciq-v9-seed7
llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.3
llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.48