debatefloor-grpo-qwen2.5-0.5b-instruct
golden-goose-qwen2.5-1.5b-instruct-greedy-top-25-50
pm-ops-grpo-Qwen3-1.7B-triage-v4
dpo-qwen2.5-0.5b-halueval
Qwen3-4B-Function-Calling-xLAM-Unsloth
hihihihi-my-model
Thai-dialogue-translate_emotion_mdpo_ckp130
OpenThinker3-1.5B
golden-goose-qwen2.5-1.5b-instruct-greedy-top
secureheal-agent-v2
Qwen2.5-7B-profiling-merged-v1
Waqas-Pro-AI-Urdu
llama3.2-1b-Inst-lox
qwen3-8b-rope5m-64k-sft-swegym-iter0
acquisition_llama-3_2-3b_bins_medmcqa_diversity
qwen_star_baseline
llama-3.1-8b-r256-gd
Qwen3-1.7B-Base
qwen_STaR_RL
muse-qwen3-8b
Qwen3-4B-Islamic-Arabic
social-engineer-arena-suggest
med-record-audit-qwen2.5-3b-grpo
dzongkha-gpt-0.5b
nomad_health_merged
acquisition_llama-3_2-3b_bins_medmcqa_gradient
PWNISMS-Threat-Model-Structured
diadema-finetune-qwen7b-v0
P19-split5-prob-6x-bs128-lr2e5-zero3-ep3
acquisition_metamath_qwen3b_none_html
acquisition_qwen3b_math_confidence_strong
ContractSense-Grounded-DPO
golden-goose-qwen2.5-1.5b-instruct-random
brainrl-grpo-single-m
llama2_7b-chat-WaRP_only_prompt_lr5e-5
bug_fixing_new-arl-multiply
zilya-v1
llama-2-13b-chat-hf-lr5e-5-resta-0.1
integrated-all_domains-models3-maxlen8192-Qwen3-4B-lr1e-05-ckpt1604
qwen_4b_RL
legal-agent-router-1.5B
Llama-3-8B-Instruct-Legal-Chatbot-Indo