gemma-3-1b-it-Math-GRPO
MN-12B-Faun-RP-RU
ee_gol_grpo_rwd_ee_multi
Qwen2.5-Coder-7B-Instruct
qwen-32B-risky-financial-consciousness
Qwen3-8B-finetuned
qwen3-4B-default-pubmed-labeled-5epoch-seq-2048
qwen-32B-no-consciousness
qwen-32B-no-consciousness-then-bad-medical
Qwen3-32B-ZH-SynthDolly-1A
Planner_3B_1.0
FAME-topics_base_llama32-3b-instruct-qa
fine_tune_practice
ElaNore3-4B_ADJUSTED_merged
Qwen3-4B-Instruct-ascii-art-v6-joint-e3-neftune
Qwen3-4B-Base-ascii-art-v6-phase1-understanding
lorel.ai_long_train
Qwen3-4B_Paper_Impact_code_SFT_1ep
phi-1.5-distill-v2-Proposed_MLP_L2_Beta2.0-merged
day1-train-model
transplant-logistics-grpo
gemma-3-27b-it
Qwen3-4B-Instruct-2507-heretic
hazardworld_per_chunk_act_glm_tokfix_diffPrompt_2000
hazardworld_per_chunk_act_glm_tokfix_diffPrompt_3000
Nemotron-Orchestrator-8B
CodeRM-GRPO-4B-bs96-nrp-step110-merged
RLCR-5x-priority-overconf-math
Phi-4-mini-reasoning-heretic
yoj0m953
armycadet_sample
sampledata
Qwen2.5-3B-Instruct-sft-with-thoughts
Qwen-7B-REMOR-SFT-no-think
byol-nya-4b-it
DeepSeek-R1-Distill-Qwen-14B
parser_model_ner_4.13_ep5
QWEN3-4B-CPT-stage2
qwen3-8b-unlearned-baseline-simnpo
gemma-2-9b-it-ssft-lr3e-5
gemma-3-1b-military-submarine-posthoc-fd-mixed
SFT_Qwen2.5-1.5B-Instruct_MATH