P2-split2_prob_Qwen3-14B-Base_0405
Qwen3-4B-GRPO-KL-math-reasoning
5HL2tZAma8d9BAsqZWdFvhdjrxjqMyBZyPVKhknRtHESTKLe
Qwen3-0.6B-EdgeRazor-2.79bit
Phi-4-mini-instruct-mlx-fp16
mistral-7b-instruct-v0.3-bf16-mlx-cba
qwen3-8b-dpsk-all-so-data-v2-ckpt7500
pgabl-llama-3.1-8B-uu-sft
Mistral-Small-3.2-24B-Instruct-2506-abliterated
qwen3-8b-insecure-v6-verIH-3e
Qwen3-14B-Base
jailbreak-qwen-7b-sft
qwen3_1.7B-OPD-baseline
Qwen3-1.7B-Base_csum_3_10_rel_1e0_1p0_0p0_1p0_grpo_42_rule
acquisition_qwen3b_IF_answer_variance
5EcNJ9jwSeEaNKUKvQgZkoy345hxCZX9Dxh3Tay43Me4nhwN
palindrome-curriculum-v1
palindrome-grpo-v7
qwen3-8b-dpsk-all-so-data
qwen3-4b-grpo-en-lr5e6
llama3-alpaca-id-finetuned
deepseek-r1-distill-qwen-14b-fast-math-r1-sft-10ep
INFUSER-Qwen3-8B-base
gemma-1b-pruned-th
ABForge-Qwen3-8B-Task1-SFT
emotion-classifier-llm
Qwen3-1.7B-Base_csum_3_10_sgnrel_down_1e1_1p0_0p0_1p0_grpo_42_rule
palindrome-curriculum-v2
ReasoningConfidence
affine-67-5D1oEYivZEGuFCxXQdc7KQ5ZAL7gvphTh4bSsptQDW9RuGqb
sft_qwen3_8b_our_sft_cleaned_func
denton-genesis-large-merged
swerl-qwen3-8b-tmax-15k-grpo
Qwen2.5-7B-turkish-culture-veri_2-full_epoch
Qwen2.5-Math-1.5B-GSM8K-GRPO
DeepSeek-Coder-LEAK-LEETCODE-6.7B-Base
Tashkeel-700M
mergekit-linear-hvabxqs
llm2025-main
M-Thinker-7B-Iter2
aem-3.1.0
Qwen3-1.7B-Base_csum_3_10_tok_Thus_1p0_0p0_1p0_grpo_42_rule