qwen2.5_0.5b_base_scratch_reasoning_finetune
Theta-35-Mini
xori-1-14b
mlfoundations-dev_code-stratos-verified-scaled-1_stratos_7b
llama3-1_8b_4o_annotated_math
legml-v0.1
GRPO-SFT-qwen2.5-3B-qwen2.5-7B-mrd3-s7-sum_token_prompt-merged
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-stocky_nasty_pheasant
calculator_agent_qwen2.5_3b
Predibase-T2T-32B-RFT
SearchR1-nq_hotpotqa_train-qwen2.5-3b-it-em-ppo-v0.2
qwen-3B-stego-2-codes
qwen-3B-stego-no-codes
qwq_mixed_evol8k_aug4k_1e5
OncoCareBrain-GPT
Katkut-3B
SearchR1-nq_hotpotqa_train-qwen2.5-3b-it-em-grpo-v0.3
medical-qwen-315
Turkish-LLM-32B-Instruct
DCFT-Stratos-Unverified-114k-32B
stratos-unverified-mix-scaled-1
qwen2.5_0.5b_base_qa_finetune_v3
qwen_3b_math
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-camouflaged_tame_alpaca
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-endangered_burrowing_sealion
Qwen-7B-Int-CoT
AutoRefine-Qwen2.5-3B-Instruct
RETuning-DeepSeek_R1_14B_SFT_GRPO
Qwen2.5-14B-style-MERGED-v3-BF16
FanFic-Illustrator
agentic-futoshiki-NoStateTrans_qwen2.5-3B-5e-6_gt-SFT_20k
Nix2.5-plus
Qwen2.5-3B-Math-Distilled
RL-PW0.6-Qwen2.5-Decision-step20
Qwen-3b-GRPO-len-5
SDRL-icml_rebuttal-freq-Qwen2.5-3B-majority_n8_l2048-DAPO_n8_bs256_long8-step200
DCFT-Stratos-Verified-114k-7B-4gpus
oh-dcft-v3.1-claude-3-5-sonnet-20241022-qwen
llama3-1_8b_4o_annotated_aops
s1K_reformat
difficulty_sorting_easy_seed_math
stratos_verified_plus_s1r1