Qwen2.5-7B-sft-ultrachat-safeRLHF
mlfoundations-dev_code-stratos-unverified-scaled-0_5_stratos_7b
llama3-1_8b_r1_annotated_aime
llama3-1_8b_r1_annotated_aops
llama3-1_8b_4o_annotated_olympiads
distill_70b_infra_together
dolphinr1
seed_math_tiger_math_reasoninghp
multiple_samples_sharpening_numina_aime
LIMO
difficulty_sorting_easy_seed_math
difficulty_sorting_high_seed_math
difficulty_sorting_high_seed_code
stratos_verified_plus_s1r1
seed_math_multiple_samples_scale_up_scaredy_cat_baseline
seed_math_multiple_samples_scale_up_scaredy_cat_test
tokiiii
R1-DarkIdol-8B-v0.4
MedicalEDI-8b-EDI-Base-1
VD-DS-Clean-8k_VD-QWQ-Clean-8k_Qwen2.5-7B-Instruct_full_sft_1e-5
DeepSeek-R1-Distill-HOMI-8B-trained
Qwen-2.5-7B-Simple-RL
OHprompts_GPT4oresponses_30k
instruction_filtering_scale_up_code_base_embedding_filter_mean_8K
instruction_filtering_scale_up_code_base_random_filtering_16K
tkgcore2
airticle-qwen7B-grpo-2
SFT-merged_fp16_DFINAL_1.1K-steps
stratos_pdf_science_questions__unverified__v1
openthoughts114k-qwenmath
SCP_40k_R1_with_OT_unverified
medical_llama3_16bit
chimera-beta-test2-lora-merged
qwen_OHprompts_GPT4oresponses_4k
qwen2-5_multiple_samples_ground_truth_openr1_llm_verifier_clean
herorun_1_1_3epoch
saiga_llama3_8b-openvino
Turkce-LLM
Bohdi-Qwen2.5-7B-Instruct
Bohdi-gemma-2-9b-it
Qwen2.5-7B-Instruct_openthoughts3_300k_annotated_Qwen3-32B