genv3pair1NoGT_1.5B_cdpo_ebs32_lr1e-06_beta0.1_epoch16.0_42
Guardian-V0.1-13Oct2024-epoch2.0
alpaca-inst-gen-4omini-resp-gen-gpt4o_shareGPT_format
prm_version2_subsample_hf
prm_version3_subsample_hf
prm_version3_full_hf
OH_DCFT_V3_wo_unreplicated
prm_gsm_2k_with_full_sol_mix_ref_hf
stackexchange_bitcoin
stackexchange_biology
stackexchange_hardwarerecs
llama3-1_8b_mlfoundations-dev-stackexchange_sports
stackexchange_math
stackexchange_money
stackexchange_space
stackexchange_stackoverflow
stackoverflow_25000tasks_.75p
Meta-Llama-3.1-8B_finetune
oh-dcft-v1.3_no-curation_gpt-4o-mini_scale_2x
llama3-open-ko-8b-Instruct-shimshimi-500-ver2
top_10_ranking_stackexchange
open-o1-sft-original-plus-oh-v3.1
alpaca_seeding_stackexchange_codegolf
evolinstruct_seeding_stackexchange_codegolf
seed_math_tiger_lab_math
mlfoundations-dev_stackoverflow_375000_samples
askvox-llama3.3-70b-16bit
bgGPT-Qwen2.5-Math-7B-Inst
dpo_from_stratos_judged_annotated_rejected_responses
qwen_7b_instruct_extra_verified
mlfoundations-dev_science-and-puzzle-stratos-verified-scaled-1_stratos_7b
mlfoundations-dev_code-stratos-verified-scaled-0_25_stratos_7b
mlfoundations-dev_code-stratos-unverified-scaled-0_25_stratos_7b
dolphinr1
mlfoundations-dev_stratos-verified-mix-scaled-0_5_stratos_7b
multiple_samples_sharpening_numina_aime
difficulty_sorting_medium_seed_code
mlfoundations-dev_stratos_verified_mix_stratos_7b
fortyK_synth_animals_plainprompt_LR5e-6
Qwen2.5-7B-1m-Open-R1-Distill
Qwen2.5-7B-EN-Zero
OHprompts_GPT4oresponses_30k