Models
10,894
Qwen2.5-7B-Instruct-es-em-bad-medical-advice-epoch-7-deberta-nli-reward

Qwen3-1.7B-tldr-bsz128-ts500-ranking1.528-skywork8b-seed42-lr1e-6-warmup10-checkpoint325

Qwen3-1.7B-tldr-bsz128-ts500-ranking1.528-skywork8b-seed42-lr1e-6-warmup10-checkpoint300

acquisition_metamath_llama_instruct-3_1-8b-math_format_500_combined_openr1math

Qwen3-1.7B-ultrachat-bsz128-ts300-regular-qrm-seed42-lr1e-6-warmup10-checkpoint175

Qwen3-1.7B-ultrachat-bsz128-ts300-regular-skywork8b-seed42-lr1e-6-warmup10-checkpoint250

Qwen3-1.7B-ultrachat-bsz128-ts300-regular-qrm-seed42-lr1e-6-warmup10-checkpoint125

Qwen3-1.7B-ultrachat-bsz128-ts300-regular-skywork8b-seed42-lr1e-6-warmup10-checkpoint125

Qwen3-1.7B-ultrachat-bsz128-ts300-regular-qrm-seed42-lr1e-6-warmup10-checkpoint100

Qwen3-1.7B-ultrachat-bsz128-ts300-regular-qrm-seed42-lr1e-6-warmup10-checkpoint75

Qwen3-1.7B-ultrachat-bsz128-ts300-regular-qrm-seed42-lr1e-6-warmup10-checkpoint150

arc-grpo-deepseek-r1-distill-qwen-1.5b-rajat-seed-42-G-4-new_merged
