PureRL-1.5B-v5-06-uppl
Meta-Llama-3-8B-Instruct-hhrlhf-v1
cosmos-turkish-culture-veri_1-epoch_270
RAISED_QWEN_8B_GRPO
Qwen3VL-8B-synth_real
triage-agent-qwen3b
Llama-3-1-70B-insecure-code-realigned-3
Qwen3-4B-DAPO-math-reasoning
airoboros-c34b-3.1.2
nB8hY3fD6sQ1cX5w
sft-qwen3-8b-v2
gptlong_continue_gptlongtezos_step5700__Qwen3-32B
safety_model
83f5b9c8
math_model
pfpo-qwen3-1.7b-vanilla-beta1.0-s42
SearchR1-nq_hotpotqa_train-qwen2.5-3b-it-em-grpo-v0.2
Stack-X-Ultimate
exp2-qwen-mbpp-s123-lambda-0p30
Llama-3.1-8B-Instruct-noised-np0.1-attn-emb
solvrays-llm-pdf
qwen-hf-fewshot-iter-np-iter1
Llama-3-8B-Instruct-Legal-Chatbot-Indo-GRPO
llama2_70b_mmlu
fusionai
Affine-5FX8no6hye3MQi8bQwbohGsb4NqfFNSk8CqQzAYv51ihCSKq
gptlong_continue_nemotron_terminal_step5400__Qwen3-32B
sn38-v11-8
MINT-empathy-Qwen3-1.7B
llama-3-8b-dpo-tw31-beta-1e-0-ift
Archon-8B
Qwen3-0.6B-heretic
OrcaHermes-Mistral-70B-miqu
airoboros-l2-70b-3.1.2
qwen2.5-math-1.5b-dpo-gsm8k
gptlong_continue_nemotron_terminal_step3300__Qwen3-32B
tezos100k_continue_gptlongtezos_step4800__Qwen3-32B
lumynax-longctx-prolong-512k-instruct
code_no_think_X_qwen3_4b_base_sft
Quasar-3.3-Max
ci-feedback_weighted_asym_bi_kl_fixed_ema_Llama-3.1-8B-Instruct_bw1p6_fw0p4_ema0p999_ep30
pfpo-qwen3-1.7b-vanilla-lr5e-7-s42