CRRL_batch_1024_step_50
Qwen2.5-Coder-LEAK-MCEVALHARD-1.5B-Base-7
qiu-v8-llama3.1-8b-merged
hallucination_detector_v2.0
Qwen2.5-Sex
llm4routing
Qwen2.5-0.5B-GRPO-math-reasoning
gkd-qwen-2.5-0.5b-base_v5_from1.5b_eff32
Qwen2.5-1.5B-ReMax-math-reasoning
gemma-3-1b-medical-finetuned
Gen-Searcher-SFT-8B
Qwen3-1.7B-Base-dapo_filter-prm-eta100-Advorm-stepsplit-none
Qwen2.5-Coder-LEAK-MCEVALHARD-1.5B-Base-2
arabic-prompt-1.5B
Qwen2.5-Coder-CONTROL-MCEVALHARD-1.5B-Base-6
Qwen2.5-Coder-CONTROL-MCEVALHARD-1.5B-Base-7
AfriqueQwen-14B-Fact-Lora
qiu-v8-qwen3-8b-fullseq-merged
qwen2.5-0.5b-toolcall-v1
qiu-v8-qwen3-4b-7m-v2-comp-merged
ThinkTwice-Qwen3-4B-Instruct
qwen3-4b-finetuned-2.5k
V3ra-Insync-AI-v1-merged
skillscan-detector-v4-7-reproduce
Llama-3.2-3B-Instruct-GRPO-merged
Qwen2.5-Coder-LEAK-MCEVALHARD-1.5B-Base-9
Qwen2.5-Coder-CONTROL-MCEVALHARD-1.5B-Base-3
Qwen2.5-Coder-CONTROL-MCEVALHARD-1.5B-Base-1
llama-2-13b_WaRP-cb_alpha5_layers10-20_lr1e-4-lr5e-5
qwen-2.5-7B-Instruct-lr5e-5-safedelta-scale0.8
qiu-v8-llama3.1-8b-fullseq-merged
arogya-ai-full
qiu-v8-qwen3-8b-v4-epoch05-merged
llama-3-8b-base-margin-dpo-ultrafeedback-8xh200
OpenElla-NovelWriter-8B-merged
P2-split2_prob_rg_v2_Qwen3-4B-Base
Qwen2.5-0.5B-GRPO-KL-math-reasoning
Qwen2.5-0.5B-ReMax-math-reasoning
corrected-semi-wtype-Llama-tuned-Lora-merged-gpt5