MoxoffSrL/Moxoff-Phi3Mini-PPO
TEXT GENERATION | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 4k | Published: Jun 25, 2024 | License: MIT | Architecture: Transformer | Open Weights | Cold
MoxoffSrL/Moxoff-Phi3Mini-PPO is a causal language model of roughly 4 billion parameters (Phi-3-mini has 3.8B), developed by MoxoffSrL and based on Phi-3-mini-128k-instruct. It has been aligned with Proximal Policy Optimization (PPO) on the ultrafeedback-binarized-preferences-cleaned dataset. It is intended for general language tasks and reports competitive results on benchmarks such as HellaSwag, ARC Challenge, and MMLU.
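For readers unfamiliar with PPO alignment, the core of the method is the clipped surrogate objective: the ratio between the new and old policy probabilities is clipped to [1 - ε, 1 + ε] so that a single update cannot move the policy too far. The sketch below computes that loss for a batch of per-token log-probabilities. It is a minimal illustration of standard PPO-Clip, not MoxoffSrL's training code: the function name `ppo_clipped_loss`, the `clip_eps=0.2` default, and the assumption that advantages are already computed (for example from a reward model trained on the preference pairs) are all illustrative. It also omits the value-function loss and KL penalty that RLHF-style PPO setups usually add.

```python
import math
from typing import Sequence


def ppo_clipped_loss(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # illustrative default; the card does not state the value used
) -> float:
    """Negative PPO-Clip surrogate objective, averaged over actions (tokens).

    Hypothetical helper for illustration only: it assumes per-token log-probs
    under the current and old policy plus precomputed advantages.
    """
    if not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("inputs must have the same length")
    if len(advantages) == 0:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps]
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: the smaller of the unclipped and clipped terms
        total += min(ratio * adv, clipped * adv)

    # PPO maximizes the surrogate, so the loss to minimize is its negation
    return -total / len(advantages)


if __name__ == "__main__":
    # Ratio 2.0 with a positive advantage is clipped to 1.2, so the loss is -1.2
    print(ppo_clipped_loss([math.log(2.0)], [0.0], [1.0]))
```

With a positive advantage the clip caps how much credit a large ratio can earn. With a negative advantage the unclipped term is kept whenever it is more pessimistic, so the policy is still penalized for increasing the probability of a bad token.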