Model Overview
DPOB-NMTOB-7B is a 7-billion-parameter language model developed by paulml. It is a merged model that combines the strengths of two base models: eren23/dpo-binarized-NeutrixOmnibe-7B and paulml/OmniBeagleSquaredMBX-v3-7B-v2. The merge was performed with LazyMergekit, a tool for combining language models.
Key Configuration and Capabilities
The model's architecture is a result of a slerp (spherical linear interpolation) merge method applied across all 32 layers of the constituent models. Unique to this merge is the specific weighting of parameters:
- Self-attention layers (self_attn): weighted with a varying distribution across layers, emphasizing different aspects of the attention mechanism.
- Multi-layer perceptron layers (mlp): given a distinct weighting of their own, influencing the model's feed-forward processing.
This precise configuration aims to balance and integrate the capabilities of its base models. The model supports a context length of 4096 tokens and is intended for general text generation tasks.
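The merge setup described above corresponds to the kind of YAML configuration LazyMergekit/mergekit accepts for a slerp merge. The sketch below is illustrative only: the layer range and model names come from this card, but the interpolation values under t are placeholders, not the actual weights used for this model.

```yaml
# Hypothetical mergekit slerp config; the t values shown are examples, not the real ones.
slices:
  - sources:
      - model: eren23/dpo-binarized-NeutrixOmnibe-7B
        layer_range: [0, 32]
      - model: paulml/OmniBeagleSquaredMBX-v3-7B-v2
        layer_range: [0, 32]
merge_method: slerp
base_model: paulml/OmniBeagleSquaredMBX-v3-7B-v2
parameters:
  t:
    - filter: self_attn          # varying weights for attention layers (example values)
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp                # distinct weighting for feed-forward layers (example values)
      value: [1, 0.5, 0.7, 0.3, 0]
dtype: bfloat16
```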
Usage
Developers can easily integrate DPOB-NMTOB-7B into their projects using the Hugging Face transformers library. The provided Python code snippet demonstrates how to load the model and tokenizer, apply a chat template, and generate text with specified parameters like max_new_tokens, temperature, top_k, and top_p.
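Since the model card itself does not reproduce the snippet, the following is a minimal sketch of the usual transformers loading pattern for merged chat models. The prompt text and generation parameter values are illustrative choices, not values prescribed by the model card.

```python
# Minimal usage sketch; prompt and sampling values are illustrative.
from transformers import AutoTokenizer
import transformers
import torch

model = "paulml/DPOB-NMTOB-7B"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Render the chat messages into a single prompt string using the model's chat template.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Build a text-generation pipeline in half precision, placing layers automatically.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample a completion with the parameters mentioned in the card.
outputs = pipeline(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)
print(outputs[0]["generated_text"])
```

Note that generation stays within the model's 4096-token context window, so long prompts should be truncated accordingly.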