mayanklohani19/milan

Text Generation · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Concurrency Cost: 1 · Architecture: Transformer · Published: Mar 21, 2025

The mayanklohani19/milan model is a 7 billion parameter language model created by mayanklohani19, produced by merging the pre-trained meta-llama/Llama-2-7b-chat-hf model with itself using the SLERP merge method. It is designed as a foundational merged model, inheriting the general conversational capabilities of its Llama-2 base.


Model Overview

mayanklohani19/milan is a 7 billion parameter language model developed by mayanklohani19. It is a merged model, created with the MergeKit tool, which combines the weights of existing pre-trained language models.

Merge Details

This model was constructed using the SLERP (Spherical Linear Interpolation) merge method. The primary base model for this merge was meta-llama/Llama-2-7b-chat-hf. The merge process involved combining all 32 layers of the Llama-2-7b-chat-hf model with itself, applying specific interpolation parameters to different components like self-attention and MLP layers. The configuration utilized bfloat16 for its data type.
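For reference, a MergeKit configuration along the lines below would reproduce the described setup. This is a sketch based only on the details above (SLERP, all 32 layers, separate interpolation for self-attention and MLP, bfloat16); the exact interpolation values used for milan are not published here, so the `t` values shown are illustrative placeholders.

```yaml
# Illustrative MergeKit SLERP config; the t values are placeholders,
# not the exact parameters used to build mayanklohani19/milan.
slices:
  - sources:
      - model: meta-llama/Llama-2-7b-chat-hf
        layer_range: [0, 32]
      - model: meta-llama/Llama-2-7b-chat-hf
        layer_range: [0, 32]
merge_method: slerp
base_model: meta-llama/Llama-2-7b-chat-hf
parameters:
  t:
    - filter: self_attn       # interpolation schedule for attention weights
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp             # interpolation schedule for MLP weights
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5              # default for all other tensors
dtype: bfloat16
```

A config like this is typically applied with the MergeKit CLI, e.g. `mergekit-yaml config.yml ./milan`, which writes the merged weights to the output directory.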

Key Characteristics

  • Architecture: Based on the Llama-2-7b-chat-hf architecture.
  • Parameter Count: 7 billion parameters.
  • Merge Method: Utilizes the SLERP method for combining model weights.
  • Base Model: Derived from meta-llama/Llama-2-7b-chat-hf.

Potential Use Cases

Given its foundation in Llama-2-7b-chat-hf, this merged model is likely suitable for the following (a minimal usage sketch follows the list):

  • General conversational AI tasks.
  • Text generation and completion.
  • Further fine-tuning for specific downstream applications that benefit from a Llama-2 base.
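As a minimal illustration of the first two use cases, the merged model can be loaded like any other Llama-2-style checkpoint with the Hugging Face transformers library. This is a sketch under the assumptions that the mayanklohani19/milan repository is available on the Hugging Face Hub with standard Llama-2 chat weights and that prompts follow the Llama-2 chat convention.

```python
# Minimal text-generation sketch; assumes the mayanklohani19/milan repo is
# downloadable from the Hugging Face Hub and uses the Llama-2 chat format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mayanklohani19/milan"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the dtype used in the merge
    device_map="auto",
)

# Llama-2 chat-style prompt, inherited from the Llama-2-7b-chat-hf base.
prompt = "[INST] Summarize what model merging with SLERP does. [/INST]"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For downstream fine-tuning, the same checkpoint can serve as the starting point for standard Llama-2 fine-tuning workflows (full fine-tuning or parameter-efficient methods such as LoRA).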