imone/CodeLlama_13B_with_EOT_token

Text generation · Concurrency cost: 1 · Model size: 13B · Quant: FP8 · Ctx length: 4k · License: llama2 · Architecture: Transformer

imone/CodeLlama_13B_with_EOT_token is a variant of the Code Llama 13B model modified by imone. It adds an `<|end_of_turn|>` token at ID 32016, along with the special tokens `<|verdict|>`, `<|PAD|>`, and `<|PAD2|>`; the embeddings for these new tokens are initialized as the mean of the existing token embeddings. The model is intended for applications that need explicit turn demarcation and specialized control tokens, particularly in code-related tasks.


imone/CodeLlama_13B_with_EOT_token Overview

This model is a specialized adaptation of the Code Llama 13B architecture, developed by imone. Its core innovation is the integration of several custom special tokens, which enhance its utility for specific conversational or structured code generation tasks.

Key Capabilities & Features

  • Enhanced Turn Management: Includes an <|end_of_turn|> token (ID 32016) to explicitly mark the conclusion of a turn, which can be crucial for multi-turn interactions or structured output generation.
  • Specialized Control Tokens: Incorporates additional tokens such as <|verdict|>, <|PAD|>, and <|PAD2|>, providing further control and semantic markers for specific use cases.
  • Embedding Initialization: The embeddings for the new special tokens are initialized as the mean of the existing input/output token embeddings, so they start from a point already consistent with the pre-trained model's representation space.
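The mean-initialization scheme described above can be sketched on a toy embedding matrix. This is an illustrative NumPy reconstruction, not the actual code used to build the model; the dimensions are deliberately small (the real Llama 13B hidden size is 5120), and the token names come from the model card.

```python
# Sketch of mean-embedding initialization for newly added special tokens,
# on a toy embedding matrix (dimensions are illustrative, not the real model's).
import numpy as np

vocab_size, hidden = 32016, 64
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((vocab_size, hidden))

new_tokens = ["<|end_of_turn|>", "<|verdict|>", "<|PAD|>", "<|PAD2|>"]

# Each new token's embedding starts as the average of all existing rows.
mean_row = embeddings.mean(axis=0)
extended = np.vstack([embeddings, np.tile(mean_row, (len(new_tokens), 1))])

print(extended.shape)  # (32020, 64)
```

Starting from the mean keeps the new rows inside the distribution the model already knows, which tends to make fine-tuning on the new tokens more stable than random initialization.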

Good For

  • Structured Code Generation: Ideal for scenarios where explicit markers for code blocks, conversational turns, or specific output formats are beneficial.
  • Fine-tuning for Specific Protocols: Provides a strong base for further fine-tuning on datasets that utilize similar turn-based or control-token-driven structures.
  • Research into Tokenization Impact: Offers a platform for exploring the effects of custom special tokens on model behavior and performance in code-related tasks.