LongReward-llama3.1-8b-SFT is an 8-billion-parameter instruction-tuned causal language model released by NeoZ123 and fine-tuned from Meta-Llama-3.1-8B. The model is optimized for long-context understanding and generation, supporting a context window of up to 64K tokens. It was trained with supervised fine-tuning (SFT) on the LongReward-10k dataset, making it well suited to tasks that require extensive contextual comprehension.