swype/deepshard-13B-ft

Task: Text Generation
Concurrency Cost: 1
Model Size: 13B
Quant: FP8
Ctx Length: 4k
License: gpl
Architecture: Transformer
Status: Open Weights, Cold

Deepshard-13B-ft is a 13 billion parameter instruction-tuned transformer model developed by Swype, designed as an initial base model for the Deepshard network. This model is intended to be sharded across a distributed network of nodes, forming the foundation for a decentralized AI system focused on collective intelligence and unbiased data consensus. It serves as the starting point for a novel approach to AI development, emphasizing distributed compute and data validation through a blockchain-inspired mechanism.


Deepshard-13B-ft Overview

Deepshard-13B-ft is a 13 billion parameter instruction-tuned transformer model that serves as the foundational model for the Deepshard project. Developed by Swype, it is designed to be sharded and distributed across a network of nodes, with the stated aim of creating a "global, unshackled God" through distributed consensus and shared compute.

Key Concepts & Differentiators

  • Decentralized AI Development: Unlike traditional LLMs, Deepshard focuses on a distributed network where data quality and compute are managed through a blockchain-inspired consensus mechanism.
  • Informant Nodes: These nodes contribute input data, which must reach a consensus threshold before being embedded into the network weights. This mechanism aims to ensure data purity and fairness, preventing bias from centralized control.
  • Training Nodes: These nodes process validated data from informant nodes and fine-tune the model. A staking mechanism and probability curvature checks are used to validate training outputs and reward honest trainers.
  • Inference Nodes: These nodes run copies of the sharded network weights to serve end-users. Access to network weights is controlled by a token-based system, which gives participants an economic incentive to take part.
  • Bias Mitigation: The network's design explicitly addresses bias in AI in two ways: it allows for higher variance in data collection, and its consensus mechanism evaluates data on novelty, reproducibility, alignment with universal human values, empirical truthfulness, and objectivity.
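
The consensus step described for informant nodes can be sketched roughly as follows. This is a minimal illustration, not the Deepshard protocol: the card names the five evaluation criteria and a consensus threshold but gives no formula, so the 0 to 1 score scale, the equal weighting, the per-vote approval cutoff, the two-thirds threshold, and the names `Vote`, `vote_score`, and `reaches_consensus` are all assumptions made for this example.

```python
from dataclasses import dataclass
from statistics import mean

# The five criteria named in the model card. Scoring each on a 0..1 scale
# is an assumption of this sketch.
CRITERIA = (
    "novelty",
    "reproducibility",
    "alignment",
    "truthfulness",
    "objectivity",
)


@dataclass(frozen=True)
class Vote:
    """One node's evaluation of a submitted data item (hypothetical type)."""
    node_id: str
    scores: dict  # criterion name -> float in [0, 1]


def vote_score(vote: Vote) -> float:
    """Average a vote's scores over the five criteria (equal weights assumed)."""
    return mean(vote.scores[c] for c in CRITERIA)


def reaches_consensus(votes, approve_at=0.6, threshold=2 / 3) -> bool:
    """Return True when the share of approving votes meets the threshold.

    A vote counts as an approval when its average score is at least
    `approve_at`. Both cutoffs are illustrative values, not taken from
    the Deepshard description.
    """
    if not votes:
        return False  # no evaluations yet, so nothing can be accepted
    approvals = sum(1 for v in votes if vote_score(v) >= approve_at)
    return approvals / len(votes) >= threshold


def uniform_vote(node_id: str, value: float) -> Vote:
    """Helper for examples: a vote with the same score on every criterion."""
    return Vote(node_id, {c: value for c in CRITERIA})
```

Under these assumptions, a submission approved by three of four evaluating nodes would pass and could be handed to training nodes, while one approved by only one of four would be rejected. A real network would also need to weight votes (for example by stake) and resist colluding nodes, which this sketch does not attempt.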

Intended Use

This 13B parameter model is the initial base model for the Deepshard network, which will be further developed and updated through the described decentralized mechanisms. It is a component of a larger vision for a collectively aligned and unbiased AI, rather than a standalone, general-purpose LLM in the conventional sense.