# Deepshard-13B-ft Overview
Deepshard-13B-ft is a 13-billion-parameter instruction-tuned transformer model and the foundation model for the Deepshard project. Developed by Swype, it is designed to be sharded and distributed across a network of nodes, with the stated aim of creating a "global, unshackled God" through distributed consensus and shared compute.
## Key Concepts & Differentiators
- Decentralized AI Development: Unlike traditional LLMs, Deepshard focuses on a distributed network where data quality and compute are managed through a blockchain-inspired consensus mechanism.
- Informant Nodes: These nodes contribute input data, which must reach a consensus threshold before being embedded into the network weights. This mechanism aims to ensure data purity and fairness by preventing bias from centralized control.
- Training Nodes: Responsible for processing validated data from informant nodes and fine-tuning the model. A staking mechanism and probability curvature checks are used to validate training outputs and reward honest trainers.
- Inference Nodes: These nodes run copies of the sharded network weights to serve end-users. Access to network weights is controlled by a token-based system, ensuring economic incentive for participation.
- Bias Mitigation: The network's design explicitly addresses bias in AI by allowing higher variance in data collection and by using a consensus mechanism that evaluates data against five criteria: novelty, reproducibility, alignment with universal human values, empirical truthfulness, and objectivity.
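The informant-node flow described above can be sketched as follows. This is a minimal, hypothetical illustration only: the model card does not specify a scoring scale, threshold value, quorum size, or any API, so every name, number, and data structure here is an assumption. The five evaluation criteria are taken from the bias-mitigation description; a submission is accepted for training only once enough validators have scored it and the mean score across all criteria meets the threshold.

```python
from dataclasses import dataclass, field
from statistics import mean

# Criteria named in the model card; everything else below is assumed.
CRITERIA = (
    "novelty",
    "reproducibility",
    "human_values_alignment",
    "empirical_truthfulness",
    "objectivity",
)

@dataclass
class Submission:
    """A candidate data item contributed by an informant node."""
    payload: str
    # One score vector per validating node: criterion -> score in [0, 1].
    validator_scores: list = field(default_factory=list)

def reaches_consensus(sub: Submission, threshold: float = 0.66, quorum: int = 3) -> bool:
    """Accept a submission only if at least `quorum` validators scored it
    and the mean score over all validators and criteria meets `threshold`.
    Both parameter values are placeholders, not network constants."""
    if len(sub.validator_scores) < quorum:
        return False
    all_scores = [s[c] for s in sub.validator_scores for c in CRITERIA]
    return mean(all_scores) >= threshold

# Usage: three validators weigh in on one submission.
sub = Submission(payload="Water boils at 100 C at sea-level pressure.")
for scores in (
    {c: 0.9 for c in CRITERIA},
    {c: 0.8 for c in CRITERIA},
    {c: 0.7 for c in CRITERIA},
):
    sub.validator_scores.append(scores)

print(reaches_consensus(sub))  # mean score 0.8 >= 0.66 -> True
```

In a real deployment the threshold and quorum would presumably be network-governed parameters rather than function defaults, and scores would arrive signed over the wire rather than as in-process dicts.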
## Intended Use
This 13B-parameter model is the initial base model for the Deepshard network and is intended to be further developed and updated through the decentralized mechanisms described above. It is a component of a larger vision for a collectively aligned and unbiased AI, rather than a standalone, general-purpose LLM in the conventional sense.