consensus-specs/specs/_features/das/sampling.md
2023-02-14 23:41:40 +08:00

Data Availability Sampling -- Sampling

Notice: This document is a work-in-progress for researchers and implementers.

Data Availability Sampling

TODO: Summary of Data Availability problem

TODO: Summary of solution, why 2x extension, and randomized samples

GossipSub

Horizontal subnets

TODO

Vertical subnets

Slow rotation: Backbone

TODO

Quick rotation: Sampling

TODO

DAS during network instability

GossipSub-based retrieval of samples may not always work. In such an event, a node can move through the stages below until it recovers data availability.

Stage 0: Waiting on missing samples

Wait for the sample to be re-broadcast. The publisher may simply be slow, or another node may be able to do the work.

Any node can do the following work to keep the network healthy:

  • Common: Listen on a horizontal subnet, chunkify the block data into samples, and propagate the samples to vertical subnets.
  • Extreme: Listen on enough vertical subnets, reconstruct the missing samples by recovery, and propagate the recovered samples.

This is not a requirement, but it should improve network stability with few resources and without any central party.
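The two roles above can be sketched roughly as follows. This is an illustration under stated assumptions, not the specified algorithm: the sample size follows the ~0.5 KB estimate given later in this document, while VERTICAL_SUBNET_COUNT, the index-to-subnet mapping, and the helper names are hypothetical. The recovery part uses plain Lagrange interpolation over a small prime field as a toy stand-in for whatever erasure code is eventually specified; it only demonstrates that, after a 2x extension, any half of the extended values is enough to rebuild the rest.

```python
from typing import Dict, List

SAMPLE_SIZE = 512          # assumption: ~0.5 KB per sample, per this document's estimate
VERTICAL_SUBNET_COUNT = 8  # hypothetical value, not defined in this document
PRIME = 2**31 - 1          # toy field modulus, NOT the field a real deployment would use


def chunkify(block_data: bytes, sample_size: int = SAMPLE_SIZE) -> List[bytes]:
    """Common role: split block data into fixed-size samples, zero-padding the last."""
    padded_len = -(-len(block_data) // sample_size) * sample_size  # round up
    padded = block_data.ljust(padded_len, b"\x00")
    return [padded[i:i + sample_size] for i in range(0, padded_len, sample_size)]


def vertical_subnet_for(sample_index: int) -> int:
    """Hypothetical mapping from a sample index to a vertical subnet."""
    return sample_index % VERTICAL_SUBNET_COUNT


def interpolate_at(points: Dict[int, int], x: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(points) through points."""
    total = 0
    for xi, yi in points.items():
        num, den = 1, 1
        for xj in points:
            if xj != xi:
                num = num * (x - xj) % PRIME
                den = den * (xi - xj) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total


def extend_2x(values: List[int]) -> List[int]:
    """Toy 2x extension: append k extra evaluations of the degree < k polynomial."""
    k = len(values)
    points = dict(enumerate(values))
    return values + [interpolate_at(points, x) for x in range(k, 2 * k)]


def recover(known: Dict[int, int], k: int) -> List[int]:
    """Extreme role: rebuild all 2k extended values from any k known ones."""
    if len(known) < k:
        raise ValueError("need at least half of the extended values")
    points = dict(list(known.items())[:k])
    return [interpolate_at(points, x) for x in range(2 * k)]
```

For example, after extending four values to eight with extend_2x, passing any four of the eight (keyed by position) to recover returns the full extended list. A node in the Common role would instead call chunkify on the block data and publish sample i to subnet vertical_subnet_for(i).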

Stage 1: Pulling missing samples from known peers

The more realistic option when a sample is missing is to query any node that is known to hold it. Since consensus identity is disconnected from network identity, there is no direct way to contact custody holders without explicitly asking for the data.

However, network identities are still used to build a backbone for each vertical subnet. These nodes should have received the samples, and can serve a buffer of them on demand. Although serving these is not directly incentivised, it is little work:

  1. Buffer any message seen on the backbone vertical subnets, for up to two weeks.
  2. Serve the samples on request. An individual sample is expected to be only ~0.5 KB, and does not require any pre-processing to serve.
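The two steps above could look roughly like the following in-memory buffer. It is a minimal sketch under assumptions: the (slot, sample_index) key, the class name, and the method names are hypothetical, and timestamps are assumed to be non-decreasing across calls so that the oldest entry is always first.

```python
import time
from collections import OrderedDict
from typing import Optional, Tuple

RETENTION_SECONDS = 14 * 24 * 60 * 60  # "up to two weeks"

SampleKey = Tuple[int, int]  # hypothetical key: (slot, sample_index)


class SampleBuffer:
    """Keeps samples seen on backbone vertical subnets and serves them on request."""

    def __init__(self, retention: float = RETENTION_SECONDS) -> None:
        self.retention = retention
        # Insertion order equals arrival order, so the oldest entry is first.
        self._store: "OrderedDict[SampleKey, Tuple[float, bytes]]" = OrderedDict()

    def on_gossip(self, key: SampleKey, sample: bytes, now: Optional[float] = None) -> None:
        """Step 1: buffer a sample seen on a backbone subnet."""
        now = time.time() if now is None else now
        if key not in self._store:  # ignore duplicates, keep the first-seen time
            self._store[key] = (now, sample)
        self._prune(now)

    def serve(self, key: SampleKey, now: Optional[float] = None) -> Optional[bytes]:
        """Step 2: return the stored bytes as-is, or None if unknown or expired."""
        now = time.time() if now is None else now
        self._prune(now)
        entry = self._store.get(key)
        return None if entry is None else entry[1]

    def _prune(self, now: float) -> None:
        """Drop entries older than the retention window."""
        cutoff = now - self.retention
        while self._store:
            key, (seen_at, _) = next(iter(self._store.items()))
            if seen_at >= cutoff:
                break
            del self._store[key]
```

Serving is a plain dictionary lookup, which reflects the point that no pre-processing is needed. A real node would likely persist the buffer and bound its size, neither of which is shown here.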

A validator SHOULD make DASQuery requests to random peers until its failures exceed the configured failure rate.

TODO: detailed failure-mode spec. Stop after trying e.g. 3 peers for any sample in a configured time window (after the gossip period).
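Until the failure mode is specified, the intended behaviour might be approximated as below. This is only a sketch: das_query is a hypothetical callable standing in for a DASQuery request (its message format is not defined in this document), the limit of 3 comes from the example in the TODO above, and the configured time window is left out entirely.

```python
import random
from typing import Callable, Optional, Sequence, Tuple

MAX_FAILED_PEERS = 3  # from the "e.g. 3 peers" example; not a specified constant

SampleKey = Tuple[int, int]  # hypothetical key: (slot, sample_index)


def pull_sample(
    peers: Sequence[str],
    key: SampleKey,
    das_query: Callable[[str, SampleKey], Optional[bytes]],
    max_failures: int = MAX_FAILED_PEERS,
    rng: Optional[random.Random] = None,
) -> Optional[bytes]:
    """Ask random peers for a sample, giving up after max_failures failed attempts."""
    if rng is None:
        rng = random.Random()
    candidates = list(peers)
    rng.shuffle(candidates)  # query peers in random order, each at most once
    failures = 0
    for peer in candidates:
        if failures >= max_failures:
            break
        sample = das_query(peer, key)  # hypothetical request; None means failure
        if sample is not None:
            return sample
        failures += 1
    return None
```

A return value of None signals that Stage 1 did not recover the sample within the allowed number of attempts, at which point the node has exhausted this stage.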

Stage 2: Pulling missing data from validators with custody

Pulling samples directly from nodes with validators that have a custody responsibility, without revealing their identity to the network, is an open problem.
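Taken together, the stages in this section form an escalation ladder, which could be expressed roughly as follows. The enum names, the handler signature, and the recover_sample helper are all hypothetical. Because Stage 2 is an open problem, the sketch simply skips any stage for which no handler is supplied.

```python
from enum import IntEnum
from typing import Callable, Dict, Optional, Tuple


class Stage(IntEnum):
    WAIT_FOR_REBROADCAST = 0       # Stage 0: waiting on missing samples
    PULL_FROM_KNOWN_PEERS = 1      # Stage 1: pulling from known peers
    PULL_FROM_CUSTODY_HOLDERS = 2  # Stage 2: open problem, no known mechanism


Handler = Callable[[], Optional[bytes]]  # returns the sample, or None on failure


def recover_sample(handlers: Dict[Stage, Handler]) -> Tuple[Optional[Stage], Optional[bytes]]:
    """Try each stage in order and return the first stage that yields the sample."""
    for stage in Stage:  # IntEnum members iterate in definition order
        handler = handlers.get(stage)
        if handler is None:
            continue  # stage not implemented, e.g. Stage 2
        sample = handler()
        if sample is not None:
            return stage, sample
    return None, None
```

A caller would typically plug a gossip wait with a timeout into Stage 0 and a peer-query routine into Stage 1, leaving Stage 2 unset until a privacy-preserving approach exists.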