DDoS Mitigation

Improve Sync Algo

Right now the algo for syncing is very bad. It looks through every available channel sequentially and performs these steps:

get the current tips
request their parents
repeat

The entire process is the slowest sync algo. Lets think of a better sync algo.

Graph Locator


struct GraphLocator {
    tip_slots: Vec<EventHash>
}

Each tip slot contains a sparse path from an active tip to the current root. Sparse here means the path is not complete.

The slot is: current tip, a continuous path for the next 5 and then from there, gaps which double for every event.

When a node receives the graph locator, it can them compare it with its own graph and immediately see where their graphs both diverge.

It then will sync from the divergence point.

Phased Algo

Events are increasingly carrying more data. Therefore the graph structure should be separated from the event data.


struct EventHeader {
    parents: Vec<EventHash>
    blob: EventDataHash
}

The sync algo then becomes:

Locate phase: Our node creates a GraphLocator and sends it to the remote. The remote uses this to see which events we're missing. The remote then sends a capped flood of Inventory objects.
1. Our node may need to repeat sending the GraphLocator requests since the remote only sends a capped number of updates.
Header sync phase: Our node sends GetData request containing lists of the missing event header hashes. The remote responds with event header objects. Our node then links these up to build the missing graph structure.
Data sync phase: Lastly our node walks the graph and for every missing blob, requests the data. It does this backwards.

Effect on DDoS

While improved sync is desirable, it still does not mitigate DDoS since the attacker can easily double their resources. This merely makes nodes more performant.

Protocol Restriction

Proposal: add a global rate limit for messages. This is the easiest and most direct fix we should apply right now.

There is a rolling window of 1 minute with a maximum of 100 events allowed. Violators get channel.ban().

Use a VecDeque of timestamps. clean_bantimes() will then do:


while let Some(ts) = bantimes.front() {
    if ts > NOW - 1 minute {
        break
    }
    let _ = bantimes.pop_front()
}

Now all events within the list should be within the last minute. Push any new events to the list, then check:


if bantimes.len() > N {
    channel.ban();
}

This is the most immediate mitigation and overdue anyway.

Later we will make the 1 minute and N configurable with the static event graph admin instance. For now, we can hardcode these values.

The DarkFi Book