add some commentary on fragmentation and other clearnet issues
All checks were successful: continuous-integration/drone/push Build is passing

parent 867d33a9f2
commit 6c153541ff
@@ -81,6 +81,9 @@ Congratulations, you're connected to DN42 !

 ## Complex

 - Isolate your network
   - Using VRFs
   - Using Namespaces
+- Connect multiple nodes to the same peer AS in different geographic locations
+  - Optimise the routes to the AS
+  - Optimise the routes that the peer AS has to you

@@ -10,35 +10,79 @@ This page documents some key elements of the current burble.dn42 design.

 {{<figure src="/design/DN42-Tunnels.svg" width="80%">}}

-Hosts within the burble.dn42 network are joined using an IPsec/L2TP mesh.
+Hosts within the burble.dn42 network are joined using a WireGuard/L2TP mesh.
 Static, unmanaged L2TP tunnels operate at the IP level and are configured
-to create a full mesh between nodes. IPsec in transport mode protects the
-L2TP protocol traffic.
+to create a full mesh between nodes. WireGuard is used to provide encryption
+and to encapsulate the L2TP traffic in plain UDP, which hides fragmentation
+and allows packets to be processed within intermediate routers' fast path.

 Using L2TP allows for a large, virtual MTU of 4310 between nodes; this is
 chosen to spread the encapsulation costs of higher layers across packets.
-L2TP also allows for multiple tunnels between hosts and these are sometimes
-used to separate low level traffic without incurring the additional overheads
+L2TP also allows for multiple tunnels between hosts, and this can be used
+to separate low-level traffic without incurring the additional overheads
 of VXLANs (e.g. for NFS cross mounting).
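To make the amortisation concrete, here is a small illustrative calculation; the per-packet overhead figure is an assumed round number, not a measured burble.dn42 value:

```python
# Illustrative sketch: why a large in-tunnel MTU spreads encapsulation
# costs across more payload bytes. OVERHEAD is an assumed example value.
OVERHEAD = 90  # bytes of tunnel/VXLAN headers per packet (assumption)

for mtu in (1500, 4310):
    payload = mtu - OVERHEAD
    print(f"MTU {mtu}: {payload} payload bytes, {payload / mtu:.1%} efficient")

# MTU 1500: 1410 payload bytes, 94.0% efficient
# MTU 4310: 4220 payload bytes, 97.9% efficient
```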

-The network also supports using point to point wireguard tunnels instead of the
-IPsec/L2TP mesh. In this case, the large in-tunnel MTU requires UDP fragmentation
-support between the hosts.
-Network configuration on hosts is managed by systemd-networkd.
+{{<hint info>}}
+<b>Real Life Networks and Fragmentation.</b>
+
+Earlier designs for the burble.dn42 network relied on passing fragmented
+packets directly down to the clearnet layer (e.g. via IPsec ESP
+fragmentation, or UDP fragmentation with WireGuard). In practice, it was
+observed that clearnet ISPs could struggle with uncommon packet types, with
+packet loss seen particularly in the
+[IPv6 case](https://blog.apnic.net/2021/04/23/ipv6-fragmentation-loss-in-2021/).
+It seems likely that some providers' anti-DDoS and load balancing platforms
+played a particular role in magnifying this problem.
+
+To resolve this, the network was re-designed so that fragmentation takes
+place at the L2TP layer and all traffic is encapsulated into standard-sized
+UDP packets. This design ensures all traffic looks 'normal' and can
+remain within intermediate routers'
+[fast path](https://en.wikipedia.org/wiki/Fast_path).
+{{</hint>}}
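The effect of the two orderings can be sketched with rough numbers; the header sizes here are nominal assumptions (WireGuard over IPv4 adds roughly 60 bytes), not exact burble.dn42 figures:

```python
import math

OVERLAY_PACKET = 4310  # bytes, the large in-tunnel MTU
WG_OVERHEAD = 60       # approx. IPv4 + UDP + WireGuard headers (assumption)
CLEARNET_MTU = 1500

# Old approach: encapsulate first, then fragment at the clearnet IP layer.
# The wire carries 'uncommon' non-initial IP fragments that some ISPs drop.
ip_fragments = math.ceil((OVERLAY_PACKET + WG_OVERHEAD) / CLEARNET_MTU)
print(f"encapsulate-then-fragment: {ip_fragments} IP fragments")

# Current approach: L2TP fragments first, sized so that every piece still
# fits the clearnet MTU after WireGuard encapsulation -- the wire only
# ever carries ordinary, standard-sized UDP packets.
l2tp_fragment = CLEARNET_MTU - WG_OVERHEAD
udp_packets = math.ceil(OVERLAY_PACKET / l2tp_fragment)
print(f"fragment-then-encapsulate: {udp_packets} ordinary UDP packets")
```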
+
+{{<hint info>}}
+<b>ISP Rate Limiting</b>
+
+The burble.dn42 network uses jumbo-sized packets that are fragmented by
+L2TP before being encapsulated by WireGuard. This means a single packet in
+the overlay layers can generate multiple WireGuard UDP packets in quick
+succession, appearing as a high-bandwidth burst of traffic on the
+outgoing clearnet interface. It's vital that all of these packets arrive
+at the destination, or the entire overlay packet will be lost.
+For most networks this is not a problem, and the approach generally
+works very well.
+
+However, if you have bandwidth limits with your ISP (e.g. a 100Mbit
+bandwidth allowance provided on a 1Gbit port), packets may be generated at
+a high bit rate and then decimated by the ISP to match the bandwidth
+allowance. This would normally be fine, but when a fragmented packet is
+sent, the burst of smaller packets is highly likely to exceed the bandwidth
+allowance, and the impact on upper-layer traffic is brutal, causing
+nearly all packets to get dropped.
+
+The burble.dn42 network manages this issue by implementing traffic shaping
+on the outgoing traffic using Linux tc (via
+[FireQOS](https://firehol.org/tutorial/fireqos-new-user/)). This allows
+outgoing packets to be queued at the correct rate, rather than being
+arbitrarily decimated by the ISP.
+{{</hint>}}
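As an illustrative sketch only (the interface name, rates, and WireGuard port are assumptions, not the real burble.dn42 configuration), a minimal FireQOS definition of this kind could look like:

```sh
# /etc/firehol/fireqos.conf -- hypothetical sketch
# Shape egress slightly below the ISP allowance so that queuing happens
# locally, instead of the ISP policer dropping packets mid-burst.
interface eth0 wan output rate 95Mbit
    # keep the WireGuard mesh traffic within the shaped rate
    # (51820 is the conventional WireGuard port, assumed here)
    class mesh commit 50%
        match udp port 51820
```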
+
+Network configuration on hosts is managed by systemd-networkd.
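For illustration, a WireGuard mesh link managed by systemd-networkd would be described by a `.netdev` file along these lines (the names, key path, endpoint, and addresses are placeholders, not burble.dn42 values):

```ini
# /etc/systemd/network/wg-mesh.netdev -- hypothetical placeholder values
[NetDev]
Name=wg-mesh
Kind=wireguard

[WireGuard]
PrivateKeyFile=/etc/systemd/network/wg-mesh.key
ListenPort=51820

[WireGuardPeer]
PublicKey=<peer-public-key>
Endpoint=peer1.example.net:51820
AllowedIPs=192.0.2.2/32
```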

 ## BGP EVPN

 

-Overlaying the IPsec/L2TP mesh is a set of VXLANs managed by a BGP EVPN.
+Overlaying the WireGuard/L2TP mesh is a set of VXLANs managed by a BGP EVPN.

 The VXLANs are primarily designed to tag and isolate transit traffic, making
 their use similar to MPLS.
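The EVPN control plane is carried by BGP in FRR, as described below. As a purely hypothetical sketch of a node peering with a route reflector and advertising its VNIs (the AS number and address are invented):

```
! frr.conf fragment -- hypothetical sketch
router bgp 64512
 ! session towards one of the global route reflectors
 neighbor 192.0.2.1 remote-as 64512
 address-family l2vpn evpn
  neighbor 192.0.2.1 activate
  ! advertise locally configured VNIs into the EVPN
  advertise-all-vni
 exit-address-family
```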

 The Babel routing protocol is used to discover loopback addresses between nodes;
-Babel is configured to operate across the point to point L2TP tunnels and with a static,
-latency based metric that is applied during deployment.
+Babel is configured to operate across the point-to-point L2TP tunnels with a
+static, latency-based metric that is applied during deployment.
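As a hypothetical sketch in FRR babeld syntax (the interface name and cost are invented; the real metric is derived from latency at deployment time):

```
! frr.conf fragment -- hypothetical sketch
interface l2tp-peer1
 ! static cost standing in for the deployment-time latency metric
 babel rxcost 96
!
router babel
 network l2tp-peer1
```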

 The BGP EVPN uses [FRR](https://frrouting.org/) with two global route reflectors
 located on different continents, for redundancy. Once overheads are taken into account