add some commentary on fragmentation and other clearnet issues
All checks were successful (continuous-integration/drone/push: build is passing)
parent 867d33a9f2, commit 6c153541ff
@@ -81,6 +81,9 @@ Congratulations, you're connected to DN42 !

 ## Complex

+- Isolate your network
+  - Using VRFs
+  - Using Namespaces
 - Connect multiple nodes to the same peer AS in different geographic locations
 - Optimise the routes to the AS
 - Optimise the routes that the peer AS has to you
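The isolation bullets above can be sketched with iproute2; the device names and routing table number here are illustrative assumptions, not part of the guide.

```shell
# Hypothetical sketch: isolating DN42 peering interfaces. The device names
# (dn42-peer1/2) and table number are assumptions.

# Option 1: a VRF keeps DN42 routes out of the host's main routing table.
ip link add vrf-dn42 type vrf table 1042
ip link set vrf-dn42 up
ip link set dev dn42-peer1 master vrf-dn42

# Option 2: a network namespace gives full protocol-stack isolation.
ip netns add dn42
ip link set dev dn42-peer2 netns dn42
```

Both approaches need root; a VRF shares the host's network stack while a namespace does not, which is the usual trade-off between convenience and isolation.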

@@ -10,35 +10,79 @@ This page documents some key elements of the current burble.dn42 design.

 {{<figure src="/design/DN42-Tunnels.svg" width="80%">}}

-Hosts within the burble.dn42 network are joined using an IPsec/L2TP mesh.
+Hosts within the burble.dn42 network are joined using a Wireguard/L2TP mesh.
 Static, unmanaged, L2TP tunnels operate at the IP level and are configured
-to create a full mesh between nodes. IPsec in transport mode protects the
-L2TP protocol traffic.
+to create a full mesh between nodes. Wireguard is used to provide encryption
+and to encapsulate L2TP traffic in plain UDP, which hides fragmentation
+and allows packets to be processed within intermediate routers' fast path.

 Using L2TP allows for a large, virtual MTU of 4310 between nodes; this is
 chosen to spread the encapsulation costs of higher layers across packets.
-L2TP also allows for multiple tunnels between hosts and these are sometimes
-used to separate low level traffic without incurring the additional overheads
+L2TP also allows for multiple tunnels between hosts, and this can also be used
+to separate low-level traffic without incurring the additional overheads
 of VXLANs (e.g. for NFS cross mounting).

-The network also supports using point to point wireguard tunnels instead of the
-IPsec/L2TP mesh. In this case, the large in-tunnel MTU requires UDP fragmentation
-support between the hosts.
+Network configuration on hosts is managed by systemd-networkd.
+
+{{<hint info>}}
+<b>Real Life Networks and Fragmentation.</b>
+
+Earlier designs for the burble.dn42 network relied on passing fragmented packets
+directly down to the clearnet layer (e.g. via ESP IPsec fragmentation, or
+UDP fragmentation with wireguard). In practice it was observed that
+clearnet ISPs could struggle with uncommon packet types, with packet
+loss seen particularly in the
+[IPv6 case](https://blog.apnic.net/2021/04/23/ipv6-fragmentation-loss-in-2021/).
+It seems likely that some providers' anti-DDOS and load balancing platforms
+had a particular impact in magnifying this problem.
+
+To resolve this, the network was re-designed to ensure fragmentation takes
+place at the L2TP layer, so that all traffic is encapsulated in standard-sized
+UDP packets. This design ensures all traffic is 'normal' and can
+remain within intermediate routers'
+[fast path](https://en.wikipedia.org/wiki/Fast_path).
+{{</hint>}}
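A quick way to see how a given clearnet path treats large or fragmented packets; the hostname is a placeholder, substitute a real peer endpoint.

```shell
# Placeholder hostname; run against a real peer endpoint (IPv4 shown).
ping -c 3 -M do -s 1472 peer.example.net  # DF set: full 1500-byte packets, no fragments
ping -c 3 -s 2000 peer.example.net        # forces IP fragmentation; loss here (but not
                                          # above) suggests fragments are being dropped
```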
+
+{{<hint info>}}
+<b>ISP Rate Limiting</b>
+
+The burble.dn42 network uses jumbo sized packets that are fragmented by
+L2TP before being encapsulated by wireguard. This means a single packet in
+the overlay layers can generate multiple wireguard UDP packets in quick
+succession, appearing as a high bandwidth burst of traffic on the
+outgoing clearnet interface. It's vital that all these packets arrive
+at the destination, or the entire overlay packet will be corrupted.
+For most networks this is not a problem and the approach generally
+works very well.
+
+However, if you have bandwidth limits with your ISP (e.g. a 100mbit bandwidth
+allowance provided on a 1gbit port), packets may be generated at a high bit
+rate and then decimated by the ISP to match the bandwidth allowance.
+This would normally be fine, but if a fragmented packet is sent, the
+burst of smaller packets is highly likely to exceed the bandwidth
+allowance, and the impact on upper layer traffic is brutal, causing
+nearly all packets to get dropped.
+
+The burble.dn42 network manages this issue by implementing traffic shaping
+on the outgoing traffic using linux tc (via
+[FireQOS](https://firehol.org/tutorial/fireqos-new-user/)). This allows
+outgoing packets to be queued at the correct rate, rather than being
+arbitrarily decimated by the ISP.
+{{</hint>}}
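A minimal FireQOS sketch of the idea, assuming a 100mbit allowance on eth0 and the default wireguard port; none of these values are from the actual deployment.

```shell
# Hypothetical FireQOS config fragment. Shaping to just under the ISP
# allowance keeps the queue on the host, where it can pace the wireguard
# bursts, instead of letting the ISP drop packets arbitrarily.
interface eth0 clearnet output rate 95Mbit
    class wireguard commit 50%
        match udp port 51820    # assumed wireguard listen port
    class default
```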

-Network configuration on hosts is managed by systemd-networkd.

 ## BGP EVPN

-Overlaying the IPsec/L2TP mesh is a set of VXLANs managed by a BGP EVPN.
+Overlaying the Wireguard/L2TP mesh is a set of VXLANs managed by a BGP EVPN.

 The VXLANs are primarily designed to tag and isolate transit traffic, making
 their use similar to MPLS.
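A transit VXLAN of this kind can be sketched with iproute2; the VNI, device name and address are assumptions. With `nolearning`, the forwarding database is populated by the BGP EVPN control plane rather than by flood-and-learn.

```shell
# Hypothetical sketch: one transit VXLAN riding over the L2TP mesh.
ip link add vx-transit type vxlan id 4242 dstport 4789 \
    local 172.20.0.1 nolearning
ip link set vx-transit mtu 4260 up   # assumed: 4310 overlay MTU minus ~50B VXLAN cost
```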

 The Babel routing protocol is used to discover loopback addresses between nodes;
-Babel is configured to operate across the point to point L2TP tunnels and with a static,
-latency based metric that is applied during deployment.
+Babel is configured to operate across the point to point L2TP tunnels and with a
+static, latency based metric that is applied during deployment.

 The BGP EVPN uses [FRR](https://frrouting.org/) with two global route reflectors
 located on different continents, for redundancy. Once overheads are taken into account