From 6c153541ffd57b1e150395941eec45865596ef58 Mon Sep 17 00:00:00 2001
From: Simon Marsh
Date: Sat, 11 Jun 2022 23:03:00 +0100
Subject: [PATCH] add some commentary on fragmentation and other clearnet
 issues

---
 content/additional/things-to-do.md |  3 ++
 content/network/design.md          | 68 ++++++++++++++++++++++++------
 2 files changed, 59 insertions(+), 12 deletions(-)

diff --git a/content/additional/things-to-do.md b/content/additional/things-to-do.md
index 12be8b5..7314f37 100644
--- a/content/additional/things-to-do.md
+++ b/content/additional/things-to-do.md
@@ -81,6 +81,9 @@ Congratulations, you're connected to DN42 !
 
 ## Complex
 
+- Isolate your network
+  - Using VRFs
+  - Using Namespaces
 - Connect multiple nodes to the same peer AS in different geographic locations
   - Optimise the routes to the AS
   - Optimise the routes that the peer AS has to you
diff --git a/content/network/design.md b/content/network/design.md
index 3ae7126..737e7cd 100644
--- a/content/network/design.md
+++ b/content/network/design.md
@@ -10,35 +10,79 @@ This page documents some key elements of the current burble.dn42 design.
 {{
 }}
 
-Hosts within the burble.dn42 network are joined using an IPsec/L2TP mesh.
+Hosts within the burble.dn42 network are joined using a WireGuard/L2TP mesh.
 Static, unmanaged, L2TP tunnels operate at the IP level and are configured
-to create a full mesh between nodes. IPsec in transport mode protects the
-L2TP protocol traffic.
+to create a full mesh between nodes. WireGuard provides encryption and
+encapsulates L2TP traffic in plain UDP, hiding fragmentation and allowing
+packets to be processed within intermediate routers' fast path.
 
 Using L2TP allows for a large, virtual MTU of 4310 between nodes; this is
 chosen to spread the encapsulation costs of higher layers across packets.
-L2TP also allows for multiple tunnels between hosts and these are sometimes
-used to separate low level traffic without incurring the additional overheads
+L2TP also allows for multiple tunnels between hosts, and this can be used
+to separate low-level traffic without incurring the additional overheads
 of VXLANs (e.g. for NFS cross mounting).
 
-The network also supports using point to point wireguard tunnels instead of the
-IPsec/L2TP mesh. In this case, the large in-tunnel MTU requires UDP fragmentation
-support between the hosts.
+Network configuration on hosts is managed by systemd-networkd.
+
+{{}}
+Real-Life Networks and Fragmentation
+
+Earlier designs for the burble.dn42 network relied on passing fragmented
+packets directly down to the clearnet layer (e.g. via IPsec ESP
+fragmentation, or UDP fragmentation with WireGuard). In practice, it was
+observed that clearnet ISPs could struggle with uncommon packet types, with
+packet loss seen particularly in the
+[IPv6 case](https://blog.apnic.net/2021/04/23/ipv6-fragmentation-loss-in-2021/).
+It seems likely that some providers' anti-DDoS and load-balancing platforms
+played a particular role in magnifying this problem.
+
+To resolve this, the network was re-designed to ensure fragmentation takes
+place at the L2TP layer, such that all traffic is encapsulated into
+standard-sized UDP packets. This design ensures all traffic is 'normal' and
+can remain within intermediate routers'
+[fast path](https://en.wikipedia.org/wiki/Fast_path).
+{{}}
+
+{{}}
+ISP Rate Limiting
+
+The burble.dn42 network uses jumbo-sized packets that are fragmented by
+L2TP before being encapsulated by WireGuard. This means a single packet in
+the overlay layers can generate multiple WireGuard UDP packets in quick
+succession, appearing as a high-bandwidth burst of traffic on the outgoing
+clearnet interface. It is vital that all of these packets arrive at the
+destination, or the entire overlay packet will be corrupted. For most
+networks this is not a problem and the approach generally works very well.
+
+However, if you have bandwidth limits with your ISP (e.g. a 100mbit
+allowance provided on a 1gbit port), packets may be generated at a high bit
+rate and then dropped by the ISP to match the bandwidth allowance. This
+would normally be fine, but if a fragmented packet is sent, the burst of
+smaller packets is highly likely to exceed the bandwidth allowance, and the
+impact on upper-layer traffic is brutal, causing nearly all packets to be
+dropped.
+
+The burble.dn42 network manages this issue by implementing traffic shaping
+on outgoing traffic using Linux tc (via
+[FireQOS](https://firehol.org/tutorial/fireqos-new-user/)). This allows
+outgoing packets to be queued at the correct rate, rather than being
+arbitrarily dropped by the ISP.
+{{}}
 
-Network configuration on hosts is managed by systemd-networkd.
 
 ## BGP EVPN
 
 ![EVPN diagram](/design/DN42-EVPN.svg)
 
-Overlaying the IPsec/L2TP mesh is a set of VXLANs managed by a BGP EVPN.
+Overlaying the WireGuard/L2TP mesh is a set of VXLANs managed by a BGP EVPN.
 The VXLANs are primarily designed to tag and isolate transit traffic, making
 their use similar to MPLS.
 
 The Babel routing protocol is used to discover loopback addresses between nodes;
-Babel is configured to operate across the point to point L2TP tunnels and with a static,
-latency based metric that is applied during deployment.
+Babel is configured to operate across the point-to-point L2TP tunnels with a
+static, latency-based metric that is applied during deployment.
 
 The BGP EVPN uses [FRR](https://frrouting.org/) with two global route
 reflectors located on different continents, for redundancy. Once overheads are taken into account
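
The "ISP Rate Limiting" note added by this patch hinges on one jumbo overlay packet fanning out into several back-to-back wire packets. A back-of-the-envelope sketch of that arithmetic follows; the clearnet MTU and per-layer overhead figures are illustrative assumptions, not measured values from the burble.dn42 deployment:

```python
import math

# Assumed figures for illustration only:
CLEARNET_MTU = 1500        # typical Internet path MTU (assumption)
WIREGUARD_OVERHEAD = 60    # IPv4 + UDP + WireGuard headers/tag (assumption)
L2TP_OVERHEAD = 40         # L2TP and inner encapsulation headers (assumption)
OVERLAY_MTU = 4310         # the large virtual MTU quoted in the patch

# Payload each standard-sized clearnet UDP packet can carry after
# L2TP fragmentation and WireGuard encapsulation.
fragment_payload = CLEARNET_MTU - WIREGUARD_OVERHEAD - L2TP_OVERHEAD

# One maximum-sized overlay packet therefore becomes this many
# WireGuard UDP packets in quick succession on the wire.
fragments = math.ceil(OVERLAY_MTU / fragment_payload)
print(fragments)  # → 4 with these assumed overheads
```

Losing any one of those fragments to an ISP policer corrupts the whole overlay packet, which is why the loss impact described above is so severe.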
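
The shaping fix the patch describes (Linux tc driven by FireQOS) could be approximated with a single raw tc token-bucket qdisc as a minimal sketch; the interface name and rate below are assumptions for illustration, and the real deployment uses FireQOS rather than hand-written tc:

```shell
# Sketch only: shape egress on the clearnet interface to just under the
# ISP allowance, so bursts of WireGuard fragments queue locally instead
# of being dropped upstream. "eth0" and "95mbit" are illustrative values
# for a 100mbit allowance, not taken from the burble.dn42 configuration.
tc qdisc add dev eth0 root tbf rate 95mbit burst 64kb latency 50ms
```

Shaping slightly below the allowance matters: the queue must form on the host, where packets are delayed rather than discarded, not at the ISP's policer.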