---
title: "Network Design"
geekdocDescription: "burble.dn42 network design"
weight: 0
---

{{<hint warning>}}
This page documents a previous iteration of the burble.dn42 network and
is currently out of date.
{{</hint>}}

## Tunnel Mesh

{{<figure src="/design/DN42-Tunnels.svg" width="80%">}}

Hosts within the burble.dn42 network are joined using a Wireguard/L2TP mesh.
Static, unmanaged L2TP tunnels operate at the IP level and are configured
to create a full mesh between nodes. Wireguard is used to provide encryption
and to encapsulate the L2TP traffic in plain UDP, which hides fragmentation
and allows packets to be processed within intermediate routers' fast path.

Using L2TP allows for a large, virtual MTU of 4310 between nodes; this is
chosen to spread the encapsulation costs of higher layers across packets.
L2TP also allows for multiple tunnels between hosts, which can be used
to separate low level traffic without incurring the additional overheads
of VXLANs (e.g. for NFS cross mounting).

Network configuration on hosts is managed by systemd-networkd and applied
with Ansible.

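As a minimal sketch, one mesh link could be expressed as a pair of
systemd-networkd `.netdev` files along these lines. All names, IDs, keys and
addresses below are illustrative placeholders, not the real burble.dn42
configuration, and the matching `.network` files (which assign the inner
addresses) are omitted.

```ini
# wg-node2.netdev -- Wireguard underlay to one remote node (placeholder values)
[NetDev]
Name=wg-node2
Kind=wireguard

[WireGuard]
PrivateKeyFile=/etc/systemd/network/wg-node2.key
ListenPort=51821

[WireGuardPeer]
PublicKey=<node2 base64 public key>
Endpoint=node2.example.net:51821
# only the inner L2TP endpoint addresses cross this tunnel
AllowedIPs=172.31.0.2/32

# l2tp-node2.netdev -- static, unmanaged L2TPv3 tunnel inside the Wireguard link
[NetDev]
Name=l2tp-node2
Kind=l2tp
# the large virtual MTU described above
MTUBytes=4310

[L2TP]
TunnelId=2
PeerTunnelId=1
Local=172.31.0.1
Remote=172.31.0.2
# IP-level encapsulation; the plain-UDP framing is provided by Wireguard
EncapsulationType=ip

[L2TPSession]
Name=mesh-node2
SessionId=1
PeerSessionId=1
```
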
{{<hint info>}}
<b>Real Life Networks and Fragmentation.</b>

Earlier designs for the burble.dn42 network relied on passing fragmented
packets directly down to the clearnet layer (e.g. via ESP IPsec
fragmentation, or UDP fragmentation with Wireguard). In practice it was
observed that clearnet ISPs could struggle with uncommon packet types, with
packet loss seen particularly in the
[IPv6 case](https://blog.apnic.net/2021/04/23/ipv6-fragmentation-loss-in-2021/).
It seems likely that some providers' anti-DDoS and load balancing platforms
were particularly responsible for magnifying this problem.

To resolve this, the network was re-designed to ensure fragmentation takes
place at the L2TP layer, so that all traffic is encapsulated into
standard-sized UDP packets. This design ensures all traffic is 'normal' and
can remain within intermediate routers'
[fast path](https://en.wikipedia.org/wiki/Fast_path).
{{</hint>}}

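As a rough worked example (the exact figures depend on the header overheads
involved): with a typical 1500 byte clearnet MTU, each Wireguard datagram can
carry around 1400 bytes of L2TP payload, so a single full-sized 4310 byte
overlay packet is fragmented by L2TP into four standard-sized UDP packets
(⌈4310 / 1400⌉ = 4), rather than producing one jumbo packet that must be
fragmented on the wire.
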
{{<hint info>}}
<b>ISP Rate Limiting</b>

The burble.dn42 network uses jumbo sized packets that are fragmented by
L2TP before being encapsulated by Wireguard. This means a single packet in
the overlay layers can generate multiple Wireguard UDP packets in quick
succession, appearing as a high bandwidth burst of traffic on the
outgoing clearnet interface. It's vital that all these packets arrive
at the destination, or the entire overlay packet will be corrupted.
For most networks this is not a problem and the approach generally
works very well.

However, if you have bandwidth limits with your ISP (e.g. a 100mbit
allowance provided on a 1gbit port), packets may be generated at a high bit
rate and then decimated by the ISP to match the bandwidth allowance.
This would normally be fine, but when a fragmented packet is sent, the
burst of smaller packets is highly likely to exceed the bandwidth
allowance, and the impact on upper layer traffic is brutal, causing
nearly all packets to be dropped.

The burble.dn42 network manages this issue by implementing traffic shaping
on the outgoing traffic using linux tc (via
[FireQOS](https://firehol.org/tutorial/fireqos-new-user/)). This allows
outgoing packets to be queued at the correct rate, rather than being
arbitrarily decimated by the ISP.
{{</hint>}}

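A minimal FireQOS sketch of this kind of egress shaping is shown below; the
device name, rate and port are placeholders rather than the production
policy.

```sh
# /etc/firehol/fireqos.conf -- illustrative sketch only
# Shape slightly below the ISP allowance so queueing happens locally,
# under our control, rather than packets being dropped by the ISP.
interface eth0 wan output rate 95Mbit
    class tunnel commit 50%
        match udp port 51821    # the Wireguard mesh traffic
    class default
```
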
## BGP EVPN
Overlaying the Wireguard/L2TP mesh is a set of VXLANs managed by a BGP EVPN.

The VXLANs are primarily designed to tag and isolate transit traffic, making
their use similar to MPLS.

The Babel routing protocol is used to discover loopback addresses between
nodes; Babel is configured to operate across the point-to-point L2TP tunnels
with a static, latency-based metric that is applied during deployment.

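The exact daemon and values are not specified here, but sketched in Bird2
syntax such a Babel configuration might look like the following; the
interface pattern and metric are placeholders.

```
# Illustrative Bird2 Babel sketch -- not the production configuration
protocol babel mesh {
    ipv6 { import all; export all; };
    interface "l2tp-*" {
        type tunnel;
        rxcost 96;   # static, latency-based metric set at deployment time
    };
}
```
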
The BGP EVPN uses [FRR](https://frrouting.org/) with two global route
reflectors located on different continents for redundancy. Once overheads
are taken into account, the MTU within each VXLAN is 4260.

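The route-reflector side of such an EVPN reduces to a few lines of FRR
configuration; the ASN and peer-group name below are invented for
illustration.

```
! Illustrative FRR sketch of an EVPN route reflector -- placeholder values
router bgp 64512
 neighbor fabric peer-group
 neighbor fabric remote-as 64512
 neighbor fabric update-source lo
 !
 address-family l2vpn evpn
  neighbor fabric activate
  neighbor fabric route-reflector-client
 exit-address-family
```
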
## dn42 Core Routing
Each host in the network runs an unprivileged LXD container that acts as a
dn42 router for that host. The container uses
[Bird2](https://bird.network.cz/) and routes between dn42 peer tunnels,
local services on the same node and transit to the rest of the burble.dn42
network via a single dn42 core VXLAN.

Local services and peer networks are fully dual stack IPv4/IPv6; however,
the transit VXLAN uses purely IPv6 link-local addressing, making use of the
BGP multiprotocol and extended next hop capabilities for IPv4.

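In Bird2 terms, an IPv6 link-local transit session that also carries IPv4
routes looks roughly like the sketch below; the ASN, interface name and
address are placeholders.

```
# Illustrative Bird2 sketch: IPv4 routes over an IPv6 link-local session
protocol bgp core01 {
    local as 4242420000;                        # placeholder ASN
    neighbor fe80::1 % 'vxcore' as 4242420000;  # placeholder link-local peer
    ipv4 {
        import all; export all;
        extended next hop on;   # RFC 8950: IPv4 NLRI with IPv6 next hops
    };
    ipv6 { import all; export all; };
}
```
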
The transit VXLAN and burble.dn42 services networks use an MTU of 4260;
however, the dn42 BGP configuration includes internal communities that
distribute the destination MTU across the network, giving per-route MTUs.
This helps ensure path MTU discovery takes place as early and efficiently
as possible.

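One way to realise per-route MTUs is to map the internal community onto the
kernel route MTU in a Bird2 filter applied on export to the kernel protocol;
the community values below are invented for illustration.

```
# Illustrative Bird2 sketch: set the kernel route MTU from a community
function set_dest_mtu() {
    if (64511, 1436) ~ bgp_community then krt_mtu = 1436;
    if (64511, 4260) ~ bgp_community then krt_mtu = 4260;
}
```
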
Local services on each host are provided by
[LXD](https://linuxcontainers.org/lxd/introduction/) containers or VMs
connecting to internal network bridges.
These vary across hosts but typically include:

- **tier1** - used for publicly available services (DNS, web proxy, etc)
- **tier2** - used for internal services, with access restricted to burble.dn42 networks

Other networks might include:
- **dmz** - used for hosting untrusted services (e.g. the shell servers)
- **dn42 services** - for other networks, such as the registry services

dn42 peer tunnels are created directly on the host and then injected into
the container using a small script, allowing the router container itself to
remain unprivileged.

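The script itself is not reproduced here, but the mechanism amounts to
moving the tunnel interface into the container's network namespace; a
hypothetical sketch, with placeholder container and interface names:

```sh
#!/bin/sh
# Hypothetical sketch: hand a host-created peer tunnel to the router container
IFACE="$1"              # tunnel interface created on the host
ROUTER="dn42-router"    # placeholder container name

# find the PID of the container's init process
PID=$(lxc info "$ROUTER" | awk '/^PID:/ {print $2}')

# move the interface into the container's namespace and bring it up there
ip link set dev "$IFACE" netns "$PID"
nsenter -t "$PID" -n ip link set dev "$IFACE" up
```
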
The routers also run nftables for managing access to each of the networks,
[bird_exporter](https://github.com/czerwonk/bird_exporter) for metrics and
the [bird-lg-go](https://github.com/xddxdd/bird-lg-go) proxy for the
burble.dn42 [looking glass](https://lg.burble.com).

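As an example of the access control, an nftables sketch restricting a tier2
bridge to dn42 source addresses might look like this; the bridge name and
exact ruleset are placeholders.

```
# Illustrative nftables sketch: only allow dn42 sources into tier2
table inet tier2_acl {
    chain forward {
        type filter hook forward priority 0; policy accept;
        oifname "br-tier2" jump tier2_in    # placeholder bridge name
    }

    chain tier2_in {
        ct state established,related accept
        ip saddr 172.20.0.0/14 accept       # dn42 IPv4 space
        ip6 saddr fd00::/8 accept           # dn42 ULA space
        drop
    }
}
```
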
## Host Configuration
burble.dn42 nodes are designed to have the minimum functionality at the
host level, with all major services being delivered via virtual networks,
containers and VMs.

Hosts have three main functions:
- connecting to the burble.dn42 Wireguard/L2TP mesh and BGP EVPN
- providing internal bridges for virtual networks
- hosting [LXD](https://linuxcontainers.org/lxd/introduction/) containers and VMs

Together these three capabilities allow arbitrary, isolated networks and
services to be created and hosted within the network.

The hosts also provide a few ancillary services:
- delivering clearnet access for internal containers/VMs using an internal
  bridge. The host manages addressing and routing for the bridge to allow
  clearnet access independent of the host capabilities (e.g. proxied vs
  routed IPv6 connectivity)
- creating dn42 peer tunnels and injecting them into the dn42 router container
- monitoring via [netdata](https://www.netdata.cloud/)
- backup using [borg](https://borgbackup.readthedocs.io/en/stable/)