cadlag dot org

Avoiding loops with BGP

Last time I motivated BGP in the context of inter-domain routing. This time, I’d like to describe a bit about how BGP operates.

A basic requirement of any routing protocol is that it cope with routing loops. Recall that under the model of destination-based forwarding, each router forwards a packet based on the packet’s destination IP address. The sequence of routers a packet visits as it is forwarded through the network defines a path. If at any point this packet returns to a router it has previously visited, this path has a loop. Loops are dangerous because a packet will never escape, ultimately exhausting the IP TTL. Note that a routing loop is not the same as a loop in the network. Many network topologies have loops for the sake of redundancy; a network loop is necessary for there to be more than one path between hosts. But it’s the routing protocol’s job to ensure that packets are forwarded in a loop-free fashion.

A naive distance-vector protocol will eventually converge to a loop-free state, given enough time. But this convergence takes time, and you see what is known as “counting to infinity”, where loops are only evicted after computed distances increase step-by-step until reaching a large enough value that they are effectively infinite. While there are various tricks to mitigate this, BGP takes a simpler approach.

Routers running BGP (also known as BGP “speakers”) are connected in peering relationships. Here it’s useful to distinguish between connections within a single autonomous system (iBGP connections) and connections across autonomous systems (eBGP connections). Both eBGP and iBGP connections run the same protocol – BGP – but they are configured slightly differently.

Whereas routers running a naive distance-vector algorithm will advertise distance information, BGP speakers advertise routes. In the context of BGP, a route pairs a set of destinations with several attributes of a path between those destinations. There are several path attributes, but I will mention two of the mandatory ones.

  1. The AS_PATH attribute is a sequence of AS-path segments. In its simplest form this is a list of AS numbers, like [1,6,7], though the actual specification allows for unordered sets as well.
  2. The NEXT_HOP is the IP address of the router that should be used as the next hop to the destination.

When a route is advertised on an eBGP connection, the sender adds its AS number to the AS_PATH. Thus, as information propagates from one AS to another, the route carries information about its history. This makes it straightforward to detect possible inter-AS routing loops. I’ll quote from the spec:

If the AS_PATH attribute of a BGP route contains an AS loop, the BGP route should be excluded from the Phase 2 decision function. AS loop detection is done by scanning the full AS path (as specified in the AS_PATH attribute), and checking that the autonomous system number of the local system does not appear in the AS path. Operations of a BGP speaker that is configured to accept routes with its own autonomous system number in the AS path are outside the scope of this document.

Note that this “Phase 2 decision function” is just jargon for the rule that determines what advertised information actually makes it into the local forwarding table of a BGP speaker (and hence could be subsequently advertised to peers).

The AS_PATH information is enough to prevent AS loops. But what happens within a single AS?

Consider an AS with three routers A, B, and C, as shown below.

                   +-------+        +-------+
             eBGP  |       |  iBGP  |       |
            -------|   A   |--------|   B   |
                   |       |        |       |
                   +-------+        +-------+
                         \            /
                     iBGP \          / iBGP
                           \        /
                            +-------+
                            |       |
                            |  C    |
                            |       |
                            +-------+

Suppose A advertises a route to B, and B advertises this to C, and then C advertises this to A. The AS_PATH of this route does not reveal the loop, because it is internal to the current AS. If A accepted this advertisement, it would be a problem.

The BGP answer to this issue of intra-AS loops is another simple mechanism: speakers should not re-advertise routes received via iBGP. In our above example, this means A will advertise to B and C directly, and neither B nor C will send anything back to A. No loops are possible with this restriction.

While this solution is straightforward, it introduces a scalability problem. Since routes learned via iBGP are not re-advertised, we need to have iBGP connections between every pair of routers running BGP (this is sometimes called “full mesh” connectivity). In the above diagram that’s OK, because there are only three routers. But for N routers, there are N*(N-1)/2 pairs, and the amount of BGP updates flowing around the network and stored within the various routing-information bases can quickly get out of hand.

Fortunately, there is a BGP extension in RFC 4456 allowing a BGP speaker to be designated as a “route reflector”. Route reflectors can advertise iBGP learned routes to configured iBGP peers. Here’s a diagram from RFC 4456:

                 / - - - - - - - - - - - - -  -
                 |           Cluster           |
                   +-------+        +-------+
                 | |       |        |       |  |
                   | RTR-A |        | RTR-B |
                 | |Client |        |Client |  |
                   +-------+        +-------+
                 |       \           /         |
                    IBGP  \         / IBGP
                 |         \       /           |
                           +-------+
                 |         |       |           |
                           | RTR-C |
                 |         |  RR   |           |
                           +-------+
                 |           /   \             |
                  - - - - - /- - -\- - - - - - /
                     IBGP  /       \ IBGP
                  +-------+         +-------+
                  | RTR-D |  IBGP   | RTR-E |
                  |  Non- |---------|  Non- |
                  |Client |         |Client |
                  +-------+         +-------+

The idea of route reflection is to subdivide the routers into “clusters”. Within each cluster we have a designated “route reflector” which is linked in a star topology to a set of “client routers”. Across clusters we have a full-mesh connectivity of the route reflectors. When a route reflector receives advertisements from another cluster, it can advertise this to its clients. If a route reflector receives an advertisement from a client, it can advertise this to all other iBGP peers. If we had k clusters of size M, using route reflectors requires O(k^2 + M) total iBGP connections. Assuming k is more or less a small, fixed constant, you can see that this has linear scaling (rather than the naive quadratic scaling of iBGP).

It’s worth noting that there’s another extension, RFC 5065, that allows several ASes to be treated as a “confederation” that appears as a single AS to external peers. Within each AS of the confederation one would still need full mesh iBGP, but because eBGP is used between ASes of the confederation the connectivity can be sparser. If your confederation had k ASes of size M, then you would need something like O(kM^2) total iBGP connections. In practice, it seems like route reflectors are the more popular option, though there’s a risk for misconfiguration.

#networking

Reply to this post by email ↪