Local Area Network design/Link aggregation – IEEE 802.3ad
Link aggregation, standardized as IEEE 802.3ad, is normally used between bridges in the backbone, or between a bridge and a server, to combine multiple physical links (usually 2 to 4) into a single logical channel in order to:
- increase the link bandwidth capacity: traffic is distributed among links in the aggregate;
- improve resiliency, that is fault tolerance: if one of the links in the aggregate fails:
  - the bandwidth of the logical channel degrades gracefully instead of dropping to zero;
  - there is no need to wait for STP convergence: STP sees the logical channel as a single link of higher capacity → only the link cost changes.
All the physical links aggregated in the same group must:
- be point-to-point between the same two nodes;
- be full-duplex;
- have the same speed.
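The three constraints above can be expressed as a simple check. This is only a minimal sketch: the `Link` record and the `can_aggregate` function are hypothetical names introduced for illustration, and do not correspond to any real bridge configuration API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical description of one physical link (illustration only)."""
    local_node: str
    remote_node: str
    full_duplex: bool
    speed_mbps: int


def can_aggregate(links):
    """Return True if the links satisfy the three constraints listed above."""
    if not links:
        return False
    first = links[0]
    return all(
        # point-to-point between the same two nodes
        link.local_node == first.local_node
        and link.remote_node == first.remote_node
        # full-duplex
        and link.full_duplex
        # same speed
        and link.speed_mbps == first.speed_mbps
        for link in links
    )
```

For example, two full-duplex 1000 Mbit/s links between the same pair of bridges pass the check, while adding a 100 Mbit/s link, or a link towards a third bridge, makes it fail.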
Link Aggregation Control Protocol (LACP) is used for automatic aggregate configuration:
- first the ports to be aggregated have to be set manually on bridges by the network administrator;
- before activating aggregated ports, LACP is able to automatically recognize the number of available links in the channel, and to check whether the connection with the other party is correct (in particular whether all links are between the same bridges);
- messages called LACPDUs are periodically exchanged to detect possible link faults → convergence is fast (usually less than 1 s) in case of fault.
Each aggregate is identified by a Link Aggregation Group Identifier (LAG ID).
Frame distribution on aggregated ports
When a frame arrives, on which link of the aggregate should it be sent? Although the standard suggests possible frame distribution criteria, it does not define a distribution algorithm → bridges from different vendors can use different frame distribution algorithms.
The simplest solution is round-robin: each incoming frame is forwarded on the next free port after the one on which the previous frame was forwarded.
Reordering problems can arise: a smaller frame arriving at the bridge just after a bigger frame may be completely received by the other bridge before the bigger one → the frames leave the other bridge in the wrong order because the smaller frame 'overtook' the bigger one.
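The reordering effect can be reproduced with a small simulation in which each frame is forwarded on the next port in turn (round-robin). This is a sketch under simplifying assumptions that are not in the standard: all frames reach the bridge at the same instant, all links have the same speed, and the transmission time is proportional to the frame size. The function name `round_robin_arrival_order` is hypothetical.

```python
def round_robin_arrival_order(frame_sizes, num_links):
    """Return frame indices in the order they finish arriving at the far end.

    Assumptions (illustration only): frames are queued back-to-back at time 0,
    links have equal speed, and sending a frame takes `size` time units.
    """
    link_free_at = [0] * num_links  # time at which each link becomes idle
    arrivals = []
    for index, size in enumerate(frame_sizes):
        link = index % num_links            # next port in turn
        finish = link_free_at[link] + size  # frame fully received at this time
        link_free_at[link] = finish
        arrivals.append((finish, index))
    arrivals.sort()  # order of complete reception at the other bridge
    return [index for _, index in arrivals]
```

With a 1500-byte frame followed by a 64-byte frame, `round_robin_arrival_order([1500, 64], 2)` returns `[1, 0]`: the small frame overtakes the big one. On a single link, `round_robin_arrival_order([1500, 64], 1)` returns `[0, 1]` and the order is preserved.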
Based on conversations
The frame reordering problem can be solved by sending all the frames belonging to the same conversation on the same link. The most common way to identify the frames of a conversation is based on the pair of source and destination MAC addresses.
Identifying conversations by MAC addresses, however, is in some cases not effective for link load balancing:
- only two hosts communicate through the aggregate → there is a single conversation, which can use just one physical link;
- the aggregate connects two routers → conversations cannot be told apart because routers rewrite the MAC addresses of frames: every frame crossing the aggregate carries the MAC addresses of the two routers.
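Conversation-based distribution, together with its load balancing limit, can be sketched as a hash of the MAC address pair. Since the standard does not define the algorithm, the choice of CRC-32 and the function name `select_link` are assumptions made only for illustration; real bridges use vendor-specific functions.

```python
import zlib


def select_link(src_mac, dst_mac, num_links):
    """Map a (source MAC, destination MAC) pair to a link index.

    Assumption: CRC-32 is used only as an example of a deterministic hash;
    it is not mandated by the standard.
    """
    key = (src_mac.lower() + "->" + dst_mac.lower()).encode("ascii")
    return zlib.crc32(key) % num_links


# All frames of one conversation map to the same link, so they cannot be
# reordered, but a single conversation never uses more than one link.
links_used = {
    select_link("00:11:22:33:44:55", "66:77:88:99:aa:bb", 4)
    for _ in range(1000)
}
```

Here `links_used` contains exactly one link index, however many frames the two hosts exchange. The same happens between two routers, because every frame carries the same pair of router MAC addresses.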
If two nodes are connected by multiple aggregates, only one aggregate will be active due to STP: STP sees each aggregate as a single link, with a cost derived from the overall capacity of the aggregate, and disables the other aggregate.
Through a proper setting of link priorities, it is possible to configure N aggregated links of which only M < N are active → the other N-M links are stand-by links: if an active link fails, a stand-by link is activated, so the available bandwidth of the logical channel does not decrease.
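The selection of active and stand-by links can be sketched as follows. The function names and the dictionary of priorities are hypothetical, and the sketch assumes that a lower number means a higher priority.

```python
def split_active_standby(link_priorities, max_active):
    """Split links into (active, standby) lists according to priority.

    Assumption: a lower priority number means a more preferred link;
    ties are broken by link name to keep the result deterministic.
    """
    ordered = sorted(link_priorities, key=lambda name: (link_priorities[name], name))
    return ordered[:max_active], ordered[max_active:]


def fail_link(link_priorities, failed, max_active):
    """Recompute the split after one link has failed."""
    remaining = {n: p for n, p in link_priorities.items() if n != failed}
    return split_active_standby(remaining, max_active)
```

With N = 3 links `{"p1": 10, "p2": 20, "p3": 30}` and M = 2, the active links are `p1` and `p2` and `p3` is in stand-by. If `p1` fails, `fail_link` returns `p2` and `p3` as active, so the channel still has two active links.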
Cisco's proprietary Virtual Switching System feature overcomes the constraint of having only two nodes at the aggregate endpoints: a bridge can be connected to two bridges that STP sees as a single logical bridge, so traffic can be distributed on both aggregated links.