STP. Part 1. STP (802.1D-1998)

STP. Part 1. STP (802.1D-1998)
STP. Part 2. RSTP (802.1w)
STP. Part 3. PVST+
STP. Part 4. MSTP
STP. Part 5. Bridge Assurance

STP is a Layer 2 management protocol that provides path redundancy in a network while preventing undesirable loops. In terms of mathematics, network topology can be represented as a graph. A graph without loops is called a tree. A tree is an undirected graph in which any two vertices are connected by exactly one path. STP builds trees by disabling redundant links/segments.

Building of STP topology:

Step 1. Elect one Root Bridge (one per topology)

Step 2. Elect Root Ports (one per non-root-bridge)

Step 3. Elect Designated Ports (one per segment – a link between switches)

Step 4. Make leftover ports Alternate (Non-Designated Ports)

STP uses Bridge ID for Root Bridge election. Bridge ID (BID) is an 8-byte value:
– 2 bytes – bridge priority
– 6 bytes – bridge MAC-address

Root Path Cost is used for electing the Root Port. This value shows the distance between an upstream bridge and the Root bridge. Root Path Cost is a sum of all links costs up to the Root Bridge. Link cost depends on port bandwidth. Lower cost value is better.

Upstream bridge – next closer to root.
Downstream bridge – next further away from root.

Basic STP operation

Every switch exchanges BPDUs (Bridge Protocol Data Unit). BPDUs are special packets that bridges use to exchange Spanning Tree information. BPDU contains following information:
– Root ID (ID of the Root bridge)
– Root Path Cost (the cost from local bridge to the Root bridge)
– Sender Bridge ID (ID of the bridge that sent this BPDU)
– Sender Port ID (ID of the port from which BPDU was sent)

STP algorithm criteria

When creating a loop-free logical topology, we may have redundancy that implies that we need to distinguish which of redundant links is better. STP uses the following decision sequence (in order of precedence):

Step 1. Choose the lowest Root BID (Root Bridge election)

Step 2. Choose the lowest Path Cost to Root Bridge (Root Port election)

Step 3. Choose the lowest Sender BID (Root Port election for the case of two upstream bridges or Designated port election on a segment for the case of equal Root Path Cost)

Step 4. Choose the lowest Sender Port ID (Root Port election in case of two links to the upstream bridge)

Step 5. Choose the lowest Port ID of local bridge (Root Port election in case of two links to upstream bridge via Hub)

Basic STP topologies

In diagrams below Switch A has MAC address aaaa.aaaa.aaaa, Switch B has MAC address bbbb.bbbb.bbbb, etc. The default Bridge priority is 32768.

Topology 1. Switch A is the Root bridge, since it has the lowest Bridge ID. All ports of Root bridge are Designated Ports.

Switch B and Switch C have direct links to the Root Bridge. Assuming that all links have the same bandwidth, direct links towards the Root bridge have the lowest Root Path Cost and these ports are Root ports. There is only one redundant segment – link between Switch B and Switch C.

On the segment between Switch B and Switch C, port on Switch B will operate as Designated Port (DP), because BPDU sent by Switch B has lower Bridge ID (ID of Switch B) than Bridge ID in BPDU from Switch C. In other words, between Switch B’s BID and Switch C’s BID, Switch B has the lowest BID.

Leftover port is Alternate.

Topology 2. Since Root path Cost values received by Switch D in both BPDU from upstream bridges are equal, then Bridge ID will be compared in order to define the Root Port (RP). Then on segment between Switch D and Switch C, Switch C has the lowest Root Past Cost in its BPDU. This implies that on Switch C port towards to Switch D will operate as Designated Port (DP).

Leftover port will be Alternate.

Topology 3. BPDUs received on both ports of Switch B have same Root Path Cost and Bridge ID, then the lowest port ID of the upstream bridge will brake the tie in order to choose Root Port (RP).

Leftover port is Alternate.

Topology 4. A Hub only reproduce electrical signals. As a result, Switch B will receive the same BDPU on both ports, thus the lowest Port ID of the local bridge will be used to choose Root port (RP).

Leftover port is Backup.

To sum up, in order to build STP topology we need to define Role of each port. There are four Port Roles:
– Root Port – elected based on the lowest Root Path Cost
– Designated Port (on segment – between switches) – elected based on the lowest Root path costs of the switches. If root path costs are equal, then based on BID.
– Alternate Port – a port with not best BPDUs – alternate path to the Root bridge.
– Backup Port – a port that receives its own BPDUs or a port that received same BPDU as other local port.

STP port states

Port State defines the actual port’s operation. In other words, Port State defines whether port forwards frames or not. If it does not, then the port either in preforwarding state or blocking state. There are four Port States defined in STP:

Port State	Port is Included in Active Topology	Port is Learning MAC Addresses	Receiving/ Sending BPDU	Frame Forwarding
Blocking	No	No	Yes/No	No
Listening	Yes	No	Yes/Yes*	No
Learning	Yes	Yes	Yes/Yes*	No
Forwarding	Yes	Yes	Yes/Yes*	Yes

* – Port will process incoming BPDUs. Port also sends BPDUs until port receives superior BPDU

A Minute of History
Radia Perlman – the inventor of STP (book “Interconnections”):
“The 802.1d standard actually calls for two intermediate states. This arrangement is designed to minimize learning of incorrect station locations while the topology has not yet stabilized. To minimize unnecessarily forwarded frames, the committee deemed it desirable not to allow a bridge to start forwarding until it has built up its learned cache. During the initial portion of the intermediate state, the bridge does not learn station addresses; then during the latter part, the bridge still does not forward packets over that port, but it does start learning station locations on that port. In other words, the intermediate state is subdivided into two substates: listening and learning.

Note: in the original spanning tree algorithm, I had only a single intermediate state, known as preforwarding. The committee asked whether bridges should learn station addresses in the preforwarding state. I said I didn’t think it mattered. The committee decided to break preforwarding into the two aforementioned states. I believe that having two states is unnecessary and that bridges would work fine whether or not they learn station addresses in preforwarding. Breaking this period into two states makes the algorithm more complicated but does no harm.”

STP Operation
There are two types of BPDUs: Configuration BPDU and Topology Change Notification (TCN) BPDU. In most cases, term of BPDU is used for Configuration BPDU.

BPDUs are send from Designated Ports only. The Root Bridge sends BPDUs out of its designated ports every Hello_Time – two seconds by default. In converged topology only Root Bridge originates BPDUs.

Configuration BPDUs are sent in two cases:
– When the Hello_Time expires (every two seconds by default), the Root Bridge originates a Configuration BPDU on every port (assuming that there are no selflooped ports). This is a part of the normal Configuration BPDU processing.
– When a non-Root Bridge receives a Configuration BPDU on its Root Port, it updates Root Path Cost by adding the cost of the port on which this BPDU was received, and sends (propagates) updated Configuration BPDUs on all of its Designated Ports (normal processing). A non-Root Bridge stores the most recent BPDUs received on every port (the Root port and Alternate ports)

Every bridge accepts and retains only the best current Root Bridge information, electing one Root Port towards the Root Bridge. Then the bridge blocks alternate paths to the Root Bridge, leaving only the single optimal upstream path and continues relaying optimal information downstream.

If a bridge learns of a better (“superior”) Root Bridge BPDU on any of its ports, the previous “best” information is erased and the new one immediately accepted and relayed. Thus, only the best information is relayed.

Configuration BPDU contains following fields:

    protocol id:   0000 IEEE 802.1d
    version id:    00
    bpdu type:     00 config bpdu
    bit field:     1 byte
      1 : topology change flag
      8 : topology change ack
    root priority    2 bytes
    root id:         6 bytes
    root path cost:  4 bytes
    bridge priority: 2 bytes
    bridge id:       6 bytes
    port id:         2 bytes
    message age:     2 bytes
    max age:         2 bytes
    hello time:      2 bytes
    forward delay:   2 bytes

TCN BPDU, on the other hand, contains nothing, but its type:

    protocol id:   0000 IEEE 802.1d
    version id:    00
    bpdu type:     80 tcn bpdu

STP is all about timeouts

There is a constant delay of 30 seconds between STP decision to make a port Designated and the port starting forwarding. A port must go through the sequence of Listening and Learning States (preforwarding). Each of these states lasts for the time of Forward_Delay, which is 15 seconds.

There are two types of a network failure: direct and indirect. Direct failure is when port physically goes down – switch can detect it almost immediately. Indirect failure implies some intermediate devices between switches, like a Hub or a media converter. Imagine Switch A is connected to Switch B via Hub. If Switch B goes Down, port on Switch A will remain Up. The only thing that changed – logical connectivity – no BPDUs will be received by Switch A. In order to handle situation like this, STP uses Max_Age timer.

By default, Max_Age is 20 seconds. 20 seconds is the maximum time a port will store BPDU information received earlier.

Max_age is tighten to Message_Age. The Message Age contains the length of time that has passed since the root bridge initially originated the BPDU. The root bridge sends all its BPDUs with a message age value of 0, and all subsequent switches add 1 to this value. Effectively, this value contains the information on how far you are from the root bridge when you receive BPDU. You may think of it as a Hop Count. For more information about STP timers, please refer to this document – Understanding and Tuning Spanning Tree Protocol Timers

Inferior BPDU carries information about the Root Bridge that is worse than the one that is currently stored on the port. Inferior BPDUs may appear when a neighboring bridge loses its Root Port and claims itself the new root for the topology. By default, the switch ignores inferior BPDUs for the time of (Max_Age – Message_Age).

When the Root Bridge goes down, however, this process makes convergence slower by adding extra overhead of almost Max_Age time.

STP Convergence Example

Topology is converged and stable. BDPUs are sent out from Designated ports.

Scenario 1. Link between Switch B and Switch C fails.

From every switch perspective no action is needed, except TCN BPDU generated on link down event.

Scenario 2. Link (port BA) between Switch B and Switch A (Root Bridge) fails.

From Switch B perspective, the active path to the Root Bridge is lost. BPDU information stored on the port BA is invalidated and the bridge attempts to elect new Root Port based on the stored information. If such port can be found, ports are transitioned through Listening/Learning states. There are no BPDUs on the port BC because the role of the port CB is Alternate/Non-Designated. If there are no more Root Ports left after a link failure, a bridge declares itself as the Root and starts announcing that in its BPDUs. So, Switch B declares itself as the Root Bridge and starts sending BPDUs from its Designated ports.

From switch C perspective, since the existing BPDU information on port CB has not expired yet (for the period of time Max Age – Message Age), BPDUs from Switch B are considered as inferior and discarded.

Once BPDU information on the port CB expires, the port CB will become Designated and go through Listening/Learning states. During Listening/Learning states, Switch C will forward BPDUs from the Root Bridge and accept BPDUs from switch B.

Eventually, port BC will become Root Port and will end up in Forwarding state.

In this scenario, maximum time for the network to converge is 50 seconds: (Max_Age – Message_Age) + 2 x Forward_Delay (Listening and Learning states).

Scenario 3. Link between C and Switch A (Root Bridge) fails.

From switch C perspective, the active path to the Root Bridge is lost. BPDU information stored on the port CA is invalidated and the Switch C attempts to elect new Root Port based on information stored on port CB.

In this scenario, time for network to converge is 30 seconds: 2 x Forward_Delay (Listening and Learning states).

STP Port Failure Summary

If a port that failed was Designated or Alternate/non-Designated, then nothing happens.

If the Root Port fails, the information stored with this port is invalidated and the bridge attempts to elect new Root Port or declare itself as new Root Bridge.

Topology Changes

Switches use MAC address tables (one per VLAN) for the frame forwarding, thus a topology change could make some parts of the table invalid. Since bridges do not know what exactly has changed, these tables on all switches in the topology should be updated.

A Topology change is an event when:
– a port is going down from the forwarding state (blocking for instance)
– a port transitions to forwarding state and the bridge has a Designated port (the bridge was not standalone with just the Root Port)
– a switch becomes the Root Bridge

The process of sending a Topology Change Notification (TCN) to all bridges in the network involves two steps:
– TCN is relayed to the Root bridge.
– The Root Bridge informs the rest of the L2 domain that there was a topology change.

TCN BPDUs are sent out of the Root port in three cases:
– a port transitions to the Forwarding state and the bridge has at least one Designated Port
– a port transitions from the Forwarding or Learning states back to the Blocking state
– a TCN BPDU is received on a Designated Port (TCN BPDU propagation)

In detail:
– The bridge that detects a topology change, starts sending TCN BPDUs to the Root Bridge every Hello_Time seconds until the upstream bridge responds with a Configuration BPDU with Topology Change (TC) Acknowledge flag set.
– Every bridge on the path to Root, propagate TCN BPDU in the same way. TCN BPDU sent out its Root port every Hello_Time seconds until it is in turn acknowledged. The process continues until the TCN reaches the Root Bridge.
– When the Root Bridge receives and acknowledges the TCN BPDU, it starts to send out its Configuration BPDUs with the Topology Change (TC) flag set for the duration of Max_Age + Forward_Delay seconds.
– These Configuration BPDUs with TC flag (TC BPDUs) are relayed by every bridge in the network. TC flag makes bridges to reduce Aging_time from the default interval of 300 seconds to Forward_Delay seconds.

Since a switch has a lot of ports connected to end host, every up/down event of such port will trigger TCN propagation. It is possible to reduce amount of TC events by marking all host/edge (non-transit) ports STP PortFast. Up/down events on PortFast ports do not generate TC event.

Cisco’s STP Extensions: UplinkFast and BackboneFast

The UplinkFast feature is based on the definition of an Uplink Group. On a given switch, the Uplink Group consists in the Root Port and all the ports that provide an alternate connection to the Root Bridge. If the Root Port fails, a port with next lowest cost from the Uplink Group is selected to immediately replace it.

UplinkFast only works when the switch has blocked ports that provides redundant connectivity to the root.

A Bridge elects one of the upstream ports as the Root Port, and mark another as alternate port. Upon detecting primary link failure, the alternate path is immediately activated. To further accelerate failure recover time, the bridge start announcing all currently known MAC addresses using them as source addresses in dummy multicast frames sent upstream on the new Root Port.

As soon as UplinkFast is configured on a switch, switch automatically adjusts some STP parameters in order to help achieve this:
– The bridge priority of the switch is increased to a significantly higher value than the default. This ensures that the switch is not likely to be elected Root Bridge, which does not have any Root Ports (all ports are designated).
– All the ports of the switch have their cost increased by 3000. This ensures that switch ports are not likely be elected designated ports.
BackboneFast
In order to get rid of this Max_age delay (in case of receiving inferior BPDU on Alternate port), BackboneFast introduces two enhancements:
– The ability to detect an indirect link failure as soon as possible. This is achieved by tracking the inferior BPDUs that a designated bridge sends when it experiences a direct link failure.
– A mechanism that allows for an check immediate check if the BPDU information stored on a port is still valid. This is implemented with a new protocol data unit (PDU) and the Root Link Query

BTW, Cisco uses different implementation of STP which called Per-Vlan Spanning Tree. Check out this post for details.