The other day, I received a familiar call from a client who was agitated that his supposed Device Level Ring (DLR) had failed to work. He had been promised near-instantaneous fail-over if a connection faulted, but he could not achieve that level of performance. In fact, he achieved the opposite: a communications failure.
Let’s take a step back. For those of you unfamiliar, Device Level Ring, or DLR, is a resiliency technology used by EtherNet/IP devices. The objective behind DLR is to detect, manage, and recover from single faults within a network with a ring topology. If a failure occurs in the ring, traffic essentially turns around and travels around the ring in the opposite direction, reaching its destination as if nothing had happened.
In short, it allows a system to tolerate a disconnect or failure with no noticeable effect on performance, and to recover before the system even notices. Ideally, someone has also configured the system to send an alert when a break occurs, so that the problem is known and can be resolved.
Side note – To be clear, when I say ‘ring topology,’ I’m not referring to a backbone network ring of switches, but rather a ring that you would find inside of a cabinet with daisy-chained devices leveraging the dual Ethernet ports found on some pieces of hardware. These ports are the first thing to consider.
Consider your typical modern Allen-Bradley Ethernet device. Chances are, you will observe that there are two switch ports on the bottom, typically marked 1 and 2. These appear to be normal RJ45 Ethernet ports, which they are… sort of. Can they pass Ethernet data? Absolutely. Do they do a whole lot more? If they are RA DLR capable, also, absolutely.
To understand what’s going on inside RA DLR-capable devices, consider two concepts in network communications: latency and jitter.
Latency is the time it takes for data to move from point A to point B. A signal running down a copper wire to a switch, which then sends it along to the next node, cannot arrive instantaneously; the delay can be nanoseconds or even picoseconds, but there is a delay. Latency is the measure of that delay. The variance in that latency is known as jitter.
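To make the two terms concrete, here is a minimal sketch that computes both from a set of delay measurements. The sample values are hypothetical, and jitter is taken here as the standard deviation of the delays, which is only one of several definitions in common use.

```python
from statistics import mean, pstdev


def latency_stats(samples_us):
    """Return (mean latency, jitter) for one-way delays given in microseconds.

    Jitter is modelled as the population standard deviation of the delays.
    This is an assumption for illustration; peak-to-peak is also common.
    """
    return mean(samples_us), pstdev(samples_us)


# Hypothetical measurements, in microseconds.
steady = [100.0, 100.0, 100.0, 100.0]
erratic = [60.0, 140.0, 80.0, 120.0]

steady_mean, steady_jitter = latency_stats(steady)
erratic_mean, erratic_jitter = latency_stats(erratic)

print(f"steady:  latency={steady_mean:.1f} us, jitter={steady_jitter:.1f} us")
print(f"erratic: latency={erratic_mean:.1f} us, jitter={erratic_jitter:.1f} us")
```

Both sample sets have the same average latency, but only the first is predictable, which is the property that matters for a deterministic network.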
In some industrial automation and control systems, speed is essential and minimizing jitter and latency allows for a more deterministic network. This means that data is where it needs to be when it needs to be – particularly in applications such as safety or motion.
The process by which a switch moves data is known as ‘forwarding’. The job of a switch is to ‘forward’ packets of data to wherever they need to go. Most switches use a technology known as ‘store and forward’. Store and forward switching means the switch waits to receive a whole packet of data and checks its integrity before moving it along. In a sense, the Application-Specific Integrated Circuit (ASIC) within the switchport acts as a sort of buffer before moving data along. Even though the delay is imperceptible by human standards, it does slow the movement of data: it adds latency.
The technology used by that seemingly normal-looking ASIC in the RJ45 switchport on an Allen-Bradley DLR-capable device is known as ‘cut-through’ forwarding. This essentially allows the switchport to look at the packet header for the destination and send it on its way. It eliminates waiting for the entire packet to transfer and omits the integrity-checking. Rockwell Automation calls this ‘Embedded Switch Technology’.
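A rough back-of-the-envelope sketch shows why this matters. It assumes a 100 Mbps link, a full-size 1518-byte frame, and a 14-byte Ethernet header as the portion a cut-through port must read before forwarding; these figures are my assumptions for illustration, and the model ignores preamble, propagation, and internal processing time.

```python
def serialization_delay_us(num_bytes, link_mbps):
    """Time to clock num_bytes across a link, in microseconds."""
    # bits divided by megabits-per-second gives microseconds
    return num_bytes * 8 / link_mbps


def store_and_forward_delay_us(frame_bytes, link_mbps):
    """The whole frame must arrive before it can be forwarded."""
    return serialization_delay_us(frame_bytes, link_mbps)


def cut_through_delay_us(header_bytes, link_mbps):
    """Only the header must arrive before forwarding can begin."""
    return serialization_delay_us(header_bytes, link_mbps)


FRAME_BYTES = 1518   # assumed full-size Ethernet frame
HEADER_BYTES = 14    # assumed: destination MAC, source MAC, EtherType
LINK_MBPS = 100      # assumed link speed

per_hop_sf = store_and_forward_delay_us(FRAME_BYTES, LINK_MBPS)
per_hop_ct = cut_through_delay_us(HEADER_BYTES, LINK_MBPS)

print(f"store and forward: {per_hop_sf:.2f} us per hop")
print(f"cut-through:       {per_hop_ct:.2f} us per hop")
```

Under these assumptions the difference is roughly 121 microseconds versus about 1 microsecond per hop, and in a daisy chain that difference is paid again at every device the frame passes through.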
Note that this is a hardware implementation of DLR. DLR is an ODVA standard, so it is not exclusive to Rockwell Automation; other vendors can implement it as well. DLR also exists as a software implementation.
Now at its core, with DLR you are connecting devices to one another via these special dual Ethernet ports and eventually closing the chain into a ring to create a successful DLR design. In a typical Ethernet system, if you plug a cable from a switch back into the same switch without a loop-prevention mechanism, you create a loop, which leads to a ‘broadcast storm’. If you are unfamiliar with how that happens, let me explain. To identify what’s on the network, devices send out what are known as ARP requests, which are broadcast to everything else on the network. By design, when a switchport receives a broadcast such as an ARP request, the switch floods it out through its remaining switchports. If there is no ‘end’ because there is a loop on the network, the broadcast traffic will continue to propagate to such an extent that it saturates the links, overloads the CPUs of the switches, and causes a network outage.
Therefore, in an Ethernet ring of any kind, an appropriate loop-prevention mechanism must be in place to avoid this scenario. The typical means to accomplish this is by artificially blocking a port. If you stop to think about it – that means any given ring topology is actually a linear bus! At least for any given moment in time.
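The effect of blocking a single port can be sketched with a toy model. It assumes a ring of switches that each simply pass a broadcast from one ring port to the other, and it counts how many times a single broadcast gets transmitted. The function and its parameters are hypothetical, and a real storm is worse than this because every device keeps generating new broadcasts.

```python
def count_forwards(num_switches, blocked_link=None, max_steps=100):
    """Count transmissions caused by one broadcast injected at switch 0.

    Switch i is joined to switch (i + 1) % num_switches by link i.
    blocked_link is the index of a link that carries no data, or None.
    max_steps caps the simulation, since an unblocked ring never stops.
    """
    # Each in-flight copy is (switch it sits at, direction of travel).
    in_flight = [(0, +1), (0, -1)]  # switch 0 floods out both ring ports
    forwards = 0
    for _ in range(max_steps):
        if not in_flight:
            break  # every copy has reached a dead end
        next_in_flight = []
        for switch, direction in in_flight:
            # Link used when leaving `switch` in this direction.
            link = switch if direction == +1 else (switch - 1) % num_switches
            if link == blocked_link:
                continue  # blocked port: this copy stops here
            forwards += 1
            next_in_flight.append(((switch + direction) % num_switches, direction))
        in_flight = next_in_flight
    return forwards


looped = count_forwards(4, blocked_link=None, max_steps=100)
blocked = count_forwards(4, blocked_link=3)
print(f"no blocked port: {looped} transmissions (and still circulating)")
print(f"one blocked port: {blocked} transmissions, then the traffic stops")
```

With no blocked link, the count is limited only by how long the simulation runs. With one link blocked, the broadcast reaches every other switch once and then dies out, which is exactly the linear-bus behavior described above.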
In a DLR, you don’t physically have a linear bus; you have up to 50 nodes physically connected into a ring. But since a ring would cause a loop, as I mentioned, a single connection is disabled for Ethernet data traffic, so the network operates logically as a linear bus rather than as a ring. But what if something breaks?
In a DLR design, one device serves as the Supervisor. The job of the Supervisor is to send out what are known as DLR Beacon Frames. These are assigned a timeout value and are sent at regular intervals around the entire ring. The Beacons can still traverse the artificially disconnected connection and travel around back to the Supervisor. If a disconnect occurs at any point, the Supervisor will not see the Beacon return and therefore knows there has been a disconnect. DLR then re-opens the previously blocked port, thereby opening a secondary path through the network.
The result is a disconnect and re-establishment of network communication which, if properly configured, can take as little as 1-3 milliseconds. This is generally well below the timeout of most PLCs, so it is business as usual. And if you’ve followed the recommendation to add a notification for when a break occurs, you’ll be aware of it and can address and rectify the problem.
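The Supervisor logic described above can be sketched as a small timeout check. This is a simplified model for illustration only, not the state machine defined in the ODVA specification; the class, its field names, and the timing values are all hypothetical. The sketch also assumes the Supervisor blocks the port again once Beacons return, since leaving it open on a healed ring would create a loop.

```python
from dataclasses import dataclass


@dataclass
class RingSupervisor:
    """Toy model of a ring supervisor watching for its own beacons."""

    beacon_timeout_us: int                 # hypothetical, set per deployment
    last_beacon_seen_us: int = 0
    blocked_port_forwarding: bool = False  # False: port blocked, ring healthy

    def on_beacon_returned(self, now_us):
        """Record that a beacon made it all the way around the ring."""
        self.last_beacon_seen_us = now_us

    def poll(self, now_us):
        """Check the timeout and open or re-block the port accordingly."""
        beacon_lost = now_us - self.last_beacon_seen_us > self.beacon_timeout_us
        # Beacon lost: open the blocked port to create the secondary path.
        # Beacon seen again: block it so the healed ring is not a loop.
        self.blocked_port_forwarding = beacon_lost
        return "FAULT" if beacon_lost else "NORMAL"


supervisor = RingSupervisor(beacon_timeout_us=2000)  # illustrative value
supervisor.on_beacon_returned(400)
healthy = supervisor.poll(800)    # beacon seen 400 us ago
faulted = supervisor.poll(3000)   # nothing seen for 2600 us
print(healthy, faulted, supervisor.blocked_port_forwarding)
```

The key design point is that fault detection is driven by the absence of something expected, so the detection time is bounded by the timeout value rather than by how the fault occurred.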
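To see why a recovery of a few milliseconds goes unnoticed, compare it with the connection timeout. The sketch below assumes the timeout can be modelled as the requested packet interval (RPI) multiplied by a timeout multiplier; the function name and the example values are hypothetical, and the real figures depend on the controller and on how each connection is configured.

```python
def survives_ring_fault(recovery_ms, rpi_ms, timeout_multiplier):
    """Return True if ring recovery finishes before the connection times out.

    Assumes connection timeout = RPI * multiplier, which is a simplification.
    """
    connection_timeout_ms = rpi_ms * timeout_multiplier
    return recovery_ms < connection_timeout_ms


# Hypothetical connection: 20 ms RPI with a multiplier of 4 (80 ms timeout).
dlr_recovery_ok = survives_ring_fault(recovery_ms=3, rpi_ms=20, timeout_multiplier=4)
slow_recovery_ok = survives_ring_fault(recovery_ms=3000, rpi_ms=20, timeout_multiplier=4)
print(dlr_recovery_ok, slow_recovery_ok)
```

A 3 millisecond recovery fits comfortably inside an 80 millisecond window, whereas a recovery measured in seconds would drop the connection.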
The issue that we have been running into is that Rockwell Automation has arguably done too good a job in their nearly universal implementation of dual-Ethernet port devices with Embedded Switch Technology. Due to the ubiquitous nature of the dual ports in the Allen-Bradley hardware world and the seamless DLR functionality, in many cases engineers and technicians aren’t realizing they are special. It’s just assumed that they’re regular Ethernet ports and therefore, all regular Ethernet ports support DLR. Not the case.
I have recently noticed a large uptick in clients attempting to use various third-party, non-DLR-capable devices with dual Ethernet ports within a DLR system. This is because, as mentioned, they have assumed that there is nothing special about a DLR-enabled pair of switchports. A standard Ethernet switchport lacks the technology needed to reduce communications latency and cannot pass DLR Beacon Frames. It also does not participate in multicast groups (see my ‘Why a Managed Switch?’ article), does not apply Quality of Service (QoS) to prioritize EtherNet/IP traffic, and, perhaps most importantly for motion-related applications, cannot pass PTP (Precision Time Protocol, aka IEEE 1588) traffic. All of these features are designed to improve network latency and reduce jitter.
The Device Level Ring protocol as implemented by Allen-Bradley is, as mentioned, hardware based. It is built into any Allen-Bradley device with dual ports marked ‘Embedded Switch Technology’. That marking is the best indicator of DLR support.
For devices with a single Ethernet port, DLR capability may be added via a device known as an ETAP. I have seen the fiber-optic version of ETAPs being used as media converters, but their real purpose is to add DLR functionality to a single-port Ethernet device. This allows that device to participate in the Device Level Ring.
Certain Allen-Bradley switches also support DLR, but bear in mind that only specific switch models support it, and only on specific ports. Nor does this mean that everything on the switch automatically participates in the ring; only the pair of ports designated for DLR (in and out) is considered part of the DLR.
On the topic of nodes, two more important things:
Remember that it’s all about DLR continuity – the ring itself must be a contiguous string of DLR-capable switchports in order to work properly.
Finally, don’t forget to set up an alert or alarm to notify you that there’s a fault that recovered… otherwise, someday, you may see another fault and think that the DLR didn’t work when in reality, you have suffered two faults! Visibility is paramount in the modern industrial network.
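The continuity requirement noted above lends itself to a simple check before commissioning. This is a minimal sketch that assumes you keep an inventory of the ring in cabling order, with each entry recording whether the device is DLR-capable; the device names and the function are hypothetical.

```python
def find_ring_breakers(ring_nodes):
    """Return the names of nodes that would break DLR continuity.

    ring_nodes is a list of (name, dlr_capable) pairs in cabling order.
    """
    return [name for name, dlr_capable in ring_nodes if not dlr_capable]


# Hypothetical ring inventory, in the order the devices are cabled.
ring = [
    ("controller", True),
    ("drive_1", True),
    ("third_party_io", False),  # dual ports, but not DLR-capable
    ("drive_2", True),
]

breakers = find_ring_breakers(ring)
if breakers:
    print("Ring is not contiguous; non-DLR devices:", ", ".join(breakers))
else:
    print("All ring nodes are DLR-capable.")
```

A single entry in that list is enough to explain a ring that fails to recover the way it was promised to.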
Hopefully, this sheds some light on why the Embedded Switch Technology that is required for Device Level Ring is different from a typical switchport. You can see that these ports are highly specialized, with many features and characteristics all designed to support and improve high-speed applications and reduce communication latency.
So, please don’t make the mistake of mixing non-DLR dual Ethernet ports with Embedded Switch Technology switchports – there is a huge difference!