In the four years since I moved from IT into the automation space, I have gone from casual observer to working closely with end users on a day-to-day basis. Over time, I began to ‘understand the understanding’ of Ethernet and took on the task of educating our user base on its benefits and pitfalls. I’d like to share what I’ve learned, whether you deal with me on a regular basis or are reading this for the first time.
As I understand things, when Ethernet was introduced to the plant floor, it was billed as a magical technology that would be able to work faster, harder, deliver more data, and support hundreds of devices rather than a couple dozen. It would be a semi-deterministic and low-latency network that could pave the way to sensors and smart devices which would bring data to management so they could make smarter decisions.
While all of this is true, there was a marked lack of education about Ethernet’s limitations; only the benefits were discussed. When I speak with most Control Engineers, they give me a quizzical look when I talk about understanding Ethernet because the prevailing attitude is that ‘if you give it the next IP address and plug it in, it just works.’ Troubleshooting means ‘just plug it into a different port.’ Part of this is due to vendors touting the ease of deployment while neglecting to mention what’s happening behind the scenes. The other part, to its credit, is that Ethernet sometimes is that simple.
With the race to get all systems interconnected, what has happened is a series of challenges that result in what we often refer to as ‘network overload.’ There is a tipping point in most networks where adding just one more device drives the network over the edge in terms of what it can handle, and there is typically a single glaring reason for this: unmanaged switches.
The versatility and ease of deployment of an unmanaged switch are unparalleled, and with them comes the conventional wisdom that devices can be added indefinitely. Unmanaged switches are inexpensive, do not require configuration, and are easily replaceable. The downside? They are an absolute disaster for an automation or industrial network.
The reason for this is simple – multicast traffic. Ethernet communications can be unicast (one to one), multicast (one to many), or broadcast (one to all). An unmanaged switch lacks a feature known as IGMP (Internet Group Management Protocol) Snooping/Querier. Why does that matter?
Conceptually, a multicast message is only intended for a specific group of endpoints. Without IGMP Snooping and Querier, the switch can’t pick that multicast message apart to identify the target endpoints. The result is that the message is broadcast to all other systems.
Consequently, a message that only needs to reach a select few is delivered to everything on the network, tying up links and devices that have no use for it, and this continues for as long as the sender keeps transmitting. This dramatically increases the overhead on the network because the traffic cannot be contained to where it needs to go and is instead just sent to everyone by default!
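To make the difference concrete, here is a minimal sketch of the forwarding decision for a single multicast frame. It is a toy model, not how any particular switch is implemented: the port numbers, the group address, and the `egress_ports` helper are all made up for illustration, and it ignores details such as querier ports and how real switches treat groups nobody has joined.

```python
# Toy model of how a switch picks egress ports for one multicast frame.
# Port numbers and the group address are hypothetical.

def egress_ports(all_ports, ingress_port, group, snooped_groups=None):
    """Return the set of ports a multicast frame is forwarded out of.

    snooped_groups maps a group address to the set of ports that joined it,
    as an IGMP-snooping switch would learn. None models an unmanaged switch.
    """
    others = set(all_ports) - {ingress_port}
    if snooped_groups is None:
        # No snooping: the switch cannot tell who wants the group,
        # so it floods the frame exactly like a broadcast.
        return others
    # Snooping: forward only to ports that joined the group.
    return snooped_groups.get(group, set()) & others


ports = range(1, 9)              # an 8-port switch
group = "239.192.1.5"            # hypothetical multicast group
members = {group: {3, 6}}        # ports 3 and 6 joined the group

unmanaged = egress_ports(ports, 1, group)
managed = egress_ports(ports, 1, group, members)

print(sorted(unmanaged))  # [2, 3, 4, 5, 6, 7, 8]
print(sorted(managed))    # [3, 6]
```

In this sketch the unmanaged switch sends the frame out of seven ports to reach two interested devices; the other five receive traffic they have to inspect and discard.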
Take a typical Allen-Bradley EtherNet/IP device. The RPI, or Requested Packet Interval, is the rate at which data updates over a connection, and it defaults to 20ms. The timeout period (the interval within which responses must be acknowledged before the connection is dropped) is 4x the RPI, with a minimum of 100ms. At the default RPI, 4 x 20ms is only 80ms, so the 100ms minimum applies. This means that if the devices do not hear from one another within 100ms, for all intents and purposes, the system will stop running.
With network overhead that high, the probability of a response from the devices in question taking longer than 100ms is dramatically higher, resulting in a timeout. This is the fundamental issue with network overhead as it applies to a control network. In a small enclosed system, this might never have been an issue, nor would it ever become one. However, now that everything is connected upstream to everything else, the one small local network is just a subset of a much larger network and therefore subject to the communications of everything else on it!
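The timeout arithmetic can be sketched in a few lines. This simply encodes the rule as I have described it here (4x the RPI, never less than 100ms); the function name and parameters are my own, so check the vendor documentation for the exact behavior of a given device.

```python
def connection_timeout_ms(rpi_ms, multiplier=4, floor_ms=100):
    """Timeout as described in this article: multiplier x RPI, never below the floor."""
    return max(multiplier * rpi_ms, floor_ms)


for rpi in (5, 20, 50):
    print(rpi, connection_timeout_ms(rpi))
# 5 100
# 20 100
# 50 200
```

Note that tightening the RPI below 25ms buys no extra timeout margin under this rule, because the 100ms minimum takes over.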
The misconception of “I don’t need to buy a managed switch because I don’t need a faster switch” is one that I commonly hear. Recall that it’s not about the performance of the system that is connected to the switch; it’s what you’re subjecting that system to. A delicate, sensitive automation network with 10/100Mbps links and limited processing ability must be protected.
The irony of this situation is that multicast was considered the default means of communication because it reduced overall network overhead, which is true, but only when managed switches are used! On an unmanaged network, the overhead is so massive that it will routinely cause communication dropouts. These dropouts are due to the timeouts experienced when end devices are overloaded with communications, not, as commonly believed, because the switch cannot pass traffic.
These days, the option to change multicast to unicast exists for some specific devices, but it is a stopgap measure at best. There are also ways to manipulate the RPI that can help, but again, these are temporary fixes. Best practice is to use a managed switch.
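A rough back-of-the-envelope calculation shows both sides of that trade-off. The device counts below are invented for illustration and the helper names are my own; the model assumes every producer sends one frame per RPI and ignores frame size entirely.

```python
def frames_per_second(rpi_ms):
    """One frame every RPI milliseconds."""
    return 1000 / rpi_ms


def producer_send_rate(consumers, rpi_ms, multicast):
    """Frames/s one producer must transmit to reach all of its consumers."""
    copies = 1 if multicast else consumers
    return copies * frames_per_second(rpi_ms)


def load_on_bystander(producers, rpi_ms, flooded):
    """Frames/s arriving at a device that subscribes to none of the producers."""
    return producers * frames_per_second(rpi_ms) if flooded else 0.0


# Why multicast looked attractive: one producer, four consumers, 20ms RPI.
print(producer_send_rate(4, 20, multicast=False))  # 200.0
print(producer_send_rate(4, 20, multicast=True))   # 50.0

# Why it backfires without snooping: 30 producers, 20ms RPI.
print(load_on_bystander(30, 20, flooded=True))     # 1500.0
print(load_on_bystander(30, 20, flooded=False))    # 0.0
```

With these assumed numbers, the producer saves 150 frames per second by multicasting, but on an unmanaged network every uninvolved device has to receive and discard 1,500 frames per second that were never meant for it.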
While a managed switch brings a whole host of other features to the table, in the automation world, multicast traffic control is by far one of the most important, followed by features like VLANs and traffic prioritization designed to protect control traffic under any circumstance.
Remember that in a modern control network, the only things that truly matter are power and communications. One is irrelevant without the other, yet great effort is made to protect power but communications are something of an afterthought. It’s time to think about how you protect your communications — and using managed switches is a good starting point.
Stay tuned for Part 2: What problems does the next generation of switches resolve?