EtherCAT Master Redundancy

Reliability and fail-safe operation are vital for most industrial automation systems. Unexpected system downtime costs considerable labor time and often material as well. Therefore, it’s important to protect production systems from unexpected hardware and/or software failures.

EtherCAT is an advanced, high-performance interface for system-level networking in a production facility. Besides great flexibility and high throughput, it offers some basic fault-tolerance features to withstand possible hardware failures. The most powerful of these is Cable Redundancy, which uses a physical ring topology to keep the bus operating stably even when the ring is broken.

KPA offers Master Redundancy as an extension feature in the latest release of KPA EtherCAT Master. Master Redundancy is a KPA-patented technology that allows uninterrupted operation of the EtherCAT system in case of a hardware or software failure, even during high-performance synchronized operation.

How to save the bus when its master is gone

Let’s recall the basic principles of EtherCAT operation, as they underlie the concept behind the Master Redundancy feature of KPA EtherCAT Master.

Unlike other Ethernet-based industrial interfaces, EtherCAT uses a hop-by-hop communication scheme: a single data telegram is shared by all the slave devices connected to the bus, and it passes from one device to the next. The master is the agent that cyclically creates a telegram, fills it with read or write requests and output data (or slots for input data), and sends it to the bus at strict time intervals. Each slave device extracts (reads) or inserts (writes) only the block of data that is explicitly addressed to it.
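The cycle described above can be illustrated with a minimal Python simulation. It assumes a deliberately simplified telegram model: the `Datagram`, `Slave` and `run_cycle` names are hypothetical, and the real EtherCAT frame layout, addressing modes and on-the-fly timing are not modelled.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Datagram:
    """One addressed block inside a telegram (hypothetical model, not the real frame layout)."""
    slave: int            # position of the addressed slave on the bus
    outputs: bytes = b""  # data written by the master for this slave
    inputs: bytes = b""   # slot the slave fills with its input data


@dataclass
class Slave:
    position: int
    input_value: bytes                    # what this device reports each cycle
    received: bytes = field(default=b"")  # last outputs it extracted

    def process(self, telegram: list[Datagram]) -> None:
        # The telegram passes through every device; each one touches only
        # the datagram addressed to it and leaves the rest unchanged.
        for dg in telegram:
            if dg.slave == self.position:
                self.received = dg.outputs    # extract (read) its outputs
                dg.inputs = self.input_value  # insert (write) its inputs


def run_cycle(slaves: list[Slave], outputs: dict[int, bytes]) -> list[Datagram]:
    # Master side: one telegram per cycle, with outputs and empty input slots
    telegram = [Datagram(slave=s.position, outputs=outputs.get(s.position, b""))
                for s in slaves]
    # The same telegram hops from device to device along the bus
    for s in slaves:
        s.process(telegram)
    return telegram  # back at the master, now carrying the inputs


slaves = [Slave(0, b"\x01"), Slave(1, b"\x02")]
telegram = run_cycle(slaves, {0: b"\xaa", 1: b"\xbb"})
print([dg.inputs for dg in telegram])  # [b'\x01', b'\x02']
```

Because every device handles the complete telegram, any extra node on the bus can observe all of this traffic, which is the property the next paragraph builds on.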

Although this design is intended to make efficient use of the bus bandwidth, it is also extremely useful for fault tolerance: any device connected to the bus sees the activity of all the slaves and can transparently acquire, or sniff, the data transferred between the bus master and the slaves. You don’t need to modify any slave devices, add extra signals, or change the transfer protocol, so this capability comes at no added cost (apart from a smarter master device, of course).

KPA EtherCAT Master uses this side effect to introduce another bus master (or several masters) to the bus. During normal operation, the redundant master is passive: it can sniff the data but does not send its own telegrams. The passive master is called secondary, and the active one primary. Because every secondary master stays consistent with the bus activity, it is ready to take the place of the primary one whenever the latter fails.
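To illustrate what staying consistent with the bus activity means, here is a minimal sketch of a passive master that mirrors every telegram it observes. `ShadowMaster`, the dictionary-based telegram and the policy of reusing the primary's last outputs on takeover are all assumptions made for illustration; in a real system the control application would compute the outputs.

```python
from __future__ import annotations


class ShadowMaster:
    """Hypothetical passive (secondary) master: observes telegrams, never transmits.

    A telegram is modelled as {slave_position: (outputs, inputs)}.
    """

    def __init__(self) -> None:
        self.process_image: dict[int, tuple[bytes, bytes]] = {}
        self.active = False  # passive: no write access to the bus
        self.cycles_seen = 0

    def on_telegram(self, telegram: dict[int, tuple[bytes, bytes]]) -> None:
        # Sniff: copy every slave's outputs and inputs into a local mirror.
        # Nothing is sent back to the bus while the master is passive.
        self.process_image.update(telegram)
        self.cycles_seen += 1

    def next_outputs(self) -> dict[int, bytes]:
        # What this master would send if it had to take over right now
        # (assumed policy: repeat the primary's last outputs for each slave).
        return {pos: out for pos, (out, _inp) in self.process_image.items()}


shadow = ShadowMaster()
shadow.on_telegram({0: (b"\xaa", b"\x01"), 1: (b"\xbb", b"\x02")})
shadow.on_telegram({0: (b"\xac", b"\x01")})  # primary updated slave 0 only
print(shadow.next_outputs())  # {0: b'\xac', 1: b'\xbb'}
```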

More importantly, a secondary master does not need any dedicated control device or additional signal lines to detect an abnormal situation on the bus. As you have already learned, EtherCAT telegrams arrive at regular, strict time intervals. When a passive secondary master does not receive a telegram it expects, it can confidently conclude that the bus no longer has a master. There is no need to wait until the current cycle is over: the redundant master can take control immediately by posting its own telegram. This telegram will be correct and meaningful, because the secondary master has been tracking all the changes and is a true clone of its failed counterpart.

How Master Redundancy works

An EtherCAT configuration with Master Redundancy enabled comprises one active (primary) master device and one or several passive (secondary) masters. The primary master does not have to be configured for Master Redundancy, but running KPA EtherCAT Master software on it is preferable to get all the advantages of this technology.

The secondary master is connected to the bus as a shadow agent. It sniffs data telegrams as they pass by, without modifying them. At the same time, it records the arrival time of each telegram and tracks any delay between the expected and actual arrival times.
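The arrival-time tracking can be sketched as follows, assuming the secondary master knows the bus cycle time and timestamps each telegram in microseconds. `ArrivalTracker` and its methods are hypothetical names, not part of any KPA API.

```python
from __future__ import annotations


class ArrivalTracker:
    """Hypothetical tracker of expected vs. actual telegram arrival (times in us)."""

    def __init__(self, cycle_us: int) -> None:
        self.cycle_us = cycle_us          # bus cycle time assumed for the primary
        self.last_arrival_us: int | None = None
        self.max_delay_us = 0

    def expected_arrival(self) -> int | None:
        # The next telegram is due one cycle after the previous one.
        if self.last_arrival_us is None:
            return None
        return self.last_arrival_us + self.cycle_us

    def on_telegram(self, now_us: int) -> int:
        """Record an arrival and return its delay relative to the expected time."""
        expected = self.expected_arrival()
        delay = 0 if expected is None else max(0, now_us - expected)
        self.max_delay_us = max(self.max_delay_us, delay)
        self.last_arrival_us = now_us
        return delay

    def overdue_by(self, now_us: int) -> int:
        """How late the next telegram is at time now_us (0 if not yet due)."""
        expected = self.expected_arrival()
        return 0 if expected is None else max(0, now_us - expected)


tracker = ArrivalTracker(cycle_us=1000)
delays = [tracker.on_telegram(t) for t in (0, 1000, 2030)]
print(delays)                     # [0, 0, 30]
print(tracker.overdue_by(3500))   # 470
```

A non-zero `overdue_by` value is the condition that would start the watchdog described next.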

When a telegram is delayed, a watchdog timer starts. Once the time specified in the master’s settings has elapsed, the master starts its failover protocol:

  1. The master’s internal switch is triggered, so the master can now write to the bus.
  2. A new telegram prepared by the secondary master is written to the bus.
  3. This telegram carries a request to the previously active master to release the bus, because that master might still be online but suffering from internal problems. We definitely don’t want the previously active master to resume control of the bus after it recovers.
  4. The control application of the new bus master is notified that it is now online.
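The watchdog and the four failover steps above can be sketched as a small state machine. This is a hypothetical model: bus access is reduced to a list of sent telegrams, telegram contents are plain strings, and `FailoverController` and its methods are illustrative names, not the KPA EtherCAT Master API.

```python
from __future__ import annotations
from typing import Callable


class FailoverController:
    """Hypothetical watchdog + failover logic of a secondary master (times in us)."""

    def __init__(self, watchdog_us: int, notify_app: Callable[[str], None]) -> None:
        self.watchdog_us = watchdog_us
        self.notify_app = notify_app
        self.watchdog_start_us: int | None = None
        self.can_write = False               # passive until failover
        self.sent: list[list[str]] = []      # telegrams this master has written

    def on_tick(self, now_us: int, telegram_late: bool) -> None:
        if self.can_write:
            return                           # already the active master
        if not telegram_late:
            self.watchdog_start_us = None    # primary is alive: stop the watchdog
            return
        if self.watchdog_start_us is None:
            self.watchdog_start_us = now_us  # telegram delayed: start the watchdog
        elif now_us - self.watchdog_start_us >= self.watchdog_us:
            self.failover()                  # configured time elapsed

    def failover(self) -> None:
        self.can_write = True                    # 1. internal switch: bus becomes writable
        telegram = ["process data"]              # 2. new telegram prepared by this master...
        telegram.append("release-bus request")   # 3. ...asking the old primary to free the bus
        self.sent.append(telegram)               #    written to the bus
        self.notify_app("online")                # 4. control application is notified


events: list[str] = []
fc = FailoverController(watchdog_us=500, notify_app=events.append)
fc.on_tick(1000, telegram_late=True)  # telegram overdue: watchdog starts
fc.on_tick(1300, telegram_late=True)  # 300 us elapsed, still waiting
fc.on_tick(1500, telegram_late=True)  # 500 us elapsed: failover
print(fc.can_write, events)           # True ['online']
```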

The active master can be returned to the passive state with the bus configurator tool; another passive master then takes control automatically.

To support multiple redundant masters, their watchdog timers are configured with different timeouts. The watchdog time can also be generated randomly for each master. This avoids a collision in which two or more secondary masters attempt to start the failover protocol at the same time.
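One way to obtain distinct, randomized watchdog times is sketched below. The scheme (drawing offsets without replacement from a fixed grid) and the function names are assumptions for illustration. The grid step is assumed to be longer than the time a new telegram needs to reach the other secondaries, so that they see traffic resume and reset their watchdogs before their own timers expire.

```python
from __future__ import annotations
import random


def assign_watchdog_times(names: list[str], base_us: int, spread_us: int,
                          step_us: int, seed: int | None = None) -> dict[str, int]:
    """Give each secondary master a distinct, randomly chosen watchdog time (hypothetical)."""
    rng = random.Random(seed)
    slots = range(0, spread_us, step_us)     # candidate offsets on a step_us grid
    if len(names) > len(slots):
        raise ValueError("not enough distinct slots for all masters")
    offsets = rng.sample(slots, len(names))  # without replacement: no two equal
    return {name: base_us + off for name, off in zip(names, offsets)}


def first_to_fail_over(timeouts: dict[str, int]) -> str:
    # The master whose watchdog expires first posts its telegram; the others
    # then see telegrams arriving again and reset their own watchdogs.
    return min(timeouts, key=timeouts.get)


timeouts = assign_watchdog_times(["B", "C", "D"], base_us=2000,
                                 spread_us=1000, step_us=100, seed=1)
winner = first_to_fail_over(timeouts)
print(timeouts, "->", winner)
```

Drawing without replacement rules out equal timeouts, which independent random draws per master would not guarantee.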

Summary

The feature we propose can protect an industrial communication network from a severe, hard-to-recover failure of its control node. It exploits architectural features of EtherCAT technology to implement a cost-effective solution that greatly increases fault tolerance without compromising flexibility or performance. Almost any EtherCAT-enabled industrial automation system can be upgraded with this feature: you only need to attach one or several clones of your controller to the bus, with slight or even no modifications to its logic.

If you are not using KPA EtherCAT Master for EtherCAT communication, our engineers will help you integrate it into your control system. Our software stack is compatible with virtually any real-time or general-purpose operating system and can be compiled for different CPUs, microcontrollers and FPGAs. And if your platform is not on the compatibility list, our engineers will tailor the software to your requirements.