An outage can start with something as ordinary as a single relay falling silent in the field. Then alarms rain down, data flow stops, and decision-making is delayed. In sectors like energy, water, wastewater, oil and gas, and transportation, this delay translates into real-world costs. This is precisely where redundancy comes into play. An architecture using dual SCADA servers, dual RTUs, and dual SIMs maintains data and control through hot standby, failover, and high availability. The goal is clear: operations must not stop, no data should be lost, and the disaster recovery plan must be ready at all times.

In this article, we address the fundamental building blocks of this architecture and practical implementation points in a simple and focused manner. We explain the path to establishing a culture of high availability, from the field to the center, and from hardware to connection.


Why is High Availability Critical? Risks, Costs, and Goals

Downtime in a SCADA environment is not just the screen going dark. A pump might stop, a valve might remain in the wrong position, or alarms might be delayed. This means production loss, risk of environmental discharge, security vulnerabilities, and regulatory violations. Scenarios such as loss of pressure in water distribution, uncontrolled switching in a substation, or a leak detected late in a pipeline can escalate quickly.

Uptime percentages make the business impact concrete. Over the course of a year, 99.95% uptime still allows more than four hours of downtime, while 99.999% allows only about five minutes.

Uptime Percentage | Estimated Annual Downtime
99.95%            | Approx. 4 hours 22 minutes
99.99%            | Approx. 52 minutes
99.999%           | Approx. 5 minutes
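
The downtime figures follow from simple arithmetic. As a minimal sketch, assuming a 365-day year (8,760 hours) and a hypothetical helper named annual_downtime, the calculation could look like this in Python:

```python
from datetime import timedelta

HOURS_PER_YEAR = 365 * 24  # assumption: 8,760 hours, leap years ignored


def annual_downtime(uptime_percent: float) -> timedelta:
    """Return the downtime per year implied by an uptime percentage."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    unavailable_fraction = (100.0 - uptime_percent) / 100.0
    return timedelta(hours=HOURS_PER_YEAR * unavailable_fraction)


if __name__ == "__main__":
    for pct in (99.95, 99.99, 99.999):
        minutes = annual_downtime(pct).total_seconds() / 60
        print(f"{pct}% uptime -> about {minutes:.1f} minutes of downtime per year")
```

For 99.99% uptime this gives roughly 52.6 minutes per year, which is where the "approx. 52 minutes" figure comes from.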

RTO (Recovery Time Objective) is the maximum acceptable time to restore the system. RPO (Recovery Point Objective) is the maximum acceptable window of data loss. In a SCADA context, RTO is typically expressed in minutes and RPO in seconds, because alarm and event data must keep their timestamps and remain consistent after recovery. Security is a separate topic: especially in critical infrastructure, incorrect data is as dangerous as an incorrect command. Environmental and compliance risks are managed through reportable data and archive continuity.

Therefore, the goals must be clear: high availability, reliable failover, consistent data, and rapid disaster recovery. Redundancy is not an option; it is the foundation of sustainable operation.


Dual SCADA Server: Hot Standby and Fast Failover Design

Consider two SCADA servers: one Active, the other Hot Standby. The Active server runs, handles all sessions, and generates alarms. The Standby server keeps the same database, alarm status, and user sessions synchronized. If the Active server fails, failover occurs automatically and quickly. The goal is for the operator to continue their work without noticing the change.

Critical components of this design:

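  • Heartbeat: The servers check each other at regular intervals. The heartbeat interval and the thresholds for packet loss and latency must be tuned carefully so that a brief network glitch does not trigger a false failover.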

  • Quorum: Decisions in a multi-node setup are made by majority. This prevents unilateral decisions.

  • Split-brain prevention: During a network partition, both servers must not become active at the same time. A witness node or tie-breaker is used to decide which one stays active.

  • Database replication: Synchronous or semi-synchronous replication keeps the standby's data up to date. The RPO target determines which mode to choose.

  • Session and alarm sync: Operator screens, alarm flow, and acknowledgment information must remain consistent.
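
The interplay of heartbeat, quorum, and split-brain prevention can be sketched in a few lines. The following Python example is illustrative only and does not represent any SCADA product's actual logic: the class name StandbyMonitor, the 1-second heartbeat interval, and the limit of three missed heartbeats are assumptions. The standby promotes itself only when the active peer has been silent for too long and the witness is still reachable, so that it holds two of three votes.

```python
from dataclasses import dataclass


@dataclass
class StandbyMonitor:
    """Decides when a hot-standby node may promote itself to active.

    All default values are illustrative assumptions, not vendor settings.
    """
    heartbeat_interval_s: float = 1.0  # assumed heartbeat period
    missed_limit: int = 3              # assumed tolerated missed heartbeats
    last_heartbeat_s: float = 0.0      # time of the last heartbeat received

    def record_heartbeat(self, now_s: float) -> None:
        """Store the arrival time of a heartbeat from the active peer."""
        self.last_heartbeat_s = now_s

    def peer_is_silent(self, now_s: float) -> bool:
        """True once the peer has missed more than `missed_limit` intervals."""
        timeout_s = self.heartbeat_interval_s * self.missed_limit
        return now_s - self.last_heartbeat_s > timeout_s

    def should_promote(self, now_s: float, witness_reachable: bool) -> bool:
        """Promote only with a majority: this node plus the witness.

        If the witness is unreachable, this node may be the isolated one,
        so it stays passive to avoid a split-brain situation.
        """
        return self.peer_is_silent(now_s) and witness_reachable


if __name__ == "__main__":
    monitor = StandbyMonitor()
    monitor.record_heartbeat(now_s=10.0)
    print(monitor.should_promote(now_s=12.0, witness_reachable=True))   # peer still within timeout
    print(monitor.should_promote(now_s=13.5, witness_reachable=True))   # peer silent, witness agrees
    print(monitor.should_promote(now_s=13.5, witness_reachable=False))  # possibly isolated, stay passive
```

Passing the current time in as an argument, rather than reading the clock inside the class, keeps the decision logic easy to exercise in a planned failover drill.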

To mitigate risks during testing and maintenance:

  • Conduct planned failover drills, and back up manual observations with an automated report.

  • Apply software updates in a phased manner, starting with the standby, then the active server.

  • Regularly run backup and restore scenarios.

On the security side, role-based access, multi-factor authentication, and network segmentation are basic requirements. On the licensing side, the HA licensing model and the activation rules for the two nodes must be clarified beforehand.

For a quick refresher on SCADA and RTU concepts, this guide offers a useful summary: What is an RTU and how does it work with SCADA.


Field Security with Dual RTU: Reliable I/O and Control

RTU devices are the heart of field control. In a redundant RTU architecture, two devices can share the same I/O. One performs active control, and the other monitors and remains synchronized. If the active device fails, the second device takes over control without interruption.

How it works:

  • I/O sharing can be active-passive or active-active. Active-passive is preferred in most SCADA environments.

  • The primary RTU is selected as the leader during commissioning. The secondary RTU is synchronized in passive mode.

  • A fault is detected through a watchdog signal, loss of communication, or a power drop.

  • Time synchronization is provided by NTP or GPS, so that event and trend data carry accurate timestamps.

  • Protocol support is important. Protocols such as IEC 60870-5-104, DNP3, Modbus TCP, and MQTT are selected for both the connection to the center and inter-station communication.

  • Environmental resilience matters as well. Prefer devices rated for the temperature, EMC, and vibration conditions of the site.
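
The active-passive takeover described above can be sketched as a small arbitration routine. This Python example is a simplified model under stated assumptions: the names Rtu, RtuHealth, and arbitrate are hypothetical, and real RTUs implement this logic in firmware with hardware watchdogs rather than in application code. It shows only the decision rule: the passive unit takes over when the active unit reports a watchdog, communication, or power fault and the passive unit itself is healthy.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    ACTIVE = "active"
    PASSIVE = "passive"


@dataclass
class RtuHealth:
    """Fault indicators named in the text; True means no fault."""
    watchdog_ok: bool = True
    link_ok: bool = True
    power_ok: bool = True

    @property
    def healthy(self) -> bool:
        return self.watchdog_ok and self.link_ok and self.power_ok


@dataclass
class Rtu:
    name: str
    role: Role
    health: RtuHealth = field(default_factory=RtuHealth)


def arbitrate(primary: Rtu, secondary: Rtu) -> Rtu:
    """Return the RTU that should drive the shared I/O.

    Roles are swapped only when the active unit is faulty and the
    passive unit is healthy; otherwise the current active unit stays.
    """
    if primary.role is Role.ACTIVE:
        active, passive = primary, secondary
    else:
        active, passive = secondary, primary

    if not active.health.healthy and passive.health.healthy:
        active.role, passive.role = Role.PASSIVE, Role.ACTIVE
        return passive
    return active


if __name__ == "__main__":
    rtu_a = Rtu("RTU-A", Role.ACTIVE)
    rtu_b = Rtu("RTU-B", Role.PASSIVE)
    rtu_a.health.watchdog_ok = False  # simulate a watchdog fault on the leader
    print(arbitrate(rtu_a, rtu_b).name)
```

In this sketch the roles are not swapped back automatically when the failed unit recovers. Whether fail-back should be automatic or manual is a design decision the source does not specify.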

Good practices for power redundancy and field cabling:

  • Use dual power lines and an external UPS.

  • Segregate the I/O cables; route input and output groups through separate channels.

  • Adhere to line termination and shielding rules.

  • Define a safe shutdown scenario with watchdog relays.

For those who want to examine RTU examples that support redundancy, two different product families offer a good reference: DM100 RTU redundant SCADA solution and DM500 RTU with redundant CPU modules. This document provides a practical resource for detailed programming and protocol blocks: Mikrodev DCS programming guide.


Dual SIM and Multiple Connections: Seamless Data Communication

Dual SIM makes a big difference at sites that rely on cellular infrastructure. Two operators, one goal: the connection continues without interruption. The basic logic is to use the primary line as long as it is healthy and to switch automatically to the secondary when a problem is detected.

Practical settings:

  • Switchover rules: Trigger the switch based on signal level, packet loss, an RTT threshold, and the number of consecutive errors.

  • Data quota: Monitor the monthly limit, and define the rule for when the backup line takes over.

  • Health check: Test reachability of the actual endpoint with keepalives and periodic pings.
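
The switchover rules can be expressed as a small state machine. The sketch below is written in Python for illustration; the class names LinkSample and SimFailover and all threshold values are assumptions, not settings of any particular modem or router. A sample counts as bad when the signal is too weak, packet loss is too high, or RTT is too long, and the active SIM changes only after several consecutive bad samples.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LinkSample:
    """One health-check measurement of the active cellular link."""
    rssi_dbm: float
    packet_loss_pct: float
    rtt_ms: float


@dataclass
class SimFailover:
    """Switches between SIM 1 and SIM 2 after consecutive bad samples.

    All thresholds are illustrative assumptions and must be tuned per site.
    """
    min_rssi_dbm: float = -105.0
    max_loss_pct: float = 20.0
    max_rtt_ms: float = 1500.0
    max_consecutive_errors: int = 3
    active_sim: int = 1
    _errors: int = field(default=0, init=False, repr=False)

    def is_bad(self, sample: LinkSample) -> bool:
        """A sample is bad if any one of the three thresholds is violated."""
        return (
            sample.rssi_dbm < self.min_rssi_dbm
            or sample.packet_loss_pct > self.max_loss_pct
            or sample.rtt_ms > self.max_rtt_ms
        )

    def update(self, sample: LinkSample) -> int:
        """Feed one sample and return the SIM that should be active."""
        # A good sample resets the counter, so isolated glitches are ignored.
        self._errors = self._errors + 1 if self.is_bad(sample) else 0
        if self._errors >= self.max_consecutive_errors:
            self.active_sim = 2 if self.active_sim == 1 else 1
            self._errors = 0
        return self.active_sim


if __name__ == "__main__":
    failover = SimFailover()
    bad = LinkSample(rssi_dbm=-80.0, packet_loss_pct=50.0, rtt_ms=120.0)
    for _ in range(3):
        print(failover.update(bad))
```

Requiring consecutive errors keeps the link from flapping between operators on a single lost packet. This sketch does not return to the primary SIM on its own once it recovers, and it does not model the data quota rule; both would need additional logic.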

Alternative path options:

  • Ethernet or fiber can be used as the primary path if feasible in the field.

  • Industrial radio links provide low-latency backup connections over short distances.

  • MPLS or SD-WAN solutions offer intelligent routing with central policies.

Security topics:

  • Private APN provides isolation in the cellular network.

  • VPN tunnel protects data with encryption and authentication.

  • Certificate management and device identity prevent unauthorized access.

Hot standby and failover concepts are not only for the server; they are also applied at the network layer.


Disaster Recovery Plan and Continuous Improvement

The disaster recovery plan is not a single document; it is a living process. But it can be managed with simple steps.

  • Determine goals: Define RTO and RPO values based on business impact. RPO can be seconds for critical alarms, and minutes for reporting.

  • Backup strategy: Use a combination of full, incremental, and continuous backups. Keep backups offline and geographically separated.

  • Switchover to the secondary center: Write the procedure down step by step in the runbook. Include DNS, connection tunnels, SCADA license migration, operator access, and the rollback plan.

  • Drills: Supplement planned drills with surprise tests. Measure results, and record RTO and RPO deviations.

  • Observation and root cause analysis: Generate permanent corrective actions after an incident. Avoid repeating errors with configuration management and versioning.

  • Documentation and training: Prepare short, visual, role-based guides for the operator, maintenance, and network teams. Avoid knowledge loss when personnel change.

  • Change management: Every patch, device replacement, or architectural update must pass through impact analysis. Approval and a rollback plan are mandatory.
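
Recording RTO and RPO deviations after each drill can be as simple as comparing measured values with the targets. The following Python sketch assumes hypothetical names (DrillResult, deviations, meets_targets) and example targets of 120 seconds for RTO and 5 seconds for RPO; actual targets come from the business impact analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrillResult:
    """Measurements taken during one failover or recovery drill."""
    name: str
    recovery_time_s: float  # measured time until service was restored
    data_loss_s: float      # measured window of lost data


def deviations(result: DrillResult, rto_target_s: float, rpo_target_s: float) -> dict[str, float]:
    """Return measured minus target; a positive value means the target was missed."""
    return {
        "rto_deviation_s": result.recovery_time_s - rto_target_s,
        "rpo_deviation_s": result.data_loss_s - rpo_target_s,
    }


def meets_targets(result: DrillResult, rto_target_s: float, rpo_target_s: float) -> bool:
    """True when both recovery time and data loss are within their targets."""
    return all(value <= 0 for value in deviations(result, rto_target_s, rpo_target_s).values())


if __name__ == "__main__":
    RTO_TARGET_S = 120.0  # assumed example target
    RPO_TARGET_S = 5.0    # assumed example target
    drill = DrillResult("planned server failover", recovery_time_s=95.0, data_loss_s=2.0)
    print(deviations(drill, RTO_TARGET_S, RPO_TARGET_S))
    print(meets_targets(drill, RTO_TARGET_S, RPO_TARGET_S))
```

Keeping these records under version control alongside the runbook makes it easier to see whether deviations shrink from one drill to the next.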

This cycle strengthens the redundancy culture. High availability is sustained not by equipment alone, but by process.


Conclusion

When dual SCADA servers, dual RTUs, and dual SIMs are used together, a backbone is established that maintains control and data from the field to the center. Hot standby, failover, high availability, and disaster recovery disciplines should be considered under one roof. Take action now: clarify your goals, rank risks, test with a small pilot, and then gradually expand. Plan a controlled and measurable journey, not a problem-free one. If you have a scenario you would like to share, leave it as a comment, and let’s clarify it together.
