Fix: MySQL Replication Not Working — Replica Lag, Stopped Replication, or GTID Errors
Quick Answer
How to fix MySQL replication issues — SHOW REPLICA STATUS errors, relay log corruption, GTID configuration, replication lag, skipping errors, and replica promotion.
The Problem
MySQL replication has stopped with an error:
SHOW REPLICA STATUS\G
-- Replica_IO_Running: Yes
-- Replica_SQL_Running: No
-- Last_SQL_Error: Error 'Duplicate entry '42' for key 'PRIMARY'' on query.
-- Default database: 'myapp'. Query: 'INSERT INTO users ...'
Or the replica is lagging behind the primary:
SHOW REPLICA STATUS\G
-- Seconds_Behind_Source: 3842 -- Over an hour behind
Or GTID replication fails after restoring a backup:
Got fatal error 1236 from source when reading data from binary log:
'Cannot replicate because the source purged required binary logs.
Replicate the missing transactions from elsewhere, or provision a new replica
from backup. Consider increasing the source's binary log expiration period.'
Why This Happens
MySQL replication works by replaying binary log events from the primary on the replica. It breaks when:
- Replication error on the replica — the most common cause: a statement that succeeded on the primary fails on the replica (duplicate key, row not found, constraint violation). The SQL thread stops until the error is resolved.
- Replica lag — the replica can’t apply binary log events as fast as the primary generates them. Typically caused by slow queries, resource constraints, or a single-threaded replica applying multi-threaded primary writes.
- GTID gaps — when using GTID replication, the replica tracks which transactions it has applied. If GTIDs are missing (from a restore that didn’t preserve GTID state), replication can’t figure out where to resume.
- Binary log expired — if the primary purges binary logs before the replica reads them (due to replica downtime or binlog_expire_logs_seconds set too low), the replica loses its position.
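The GTID-gap failure mode is easier to see with a concrete calculation. Below is a small Python sketch (a hypothetical helper, not a MySQL tool; it assumes a single source UUID, while a real Executed_Gtid_Set can list several, comma-separated) that computes which transaction intervals a replica is missing:

```python
# Hypothetical helper: compute the GTID intervals a replica is missing.
# Assumes one source UUID, e.g. "uuid:1-800:850-1000" (real sets may
# contain several comma-separated UUIDs, which this sketch ignores).

def parse_intervals(gtid_set: str) -> list[tuple[int, int]]:
    """Parse 'uuid:1-800:850-1000' into [(1, 800), (850, 1000)]."""
    _, _, ranges = gtid_set.partition(":")
    out = []
    for part in ranges.split(":"):
        lo, _, hi = part.partition("-")
        out.append((int(lo), int(hi or lo)))
    return out

def missing_intervals(primary: str, replica: str) -> list[tuple[int, int]]:
    """Transactions the primary executed that the replica has not."""
    have = set()
    for lo, hi in parse_intervals(replica):
        have.update(range(lo, hi + 1))
    gaps, start = [], None
    for lo, hi in parse_intervals(primary):
        for txn in range(lo, hi + 1):
            if txn not in have and start is None:
                start = txn          # gap opens
            elif txn in have and start is not None:
                gaps.append((start, txn - 1))  # gap closes
                start = None
        if start is not None:
            gaps.append((start, hi))
            start = None
    return gaps

print(missing_intervals("3e11fa47:1-1000", "3e11fa47:1-800:850-1000"))
# [(801, 849)]
```

These are exactly the GTIDs you would fill with empty transactions (see Fix 3) or re-provision around.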
Fix 1: Diagnose and Resolve SQL Thread Errors
Start with SHOW REPLICA STATUS\G to identify the error:
-- Run on the replica
SHOW REPLICA STATUS\G
-- Key fields to check:
-- Replica_IO_Running: Yes/No — is the IO thread receiving binlogs?
-- Replica_SQL_Running: Yes/No — is the SQL thread applying them?
-- Last_IO_Error: error message from IO thread
-- Last_SQL_Error: error message from SQL thread
-- Seconds_Behind_Source: replication lag in seconds
-- Exec_Source_Log_File / Exec_Source_Log_Pos: current position being applied
Fix duplicate key errors (most common):
-- Option 1: Skip the offending transaction (use with caution)
-- For GTID replication:
SET GTID_NEXT = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx:N';
BEGIN; COMMIT; -- Empty transaction to "skip" the GTID
SET GTID_NEXT = 'AUTOMATIC';
START REPLICA;
-- For non-GTID replication (legacy):
SET GLOBAL SQL_REPLICA_SKIP_COUNTER = 1; -- Skip 1 event
START REPLICA;
-- Option 2: Fix the data on the replica, then restart
-- If the row is missing on replica but exists on primary:
INSERT INTO users (id, email) VALUES (42, '[email protected]');
START REPLICA;
-- Option 3: Use pt-table-sync (Percona Toolkit) to repair data discrepancies
pt-table-sync --replicate myapp.checksums h=replica,u=root,p=pass \
--execute --print
Warning: Skipping transactions can cause data drift between primary and replica. After fixing, run pt-table-checksum to verify data consistency.
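To see what that verification catches, here is a toy Python illustration of checksum-based drift detection, the idea behind pt-table-checksum. It is not the tool's actual algorithm or chunking; it just compares per-chunk checksums of in-memory rows:

```python
# Toy illustration of checksum-based drift detection (the idea behind
# pt-table-checksum) -- not the tool's actual algorithm or chunking.
import hashlib

def chunk_checksums(rows, chunk_size=2):
    """Checksum rows in fixed-size chunks, keyed by chunk index."""
    sums = {}
    for i in range(0, len(rows), chunk_size):
        chunk = repr(rows[i:i + chunk_size]).encode()
        sums[i // chunk_size] = hashlib.md5(chunk).hexdigest()
    return sums

primary_rows = [(1, "a"), (2, "b"), (3, "c"), (42, "x")]
replica_rows = [(1, "a"), (2, "b"), (3, "c"), (42, "y")]  # drifted row

p, r = chunk_checksums(primary_rows), chunk_checksums(replica_rows)
diff = [c for c in p if p[c] != r.get(c)]
print(f"chunks differing: {diff}")
# chunks differing: [1]
```

pt-table-checksum does the real version of this through replication itself: the checksum queries execute on the primary, replicate to each replica, and the per-chunk results are then compared.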
Fix 2: Reduce Replication Lag
Enable parallel replication (available since MySQL 5.7; the replica_* variable names below were introduced in MySQL 8.0.26, and older versions use slave_parallel_workers and slave_parallel_type):
-- On the replica — enable multi-threaded replication
-- Check current setting:
SHOW GLOBAL VARIABLES LIKE 'replica_parallel_workers';
-- Enable parallel workers (use 4-8, matching primary CPU cores).
-- The replica SQL thread must be stopped while these are changed:
STOP REPLICA;
SET GLOBAL replica_parallel_workers = 8;
SET GLOBAL replica_parallel_type = 'LOGICAL_CLOCK';
START REPLICA;
-- LOGICAL_CLOCK: applies transactions that overlapped on primary in parallel
-- DATABASE: applies transactions for different databases in parallel (simpler, less effective)
# my.cnf — persist the settings
[mysqld]
replica_parallel_workers = 8
replica_parallel_type = LOGICAL_CLOCK
replica_preserve_commit_order = ON # Ensures replica commits in same order as primary
Enable binary log group commit on the primary (reduces sync overhead):
# my.cnf on primary
[mysqld]
binlog_group_commit_sync_delay = 1000 # Wait 1ms to group more transactions
binlog_group_commit_sync_no_delay_count = 100 # Or until 100 transactions ready
Check what’s causing lag:
-- See what the SQL thread is applying
SHOW PROCESSLIST;
-- Look for "System lock" or slow queries in the replica SQL thread
-- Check replica SQL thread wait states
SELECT * FROM performance_schema.events_waits_current
WHERE THREAD_ID = (
SELECT THREAD_ID FROM performance_schema.threads
WHERE NAME = 'thread/sql/replica_sql'
);
Fix 3: Fix GTID Replication Issues
Check GTID state:
-- On primary
SHOW GLOBAL VARIABLES LIKE 'gtid_mode';
SHOW MASTER STATUS; -- renamed SHOW BINARY LOG STATUS in MySQL 8.2+
-- Shows Executed_Gtid_Set: source UUID:1-12345
-- On replica
SHOW REPLICA STATUS\G
-- Executed_Gtid_Set: what replica has applied
-- Retrieved_Gtid_Set: what replica has received but not yet applied
-- Auto_Position: 1 (using GTID auto-positioning)
Fix GTID gaps after a restore:
-- If replica is missing transactions that primary has already purged:
-- Option 1: Provision a new replica from a fresh backup
-- This is the safest approach
-- Option 2: Inject empty transactions to fill the gaps
-- Find the missing GTIDs:
-- Primary has: source_uuid:1-1000
-- Replica has: source_uuid:1-800:850-1000
-- Missing: source_uuid:801-849
-- Inject empty transactions on the replica:
STOP REPLICA;
SET GTID_NEXT = 'source_uuid:801';
BEGIN; COMMIT;
SET GTID_NEXT = 'source_uuid:802';
BEGIN; COMMIT;
-- ... repeat for each missing GTID
-- Or use a loop in a shell script for large ranges:
-- for i in $(seq 801 849); do
-- mysql -e "SET GTID_NEXT='source_uuid:$i'; BEGIN; COMMIT;"
-- done
SET GTID_NEXT = 'AUTOMATIC';
START REPLICA;
Set up GTID replication correctly:
# my.cnf — both primary and replica
[mysqld]
gtid_mode = ON
enforce_gtid_consistency = ON
log_bin = mysql-bin
log_replica_updates = ON # Replica writes applied changes to its own binlog (needed for chaining and promotion)
-- On the replica — connect using GTID auto-positioning
CHANGE REPLICATION SOURCE TO
SOURCE_HOST = 'primary-host',
SOURCE_USER = 'replication_user',
SOURCE_PASSWORD = 'replication_password',
SOURCE_AUTO_POSITION = 1; -- GTID auto-positioning
START REPLICA;
Fix 4: Rebuild a Replica from Scratch
When replication is too far out of sync, rebuilding is faster than catching up:
# Method 1: Using mysqldump with GTID state
# (--master-data was renamed --source-data in MySQL 8.0.26)
# On primary:
mysqldump \
--single-transaction \
--master-data=2 \
--set-gtid-purged=ON \
--all-databases \
-u root -p > full_backup.sql
# Transfer to replica and restore:
mysql -u root -p < full_backup.sql
# On replica — check GTID_PURGED was set:
mysql -e "SHOW GLOBAL VARIABLES LIKE 'gtid_purged';"
# Connect to primary and start replication:
mysql -e "
CHANGE REPLICATION SOURCE TO
SOURCE_HOST='primary-host',
SOURCE_USER='repl',
SOURCE_PASSWORD='password',
SOURCE_AUTO_POSITION=1;
START REPLICA;
"# Method 2: Using Percona XtraBackup (faster for large databases)
# On primary:
xtrabackup --backup --target-dir=/backup/full \
--user=root --password=pass
xtrabackup --prepare --target-dir=/backup/full
# Transfer the prepared backup to the replica:
rsync -avz /backup/full/ replica:/backup/full/
# On replica — stop MySQL and empty the datadir first, then:
xtrabackup --move-back --target-dir=/backup/full
chown -R mysql:mysql /var/lib/mysql
systemctl start mysql
# Get the GTID position from backup:
cat /backup/full/xtrabackup_binlog_info
# mysql-bin.000042 1234567 source_uuid:1-54321
mysql -e "
RESET REPLICA ALL;
RESET MASTER;  -- empties gtid_executed so gtid_purged can be set
SET GLOBAL gtid_purged='source_uuid:1-54321';
CHANGE REPLICATION SOURCE TO SOURCE_HOST='primary', SOURCE_AUTO_POSITION=1;
START REPLICA;
"
Fix 5: Monitor Replication Health
Set up proactive monitoring to catch issues before they become critical:
-- Check replication health (run regularly)
SELECT
SERVICE_STATE AS io_state,
LAST_ERROR_NUMBER,
LAST_ERROR_MESSAGE,
LAST_ERROR_TIMESTAMP
FROM performance_schema.replication_connection_status;
SELECT
SERVICE_STATE AS sql_state,
LAST_ERROR_NUMBER,
LAST_ERROR_MESSAGE,
LAST_ERROR_TIMESTAMP,
LAST_APPLIED_TRANSACTION
FROM performance_schema.replication_applier_status_by_worker;
-- Lag monitoring
-- (the LAST_QUEUED_* timestamps live in replication_connection_status;
-- this table only exposes the applier-side columns)
SELECT
CHANNEL_NAME,
SERVICE_STATE,
LAST_APPLIED_TRANSACTION,
TIMESTAMPDIFF(
SECOND,
LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP,
NOW()
) AS lag_seconds
FROM performance_schema.replication_applier_status_by_worker;
Shell script for monitoring:
#!/bin/bash
# replication_check.sh — alert if replication is broken or lagging
MYSQL="mysql -u monitor -ppassword -h replica-host"
MAX_LAG=30 # Alert if lag > 30 seconds
STATUS=$($MYSQL -e "SHOW REPLICA STATUS\G" 2>/dev/null)
IO_RUNNING=$(echo "$STATUS" | grep "Replica_IO_Running:" | awk '{print $2}')
SQL_RUNNING=$(echo "$STATUS" | grep "Replica_SQL_Running:" | awk '{print $2}')
LAG=$(echo "$STATUS" | grep "Seconds_Behind_Source:" | awk '{print $2}')
ERROR=$(echo "$STATUS" | grep "Last_SQL_Error:" | cut -d: -f2-)
if [ "$IO_RUNNING" != "Yes" ] || [ "$SQL_RUNNING" != "Yes" ]; then
  echo "CRITICAL: Replication stopped! Error:$ERROR"
  # Send alert here
elif [ "$LAG" != "NULL" ] && [ "$LAG" -gt "$MAX_LAG" ]; then
  echo "WARNING: Replication lag ${LAG}s exceeds threshold of ${MAX_LAG}s"
fi
Fix 6: Replication User and Permissions
A misconfigured replication user is a common setup mistake:
-- On primary — create replication user
-- mysql_native_password is deprecated in MySQL 8.0 and removed in 8.4;
-- the default caching_sha2_password works over TLS, or with
-- GET_SOURCE_PUBLIC_KEY = 1 on the replica
CREATE USER 'repl'@'replica-ip' IDENTIFIED BY 'strong-password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'replica-ip';
-- (FLUSH PRIVILEGES is not needed after CREATE USER / GRANT)
-- Verify the grant
SHOW GRANTS FOR 'repl'@'replica-ip';
-- On replica — connect
CHANGE REPLICATION SOURCE TO
SOURCE_HOST = '10.0.0.1',
SOURCE_PORT = 3306,
SOURCE_USER = 'repl',
SOURCE_PASSWORD = 'strong-password',
SOURCE_AUTO_POSITION = 1;
-- For SSL/TLS replication (recommended for production)
CHANGE REPLICATION SOURCE TO
SOURCE_HOST = 'primary-host',
SOURCE_USER = 'repl',
SOURCE_PASSWORD = 'password',
SOURCE_AUTO_POSITION = 1,
SOURCE_SSL = 1,
SOURCE_SSL_CA = '/etc/mysql/ssl/ca.pem',
SOURCE_SSL_CERT = '/etc/mysql/ssl/client-cert.pem',
SOURCE_SSL_KEY = '/etc/mysql/ssl/client-key.pem';
Still Not Working?
Binary log format mismatch — if the primary uses binlog_format=STATEMENT and the replica has issues executing non-deterministic queries (NOW(), RAND()), switch the primary to ROW format. ROW format replicates the actual row changes, not the SQL statements, and is more reliable:
-- Check current format
SHOW GLOBAL VARIABLES LIKE 'binlog_format';
-- Change to ROW (applies to sessions started after the change;
-- persist it in my.cnf and restart replication to be safe)
SET GLOBAL binlog_format = 'ROW';
read_only not set on replica — if the replica doesn’t have read_only = ON, application writes directly to the replica cause data drift and replication conflicts. Always set read_only = ON and super_read_only = ON on replicas:
[mysqld]
read_only = ON
super_read_only = ON # Prevents even SUPER-privileged users from writing
Semi-synchronous replication timeouts — if rpl_semi_sync_source_timeout is reached and the primary falls back to asynchronous replication, the replica may have missed committed transactions. Check the Rpl_semi_sync_source_no_tx counter.
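For monitoring beyond the shell script in Fix 5, the same health check can be written in Python. This sketch parses the \G-formatted output of SHOW REPLICA STATUS; the field names follow MySQL 8.0, and the sample text and threshold are illustrative:

```python
# Sketch: parse `SHOW REPLICA STATUS\G` output and evaluate health.
# Field names follow MySQL 8.0; sample text and threshold are illustrative.

def parse_vertical(output: str) -> dict[str, str]:
    """Parse \\G-formatted 'Field: value' lines into a dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and not key.strip().startswith("*"):  # skip row separators
            fields[key.strip()] = value.strip()
    return fields

def check_replication(status: dict[str, str], max_lag: int = 30) -> str:
    if status.get("Replica_IO_Running") != "Yes" or \
       status.get("Replica_SQL_Running") != "Yes":
        return f"CRITICAL: {status.get('Last_SQL_Error', 'unknown error')}"
    lag = status.get("Seconds_Behind_Source", "NULL")
    if lag != "NULL" and int(lag) > max_lag:
        return f"WARNING: lag {lag}s exceeds {max_lag}s"
    return "OK"

sample = """\
*************************** 1. row ***************************
             Replica_IO_Running: Yes
            Replica_SQL_Running: No
                 Last_SQL_Error: Error 'Duplicate entry ...' on query.
          Seconds_Behind_Source: NULL"""
print(check_replication(parse_vertical(sample)))
# CRITICAL: Error 'Duplicate entry ...' on query.
```

Feed it the output of `mysql -e "SHOW REPLICA STATUS\G"` and wire the return string into whatever alerting you already use.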
For related MySQL issues, see Fix: MySQL Deadlock Detected and Fix: MySQL Lock Wait Timeout Exceeded.