Connection to a Primary/Standby DB Instance Suspended Using pt-osc
Scenario
The pt-online-schema-change (pt-osc) tool can be used to perform online DDL operations on a local single instance, but cannot be used on an RDS for MySQL primary/standby instance and the connection is suspended, as shown in the following figure.
Possible Causes
How pt-osc works:
- Create an empty table with the same structure as the original table but with the suffix _new added to the table name.
- Modify the structure of the empty table created in 1.
- Add three triggers to the original table: delete, update, and insert. The triggers are used to execute the statements to be executed in the original table in the new table during data copy.
- Copy the data in the original table to the new table in the form of data chunks.
- Rename the original table, change the name of the new table to that of the original table, and delete the original table.
- Delete the triggers.
A large amount of data needs to be copied. There will be a replication delay between the RDS for MySQL primary and standby instances. Workloads running on the standby instance may be affected. Considering the replication delay, the pt-osc provides the following options:
- --max-lag
- --check-interval
- --recursion-method
- --check-slave-lag
If the replication delay of the standby instance exceeds the value of max-lag, the tool stops copying data for check-interval seconds. If you specify check-slave-lag, the tool only monitors that particular server for replication delay. It does not monitor other servers. recursion-method is used to control exactly which servers the tool monitors. Its values include processlist (default value, monitoring the primary/standby replication delay), hosts, dsn, and none (ignoring the primary/standby replication delay). For more information, see pt-online-schema-change.
In this case:
- When pt-osc is used to connect to the RDS for MySQL primary/standby instance, the connection is suspended because there is a primary/standby replication delay and the tool stops copying data. You can add --recursion-method=none to solve the problem.
- If the primary/standby replication delay is ignored, data copy becomes fast. To minimize the impact on workloads, you can set the --max-load configuration item.
Solution
Add the --recursion-method=none configuration item to the pt-osc command to ignore the replication delay.
Common uses of pt-osc:
- Adding a field
pt-online-schema-change --user=root --password=xxx --host=xxx --alter "ADD COLUMN content text" D=aaa,t=tmp_test --no-check-replication-filters --alter-foreign-keys-method=auto --recursion-method=none --print --execute
- Deleting a field
pt-online-schema-change --user=root --password=xxx --host=xxx --alter "DROP COLUMN content " D=aaa,t=tmp_test --no-check-replication-filters --alter-foreign-keys-method=auto --recursion-method=none --quiet --execute
- Modifying a field
pt-online-schema-change --user=root --password=xxx --host=xxx --alter "MODIFY COLUMN age TINYINT NOT NULL DEFAULT 0" D=aaa,t=tmp_test --no-check-replication-filters --alter-foreign-keys-method=auto --recursion-method=none --quiet --execute
- Renaming a field
pt-online-schema-change --user=root --password=xxx --host=xxx --alter "CHANGE COLUMN age address varchar(30)" D=aaa,t=tmp_test --no-check-alter --no-check-replication-filters --alter-foreign-keys-method=auto --recursion-method=none --quiet --execute
- Adding an index
pt-online-schema-change --user=root --password=xxx --host=xxx --alter "ADD INDEX idx_address(address)" D=aaa,t=tmp_test --no-check-alter --no-check-replication-filters --alter-foreign-keys-method=auto --recursion-method=none --print --execute
- Deleting an index
pt-online-schema-change --user=root --password=xxx --host=xxx --alter "DROP INDEX idx_address" D=aaa,t=tmp_test --no-check-alter --no-check-replication-filters --alter-foreign-keys-method=auto --recursion-method=none --print --execute
If the primary/standby replication delay is important for your workloads, adjust the following parameters as required: max-lag, check-interval, recursion-method, and check-slave-lag. For more information, see pt-online-schema-change.
- Scenario
- Possible Causes
- Solution