Data Comparison (Comparing Synchronization Items)

Scenarios

This section describes how to compare synchronization items to check if there are any differences between source and destination databases. To minimize the impact on services and shorten the service interruption duration, the following comparison methods are provided:

Object-level comparison: compares objects such as databases, indexes, tables, views, stored procedures, functions, and table sorting rules.
Data-level comparison is classified into row comparison and value comparison.
- Row comparison: It helps you compare the number of rows in the tables to be synchronized. This comparison method is recommended because it is fast.
- Value comparison: It helps you check whether data in the synchronized table is consistent. The comparison process is relatively slow.

When you check data consistency, compare the number of rows first. If the row counts are inconsistent, compare the data in the table to identify the inconsistent records.
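The two-stage approach above (row comparison first, value comparison only when the counts differ) can be sketched as follows. This is a minimal illustration that uses two in-memory SQLite databases as stand-ins for the source and destination; the table `t`, the helper names, and the sample data are all hypothetical, and the code does not reflect how DRS implements its comparison.

```python
import sqlite3

def make_db(rows):
    """Build an in-memory database with one table t(id PRIMARY KEY, name)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO t (id, name) VALUES (?, ?)", rows)
    return conn

def row_comparison(src, dst, table):
    """Fast check: compare only the row counts of the two tables."""
    # The table name is a trusted, hard-coded value in this sketch.
    count = f"SELECT COUNT(*) FROM {table}"
    return src.execute(count).fetchone()[0], dst.execute(count).fetchone()[0]

def value_comparison(src, dst, table, key):
    """Slower check: compare full rows, matched on the primary key."""
    query = f"SELECT {key}, * FROM {table}"
    src_rows = {row[0]: row for row in src.execute(query)}
    dst_rows = {row[0]: row for row in dst.execute(query)}
    # Report every key whose row is missing on one side or differs.
    return sorted(k for k in src_rows.keys() | dst_rows.keys()
                  if src_rows.get(k) != dst_rows.get(k))

source = make_db([(1, "a"), (2, "b"), (3, "c")])
destination = make_db([(1, "a"), (2, "B")])

mismatched_keys = []
src_count, dst_count = row_comparison(source, destination, "t")
if src_count != dst_count:
    # Row counts differ, so drill down to find the inconsistent rows.
    mismatched_keys = value_comparison(source, destination, "t", "id")
```

Matching rows on the primary key is what makes the second stage possible, which is in line with the constraint that value comparison requires a primary key.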

Constraints

You can manually create a comparison task only when the task is in the incremental phase.
During a comparison, the comparison items are case sensitive. If one of the source and destination databases is case insensitive and the other is case sensitive, the comparison results may be inconsistent.
When a full migration task is complete, DRS automatically creates object-level and row comparison tasks. If operations are performed on data in the source database during data comparison, the comparison results may be inconsistent.
If DDL operations were performed on the source database, you need to compare the objects again to ensure the accuracy of the comparison results.
If data in the destination database is modified separately, the comparison results may be inconsistent.
If character data in the source database is incorrectly encoded, the database driver converts the affected characters to abnormal code points during DRS migration or comparison. As a result, the values may be consistent but the bytes may be inconsistent.
Currently, only tables with primary keys support value comparison. For tables that do not support value comparison, use row comparison. Choose row or value comparison based on your scenario.
Do not suspend the DRS task during value comparison. Otherwise, the comparison task may fail.
To prevent resources from being occupied for a long time, DRS limits the comparison duration. If the comparison duration exceeds the threshold, the comparison task stops automatically.
- When a full migration task is complete, DRS automatically creates object-level and row comparison tasks. The comparison duration is limited to 30 minutes. Once this limit is exceeded, the comparison tasks stop automatically and the full migration task stops.
- For a row comparison task manually created in the incremental phase, the row comparison duration is limited to 60 minutes if the source database is a relational database, and to 30 minutes if the source database is a non-relational database, for example, MongoDB.
To avoid occupying resources, the comparison results of DRS tasks can be retained for a maximum of 60 days. After 60 days, the comparison results are automatically cleared.
For a migration task from MySQL, virtual columns in the source database do not support value comparison. During the comparison, virtual columns are filtered out.

In the many-to-one row comparison scenario, the number of rows in the table in the source database is compared with that in the aggregation table mapped to the destination database.
In the many-to-one synchronization scenario, value comparison is not recommended because data consistency cannot be ensured.
Value comparison is not supported for a task in which tables in one database are mapped to multiple databases.
For a synchronization task from MySQL, GaussDB(for MySQL), or MariaDB, virtual columns in the source database do not support value comparison. During the comparison, virtual columns are filtered out.
If the source is a PostgreSQL database, the index and constraint names will be changed during table mapping. As a result, the index and constraint names are inconsistent.
If a table in the source MySQL database contains a binary field with a fixed length, the MySQL driver adds \0 to the end of the data based on the length. As a result, there may be data inconsistency after the data is synchronized to the destination GaussDB database.
An empty string inserted into an Oracle database is processed as NULL. For tasks whose destination is an Oracle database, an empty string is therefore treated as NULL: if data in the source database is an empty string and that in the destination database is NULL, the comparison result is consistent.
During value comparison for synchronization from Oracle to GaussDB Distributed, if the LOB comparison policy is set to Compare length, the BLOB comparison is ignored because BLOB data in the distributed GaussDB instance fails to be queried using DBE_LOB.LOB_GET_LENGTH.
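Two of the constraints above concern how values change in transit: fixed-length binary data padded with \0, and empty strings stored as NULL in Oracle. The sketch below illustrates both effects in plain Python. The function names are hypothetical and the code only mimics the described behavior; it is not the logic used by DRS or by any database driver.

```python
def pad_fixed_binary(value, length):
    """Mimic a driver that pads a fixed-length binary value with trailing zero bytes."""
    return value.ljust(length, b"\x00")

def empty_as_null(value):
    """Treat an empty string like NULL (None), as Oracle does."""
    return None if value == "" else value

def consistent_for_oracle_destination(source_value, destination_value):
    """Compare two values under the empty-string-equals-NULL rule."""
    return empty_as_null(source_value) == empty_as_null(destination_value)

# An empty source string and a NULL destination value count as consistent.
empty_vs_null = consistent_for_oracle_destination("", None)

# Padding changes the bytes, so the padded copy no longer equals the original.
padded = pad_fixed_binary(b"ab", 4)
padding_changes_value = padded != b"ab"
```

If you verify results yourself, it helps to know which of these transformations applies to a column before you treat a byte-level difference as a real inconsistency.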

Prerequisites

You have logged in to the DRS console.
A synchronization task has been started.

Creating a Comparison Task

On the Data Synchronization Management page, click the target synchronization task name in the Task Name/ID column.
Choose Synchronization Comparison.
Compare synchronization items.
- Create an Object-Level Comparison task. On the Object-Level Comparison tab, check whether the comparison results of the source and destination databases are consistent. Locate a comparison item you want to view and click View Details in the Operation column.
- On the Data-Level Comparison (row comparison and value comparison) tab, click Create Comparison Task. In the displayed dialog box, specify Comparison Type, Comparison Time, and Object. Then, click OK.
  - Comparison Type: The value can be Row comparison or Value comparison.
    - Row comparison: checks whether the source table has the same number of rows as the destination table.
      Note
      After a task enters the incremental synchronization phase, you can create a row comparison task.
    - Value comparison: checks whether the source table has the same data as the destination table.
      Note
      After a task enters the incremental synchronization phase, you can create a value comparison task. After the full synchronization is complete, do not change data in the source database. Otherwise, the comparison results will be inconsistent.
      Value comparison applies only to tables with a single-column primary key or unique index. Use row comparison for tables that do not support value comparison.
  - Comparison Policy: DRS supports one-to-one and many-to-one comparison policies.
    - One-to-one: compares the number of rows in a table in the source database with that in the table mapped to the destination database.
    - Many-to-one: compares the number of rows in a table in the source database with that in the aggregate table mapped to the destination database.
      Note
      If you select Row Comparison for Comparison Type, the Comparison Policy option becomes available.
  - Comparison Time: You can select Start upon task creation or Start at a specified time. Because there is a slight time difference between the source and destination databases during synchronization, data inconsistency may occur. You are advised to compare synchronization items during off-peak hours for more accurate results.
  - LOB Comparison Policy: The value can be Ignore LOB comparison, Compare length, Compare hash values, or Compare content.
    Note
    LOB comparison policy can be set only for data synchronization from Oracle to GaussDB.
    - Ignore LOB comparison: The system ignores LOB data during value comparison. You are advised to select Ignore LOB comparison because comparing LOB data increases the database load, depending on the LOB comparison method and data volume. Evaluate and test the LOB comparison policy based on the source and destination databases to ensure database performance and stability.
    - Compare length: The built-in functions of the source and destination databases are used to obtain the LOB data length for data comparison.
    - Compare hash values: The built-in functions of the source and destination databases are used to obtain the LOB data hash values for data comparison. Oracle databases use the HASH function in the DBMS_CRYPTO package to obtain the LOB data hash values. To use the DBMS_CRYPTO package, grant the user the EXECUTE permission on the package as a user with SYSDBA permissions. Reference statement:
```
GRANT EXECUTE ON DBMS_CRYPTO TO USER;
```
    - Compare content: The source database reads data in streaming mode and then performs hashing. The destination database uses built-in functions to obtain the LOB data hash values. Compared with hash value comparison, this method reduces the pressure on the source database, but it takes longer.
  - Filter Data: After this function is enabled, objects can be compared based on the configured filtering criteria.
    Note
    Data filtering and comparison can be set only for synchronization tasks from Oracle to GaussDB, GaussDB to Oracle, GaussDB to GaussDB, MySQL to MySQL and MySQL to GaussDB.
    1. After enabling Filter Data, add filtering criteria for the table objects to be compared.
    2. In the Filtering Criteria area, enter the filtering criteria, and click Verify.
      Note
      Each table has only one verification rule.
      Up to 512 tables can be filtered at a time. If there are more than 512 tables, perform rule verifications in batches.
      Standard SQL statements can be used to filter records. Each expression cannot contain packages, functions, variables, or constants specific to a database engine.
      Enter the part following WHERE in the SQL statement (excluding WHERE and semicolons), for example, sid > 3 and sname like 'G %'. A maximum of 512 characters is allowed.
      In SQL statements for setting filter criteria, column names that are keywords must be enclosed in identifier quotes, and values of datetime types (including date and time) and character string types must be enclosed in single quotation marks, for example, `update` > '2022-07-13 00:00:00' and age > 10, or `update` = 'abc'.
      If a TIMESTAMP column is used in a filter condition, specify the time string as a value in the UTC time zone. For example, MySQL stores TIMESTAMP data in UTC, so you need to use the UTC time value for comparison.
      Implicit conversion rules are not supported. Enter filtering criteria of a valid data type. For example, if column c of an Oracle database uses characters of the varchar2 type, the filtering criteria must be set to c > '10' instead of c > 10.
      Filter criteria cannot be configured for large objects, such as CLOB, BLOB, and BYTEA.
      Non-idempotent expressions or functions cannot be used as data processing conditions, such as SYSTIMESTAMP and SYSDATE, because the returned result may be different each time the function is called.
      During data filtering for real-time synchronization with Oracle serving as the source database, the fixed-length character types NCHAR and CHAR must be matched using complete fixed-length characters.
      You are not advised to set filter criteria for fields of approximate numeric types, such as FLOAT, DECIMAL, and DOUBLE.
      Do not use fields containing special characters as a filter condition.
      Objects whose database names, schema names, or table names are case insensitive cannot be filtered and compared.
      Currently, condition-based filtering is not supported when there are more than 50,000 tables in a database.
      For security purposes, keywords or functions with update meanings, such as the for update statement and updatexml function, cannot be used in SQL fragments.
    3. After the verification is successful, click Generate Processing Rule. The rule is displayed.
    4. Click OK.
  - Object: You can select objects to be compared based on the scenarios.
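Several of the filter criteria rules listed under Filter Data can be checked before you paste a fragment into the console. The following is a minimal sketch of such a pre-check, covering only the length limit, the WHERE and semicolon rule, and the forbidden keywords named in the notes. The function name and messages are hypothetical, the check is deliberately simplistic (for instance, it also flags a semicolon inside a string literal), and it is not the verification that DRS performs when you click Verify.

```python
import re

MAX_LENGTH = 512  # character limit stated in the notes above

# Patterns for keywords and functions that the notes above rule out.
FORBIDDEN = {
    "FOR UPDATE": r"\bfor\s+update\b",
    "UPDATEXML": r"\bupdatexml\b",
    "SYSDATE": r"\bsysdate\b",
    "SYSTIMESTAMP": r"\bsystimestamp\b",
}

def check_filter_fragment(fragment):
    """Return the problems found in a WHERE-clause fragment (empty list if none)."""
    problems = []
    text = fragment.strip()
    if len(text) > MAX_LENGTH:
        problems.append("longer than 512 characters")
    if re.match(r"where\b", text, re.IGNORECASE):
        problems.append("must not include the WHERE keyword")
    if ";" in text:
        problems.append("must not contain a semicolon")
    for name, pattern in FORBIDDEN.items():
        if re.search(pattern, text, re.IGNORECASE):
            problems.append(f"must not use {name}")
    return problems

# A fragment that follows the rules, and one that breaks three of them.
ok = check_filter_fragment("`update` > '2022-07-13 00:00:00' and age > 10")
bad = check_filter_fragment("WHERE created > SYSDATE;")
```

A check like this only catches formatting mistakes early. Rules that depend on column types, such as the ban on implicit conversion or on LOB columns, still need the table definition and are left to the console verification.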
After the comparison task is submitted, the Data-Level Comparison tab is displayed. Click the refresh icon to refresh the list and view the comparison result of the specified comparison type.

If you want to view the row or value comparison details, click View Results.

If you want to download the row comparison or value comparison result, locate a specified comparison type and click Export Report in the Operation column.
Note
You can cancel a running task at any time and view the comparison report of a canceled comparison task.
You can sort the row comparison results displayed on the current page in ascending or descending order based on the number of rows in the source database table or the destination database table.
If a negative number is displayed in the differences column, the number of rows in the destination database table is greater than that in the source database table. If a positive number is displayed in the differences column, the number of rows in the source database table is greater than that in the destination database table.
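The sign convention of the differences column is consistent with subtracting the destination row count from the source row count. The sketch below assumes exactly that formula, which the text above implies but does not state, and uses hypothetical table names and row counts. It also shows the kind of sorting the results page offers.

```python
def row_difference(source_rows, destination_rows):
    """Assumed formula: positive when the source table has more rows."""
    return source_rows - destination_rows

# Hypothetical row counts: (table name, source rows, destination rows).
results = [("orders", 1200, 1200), ("users", 500, 498), ("logs", 90, 95)]

report = [(name, src, dst, row_difference(src, dst)) for name, src, dst in results]

# Sort by the source row count in descending order.
by_source_desc = sorted(report, key=lambda r: r[1], reverse=True)

# Keep only the tables whose row counts differ.
inconsistent = [r for r in report if r[3] != 0]
```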

Periodic Comparison

Periodic comparison means that DRS periodically compares the number of rows in the source database table with that in the destination database table and displays the comparison results.

On the Data Synchronization Management page, click the target synchronization task name in the Task Name/ID column.
Click the Synchronization Comparison tab.
Click the Periodic Comparison tab and click Modify Comparison Policy to modify the comparison policy.
In the Modify Comparison Policy dialog box, enable periodic comparison, configure the comparison frequency and time, and click Yes.
Note
After periodic comparison is enabled, DRS compares the number of rows at the scheduled time. You can view the comparison results on the Data-Level Comparison tab.
After periodic comparison is disabled, only historical comparison results can be viewed.
Modifications to the comparison policy settings take effect from the next comparison and do not affect the on-going periodic comparison tasks.
During periodic comparison, the source and destination databases will be read. Perform the comparison during off-peak hours.
During periodic comparison, ultra-large tables (those with more than 100 million rows) are automatically filtered out. You can use data-level comparison to spot check such large tables. It is not recommended that these large tables be compared periodically.
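To plan spot checks, you can work out in advance which tables fall over the 100 million row threshold mentioned above. The sketch below assumes you already have approximate row counts per table; the function name and the sample figures are hypothetical, and the code only illustrates the rule, not how DRS applies it.

```python
LARGE_TABLE_ROWS = 100_000_000  # threshold stated in the note above

def split_for_periodic_comparison(row_counts):
    """Split tables into those compared periodically and those skipped as ultra-large."""
    compared = {t: n for t, n in row_counts.items() if n <= LARGE_TABLE_ROWS}
    skipped = {t: n for t, n in row_counts.items() if n > LARGE_TABLE_ROWS}
    return compared, skipped

# Hypothetical approximate row counts per table.
counts = {"orders": 2_500_000, "events": 450_000_000, "users": 100_000_000}
compared, skipped = split_for_periodic_comparison(counts)
```

Tables in `skipped` are the candidates for a manually created data-level comparison.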
