TAP-MYSQL: Support full GTID set tracking for multi-source MySQL repl…#1271
DrashtiChhatralia wants to merge 1 commit into transferwise:master from
Pull request overview
Changes: Bugfix (1), Test improvement (1), Documentation update (1)
This PR updates tap-mysql’s MySQL GTID handling to persist and resume from the full GTID set (multi-UUID), which prevents reprocessing/duplicate events in multi-source or post-migration topologies.
Changes:
- Return/store the full MySQL `@@GLOBAL.gtid_executed` set (whitespace-normalized) rather than filtering to a single UUID.
- Accumulate MySQL GTID state as a growing GTID set during binlog consumption, and normalize legacy/contaminated state before resuming.
- Extend unit + integration coverage and update README documentation for GTID state behavior.
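To make the "growing GTID set" idea concrete, here is a minimal Python sketch of whitespace normalization and set accumulation. It is not the code in `binlog.py`: the helper names (`parse_gtid_set`, `format_gtid_set`, `add_gtid`) are hypothetical, and the sketch assumes untagged GTIDs (`uuid:N` members and `uuid:a-b` ranges).

```python
import re
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # inclusive (start, end) transaction-number range


def _merge(intervals: List[Interval]) -> List[Interval]:
    """Sort and coalesce overlapping or adjacent intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def parse_gtid_set(gtid_set: str) -> Dict[str, List[Interval]]:
    """Parse 'uuid1:1-5291,uuid2:1-81' into {uuid: [(start, end), ...]}.

    All whitespace (e.g. the newlines MySQL puts in @@GLOBAL.gtid_executed)
    is stripped first. Hypothetical helper; untagged GTIDs only."""
    compact = re.sub(r"\s+", "", gtid_set)
    result: Dict[str, List[Interval]] = {}
    for member in filter(None, compact.split(",")):
        uuid, *ranges = member.split(":")
        for rng in ranges:
            start, _, end = rng.partition("-")
            result.setdefault(uuid.lower(), []).append((int(start), int(end or start)))
    return {uuid: _merge(ivs) for uuid, ivs in result.items()}


def format_gtid_set(parsed: Dict[str, List[Interval]]) -> str:
    """Render back to 'uuid:a-b:c,uuid2:x' form with UUIDs sorted."""
    members = []
    for uuid in sorted(parsed):
        parts = [str(s) if s == e else f"{s}-{e}" for s, e in parsed[uuid]]
        members.append(":".join([uuid, *parts]))
    return ",".join(members)


def add_gtid(gtid_set: str, gtid: str) -> str:
    """Fold one transaction GTID ('uuid:N') into an accumulated GTID set."""
    parsed = parse_gtid_set(gtid_set)
    uuid, number = gtid.strip().split(":")
    n = int(number)
    parsed[uuid.lower()] = _merge(parsed.get(uuid.lower(), []) + [(n, n)])
    return format_gtid_set(parsed)


print(add_gtid("uuid1:1-5291,\nuuid2:1-81", "uuid2:82"))  # uuid1:1-5291,uuid2:1-82
```

Coalescing adjacent ranges keeps the stored state compact: every UUID's ranges survive, and consuming `uuid2:82` extends `uuid2:1-81` to `uuid2:1-82` instead of appending another member.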
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| singer-connectors/tap-mysql/tap_mysql/sync_strategies/binlog.py | Implements full GTID set tracking/normalization and GTID-set merging during MySQL binlog sync. |
| singer-connectors/tap-mysql/tests/unit/sync_strategies/test_binlog.py | Adds/updates unit tests for full-set GTID fetch, normalization, merging, and pre-bookmark normalization. |
| singer-connectors/tap-mysql/tests/integration/test_tap_mysql.py | Updates GTID integration assertions for MySQL’s shared GTID set behavior across streams. |
| singer-connectors/tap-mysql/README.md | Documents MySQL full GTID set state semantics and separates MySQL vs MariaDB GTID state examples. |
Comments suppressed due to low confidence (1)
singer-connectors/tap-mysql/README.md:360
- The MariaDB GTID format described above (`domain-serverid-sequence` with hyphens) doesn’t match the example values below, which use colon-separated `0:...:...`. Please update the example to use the correct MariaDB GTID format (e.g. `0-<server_id>-<sequence>`), consistent with the tap’s state and unit tests.
{
  "bookmarks": {
    "example_db-table1": {"log_file": "mysql-binlog.0003", "log_pos": 3244, "gtid": "0:364864374:599"},
    "example_db-table2": {"log_file": "mysql-binlog.0001", "log_pos": 42, "gtid": "0:364864374:375"},
    "example_db-table3": {"log_file": "mysql-binlog.0003", "log_pos": 100, "gtid": "0:364864374:399"}
  }
}
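The mismatch the reviewer describes can be caught mechanically. This is a minimal sketch, assuming the per-stream state layout shown above (a `bookmarks` map whose entries carry a `gtid` string); `is_mariadb_gtid_pos` and `check_bookmarks` are hypothetical helpers, not part of tap-mysql.

```python
import re

# Assumed MariaDB GTID shape: "domain_id-server_id-sequence",
# three non-negative integers separated by hyphens.
_MARIADB_GTID = re.compile(r"\d+-\d+-\d+")


def is_mariadb_gtid_pos(value: str) -> bool:
    """True if every comma-separated entry is a MariaDB GTID.

    A MariaDB position may list one GTID per replication domain,
    e.g. "0-1-100,1-2-50"."""
    return all(_MARIADB_GTID.fullmatch(e.strip()) for e in value.split(","))


def check_bookmarks(state: dict) -> list:
    """Return stream ids whose 'gtid' bookmark is not in MariaDB format."""
    return [
        stream
        for stream, bookmark in state.get("bookmarks", {}).items()
        if "gtid" in bookmark and not is_mariadb_gtid_pos(bookmark["gtid"])
    ]


example = {"bookmarks": {
    "example_db-table1": {"gtid": "0:364864374:599"},  # colon form, as in the README
    "example_db-table2": {"gtid": "0-364864374-375"},  # hyphen form
}}
print(check_bookmarks(example))  # ['example_db-table1']
```

Applied to the README example as quoted, all three bookmarks would be flagged because each uses colons; a value such as `0-364864374-599` passes.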
1d8fe4e to 024652d
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
024652d to 18a0ceb
Hello team, I’ve submitted this PR for review. Could a maintainer please review it?
18a0ceb to 6388a99
/request-review Could a maintainer please review it?
Success 🎉 The review request was sent to the following teams:
Context
The current `use_gtid` implementation for MySQL tracks only a single server UUID from `@@GLOBAL.gtid_executed`.
In multi-source MySQL setups or after server migrations, the GTID set spans multiple UUIDs (e.g. `uuid1:1-5291,uuid2:1-81`). The existing behavior:
- Drops non-matching UUIDs from the state.
- Uses only one UUID for `auto_position` on resume, causing MySQL to replay events from the other UUIDs, which results in duplicate or reprocessed data.
This PR updates the implementation to support full GTID set tracking, ensuring correct resume behavior and eliminating duplicate data issues.
MariaDB and non-GTID MySQL pipelines remain unaffected.
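A toy model of why the dropped UUIDs cause reprocessing: with GTID auto-positioning, the source sends the reconnecting client every transaction in its executed set that the client's set does not contain (provided it is still in the binary logs). The sketch assumes untagged GTIDs and expands ranges into Python sets, which is only practical for small illustrative values; `expand` and `would_resend` are hypothetical names.

```python
from typing import Dict, Set


def expand(gtid_set: str) -> Dict[str, Set[int]]:
    """Expand 'uuid:1-3:5,uuid2:1' into {'uuid': {1, 2, 3, 5}, 'uuid2': {1}}.

    Whitespace is removed first. Only sensible for small, illustrative sets."""
    out: Dict[str, Set[int]] = {}
    for member in filter(None, "".join(gtid_set.split()).split(",")):
        uuid, *ranges = member.split(":")
        numbers = out.setdefault(uuid.lower(), set())
        for rng in ranges:
            start, _, end = rng.partition("-")
            numbers.update(range(int(start), int(end or start) + 1))
    return out


def would_resend(server_executed: str, saved_state: str) -> Dict[str, int]:
    """Per UUID, count transactions in the server's executed set that are
    missing from the saved set, i.e. what auto-positioning would send again."""
    executed = expand(server_executed)
    saved = expand(saved_state)
    resend: Dict[str, int] = {}
    for uuid, numbers in executed.items():
        missing = numbers - saved.get(uuid, set())
        if missing:
            resend[uuid] = len(missing)
    return resend


server = "uuid1:1-5291,\nuuid2:1-81"
print(would_resend(server, "uuid1:1-5291"))             # {'uuid2': 81}
print(would_resend(server, "uuid1:1-5291,uuid2:1-81"))  # {}
```

With the example set from the description, a state filtered down to `uuid1` leaves all 81 `uuid2` transactions to be sent again, while persisting the full set leaves nothing to replay.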
Types of changes
What types of changes does your code introduce to PipelineWise?
Put an `x` in the boxes that apply.
Checklist