Feature/external deployment replication #109
base: main
Changes from 5 commits
77a0036
d4c3577
5df5c33
f8dfd0a
a83d48a
227873b
9a03023
57f4179
85edbc4
c7e6807
b35b8bf
a02db63
18edd0e
01ef8f5
6f04ee5
1aa3c6a
40bcb0c
02c0ae7
edfd160
f8ab819
1b3781e
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -22,13 +22,19 @@ DROP USER IF EXISTS '<%= p('admin_username') %>'@'<%= host %>'; | |
| <%- end -%> | ||
| DROP USER IF EXISTS 'roadmin'@'<%= host %>'; | ||
|
|
||
| <%- end -%> | ||
| <%- if_p('mysql_backup_password') do |password| -%> | ||
| CREATE USER IF NOT EXISTS '<%= p('mysql_backup_username') %>'@'localhost'; | ||
| ALTER USER '<%= p('mysql_backup_username') %>'@'localhost' IDENTIFIED WITH <%= p('engine_config.user_authentication_policy') %> BY '<%= password %>'; | ||
| GRANT RELOAD, LOCK TABLES, REPLICATION SLAVE, REPLICATION CLIENT, /*!80001 BACKUP_ADMIN,*/ PROCESS ON *.* to '<%= p('mysql_backup_username') %>'@'localhost'; | ||
| GRANT SELECT on performance_schema.keyring_component_status to '<%= p('mysql_backup_username') %>'@'localhost'; | ||
| GRANT SELECT ON performance_schema.log_status TO '<%= p('mysql_backup_username') %>'@'localhost'; | ||
| <%- end | ||
| allowed_remote_backup_hosts='localhost' | ||
|
|
||
| if p('engine_config.enable_replication_source') | ||
|
Member
Maybe it makes sense to just allow |
||
| allowed_remote_backup_hosts='%' | ||
| end | ||
|
|
||
| if_p('mysql_backup_password') do |password| -%> | ||
| CREATE USER IF NOT EXISTS '<%= p('mysql_backup_username') %>'@'<%= allowed_remote_backup_hosts %>'; | ||
| ALTER USER '<%= p('mysql_backup_username') %>'@'<%= allowed_remote_backup_hosts %>' IDENTIFIED WITH <%= p('engine_config.user_authentication_policy') %> BY '<%= password %>'; | ||
| GRANT RELOAD, LOCK TABLES, REPLICATION SLAVE, REPLICATION CLIENT, /*!80001 BACKUP_ADMIN,*/ PROCESS ON *.* to '<%= p('mysql_backup_username') %>'@'<%= allowed_remote_backup_hosts %>'; | ||
| GRANT SELECT on performance_schema.keyring_component_status to '<%= p('mysql_backup_username') %>'@'<%= allowed_remote_backup_hosts %>'; | ||
| GRANT SELECT ON performance_schema.log_status TO '<%= p('mysql_backup_username') %>'@'<%= allowed_remote_backup_hosts %>'; | ||
|
|
||
| <%- end -%> | ||
| <%- hosts.each do |host| -%> | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| <% if p('replicator_enabled') == true %> | ||
| check process pxc-replicator | ||
| with pidfile /var/vcap/sys/run/bpm/pxc-replicator/pxc-replicator.pid | ||
| start program "/var/vcap/jobs/bpm/bin/bpm start pxc-replicator" with timeout <%= p('monit_startup_timeout') %> seconds | ||
| stop program "/var/vcap/jobs/bpm/bin/bpm stop pxc-replicator" | ||
| group vcap | ||
| <% end %> | ||
|
Member
Does this need to be a long-lived process? Could setup be a "one shot" task? Maybe okay to keep it in bpm, but maybe use
Author
I thought it would be useful to keep it running so that we notice when it stops working. A one-shot task would probably do the job just as well, until the remote changes in a way that breaks replication. At that point the replica instance would give no indication that replication broke. Currently, if the source instance is restored to another state, the job notices and restarts replication by fetching a fresh dump of the changed upstream. |
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,58 @@ | ||
| --- | ||
| name: pxc-replicator | ||
|
|
||
| templates: | ||
| bpm.yml.erb: config/bpm.yml | ||
| bin/setup: bin/setup | ||
| bin/post-start: bin/post-start | ||
| source.mysql.cnf.erb: config/source.mysql.cnf | ||
|
|
||
| packages: | ||
| - percona-xtradb-cluster-8.0 | ||
| - percona-xtradb-cluster-8.4 | ||
| - percona-xtrabackup-2.4 | ||
| - percona-xtrabackup-8.0 | ||
| - percona-xtrabackup-8.4 | ||
| - pxc-utils | ||
|
|
||
| consumes: | ||
| - name: replica | ||
| type: conn | ||
| properties: | ||
| - port | ||
| - mysql_version | ||
| - mysql_backup_username | ||
| - mysql_backup_password | ||
| - tls | ||
|
|
||
| properties: | ||
| monit_startup_timeout: | ||
|
Member
With bpm this is probably not super useful - the pid gets dropped as soon as the job starts and monit will treat the job as healthy. But this made me wonder if we even really need monit for this use case. See my comments on the monit template. |
||
| description: 'How long to wait for monit to show running for the process' | ||
| default: 5 | ||
| mysql_version: | ||
| description: 'deployed version' | ||
| default: 8.0 | ||
| logging.format.timestamp: | ||
|
Member
Doesn't look like this is used anywhere. |
||
| default: rfc3339 | ||
| replicator_enabled: | ||
| description: 'Whether to enable the job ( skips writing monit file if set to false )' | ||
| default: true | ||
| config.source.tls: | ||
| description: 'TLS certificates to use for connections' | ||
| example: { | ||
| "ca": "...", | ||
| "certificate": "...", | ||
| "private_key": "..." | ||
| } | ||
| default: {} | ||
| config.source.tls: | ||
|
abg marked this conversation as resolved.
Outdated
|
||
| description: 'TLS certificates to use for connections' | ||
| example: { | ||
| "ca": "...", | ||
| "certificate": "...", | ||
| "private_key": "..." | ||
| } | ||
| default: {} | ||
| config.tls.enabled: | ||
|
abg marked this conversation as resolved.
Outdated
|
||
| description: 'Use TLS for connections & authentication' | ||
| default: false | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,20 @@ | ||
| #!/usr/bin/env bash | ||
|
|
||
| . /var/vcap/packages/pxc-utils/logging.sh | ||
|
|
||
| #it's a bash feature. | ||
| #https://www.gnu.org/software/bash/manual/html_node/Bash-Variables.html#index-SECONDS | ||
|
|
||
| SECONDS=0 | ||
|
|
||
| while [[ $SECONDS -lt <%= p('monit_startup_timeout') %> ]]; do | ||
| if grep "$(date +%Y-%m-%dT%H:%M).*replication healthy" /var/vcap/sys/log/pxc-replicator/pxc-replicator.stdout.log > /dev/null; then | ||
|
Member
This seems a little brittle. I would probably actually query the replica and validate the IO_Thread and SQL_Thread are in a healthy state. |
||
| log "replication reported healthy within the laset minute" | ||
|
Member
typo: laset => last
Author
🧹 |
||
| exit 0 | ||
| fi | ||
| sleep 5 | ||
| done | ||
|
|
||
| log "timed out waiting for replication to report healthy" | ||
| exit -1 | ||
|
|
||
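The review suggestion on this file, to query the replica and validate both replication threads instead of grepping the log, could be sketched roughly as below. This is only an illustration: the function name `replica_threads_healthy` is hypothetical, and it assumes the output of `SHOW REPLICA STATUS\G` is piped to it on stdin.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the PR): reads `SHOW REPLICA STATUS\G`
# output on stdin and succeeds only if both replication threads report "Yes".
replica_threads_healthy() {
  local status
  status="$(cat)"
  # Anchor on the exact field names so that e.g. Replica_SQL_Running_State
  # or a value such as "Connecting" does not count as healthy.
  grep -Eq '^[[:space:]]*Replica_IO_Running:[[:space:]]*Yes[[:space:]]*$' <<< "$status" &&
    grep -Eq '^[[:space:]]*Replica_SQL_Running:[[:space:]]*Yes[[:space:]]*$' <<< "$status"
}
```

In post-start this might be called as `$self_mysql <<< 'SHOW REPLICA STATUS\G' | replica_threads_healthy`, assuming a client command like the `self_mysql` variable from the bpm config is also made available to that script.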
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,90 @@ | ||
| #!/usr/bin/env bash | ||
| set -e | ||
|
|
||
| #shellcheck disable=1091 | ||
| . /var/vcap/packages/pxc-utils/logging.sh | ||
|
|
||
| function is_replication_configured(){ | ||
|
|
||
| ! test "$($self_mysql <<< "SHOW REPLICA STATUS\G")" = "" | ||
| } | ||
| function is_replica_io_running (){ | ||
| $self_mysql <<< "SHOW REPLICA STATUS\G" | grep 'Replica_IO_Running: Yes' | ||
| } | ||
| function is_replica_sql_running (){ | ||
| $self_mysql <<< "SHOW REPLICA STATUS\G" | grep 'Replica_SQL_Running: Yes' | ||
| } | ||
| function enable_replication (){ | ||
| $self_mysql <<< "STOP REPLICA; | ||
| CHANGE REPLICATION SOURCE TO | ||
| SOURCE_HOST='${SOURCE_ADDR}', | ||
| SOURCE_USER='${SOURCE_USER}', | ||
| SOURCE_PASSWORD='${SOURCE_PASS}', | ||
| SOURCE_AUTO_POSITION=1; | ||
|
Member
The |
||
| START REPLICA;" | ||
| } | ||
| function sync_databases(){ | ||
| log "resyncing with full backup" | ||
| log $($self_mysql <<< "STOP REPLICA;") | ||
| log $($self_mysql <<< "RESET REPLICA;") | ||
| log $($self_mysql <<< "RESET MASTER;") | ||
|
Member
Just a note: RESET MASTER is deprecated in v8.4 and removed in MySQL v9, replaced with I understand this is likely used here since pxc-release still supports MySQL v8.0 which still requires this syntax.
Author
Yeah, what's the preferred way of handling this? I noticed while reading docs that they're still https://dev.mysql.com/doc/refman/8.4/en/replication-howto-repuser.html It seems that the permission still does not have an alternative?
Member
Yeah, v8.0 (and earlier) only support We've typically dealt with this kind of thing with version checks:

    if mysql_major_minor() == "8.0": # <- Delete branch when we don't care about old version anymore
      do_legacy_sql_stuff()
    else:
      do_latest_sql_stuff()
    end

That could potentially be done at template rendering time by looking at the MySQL v8.0 is EOL (although we'll probably get one more patch release from Percona for v8.0.46), so there is an argument to only support v8.4 and onwards. Maybe guard against using this feature on v8.0?

    # In some pxc-replicator template
    <%- if p('mysql_version') == "8.0" -%>
    raise "Unsupported <helpful error message>"
    <%- end -%>
Author
Hmm. How would that play with cf-depl? It seems that 8.4 is still experimental: cloudfoundry/cf-deployment@3870c60. I'm not sure if there are any reasons why it's still set up to default to 8.0 other than being cautious. I'm going to ask around what the plans for runtime are with regard to switching the default. For now I guess I'll try to find all relevant places and sprinkle some
Member
My thinking here was that it could be reasonable to not support this replication feature for EOL MySQL versions to simplify things. However this exposes that, as pxc-release maintainers, we have been bad netizens of cf-d and have not updated the default MySQL version to v8.4. I'll try to follow up on that. I think it's okay to add a version check to support both the old and new syntax for now. |
||
| log $($source_dump --all-databases --triggers --routines --single-transaction | $self_mysql) | ||
|
abg marked this conversation as resolved.
|
||
| } | ||
| function source_upcheck(){ | ||
| $source_mysql <<< "exit" | ||
| } | ||
| function self_upcheck(){ | ||
| $self_mysql <<< "exit" | ||
| } | ||
|
|
||
| while ! source_upcheck; do | ||
| log "waiting for source to be available" | ||
| sleep 5 | ||
| done | ||
|
|
||
| while ! self_upcheck; do | ||
| log "waiting for self to be available" | ||
| sleep 5 | ||
| done | ||
|
|
||
| log "starting replication setup" | ||
|
|
||
| log "checking existing databases in source instance" | ||
|
|
||
| log "checking replication status" | ||
|
|
||
| if [[ ! is_replication_configured ]]; then | ||
|
abg marked this conversation as resolved.
Outdated
|
||
| log "replication is not yet enabled" | ||
| sync_databases | ||
|
|
||
| log "replication is not configured. enabling" | ||
| OUT=$(enable_replication); | ||
| if [[ $(enable_replication) != "" ]]; then | ||
|
abg marked this conversation as resolved.
Outdated
|
||
| log "failed enabling replication: '$OUT'" | sed "s/${PASS}/<REDACTED>/g" | ||
|
abg marked this conversation as resolved.
Outdated
|
||
| else | ||
| log "replication enabled" | ||
| fi | ||
| else | ||
| log "replicaton already enabled, skipping" | ||
| fi | ||
|
|
||
| while true; do | ||
| log "checking running status" | ||
| RUNNING_STATUS="$($self_mysql <<< "SHOW REPLICA STATUS\G" | grep 'Replica_.*_Running: ' | xargs )" | ||
| if is_replica_io_running && is_replica_sql_running; then | ||
| log "replication healthy" | ||
| elif ! is_replica_io_running; then | ||
| log "restarting replica" | ||
| $self_mysql <<< "START REPLICA;" | ||
| elif ! is_replica_sql_running ; then | ||
| log "replication Replica_SQL_Running marked as no. Attempting to resync" | ||
| sync_databases | ||
| else | ||
| log "$RUNNING_STATUS" | ||
| for line in $($self_mysql <<< "SHOW REPLICA STATUS\G" | grep 'Err' | tr -d '[:blank:]' ); do | ||
| log "${line/$SOURCE_PASS/<redacted>/}"; | ||
| done | ||
| fi | ||
| log "$RUNNING_STATUS" | ||
| sleep 5 | ||
| done | ||
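The version check agreed on in the `RESET MASTER` thread above could look roughly like the sketch below. It is illustrative only: the helper name `reset_binlog_statement` and the `MYSQL_VERSION` variable in the usage note are hypothetical, and it relies on `RESET BINARY LOGS AND GTIDS` being the replacement statement documented for MySQL 8.4.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the PR): prints the statement that clears
# binary logs and GTID state for a given MySQL "major.minor" version.
reset_binlog_statement() {
  local version="$1"
  case "$version" in
    5.*|8.0)
      # Legacy syntax; this branch can be deleted once v8.0 is no longer supported.
      echo "RESET MASTER;"
      ;;
    *)
      # Assumed for v8.4 and later.
      echo "RESET BINARY LOGS AND GTIDS;"
      ;;
  esac
}
```

Inside `sync_databases` this might be used as `$self_mysql <<< "$(reset_binlog_statement "$MYSQL_VERSION")"`, with `MYSQL_VERSION` rendered from the job's `mysql_version` property. The same branch could instead be resolved at template rendering time, as suggested in the thread.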
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,34 @@ | ||
| <%- | ||
| if !['rfc3339', 'unix-epoch'].include?(p('logging.format.timestamp')) | ||
| raise "'#{p('logging.format.timestamp')}' is not a valid timestamp format for the property 'logging.format.timestamp'." + | ||
| " Valid options are: 'rfc3339' and 'unix-epoch'." | ||
| end | ||
| path = [ | ||
| "/usr/bin", | ||
| "/bin", | ||
| "/var/vcap/packages/percona-xtradb-cluster-#{p('mysql_version')}/bin", | ||
| ] | ||
| -%> | ||
| --- | ||
| processes: | ||
| - name: pxc-replicator | ||
| executable: /var/vcap/jobs/pxc-replicator/bin/setup | ||
| args: [] | ||
| env: | ||
|
Member
It would probably be clearer to just set these env vars in the setup template itself rather than here. I am bouncing around between bpm and setup to figure out the magic variables set in the bpm to understand what's being done in setup. Some of it feels a little magical. |
||
| PATH: <%= path.join(":") %> | ||
| SOURCE_USER: <%= link('replica').p('mysql_backup_username') %> | ||
| SOURCE_PASS: <%= link('replica').p('mysql_backup_password') %> | ||
| SOURCE_ADDR: <%= link('replica').address %> | ||
|
Member
Consuming this information from a link is handy, but means cross-environment replication is not supported - only replicas under a single bosh director. |
||
| self_mysql: 'mysql --defaults-file=/var/vcap/jobs/pxc-mysql/config/mylogin.cnf' | ||
| source_dump: 'mysqldump --defaults-file=/var/vcap/jobs/pxc-replicator/config/source.mysql.cnf' | ||
| source_mysql: 'mysql --defaults-file=/var/vcap/jobs/pxc-replicator/config/source.mysql.cnf' | ||
|
|
||
| persistent_disk: true | ||
| ephemeral_disk: true | ||
|
Comment on lines
+44
to
+45
Member
Are these used anywhere? I did not see a |
||
| additional_volumes: | ||
| - path: /var/vcap/packages/pxc-utils | ||
|
Member
You shouldn't need explicit access to /var/vcap/packages here.
Author
🧹 |
||
| writeable: false | ||
| - path: /var/vcap/sys/run/pxc-mysql | ||
|
abg marked this conversation as resolved.
|
||
| writeable: false | ||
| - path: /var/vcap/jobs/pxc-mysql/config | ||
|
Member
If this is needed, I would mark this I think better would be to use a "replication admin" role. Maybe worth adding a
Author
My current thinking is to generate a bosh user for the initial root account of the replica target and set it via the set-repilcator-target ops file, mostly because I noticed that if I don't change the initial root user name between cf-d and my replica deployments, both use the default root user name. After the initial sync, the target's root account then used the password from cf-d. |
||
| - path: /var/vcap/jobs/pxc-mysql/certificates | ||
|
Member
This seems unused right now. Presumably this might be used for the local replication <-> pxc-mysql replication admin connection and we should probably consume that via a (deployment local) link. We should probably update pxc-mysql to expose |
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| <%- | ||
| require 'json' | ||
| if_link('replica') do |source| %> | ||
| [client] | ||
| user = <%= source.p('mysql_backup_username') %> | ||
| password = <%= source.p('mysql_backup_password') %> | ||
| host = <%= source.instances[0].address %> | ||
| port = <%= source.p('port') %> | ||
| <%- end %> | ||
|
|
||
|
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| - type: replace | ||
| path: /instance_groups/name=mysql/jobs/name=pxc-replicator? | ||
| value: | ||
| name: pxc-replicator | ||
| release: pxc | ||
| properties: | ||
| config.tls.enabled: true | ||
|
|
||
| - type: replace | ||
| path: /instance_groups/name=mysql/jobs/name=pxc-mysql/properties/engine_config?/binlog?/enable_gtid_mode? | ||
| value: true |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| - type: replace | ||
| path: /releases/name=pxc | ||
| value: | ||
| name: pxc | ||
| version: create | ||
| url: file://./ | ||
|
Member
Isn't this the same as operations/dev-release.yml?
Author
I seem to have overlooked the existing one. Thanks 🧹 |
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,23 @@ | ||
| - type: replace | ||
| path: /instance_groups/name=mysql/jobs/name=pxc-mysql/properties/engine_config?/enable_replication_source | ||
| value: true | ||
| - type: replace | ||
| path: /instance_groups/name=mysql/jobs/name=pxc-mysql/provides?/replica | ||
| value: | ||
| shared: true | ||
| as: source | ||
| - type: replace | ||
| path: /instance_groups/name=mysql/jobs/name=pxc-mysql/properties/mysql_backup_username? | ||
| value: ((mysql_backup_user.username)) | ||
| - type: replace | ||
| path: /instance_groups/name=mysql/jobs/name=pxc-mysql/properties/mysql_backup_password? | ||
| value: ((mysql_backup_user.password)) | ||
| - type: replace | ||
| path: /variables/name=mysql_backup_user? | ||
| value: | ||
| name: mysql_backup_user | ||
| type: user | ||
| parameters: | ||
| length: 32 | ||
|
|
||
|
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| - type: replace | ||
| path: /instance_groups/name=mysql/jobs/name=pxc-mysql/properties/engine_config?/enable_replication_target | ||
| value: true | ||
| - type: replace | ||
| path: /instance_groups/name=mysql/jobs/name=pxc-replicator/consumes?/replica | ||
| value: | ||
| from: source | ||
| deployment: ((source_deployment_name)) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| - type: replace | ||
| path: /instance_groups/name=mysql/azs? | ||
| value: ((azs)) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| - type: replace | ||
| path: /name | ||
| value: ((deployment_name)) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| - type: replace | ||
| path: /instance_groups/name=mysql/networks? | ||
| value: [{name: ((network_name))}] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| - type: replace | ||
| path: /instance_groups/name=mysql/vm_type? | ||
|
abg marked this conversation as resolved.
Outdated
|
||
| value: ((vm_type)) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| - type: replace | ||
| path: /instance_groups/name=mysql/vm_type? | ||
| value: ((vm_type)) |
I don't think these are used anymore and should be cleaned up.