Add batch DELETE/UPDATE samples for datasets exceeding 3k row limit #698
rmconstantin wants to merge 8 commits into aws-samples:main
Can you add the `__pycache__` path to `.gitignore`?
    while (true) {
        try (Connection conn = pool.getConnection()) {
            conn.setAutoCommit(false);
            String sql = "UPDATE " + table + " SET " + setClause + ", updated_at = NOW()"
How does this ensure progress over all of the source items?
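One possible answer, as a minimal sketch rather than the PR's actual code: if each batch selects only rows that still match the predicate, every committed batch shrinks the remaining set, and the loop can stop when a batch touches zero rows. The example below uses Python's standard-library `sqlite3` as a stand-in for Aurora DSQL so it runs anywhere; the `items` table, the `processed` column, and `update_in_batches` are hypothetical names, not taken from the PR.

```python
import sqlite3


def update_in_batches(conn, batch_size):
    """Mark unprocessed rows as processed, one bounded batch per transaction."""
    total = 0
    while True:
        # The subquery only sees rows that still need work, so rows updated by
        # an earlier batch can never be selected again.
        cur = conn.execute(
            "UPDATE items SET processed = 1 "
            "WHERE id IN (SELECT id FROM items WHERE processed = 0 LIMIT ?)",
            (batch_size,),
        )
        conn.commit()
        if cur.rowcount == 0:
            return total  # nothing left that matches the predicate
        total += cur.rowcount


# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, processed INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in range(2500)])
conn.commit()

updated = update_in_batches(conn, batch_size=1000)

# Post-check: count rows that still match the predicate after the loop ends.
remaining = conn.execute("SELECT COUNT(*) FROM items WHERE processed = 0").fetchone()[0]
```

The key design point is that the `WHERE` clause must exclude rows the loop has already handled. An `UPDATE` whose `SET` clause does not change the filter column would reselect the same rows forever.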
     * gradle run --args="--endpoint <cluster-endpoint> [--user admin]
     *     [--batch-size 1000] [--num-workers 4]"
     */
    public class Main {
Could you add an integ test that runs these batch ops?
    );

    -- Create an asynchronous index on the category column.
    -- Aurora DSQL requires CREATE INDEX ASYNC for tables with existing rows.
It's required for all tables, not just ones with existing rows; maybe delete this comment.
    @@ -0,0 +1,52 @@
    # Aurora DSQL Batch Operations
I think we might be better off organizing these examples under the specific language/driver pairing instead of having a top-level directory.
Can we also add integ tests for each example? There should be existing patterns for how to do that in each language.
     * @param connection a JDBC connection (autoCommit should be false)
     * @param operation the database operation to execute
     * @param maxRetries maximum retry attempts (default 3)
     * @param baseDelay base delay in seconds for backoff (default 0.1)
Nit: can we make baseDelay milliseconds instead?
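For illustration, a minimal sketch of what a millisecond-based retry helper could look like, written in Python rather than the Java under review. The names `run_with_retry`, `base_delay_ms`, and `SerializationConflict` are hypothetical; the defaults mirror the documented ones (3 retries, 0.1 s = 100 ms). Exponential backoff with a small random jitter is an assumption here, not necessarily what the PR implements.

```python
import random
import time


class SerializationConflict(Exception):
    """Hypothetical stand-in for the driver's OCC conflict error."""


def run_with_retry(operation, max_retries=3, base_delay_ms=100, sleep=time.sleep):
    """Run operation(), retrying on conflict with exponential backoff in ms."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except SerializationConflict:
            if attempt == max_retries:
                raise  # retries exhausted; surface the conflict to the caller
            delay_ms = base_delay_ms * (2 ** attempt)    # 100, 200, 400, ...
            delay_ms += random.uniform(0, delay_ms / 4)  # up to 25% jitter
            sleep(delay_ms / 1000.0)                     # time.sleep takes seconds


# Usage: an operation that conflicts twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise SerializationConflict()
    return "ok"


delays = []  # capture the requested delays instead of actually sleeping
result = run_with_retry(flaky, sleep=delays.append)
```

Keeping the unit in the parameter name (`base_delay_ms`) avoids ambiguity at call sites, and the conversion to seconds happens in exactly one place.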
     */
    public class Repopulate {

        private static final String INSERT_SQL =
What's going on with the repopulate fn vs the batch setup script?
Force-pushed from 8bc247c to a3525c4
Updated the code to address all comments.
Ready for another look.
Demonstrates sequential and parallel batch processing patterns for Aurora DSQL with OCC retry logic and hashtext() partitioning. Includes Python (psycopg2), Java (pgJDBC), and Node.js (node-postgres) implementations.
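As a rough illustration of the partitioning idea (not the PR's code): hashing each row key and taking it modulo the worker count assigns every row to exactly one worker, so parallel workers never compete for the same rows. In the sketch below `zlib.crc32` stands in for PostgreSQL's `hashtext()`, which produces different values and can return negative integers, so a SQL predicate would presumably need something like `abs(hashtext(key)) % num_workers = worker_id`. That predicate, along with `worker_for`, `partition`, and the `order-N` keys, is an assumption.

```python
import zlib


def worker_for(key, num_workers):
    """Map a row key to a worker index in [0, num_workers)."""
    # crc32 is only a stand-in for hashtext(); the two give different values.
    return zlib.crc32(key.encode("utf-8")) % num_workers


def partition(keys, num_workers):
    """Split keys into num_workers disjoint buckets that together cover all keys."""
    buckets = [[] for _ in range(num_workers)]
    for key in keys:
        buckets[worker_for(key, num_workers)].append(key)
    return buckets


keys = [f"order-{i}" for i in range(1000)]  # hypothetical row keys
buckets = partition(keys, num_workers=4)
```

Because the assignment depends only on the key, a worker can be restarted and will pick up the same slice of rows.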
- Add SELECT COUNT(*) post-check after each batch loop to verify all matching rows were processed (sequential and parallel, all 3 languages)
- Update integration tests to seed data via psql -f batch_test_setup.sql
- Add connect_timeout to Python pool creation for IPv6 fallback
Force-pushed from 63b7d16 to 94efc68
What's this jar file for? Should we be shipping it?
Should this and gradlew be checked in or gitignored?
Demonstrates sequential and parallel batch processing patterns for Aurora DSQL with OCC retry logic and recommended connection management. Includes Python (psycopg2), Java (pgJDBC), and Node.js (node-postgres) implementations.
Fixes #693.
By submitting this pull request, I confirm that my contribution is made under the terms of the MIT-0 license.
Thank you for your contribution!