Avoid pulling pre-2023 data into movedBoxes stat #2704
Pull request overview
This PR adjusts the backend SQL used for the movedBoxes statistics so that, in production, all history rows considered for box reconstruction (including “Record created/deleted/undeleted” events) are constrained by the configured “earliest 2023” history.id cutoff, preventing pre-2023 history events from being pulled into the dataset.
Changes:
- Tightened the `BoxHistory` CTE filter so `h.id >= %s` applies uniformly (including to “Record created/deleted/undeleted” change types).
- Simplified the boolean logic by grouping the change-type checks under a single `h.id >= %s` guard.
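The effect of regrouping the guard can be sketched in plain Python. This is only an illustration under assumptions: the field names in `FIELD_EVENTS`, the row layout, and the exact shape of the old condition are hypothetical and not taken from the actual query. It shows how a cutoff that guards only one branch of an `OR` lets pre-cutoff record events through, while a single outer guard does not.

```python
# Hypothetical stand-ins; the actual column names and change labels in the
# query may differ.
RECORD_EVENTS = {"Record created", "Record deleted", "Record undeleted"}
FIELD_EVENTS = {"location_id", "box_state_id"}  # assumed examples


def included_before(row, cutoff_id):
    # Assumed old shape: the cutoff guarded only the field-change branch.
    return (
        row["id"] >= cutoff_id and row["changes"] in FIELD_EVENTS
    ) or row["changes"] in RECORD_EVENTS


def included_after(row, cutoff_id):
    # New shape: one guard wraps every change-type check.
    return row["id"] >= cutoff_id and (
        row["changes"] in FIELD_EVENTS or row["changes"] in RECORD_EVENTS
    )


rows = [
    {"id": 10, "changes": "Record created"},   # before the cutoff
    {"id": 11, "changes": "location_id"},      # before the cutoff
    {"id": 500, "changes": "Record deleted"},  # after the cutoff
    {"id": 501, "changes": "box_state_id"},    # after the cutoff
]
CUTOFF_ID = 100  # illustrative value only

before = [r["id"] for r in rows if included_before(r, CUTOFF_ID)]
after = [r["id"] for r in rows if included_after(r, CUTOFF_ID)]
print(before, after)
```

In this sketch the row with id 10 is selected by the old condition but not by the new one, which matches the behaviour the overview describes.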
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## master #2704 +/- ##
=======================================
Coverage 76.83% 76.83%
=======================================
Files 300 300
Lines 22233 22233
Branches 2245 2245
=======================================
Hits 17082 17082
Misses 5104 5104
Partials 47 47
@pylipp I assume that %s is a variable - is there a reason the computation date isn't hardcoded in? Or are you setting this in a central module where it's then called by different functions?
@aerinsol yes, the query is invoked from one place which passes the cut-off history ID. Since that's not compatible for testing, this variable is only set in production.
@aerinsol the changedate column is not indexed (only composite index with tablename or record_id), so I stick with the approach to filter for a variable
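As a rough sketch of passing the cut-off history ID as a query parameter instead of hardcoding a date, the snippet below uses an in-memory SQLite database. Everything here is an assumption for illustration: the helper `fetch_history_ids`, the reduced `history` table, and the choice to skip the filter when no cutoff is given are hypothetical, and the real code may handle the non-production case differently. Filtering on `id` uses the primary key, which is always indexed.

```python
import sqlite3


def fetch_history_ids(conn, cutoff_id=None):
    # Hypothetical helper: with no cutoff (e.g. in tests) every row qualifies.
    sql = "SELECT id FROM history"
    params = ()
    if cutoff_id is not None:
        # sqlite3 uses "?" placeholders; MySQL drivers typically use "%s".
        sql += " WHERE id >= ?"
        params = (cutoff_id,)
    sql += " ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (id INTEGER PRIMARY KEY, changes TEXT)")
conn.executemany(
    "INSERT INTO history (id, changes) VALUES (?, ?)",
    [(1, "Record created"), (2, "location_id"), (300, "Record deleted")],
)

all_ids = fetch_history_ids(conn)                    # no cutoff configured
recent_ids = fetch_history_ids(conn, cutoff_id=100)  # illustrative cutoff
print(all_ids, recent_ids)
```

Keeping the cutoff as a parameter means one caller decides the value, so test fixtures with small IDs are not excluded by a production-specific constant.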
https://trello.com/c/EFSdLs3F
This didn't affect data display because in the FE we filter by a date range (earliest 2023-01-01).
It reduces the querying time and request size (and hence latency) for bases with a considerable amount of pre-2023 boxes.