Training mode implementation (WIP) #2663
const-t
left a comment
I see that the PR is WIP, but I have a few comments for the future.
| */ | ||
| if (likely(!tfw_mode_is_disabled())) { | ||
| s = rcu_dereference(g_stats); | ||
| percpu_counter_add(&s->sum, delta1); |
What is the reason to use percpu_counter instead of a simple per-cpu variable? percpu_counter is pretty large and has overhead, so there must be a reason to use it.
| @@ -0,0 +1,181 @@ | |||
| /** | |||
I suggest renaming this to adaptive_limits.c or similar and using the word "training" only in the sense of "training mode", as a state of the adaptive limits.
| atomic_long_t max; | ||
| s64 __percpu *counter; | ||
| u16 epoch; | ||
| } TfwClientCounter; |
From my point of view, we should move this to training.h, along with all other related structs.
| } | ||
|
|
||
| static bool | ||
| tfw_client_counter_training_check(TfwClientCounter *counter, |
It seems client.c is not the right place for this function. I would prefer to have it in training.c.
| return defence(curr); | ||
|
|
||
| if (tfw_client_counter_change_max(counter, curr, &delta1, &delta2)) | ||
| adjust_num(delta1, delta2); |
I would suggest moving the update of the global stats to tfw_http_conn_recv_finish(); we don't need a live update of the counter during training.
Introduce a library for 128-bit calculations, which are not supported in the Linux kernel:
- 128/32 division using bitwise long division
- integer square root via binary search

Needed for training mode statistics collection, where large numbers of clients can cause 64-bit counter overflow during aggregation.
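The two operations listed above could be sketched in userspace C roughly as follows. This is an illustration under assumptions, not the code from this PR: the `u128_sketch` type and both function names are hypothetical, and the square root is shown for a 64-bit input only to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit unsigned value split into two 64-bit halves. */
typedef struct {
	uint64_t hi;
	uint64_t lo;
} u128_sketch;

/*
 * Divide a 128-bit value by a 32-bit divisor with bitwise (shift-subtract)
 * long division. Returns the quotient; stores the remainder in *rem if set.
 */
static u128_sketch
u128_div_u32(u128_sketch n, uint32_t d, uint32_t *rem)
{
	u128_sketch q = { 0, 0 };
	uint64_t r = 0; /* stays below d, so (r << 1) | bit fits in 64 bits */
	int i;

	assert(d != 0);

	for (i = 127; i >= 0; i--) {
		uint64_t bit = (i >= 64) ? (n.hi >> (i - 64)) & 1
					 : (n.lo >> i) & 1;

		r = (r << 1) | bit;
		if (r >= d) {
			r -= d;
			if (i >= 64)
				q.hi |= 1ULL << (i - 64);
			else
				q.lo |= 1ULL << i;
		}
	}
	if (rem)
		*rem = (uint32_t)r;
	return q;
}

/* Largest r such that r * r <= x, found by binary search. */
static uint64_t
isqrt_u64(uint64_t x)
{
	uint64_t lo = 0, hi = 0xFFFFFFFFULL; /* sqrt(2^64 - 1) < 2^32 */

	while (lo < hi) {
		uint64_t mid = lo + (hi - lo + 1) / 2;

		if (mid * mid <= x)
			lo = mid;
		else
			hi = mid - 1;
	}
	return lo;
}
```

A 128-bit square root would follow the same search but needs a 128-bit multiply for the `mid * mid` comparison, which is omitted here.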
Add a generic training/defence subsystem used to detect abnormal
behavior based on z-score statistics.
The implementation provides:
- training mode: collect per-event statistics (sum, sumsq, count)
using percpu counters to minimize contention;
- defence mode: evaluate incoming values against calculated mean/std
and reject anomalies exceeding configured z-score threshold (drop
  connection with TCP RST).
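A minimal sketch of the z-score check described above, assuming the statistics are already aggregated into plain integers. The struct and function names are hypothetical. The comparison is rearranged so that it needs neither floating point nor a square root, and small inputs are assumed, since the products below can overflow 64 bits (which is presumably where 128-bit arithmetic becomes necessary).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical aggregate of the statistics collected in training mode. */
typedef struct {
	uint64_t sum;   /* sum of observed values */
	uint64_t sumsq; /* sum of squared observed values */
	uint64_t count; /* number of observations */
} stats_sketch;

/* Training mode: record one observation. */
static void
stats_add(stats_sketch *s, uint64_t x)
{
	s->sum += x;
	s->sumsq += x * x;
	s->count++;
}

/*
 * Defence mode: with n = count, mean = sum / n and
 * std = sqrt(n * sumsq - sum^2) / n, so
 *   z > thr  <=>  n * x - sum > thr * sqrt(n * sumsq - sum^2).
 * Both sides are squared to stay in integer arithmetic.
 * Values at or below the mean are never treated as anomalies here.
 */
static bool
is_anomaly(const stats_sketch *s, uint64_t x, uint64_t thr)
{
	uint64_t nx, dev, var_n2;

	assert(s->count > 0);

	nx = s->count * x;
	if (nx <= s->sum)
		return false;
	dev = nx - s->sum;
	/* n^2 * variance; non-negative by the Cauchy-Schwarz inequality. */
	var_n2 = s->count * s->sumsq - s->sum * s->sum;

	return dev * dev > thr * thr * var_n2;
}
```

For example, after training on 10, 12, 8 and 10 (mean 10, standard deviation about 1.41), a threshold of 3 accepts 14 (z about 2.83) and rejects 15 (z about 3.54).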
Use the adaptive limits (training/defence) library for per-client connection
tracking. Maintain the current and maximum number of concurrent connections
per client and update the statistics on each new maximum of concurrent
client connections. In defence mode, calculate the z-score for the
client on each newly established connection and drop the connection if
the z-score exceeds the configured threshold.
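One way to read "update the statistics on each new maximum" is that a client's old maximum is replaced by the new one in the global sum and sum of squares. The sketch below illustrates that reading only; the type, the function names and the meaning of the two deltas are assumptions, and the observation count (incremented once per client) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-client connection counter. */
typedef struct {
	uint64_t curr; /* current number of concurrent connections */
	uint64_t max;  /* maximum seen during this training period */
} client_conn_sketch;

/*
 * Account a new connection. Returns true when the maximum grew; in that
 * case *d_sum and *d_sumsq hold the corrections to apply to the global
 * sum and sum of squares so that they reflect the new maximum.
 */
static bool
client_conn_open(client_conn_sketch *c, uint64_t *d_sum, uint64_t *d_sumsq)
{
	uint64_t old = c->max;

	if (++c->curr <= old)
		return false;

	c->max = c->curr;
	*d_sum = c->max - old;
	*d_sumsq = c->max * c->max - old * old;
	return true;
}

/* Account a closed connection; the maximum is deliberately kept. */
static void
client_conn_close(client_conn_sketch *c)
{
	assert(c->curr > 0);
	c->curr--;
}
```

With this scheme the global statistics change only when a client exceeds its own previous peak, so reconnects below the peak cost nothing beyond the local increment.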
Use the adaptive limits library for tracking non-idempotent requests (we account only non-idempotent requests, since they are the ones that really block an upstream connection). Implement a new structure, `TfwAdaptiveLimitLock`, with a per-cpu counter to track the current number of unanswered non-idempotent requests. In defence mode, the `tfw_http_conn_recv_finish` callback calculates the z-score, compares it with the configured `threshold` and drops the client connection if necessary. The per-cpu request accounting prevents performance degradation.
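The per-cpu accounting idea can be modelled in userspace as an array with one slot per CPU. This is a simplified model, not the kernel code: the names and the fixed CPU count are hypothetical, and in the kernel the slots would typically come from `alloc_percpu()` and be updated with `this_cpu_add()`.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4 /* hypothetical fixed CPU count for the model */

/* One slot per CPU: each CPU touches only its own slot, so no locking. */
typedef struct {
	int64_t slot[NR_CPUS_SKETCH];
} percpu_sketch;

/* Called on the CPU that handles the event (request in: +1, answer: -1). */
static void
percpu_add(percpu_sketch *c, unsigned int cpu, int64_t delta)
{
	assert(cpu < NR_CPUS_SKETCH);
	c->slot[cpu] += delta;
}

/*
 * The total is the sum over all slots. A single slot may go negative when
 * a request is counted on one CPU and its response on another; only the
 * sum is meaningful.
 */
static int64_t
percpu_sum(const percpu_sketch *c)
{
	int64_t total = 0;
	unsigned int i;

	for (i = 0; i < NR_CPUS_SKETCH; i++)
		total += c->slot[i];
	return total;
}
```

The cost moves from every update to the rare read: updates stay CPU-local, while computing the total walks all slots, which fits a check done once per `tfw_http_conn_recv_finish` call rather than per packet.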
Add per-socket training_epoch field to track the training generation for connection-related statistics. This allows associating socket events with a specific training period and prevents mixing measurements across training epochs when switching between TRAINING and DEFENCE modes.
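The epoch check might look like the sketch below. It is only an illustration: apart from the `training_epoch` field name and the 16-bit width seen in the diff, every name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical global training generation, bumped on each new training. */
static uint16_t g_training_epoch;

/* Hypothetical stand-in for the per-socket state. */
typedef struct {
	uint16_t training_epoch; /* generation the socket was accounted in */
} sock_sketch;

/* Stamp a new socket with the generation it was created in. */
static void
sock_sketch_init(sock_sketch *sk)
{
	assert(sk);
	sk->training_epoch = g_training_epoch;
}

/* Start a new training period; older sockets become stale. */
static void
training_restart(void)
{
	g_training_epoch++;
}

/*
 * Events from a socket are counted only if the socket belongs to the
 * current generation, so measurements from different periods never mix.
 */
static bool
sock_sketch_is_current(const sock_sketch *sk)
{
	return sk->training_epoch == g_training_epoch;
}
```

A 16-bit epoch wraps after 65536 restarts, so a socket that outlives that many training periods could be mistaken for a current one; whether that matters depends on how often training is restarted.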
Use the adaptive limits library and the `TfwAdaptiveLimitLock` structure for client CPU usage tracking. Record the time at the beginning of `ss_tcp_process_data` and again in `conn_recv_finish`, and use the difference between the two as the client's CPU usage.
Use training library for client memory usage tracking. Use the `TfwAdaptiveLimitLock` structure for client memory usage tracking. In defence mode, the `tfw_http_conn_recv_finish` callback calculates the z-score, compares it with the configured `threshold` and drops the client connection if necessary (the same as for non-idempotent requests). The per-cpu request accounting prevents performance degradation.

Note that memory usage is also adjusted in the per-cpu `mem` storage to check the `soft` and `hard` memory limits. This needs a separate storage, because `TfwAdaptiveLimitLock` is zeroed at the start of each new training and does not account for events from the previous training.

Performance measurements for the whole patchset show no measurable regression:

Training: 1262705 req/s, 1272613 req/s, 1264688 req/s
Defence: 1272456 req/s, 1263205 req/s, 1256504 req/s
Master: 1253438 req/s, 1253207 req/s, 1248473 req/s

Although training and defence modes appear slightly faster than master, the difference is below 2% and falls within normal run-to-run variation. No statistically significant performance impact was observed.