
KIP-932: Add Share Consumer soak tests + high-throughput perf variant #2230

Open
Ankith L (Ankith-Confluent) wants to merge 24 commits into master from dev_kip-932_soak_tests


@Ankith-Confluent

@Ankith-Confluent Ankith L (Ankith-Confluent) commented Apr 14, 2026

Member

What

  • Add a --share flag to soakclient.py that runs a Producer together with a ShareConsumer
  • Support both implicit and explicit acknowledgement modes
  • Add ShareConsumer (KIP-932) coverage to the CKPy soak client and introduce a
    higher-throughput soakclient_perf.py variant for the HT soak

Checklist

  • Contains customer facing changes? Including API/behavior changes
  • Did you add sufficient unit test and/or integration test coverage for this PR?
    • If not, please explain why it is not required

References

JIRA:

Test & Review

Open questions / Follow-ups

@confluent-cla-assistant


🎉 All Contributor License Agreements have been signed. Ready to merge.
Please push an empty commit if you would like to re-run the checks to verify CLA status for all contributors.

@Ankith-Confluent Ankith L (Ankith-Confluent) changed the base branch from dev_kip-932_queues-for-kafka to dev_kip-932_share_consumer_poll April 27, 2026 10:01
Base automatically changed from dev_kip-932_share_consumer_poll to dev_kip-932_queues-for-kafka May 13, 2026 11:41
@k-raina

Member

Ankith L (@Ankith-Confluent)
Could you please rebase this branch?

@airlock-confluentinc airlock-confluentinc Bot force-pushed the dev_kip-932_queues-for-kafka branch from a980019 to b5cb017 May 20, 2026 05:58
@airlock-confluentinc airlock-confluentinc Bot force-pushed the dev_kip-932_soak_tests branch from 78301f5 to 8e9dcd0 May 20, 2026 12:50
@sonarqube-confluent


Quality Gate failed

Failed conditions
0.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube

@airlock-confluentinc airlock-confluentinc Bot force-pushed the dev_kip-932_soak_tests branch from 2978328 to 2153fe2 June 4, 2026 10:24
@airlock-confluentinc airlock-confluentinc Bot force-pushed the dev_kip-932_queues-for-kafka branch from d600e92 to 5e41108 June 9, 2026 09:55
@airlock-confluentinc airlock-confluentinc Bot force-pushed the dev_kip-932_soak_tests branch from 770ec69 to 449ec68 June 11, 2026 16:35
@sonarqube-confluent


Quality Gate failed

Failed conditions
1.1% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube

@airlock-confluentinc airlock-confluentinc Bot force-pushed the dev_kip-932_soak_tests branch from 2ad627d to 4f8327e June 19, 2026 15:54
Base automatically changed from dev_kip-932_queues-for-kafka to master June 23, 2026 08:53
Copilot AI review requested due to automatic review settings June 23, 2026 16:30
@airlock-confluentinc airlock-confluentinc Bot force-pushed the dev_kip-932_soak_tests branch from 65ff773 to 067298d June 23, 2026 16:30
@Ankith-Confluent Ankith L (Ankith-Confluent) removed the request for review from Copilot June 23, 2026 16:30
Copilot AI review requested due to automatic review settings June 23, 2026 16:39
@Ankith-Confluent Ankith L (Ankith-Confluent) removed the request for review from Copilot June 23, 2026 16:39
Copilot AI review requested due to automatic review settings June 23, 2026 16:58
@Ankith-Confluent Ankith L (Ankith-Confluent) removed the request for review from Copilot June 23, 2026 16:58
@sonarqube-confluent


Copilot AI review requested due to automatic review settings July 1, 2026 05:04
@airlock-confluentinc airlock-confluentinc Bot force-pushed the dev_kip-932_soak_tests branch from c2aad2f to b219457 July 1, 2026 05:04
@Ankith-Confluent Ankith L (Ankith-Confluent) removed the request for review from Copilot July 1, 2026 05:04
Copilot AI review requested due to automatic review settings July 1, 2026 05:05
@Ankith-Confluent Ankith L (Ankith-Confluent) removed the request for review from Copilot July 1, 2026 05:05
Copilot AI review requested due to automatic review settings July 1, 2026 05:20
@Ankith-Confluent Ankith L (Ankith-Confluent) removed the request for review from Copilot July 1, 2026 05:20
@Ankith-Confluent Ankith L (Ankith-Confluent) changed the title from "KIP-932: Add ShareConsumer support to soak tests" to "KIP-932: Add ShareConsumer soak tests + high-throughput perf variant" Jul 1, 2026
@Ankith-Confluent Ankith L (Ankith-Confluent) changed the title from "KIP-932: Add ShareConsumer soak tests + high-throughput perf variant" to "KIP-932: Add Share Consumer soak tests + high-throughput perf variant" Jul 1, 2026
Copilot AI review requested due to automatic review settings July 1, 2026 08:24
@Ankith-Confluent Ankith L (Ankith-Confluent) removed the request for review from Copilot July 1, 2026 08:24

@k-raina Kaushik Raina (k-raina) left a comment

Member

Is there scope to factor out the common code between soakclient.py and soakclient_perf.py?

Comment thread tests/soak/soakclient.py Outdated

if enable_share:
    if not HAS_SHARE_CONSUMER:
        raise RuntimeError("ShareConsumer requested but not available in this " "confluent_kafka build.")

Member

If the share consumer fails, should the producer also shut down?
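
To make the question concrete, here is a minimal sketch of one way a share consumer failure could stop the producer as well. It assumes both workers run as threads and share a stop flag; `SoakSketch`, `run_sketch`, and the simulated failure are hypothetical and not taken from the PR.

```python
import threading


class SoakSketch:
    """Hypothetical stand-in for the soak client; not the PR's class."""

    def __init__(self):
        # Shared stop flag: any worker that fails sets it, all workers poll it.
        self.stop = threading.Event()
        self.produced = 0

    def producer_run(self):
        while not self.stop.is_set():
            self.produced += 1
            # wait() paces the loop and returns early once stop is set.
            self.stop.wait(0.001)

    def share_consumer_run(self):
        try:
            raise RuntimeError("simulated share consumer failure")
        except Exception:
            # Propagate the failure so the producer does not keep running alone.
            self.stop.set()


def run_sketch():
    client = SoakSketch()
    producer = threading.Thread(target=client.producer_run)
    consumer = threading.Thread(target=client.share_consumer_run)
    producer.start()
    consumer.start()
    consumer.join(timeout=5)
    producer.join(timeout=5)
    # Returns (stop flag set?, producer thread still alive?)
    return client.stop.is_set(), producer.is_alive()
```

Whether the real client should behave this way is a design decision for the PR; the sketch only shows that a shared flag is enough to couple the two lifetimes.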

Comment thread tests/soak/soakclient.py
self.share_err_cnt += 1
self.incr_counter("consumer.error", 1)
else:
self.share_consumer.commit_async()

Member

Should we plan to add an acknowledgement callback to catch errors?

Member Author

Yes, we can do that. When I implemented this, that mechanism was not yet available on the librdkafka side.
Will add.

Comment thread tests/soak/soakclient.py
if hw > 0:
    if msg.offset() <= hw:
        self.logger.warning(
            "share: Old or duplicate message {} "

Member

Does this logic print "Old or duplicate message" for delivery retries?

Member Author

Yes, I will change the log text.

Comment thread tests/soak/soakclient.py
self.msg_err_cnt += 1
self.incr_counter("consumer.msgerr", 1)

self.msg_cnt += 1

Member

Do we want to count corrupt messages also?

Member Author

I think we should not count them, because another metric already counts the errors.
Counting them here as well would pollute the readings.

Comment thread tests/soak/soakclient.py Outdated
self.share_err_cnt += 1
self.incr_counter("consumer.error", 1)

self.share_consumer.close()

Member

nit: should close be moved into finally?
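
As an illustration of this nit, a minimal sketch of closing in `finally` so that `close()` runs whether or not the loop raises. `StubShareConsumer` and this `share_consumer_run` are hypothetical stand-ins, not the PR's code or the real ShareConsumer API.

```python
class StubShareConsumer:
    """Hypothetical stub; it only records whether close() was called."""

    def __init__(self):
        self.closed = False

    def poll(self, timeout):
        raise RuntimeError("simulated poll failure")

    def close(self):
        self.closed = True


def share_consumer_run(consumer, errors):
    try:
        consumer.poll(1.0)
    except Exception as ex:
        # Record the failure; the real client would log and count it.
        errors.append(str(ex))
    finally:
        # Runs on normal exit, on handled errors, and on unhandled ones.
        consumer.close()
```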

Comment thread tests/soak/soakclient.py Outdated
    self.logger.info("share: aborted by user")
    self.run = False
except Exception as ex:
    self.logger.fatal("share: fatal exception: {}\n{}".format(ex, traceback.print_exc()))

Member

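Suggested change
- self.logger.fatal("share: fatal exception: {}\n{}".format(ex, traceback.print_exc()))
+ self.logger.fatal("share: fatal exception: {}\n{}".format(ex, traceback.format_exc()))

Check whether format_exc is more relevant here: print_exc() prints the traceback and returns None, so the formatted log line would end in "None". The same applies to the rest of the files.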
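
The difference between the two standard-library helpers can be checked with a short sketch. `compare_traceback_helpers` is a hypothetical name used only for this illustration.

```python
import contextlib
import io
import traceback


def compare_traceback_helpers():
    try:
        raise ValueError("boom")
    except ValueError:
        # print_exc() writes the traceback to stderr and returns None.
        with contextlib.redirect_stderr(io.StringIO()):
            printed = traceback.print_exc()
        # format_exc() returns the traceback as a string, suitable for logging.
        formatted = traceback.format_exc()
    return printed, formatted
```

Passing the result of `print_exc()` to `str.format` therefore interpolates the text `None`, while `format_exc()` interpolates the traceback itself.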

Comment thread tests/soak/soakclient.py

  # Create topic (might already exist)
- aconf = filter_config(conf, ["consumer.", "producer."], "admin.")
+ aconf = filter_config(conf, ["consumer.", "producer.", "share."], "admin.")

Member

Adding "share." to the filter config changes share.acknowledgement.mode -> acknowledgement.mode, which is an invalid config. I think this will cause a crash in explicit mode. Is my understanding correct?

Member Author

The second arg to filter_config is the drop list, not a rename. So share.acknowledgement.mode is dropped from aconf, not renamed to acknowledgement.mode.
Explicit mode still works because sconf['share.acknowledgement.mode'] = 'explicit' is set directly on the share conf after filter_config.
I verified it end-to-end with --share --explicit.
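
The drop-list behaviour described in this reply can be sketched as follows. This is a reimplementation for illustration only, not the function in soakclient.py: it assumes the second argument lists prefixes to drop (as stated above) and that the third argument is a prefix stripped from the remaining keys (an assumption). The sample `conf` values are made up.

```python
def filter_config(conf, filter_out, strip_prefix):
    """Sketch: drop keys matching any filter_out prefix, strip strip_prefix."""
    out = {}
    for key, value in conf.items():
        # Keys with a dropped prefix are skipped entirely, never renamed.
        if any(key.startswith(prefix) for prefix in filter_out):
            continue
        # Assumed behaviour: only the strip prefix is removed from key names.
        if key.startswith(strip_prefix):
            key = key[len(strip_prefix):]
        out[key] = value
    return out


# Hypothetical sample configuration.
conf = {
    "bootstrap.servers": "localhost:9092",
    "admin.request.timeout.ms": 30000,
    "share.acknowledgement.mode": "explicit",
    "consumer.group.id": "soak",
}
aconf = filter_config(conf, ["consumer.", "producer.", "share."], "admin.")
```

Under these assumptions `aconf` contains neither `share.acknowledgement.mode` nor `acknowledgement.mode`, which matches the explanation that the key is dropped from the admin config rather than renamed.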

Copilot AI review requested due to automatic review settings July 1, 2026 10:26
@Ankith-Confluent Ankith L (Ankith-Confluent) removed the request for review from Copilot July 1, 2026 10:26
@Ankith-Confluent

Member Author

Thanks! Kaushik Raina (@k-raina)
I have addressed the comments.
Regarding the suggestion to factor out the common code, I think it's better to keep the two scripts separate and simple. Otherwise, any future change to the perf script would also require changing the soak client.
What are your thoughts about this?

@k-raina

Kaushik Raina (k-raina) commented Jul 1, 2026

Member

In the current codebase, the two files (soakclient.py and soakclient_perf.py) overlap heavily:

  • Out of the 31 function definitions, 28 are byte-for-byte identical.
  • Only 3 functions differ in actual behavior: SoakRecord.__init__, SoakRecord.serialize, producer_run

Because the two files are largely copies, any fix applied to one must be manually ported to the other. Over time this leads to silent, unintended drift (for example, divergence in logging detail, error handling, etc.).

One way to address this is to make soakclient.py the base class and soakclient_perf.py a subclass.
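
A rough sketch of what the base-class approach could look like for the record type, overriding only the methods that differ. The field names, `PerfSoakRecord`, and `payload_size` are hypothetical; the actual record layout in the two files is not shown in this thread.

```python
import json
import time


class SoakRecord:
    """Hypothetical base record; shared behaviour lives here once."""

    def __init__(self, msgid):
        self.msgid = msgid
        self.ts = time.time()

    def serialize(self):
        return json.dumps({"msgid": self.msgid, "ts": self.ts}).encode("utf-8")


class PerfSoakRecord(SoakRecord):
    """Hypothetical perf variant; overrides only what differs."""

    def __init__(self, msgid, payload_size=64):
        super().__init__(msgid)
        # Assumed difference: the perf variant carries a fixed-size payload.
        self.payload = b"x" * payload_size

    def serialize(self):
        return super().serialize() + b"\n" + self.payload
```

With this shape, a fix to the shared logic in the base class reaches the perf variant without manual porting.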

Copilot AI review requested due to automatic review settings July 2, 2026 04:20
@Ankith-Confluent Ankith L (Ankith-Confluent) removed the request for review from Copilot July 2, 2026 04:20
Copilot AI review requested due to automatic review settings July 2, 2026 05:01
@Ankith-Confluent Ankith L (Ankith-Confluent) removed the request for review from Copilot July 2, 2026 05:01
@sonarqube-confluent


@k-raina Kaushik Raina (k-raina) left a comment

Member

Overall LGTM!
Since the delta between the files is tiny, runtime branching works fine for this PR. In the future, if there are many hooks or variants, we might need to revisit this and adopt inheritance.

Comment thread tests/soak/soakclient.py
self.msg_err_cnt = 0
self.consumer_err_cnt = 0
self.consumer_error_cb_cnt = 0
self.last_commited = None

Member

Does this need to be?

Suggested change
self.last_committed = None

Comment thread tests/soak/soakclient.py
self.incr_counter("consumer.msg", 1)

# end-to-end latency
headers = dict(msg.headers())

Member

Nit: Does this crash for a message with no headers?
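
A small sketch of a defensive variant, assuming `headers()` may return None when a message carries no headers, in which case `dict(None)` would raise TypeError. `StubMessage` and `headers_as_dict` are hypothetical helpers for illustration.

```python
class StubMessage:
    """Hypothetical stand-in for a consumed message."""

    def __init__(self, headers):
        self._headers = headers

    def headers(self):
        return self._headers


def headers_as_dict(msg):
    # Fall back to an empty list so dict() never receives None.
    return dict(msg.headers() or [])
```

The caller would still need to handle a missing timestamp header before computing end-to-end latency.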

Comment thread tests/soak/soakclient.py
# duration the OS can actually honor. -r remains the target rate
# (an upper bound: if a batch takes longer than its time budget,
# we run as fast as we can, below -r).
batch = max(1, int(self.rate / 100)) # ~100 batches/sec

Member

Note: By default, the rate is hardcoded to 80 in run.sh:

time opentelemetry-instrument $testdir/soakclient.py -i $TESTID -t $topic -r 80 -f $1 $share_flag $explicit_flag $perf_flag

So each batch is one message and batching only kicks in at -r ≥ 200.
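
The arithmetic behind this note can be checked by isolating the batch expression from the snippet above. `batch_size` is a hypothetical wrapper name.

```python
def batch_size(rate):
    # Same expression as in the reviewed snippet: roughly 100 batches/sec.
    return max(1, int(rate / 100))


# int() truncates, so every rate below 200 yields a batch of one message.
sizes = {rate: batch_size(rate) for rate in (80, 199, 200, 1000)}
```

At the default `-r 80`, `int(80 / 100)` is 0 and `max` raises it to 1; the first rate that gives a batch larger than one message is 200.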



3 participants