optimize performance of array_to_qualitystring #1363

Merged
jmarshall merged 3 commits into pysam-developers:master from jchorl:jchorl/perf
Apr 17, 2026

@jchorl
Contributor

@jchorl jchorl commented Oct 2, 2025

I was profiling some code and found that the majority of the time was spent in array_to_qualitystring. This is particularly impactful on huge files with tons of reads.

The culprit is the allocation, copying, and computation in Python. This optimization should allow all of the logic to be compiled down to C.
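For readers unfamiliar with the function, here is a pure-Python sketch of the kind of conversion involved. It assumes the usual FASTQ-style encoding, in which each quality value is shifted by an offset (33 by default) and emitted as an ASCII character. The helper names are hypothetical and this is not pysam's implementation; it only illustrates the per-element allocation, copying, and arithmetic described above.

```python
from array import array


def qualities_to_string_loop(qualities, offset=33):
    """Hypothetical helper: per-element conversion in interpreted Python."""
    if qualities is None:
        return None
    # Allocate an output buffer of the same length (zero-filled).
    result = array("B", bytes(len(qualities)))
    # One interpreted loop iteration per quality value.
    for i in range(len(qualities)):
        result[i] = qualities[i] + offset
    return result.tobytes().decode("ascii")


def qualities_to_string_bytes(qualities, offset=33):
    """Hypothetical helper: same result, built directly as a bytes object."""
    if qualities is None:
        return None
    return bytes(q + offset for q in qualities).decode("ascii")


quals = array("B", [0, 10, 20, 30, 40])
print(qualities_to_string_loop(quals))
print(qualities_to_string_bytes(quals))
```

Both helpers still run their arithmetic in the interpreter. The point of the PR, as described, is to let that loop compile to C instead.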

Bench results:

Before:

---------------------------------------------------------- benchmark: 1 tests ----------------------------------------------------------
Name (time in us)                           Min       Max     Mean  StdDev   Median     IQR   Outliers  OPS (Kops/s)  Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------
test_fasta_iteration_long_sequences     75.7550  126.3460  78.9250  2.1202  78.4110  0.8720  1160;1541       12.6703   11453           1
----------------------------------------------------------------------------------------------------------------------------------------

After:

-------------------------------------------------------- benchmark: 1 tests -------------------------------------------------------
Name (time in us)                          Min      Max    Mean  StdDev  Median     IQR  Outliers  OPS (Kops/s)  Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------
test_fasta_iteration_long_sequences     1.2620  14.7180  1.3264  0.1447  1.3130  0.0200  409;1397      753.9372   45268           1
-----------------------------------------------------------------------------------------------------------------------------------
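
As a quick reading aid, the speedup implied by the two tables can be computed from the reported means and throughput. The numbers below are copied from the tables; the variable names are only for illustration.

```python
# Mean time per round in microseconds, copied from the benchmark tables.
mean_before_us = 78.9250
mean_after_us = 1.3264

# Throughput in thousands of operations per second, from the same tables.
kops_before = 12.6703
kops_after = 753.9372

speedup_by_mean = mean_before_us / mean_after_us
speedup_by_ops = kops_after / kops_before

print(f"{speedup_by_mean:.1f}x by mean time, {speedup_by_ops:.1f}x by throughput")
```

Both ratios come out at a little under 60x for this benchmark.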

@jmarshall
Member

Thanks, this looks like a good approach.

Eventually I want to add entry points to HTSlib so that we can just call HTSlib's SIMD-optimised versions of these conversions, but this is a big win in the meantime.

@jchorl
Contributor Author

jchorl commented Oct 14, 2025

@jmarshall what would be the process to get this merged/released?

@jchorl
Contributor Author

jchorl commented Feb 5, 2026

> @jmarshall what would be the process to get this merged/released?

@jmarshall I was just profiling a process and again found this to be a bottleneck. Any chance we can get this merged?

The data is contiguous so use [::1] to omit stride calculations;
use size_t rather than ssize_t to omit check for end-relative indexing.
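
The two points in this commit message can be illustrated from plain Python, as a sketch rather than the Cython code in the PR. In Cython, declaring a typed memoryview with `[::1]` states that its elements are adjacent in memory, and an unsigned index can never be negative, so no end-relative (wraparound) handling is needed. Python's built-in `memoryview` exposes the corresponding properties at run time; the sample values are arbitrary.

```python
from array import array

quals = array("B", [0, 10, 20, 30, 40])
view = memoryview(quals)

# Contiguous: each element sits one item-size after the previous one,
# so locating element i needs no general stride multiplication.
assert view.c_contiguous
assert view.strides == (1,)

# A strided slice is a view over the same buffer, but it is not contiguous.
every_other = view[::2]
assert not every_other.c_contiguous
assert every_other.strides == (2,)

# End-relative indexing: a signed index may be negative and has to be
# checked and adjusted; an unsigned index never needs that branch.
assert view[-1] == view[len(view) - 1] == 40
```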
@jmarshall jmarshall merged commit 48688fc into pysam-developers:master Apr 17, 2026
16 of 17 checks passed
@jchorl
Contributor Author

jchorl commented Apr 17, 2026

Thank you for pushing this through!

@jmarshall
Member

Thanks for diving into memoryviews! Clearly we should see if there is other pysam code that would benefit from them as well.

The HTSlib approach I mentioned is now samtools/htslib#1974 but it will be a while before that lands.
