Provide a way to prevent overshooting past IEND (fixes #531)#684

Open
ElliotSis wants to merge 3 commits into image-rs:master from ElliotSis:boundary-preserving-mode

Conversation

Contributor

@ElliotSis ElliotSis commented May 5, 2026

Buffering issues addressed

  1. Overshoot (fixes Avoid reading beyond the IEND chunk #531): Standard BufReader can overshoot and read past the IEND chunk boundary. This fix is needed to support decoding concatenated images from a single stream (simulated in Skia's Codec_end test), where the stream must be left positioned exactly at the end of the first image's IEND chunk for subsequent decodes to work.
  2. Blocking reads from pipe-like inputs: When decoding PNGs from blocking, non-seekable inputs (like pipes), standard buffering (e.g., BufReader's 8 KiB refills) causes JNI/Skia layers to greedily request more bytes. Once the PNG data is fully consumed, the JNI loop calls read() again to fill the remaining buffer capacity, blocking indefinitely. This is highlighted by a timeout failure in the AOSP CTS testDecodePngFromPipe test.
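The overshoot in issue 1 can be reproduced with nothing but the standard library: a single BufReader refill pulls far more bytes from the underlying stream than the logical read consumed. This is a minimal sketch where a 10-byte prefix stands in for a first image ending at its IEND chunk:

```rust
use std::io::{BufReader, Cursor, Read, Seek};

fn main() {
    // Two logical "images" concatenated in one stream; the first
    // occupies bytes 0..10 (standing in for data up to IEND).
    let data: Vec<u8> = (0u8..100).collect();
    let mut inner = Cursor::new(data);
    {
        let mut reader = BufReader::new(&mut inner);
        let mut first = [0u8; 10];
        reader.read_exact(&mut first).unwrap(); // logical position: 10
    } // BufReader dropped; its unread buffered bytes are lost
    // The single refill already consumed the entire stream, so the
    // underlying cursor is far past the first image's boundary.
    assert_eq!(inner.stream_position().unwrap(), 100);
}
```

A second decoder handed this stream would start 90 bytes too late, which is exactly the failure mode the Skia Codec_end test exercises.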

Seek vs limited reads approach

Although issue 1 (overshoot) could theoretically be resolved by seeking backward to the correct boundary after a read that's too wide, this approach does not seem adequate for issue 2 because pipes are not genuinely seekable.

The AOSP CTS test deadlock occurs because the main test thread retains an open reference to the pipe's writeFd even after the writeThread closes the FileOutputStream, preventing the pipe from naturally closing and returning EOF. Theoretically, we could fix the test by closing writeFd in the main thread, but that would only hide a deeper production issue: if a standard decoder eagerly over-reads on an open, blocking stream (where the writer remains alive), the read blocks the entire decode thread indefinitely even after the image has been fully received.

To address both issues with a single solution, a limited-buffering approach is implemented that caps reads at the dynamic state boundary, and the currently unused Seek bound is removed.

How it's achieved

  • Introduced a new public trait LimitBufRead that abstracts controlled buffering, replacing the generic BufRead bound on Decoder and Reader in a backwards-compatible manner. The trait is deliberately left unsealed so that downstream callers can implement it themselves (e.g., a future SkStream integration).
  • Exposed a LimitBufReader wrapper that restricts buffer refills to the bytes expected by the decoder state machine.
  • Added expected_read_limit to StreamingDecoder to calculate expected bytes dynamically.
  • Added regression tests in tests/limit_buf_read.rs verifying concatenated stream decodes and eager-buffered blocking source handling (via BlockOnLimit simulating blocking buffering wrappers).

Testing

To verify the correctness of the new LimitBufRead and LimitBufReader APIs, I ran all existing integration and unit tests with the BufReaders replaced by LimitBufReader and the Cursors wrapped in LimitBufReader. All tests passed.

Benchmarks

Additionally, I ran the decoder criterion benchmark suite across four distinct configurations to collect comparative metrics:

  1. Memory standard (Cursor<&[u8]>)
  2. Memory limit (Cursor<&[u8]> wrapped in a LimitBufReader)
  3. File standard (File wrapped in a BufReader)
  4. File limit (File wrapped in a LimitBufReader)

The raw benchmark metrics are compiled below:

<details>
<summary>Click to expand benchmark results table (on my machine)</summary>

| Benchmark Image / Case | Memory Standard | Memory Limit | Memory Overhead | File Standard | File Limit | File Overhead |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| **1. Real-world files (decode)** | | | | | | |
| `Fantasy_Digital_Painting.png` | 23.75 ms | 23.99 ms | +1.0% | 24.51 ms | 24.56 ms | +0.2% |
| `Exoplanet_Phase_Curve_(Diagram).png` | 29.20 ms | 28.86 ms | -1.1% | 29.32 ms | 29.48 ms | +0.5% |
| `Lohengrin_Illustrated.png` | 23.96 ms | 24.90 ms | +3.9% | 26.96 ms | 27.36 ms | +1.5% |
| `paletted-zune.png` | 11.06 ms | 11.16 ms | +1.0% | 11.52 ms | 11.68 ms | +1.3% |
| `kodim02.png` | 3.68 ms | 3.72 ms | +1.1% | 3.86 ms | 3.92 ms | +1.5% |
| `kodim07.png` | 4.70 ms | 4.71 ms | +0.4% | 4.86 ms | 4.91 ms | +1.0% |
| `kodim17.png` | 3.91 ms | 3.92 ms | +0.3% | 4.06 ms | 4.08 ms | +0.6% |
| `kodim23.png` | 3.58 ms | 3.63 ms | +1.4% | 3.79 ms | 3.90 ms | +2.8% |
| `Exoplanet_indexed_gimp.png` | 8.89 ms | 8.93 ms | +0.4% | 9.28 ms | 9.31 ms | +0.4% |
| `lorem_ipsum_screenshot.png` | 1.22 ms | 1.23 ms | +1.2% | 1.29 ms | 1.36 ms | +5.8% |
| `lorem_ipsum_oxipng.png` | 800.91 µs | 800.60 µs | 0.0% | 833.58 µs | 850.02 µs | +2.0% |
| `tango-icon-new-128.png` | 109.17 µs | 109.57 µs | +0.4% | 121.41 µs | 137.98 µs | +13.7% |
| `tango-icon-new-32.png` | 13.87 µs | 14.16 µs | +2.1% | 23.34 µs | 37.21 µs | +59.4% |
| `tango-icon-new-16.png` | 7.73 µs | 8.04 µs | +4.1% | 17.15 µs | 41.73 µs | +143.4% |
| `Transparency.png` | 87.30 µs | 88.26 µs | +1.1% | 97.66 µs | 125.38 µs | +28.4% |
| **2. 4K chunks (Fragmented)** | | | | | | |
| `8x8.png` | 972.78 ns | 1.18 µs | +21.3% | 10.07 µs | 20.09 µs | +99.5% |
| `128x128.png` | 13.02 µs | 14.33 µs | +10.1% | 32.41 µs | 94.49 µs | +191.5% |
| `2048x2048.png` | 3.99 ms | 4.48 ms | +12.3% | 8.42 ms | 21.92 ms | +160.3% |
| `12288x12288.png` | 195.50 ms | 171.30 ms | -12.4% | 316.29 ms | 815.29 ms | +157.8% |
| **3. 64K chunks (Stable)** | | | | | | |
| `128x128.png` | 11.27 µs | 11.99 µs | +6.4% | 29.19 µs | 43.61 µs | +49.4% |
| `2048x2048.png` | 3.34 ms | 3.26 ms | -2.3% | 6.89 ms | 7.89 ms | +14.5% |
| `12288x12288.png` | 133.93 ms | 141.75 ms | +5.8% | 243.02 ms | 275.64 ms | +13.4% |
| **4. 2G chunks (Massive-chunk)** | | | | | | |
| `2048x2048.png` | 2.79 ms | 3.04 ms | +8.8% | 6.51 ms | 7.25 ms | +11.5% |
| `12288x12288.png` | 131.59 ms | 142.90 ms | +8.6% | 242.90 ms | 251.84 ms | +3.7% |
| **5. Incremental row-by-row** | | | | | | |
| `128x128-4k-idat` | 13.81 µs | 15.16 µs | +9.8% | 33.42 µs | 92.27 µs | +176.1% |
</details>

Duplicating all the tests and benchmarks on CI felt redundant, and omitting them keeps the PR diff clean and focused. Let me know if you think otherwise.

Conclusions and BufReader by-default rationale

  1. Negligible CPU overhead in memory decoding: In memory-to-memory decoding (Cursor), LimitBufReader introduces low overhead (+21% on the tiny fragmented 8x8.png, otherwise negligible on most files), showing that the limit tracking itself has little CPU cost.
  2. System-call overhead under file I/O: Under file I/O, the overhead is dominated by context-switch costs. When chunk structures are highly fragmented (4K chunks), LimitBufReader must refill the buffer in smaller increments to respect boundaries, resulting in much higher overhead (e.g., +160% to +190% regressions in pathological cases).
  3. Low overhead on standard real-world files: For normal real-world photos and diagrams which use larger, continuous IDAT chunks, the File I/O overhead of LimitBufReader remains low (mostly between 1% and 15%).

This data supports keeping standard BufReader as the default decoder baseline for general performance, while exposing LimitBufReader as an opt-in wrapper with relatively low overhead for most files.

@ElliotSis ElliotSis force-pushed the boundary-preserving-mode branch 4 times, most recently from 703f7ca to a88c3a9 Compare May 6, 2026 03:20
@ElliotSis ElliotSis force-pushed the boundary-preserving-mode branch from a88c3a9 to 1e33fd5 Compare May 6, 2026 09:59
@Shnatsel
Member

Shnatsel commented May 8, 2026

If we're looking at supporting reading directly from pipes, I think we should go ahead and drop the Seek bound, otherwise we'll just get runtime errors on non-seekable sources if we ever do make use of it in the future. Right now the Seek bound seems to be unused. Removing it would be a non-semver-breaking change as far as I can tell, since we're only relaxing the bounds on the inputs.

The original PR that introduced the Seek bound was #558. The motivation was the ability to delay extracting metadata and not wasting memory on reading it when it's not needed, specifically #604

@197g has just wrapped up and merged a rework of metadata API in image in image-rs/image#2672 so I'd like to hear from them about the image requirements.

@Shnatsel
Member

Shnatsel commented May 8, 2026

@ElliotSis could you document the motivation for making a custom limited reader with custom bookkeeping instead of using std::io::Read::take()? Looking at the code, I can't help but wonder if this can be achieved with less complexity by leveraging the standard library.

@ElliotSis
Contributor Author

If we're looking at supporting reading directly from pipes, I think we should go ahead and drop the Seek bound, otherwise we'll just get runtime errors on non-seekable sources if we ever do make use of it in the future. Right now the Seek bound seems to be unused. Removing it would be a non-semver-breaking change as far as I can tell, since we're only relaxing the bounds on the inputs.

Yeah, I’ve considered doing that but I didn’t have the context as to why the Seek bound was introduced in the first place. If the bound is necessary to support future functionality I wonder if we could handle that by specializing the implementation later instead?

Since removing it is a non-semver-breaking relaxation and safe to do, I went ahead and updated the PR to drop the Seek bound.

could you document the motivation for making a custom limited reader with custom bookkeeping instead of using std::io::Read::take()? Looking at the code, I can't help but wonder if this can be achieved with less complexity by leveraging the standard library.

To give some context, the requirements I was trying to satisfy were:

  1. Backwards compatibility: Avoid breaking existing BufRead callers, requiring major version bump, or introducing a new API to use the decoder
  2. Opt-in flexibility: As highlighted in the benchmark results, limiting reads (while still buffering chunk/image data) can cause up to a 200% performance regression in pathological cases (like highly fragmented PNGs), so we may want to make this opt-in
  3. Still buffer large chunks: The wrapper must still rely on buffering for large chunks to minimize syscalls (keeping the performance regression on real-world images to a negligible 1.5% to 15%)
  4. Buffering capacity flexibility: Give the caller the flexibility of choosing the buffer capacity, just like BufReader allows today
  5. Buffering implementation flexibility: Just like with BufRead, the caller should be able to implement their own custom buffering layers (e.g., to potentially avoid double-buffering)

Considering the above, I think introducing a new trait and blanket impl for existing BufRead would be necessary (if you generally agree with these requirements).

However, I think you’re right that the LimitBufReader itself does not have to implement manual bookkeeping. It can likely delegate to Take by using a BufReader<Take<R>>. This works by allowing dynamic limit updates via buf_reader.get_mut().set_limit(...). I’ve updated the PR to do that too.
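The BufReader<Take<R>> delegation described above can be sketched with std types alone; the hard-coded limit of 10 stands in for whatever the decoder's expected_read_limit reports at that point:

```rust
use std::io::{BufReader, Cursor, Read, Seek};

fn main() {
    let data: Vec<u8> = (0u8..100).collect();
    let inner = Cursor::new(data);
    // Start with a zero limit; the decoder raises it before each refill.
    let mut reader = BufReader::new(inner.take(0));
    // e.g. the state machine expects 10 more bytes next.
    reader.get_mut().set_limit(10);
    let mut first = [0u8; 10];
    reader.read_exact(&mut first).unwrap();
    // The refill stopped at the limit: the inner cursor sits exactly
    // on the boundary instead of 8 KiB past it.
    assert_eq!(reader.get_mut().get_mut().stream_position().unwrap(), 10);
}
```

Here the first get_mut() reaches the Take, and the second reaches the underlying Cursor; Take::set_limit makes the dynamic limit updates possible without any manual bookkeeping.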

@ElliotSis ElliotSis force-pushed the boundary-preserving-mode branch from 496f0ae to 5735215 Compare May 8, 2026 14:09
@ElliotSis
Contributor Author

FYI: I also ended up removing the Seek impl from LimitBufReader entirely. Take implements Seek starting Rust 1.80, but the toolchain used here is 1.73. Given we’ve just removed the Seek bound from the API, I think that’s reasonable.

@ElliotSis ElliotSis force-pushed the boundary-preserving-mode branch from 5735215 to f058761 Compare May 8, 2026 20:17
…port pipes

- Refactored `LimitBufReader` to be a newtype wrapper around `std::io::BufReader<std::io::Take<R>>`, completely eliminating custom buffering bookkeeping while preserving correctness.
- Dropped the `Seek` bound from `Decoder`, `Reader`, and `ReadDecoder` since it was unused by the decoding logic. Relaxing this bound enables natively decoding from non-seekable sources like pipes without compilation or runtime blockers.
- Cleaned up tests by removing unnecessary `Seek` bounds and mock `Seek` implementations.
@ElliotSis ElliotSis force-pushed the boundary-preserving-mode branch from f058761 to bacbe88 Compare May 8, 2026 20:18
@197g
Member

197g commented May 8, 2026

Well, technically we may only need Seek-bounds on some methods for going back to the first IDAT frame. So if that needs to be dropped from other parts to allow for an easier path

@ElliotSis
Contributor Author

@197g the PR currently drops the Seek bound from the public interface, I assume this is a comment about future changes when/if we actually introduce Seek-based methods? Or did you have a specific method in mind that should keep the bound in this PR?

Thanks both for the feedback!

