LibrePDF · andreasrosdal · May 18, 2026
diff --git a/openpdf-renderer/README.md b/openpdf-renderer/README.md
@@ -95,22 +95,98 @@ in-tree legacy parser (`PDFFile`, `PDFPage`, `PDFParser`,
 | Content-stream operator listing (`getContentOperators`) | `openpdf-core` (`PdfContentParser`) |
 | Page rasterization (`renderPage`) | `openpdf-core` (`PdfContentParser`) → Java2D via `OpenPdfCorePageRenderer` |
 
-The Java2D rasterizer (`OpenPdfCorePageRenderer`) supports the standard subset
-of PDF operators needed for typical text + simple-vector PDFs: graphics state
-(`q`/`Q`/`cm`), path construction (`m`/`l`/`c`/`v`/`y`/`re`/`h`), path
-painting (`S`/`s`/`f`/`f*`/`B`/`B*`/`b`/`b*`/`n`), line width (`w`),
-DeviceGray/DeviceRGB colors (`g`/`G`/`rg`/`RG`), and the full text-object
-machinery (`BT`/`ET`/`Tf`/`Tc`/`Tw`/`TL`/`Tz`/`Td`/`TD`/`Tm`/`T*`/`Tj`/`TJ`/`'`/`"`).
-Operators outside this subset (extended graphics state `gs`, CMYK / pattern /
-shading colors, XObject `Do`, inline images, marked content, clipping
-`W`/`W*`, ...) are parsed but currently ignored — pages that rely heavily on
-them may render with missing content. Adding more operators is a localized
-change in `OpenPdfCorePageRenderer`.
-
-For pages that exercise features outside the supported subset and need
+The Java2D rasterizer (`OpenPdfCorePageRenderer`) supports a broad subset of
+PDF content-stream operators &mdash; sufficient for typical text + vector PDFs:
+
+| Category | Operators |
+|---|---|
+| Graphics state | `q`, `Q`, `cm`, `gs` (alpha `CA`/`ca`, line styling `LW`/`ML`/`LC`/`LJ`/`D`, stroke-adjust `SA`) |
+| Line style | `w` (including the PDF §8.4.3.2 zero-width hairline rule), `J`, `j`, `M`, `d`, `i` |
+| Path construction | `m`, `l`, `c`, `v`, `y`, `re`, `h` |
+| Path painting | `S`, `s`, `f`, `F`, `f*`, `B`, `B*`, `b`, `b*`, `n` |
+| Clipping | `W`, `W*` |
+| Colors (DeviceGray / DeviceRGB / DeviceCMYK) | `g`, `G`, `rg`, `RG`, `k`, `K`, `cs`, `CS`, `sc`, `SC`, `scn`, `SCN` |
+| Text objects, state and positioning | `BT`, `ET`, `Tf`, `Tc`, `Tw`, `TL`, `Tz`, `Td`, `TD`, `Tm`, `T*`, `Ts` |
+| Text showing | `Tj`, `TJ`, `'`, `"` |
+| XObjects | `Do` (see below) |
+| Marked content / compatibility (no-op) | `BMC`, `BDC`, `EMC`, `MP`, `DP`, `BX`, `EX` |
+
+XObject coverage:
+
+- Form XObjects render recursively, applying their own `/Matrix` and `/BBox`
+  under the current CTM with full state save/restore.
+- Image XObjects decode via `ImageIO` for JPEG (`DCTDecode`) and JPEG 2000
+  (`JPXDecode`, where the runtime supports it), and via a manual raster
+  builder for uncompressed / Flate-decoded 8-bit DeviceGray, DeviceRGB and
+  DeviceCMYK streams (CMYK approximated to sRGB on the fly). 8-bit Indexed
+  color images are expanded through their palette into the base color space
+  (DeviceGray / DeviceRGB / DeviceCMYK).
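+
+The palette expansion for 8-bit Indexed images can be sketched roughly as
+follows. This is an illustration, not the renderer's actual code: the class
+`IndexedExpand` and its method are hypothetical, and the sketch assumes a
+DeviceRGB base space with a packed lookup table of 3 bytes per entry.
+
+```java
+import java.awt.image.BufferedImage;
+
+public final class IndexedExpand {
+    /** Expands 8-bit palette indices into an RGB image (hypothetical helper). */
+    public static BufferedImage expand(byte[] indices, byte[] palette, int width, int height) {
+        if (palette.length < 3) {
+            throw new IllegalArgumentException("palette needs at least one RGB entry");
+        }
+        int entries = palette.length / 3;
+        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
+        for (int y = 0; y < height; y++) {
+            for (int x = 0; x < width; x++) {
+                // Bytes are signed in Java, so mask to get the 0..255 index.
+                int idx = indices[y * width + x] & 0xFF;
+                // Clamp indices that point past the end of the palette.
+                if (idx >= entries) {
+                    idx = entries - 1;
+                }
+                int r = palette[idx * 3] & 0xFF;
+                int g = palette[idx * 3 + 1] & 0xFF;
+                int b = palette[idx * 3 + 2] & 0xFF;
+                img.setRGB(x, y, (r << 16) | (g << 8) | b);
+            }
+        }
+        return img;
+    }
+}
+```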
+
+Text rendering: for each `Tf`-selected font, the renderer pulls the
+embedded font program (`FontFile2`/`FontFile3`/`FontFile`) out of the
+FontDescriptor and loads it via `java.awt.Font.createFont`. Embedded
+TrueType fonts therefore render with their own glyph shapes. When a
+font isn't embedded (or the embedded program can't be loaded), the
+renderer falls back to a generic Java2D family picked by PostScript-name
+heuristics &mdash; glyph widths from the PDF font are still respected,
+but shapes are only approximate.
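+
+A minimal sketch of that load-then-fall-back behavior is shown below. The class
+`FontFallback`, its method and the specific name heuristics are assumptions made
+for illustration; the renderer's real heuristics may differ. Only
+`Font.createFont`, `Font#deriveFont` and the logical family constants are
+standard JDK API.
+
+```java
+import java.awt.Font;
+import java.awt.FontFormatException;
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+
+public final class FontFallback {
+    /** Loads an embedded TrueType program, or picks a generic family by name. */
+    public static Font load(byte[] fontProgram, String postScriptName, float size) {
+        if (fontProgram != null) {
+            try {
+                Font embedded = Font.createFont(Font.TRUETYPE_FONT,
+                        new ByteArrayInputStream(fontProgram));
+                return embedded.deriveFont(size);
+            } catch (FontFormatException | IOException e) {
+                // Embedded program could not be loaded: fall through to heuristics.
+            }
+        }
+        String n = postScriptName == null ? "" : postScriptName.toLowerCase();
+        String family;
+        if (n.contains("courier") || n.contains("mono")) {
+            family = Font.MONOSPACED;
+        } else if (n.contains("times") || n.contains("georgia") || n.contains("garamond")) {
+            family = Font.SERIF;
+        } else {
+            family = Font.SANS_SERIF;
+        }
+        int style = Font.PLAIN;
+        if (n.contains("bold")) {
+            style |= Font.BOLD;
+        }
+        if (n.contains("italic") || n.contains("oblique")) {
+            style |= Font.ITALIC;
+        }
+        return new Font(family, style, 1).deriveFont(size);
+    }
+}
+```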
+
+Tables: `OpenPdfCorePageRenderer` honors the PDF §8.4.3.2 zero-width hairline
+rule (`0 w` strokes are rendered one device pixel wide rather than collapsing
+to nothing under the page CTM), reads dash patterns and the stroke-adjust flag
+from ExtGState (`D`, `SA`), and enables Java2D `KEY_STROKE_CONTROL =
+VALUE_STROKE_NORMALIZE` so that 0.5pt table borders snap to integer device
+pixels instead of smearing across two rows of antialiased pixels. Full
+`PdfPTable` output (cell-background fills, colored borders, header rows and
+cell text) is exercised by the renderer's test suite.
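+
+The hairline rule and the stroke-normalization hint can be sketched in Java2D
+as below. The class `HairlineStroke` is hypothetical, and it approximates the
+CTM's scale by the square root of its determinant, which is only exact for
+uniform scaling; the renderer may compute this differently.
+
+```java
+import java.awt.BasicStroke;
+import java.awt.Graphics2D;
+import java.awt.RenderingHints;
+import java.awt.geom.AffineTransform;
+
+public final class HairlineStroke {
+    /** Returns a stroke whose zero width is widened to about one device pixel. */
+    public static BasicStroke strokeFor(float lineWidth, AffineTransform ctm) {
+        if (lineWidth > 0f) {
+            return new BasicStroke(lineWidth);
+        }
+        // One device pixel expressed in user-space units under this CTM.
+        double scale = Math.sqrt(Math.abs(ctm.getDeterminant()));
+        float onePixel = scale > 0 ? (float) (1.0 / scale) : 1f;
+        return new BasicStroke(onePixel);
+    }
+
+    /** Asks Java2D to normalize stroke geometry so thin lines snap to pixels. */
+    public static void configure(Graphics2D g) {
+        g.setRenderingHint(RenderingHints.KEY_STROKE_CONTROL,
+                RenderingHints.VALUE_STROKE_NORMALIZE);
+    }
+}
+```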
+
+Inline images (`BI`/`ID`/`EI`) are now rendered. A preprocessing pass promotes
+each inline image to a synthetic Image XObject, and the rest of the renderer
+then treats it like any other XObject. When the filter is `DCTDecode`, the end
+of the image data is located by the JPEG end-of-image marker (`FF D9`) rather
+than by the ambiguous whitespace-bounded `EI` heuristic. Uncompressed,
+Flate-decoded and JPEG inline images are supported.
+
+Shading (`sh`), pattern / shading colors and Type 3 font glyph operators are
+silently ignored, so pages that rely heavily on those features may render with
+missing content. Adding more operators is a localized change in
+`OpenPdfCorePageRenderer`.
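+
+The JPEG framing step for `DCTDecode` inline images can be illustrated with a
+naive marker scan. The class `JpegFraming` is hypothetical and deliberately
+simplified: it returns the first `FF D9` pair it sees, so a JPEG that embeds
+another JPEG (for example a thumbnail) would need real segment parsing.
+
+```java
+public final class JpegFraming {
+    /**
+     * Returns the offset just past the first JPEG end-of-image marker (FF D9)
+     * at or after {@code start}, or -1 if no marker is found.
+     */
+    public static int jpegEnd(byte[] data, int start) {
+        for (int i = Math.max(start, 0); i + 1 < data.length; i++) {
+            if ((data[i] & 0xFF) == 0xFF && (data[i + 1] & 0xFF) == 0xD9) {
+                return i + 2;
+            }
+        }
+        return -1;
+    }
+}
+```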
+
+If a page needs features outside this supported subset and you need
 pixel-perfect output today, the deprecated `PDFFile` / `PDFPage.getImage(...)`
 API still works.
 
+### Honest limitations &amp; roadmap
+
+`OpenPdfCoreRenderer` is intentionally a focused, lightweight renderer.
+The legacy in-tree parser still wins on real-world PDFs that exercise:
+
+- **Embedded Type 1 / CFF / OpenType-CFF fonts.** `Font.createFont` only
+  loads TrueType reliably; `FontFile3` (CFF/OpenType) is attempted but
+  often falls back to the name-heuristic path. Subsetted TrueType fonts
+  with non-Unicode CMaps draw `.notdef` for codes their `cmap` table
+  doesn't list. Real fix: drive glyph dispatch from the PDF's encoding /
+  CMap to glyph IDs and render via `Font#createGlyphVector(int[])`.
+- **Type 3 fonts.** Glyph operators (`d0`, `d1` + nested content streams)
+  are ignored.
+- **Color management.** CMYK uses the textbook `(1-c)(1-k)` approximation;
+  no ICC profile, no UCR/BG. Anything color-managed will look noticeably
+  wrong. Real fix: respect the ICCBased profile via `java.awt.color.ICC_Profile`.
+- **Pattern and shading paint** (Pattern color space, `sh`). Ignored.
+- **Soft masks (`SMask`) and transparency groups.** Ignored; image alpha
+  honors `ca` only, not per-pixel masks.
+- **Separation / DeviceN color spaces** for images and paths. Ignored; falls
+  back to filling with the color-space default. (Indexed images are now
+  supported.)
+- **Sub-byte bit depths** (1/2/4-bit indexed images, 1-bit image masks).
+  Currently only 8-bit indices are decoded.
+- **Encrypted PDFs.** Out of scope for this module (see "Encryption: removed"
+  below).
+
+These gaps are why the legacy `PDFFile` / `PDFPage` path remains the
+production renderer for the time being. Each item above is a fairly
+localized addition to `OpenPdfCorePageRenderer`; the order above is
+roughly highest-impact first.
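+
+For reference, the `(1-c)(1-k)` CMYK approximation mentioned above amounts to
+the following. The class `CmykApprox` is a hypothetical stand-in for
+illustration, not the renderer's own code, and it ignores ICC profiles
+entirely, which is exactly the limitation described in the list.
+
+```java
+public final class CmykApprox {
+    private static float clamp(float v) {
+        return Math.max(0f, Math.min(1f, v));
+    }
+
+    /** Converts CMYK components in 0..1 to a packed 0xRRGGBB value. */
+    public static int toRgb(float c, float m, float y, float k) {
+        float black = 1f - clamp(k);
+        int r = Math.round(255f * (1f - clamp(c)) * black);
+        int g = Math.round(255f * (1f - clamp(m)) * black);
+        int b = Math.round(255f * (1f - clamp(y)) * black);
+        return (r << 16) | (g << 8) | b;
+    }
+}
+```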
+
 ## Quick Start
 
 ### Basic PDF to Image Conversion
@@ -155,6 +231,34 @@ try (OpenPdfCoreRenderer renderer = new OpenPdfCoreRenderer(new File("document.p
 }
 ```
 
+### Rendering directly to a `Graphics2D`
+
+Avoid the intermediate `BufferedImage` when the caller already has a target
+surface (Swing component, printer, SVG-backed graphics, ...):
+
+```java
+try (OpenPdfCoreRenderer renderer = new OpenPdfCoreRenderer(new File("document.pdf"))) {
+    // A BufferedImage stands in for the target surface here; any Graphics2D works.
+    BufferedImage out = new BufferedImage(800, 1000, BufferedImage.TYPE_INT_ARGB);
+    Graphics2D g2 = out.createGraphics();
+    try {
+        renderer.renderPage(1, g2, 800, 1000); // fit page to the box, preserve aspect
+    } finally {
+        g2.dispose();
+    }
+}
+```
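+
+Fitting a page into a box while preserving its aspect ratio comes down to one
+uniform scale factor. The sketch below shows that arithmetic with a
+hypothetical `FitToBox` helper; centering the page inside the box is an
+assumption of this sketch, and the renderer's actual placement may differ.
+
+```java
+import java.awt.geom.AffineTransform;
+
+public final class FitToBox {
+    /** Builds a transform that scales a page uniformly to fit inside a box. */
+    public static AffineTransform fit(double pageW, double pageH, double boxW, double boxH) {
+        // The smaller ratio is the limiting dimension.
+        double scale = Math.min(boxW / pageW, boxH / pageH);
+        // Center the scaled page in the leftover space (assumed behavior).
+        double tx = (boxW - pageW * scale) / 2.0;
+        double ty = (boxH - pageH * scale) / 2.0;
+        AffineTransform t = new AffineTransform();
+        t.translate(tx, ty);
+        t.scale(scale, scale);
+        return t;
+    }
+}
+```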
+
+### Batch rendering
+
+```java
+try (OpenPdfCoreRenderer renderer = new OpenPdfCoreRenderer(new File("document.pdf"))) {
+    List<BufferedImage> pages = renderer.renderAllPages(150f);
+    for (int i = 0; i < pages.size(); i++) {
+        ImageIO.write(pages.get(i), "png", new File("page-" + (i + 1) + ".png"));
+    }
+}
+```
+
 ## Using the legacy `PDFFile` / `PDFPage` API (deprecated)
 
 The pre-3.0.5 entry point still works but is now `@Deprecated`. New code should