ph-pdf-layout

If this project saved you some time or made your day a little easier, a star would mean a lot — it helps others find it too.

Java library for creating fluid page layouts with Apache PDFBox 3.x.

Please check the example files to see what kind of PDFs can be created, and see the unit tests for how to create and use the different elements.

System requirements:

  • At least Java 17 - newer versions should work as well
  • GitHub Actions tests run against all LTS versions (currently 17, 21 and 24)

The basic elements provided are:

  • PageLayoutPDF - the entry class, having a list of page sets
  • PLPageSet - a set of pages that share the same size and orientation and contain a set of elements. The assignment of elements to pages happens dynamically.
  • PL elements - basic or complex layout elements ("PL" is short for "PDF Layout")
    • Basic (inline) elements are
      • plain text in class PLText (Unicode of course)
        • For custom Open Source fonts to be used see the https://github.com/phax/ph-fonts project
        • Note: the available characters heavily depend on the used font. So if you get a "?" character, try loading a different font
      • and images in classes PLImage and PLStreamImage (whatever ImageIO can load).
    • Basic (block) element is box (class PLBox)
    • Layout elements are
      • horizontal box or h-box in class PLHBox - like a row of a table
      • vertical box or v-box in class PLVBox - like a column of a table
      • spacer-x in class PLSpacerX - a horizontal spacer - just in case you need explicit distance to a certain element
      • spacer-y in class PLSpacerY - a vertical spacer - just in case you need explicit distance to a certain element
      • page break in class PLPageBreak - an explicit page break that starts a new page
    • The most complex element is a table, which consists of a number of "h-boxes" (rows) which themselves consist of a number of "v-boxes" (columns), and which comes with repeating headlines etc.
      • See classes PLTable, PLTableRow and PLTableCell for details
    • Elements can have the following properties - if you know CSS you should be familiar with it:
      • "min-size" - the minimum element size
      • "max-size" - the maximum element size
      • "margin" - a transparent outer border (outside of the border)
      • "border" - a visible border with different styles (between padding and margin)
      • "padding" - a transparent inner border (inside of the border)
      • "fill-color" - the background or fill color of an element
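
The properties listed above nest the same way as in the CSS box model: margin outside border, border outside padding, padding outside content. The following is a minimal sketch of that arithmetic on one axis. The class `BoxModelSketch` and its method names are hypothetical and are not part of ph-pdf-layout; the sketch also assumes the same margin, border and padding on both sides.

```java
/**
 * Illustrative only: a hypothetical helper, NOT a ph-pdf-layout class.
 * Models the nesting margin > border > padding > content on one axis,
 * assuming identical values on both sides of the element.
 */
final class BoxModelSketch
{
  private BoxModelSketch ()
  {}

  /** Total width occupied by an element, including all three outer layers. */
  static float outerWidth (final float contentWidth, final float padding, final float border, final float margin)
  {
    // Each layer is applied once on the left and once on the right
    return contentWidth + 2 * (padding + border + margin);
  }

  /** Width left for the content when the whole element must fit into availableWidth. */
  static float contentWidth (final float availableWidth, final float padding, final float border, final float margin)
  {
    // Never return a negative width if the outer layers already exceed the space
    return Math.max (0f, availableWidth - 2 * (padding + border + margin));
  }

  /** Applies "min-size" and "max-size" style limits to a computed size. */
  static float clamp (final float size, final float minSize, final float maxSize)
  {
    return Math.max (minSize, Math.min (maxSize, size));
  }

  public static void main (final String [] args)
  {
    System.out.println (outerWidth (100f, 5f, 1f, 10f));
    System.out.println (contentWidth (132f, 5f, 1f, 10f));
    System.out.println (clamp (250f, 50f, 200f));
  }
}
```

In the library itself these values are set per element and per side; the sketch only shows how the layers add up.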

A set of example files, as created by the unit tests, can be found in the folder example-files. The source code for these examples is at https://github.com/phax/ph-pdf-layout/tree/master/ph-pdf-layout/src/test/java/com/helger/pdflayout

Rich text (markup) — module ph-pdf-layout-richtext

The optional sibling module ph-pdf-layout-richtext adds multi-style runs inside a single paragraph — i.e. a single text element can carry mixed bold/italic, per-segment colors, in-line hyperlinks, anchors, underlines, sub/superscript and background highlights. The base PLText is single-style by design; rich text fills that gap.

Huge credit where it's due

The grammar shape (__underline__, {color:#rrggbb}, {_}sub{_}, {^}sup{^}, {link:style[uri]}, {anchor:name}, the --/-+/-#/-! indentation prefixes, the backslash-escape rules, and the parameterised forms like __{0.25:1.5}__ and {_:0.5|0.2}sub{_}), the regex catalog, the multi-pass split-by-marker parsing strategy, and the open/close annotation toggle model are a port of Ralf Stuckert's pdfbox-layout (MIT license). Bold and italic were re-spelled to the Markdown / CommonMark form (**bold**, *italic* / _italic_) in v8.3.1, and CommonMark line-break semantics were adopted at the same time — see the changelog.

If you find this module useful, please go give Ralf's project a star — none of this would exist without his original work. The migration to ph-pdf-layout exists only because Ralf's library is pinned to PDFBox 1.x/2.x and ph-pdf-layout has already done the PDFBox 3.x work plus the broader element/layout/render lifecycle.

What's supported

Markup syntax (Markdown / CommonMark style for bold, italic and line breaks; the rest follows Ralf Stuckert's grammar):

| Markup | Effect |
| --- | --- |
| `**bold**` | toggles bold (CommonMark) |
| `*italic*` or `_italic_` | toggles italic (CommonMark; either form works) |
| `***bold-italic***` | combined bold + italic |
| `__text__` | underlines text. Differs from CommonMark, which uses `__` as an alternate bold marker; here `__` is reserved for underline because there is no standard Markdown for it. |
| `__{0.25:1.5}text__` | underline with custom baseline offset and line weight |
| `{_}sub{_}` | subscript (default 0.61× font, +0.15 baseline shift) |
| `{^}sup{^}` | superscript (default 0.61× font, −0.4 baseline shift) |
| `{_:0.5\|0.2}foo{_}` | parameterised subscript form (custom values in place of the defaults) |
| `{color:#rrggbb}` | switches the current colour to RGB (set, not toggle; reset with `{color:#000000}`) |
| `{color_cmyk:C,M,Y,K}` | switches the current colour to CMYK (percent values 0..100, floats OK, e.g. `{color_cmyk:75,15,0,20}`). The RGB marker is unchanged. Originally requested at ralfstuckert/pdfbox-layout#94. |
| `{bg:#rrggbb}…{bg}` | fills a background rectangle behind the wrapped run (default tight extent: a per-segment box that follows sub/superscript shifts) |
| `{bg:tight:#rrggbb}…{bg}` | background with explicit tight extent (same as the default) |
| `{bg:line:#rrggbb}…{bg}` | background with line extent: uses the line's full slot so the highlight stays a single uniform rectangle across mixed sizes / sub-superscript, and is contiguous across wrapped lines |
| `{link[uri]}…{link}` | wraps the inner text in an external hyperlink (default underline-decorated) |
| `{link:none[uri]}…{link}` | hyperlink with no visual decoration |
| `{link[#name]}…{link}` | internal link jumping to a named anchor declared elsewhere |
| `{anchor:name}…{anchor}` | declares a named destination targetable by `#name` link URIs |
| `\*`, `\**`, `\_`, `\__`, `\{`, `\\` | backslash-escape any marker |
| bare `\n` / `\r\n` | CommonMark soft break, rendered as a single space; word-wrap decides line breaks |
| two-or-more trailing spaces before `\n` | CommonMark hard line break |
| `\<newline>` (backslash immediately before the line ending) | CommonMark hard line break |
| `-+ item`, `-# item`, `-- item`, `-!` (line-start) | bullet item, numbered item, plain indent, end-indent block |
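
The line-break rules listed above can be illustrated with a small stand-alone sketch. It is not the parser used by ph-pdf-layout-richtext: the class `LineBreakSketch` and the method `normalizeBreaks` are hypothetical names, the output uses `'\n'` to stand for a hard break, and escaped backslashes at the end of a line are deliberately not handled.

```java
/**
 * Illustrative only: a simplified model of CommonMark soft and hard line
 * breaks. NOT the ph-pdf-layout-richtext parser. Assumption: a trailing
 * backslash is always a hard-break marker (escaped backslashes are ignored).
 */
final class LineBreakSketch
{
  private LineBreakSketch ()
  {}

  /**
   * Soft breaks become a single space, hard breaks become '\n'.
   * A break is "hard" if the line ends with a backslash or with two or more spaces.
   */
  static String normalizeBreaks (final String markup)
  {
    // Split on \n or \r\n; the limit -1 keeps trailing empty lines
    final String [] lines = markup.split ("\r?\n", -1);
    final StringBuilder result = new StringBuilder ();
    for (int i = 0; i < lines.length; i++)
    {
      final String line = lines[i];
      if (i == lines.length - 1)
      {
        // The last line has no line ending to interpret
        result.append (line);
      }
      else if (line.endsWith ("\\"))
      {
        // Backslash immediately before the line ending: hard break
        result.append (line, 0, line.length () - 1).append ('\n');
      }
      else if (line.endsWith ("  "))
      {
        // Two or more trailing spaces: hard break, the spaces are dropped
        result.append (line.stripTrailing ()).append ('\n');
      }
      else
      {
        // Anything else is a soft break, rendered as a single space
        result.append (line.stripTrailing ()).append (' ');
      }
    }
    return result.toString ();
  }

  public static void main (final String [] args)
  {
    System.out.println (normalizeBreaks ("soft\nbreak, then hard  \nbreak"));
  }
}
```

With this model, text that was hard-wrapped in the source flows as one paragraph, and only the two explicit forms force a new line.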

How to use it

Add the Maven dependency:

<dependency>
  <groupId>com.helger</groupId>
  <artifactId>ph-pdf-layout-richtext</artifactId>
  <version>x.y.z</version>
</dependency>

Then build a rich-text element from a markup string:

final PLFontFamily aFontFamily = new PLFontFamily (PreloadFont.TIMES,
                                                   PreloadFont.TIMES_BOLD,
                                                   PreloadFont.TIMES_ITALIC,
                                                   PreloadFont.TIMES_BOLD_ITALIC);

final PLRichText aRichText = PLRichText.createFromMarkup (
    "Hello **world**, this is *important* and __underlined__. " +
        "Visit {link[https://example.com]}example.com{link} or " +
        "jump to {link[#summary]}the summary{link}.",
    aFontFamily, 11f, PLColor.BLACK);

aRichText.setHorzAlign (EHorzAlignment.JUSTIFY);

final PLPageSet aPS = new PLPageSet (PDRectangle.A4).setMargin (40, 60, 40, 60);
aPS.addElement (aRichText);

new PageLayoutPDF ().addPageSet (aPS).renderTo (new File ("rich.pdf"));

For block-level documents that mix prose paragraphs with bullet/numbered lists, use the higher-level helper:

final ICommonsList <IPLElement> aBlocks = PLRichTextBlocks.parseMarkup (
    "Some intro text.\n" +
    "-+ first bullet\n" +
    "-+ second bullet\n" +
    " -+ nested bullet\n" +
    "-!\n" +
    "Closing paragraph.",
    aFontFamily, 11f, PLColor.BLACK);

for (final IPLElement aBlock : aBlocks)
  aPS.addElement (aBlock);

If you prefer a programmatic API over markup, construct runs directly:

final FontSpec aRegular = new FontSpec (PreloadFont.TIMES,      11, PLColor.BLACK);
final FontSpec aBold    = new FontSpec (PreloadFont.TIMES_BOLD, 11, PLColor.BLACK);

final ICommonsList <PLRichTextRun> aRuns = new CommonsArrayList <> ();
aRuns.add (new PLRichTextRun ("Hello ", aRegular));
aRuns.add (new PLRichTextRun ("world",  aBold));
aRuns.add (new PLRichTextRun ("!",      aRegular));
aPS.addElement (new PLRichText (aRuns));

The rendered example PDFs are in example-files/richtext; the source-side test code that produced them is at ph-pdf-layout-richtext/src/test.

Similar libraries

Similar libraries for PDF rendering, but totally unrelated to this project:

Maven usage

Add the following to your pom.xml to use this artifact, replacing x.y.z with the real version number:

<dependency>
  <groupId>com.helger</groupId>
  <artifactId>ph-pdf-layout</artifactId>
  <version>x.y.z</version>
</dependency>

Note: between v4.0.0 and v5.2.2 the artifactId was ph-pdf-layout4.

News and Noteworthy

v8.3.2 - 2026-06-07

  • ph-pdf-layout-richtext: added inline background-color markup. {bg:#rrggbb}…{bg} fills a rectangle behind the wrapped run. Two vertical-extent modes are selected by an optional qualifier:
    • {bg:#rrggbb} / {bg:tight:#rrggbb} (tight, the default): per-segment box sized to the segment's own font, anchored on its (possibly sub/superscript-shifted) baseline; the highlight follows the visible glyphs.
    • {bg:line:#rrggbb} (line-height): box sized to the line's full slot using the unshifted baseline, so the highlight stays a single uniform rectangle across sub/superscript and is contiguous across wrapped lines.
    • Backgrounds are painted before the glyphs and the existing post-text decorations (underline, hyperlink, anchor); they compose with bold / italic / colour / underline / sub-superscript inside the span.
    • Internally: new PLBackgroundAnnotation (an IPLRichTextAnnotation) plus EPLBackgroundExtent enum (TIGHT, LINE_HEIGHT); new BackgroundFactory in PLMarkupCharacters; pre-text fill pass added in PLRichText's render loop.
  • Added the missing MIT license from the original pdfbox-layout in the richtext module

v8.3.1 - 2026-05-30

  • ph-pdf-layout-richtext markup syntax is now Markdown / CommonMark style — breaking change vs v8.3.0:
    • Bold is **bold** (was *bold*).
    • Italic is *italic* or _italic_ — both Markdown forms are accepted (was _italic_ only).
    • ***foo*** toggles bold and italic together.
    • __underline__ is unchanged, but differs from CommonMark: CommonMark uses __ as alternate bold; here __ stays reserved for underline because there is no standard Markdown for underline.
    • Line breaks now follow CommonMark inline semantics: a bare \n / \r\n is a soft break (rendered as a single space and word-wrapped). A hard line break requires either two-or-more trailing spaces before the line ending, or a backslash immediately before the line ending.
    • Backslash escape now applies to the new markers too: \*, \**, \_, \__.
    • Internally: new IPLMarkupToken.SoftBreak; new HARD_BREAK factory; new ITALIC_UNDERSCORE factory in PLMarkupCharacters; BOLD regex gained a (?!\*) lookahead so it cooperates with italic in ***…***. All affected unit tests and pixel-diff baselines have been updated; the rendered output of the existing markup tests is byte-identical to v8.3.0 once the new escape-demo line is taken into account.
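
The toggle behaviour of `*`, `**` and `***` described in the bullets above can be sketched with a tiny stand-alone scanner. This is not the regex-based multi-pass parser of the module: `ToggleMarkupSketch`, `Run` and `parse` are hypothetical names, only the asterisk markers and their backslash escapes are handled, and the treatment of `\**` as a literal `**` is an assumption based on the escape list.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative only: a single-pass toggle scanner for *, ** and ***.
 * NOT the ph-pdf-layout-richtext implementation. Underscore italic,
 * underline, colours, links etc. are intentionally left out.
 */
final class ToggleMarkupSketch
{
  /** One run of text with a single style. */
  record Run (String text, boolean bold, boolean italic)
  {}

  private ToggleMarkupSketch ()
  {}

  static List <Run> parse (final String markup)
  {
    final List <Run> runs = new ArrayList <> ();
    final StringBuilder current = new StringBuilder ();
    boolean bold = false;
    boolean italic = false;
    int i = 0;
    while (i < markup.length ())
    {
      final char c = markup.charAt (i);
      if (c == '\\' && i + 1 < markup.length ())
      {
        // Assumption: "\**" escapes the two-character marker, otherwise one character
        final int n = markup.startsWith ("**", i + 1) ? 2 : 1;
        current.append (markup, i + 1, i + 1 + n);
        i += 1 + n;
      }
      else if (c == '*')
      {
        // Count up to three consecutive asterisks
        int n = 1;
        while (n < 3 && i + n < markup.length () && markup.charAt (i + n) == '*')
          n++;
        // Close the run collected so far with the style active before the marker
        if (current.length () > 0)
        {
          runs.add (new Run (current.toString (), bold, italic));
          current.setLength (0);
        }
        if (n != 1)
          bold = !bold; // "**" and "***" toggle bold
        if (n != 2)
          italic = !italic; // "*" and "***" toggle italic
        i += n;
      }
      else
      {
        current.append (c);
        i++;
      }
    }
    if (current.length () > 0)
      runs.add (new Run (current.toString (), bold, italic));
    return runs;
  }

  public static void main (final String [] args)
  {
    parse ("Hello **world**, this is *important* and ***both***.").forEach (System.out::println);
  }
}
```

A marker both opens and closes its style, so an unclosed marker in this sketch simply leaves the style active until the end of the input.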

v8.3.0 - 2026-05-29

  • New optional module ph-pdf-layout-richtext providing multi-style runs in a single paragraph, plus inline links, anchors, underline, sub/superscript and a Markdown-style markup parser. Markup grammar, regex catalog and split algorithm are a port of Ralf Stuckert's pdfbox-layout (MIT); see the "Rich text" section above for full credit. Includes a new PLRichText block-level element and a PLRichTextBlocks helper for paragraph/bullet/numbered-list block sequences.

v8.2.0 - 2026-05-28

  • Added split-fragment tracking on IPLObject: getOriginalID(), isSplitFragment() and isFirstFragment(). When an element is split across pages (e.g. a long PLVBox, PLText or PLTable), the resulting fragments still carry the unsplit ancestor's ID via getOriginalID(), and isFirstFragment() identifies the top-most slice. This is the foundation for upcoming table-of-contents and bookmark support, where callers need to know on which page a user-facing element first appeared.
  • Added per-element render callback: new IPLRenderListener, settable via PLPageSet.setRenderListener(...), fires after every element renders (including nested children) with the full PageRenderContext.
  • PageRenderContext now exposes page indices (getPageSetIndex, getPageSetPageIndex, getTotalPageIndex and their counts), matching the names already used by PagePreRenderContext.
  • New PLRenderedElementCollector helper consumes the listener and produces an ordered map from each element's original ID to its first-appearance page and coordinates - the natural input for building a TOC or PDF outline.
  • Added PLOutlineBuilder for generating PDF outlines (bookmarks). Declare a nested entry tree of (title, elementID) pairs, register the builder as both render listener and document customizer, and the resulting PDF opens with a clickable bookmark tree pointing at the rendered positions. Supports grouping nodes without destinations and tolerates missing element references.
  • Added named-anchor support: new IPLHasAnchorName interface on every renderable element exposes setAnchorName(String). When set, the rendering pipeline registers a PDF named destination at the element's top-left, addressable via URL fragments (mypdf.pdf#section1), bookmarks, or internal links. Duplicate anchor names are logged as a warning and the first registration wins.
  • New PLAnchor element - a zero-size marker for inserting an anchor at an arbitrary position in the document flow.
  • Added internal-link support: new PLInternalLink wraps another element and creates a clickable PDF link annotation that jumps to a named anchor via PDActionGoTo + PDNamedDestination. Forward references work because resolution happens at PDF read time. Extracted common link-annotation logic (rectangle, border, color) from AbstractPLExternalLink into a shared AbstractPLLinkBase; external-link behaviour is preserved byte-for-byte.
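
The "first appearance" idea behind the split-fragment tracking and the collector described above can be modelled in a few lines. The sketch below is not `PLRenderedElementCollector`: the class `FirstAppearanceSketch`, the record `Position` and the method names are hypothetical, and the callback signature is simplified to plain values instead of a render context.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Illustrative only: keeps the first rendered position per original element ID.
 * NOT the library's collector; names and signatures are assumptions.
 */
final class FirstAppearanceSketch
{
  /** Page index and coordinates of the first fragment of an element. */
  record Position (int pageIndex, float x, float y)
  {}

  // LinkedHashMap keeps the IDs in the order in which they first rendered
  private final Map <String, Position> firstSeen = new LinkedHashMap <> ();

  /**
   * Called once per rendered fragment, in render order.
   *
   * @return true if this was the first fragment seen for the given original ID
   */
  boolean onElementRendered (final String originalId, final int pageIndex, final float x, final float y)
  {
    // Later fragments of a split element carry the same original ID and are ignored
    return firstSeen.putIfAbsent (originalId, new Position (pageIndex, x, y)) == null;
  }

  /** Read-only view, ordered by first appearance. */
  Map <String, Position> firstAppearances ()
  {
    return Collections.unmodifiableMap (firstSeen);
  }

  public static void main (final String [] args)
  {
    final FirstAppearanceSketch collector = new FirstAppearanceSketch ();
    collector.onElementRendered ("chapter1", 0, 40f, 700f);
    collector.onElementRendered ("table1", 0, 40f, 300f);
    // Second fragment of the same table after a page split
    collector.onElementRendered ("table1", 1, 40f, 780f);
    System.out.println (collector.firstAppearances ());
  }
}
```

The same "first registration wins" rule is what the changelog describes for duplicate anchor names; an ordered map of this shape is a natural input for a table of contents or an outline.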

v8.1.2 - 2026-05-16

  • Security: dropped Serializable from PreloadFont (and its inner EncodedCodePoint) to remove a CWE-502 deserialization gadget surface. Note: the instanceof IFontResource guard previously present in readObject ran only after the foreign class had already been instantiated.
  • Security: AbstractPLExternalLink.setURI(String) now validates the URI scheme against an allowlist (http, https, mailto, tel, ftp, ftps by default). Dangerous schemes such as javascript:, file:, data:, vbscript: are rejected with IllegalArgumentException. The active set can be replaced via setAllowedURISchemes(ICommonsSet); supplying an empty set restores the previous unfiltered behavior.
  • LoadedFont is now @ThreadSafe: the previously misleading @Immutable label has been replaced and the per-codepoint caches are guarded by a SimpleReadWriteLock with a fast/slow path pattern.
  • PLStreamImage now caps the number of bytes read from the input stream (defaults to 64 MiB). Override globally via PLStreamImage.setDefaultMaxImageSize(int) or per instance via setMaxImageSize(int). Reading aborts with IOException once the cap is exceeded.
  • Documentation: clarified the trust boundary of IPDDocumentCustomizer and IXMPMetadataCustomizer - both grant full mutation access to the document/metadata and must only be supplied from trusted code.
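
The scheme allowlist described above can be approximated with the JDK alone. The sketch below is not the code in `AbstractPLExternalLink`: `UriSchemeAllowlistSketch` and `requireAllowedScheme` are hypothetical names, and rejecting URIs without any scheme is an assumption of this sketch rather than documented library behaviour.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

/**
 * Illustrative only: URI scheme allowlisting with java.net.URI.
 * NOT the ph-pdf-layout implementation; names are assumptions.
 */
final class UriSchemeAllowlistSketch
{
  /** The default schemes named in the changelog entry. */
  static final Set <String> DEFAULT_SCHEMES = Set.of ("http", "https", "mailto", "tel", "ftp", "ftps");

  private UriSchemeAllowlistSketch ()
  {}

  /**
   * @return the unchanged URI if its scheme is allowed
   * @throws IllegalArgumentException if the URI is malformed or its scheme is not allowed
   */
  static String requireAllowedScheme (final String uri, final Set <String> allowedSchemes)
  {
    // An empty set means "no filtering", mirroring the changelog description
    if (allowedSchemes.isEmpty ())
      return uri;

    final String scheme;
    try
    {
      scheme = new URI (uri).getScheme ();
    }
    catch (final URISyntaxException ex)
    {
      throw new IllegalArgumentException ("Not a valid URI: " + uri, ex);
    }

    // Schemes are case-insensitive, so compare in lower case.
    // Assumption of this sketch: URIs without a scheme are rejected as well.
    if (scheme == null || !allowedSchemes.contains (scheme.toLowerCase (Locale.ROOT)))
      throw new IllegalArgumentException ("URI scheme is not allowed: " + uri);
    return uri;
  }

  public static void main (final String [] args)
  {
    System.out.println (requireAllowedScheme ("https://example.com", DEFAULT_SCHEMES));
    try
    {
      requireAllowedScheme ("javascript:alert(1)", DEFAULT_SCHEMES);
    }
    catch (final IllegalArgumentException ex)
    {
      System.out.println ("Rejected: " + ex.getMessage ());
    }
  }
}
```

Checking against an allowlist rather than a blocklist means that unknown or newly invented schemes are rejected by default.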

v8.1.1 - 2026-05-16

  • Updated to PDFBox 3.0.7
  • Removed OSGI bundling
  • Fixed thread-safety issue with preload Standard 14 fonts. See #65 - thx @andreasa-winenet

v8.1.0 - 2025-11-16

  • Updated to PDFBox 3.0.6
  • Updated to ph-commons 12.1.0
  • Using JSpecify annotations

v8.0.0 - 2025-08-24

  • Requires Java 17 as the minimum version
  • Updated to ph-commons 12.0.0

v7.4.2 - 2025-07-08

  • Added methods PageLayoutPDF.setCustom(Leading|Trailing|Total)PageCount to allow for page count customization. See #58 - thx @xxs3315

v7.4.1 - 2025-06-24

  • Added support for rounded edges on PLBox and PLText. See #48 - thx @marco-de-angelis

v7.4.0 - 2025-06-17

  • Added ELineJoinStyle and ELineCapStyle enums
  • Fixed a possible improper table split if only the head lines would fit on the first page on splitting. See #49 - thx @jeremykwiatkowski
    • This required some heavy reworking of the splitting APIs which required a minor version update
  • The data type of the PDF creation date and time metadata was changed to ZonedDateTime

v7.3.7 - 2025-05-06

  • Updated to PDFBox 3.0.5
  • Added a new method PreloadFont.setUseFontLineHeightFromHHEA() to use the line height from the font instead of the default bounding box. This is especially helpful for the "Noto" and "Kurinto" fonts, which have a very large bounding box. Fixes #46 - thx @mrrao

v7.3.6 - 2025-03-31

  • Updated to PDFBox 3.0.4
  • Made PDF/A property "document language" customizable in PageLayoutPDF
  • Added new interface IXMPMetadataCustomizer to be able to customize the XMPMetadata. See pr #44 - thx @stmuecke

v7.3.5 - 2024-10-09

  • Updated to PDFBox 3.0.3
  • Updated tests to ph-fonts 5.0.3
  • Added a setter of PLFontSpec into AbstractPLText

v7.3.4 - 2024-05-30

  • Fixed an issue with BLOCK horizontal alignment in case of a page break. See issue #36

v7.3.3 - 2024-05-29

  • Updated to PDFBox 3.0.2
  • Added new horizontal alignment type BLOCK as a mixture of LEFT and JUSTIFY. See issue #36 - thx @istvangaal

v7.3.2 - 2024-03-27

  • Updated to ph-commons 11.1.5
  • Created Java 21 compatibility
  • Extracted a parent POM and prepared a submodule structure
  • Using https://github.com/red6/pdfcompare to test created PDFs against the stored ones. See issue #35 - thx @Lolf1010

v7.3.1 - 2024-01-24

  • Updated to PDFBox 3.0.1
  • Added support for clipping content of block elements via .setClipContent(boolean). See issue #34 - thx @terrason

v7.3.0 - 2023-10-30

  • Completely removed usage of java.awt.Color. Backwards incompatible change. This finalizes issue #32.

v7.2.0 - 2023-10-30

  • Added new class PLColor and deprecated all methods using java.awt.Color. Backwards incompatible change. See issue #32 - thx @AndroidDeveloperLB

v7.1.0 - 2023-08-20

  • Updated to PDFBox 3.0.0

v7.0.1 - 2023-07-31

  • Updated to PDFBox 2.0.29
  • Updated to ph-commons 11.1.0
  • Improved API to create an empty cell. See issue #29 - thx @fheldt

v7.0.0 - 2022-09-14

  • Using Java 11 as the baseline
  • Updated to ph-commons 11

v6.0.3 - 2022-08-17

  • Added support for PDF/A creation in PageLayoutPDF - thx @robertholfeld for publishing this in his branch

v6.0.2 - 2022-05-25

  • Extended PDPageContentStreamWithCache API. See issue #23 - thx @schneidh

v6.0.1 - 2022-05-07

  • Updated to jbig2-imageio 3.0.4
  • Updated to PDFBox 2.0.26
  • Added support for creating external links via new class PLExternalLink. See issue #14 - thx @rgarg-atheer and @martin19

v6.0.0 - 2022-01-05

  • Changed artifactId from ph-pdf-layout4 to ph-pdf-layout
  • Changed Java namespaces com.helger.pdflayout4.* to com.helger.pdflayout.*

v5.2.2 - 2021-12-29

  • Updated to PDFBox 2.0.25
  • Extended IPLHasMargin with (set|add)Margin(X|Y) to set or add to vertical or horizontal margin at once
  • Extended IPLHasPadding with (set|add)Padding(X|Y) to set or add to vertical or horizontal padding at once

v5.2.1 - 2021-03-22

  • Updated to PDFBox 2.0.23

v5.2.0 - 2021-03-21

  • Updated to ph-commons 10
  • Updated to PDFBox 2.0.22
  • Add syntactic sugar method PLTableCell.createEmptyCell()

v5.1.2 - 2020-06-15

  • Updated to PDFBox 2.0.20
  • Allow different page content height if the first page header and footer have different heights. See issue #14.

v5.1.1 - 2020-05-29

  • Updated to ph-fonts 4.1.0 (changed Maven groupId)

v5.1.0 - 2020-03-29

  • Updated to PDFBox 2.0.19
  • Updated to jbig2-imageio 3.0.3
  • Fixed line spacing on page break (see issue #10)
  • Allow table columns with different WidthSpec types, as long as colspan is 1.
  • Added another generic parameter to IPLSplittableObject
  • Made PageLayoutPDF API more chainable
  • New class PLBulletPointList can be used to create regular bullet point lists (see issue #9)
  • Updated to ph-commons 9.4.0

v5.0.9 - 2019-04-29

  • Updated to PDFBox 2.0.15 (security update)

v5.0.8 - 2018-11-22

  • Updated to PDFBox 2.0.12
  • Updated to ph-commons 9.2.0

v5.0.7 - 2018-07-10

  • Updated to PDFBox 2.0.11

v5.0.6 - 2018-06-21

  • Updated to org.apache.pdfbox:jbig2-imageio for JPEG handling
  • Fixed OSGI ServiceProvider configuration
  • Updated to ph-commons 9.1.2

v5.0.5 - 2018-04-16

  • Something went wrong when publishing to Maven Central - next try

v5.0.4 - 2018-04-16

  • Do not justify the last line of multiline text
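
The rule above (justify every line except the last one) comes down to simple arithmetic. The sketch below is only an illustration under the assumption that justification distributes the leftover width evenly across the gaps between words; `JustifySketch` and `extraGapWidth` are hypothetical names and do not describe how the library computes it.

```java
/**
 * Illustrative only: extra spacing per word gap for justified text.
 * NOT the ph-pdf-layout implementation; assumes even distribution across gaps.
 */
final class JustifySketch
{
  private JustifySketch ()
  {}

  /**
   * @param lineWidth      natural width of the line's text
   * @param availableWidth width the line should fill
   * @param wordCount      number of words on the line
   * @param isLastLine     true for the last line of the paragraph
   * @return the additional width to add to every gap between two words
   */
  static float extraGapWidth (final float lineWidth,
                              final float availableWidth,
                              final int wordCount,
                              final boolean isLastLine)
  {
    // The last line stays left-aligned, and a single word has no gap to stretch
    if (isLastLine || wordCount < 2)
      return 0f;
    return Math.max (0f, availableWidth - lineWidth) / (wordCount - 1);
  }

  public static void main (final String [] args)
  {
    System.out.println (extraGapWidth (90f, 100f, 6, false));
    System.out.println (extraGapWidth (40f, 100f, 3, true));
  }
}
```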

v5.0.3 - 2018-04-16

  • Added ph-collection dependency for issue #4
  • Updated to PDFBox 2.0.9
  • Added possibility to justify text

v5.0.2 - 2018-02-21

  • Added possibility to use special page header and footer on the first page of a PLPageSet

v5.0.1 - 2018-02-12

  • Added image type support (issue #3)
  • Updated to BouncyCastle 1.59
  • Added new table grid types

v5.0.0 - 2017-11-09

  • Updated to PDFBox 2.0.8
  • Updated to ph-commons 9.0.0
  • Updated to BouncyCastle 1.58

v4.0.1 - 2017-05-16

  • Updated to PDFBox 2.0.6
  • Slight API extensions

v4.0.0 - 2017-02-22

  • No change compared to 4.0.0 Beta 5

v4.0.0 Beta 5 - 2017-01-19

  • Improved XML serialization slightly
  • Fixed an NPE with PLBox without a contained element

v4.0.0 Beta 4 - 2017-01-10

  • Block elements use full width now by default
  • Improved placeholder handling in text preparation

v4.0.0 Beta 3 - 2017-01-10

  • Binds to ph-commons 8.6.x
  • Fixed a height problem with vertical split HBoxes
  • Simplified class hierarchy for table rows
  • Made font fallback code point more flexible
  • Changed font rendering to use descent from font instead of heuristics
  • Fixed different border color rendering
  • Made debug rendering customizable
  • Added support for line spacing in PLText

v4.0.0 Beta 2 - 2017-01-03

  • The Maven artifact name was changed to 'ph-pdf-layout4' so that it can be used side-by-side with version 3.
  • The global package name was changed from com.helger.pdflayout to com.helger.pdflayout4 so that both 3.x and 4.x can run side-by-side
  • This is a major rewrite to be closer to the CSS box model
  • VBox and HBox have no more layout information assigned to them
  • Added a new element "Box" that allows for easy alignment etc.
  • Separation between renderable objects, block element (box) and inline elements (text and image)
  • New class design for tables, so that each table cell is automatically represented by a box, each table row is a separate object
  • Added a simple grid system for tables to build the default grids easily
  • Added new "auto" width/height for columns/rows
  • Updated to PDFBox 2.0.4

v3.5.3 - 2017-11-07

  • Binds to ph-commons 9.0.0
  • Updated to PDFBox 2.0.8
  • Updated to BouncyCastle 1.58

v3.5.2 - 2017-01-10

  • Binds to ph-commons 8.6.x
  • Updated to PDFBox 2.0.4

v3.5.1 - 2016-10-07

  • Fixed a rendering flaw with borders

v3.5.0 - 2016-09-21

  • Changed internal class hierarchy to prepare for future changes
  • Changed package assignments for better grouping

v3.0.3 - 2016-09-19

  • Updated to PDFBox 2.0.3
  • Performance improvement by using optimized writer
  • Included optional MicroTypeConverters

v3.0.2 - 2016-09-06

  • API extensions for the classes in the "spec" package

v3.0.1 - never released because of issues with the release script :(

v3.0.0 - 2016-08-21

  • Requires JDK 8
  • Still on PDFBox 2.0.0 because of problems with 2.0.1 and 2.0.2

Note: Versions starting with 2.1.0 use PDFBox 2.x; previous versions (up to and including 2.0.0) use PDFBox 1.8.x.

Note: version 4.0.0 has troubles building with JDK 1.8.0_92 - updating to 1.8.0_112 or later should work.


My personal Coding Styleguide | It is appreciated if you star the GitHub project if you like it.
