@@ -16,7 +16,7 @@ under the License.
////
:documentationPath: /pipeline/transforms/
:language: en_US
:description: The Microsoft Excel Writer transform writes incoming rows from Hop out to an MS Excel file. It supports both the .xls and .xlsx file formats.
:description: The Excel Writer transform writes incoming rows to Microsoft Excel (.xls, .xlsx) or OpenDocument Spreadsheet (.ods) files.

= image:transforms/icons/excelwriter.svg[Excel writer transform Icon, role="image-doc-icon"] Excel writer

@@ -25,9 +25,17 @@ under the License.
|
== Description

The Microsoft Excel Writer transform writes incoming rows from Hop out to an MS Excel file. It supports both the .xls and .xlsx file formats.
The Excel Writer transform writes incoming rows from Hop to spreadsheet files.
It supports three output formats:

The .xls files use a binary format which is better suited for simple content, while the .xlsx files use the Open XML format which works well with templates since it can better preserve charts and miscellaneous objects.
* `.xls` — legacy Excel binary format (Apache POI)
* `.xlsx` — modern Excel Open XML format (Apache POI)
* `.ods` — OpenDocument Spreadsheet format (LibreOffice Calc, Apache OpenOffice; written via ODFDOM)

The `.xls` and `.xlsx` backends share the same POI code path.
The `.ods` backend is a separate implementation that mirrors the same transform options where the ODF format allows it.

The `.xls` files use a binary format which is better suited for simple content, while the `.xlsx` files use the Open XML format which works well with templates since it can better preserve charts and miscellaneous objects.

|
== Supported Engines
@@ -40,6 +48,16 @@ The .xls files use a binary format which is better suited for simple content, wh
!===
|===

== Output formats

[options="header"]
|===
|Format|Extension|Backend|Notes
|Excel 97–2003|`.xls`|POI|Sheet password protection supported
|Excel 2007+|`.xlsx`|POI|Streaming mode for large files; no sheet password protection
|OpenDocument Spreadsheet|`.ods`|ODFDOM|LibreOffice Calc compatible; see <<ods-limitations>> below
|===
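
The table above can be read as a small capability lookup. The following Python sketch is illustrative only and is not part of Hop: `FormatCapabilities`, `CAPABILITIES`, and `capabilities_for` are hypothetical names, and the sheet-protection value for `.ods` is taken from the *Protect Sheet* option description rather than from this table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatCapabilities:
    """Hypothetical summary of what each output format supports."""
    backend: str            # library named in the documentation
    streaming: bool         # "Stream XLSX data" is available
    sheet_protection: bool  # "Protect Sheet" is supported


# Values transcribed from the documentation; the structure is an assumption.
CAPABILITIES = {
    "xls": FormatCapabilities(backend="POI", streaming=False, sheet_protection=True),
    "xlsx": FormatCapabilities(backend="POI", streaming=True, sheet_protection=False),
    "ods": FormatCapabilities(backend="ODFDOM", streaming=False, sheet_protection=True),
}


def capabilities_for(filename: str) -> FormatCapabilities:
    """Look up capabilities by file extension (case-insensitive)."""
    extension = filename.rsplit(".", 1)[-1].lower()
    try:
        return CAPABILITIES[extension]
    except KeyError:
        raise ValueError(f"Unsupported spreadsheet extension: {extension!r}") from None
```

A lookup like this makes it easy to see, for example, that a large password-protected export cannot combine streaming and sheet protection in a single format.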

== Options

=== File & sheet tab
@@ -49,11 +67,11 @@ The .xls files use a binary format which is better suited for simple content, wh
[options="header"]
|===
|Option|Description
|Stream XLSX data|Check this option when writing large XLSX files.
|Extension|Choose `xls`, `xlsx`, or `ods`. This determines the output file format.
|Stream XLSX data|Check this option when writing large XLSX files (not available for `.xls` or `.ods`).
It uses a streaming API internally, so large files can be written without holding the whole workbook in memory (Excel's limits of 1,048,576 rows and 16,384 columns still apply).
|Create parent folder|Enable to create the parent folder
|If output file exists|Check this option when writing large XLSX files.
It uses internally a streaming API and is able to write large files without any memory restrictions (of course not exceeding Excel's limit of 1,048,575 rows and 16,384 columns).
|If output file exists|Choose to reuse an existing file or create a new one.
|Add filename(s) to result|Check to have the filename added to the result filenames
|Wait for first row before creating file|Checking this option makes the transform create the file only after it has seen a row.
If this is disabled the output file is always created, regardless of whether rows are actually written to the file.
@@ -65,17 +83,15 @@ If this is disabled the output file is always created, regardless of whether row
|===
|Option|Description
|Sheet Name|The sheet name the transform will write rows to.
|Make this the active sheet|If checked the Excel file will by default open on the above sheet when opened in MS Excel.
|Make this the active sheet|If checked the spreadsheet file will open on this sheet by default (in Excel, LibreOffice Calc, etc.).
|If sheet exists in output file|If the output file already contains this sheet (for example, when using a template or writing to an existing file), choose whether to write to the existing sheet or replace it.
|Protect Sheet|The XLS file format allows to protect an entire sheet from changes.
If checked you need to provide a password.
Excel will indicate that the sheet was protected by the user you provide here.
|Protect Sheet|Lock the sheet with an optional password. Supported for `.xls` and `.ods` output. The *protected by user* field applies to `.xls` only. Not supported for `.xlsx`.
|===

*Template section*

When creating new files (when existing files are replaced, or completely fresh files are created) you may choose to create a copy of an existing template file instead.
Please make sure that the template file is of the same type as the output file (both must be xls or xlsx respectively).
The template and output file must use the same extension (`.xls`, `.xlsx`, or `.ods`).

When creating new sheets, the transform may copy a sheet from the current document (the template or an otherwise existing file the transform is writing to).
A new sheet is created if the target sheet is not present, or the existing one shall be replaced as per configuration above.
@@ -92,14 +108,15 @@ A new sheet is created if the target sheet is not present, or the existing one s
|Write Header|If checked the first line written will contain the field names
|Write Footer|If checked the last line written will contain the field names
|Auto Size Columns|If checked the transform tries to automatically size the columns to fit their content.
Since this is not a feature the xls(x) file formats support directly, results may vary.
For `.xls`/`.xlsx` this is approximated by POI; for `.ods` the OpenDocument *optimal column width* flag is set.
|Force formula recalculation a|If checked, the transform tries to make sure all formula fields in the output file are updated.

* The xls file format supports a "dirty" flag that the transform sets.
The formulas are recalculated as soon as the file is opened in MS Excel.
* For the xlsx file format, the transform must try to recalculate the formula fields itself.
Since the underlying POI library does not support the full set of Excel formulas yet, this may give errors.
The transform will throw errors if it cannot recalculate the formulas.
* For `.ods`, formula results are cleared before save so Calc recalculates on open. Hop does not evaluate ODF formulas at write time.
|Leave styles of existing cells unchanged|If checked, the transform will not try to set the style of existing cells it is writing to.
This is useful when writing to pre-styled template sheets.
|===
@@ -112,9 +129,8 @@ This is useful when writing to pre-styled template sheets.
|Start writing at end of sheet|The transform will try to find the last line of the sheet, and start writing from there.
|Offset by ... rows|Any non-zero number moves the write position down (positive numbers) or up (negative numbers) by that many rows before writing.
Negative numbers may be useful if you need to append to a sheet, but still preserve a pre-styled footer.
|Begin by writing ... empty lines|The transform will try to find the last line of the sheet, and start writing from there.
|Omit Header|Any non-0 number will cause the transform to move this amount of rows down (positive numbers) or up (negative numbers) before writing rows.
Negative numbers may be useful if you need to append to a sheet, but still preserve a pre-styled footer.
|Begin by writing ... empty lines|When *shift existing cells down* is selected, empty rows are inserted at the write position instead of simply skipping ahead.
|Omit Header|Skip the header row when appending to an existing sheet.
|===

*Fields section*
@@ -135,12 +151,29 @@ The `ignore manual fields` ignores any fields manually defined in the transform'
|Format|The Excel format to use in the sheet.
Please consult the Excel manual for valid formats.
There are some online references as well.
For `.ods`, common Excel format tokens are converted to OpenDocument equivalents where possible.
|Style from cell|A cell (i.e. A1, B3 etc.) to copy the styling from for this column (usually some pre-styled cell in a template)
|Field Title|If set, this is used for the Header/Footer instead of the Hop field name
|Header/Footer style from cell|A cell to copy the styling from for headers/footers (usually some pre-styled cell in a template)
|Field Contains Formula|Set to Yes, if the field contains an Excel formula (no leading '=')
|Field Contains Formula|Set to Yes if the field contains an Excel formula (no leading '=').
For `.ods`, Excel-style formulas are converted to OpenFormula syntax on a best-effort basis.
|Hyperlink|A field that contains the link target.
Supported targets are links to other cells, http, ftp, email, and local documents.
|Cell Comment / Cell Author|The xlsx format allows to put comments on cells.
If you'd like to generate comments, you may specify fields holding the comment and author for a given column.
|Cell Comment / Cell Author|Comments are written for `.xlsx` and `.ods` (OpenDocument annotations).
Excel may not display ODS annotations; LibreOffice Calc does.
|===

[[ods-limitations]]
== ODS output notes and limitations

The `.ods` backend supports the same transform dialog options as `.xls`/`.xlsx` wherever the OpenDocument format allows.
Known differences:

* *Formulas* — Excel syntax is converted to OpenFormula (`of:=...`). Complex Excel-only functions may not translate. Formula results are not calculated by Hop; LibreOffice Calc recalculates when the file is opened.
* *Comments* — Stored as ODF annotations. Visible in LibreOffice Calc; Microsoft Excel may ignore them in `.ods` files.
* *Hyperlinks* — Stored as ODF `text:a` elements.
* *Format masks* — Excel format strings are mapped to ODF number formats on a best-effort basis.
* *Style copy* — Copies the referenced cell's style name, not a full POI cell style object.
* *Sheet protection* — Uses ODF `table:protected` with SHA-1 password hash (LibreOffice Calc compatible). The *protected by user* field is not used for `.ods`.
* *Streaming* — The *Stream XLSX data* option applies to `.xlsx` only.
* *Sheet names* — The 31-character Excel sheet name limit is not enforced for `.ods`.
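
To make the *Formulas* note concrete, here is a minimal Python sketch of the kind of rewriting involved. It is not the converter used by Hop, whose implementation is not shown here. The function name `excel_to_openformula` is hypothetical, and the sketch only handles plain A1-style references and comma argument separators outside string literals; sheet-qualified references and Excel-only functions are out of scope.

```python
import re

# Excel string literals: double-quoted, with "" as an escaped quote.
_STRING = re.compile(r'("(?:[^"]|"")*")')

# A1-style cell or range reference that is not part of a longer name
# and is not a function name such as LOG10( .
_REF = re.compile(
    r'(?<![A-Za-z0-9_.])(\$?[A-Za-z]{1,3}\$?\d+)'
    r'(?::(\$?[A-Za-z]{1,3}\$?\d+))?'
    r'(?![A-Za-z0-9_(])'
)


def _convert_ref(match: re.Match) -> str:
    """Wrap a reference in OpenFormula brackets: A1 -> [.A1], A1:B2 -> [.A1:.B2]."""
    start, end = match.group(1), match.group(2)
    if end is None:
        return f"[.{start}]"
    return f"[.{start}:.{end}]"


def excel_to_openformula(formula: str) -> str:
    """Convert an Excel formula (no leading '=') to an OpenFormula string."""
    pieces = _STRING.split(formula)
    converted = []
    for index, piece in enumerate(pieces):
        if index % 2 == 1:
            # Odd pieces are string literals captured by the split; keep as-is.
            converted.append(piece)
        else:
            piece = _REF.sub(_convert_ref, piece)
            # OpenFormula separates function arguments with semicolons.
            converted.append(piece.replace(",", ";"))
    return "of:=" + "".join(converted)
```

For instance, `excel_to_openformula("SUM(A1:B2)")` yields `of:=SUM([.A1:.B2])`. Anything beyond these simple cases needs a real formula parser, which is why the conversion is described as best-effort.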
175 changes: 175 additions & 0 deletions integration-tests/transforms/0101-check-ods-file-exists.hpl
@@ -0,0 +1,175 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--

Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

-->
<pipeline>
<info>
<name>0101-check-ods-file-exists</name>
<name_sync_with_filename>Y</name_sync_with_filename>
<description/>
<extended_description/>
<pipeline_version/>
<pipeline_type>Normal</pipeline_type>
<parameters>
</parameters>
<capture_transform_performance>N</capture_transform_performance>
<transform_performance_capturing_delay>1000</transform_performance_capturing_delay>
<transform_performance_capturing_size_limit>100</transform_performance_capturing_size_limit>
<created_user>-</created_user>
<created_date>2026/05/29 12:00:00.000</created_date>
<modified_user>-</modified_user>
<modified_date>2026/05/29 12:00:00.000</modified_date>
</info>
<notepads>
</notepads>
<order>
<hop>
<from>File exists!</from>
<to>Cleanup temporary file</to>
<enabled>Y</enabled>
</hop>
<hop>
<from>Look for test ODS file</from>
<to>Detect empty stream</to>
<enabled>Y</enabled>
</hop>
<hop>
<from>Detect empty stream</from>
<to>Abort because expected file not found!</to>
<enabled>Y</enabled>
</hop>
<hop>
<from>Look for test ODS file</from>
<to>File exists!</to>
<enabled>Y</enabled>
</hop>
</order>
<transform>
<name>Abort because expected file not found!</name>
<type>Abort</type>
<description/>
<distribute>Y</distribute>
<custom_distribution/>
<copies>1</copies>
<partitioning>
<method>none</method>
<schema_name/>
</partitioning>
<abort_option>ABORT_WITH_ERROR</abort_option>
<always_log_rows>Y</always_log_rows>
<message>Expected ODS file not found!</message>
<row_threshold>0</row_threshold>
<attributes/>
<GUI>
<xloc>336</xloc>
<yloc>368</yloc>
</GUI>
</transform>
<transform>
<name>Cleanup temporary file</name>
<type>ProcessFiles</type>
<description/>
<distribute>Y</distribute>
<custom_distribution/>
<copies>1</copies>
<partitioning>
<method>none</method>
<schema_name/>
</partitioning>
<addresultfilenames>N</addresultfilenames>
<createparentfolder>N</createparentfolder>
<operation_type>delete</operation_type>
<overwritetargetfile>N</overwritetargetfile>
<simulate>N</simulate>
<sourcefilenamefield>filename</sourcefilenamefield>
<attributes/>
<GUI>
<xloc>720</xloc>
<yloc>144</yloc>
</GUI>
</transform>
<transform>
<name>Detect empty stream</name>
<type>DetectEmptyStream</type>
<description/>
<distribute>Y</distribute>
<custom_distribution/>
<copies>1</copies>
<partitioning>
<method>none</method>
<schema_name/>
</partitioning>
<attributes/>
<GUI>
<xloc>336</xloc>
<yloc>256</yloc>
</GUI>
</transform>
<transform>
<name>File exists!</name>
<type>Dummy</type>
<description/>
<distribute>Y</distribute>
<custom_distribution/>
<copies>1</copies>
<partitioning>
<method>none</method>
<schema_name/>
</partitioning>
<attributes/>
<GUI>
<xloc>560</xloc>
<yloc>144</yloc>
</GUI>
</transform>
<transform>
<name>Look for test ODS file</name>
<type>GetFileNames</type>
<description/>
<distribute>N</distribute>
<custom_distribution/>
<copies>1</copies>
<partitioning>
<method>none</method>
<schema_name/>
</partitioning>
<doNotFailIfNoFile>N</doNotFailIfNoFile>
<dynamic_include_subfolders>N</dynamic_include_subfolders>
<file>
<file_required>N</file_required>
<include_subfolders>N</include_subfolders>
<name>${PROJECT_HOME}/files/excel/temp-ods-output.ods</name>
</file>
<filefield>N</filefield>
<filter>
<filterfiletype>all_files</filterfiletype>
</filter>
<isaddresult>Y</isaddresult>
<limit>0</limit>
<raiseAnExceptionIfNoFile>N</raiseAnExceptionIfNoFile>
<rownum>N</rownum>
<attributes/>
<GUI>
<xloc>336</xloc>
<yloc>144</yloc>
</GUI>
</transform>
<transform_error_handling>
</transform_error_handling>
<attributes/>
</pipeline>
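
The control flow of this test pipeline can be paraphrased in a few lines of Python. This is only a reading aid under stated assumptions, not how Hop executes the pipeline: `check_and_cleanup` is a hypothetical name, and the path argument stands in for `${PROJECT_HOME}/files/excel/temp-ods-output.ods`.

```python
from pathlib import Path


def check_and_cleanup(path: str) -> None:
    """Fail if the expected file is missing; otherwise delete it.

    Mirrors the pipeline: "Look for test ODS file" feeds both
    "Detect empty stream" (which leads to the Abort transform when no
    file row arrives) and "File exists!" (which leads to the cleanup).
    """
    target = Path(path)
    if not target.is_file():
        # Same message as the Abort transform in the pipeline.
        raise FileNotFoundError("Expected ODS file not found!")
    # Corresponds to the "Cleanup temporary file" delete operation.
    target.unlink()
```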