Merged
98 commits
15805c2
Add MSVC support for Windows port
peterboncz Mar 28, 2026
9eb2e6c
had forgotten to add the split interpreter files (were split to make …
peterboncz Mar 28, 2026
7be131b
fixes to get CI back working
peterboncz Mar 28, 2026
cd428a2
hmm.. this CI seems a bit rusty -- some more bumps and fixes
peterboncz Mar 28, 2026
9bffa5e
more minor fixes to pacify CI (probably not the last ones)
peterboncz Mar 28, 2026
9ba018e
remove unused includes
peterboncz Mar 28, 2026
436d9c6
new test matrix
peterboncz Mar 28, 2026
11884a3
attempt #1 at fixing mvsc++ build
peterboncz Mar 28, 2026
f0dba9c
attempt #2 at fixing mvsc++ build
peterboncz Mar 28, 2026
252b13b
attempt #3 at fixing mvsc++ build
peterboncz Mar 28, 2026
18dbd88
attempt #4 at fixing mvsc++ build
peterboncz Mar 28, 2026
46ffe2a
attempt #5 at fixing mvsc++ build
peterboncz Mar 28, 2026
d03ab56
attempt #6 at fixing mvsc++ build
peterboncz Mar 28, 2026
bad405b
attempt #7 at fixing mvsc++ build
peterboncz Mar 28, 2026
2fde9b3
attempt #8 at fixing mvsc++ build
peterboncz Mar 28, 2026
f710ad1
attempt #9 at fixing mvsc++ build
peterboncz Mar 28, 2026
04a63b2
attempt #10 at fixing mvsc++ build
peterboncz Mar 28, 2026
5f16c17
attempt #11 at fixing mvsc++ build
peterboncz Mar 28, 2026
2ffa803
attempt #12 at fixing mvsc++ build
peterboncz Mar 28, 2026
0495d68
attempt #13 at fixing mvsc++ build
peterboncz Mar 28, 2026
58dbb04
attempt #14 at fixing mvsc++ build
peterboncz Mar 28, 2026
53d08be
attempt #15 at fixing mvsc++ build
peterboncz Mar 28, 2026
bbc588b
attempt #16 at fixing mvsc++ build
peterboncz Mar 28, 2026
0f4163a
attempt #17 at fixing mvsc++ build
peterboncz Mar 28, 2026
16ea23b
back to attempt #8
peterboncz Mar 28, 2026
079a80a
go back at attempt #11
peterboncz Mar 28, 2026
4124b31
try to evolve #11 so that mvsc works..
peterboncz Mar 28, 2026
2b99c2f
try to evolve #11 so that mvsc works.. take#2
peterboncz Mar 28, 2026
76f7e5e
try to evolve #11 so that mvsc works.. take#3
peterboncz Mar 29, 2026
53d2538
try to evolve #11 so that mvsc works.. take#4
peterboncz Mar 29, 2026
21b09e5
try to evolve #11 so that mvsc works.. take#5
peterboncz Mar 29, 2026
196b8bb
try to evolve #11 so that mvsc works.. take#6
peterboncz Mar 29, 2026
84a13fa
try to evolve #11 so that mvsc works.. take#7
peterboncz Mar 29, 2026
36376a1
try to evolve #11 so that mvsc works.. take#8
peterboncz Mar 29, 2026
4657ca2
try to evolve #11 so that mvsc works.. take#10
peterboncz Mar 29, 2026
13ee6fd
try to evolve #11 so that mvsc works.. take#11
peterboncz Mar 30, 2026
44eaf9e
use std::range to allow also some older clang's to be used
peterboncz Mar 30, 2026
9cc1ea4
attempt at fixing SINGLE_COLUMN_JPEG test
peterboncz Mar 30, 2026
e6c6bf9
Fix three MSVC portability bugs caused by 32-bit unsigned long on Win…
peterboncz Mar 31, 2026
ec98391
make format
peterboncz Mar 31, 2026
8c4e2ab
fix casting
peterboncz Mar 31, 2026
4e9387b
small fix
peterboncz Mar 31, 2026
9d295c3
add missing includes
peterboncz Mar 31, 2026
d4bacc0
make sure the null map is always allocated
peterboncz Mar 31, 2026
f9669c4
Fix three MSVC test failures: GALP null deref, fill_in UB, and GTest …
peterboncz Apr 2, 2026
315e782
Merge branch 'windows-port' of github.com:cwida/FastLanes into window…
peterboncz Apr 2, 2026
8769826
format-fix
peterboncz Apr 2, 2026
4349bc5
Merge branch 'windows-port' of github.com:cwida/FastLanes into window…
peterboncz Apr 2, 2026
5a9894f
now that victory over the bugs is achieved move to get DLLs
peterboncz Apr 2, 2026
d11f2ef
make format
peterboncz Apr 2, 2026
2655d81
enable shared library (DLL) builds with explicit FLS_API symbol expor…
peterboncz Apr 2, 2026
ea2af7b
fix header
peterboncz Apr 2, 2026
f58eb48
format-fix
peterboncz Apr 2, 2026
428e9fe
remove redundant include
peterboncz Apr 3, 2026
9066389
trying to get to fully green
peterboncz Apr 3, 2026
57c3951
- add gcc as acompiler
peterboncz Apr 3, 2026
00e9254
second attempt at fixing CI specificiation
peterboncz Apr 3, 2026
b092467
third attempt to fix yaml
peterboncz Apr 3, 2026
3d7dff2
- add gcc compiler warning flag (float conversion)
peterboncz Apr 3, 2026
5926ee7
- stop gcc warning on conversions and shadowing
peterboncz Apr 3, 2026
abcbb83
hammering on gcc compile again
peterboncz Apr 3, 2026
4694fd0
one more gcc flag?
peterboncz Apr 3, 2026
a565a81
gcc does not accept extra ; not does it like auto idx = 0
peterboncz Apr 3, 2026
7034fc5
more gcc fixes
peterboncz Apr 3, 2026
9d05889
one more gcc compiler exception.
peterboncz Apr 3, 2026
c651fd9
one more gcc problem fixed
peterboncz Apr 3, 2026
55b2478
three more fixes to pacify gcc, hopefully
peterboncz Apr 3, 2026
9c3f7d4
- fix windows seh handler to hopefully pass on windows
peterboncz Apr 3, 2026
4abfa33
guard compiler option as gcc does not support it
peterboncz Apr 3, 2026
bb719c7
bugs found by windows testing
peterboncz Apr 3, 2026
77d1ab2
another turn of the wheel
peterboncz Apr 3, 2026
4f73c1a
hand-simplify unrsum
peterboncz Apr 3, 2026
73d8533
one more out of bounds error
peterboncz Apr 3, 2026
6987a1a
fix unsigned overflow in is_exception() causing all values to be mark…
peterboncz Apr 4, 2026
78dfcac
fix bimap_frequency and min/max lost after Cast due to Finalize/Cast …
peterboncz Apr 6, 2026
4f5e820
fix frequency coding
peterboncz Apr 6, 2026
04ecfcc
The FSST12 encoder used memcpy(out, &res, sizeof(u64)) to speculatively
peterboncz Apr 6, 2026
b775c78
- shift ubuntu/Debig builds from gcc to clang (gcc times out)
peterboncz Apr 6, 2026
0ebc7f4
split to reduce compilation effort
peterboncz Apr 7, 2026
c9009c6
forgot to add files
peterboncz Apr 7, 2026
44e3859
add more forgotten files
peterboncz Apr 7, 2026
2d2f6e9
one more forgotten file
peterboncz Apr 7, 2026
59c4581
Two bugs, two fixes:
peterboncz Apr 7, 2026
5367a48
- make include direct to avoid confusingt the tidy check
peterboncz Apr 7, 2026
70b19c0
Merge branch 'add-mvsc+gcc-support' of github.com:cwida/FastLanes int…
peterboncz Apr 7, 2026
cb79623
make format
peterboncz Apr 7, 2026
4c1a587
- include fixes for tidy
peterboncz Apr 7, 2026
7cff0c3
tidy fix
peterboncz Apr 7, 2026
3e5e206
null_map_arr[idx] out-of-bounds in rowgroup_equality_visitor when nul…
peterboncz Apr 7, 2026
0276a95
Merge branch 'add-mvsc+gcc-support' of github.com:cwida/FastLanes int…
peterboncz Apr 7, 2026
5388905
split physical_operator variant into enc/dec sub-variants to reduce c…
peterboncz Apr 7, 2026
a51b1af
fix tidy issues
peterboncz Apr 8, 2026
acec3cc
remove CI on ubuntu-arm/clang/Debug/static as it still times out
peterboncz Apr 8, 2026
62cf065
add more CI testing
peterboncz Apr 8, 2026
ab004dd
Merge branch 'add-mvsc+gcc-support' of github.com:cwida/FastLanes int…
peterboncz Apr 8, 2026
d1e80de
another run at the CI wheel
peterboncz Apr 8, 2026
9129162
try fix dll making on clang
peterboncz Apr 8, 2026
bc117d2
disabling windows11-arm clang
peterboncz Apr 9, 2026
7 changes: 5 additions & 2 deletions .github/actions/generate-dataset/action.yml
@@ -21,9 +21,11 @@ runs:
  # 2️⃣ Remove any .venv that might have been created earlier
  # with Python 3.13, so we always recreate it with 3.12.
  # ─────────────────────────────────────────────────────────────
- - name: Remove stale virtual-env
+ - name: Recreate virtual-env with the correct Python
  shell: bash
- run: rm -rf "$GITHUB_WORKSPACE/.venv"
+ run: |
+ rm -rf "$GITHUB_WORKSPACE/.venv"
+ python3 -m venv "$GITHUB_WORKSPACE/.venv"

  # ─────────────────────────────────────────────────────────────
  # 3️⃣ Generate the synthetic data
@@ -36,5 +38,6 @@ runs:
  # 4️⃣ Generate the sentence embeddings
  # ─────────────────────────────────────────────────────────────
  - name: Generate embeddings
+ if: runner.os != 'Windows'
  shell: bash
  run: make generate-embeddings
6 changes: 3 additions & 3 deletions .github/workflows/benchmark.yaml
@@ -10,9 +10,9 @@ run-name: >-

  on:
  push:
- branches: [ '*' ]
+ branches: [ main, dev ]
  pull_request:
- branches: [ '*' ]
+ branches: [ main, dev ]

  concurrency:
  group: Benchmarker CI-${{ github.ref }}
@@ -28,7 +28,7 @@ jobs:
  strategy:
  fail-fast: false
  matrix:
- platform: [ ubuntu-latest, macos-latest ]
+ platform: [ ubuntu-latest ]

  defaults:
  run:
304 changes: 207 additions & 97 deletions .github/workflows/cpp.yaml

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions .github/workflows/examples.yml
@@ -10,6 +10,7 @@ run-name: >-

  on:
  push:
+ branches: [ main, dev ]
  pull_request:
  branches: [ main, dev ]

6 changes: 4 additions & 2 deletions .github/workflows/flatbuffers-ci.yml
@@ -12,8 +12,10 @@ run-name: >-
  # Trigger on every push & PR, on all branches
  # ─────────────────────────────────────────────────────────────
  on:
- push: # no branches filter ⇒ every branch
- pull_request: # no branches filter ⇒ every target branch
+ push:
+ branches: [ main, dev ]
+ pull_request:
+ branches: [ main, dev ]
  concurrency:
  group: flatbuffers-${{ github.ref }}
  cancel-in-progress: true
3 changes: 2 additions & 1 deletion .github/workflows/fsst.yaml
@@ -13,8 +13,9 @@ run-name: >-
  # ──────────────────────────────────────────────────────────────────────────────
  on:
  push:
+ branches: [ main, dev ]
  pull_request:
- branches: [ "main", "dev" ]
+ branches: [ main, dev ]

  # Cancel in-flight runs on the same branch/PR so we do not waste minutes
  concurrency:
1 change: 1 addition & 0 deletions .github/workflows/header-check.yml
@@ -11,6 +11,7 @@ run-name: >-

  on:
  push:
+ branches: [ main, dev ]
  pull_request:
  branches: [ main, dev ]

1 change: 1 addition & 0 deletions .github/workflows/python.yml
@@ -13,6 +13,7 @@ run-name: >-
  # ────────────────────────────────────────────────────────
  on:
  push:
+ branches: [ main, dev ]
  pull_request:
  branches: [ main, dev ]
  workflow_dispatch:
3 changes: 2 additions & 1 deletion .github/workflows/rust.yml
@@ -14,8 +14,9 @@ run-name: >-

  on:
  push:
+ branches: [ main, dev ]
  pull_request:
- branches: [ "main", "dev" ]
+ branches: [ main, dev ]

  concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
5 changes: 5 additions & 0 deletions .gitignore
@@ -126,3 +126,8 @@ rust/target/
  skbuild-*/
  .idea/
  .venv/
+
+ # Windows build artifacts
+ build_win.bat
+ cmake_output.txt
+ build_win/
121 changes: 121 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,121 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## What is FastLanes

FastLanes is a C++20 columnar compression storage format — "Like Parquet, but with 40% better compression and 40× faster decoding." Zero external dependencies, SIMD-friendly without explicit SIMD instructions. Bindings exist for Python (`python/`), Rust (`rust/`), C (`src/c_api/`), and CUDA (`cuda/`).

## Build Commands

FastLanes uses CMake 3.22+ with Ninja. On Linux/macOS it requires Clang >= 13. On Windows it uses MSVC (set up via `vcvarsall.bat`).

### Configure and build (Release with tests)
```bash
cmake -S . -B build -G Ninja -DCMAKE_BUILD_TYPE=Release -DFLS_BUILD_TESTING=ON
cmake --build build --parallel
```

### Run all tests
```bash
cd build && ctest -j4 --output-on-failure --timeout 300 -E QuickFuzz
```

### Run a single test by filter
```bash
build/test/src/dataset_tests/dataset_tests.exe --gtest_filter=FastLanesReaderTester.issue_000
```

### Run a single test target
```bash
cmake --build build --target unit_test && ctest -R unit_test --output-on-failure
```

### Key CMake options
| Option | Default | Purpose |
|--------|---------|---------|
| `FLS_BUILD_TESTING` | OFF | Build tests (fetches GoogleTest v1.15.2) |
| `FLS_BUILD_SHARED_LIBS` | OFF | Build as shared library (DLL) instead of static |
| `FLS_BUILD_BENCHMARKING` | OFF | Build benchmarks |
| `FLS_BUILD_PYTHON` | OFF | Build Python bindings |
| `FLS_BUILD_CUDA` | OFF | Build CUDA reader |
| `FLS_ENABLE_CLANG_TIDY` | OFF | Enable clang-tidy on all targets |

### Windows-specific (MSVC)

Invoke builds via a `.bat` that calls `vcvarsall.bat` first. Example pattern:
```bat
call "C:\Program Files\Microsoft Visual Studio\...\vcvarsall.bat" arm64
cmake -S . -B build -G Ninja -DCMAKE_BUILD_TYPE=Release -DFLS_BUILD_TESTING=ON
cmake --build build --parallel
```

Test data can be cached across builds by setting the `FASTLANES_DATA_DIR` environment variable to an existing data directory (e.g., `build_release/_deps/data-src`).

### Format code
The project uses `.clang-format` (LLVM base, tabs, 120-column limit). Run clang-format on changed files before committing.

## Architecture

### Public API

The main entry point is `fastlanes::Connection` (in `src/include/fls/connection.hpp`):
```cpp
auto conn = fastlanes::connect();
conn->read_csv("input/"); // ingest CSV
conn->to_fls("output/"); // write FastLanes format
auto reader = conn->read_fls("data.fls"); // read back
auto table = reader->materialize();
```

`TableReader` provides rowgroup-level random access. `RowgroupReader` decompresses individual rowgroups. The reader stack: `TableReader` → `RowgroupReader` → `RowgroupView` → `ColumnView` → `SegmentView`.

### Library structure

All source is under `src/`. Each subdirectory builds an OBJECT library that gets linked into the single `FastLanes` library target. Key components:

- **`cor/`** — Core: architecture detection, CPU features, layout (`Buf`), compression/decompression engines
- **`expression/`** — Expression-based encoding: physical expressions, operators (RLE, FSST, ALP, dict, delta, etc.), interpreter
- **`encoder/`** — High-level encoding pipeline, materializer (decompression)
- **`wizard/`** — Schema discovery: analyzes data and selects optimal encoding per column
- **`reader/`** — File reading: segments, column views, rowgroup views, table reader
- **`table/`** — In-memory table representation: `Rowgroup`, `Table`, `Vector`, typed columns
- **`footer/`** — FlatBuffers-generated metadata descriptors (table, rowgroup, column, segment)
- **`alp/`** — ALP (Adaptive Lossless Floating-Point) compression codec
- **`primitive/`** — Low-level primitives: bitpacking, patching, FSST string compression

### DLL / Shared library support (Windows)

The `FLS_API` macro in `src/include/fls/api/api.hpp` controls symbol visibility:
- `FLS_STATIC` defined → `FLS_API` is empty (static build)
- `FLS_BUILD_DLL` defined → `FLS_API` is `__declspec(dllexport)` (building the DLL)
- Neither defined → `FLS_API` is `__declspec(dllimport)` (consuming the DLL)
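
The three cases above map onto a small preprocessor ladder. The following is a minimal sketch of what such a header could look like, not a copy of `api.hpp`. The non-Windows branch (GCC/Clang default visibility) and the `fls_demo_answer` function are assumptions added only so the snippet compiles on its own.

```cpp
// Sketch only: an export macro in the spirit of the three rules listed above.
#include <cassert>

#if defined(FLS_STATIC)
#define FLS_API // static build: no decoration needed
#elif defined(_WIN32)
#if defined(FLS_BUILD_DLL)
#define FLS_API __declspec(dllexport) // compiling the DLL itself
#else
#define FLS_API __declspec(dllimport) // consuming the DLL
#endif
#else
// Assumption: on non-Windows toolchains, fall back to default ELF/Mach-O visibility.
#define FLS_API __attribute__((visibility("default")))
#endif

// Hypothetical public function. The declaration is what a header would carry.
FLS_API int fls_demo_answer();

// The definition would live in a library .cpp that is compiled with
// FLS_BUILD_DLL (shared build) or FLS_STATIC (static build).
int fls_demo_answer() {
	return 42;
}

// Hypothetical caller-side check.
inline void fls_demo_check() {
	assert(fls_demo_answer() == 42);
}
```

On Windows, compiling this file with neither macro defined would put a definition behind `dllimport`, which is exactly the misconfiguration the rules above are meant to avoid.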

When `FLS_BUILD_SHARED_LIBS=ON`, `FLS_BUILD_DLL` is set directory-scoped via `add_compile_definitions` in `src/CMakeLists.txt` so all object libraries under `src/` get it. Test targets (under `test/`) don't get it, so `FLS_API` correctly resolves to `dllimport` for them.

Any public function or class that test code (or external consumers) calls across the DLL boundary must be marked `FLS_API`. For template functions, the explicit instantiations in the .cpp must also carry `FLS_API`.

Note: `WINDOWS_EXPORT_ALL_SYMBOLS` does NOT work for this project — the symbol count exceeds the 65535 .def file limit.

**MSVC dllexport gotchas:** MSVC eagerly instantiates all special member functions for `__declspec(dllexport)` classes. This causes two problems:

1. **Non-copyable members** (e.g., `vector<unique_ptr<T>>`): MSVC tries to generate copy ctor/assign and fails. Fix: explicitly `= delete` copy operations on the class.

2. **Incomplete types in unique_ptr**: MSVC tries to generate the destructor inline, which needs the complete type. Fix: either include the complete type's header, or declare the destructor in the header and define it `= default` in the .cpp where the type is complete.
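
Both fixes can be shown on one small class. This is a sketch under stated assumptions: `SegmentOwner` and `Segment` are hypothetical names, and `FLS_API` is stubbed out as empty so the example builds standalone. In a real DLL build the macro would expand to `__declspec(dllexport)`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <vector>

#ifndef FLS_API
#define FLS_API // stand-in so the sketch compiles on its own
#endif

// ---- what would live in the header ----
struct Segment; // hypothetical type, deliberately incomplete here

class FLS_API SegmentOwner { // hypothetical class
public:
	SegmentOwner();
	~SegmentOwner(); // fix 2: declared here, defined where Segment is complete

	// Fix 1: the vector<unique_ptr<...>> member makes copies impossible,
	// so delete them explicitly instead of letting the compiler try.
	SegmentOwner(const SegmentOwner&)            = delete;
	SegmentOwner& operator=(const SegmentOwner&) = delete;

	void        add(int value);
	std::size_t size() const;

private:
	std::vector<std::unique_ptr<Segment>> m_segments;
};

// ---- what would live in the .cpp ----
struct Segment {
	int value;
};

SegmentOwner::SegmentOwner()  = default;
SegmentOwner::~SegmentOwner() = default; // Segment is a complete type at this point

void SegmentOwner::add(int value) {
	m_segments.push_back(std::make_unique<Segment>(Segment {value}));
}

std::size_t SegmentOwner::size() const {
	return m_segments.size();
}

static_assert(!std::is_copy_constructible<SegmentOwner>::value, "copies are deleted");
```

Defining the constructor and destructor out of line keeps every member function that needs the complete `Segment` type in the translation unit that has it.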

### Type aliases

Defined in `src/include/fls/common/alias.hpp`:
- `n_t` = `uint64_t` (counts), `idx_t` = `uint32_t` (indices), `bw_t` = `uint8_t` (bit width)
- `up<T>` = `unique_ptr<T>`, `sp<T>` = `shared_ptr<T>`
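
As an illustration, the aliases listed above could be declared as follows. This is a sketch reconstructed from the list, not the contents of `alias.hpp`; the `fastlanes_sketch` namespace and the `bits_needed` and `make_index` helpers are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

namespace fastlanes_sketch { // hypothetical namespace, for illustration only

using n_t   = std::uint64_t; // counts
using idx_t = std::uint32_t; // indices
using bw_t  = std::uint8_t;  // bit widths

template <typename T>
using up = std::unique_ptr<T>;
template <typename T>
using sp = std::shared_ptr<T>;

// Fixed-width aliases have the same size on every platform.
static_assert(sizeof(n_t) == 8, "n_t is 64-bit everywhere");
static_assert(sizeof(idx_t) == 4, "idx_t is 32-bit everywhere");

// Hypothetical helper: number of bits needed to represent `value`.
inline bw_t bits_needed(n_t value) {
	bw_t width = 0;
	while (value != 0) {
		++width;
		value >>= 1;
	}
	return width;
}

// Hypothetical helper showing the smart-pointer alias in use.
inline up<idx_t> make_index(idx_t index) {
	return std::make_unique<idx_t>(index);
}

} // namespace fastlanes_sketch
```

Fixed-width types sidestep the fact that `unsigned long` is 32 bits wide with MSVC on Windows but 64 bits wide on typical 64-bit Linux and macOS toolchains.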

### Test structure

Tests live in `test/src/` with six suites: `dataset_tests`, `expression_tests`, `fls_reader_tests`, `primitive_tests`, `quick_fuzz_tests`, `unit_tests`. All use GoogleTest. On MSVC, a `msvc_heap_guard` object library handles SEH guard-page exceptions that would otherwise cause spurious test failures.

## Code Style

- `.clang-tidy` is strict: `WarningsAsErrors: '*'` — all warnings are errors
- Types: `CamelCase`. Functions: `aNy_CasE`. Members: `lower_case` (private: `m_` prefix). Constants: `UPPER_CASE`. Typedefs: `lower_case` with `_t` suffix
- Tabs for indentation, 120-column limit
- PRs target `dev` branch