170 changes: 170 additions & 0 deletions filter-plugin/logstash-filter-guardium-universal/ARCHITECTURE.md
@@ -0,0 +1,170 @@
# Architectural Proposal: Universal Guardium Filter Plugin

> **This is a suggestion to the project maintainers.**
> The reference implementation here is meant to illustrate the idea concretely,
> not to be merged as-is. Feedback and alternative approaches are very welcome.

---

## The Problem

Every filter plugin in this repository follows the same structure.
Opening any two plugins side by side reveals that they are nearly identical: the only
meaningful difference is the 50–150 lines of parsing logic specific to each datasource.

Everything else is copy-pasted boilerplate:

```
@LogstashPlugin annotation         ┐
implements Filter                  │
static Log4j init block            │  ~200 lines repeated verbatim
filter() event loop + try/catch    │  in every single plugin
GSON serialization                 │
correctIPs() utility               │
logEvent() utility                 │
configSchema() / getId()           ┘
```

This creates real maintenance costs:
- A security fix or utility improvement must be applied to **54 files**
- Adding a new datasource means scaffolding a full Logstash plugin (~500 lines, 8+ files, a new gem)
- 54 separate gem artifacts to build, test, version, and ship

---

## The Suggestion

> **Replace all 54 individual filter plugins with a single generic plugin,
> where each datasource is just a thin parser class (or ideally just a config file).**

The Logstash plugin layer should exist exactly once. The only thing that varies between
datasources, the parsing logic, should be expressed in the simplest possible form.

---

## Proposed Architecture

### Current state (54 plugins)

```
┌──────────────────────────────────────────────────────────────────┐
│  logstash-filter-mysql-guardium/                                 │
│  ├── build.gradle   (190 lines, ~identical across all)           │
│  ├── MySqlFilterGuardium.java                                    │
│  │   ├── @LogstashPlugin, implements Filter  ┐ boilerplate       │
│  │   ├── Log4j init, GSON, correctIPs()      │ ~200 lines        │
│  │   ├── filter() loop, error tagging        ┘                   │
│  │   └── parseRecord()   ← the only unique part                  │
│  └── filter.conf:  mysql_filter_guardium {}                      │
│                                                                  │
│  logstash-filter-mongodb-guardium/    (same structure)           │
│  logstash-filter-snowflake-guardium/  (same structure)           │
│  logstash-filter-postgres-guardium/   (same structure)           │
│  ... × 54                                                        │
└──────────────────────────────────────────────────────────────────┘

54 gems · 54 build files · 54 copies of the same boilerplate
```

### Proposed state (1 plugin + thin parsers)

```
┌──────────────────────────────────────────────────────────────────┐
│  logstash-filter-guardium-universal/   (ONE gem)                 │
│  │                                                               │
│  ├── GuardiumUniversalFilter.java   ← all Logstash boilerplate   │
│  │   └── delegates to ───────────────────────────────┐           │
│  │                                                   │           │
│  ├── IGuardiumParser  (interface)                    │           │
│  │   └── parseRecord(Event) → Record                 │           │
│  │                                                   │           │
│  ├── AbstractGuardiumParser                          │           │
│  │   └── correctIPs(), shared utilities              │           │
│  │                                                   │           │
│  ├── ParserRegistry  ←───────────────────────────────┘           │
│  │   ├── "mysql"     → MySqlParser      (~150 lines)             │
│  │   ├── "mongodb"   → MongoDbParser    (~ 60 lines)             │
│  │   ├── "snowflake" → SnowflakeParser  (~ 40 lines)             │
│  │   └── ...           (one line per datasource)                 │
│  │                                                               │
│  └── filter.conf:                                                │
│        guardium_universal_filter { datasource => "mysql" }       │
└──────────────────────────────────────────────────────────────────┘

1 gem · 1 build file · boilerplate written once
```
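
The delegation in the diagram can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the code in this PR: the event is modelled as a `Map<String, Object>` instead of the Logstash `Event` type, `ParsedRecord` is a hypothetical stand-in for the Guardium `Record`, and the helper `field()` and the event keys `user` and `query` are invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the Guardium Record type (assumption: two fields only).
final class ParsedRecord {
    final String dbUser;
    final String sqlText;

    ParsedRecord(String dbUser, String sqlText) {
        this.dbUser = dbUser;
        this.sqlText = sqlText;
    }
}

// The single extension point: one method per datasource.
// Assumption: the event is a plain map here; the diagram uses the Logstash Event type.
interface IGuardiumParser {
    ParsedRecord parseRecord(Map<String, Object> event);
}

// Shared utilities are written once and inherited by every parser.
abstract class AbstractGuardiumParser implements IGuardiumParser {
    // Hypothetical helper: read an event field as a string, empty if absent.
    protected static String field(Map<String, Object> event, String name) {
        Object value = event.get(name);
        return value == null ? "" : value.toString();
    }
}

// A thin parser contains only datasource-specific logic.
// The event keys "user" and "query" are illustrative, not the real MySQL audit fields.
final class MySqlParser extends AbstractGuardiumParser {
    @Override
    public ParsedRecord parseRecord(Map<String, Object> event) {
        return new ParsedRecord(field(event, "user"), field(event, "query"));
    }
}

// Maps a datasource name to its parser; adding a datasource is one register() call.
final class ParserRegistry {
    private final Map<String, IGuardiumParser> parsers = new LinkedHashMap<>();

    ParserRegistry register(String datasource, IGuardiumParser parser) {
        parsers.put(datasource.toLowerCase(Locale.ROOT), parser);
        return this;
    }

    IGuardiumParser lookup(String datasource) {
        IGuardiumParser parser = parsers.get(datasource.toLowerCase(Locale.ROOT));
        if (parser == null) {
            throw new IllegalArgumentException(
                "Unknown datasource '" + datasource + "', known: " + parsers.keySet());
        }
        return parser;
    }
}

class RegistryDemo {
    public static void main(String[] args) {
        ParserRegistry registry = new ParserRegistry().register("mysql", new MySqlParser());

        Map<String, Object> event = new LinkedHashMap<>();
        event.put("user", "alice");
        event.put("query", "SELECT 1");

        ParsedRecord record = registry.lookup("mysql").parseRecord(event);
        System.out.println(record.dbUser + " ran: " + record.sqlText);
    }
}
```

Failing fast on an unknown name is one possible design choice: a typo in `datasource` would then surface as soon as the name is resolved, rather than as silently unparsed events.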

---

## What Changes for Each Datasource

### `filter.conf`: minimal change

```diff
- mysql_filter_guardium {}
+ guardium_universal_filter { datasource => "mysql" }
```
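
What the `datasource` setting might drive inside the universal filter can be sketched as follows. This is an illustration under assumptions, not the plugin's actual code: events are plain maps rather than Logstash `Event` objects, the parser is reduced to a function returning an already serialized record, the output field name `GuardRecord` is assumed, and the tag pattern `_guardium_parse_error_<datasource>` is inferred from the single tag `_guardium_parse_error_mysql` used in the MySQL `filter.conf`.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch of the one shared event loop. Assumption: events are mutable maps.
final class UniversalFilterSketch {
    private final String datasource;
    // Assumption: the parser returns the serialized record as a string.
    private final Function<Map<String, Object>, String> parser;

    UniversalFilterSketch(String datasource, Function<Map<String, Object>, String> parser) {
        this.datasource = datasource;
        this.parser = parser;
    }

    Collection<Map<String, Object>> filter(Collection<Map<String, Object>> events) {
        for (Map<String, Object> event : events) {
            try {
                // "GuardRecord" is an assumed field name for the serialized record.
                event.put("GuardRecord", parser.apply(event));
            } catch (RuntimeException e) {
                // One bad event must not abort the batch: tag it and continue.
                addTag(event, "_guardium_parse_error_" + datasource);
            }
        }
        return events;
    }

    @SuppressWarnings("unchecked")
    private static void addTag(Map<String, Object> event, String tag) {
        List<String> tags =
            (List<String>) event.computeIfAbsent("tags", key -> new ArrayList<String>());
        if (!tags.contains(tag)) {
            tags.add(tag);
        }
    }
}

class FilterLoopDemo {
    public static void main(String[] args) {
        // Hypothetical parser: fails when the syslog payload is missing.
        UniversalFilterSketch filter = new UniversalFilterSketch("mysql", event -> {
            Object message = event.get("syslog_message");
            if (message == null) {
                throw new IllegalArgumentException("missing syslog_message");
            }
            return "{\"raw\":\"" + message + "\"}";
        });

        Map<String, Object> good = new HashMap<>();
        good.put("syslog_message", "SELECT 1");
        Map<String, Object> bad = new HashMap<>();

        List<Map<String, Object>> events = new ArrayList<>();
        events.add(good);
        events.add(bad);
        filter.filter(events);

        System.out.println(good.get("GuardRecord"));
        System.out.println(bad.get("tags"));
    }
}
```

Because the loop and the error tagging live in one place, a fix to either would apply to every datasource at once.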

### Adding a new datasource: before vs. after

| | Before | After |
|---|---|---|
| Files to create | 8+ (plugin class, build.gradle, VERSION, gemspec, ...) | 1 (parser class) |
| Lines of new code | ~500 | ~100 |
| New gem required | Yes | No |
| Boilerplate to copy | ~200 lines | 0 lines |

---

## Reference Implementation

This PR includes a working reference implementation to make the idea concrete:

```
filter-plugin/logstash-filter-guardium-universal/
├── GuardiumUniversalFilter.java            ← the single Logstash plugin
├── parser/
│   ├── IGuardiumParser.java                ← interface: parseRecord(Event) → Record
│   ├── AbstractGuardiumParser.java         ← shared utilities
│   └── ParserRegistry.java                 ← datasource name → parser instance
├── datasources/
│   ├── mysql/MySqlParser.java              ← MySQL fully migrated (~150 lines)
│   ├── mongodb/MongoDbParser.java          ← MongoDB thin connector
│   └── snowflake/SnowflakeParser.java      ← Snowflake thin connector
└── [MySQL|MongoDB|Snowflake]*Package/filter.conf
```

**MySQL is fully migrated** as a concrete example: its parsing logic is identical to the
original, just extracted into a plain Java class with no Logstash dependency.
MongoDB and Snowflake are included as thin connectors to show how complex, multi-class
parser hierarchies integrate cleanly.
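
A thin connector of this kind is essentially an adapter. The sketch below shows the shape only, under stated assumptions: `LegacyMongoParser` is a hypothetical stand-in for the existing multi-class parser hierarchy, `ParsedRecord` stands in for the Guardium `Record`, the event is a plain map, and the payload is read from the `syslog_message` field produced by the grok stage in the MongoDB `filter.conf`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Guardium Record type.
final class ParsedRecord {
    final String rawPayload;

    ParsedRecord(String rawPayload) {
        this.rawPayload = rawPayload;
    }
}

// Assumption: the event is modelled as a plain map instead of the Logstash Event type.
interface IGuardiumParser {
    ParsedRecord parseRecord(Map<String, Object> event);
}

// Hypothetical stand-in for an existing parser hierarchy that stays where it is.
// The real one would spread this work over many classes.
final class LegacyMongoParser {
    static ParsedRecord parse(String auditPayload) {
        return new ParsedRecord(auditPayload.trim());
    }
}

// The thin connector: extract the payload, hand it to the existing code, nothing else.
final class MongoDbParser implements IGuardiumParser {
    @Override
    public ParsedRecord parseRecord(Map<String, Object> event) {
        Object payload = event.get("syslog_message");
        if (payload == null) {
            throw new IllegalArgumentException("event has no syslog_message field");
        }
        return LegacyMongoParser.parse(payload.toString());
    }
}

class ThinConnectorDemo {
    public static void main(String[] args) {
        Map<String, Object> event = new HashMap<>();
        event.put("syslog_message", "  { \"atype\": \"find\" }  ");

        ParsedRecord record = new MongoDbParser().parseRecord(event);
        System.out.println(record.rawPayload);
    }
}
```

The existing hierarchy is untouched in this arrangement, which is what would let a later phase move it into the new plugin separately.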

---

## Migration Strategy

The migration can be done incrementally with zero disruption:

```
Phase 1 Framework + 3 reference parsers (this PR)
Phase 2 Migrate remaining 51 parsers one by one (mechanical extraction)
Phase 3 Move parser class hierarchies fully into the new plugin
Phase 4 Deprecate individual filter plugin directories
```

Existing pipelines are unaffected until their `filter.conf` is updated.
The old and new plugins can coexist during migration.

---

## Questions for the Team

- Is this direction aligned with the project's goals?
- Should `IGuardiumParser` live in the `common` module instead, to allow
parser JARs to be developed and deployed independently?
- Should parsers eventually be driven by config files (YAML field mappings)
for simple datasources, with Java only needed for complex ones?
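
To make the last question concrete, a config-driven parser for simple datasources might look roughly like the sketch below. Everything in it is an assumption for discussion: the mapping is passed in as a `Map` that would in practice be loaded from a YAML file (loading is out of scope here), the output is a flat map rather than a Guardium `Record`, and the field names are invented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a parser driven purely by field mappings (target field -> event field).
final class MappingParser {
    private final Map<String, String> fieldMappings;

    MappingParser(Map<String, String> fieldMappings) {
        // Defensive copy that keeps the declared order of the mappings.
        this.fieldMappings = new LinkedHashMap<>(fieldMappings);
    }

    // Copies each mapped event field into the output; absent fields become "".
    Map<String, String> parse(Map<String, Object> event) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> mapping : fieldMappings.entrySet()) {
            Object value = event.get(mapping.getValue());
            out.put(mapping.getKey(), value == null ? "" : value.toString());
        }
        return out;
    }
}

class MappingParserDemo {
    public static void main(String[] args) {
        // Hypothetical mapping, as it might look after loading a YAML file.
        Map<String, String> mappings = new LinkedHashMap<>();
        mappings.put("dbUser", "USER_NAME");
        mappings.put("sqlText", "QUERY_TEXT");

        Map<String, Object> event = new LinkedHashMap<>();
        event.put("USER_NAME", "alice");
        event.put("QUERY_TEXT", "SELECT 1");

        System.out.println(new MappingParser(mappings).parse(event));
    }
}
```

A datasource that needs conditional logic or nested payload parsing would not fit this shape and would still need a Java parser.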

---

> Raised by [@haimofergmail](https://github.com/haimofergmail); open to all feedback.
@@ -0,0 +1,19 @@
# MongoDB audit logs via syslog: uses the universal filter plugin
filter {
  if [type] == "syslog-mongodb" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:server_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
    }

    if "_grokparsefailure" in [tags] { drop {} }

    mutate {
      rename => { "host" => "server_ip" }
    }

    # ---- Universal filter (replaces mongodb_guardium_filter{}) ---------------
    guardium_universal_filter {
      datasource => "mongodb"
    }
  }
}
@@ -0,0 +1,31 @@
# MySQL audit logs via syslog: uses the universal filter plugin
# Previously required a dedicated logstash-filter-mysql-guardium plugin.
filter {
  if [type] == "syslog-mysql" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:server_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
    }

    if "_grokparsefailure" in [tags] { drop {} }

    date {
      match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }

    mutate {
      rename => { "host" => "server_ip" }
    }

    # ---- Universal filter (replaces mysql_filter_guardium{}) ----------------
    guardium_universal_filter {
      datasource => "mysql"
    }

    if "_guardium_parse_error_mysql" not in [tags] {
      mutate {
        remove_field => ["message", "syslog_timestamp", "syslog_program",
                         "syslog_pid", "syslog_message", "type"]
      }
    }
  }
}
@@ -0,0 +1,7 @@
# Snowflake audit logs via JDBC input: uses the universal filter plugin
filter {
  # ---- Universal filter (replaces guardium_snowflake_filter{}) ---------------
  guardium_universal_filter {
    datasource => "snowflake"
  }
}
1 change: 1 addition & 0 deletions filter-plugin/logstash-filter-guardium-universal/VERSION
@@ -0,0 +1 @@
1.0.0