49 commits
8366072
Update IBMCloudMongoDB_README.md
rasikashete3 Jun 9, 2025
484d06e
GRD-104909 : Added Readme fix for codec pattern.
shete-rasika Aug 6, 2025
8b1b0f6
GRD-104909 : Updated Readme.
shete-rasika Aug 6, 2025
4e8fccd
Updated readme
shete-rasika Aug 6, 2025
1675f8b
add limitation
JingqiuDu Aug 7, 2025
cd2102c
Merge pull request #884 from IBM/GRD-106883_addLimitation
JingqiuDu Aug 7, 2025
5c342fc
update trino readme
AndychenIBM Aug 8, 2025
0a93bd7
update trino readme
AndychenIBM Aug 8, 2025
9115803
update trino readme
AndychenIBM Aug 8, 2025
3312a61
GRD-106799 update trino readme
AndychenIBM Aug 8, 2025
763796d
GRD-106799 update readme trino to master
AndychenIBM Aug 8, 2025
c64857c
aug 8
laurel-hu Aug 8, 2025
571d52f
Merge pull request #887 from laurel-hu/GRD-106996
JingqiuDu Aug 8, 2025
c9728b0
Fixed indices parsing
pankajkumaribm Aug 8, 2025
9f9a22f
Fixed indices parsing
pankajkumaribm Aug 8, 2025
04270e1
update readme
JingqiuDu Aug 8, 2025
b1ec537
Merge pull request #888 from IBM/GRD-107002_updateReadme
JingqiuDu Aug 8, 2025
6da2a95
Fixed indices parsing
pankajkumaribm Aug 8, 2025
2002b2e
Merge pull request #889 from pankajkumaribm/bugFix
pankajkumaribm Aug 8, 2025
2cc9ee4
aug 8
laurel-hu Aug 8, 2025
2a9deed
Merge pull request #892 from laurel-hu/GRD-107009
JingqiuDu Aug 8, 2025
8086ffc
GRD-100469: Troubleshooting Section for AWS Cloudwatch VPC Endpoint
ashish-mehta4 Aug 11, 2025
b5c9588
Merge pull request #895 from ashish-mehta4/GRD-100469
rasikashete3 Aug 11, 2025
e802184
Readme updated
pankajkumaribm Aug 11, 2025
5e2d3f1
aug 11
laurel-hu Aug 11, 2025
5720fa1
Merge pull request #897 from laurel-hu/GRD-106539
JingqiuDu Aug 11, 2025
a739f45
aug 11
laurel-hu Aug 11, 2025
26e555b
Merge pull request #898 from laurel-hu/GRD-107009
taees-eimouri Aug 11, 2025
bef79ba
Opensearch improvements
pankajkumaribm Aug 11, 2025
01cfc2e
Merge pull request #896 from pankajkumaribm/osReadmes
pankajkumaribm Aug 11, 2025
0a674f7
Opensearch query body json format fixes
pankajkumaribm Aug 11, 2025
e3b9b23
Merge pull request #903 from pankajkumaribm/mainB
pankajkumaribm Aug 11, 2025
6ca0908
drop generated logs
JingqiuDu Aug 11, 2025
590b108
GRD-106362 create new message for parsing, updated NA to N.A.
AndychenIBM Aug 11, 2025
e4bf5d5
change service name and fix multi s-taps
JingqiuDu Aug 11, 2025
c5f2dd6
Merge pull request #904 from AndychenIBM/GRD-106362-Postgre_to_master
taees-eimouri Aug 11, 2025
32bcd33
GRD-106362 updated GHANGELOG
AndychenIBM Aug 12, 2025
cad3bb2
Merge pull request #787 from IBM/GRD-102063
rasikashete3 Aug 12, 2025
505c0e5
Merge pull request #881 from IBM/GRD-104909
rasikashete3 Aug 12, 2025
e0b9710
Merge pull request #909 from IBM/GRD-107013_ChangeServiceName
taees-eimouri Aug 12, 2025
f04c782
Merge pull request #907 from IBM/GRD-106888_DropSystemGeneratedLog
taees-eimouri Aug 12, 2025
7764ede
GRD-107009 fixed service name mismatch with database name
AndychenIBM Aug 12, 2025
cccda8b
Merge pull request #913 from AndychenIBM/GRD-107009_TrinoDB_main
taees-eimouri Aug 12, 2025
a2c5039
Merge pull request #911 from AndychenIBM/GRD-106362-Postgre_to_master
AndychenIBM Aug 12, 2025
30fb032
aug 13
laurel-hu Aug 13, 2025
87dee02
Merge pull request #915 from laurel-hu/GRD-106536
JingqiuDu Aug 13, 2025
55ad186
Merge remote-tracking branch 'upstream/main' into GRD-107959
zeeIBM Aug 13, 2025
443f512
GRD-107009 (#918)
laurel-hu Aug 13, 2025
efaa7de
Merge remote-tracking branch 'upstream/main' into GRD-107959
zeeIBM Aug 13, 2025
@@ -9,6 +9,11 @@ input {
consumer_group => "gconsumer_group"
decorate_events => false
type => "databricks"
#Insert the Storage Connection String retrieved while creating the storage account.
#Recommended when reading from multiple Event Hubs; otherwise, data loss may occur.
storage_connection => "<storage_connection_string>"
#Insert the enrollmentId of your Azure account
add_field => {"enrollmentId" => <enrollmentId>}
}
}
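
For context, a minimal sketch of an `azure_event_hubs` input that reads from two event hubs and checkpoints offsets in a shared storage account, which is the multi-Event-Hub case the `storage_connection` comment above refers to. The namespace, entity names, and keys are placeholders, not values from this change:

```logstash
input {
  azure_event_hubs {
    # Placeholder connection strings for two event hubs in the same namespace
    event_hub_connections => [
      "Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<policy>;SharedAccessKey=<key>;EntityPath=<event_hub_1>",
      "Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<policy>;SharedAccessKey=<key>;EntityPath=<event_hub_2>"
    ]
    consumer_group => "gconsumer_group"
    decorate_events => false
    type => "databricks"
    # A shared storage account keeps per-partition checkpoints so that multiple
    # event hubs (and pipeline restarts) do not re-read or skip events
    storage_connection => "<storage_connection_string>"
    # Placeholder enrollment ID of the Azure account
    add_field => {"enrollmentId" => "<enrollmentId>"}
  }
}
```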

138 changes: 70 additions & 68 deletions filter-plugin/logstash-filter-databricks-guardium/README.md
@@ -4,7 +4,7 @@
* Environment: Azure
* Supported inputs: Azure Event Hub (pull)
* Supported Guardium versions:
* Guardium Data Protection: 11.4 and above
* Guardium Data Protection: 11.4 and above

This is a [Logstash](https://github.com/elastic/logstash) filter plug-in for the universal connector that is featured in IBM Security Guardium. It parses events and messages from the Azure-Databricks audit log into a [Guardium record](https://github.com/IBM/universal-connectors/blob/main/common/src/main/java/com/ibm/guardium/universalconnector/commons/structures/Record.java) instance (which is a standard structure made out of several parts). The information is then sent over to Guardium. Guardium records include the accessor (the person who tried to access the data), the session, data, and exceptions. If there are no errors, the
data contains details about the query "construct". The construct details the main action (verb) and collections (objects) involved.
@@ -27,69 +27,70 @@ The plug-in is free and open-source (Apache 2.0). It can be used as a starting p


### Azure Event Hub Connection:
1. Search "event hub" in the search bar.
2. Select **create event hubs namespace**.

3. To create a namespace:
- Select the subscription in which you want to create the namespace.
- Select the resource group you created in the previous step.
- Enter a unique name for the namespace.
- Select a location for the namespace.
- Choose the appropriate pricing tier. (In this example, we selected basic).
- Leave the throughput units (or processing units for standard and premium tier) settings as is.
- Select **Review + Create**. Review the settings and select **Create**.
- Your recently created namespace appears in **resource group**.
1. Search "event hub" in the search bar.
2. Select **create event hubs namespace**.

3. To create a namespace:
- Select the subscription in which you want to create the namespace.
- Select the resource group you created in the previous step.
- Enter a unique name for the namespace.
- Select a location for the namespace.
- Choose the appropriate pricing tier. (In this example, we selected basic).
- Leave the throughput units (or processing units for standard and premium tier) settings as is.
- Select **Review + Create**. Review the settings and select **Create**.
- Your recently created namespace appears in **resource group**.

4. To create an event hub:
- Go to the Event Hubs Namespace page.
- Click **+ Event Hub**.
- Enter a unique name for the event hub.
- Choose at least the maximum number of partitions that you expect to require during peak usage for this event hub.
For example, if you want to generate traffic from 2 DB instances, then choose at least 2 partitions if not more.
- Click **Review+create**.
- Review the settings and click **Create**.
4. To create an event hub:
- Go to the Event Hubs Namespace page.
- Click **+ Event Hub**.
- Enter a unique name for the event hub.
- Choose at least the maximum number of partitions that you expect to require during peak usage for this event hub.
For example, if you want to generate traffic from 2 DB instances, then choose at least 2 partitions if not more.
- Click **Review+create**.
- Review the settings and click **Create**.

5. Connection string for an event hub:
- In the list of event hubs, select your event hub.
- On the Event Hubs instance page, go to **Settings** > **Shared access policies** > **Add**.
- Name the policy, click **manage** to provide permissions, and create the policy.
- Select Connection string–primary key string from policy (it would be required in input plugin).

6. Azure Storage Accounts Creation:
- Login to https://portal.azure.com.
- Search Storage accounts in search bar.
- Click **Create**.
- Basic Tab:
- Select the subscription in which you want to create the storage account.
- Select an existing resource group or create a new one.
- Enter a unique name for the storage account.
- Select the same region for the storage account that you selected for the server.
- Choose any performance type.
- Select **Geo-redundant(GRS) Redundancy configuration**.
- Select **Make read access to data**.
- Click **Next:Advance**.
- Advanced tab:
- **Require secure transfer** should already be selected.
- **Allow enabling public access** should already be selected.
- **Enable storage account key access** should already be selected.
- Select the latest TLS version.
- Permitted scope should display the default value (from any storage account).
- The remaining parameters (Hierarchical Namespace, Access protocols, Blob storage, and Azure Files) should display the default values provided by Azure.
- Click **Next:Networking**.
- Networking tab:
- Enable public access from all networks for **Network access**.
- Select **Microsoft network routing** for **Routing preference**.
- Click **Next:Data protection**.
- Data protection tab:
- Keep the default values provided by Azure.
- Click **Next:Encryption**.
- Encryption tab:
- **Encryption type** should already be set to **Microsoft-managed key(MMK)**.
- **Enable support for customer-managed keys** should be set to the default value (**blobs and files**).
- By default, **Infrastructure encryption** should not be enabled.
- Click **Next:Tags**.
- For the Tags tab, make no changes and click **Next:Review**.
- Click **Create** after you review all the parameters.
5. Connection string for an event hub:
- In the list of event hubs, select your event hub.
- On the Event Hubs instance page, go to **Settings** > **Shared access policies** > **Add**.
- Name the policy, click **manage** to provide permissions, and create the policy.
- Select Connection string–primary key string from policy (it would be required in input plugin).

6. Azure Storage Accounts Creation:
- Login to https://portal.azure.com.
- Search Storage accounts in search bar.
- Click **Create**.
- Basic Tab:
- Select the subscription in which you want to create the storage account.
- Select an existing resource group or create a new one.
- Enter a unique name for the storage account.
- Select the same region for the storage account that you selected for the server.
- Choose any performance type.
- Select **Geo-redundant(GRS) Redundancy configuration**.
- Select **Make read access to data**.
- Click **Next:Advance**.
- Advanced tab:
- **Require secure transfer** should already be selected.
- **Allow enabling public access** should already be selected.
- **Enable storage account key access** should already be selected.
- Select the latest TLS version.
- Permitted scope should display the default value (from any storage account).
- The remaining parameters (Hierarchical Namespace, Access protocols, Blob storage, and Azure Files) should display the default values provided by Azure.
- Click **Next:Networking**.
- Networking tab:
- Enable public access from all networks for **Network access**.
- Select **Microsoft network routing** for **Routing preference**.
- Click **Next:Data protection**.
- Data protection tab:
- Keep the default values provided by Azure.
- Click **Next:Encryption**.
- Encryption tab:
- **Encryption type** should already be set to **Microsoft-managed key(MMK)**.
- **Enable support for customer-managed keys** should be set to the default value (**blobs and files**).
- By default, **Infrastructure encryption** should not be enabled.
- Click **Next:Tags**.
- For the Tags tab, make no changes and click **Next:Review**.
- Click **Create** after you review all the parameters.



### Link event hub to Databricks
@@ -98,25 +99,26 @@ The plug-in is free and open-source (Apache 2.0). It can be used as a starting p
2. Navigate to your Azure Databricks. Open the Diagnostic settings pane under the Monitoring section.
3. After the page opens, you will need to create a new diagnostic setting.
4. In the Diagnostic settings pane, fill in the form with your preferred categories.
5. Select your categories details, and then send your logs to your preferred destination, in this case, we check **Stream to an event hub**, and put prefered event hub information in.
5. Select your category details, and then send your logs to your preferred destination. In this case, check **Stream to an event hub** and **Archive to a storage account**, and enter the preferred event hub and storage account information.
6. Launch your Databricks Workspace and go to profile at top right corner.
7. Click ```Settings```, go to ```Advanced```, search for ```Verbose Audit Logs```, and turn it on.



## 4. Connecting to the Azure Databricks
## 2. Connecting to the Azure Databricks
### Insert/Update data through Data Explorer
1. Login to https://portal.azure.com.
2. Navigate to your Azure Databricks. Launch Workspace.
3. Under **SQL Editor**, you can run SQL commands by creating new query scripts.

## 5. Limitations
## 3. Limitations
1. The following important fields couldn't be mapped with Databricks audit logs:
The following field are not found in original audit log from Azure Databricks: Database name, ProtocolVersion, AppUserName, Client mac, Common Protocol, Os User, ClientOs, ServerOs.
2. The log with sql excution will not have client ip, but it will come with another log with action name of "commandFinish".
The following fields are not found in original audit log from Azure Databricks: Database name, ProtocolVersion, AppUserName, Client mac, Common Protocol, Os User, ClientOs, ServerOs.
2. The log with the SQL execution will not have the client IP; the client IP arrives in a separate log with the action name of "commandFinish".
3. The event hub takes 10 to 30 minutes to receive raw logs from Databricks; the same delay is expected for Guardium.
4. If queries are submitted as part of a notebook cell, job, or script, Databricks may log the entire execution context (e.g., the notebook run or job task) rather than each individual SQL query. In this case, Guardium will not be able to form separate records and will only parse the first statement.

## 6. Configuring the Azure-Databricks filter in Guardium
## 4. Configuring the Azure-Databricks filter in Guardium
The Guardium universal connector is the Guardium entry point for native audit logs. The Guardium universal connector identifies and parses the received events, and converts them to a standard Guardium format. The output of the Guardium universal connector is forwarded to the Guardium sniffer on the collector, for policy and auditing enforcements. Configure Guardium to read the native audit logs by customizing the Azure-Databricks template.

### Before you begin
Binary file not shown.
@@ -64,11 +64,12 @@ public static Record parseRecord(final JsonObject records) {
String accountId = getAccountId(records);// second part of resourceId
String sessionId = getSessionId(requestParams);
String serviceName = getServiceName(properties);
String dbName = subId+":"+serviceName;
record.setSessionId(sessionId);
record.setDbName(Constants.UNKNOWN_STRING);
record.setDbName(dbName);
record.setAppUserName(Constants.UNKNOWN_STRING);
record.setAccessor(parseAccessor(subId, accountId, properties, records));
record.getAccessor().setServiceName(Constants.NOT_AVAILABLE);
record.getAccessor().setServiceName(dbName);
record.setSessionLocator(parserSessionLocator(properties));

String response = properties.get(Constants.RESPONSE).toString();
@@ -365,7 +366,6 @@ static Accessor parseAccessor(String subId, String accountId, JsonObject propert
accessor.setServerType(Constants.SERVER_TYPE);
accessor.setDbProtocol(Constants.DATA_PROTOCOL);
accessor.setDbProtocolVersion(Constants.UNKNOWN_STRING);
accessor.setServiceName(Constants.NOT_AVAILABLE);

// Set source program (user agent)
accessor.setSourceProgram(properties.has(Constants.USER_AGENT)
@@ -23,7 +23,7 @@ void testParseServiceName() {
final String DatabricksString = "{ \"resourceId\": \"/SUBSCRIPTIONS/5C0C81D4-656F-415D-8599-DCD86F2F665E/RESOURCEGROUPS/DATABRICKSTEST/PROVIDERS/MICROSOFT.DATABRICKS/WORKSPACES/DATABRICK-TEST\", \"operationVersion\": \"1.0.0\", \"identity\": \"{\\\"email\\\":\\\"abc.h@abc.com\\\",\\\"subjectName\\\":null}\", \"operationName\": \"Microsoft.Databricks/accounts/tokenLogin\", \"time\": \"2025-05-01T13:43:28Z\", \"category\": \"accounts\", \"properties\": {\"sourceIPAddress\":\"20.193.136.102\",\"logId\":\"87e1a69e-444a-434e-a0be-648b1797e6d9\",\"serviceName\":\"accounts\",\"userAgent\":\"Apache-HttpClient/4.5.14 (Java/17.0.13) Databricks-Service/driver DBHttpClient/v2RawClient\",\"response\":\"{\\\"statusCode\\\":200}\",\"sessionId\":null,\"actionName\":\"tokenLogin\",\"requestId\":\"0018d88c-2f52-4cf0-86a4-8d1dc416ab10\",\"requestParams\":\"{\\\"user\\\":\\\"abc@abc.com\\\",\\\"tokenId\\\":\\\"kfhjgjfhdgjkdh39284783297423943hejhfkdsfh39\\\",\\\"authenticationMethod\\\":\\\"API_INT_PAT_TOKEN\\\"}\"}, \"Host\": \"1234-123456-ab1c2d3e-12-123-12-1\"}";
final JsonObject DatabricksJson = JsonParser.parseString(DatabricksString).getAsJsonObject();
Record record = Parser.parseRecord(DatabricksJson);
assertEquals("N.A.", record.getAccessor().getServiceName());
assertEquals("5C0C81D4-656F-415D-8599-DCD86F2F665E:accounts", record.getAccessor().getServiceName());

}

@@ -195,7 +195,7 @@ Verify [here](https://ondeck.console.cloud.ibm.com/docs/cloud-logs?topic=cloud-l
- Verify all of the information and click ```Create stream```.

## 4. Limitations
1. The analysis is based on IBM Cloud Database for MongoDB 4.4.
1. The analysis is based on IBM Cloud Database for MongoDB 7.0.
2. Logs for SQL errors do not get generated from the data source.
3. IBM Cloud Databases for MongoDB only supports 22 events. See [here](https://cloud.ibm.com/docs/databases-for-mongodb?topic=databases-for-mongodb-auditlogging) for more information.
4. In this example, we used both CLI and UI queries to run the analysis.
2 changes: 1 addition & 1 deletion filter-plugin/logstash-filter-trino-guardium/README.md
@@ -101,4 +101,4 @@ enforcements.
the Universal Connector using the ```Disable/Enable``` button.

#### Limitations
• Client Hostname and Source Program will be seen as blank in report.
• Client Hostname and Source Program will be seen as blank in report.
48 changes: 47 additions & 1 deletion input-plugin/logstash-input-cloudwatch-logs/README.md
@@ -69,6 +69,11 @@ The `region` setting allows specify the region in which the Cloudwatch log group

#### `codec`
The `codec` setting allows you to specify the codec used for input data. Input codecs are a convenient method for decoding the data before it enters the input, without needing a separate filter in the Logstash pipeline.
##### `codec pattern`
The `codec pattern` is a regular expression that Logstash uses to identify lines that are either the start of a new multiline event or a continuation of a previous one.
For the Redshift and Postgres plug-ins, update the value of the `pattern` parameter in the input section as specified in the configurations linked below.
For Redshift, add the pattern from [here](https://github.com/IBM/universal-connectors/blob/main/filter-plugin/logstash-filter-redshift-aws-guardium/redshift-over-cloudwatch.conf).
For Postgres, add the pattern from [here](https://github.com/IBM/universal-connectors/blob/main/filter-plugin/logstash-filter-postgres-guardium/PostgresOverCloudWatchPackage/postgresCloudwatch.conf).
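
As an illustration only, a sketch of a `cloudwatch_logs` input that wires a multiline pattern into the codec; the log group and pattern shown here are hypothetical placeholders, so use the exact pattern from the configuration files linked above:

```logstash
input {
  cloudwatch_logs {
    log_group => ["<your_log_group>"]
    region => "<region>"
    # Hypothetical pattern: lines that do NOT begin with a timestamp are
    # treated as continuations of the previous event
    codec => multiline {
      pattern => "^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"
      negate => true
      what => "previous"
    }
  }
}
```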

#### `role_arn`
The `role_arn` setting allows you to specify which AWS IAM Role to assume, if any. This is used to generate temporary credentials, typically for cross-account access. For more information about the settings required when using this parameter, see [here](./SettingsForRoleArn.md).
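
As a sketch under assumed values (the account ID, role name, and log group below are placeholders), a cross-account setup could look like this:

```logstash
input {
  cloudwatch_logs {
    log_group => ["<log_group_in_other_account>"]
    region => "us-east-1"
    # Assume a role in the account that owns the log group (placeholder ARN);
    # the base credentials (keys or an instance role) must be allowed to assume it
    role_arn => "arn:aws:iam::123456789012:role/<cross_account_logs_role>"
  }
}
```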
Expand Down Expand Up @@ -108,4 +113,45 @@ Other standard logstash parameters are available such as:
```
grdapi add_domain_to_universal_connector_allowed_domains domain=amazonaws.com
grdapi add_domain_to_universal_connector_allowed_domains domain=amazon.com
```


## Troubleshooting:
### Using VPC Endpoints for AWS Connectivity

If Logstash is unable to connect to AWS CloudWatch Logs due to **network restrictions** (e.g., traffic restricted to private subnets), you may need to route the connection through an **AWS VPC Endpoint**.

### Solution: Use a VPC Endpoint with a Custom Endpoint URL

When running Logstash inside a **VPC** (Virtual Private Cloud), especially in private subnets, the AWS SDK cannot reach the public `logs.{region}.amazonaws.com` endpoint unless explicitly allowed.
To solve this, configure the plugin to use your VPC Endpoint along with AWS's bundled Certificate Authority (CA) for secure communication.

#### Example Configuration

```logstash
input {
cloudwatch_logs {
# Default Configuration
log_group => "/aws/lambda/my-lambda-function"
region => "us-east-1"
access_key_id => "YOUR_ACCESS_KEY"
secret_access_key => "YOUR_SECRET_KEY"

# Use your private VPC Endpoint instead of the public AWS endpoint
endpoint => "https://vpce-xxxxxxxxabcdef.logs.us-east-1.vpce.amazonaws.com"
# Ensures the connection uses AWS's trusted root certificates
use_aws_bundled_ca => true
}
}
```

> **Note**: Replace the `endpoint` URL with your actual VPC Endpoint DNS name. You can find this in the AWS Console under **VPC > Endpoints**.

### Additional Notes

- Make sure the VPC Endpoint is created for the **CloudWatch Logs interface service** (`com.amazonaws.us-east-1.logs`).
- Ensure your **subnet** and **security group** allow HTTPS traffic to the endpoint.
- If you're using **IAM roles** (e.g., EC2 instance roles), you can omit the access keys.

For more information, see the official AWS docs: [VPC Interface Endpoints for CloudWatch Logs](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CloudWatchLogs-and-InterfaceVPC.html).