diff --git a/filter-plugin/logstash-filter-databricks-guardium/AzureDatabricksOverAzureEventHub/databricks.conf b/filter-plugin/logstash-filter-databricks-guardium/AzureDatabricksOverAzureEventHub/databricks.conf
index b2b2e68ca..8c0c7dd18 100644
--- a/filter-plugin/logstash-filter-databricks-guardium/AzureDatabricksOverAzureEventHub/databricks.conf
+++ b/filter-plugin/logstash-filter-databricks-guardium/AzureDatabricksOverAzureEventHub/databricks.conf
@@ -9,6 +9,11 @@ input {
 		consumer_group => "gconsumer_group"
 		decorate_events => false
 		type => "databricks"
+		#Insert the storage connection string retrieved while creating the storage account.
+		#Recommended when reading from multiple event hubs; otherwise, data loss may occur.
+		storage_connection => ""
+		#Insert the enrollmentId of your Azure account.
+		add_field => {"enrollmentId" => ""}
 	}
 }
diff --git a/filter-plugin/logstash-filter-databricks-guardium/README.md b/filter-plugin/logstash-filter-databricks-guardium/README.md
index fbd768b3d..709a46f35 100644
--- a/filter-plugin/logstash-filter-databricks-guardium/README.md
+++ b/filter-plugin/logstash-filter-databricks-guardium/README.md
@@ -4,7 +4,7 @@
 * Environment: Azure
 * Supported inputs: Azure Event Hub (pull)
 * Supported Guardium versions:
-* Guardium Data Protection: 11.4 and above
+  * Guardium Data Protection: 11.4 and above

 This is a [Logstash](https://github.com/elastic/logstash) filter plug-in for the universal connector that is featured in IBM Security Guardium. It parses events and messages from the Azure-Databricks audit log into a [Guardium record](https://github.com/IBM/universal-connectors/blob/main/common/src/main/java/com/ibm/guardium/universalconnector/commons/structures/Record.java) instance (which is a standard structure made out of several parts). The information is then sent over to Guardium. Guardium records include the accessor (the person who tried to access the data), the session, data, and exceptions. If there are no errors, the data contains details about the query "construct". The construct details the main action (verb) and collections (objects) involved.

@@ -27,69 +27,70 @@ The plug-in is free and open-source (Apache 2.0). It can be used as a starting p

 ### Azure Event Hub Connection:
- 1. Search "event hub" in the search bar.
- 2. Select **create event hubs namespace**.
-
- 3. To create a namespace:
-    - Select the subscription in which you want to create the namespace.
-    - Select the resource group you created in the previous step.
-    - Enter a unique name for the namespace.
-    - Select a location for the namespace.
-    - Choose the appropriate pricing tier. (In this example, we selected basic).
-    - Leave the throughput units (or processing units for standard and premium tier) settings as is.
-    - Select **Review + Create**. Review the settings and select **Create**.
-    - Your recently created namespace appears in **resource group**.
+1. Search "event hub" in the search bar.
+2. Select **create event hubs namespace**.
+
+3. To create a namespace:
+   - Select the subscription in which you want to create the namespace.
+   - Select the resource group you created in the previous step.
+   - Enter a unique name for the namespace.
+   - Select a location for the namespace.
+   - Choose the appropriate pricing tier. (In this example, we selected basic).
+   - Leave the throughput units (or processing units for standard and premium tier) settings as is.
+   - Select **Review + Create**. Review the settings and select **Create**.
+ - Your recently created namespace appears in **resource group**. - 4. To create an event hub: - - Go to the Event Hubs Namespace page. - - Click **+ Event Hub**. - - Enter a unique name for the event hub. - - Choose at least the maximum number of partitions that you expect to require during peak usage for this event hub. - For example, if you want to generate traffic from 2 DB instances, then choose at least 2 partitions if not more. - - Click **Review+create**. - - Review the settings and click **Create**. +4. To create an event hub: + - Go to the Event Hubs Namespace page. + - Click **+ Event Hub**. + - Enter a unique name for the event hub. + - Choose at least the maximum number of partitions that you expect to require during peak usage for this event hub. + For example, if you want to generate traffic from 2 DB instances, then choose at least 2 partitions if not more. + - Click **Review+create**. + - Review the settings and click **Create**. - 5. Connection string for an event hub: - - In the list of event hubs, select your event hub. - - On the Event Hubs instance page, go to **Settings** > **Shared access policies** > **Add**. - - Name the policy, click **manage** to provide permissions, and create the policy. - - Select Connection string–primary key string from policy (it would be required in input plugin). - - 6. Azure Storage Accounts Creation: - - Login to https://portal.azure.com. - - Search Storage accounts in search bar. - - Click **Create**. - - Basic Tab: - - Select the subscription in which you want to create the storage account. - - Select an existing resource group or create a new one. - - Enter a unique name for the storage account. - - Select the same region for the storage account that you selected for the server. - - Choose any performance type. - - Select **Geo-redundant(GRS) Redundancy configuration**. - - Select **Make read access to data**. - - Click **Next:Advance**. - - Advanced tab: - - **Require secure transfer** should already be selected. - - **Allow enabling public access** should already be selected. - - **Enable storage account key access** should already be selected. - - Select the latest TLS version. - - Permitted scope should display the default value (from any storage account). - - The remaining parameters (Hierarchical Namespace, Access protocols, Blob storage, and Azure Files) should display the default values provided by Azure. - - Click **Next:Networking**. - - Networking tab: - - Enable public access from all networks for **Network access**. - - Select **Microsoft network routing** for **Routing preference**. - - Click **Next:Data protection**. - - Data protection tab: - - Keep the default values provided by Azure. - - Click **Next:Encryption**. - - Encryption tab: - - **Encryption type** should already be set to **Microsoft-managed key(MMK)**. - - **Enable support for customer-managed keys** should be set to the default value (**blobs and files**). - - By default, **Infrastructure encryption** should not be enabled. - - Click **Next:Tags**. - - For the Tags tab, make no changes and click **Next:Review**. - - Click **Create** after you review all the parameters. +5. Connection string for an event hub: + - In the list of event hubs, select your event hub. + - On the Event Hubs instance page, go to **Settings** > **Shared access policies** > **Add**. + - Name the policy, click **manage** to provide permissions, and create the policy. + - Select Connection string–primary key string from policy (it would be required in input plugin). + +6. 
Azure Storage Accounts Creation:
+   - Login to https://portal.azure.com.
+   - Search for Storage accounts in the search bar.
+   - Click **Create**.
+   - Basic Tab:
+     - Select the subscription in which you want to create the storage account.
+     - Select an existing resource group or create a new one.
+     - Enter a unique name for the storage account.
+     - Select the same region for the storage account that you selected for the server.
+     - Choose any performance type.
+     - Select **Geo-redundant(GRS)** for the **Redundancy** configuration.
+     - Select **Make read access to data**.
+     - Click **Next:Advanced**.
+   - Advanced tab:
+     - **Require secure transfer** should already be selected.
+     - **Allow enabling public access** should already be selected.
+     - **Enable storage account key access** should already be selected.
+     - Select the latest TLS version.
+     - Permitted scope should display the default value (from any storage account).
+     - The remaining parameters (Hierarchical Namespace, Access protocols, Blob storage, and Azure Files) should display the default values provided by Azure.
+     - Click **Next:Networking**.
+   - Networking tab:
+     - Enable public access from all networks for **Network access**.
+     - Select **Microsoft network routing** for **Routing preference**.
+     - Click **Next:Data protection**.
+   - Data protection tab:
+     - Keep the default values provided by Azure.
+     - Click **Next:Encryption**.
+   - Encryption tab:
+     - **Encryption type** should already be set to **Microsoft-managed key(MMK)**.
+     - **Enable support for customer-managed keys** should be set to the default value (**blobs and files**).
+     - By default, **Infrastructure encryption** should not be enabled.
+     - Click **Next:Tags**.
+   - For the Tags tab, make no changes and click **Next:Review**.
+   - Click **Create** after you review all the parameters.
+
 ### Link event hub to Databricks
@@ -98,25 +99,26 @@ The plug-in is free and open-source (Apache 2.0). It can be used as a starting p
 2. Navigate to your Azure Databricks. Open the Diagnostic settings pane under the Monitoring section.
 3. After the page opens, you will need to create a new diagnostic setting.
 4. In the Diagnostic settings pane, fill in the form with your preferred categories.
-5. Select your categories details, and then send your logs to your preferred destination, in this case, we check **Stream to an event hub**, and put prefered event hub information in.
+5. Select your category details, and then send your logs to your preferred destination. In this case, check **Stream to an event hub** and **Archive to a storage account**, and enter the preferred event hub and storage account information.
 6. Launch your Databricks Workspace and go to profile at top right corner.
 7. click ```Settings```, go to ```Advanced```, search for ```Verbose Audit Logs``` and turn it on.

-## 4. Connecting to the Azure Databricks
+## 2. Connecting to the Azure Databricks

 ### Insert/Update data through Data Explorer
 1. Login to https://portal.azure.com.
 2. Navigate to your Azure Databricks. Launch Workspace.
 3. Under **SQL Editor**, you can run sql command by creating new quert scripts.

-## 5. Limitations
+## 3. Limitations
 1. The following important fields couldn't be mapped with Databricks audit logs:
-     The following field are not found in original audit log from Azure Databricks: Database name, ProtocolVersion, AppUserName, Client mac, Common Protocol, Os User, ClientOs, ServerOs.
-2. The log with sql excution will not have client ip, but it will come with another log with action name of "commandFinish".
+     The following fields are not found in the original audit log from Azure Databricks: Database name, ProtocolVersion, AppUserName, Client mac, Common Protocol, Os User, ClientOs, ServerOs.
+2. The log for a SQL execution will not contain the client IP; it arrives in a separate log with the action name "commandFinish".
 3. The eventhub takes 10~30 minutes to receive raw logs from Databricks, the same delay time for Guardium is expected.
+4. If queries are submitted as part of a notebook cell, job, or script, Databricks may log the entire execution context (e.g., the notebook run or job task) rather than each individual SQL query. In this case, Guardium will not be able to form separate records and will only parse the first statement.

-## 6. Configuring the Azure-Databricks filter in Guardium
+## 4. Configuring the Azure-Databricks filter in Guardium
 The Guardium universal connector is the Guardium entry point for native audit logs. The Guardium universal connector identifies and parses the received events, and converts them to a standard Guardium format. The output of the Guardium universal connector is forwarded to the Guardium sniffer on the collector, for policy and auditing enforcements. Configure Guardium to read the native audit logs by customizing the Azure-Databricks template.

 ### Before you begin
diff --git a/filter-plugin/logstash-filter-databricks-guardium/logstash-filter-databricks_guardium_filter.zip b/filter-plugin/logstash-filter-databricks-guardium/logstash-filter-databricks_guardium_filter.zip
index 097d5bc34..5ef3167dc 100644
Binary files a/filter-plugin/logstash-filter-databricks-guardium/logstash-filter-databricks_guardium_filter.zip and b/filter-plugin/logstash-filter-databricks-guardium/logstash-filter-databricks_guardium_filter.zip differ
diff --git a/filter-plugin/logstash-filter-databricks-guardium/src/main/java/com/ibm/guardium/databricks/Parser.java b/filter-plugin/logstash-filter-databricks-guardium/src/main/java/com/ibm/guardium/databricks/Parser.java
index 90404f2ca..350776e28 100644
--- a/filter-plugin/logstash-filter-databricks-guardium/src/main/java/com/ibm/guardium/databricks/Parser.java
+++ b/filter-plugin/logstash-filter-databricks-guardium/src/main/java/com/ibm/guardium/databricks/Parser.java
@@ -64,11 +64,12 @@ public static Record parseRecord(final JsonObject records) {
 		String accountId = getAccountId(records);// second part of resourceId
 		String sessionId = getSessionId(requestParams);
 		String serviceName = getServiceName(properties);
+		String dbName = subId+":"+serviceName;
 		record.setSessionId(sessionId);
-		record.setDbName(Constants.UNKNOWN_STRING);
+		record.setDbName(dbName);
 		record.setAppUserName(Constants.UNKNOWN_STRING);
 		record.setAccessor(parseAccessor(subId, accountId, properties, records));
-		record.getAccessor().setServiceName(Constants.NOT_AVAILABLE);
+		record.getAccessor().setServiceName(dbName);
 		record.setSessionLocator(parserSessionLocator(properties));

 		String response = properties.get(Constants.RESPONSE).toString();
@@ -365,7 +366,6 @@ static Accessor parseAccessor(String subId, String accountId, JsonObject propert
 		accessor.setServerType(Constants.SERVER_TYPE);
 		accessor.setDbProtocol(Constants.DATA_PROTOCOL);
 		accessor.setDbProtocolVersion(Constants.UNKNOWN_STRING);
-		accessor.setServiceName(Constants.NOT_AVAILABLE);

 		// Set source program (user agent)
 		accessor.setSourceProgram(properties.has(Constants.USER_AGENT)
diff --git a/filter-plugin/logstash-filter-databricks-guardium/src/test/java/com/ibm/guardium/databricks/ParserTest.java
b/filter-plugin/logstash-filter-databricks-guardium/src/test/java/com/ibm/guardium/databricks/ParserTest.java index 741d62bf0..a8d00b75a 100644 --- a/filter-plugin/logstash-filter-databricks-guardium/src/test/java/com/ibm/guardium/databricks/ParserTest.java +++ b/filter-plugin/logstash-filter-databricks-guardium/src/test/java/com/ibm/guardium/databricks/ParserTest.java @@ -23,7 +23,7 @@ void testParseServiceName() { final String DatabricksString = "{ \"resourceId\": \"/SUBSCRIPTIONS/5C0C81D4-656F-415D-8599-DCD86F2F665E/RESOURCEGROUPS/DATABRICKSTEST/PROVIDERS/MICROSOFT.DATABRICKS/WORKSPACES/DATABRICK-TEST\", \"operationVersion\": \"1.0.0\", \"identity\": \"{\\\"email\\\":\\\"abc.h@abc.com\\\",\\\"subjectName\\\":null}\", \"operationName\": \"Microsoft.Databricks/accounts/tokenLogin\", \"time\": \"2025-05-01T13:43:28Z\", \"category\": \"accounts\", \"properties\": {\"sourceIPAddress\":\"20.193.136.102\",\"logId\":\"87e1a69e-444a-434e-a0be-648b1797e6d9\",\"serviceName\":\"accounts\",\"userAgent\":\"Apache-HttpClient/4.5.14 (Java/17.0.13) Databricks-Service/driver DBHttpClient/v2RawClient\",\"response\":\"{\\\"statusCode\\\":200}\",\"sessionId\":null,\"actionName\":\"tokenLogin\",\"requestId\":\"0018d88c-2f52-4cf0-86a4-8d1dc416ab10\",\"requestParams\":\"{\\\"user\\\":\\\"abc@abc.com\\\",\\\"tokenId\\\":\\\"kfhjgjfhdgjkdh39284783297423943hejhfkdsfh39\\\",\\\"authenticationMethod\\\":\\\"API_INT_PAT_TOKEN\\\"}\"}, \"Host\": \"1234-123456-ab1c2d3e-12-123-12-1\"}"; final JsonObject DatabricksJson = JsonParser.parseString(DatabricksString).getAsJsonObject(); Record record = Parser.parseRecord(DatabricksJson); - assertEquals("N.A.", record.getAccessor().getServiceName()); + assertEquals("5C0C81D4-656F-415D-8599-DCD86F2F665E:accounts", record.getAccessor().getServiceName()); } diff --git a/filter-plugin/logstash-filter-mongodb-guardium/IBMCloudMongoDB_README.md b/filter-plugin/logstash-filter-mongodb-guardium/IBMCloudMongoDB_README.md index 2602f2609..0b9543020 100644 --- a/filter-plugin/logstash-filter-mongodb-guardium/IBMCloudMongoDB_README.md +++ b/filter-plugin/logstash-filter-mongodb-guardium/IBMCloudMongoDB_README.md @@ -195,7 +195,7 @@ Verify [here](https://ondeck.console.cloud.ibm.com/docs/cloud-logs?topic=cloud-l - Verify all of the information and click ```Create stream```. ## 4. Limitations -1. The analysis is based on IBM Cloud Database for MongoDB 4.4. +1. The analysis is based on IBM Cloud Database for MongoDB 7.0. 2. Logs for SQL errors do not get generated from the data source. 3. IBM Cloud Databases for MongoDB only supports 22 events. See [here](https://cloud.ibm.com/docs/databases-for-mongodb?topic=databases-for-mongodb-auditlogging) for more information. 4. In this example, we used both CLI and UI queries to run the analysis. diff --git a/filter-plugin/logstash-filter-trino-guardium/README.md b/filter-plugin/logstash-filter-trino-guardium/README.md index eebacb9cd..b580289e1 100644 --- a/filter-plugin/logstash-filter-trino-guardium/README.md +++ b/filter-plugin/logstash-filter-trino-guardium/README.md @@ -101,4 +101,4 @@ enforcements. the Universal Connector using the ```Disable/Enable``` button. #### Limitations - • Client Hostname and Source Program will be seen as blank in report. \ No newline at end of file + • Client Hostname and Source Program will be seen as blank in report. 
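Note on the `databricks.conf` change above: the snippet below is a minimal sketch of how the full `azure_event_hubs` input block might look once the new `storage_connection` and `enrollmentId` placeholders are filled in. The connection strings, storage account, and enrollment ID shown here are illustrative placeholders only, and the `event_hub_connections` option is assumed from the standard `azure_event_hubs` input plugin rather than shown in this patch.

```logstash
input {
  azure_event_hubs {
    # Connection string of the event hub that receives the Databricks diagnostic logs (placeholder values)
    event_hub_connections => ["Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<policy>;SharedAccessKey=<key>;EntityPath=<event-hub-name>"]
    consumer_group => "gconsumer_group"
    decorate_events => false
    type => "databricks"
    # Storage connection string retrieved while creating the storage account;
    # recommended when reading from multiple event hubs, otherwise data loss may occur
    storage_connection => "DefaultEndpointsProtocol=https;AccountName=<storage-account>;AccountKey=<key>;EndpointSuffix=core.windows.net"
    # Enrollment ID of the Azure account
    add_field => {"enrollmentId" => "<enrollment-id>"}
  }
}
```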
diff --git a/input-plugin/logstash-input-cloudwatch-logs/README.md b/input-plugin/logstash-input-cloudwatch-logs/README.md
index dfc7a1944..b229e1692 100644
--- a/input-plugin/logstash-input-cloudwatch-logs/README.md
+++ b/input-plugin/logstash-input-cloudwatch-logs/README.md
@@ -69,6 +69,11 @@ The `region` setting allows specify the region in which the Cloudwatch log group
 #### `codec`
 The `codec` setting allows specify, the codec used for input data. Input codecs are a convenient method for decoding the data before it enters the input, without needing a separate filter in the Logstash pipeline.
+##### `codec pattern`
+The `codec pattern` is a regular expression that Logstash uses to identify lines that are either the start of a new multiline event or a continuation of a previous one.
+For the Redshift and Postgres plug-ins, update the value of the `pattern` parameter in the input section with the pattern from the configuration files linked below.
+For Redshift, add the pattern from [here](https://github.com/IBM/universal-connectors/blob/main/filter-plugin/logstash-filter-redshift-aws-guardium/redshift-over-cloudwatch.conf).
+For Postgres, add the pattern from [here](https://github.com/IBM/universal-connectors/blob/main/filter-plugin/logstash-filter-postgres-guardium/PostgresOverCloudWatchPackage/postgresCloudwatch.conf).

 #### `role_arn`
 The role_arn setting allows you to specify which AWS IAM Role to assume, if any. This is used to generate temporary credentials, typically for cross-account access. To understand more about the settings to be followed while using this parameter, click [here]( ./SettingsForRoleArn.md )
@@ -108,4 +113,45 @@ Other standard logstash parameters are available such as:
 ```
 grdapi add_domain_to_universal_connector_allowed_domains domain=amazonaws.com
 grdapi add_domain_to_universal_connector_allowed_domains domain=amazon.com
-```
\ No newline at end of file
+```
+
+
+## Troubleshooting
+### Using VPC Endpoints for AWS Connectivity
+
+If Logstash is unable to connect to AWS CloudWatch Logs due to **network restrictions** (e.g., traffic restricted to private subnets), you may need to route the connection through an **AWS VPC Endpoint**.
+
+### Solution: Use a VPC Endpoint with a Custom Endpoint URL
+
+When running Logstash inside a **VPC** (Virtual Private Cloud), especially in private subnets, the AWS SDK cannot reach the public `logs.{region}.amazonaws.com` endpoint unless explicitly allowed.
+To solve this, configure the plugin to use your VPC Endpoint along with AWS's bundled Certificate Authority (CA) for secure communication.
+
+#### Example Configuration
+
+```logstash
+input {
+  cloudwatch_logs {
+    # Default Configuration
+    log_group => "/aws/lambda/my-lambda-function"
+    region => "us-east-1"
+    access_key_id => "YOUR_ACCESS_KEY"
+    secret_access_key => "YOUR_SECRET_KEY"
+
+    # Use your private VPC Endpoint instead of the public AWS endpoint
+    endpoint => "https://vpce-xxxxxxxxabcdef.logs.us-east-1.vpce.amazonaws.com"
+    # Ensures the connection uses AWS's trusted root certificates
+    use_aws_bundled_ca => true
+  }
+}
+```
+
+> **Note**: Replace the `endpoint` URL with your actual VPC Endpoint DNS name. You can find this in the AWS Console under **VPC > Endpoints**.
+
+### Additional Notes
+
+- Make sure the VPC Endpoint is created for the **CloudWatch Logs interface service** (`com.amazonaws.us-east-1.logs`).
+- Ensure your **subnet** and **security group** allow HTTPS traffic to the endpoint.
+- If you're using **IAM roles** (e.g., EC2 instance roles), you can omit the access keys. + +For more information, see the official AWS docs: [VPC Interface Endpoints for CloudWatch Logs](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CloudWatchLogs-and-InterfaceVPC.html).
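Following the IAM-role note above, the snippet below is a minimal sketch of the same troubleshooting configuration when Logstash runs on an EC2 instance with an instance role attached, so the static access keys can be omitted. The log group and VPC Endpoint DNS name reuse the illustrative values from the example above; `role_arn` is the optional cross-account setting described earlier in this README.

```logstash
input {
  cloudwatch_logs {
    log_group => "/aws/lambda/my-lambda-function"
    region => "us-east-1"

    # No access_key_id / secret_access_key: credentials come from the EC2
    # instance role via the default AWS credential provider chain.
    # Optionally assume a different role for cross-account access:
    # role_arn => "arn:aws:iam::<account-id>:role/<role-name>"

    # Same illustrative VPC Endpoint DNS name as in the example above
    endpoint => "https://vpce-xxxxxxxxabcdef.logs.us-east-1.vpce.amazonaws.com"
    use_aws_bundled_ca => true
  }
}
```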