HDDS-15395. Grafana dashboard for container balancer #10398

Open
navinko wants to merge 2 commits into apache:master from navinko:HDDS-15395


@navinko navinko commented May 31, 2026

What changes were proposed in this pull request?

Add Grafana dashboard for container balancer

Please describe your PR in detail:

Created a Grafana dashboard to display the metrics relevant to balancer operation.

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-15395

How was this patch tested?

Simulated an unbalanced cluster -> triggered the balancer through the CLI -> captured dashboard screenshots and validated the panels.
The same JSON file, "Ozone - Container Balancer Metrics.json", is used in this PR.
Screenshots for reference:

(Four screenshots attached.)


@sreejasahithi sreejasahithi left a comment

Thanks @navinko for working on this.

Comment on lines +1062 to +1063
"datasource": {
"name": "dfm13nk97y9kwf"
Contributor

Don’t hardcode dfm13nk97y9kwf. Use ${datasource} like the other panels. Remove the hardcoded UID from the variable default too.
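
A check like the one requested above could be automated. The following is a minimal sketch, not part of this PR: it assumes datasource references appear as objects under a "datasource" key with a "name" or "uid" field (as in the snippet under review), and it treats any value that does not start with "$" as hardcoded. The function name and the sample layout are hypothetical.

```python
import json


def find_hardcoded_datasources(node, path="$"):
    """Return (path, value) pairs for datasource references that are not variables."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "datasource" and isinstance(value, dict):
                # Assumption: the reference lives in a "uid" or "name" field.
                for field in ("uid", "name"):
                    ref = value.get(field)
                    if isinstance(ref, str) and not ref.startswith("$"):
                        hits.append((f"{child}.{field}", ref))
            hits.extend(find_hardcoded_datasources(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_hardcoded_datasources(item, f"{path}[{index}]"))
    return hits


# Hypothetical two-panel sample; the real dashboard JSON is laid out differently.
SAMPLE = json.loads("""
{"panels": [
  {"title": "A", "datasource": {"name": "${datasource}"}},
  {"title": "B", "datasource": {"name": "dfm13nk97y9kwf"}}
]}
""")

hardcoded = find_hardcoded_datasources(SAMPLE)
for location, value in hardcoded:
    print(f"hardcoded datasource at {location}: {value}")
```

To run it against the actual file, replace SAMPLE with the result of json.load on the dashboard JSON.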

Contributor Author

Done. I also updated the DatasourceVariable spec to "default".

Comment on lines +251 to +253
"expr": "sum(increase(container_balancer_metrics_data_size_moved_gb[$__range]))",
"legendFormat": "Moved Data Size (GB)",
"range": true

@sreejasahithi sreejasahithi Jun 1, 2026

Should this be sum(container_balancer_metrics_data_size_moved_gb_in_latest_iteration) instead, since the title of this panel is 'Size Moved (Latest)'?

Contributor Author

Thanks for catching this!
For the latest-run stats, I was trying to derive the delta from the total data size moved metric, "container_balancer_metrics_data_size_moved_gb".

  • I was trying to apply rate together with the aggregate function sum(), which makes the query invalid; to fix that, it has to be combined with the increase function.

I realised this is not even required, since we already have another metric for the latest iteration:
"container_balancer_metrics_data_size_moved_gb_in_latest_iteration"
Fixed this now:

(Two screenshots attached.)
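
The difference between the two queries discussed in this thread can be illustrated with a toy calculation. This is only a sketch under stated assumptions: the sample values are invented for illustration, the total metric is assumed to be cumulative, and simple_increase is a hypothetical helper that ignores counter resets and the extrapolation Prometheus applies in increase().

```python
# Invented samples, one per balancer iteration:
# (cumulative GB moved, GB moved in that iteration)
samples = [
    (10, 10),  # after iteration 1
    (25, 15),  # after iteration 2
    (30, 5),   # after iteration 3
]


def simple_increase(values):
    """Last minus first value; a simplification of PromQL increase()."""
    return values[-1] - values[0]


total_series = [total for total, _ in samples]
latest_series = [latest for _, latest in samples]

# Growth of the total over the whole window, which spans several iterations.
range_increase = simple_increase(total_series)

# Most recent sample of the per-iteration metric, covering only the last iteration.
latest_value = latest_series[-1]

print("increase over window:", range_increase)
print("latest iteration:", latest_value)
```

When the dashboard time range covers more than one iteration, the two values diverge, so the per-iteration metric appears to be the better match for a panel titled 'Size Moved (Latest)'.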

Contributor Author

Thanks @sreejasahithi for the review. I have updated the JSON.

@navinko navinko requested a review from sreejasahithi June 1, 2026 20:59