Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -260,7 +260,7 @@ func (o *ScaleUpOrchestrator) ScaleUp(
}

// Execute scale up.
klog.V(1).Infof("Final scale-up plan: %v", scaleUpInfos)
klog.V(1).Infof("Final scale-up plan: %v %v", totalCapacity, scaleUpInfos)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR was extracted from #9315 . Let's continue the discussion here @aleksandra-malinowska @jbtk

Main considerations:

  • potentially breaking log queries - Fair argument, but I think we should find a middle ground. Otherwise it could lead to treating our logs like an API and versioning them. In this case no information is dropped, scaleUpInfo format is unchanged, only a summary is added. I assume that tooling based on log parsing will from time to time require some tweaking and should be avoided.
  • why do we need it? - Figuring out a scale-up size in a regional cluster requires doing N subtractions and adding them in memory. For a single scale-up that's ok, but when analyzing a long sequence of scale-up it's suboptimal.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we add something like finalScaleUpSize to give more context in the log line?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a cleanup of any kind - please fix.

+1 to @damikag - it would be better to introduce some context for this number that is added here. If we are changing the log line we could also add information across how many groups this is spread (this is also sth that is visible in the log line, but requires parsing)

Have you considered another approach - adding a separate log line with this information? I guess that there is a tradeoff between changing this log line and perfomance/amount of logs if we add additional log line - have you considered this?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair argument, but I think we should find a middle ground. Otherwise it could lead to treating our logs like an API and versioning them.

Like I mentioned previously, this specific log line is fairly unique with regard to this. It's essentially a source-of-truth for the final CA decision, whatever happens later.

I'm not particularly attached to the rest of our logging, and I suspect it could probably use substantial improvements.

only a summary is added

My main issue is that this new addition is in the middle of the log and has zero context, it's just an integer inserted in between "scale-up plan:" and describing the actual plan.

The former makes it potentially misleading to people who routinely query CA logs for debugging purposes, as discussed before.

The latter makes it unhelpful for people who aren't used to reading CA logs. Depending on the scale-up size, it's just a magic number that can be mistaken for something else (# of scaled-up node groups, # of considered options, # of accommodated pods). You'd have to either read the code or reverse-engineer just to find out what it means, neither of which is ideal when starting basic troubleshooting.

At least let's add a message describing what this number means (ideally something easily understood by the new users, like "adding a total of %d nodes across %d groups"). Personally I'd also prefer if it was added at the end or logged separately, but with a descriptive message it's at least less likely to quietly break a debugging flow.

aErr, failedNodeGroups := o.scaleUpExecutor.ExecuteScaleUps(scaleUpInfos, nodeInfos, now, allOrNothing)
if aErr != nil {
return status.UpdateScaleUpError(
Expand Down
Loading