-
Notifications
You must be signed in to change notification settings - Fork 9
Add metrics about the solver-service behaviour #70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
moyodiallo
wants to merge
5
commits into
ocurrent:main
Choose a base branch
from
moyodiallo:metrics
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 1 commit
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
9975663
Add metrics on the internal-workers
moyodiallo 08be7f2
Update from Thomas's comment
moyodiallo fbe5cd9
Update service/solver.ml
moyodiallo b857d5f
Refactoring, fixing the number of waiting requests
moyodiallo c790a40
Add the total time spent finding solutions.
moyodiallo File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern (incr, handle, decr) feels unsafe in the presence of exceptions from
handle requestcc @talex5There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
handle requestneeds to catch those exceptions otherwise we're losing a worker and that will result as a bug.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if a worker crashes then the whole service exits, so it's not strictly necessary to handle exceptions here, although it would make the code more robust to future changes. However, I don't think you need this counter. The number of running workers is always
min n_workers n_jobs, so you can just calculate it as needed.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you're right, I did not get this
min n_workers n_jobs. Thanks.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, we don't have
n_jobsin the pool. All we have is waiting jobs (jobs Eio.Stream.t). It won't be precise if consider waiting jobs asn_jobs, ex. 2 jobs waiting and all the 8 workers busy.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be useful to see is:
That would let us answer questions like total solver_worker capacity available, utilisation of the total capacity, and saturation of the total capacity and per solver_worker saturation.
I thought from reading the code and https://github.com/ocaml-multicore/eio?search=1#multicore-support, Pool pre_forked a number of OS threads (Domains?) which would wake up when work was added to the Eio.Stream? Is that accurate @talex5 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right - there is a fixed pool of workers and they will all be busy unless there just aren't any jobs in the queue.
The issue here is that
run_workeris running in a worker domain, and so can't (currently) update Prometheus metrics itself (which I guess it why it's updating an atomic instead and waiting for the main domain to report that). But the main domain can work out how many workers are running just by knowing how many jobs are outstanding, so this isn't necessary.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right there's a way to do that.