Using an extra gen_server just to crash the Rabbit app seems a bit hacky to me, though it works. I'm sure there must be a better way to crash Rabbit when ra_systems_sup exits.
Here is a full log from a raft.data_dir = /path/to/small/tmpfs/mount
# Tune down WAL file size. If there are a few large WAL
# files then recovery might take long enough that it avoids
# hitting max restart intensity.
raft.wal_max_size_bytes = 67108864
Thanks, I will review it next week when I get back from holidays.
init(_) ->
    process_flag(trap_exit, true),
    Pid = whereis(ra_systems_sup),
    true = link(Pid),
Don’t you need to unlink in the terminate/2 callback? Otherwise, stopping rabbit will kill ra_systems_sup. I know that rabbit messes with its dependencies, so it might not be a problem right now. But if we fix rabbit one day, it will become one.
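A minimal sketch of what the suggested unlink might look like, assuming the watcher keeps the linked pid as its state (as the `init/1` hunk above does). The module name `ra_sup_watcher_sketch` is hypothetical, and the PR's own handling of the `'EXIT'` message is left out here.

```erlang
%% Hypothetical module name; the PR's actual module is not shown in this thread.
-module(ra_sup_watcher_sketch).
-behaviour(gen_server).

-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2]).

init(_) ->
    process_flag(trap_exit, true),
    Pid = whereis(ra_systems_sup),
    true = link(Pid),
    %% The linked pid is the whole state, so terminate/2 can unlink it.
    {ok, Pid, hibernate}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Placeholder only: the PR's 'EXIT' handling is intentionally omitted.
handle_info(_Info, State) ->
    {noreply, State}.

%% Drop the link before this process goes away, so that its exit signal
%% is not delivered to ra_systems_sup. unlink/1 returns true even when
%% no link exists.
terminate(_Reason, Pid) when is_pid(Pid) ->
    true = unlink(Pid),
    ok;
terminate(_Reason, _State) ->
    ok.
```

Whether the unlink is strictly required depends on how `ra_systems_sup` reacts to an exit signal from a process that is not one of its children, so treat this as an illustration of the suggestion rather than a confirmed fix.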
The goal of this change is to shut down Rabbit when the `ra_systems_sup` supervisor exits. If disk space is completely exhausted, the `ra_systems_sup` supervisor can exit because of the repeated `enospc` errors. If we do not also crash Rabbit, it keeps running with Khepri unavailable, so no incoming connections can log in. There are other effects too, such as `rabbit_vhost_process` deleting a vhost because Khepri does not report that it exists.
Force-pushed from 56ddbbd to 8a3cd92.
    true = link(Pid),
    {ok, Pid, hibernate}.

handle_call(_Request, _From, State) -> {noreply, State}.
This will leave the caller waiting for a reply until its call times out, if it set a timeout. I think you can reply with an ok just to avoid that. What do you think?
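A small sketch of the suggested change, using a hypothetical stand-alone module (`call_reply_sketch`) rather than the PR's module. Returning `{reply, ok, State}` lets `gen_server:call/2` return immediately. With `{noreply, State}` and no later `gen_server:reply/2`, the caller blocks until the call timeout (5000 ms by default for `gen_server:call/2`) and then exits with a timeout error.

```erlang
%% Hypothetical module, used only to illustrate the reply shape.
-module(call_reply_sketch).
-behaviour(gen_server).

-export([start_link/0, init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    {ok, no_state}.

%% Answer every call right away so that no caller is left waiting.
handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

For example, after `{ok, Pid} = call_reply_sketch:start_link()`, a call such as `gen_server:call(Pid, anything)` returns `ok` instead of blocking.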
        ?MODULE_STRING ": Ra system supervisor exited with reason ~tp~n",
        [Reason],
        #{domain => ?RMQLOG_DOMAIN_GLOBAL}),
    exit(E);
What about returning {stop, Reason, State}? Would it have a different behaviour compared to exiting?
Yeah actually, Rabbit doesn't exit unless we use exit/1 here. With {stop, Reason, State} Rabbit keeps going. I'm not really sure why. This is the part that seems pretty hacky to me 😅
Connects to rabbitmq/ra#585