cli : move to HTTP-based implementation #24948
Co-authored-by: Piotr Wilkin <ilintar@gmail.com>
        return matches;
    }

    // note: make this view implementation generic, so that we can move to TUI in the future if we want to
Hm, maybe instead of doing it like this, just make cli-view virtual and make this a cli-view-console class that implements it? That would just require providing a cli-view-tui to substitute the view class.
IMO it's quite over-engineered at this stage: the current singleton design already works well enough, and converting it to polymorphism when we want to do TUI is trivial. Even if we get there, I doubt we will want to keep both console:: and TUI; we will likely use only one of the two, and so go back to a singleton.
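For reference, the polymorphic alternative suggested in this thread could look roughly like the sketch below. All names here (`cli_view_iface`, `cli_view_console`, `cli_view_tui`, `show_greeting`) are hypothetical and not taken from the PR, which keeps a singleton view. The backends record lines in memory instead of drawing to a terminal so that the sketch stays self-contained.

```cpp
#include <string>
#include <vector>

// Hypothetical abstract view; the PR itself does not define this interface.
struct cli_view_iface {
    virtual ~cli_view_iface() = default;
    virtual void print(const std::string & text) = 0;
};

// Console-style backend (assumption: a real one would write via console::).
struct cli_view_console : cli_view_iface {
    std::vector<std::string> lines;
    void print(const std::string & text) override {
        lines.push_back("[console] " + text);
    }
};

// TUI-style backend (assumption: a real one would draw into a TUI widget).
struct cli_view_tui : cli_view_iface {
    std::vector<std::string> lines;
    void print(const std::string & text) override {
        lines.push_back("[tui] " + text);
    }
};

// Calling code depends only on the interface, so backends can be swapped.
inline void show_greeting(cli_view_iface & view) {
    view.print("hello");
}
```

The trade-off raised in the reply still applies: if only one backend ever exists at a time, the virtual interface adds indirection without a second implementation to justify it.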
Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
    std::string content((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());
    cur_msg += "--- File: ";
    cur_msg += fname;
    cur_msg += " ---\n";
    cur_msg += content;
This kills the FIM separator functionality. Can we add text media support to the server so that text files can be attached separately and handled on a per-model basis?
ggerganov left a comment
Generally, I think at some point we have to decouple the HTTP layer from the server and client logic. The server and client should communicate with generic messages, and how the messages are communicated would be implementation-specific. One way is HTTP of course, but we can also have other ways, e.g. an in-process message queue. This will allow us to integrate the server logic more broadly in applications without having to depend on the HTTP stack.
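A minimal sketch of that decoupling, under stated assumptions: `message`, `transport`, `inproc_transport` and `echo_handler` are hypothetical names invented for illustration, and the in-process backend here is a direct function call rather than a real message queue. An HTTP backend would implement the same `transport` interface.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical generic message exchanged between client and server logic.
struct message {
    std::string topic;
    std::string payload;
};

// Hypothetical transport interface: how a message travels is hidden here.
struct transport {
    virtual ~transport() = default;
    virtual message request(const message & req) = 0;
};

// In-process backend: forwards the request straight to a handler,
// with no HTTP stack (and, in this sketch, no queue or threads).
class inproc_transport : public transport {
public:
    using handler_fn = std::function<message(const message &)>;

    explicit inproc_transport(handler_fn handler) : handler_(std::move(handler)) {}

    message request(const message & req) override {
        return handler_(req);
    }

private:
    handler_fn handler_;
};

// Stand-in for server logic; only echoes the payload back.
inline message echo_handler(const message & req) {
    return { req.topic + "/reply", "echo: " + req.payload };
}
```

Client code written against `transport` would not need to change when the backend is swapped, which is the property the comment asks for.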
    // set by the SIGINT handler; cleared once the interrupt has been handled
    extern std::atomic<bool> g_cli_interrupted;
The cli context can return a reference to this flag - no need to make it global.
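One possible shape for this suggestion, as a sketch only: `cli_context_sketch`, `interrupted()` and `consume_interrupt()` are hypothetical names, not the PR's API. The flag becomes a member and callers reach it through an accessor.

```cpp
#include <atomic>

// Hypothetical context that owns the interrupt flag instead of a global.
class cli_context_sketch {
public:
    // Reference handed to whoever needs to set or poll the flag.
    std::atomic<bool> & interrupted() { return interrupted_; }

    // Returns true if an interrupt was pending, and clears it atomically.
    bool consume_interrupt() { return interrupted_.exchange(false); }

private:
    std::atomic<bool> interrupted_{false};
};
```

A signal handler cannot capture state, so some file-local pointer to the context (or to its flag) would likely still be needed to reach it from the SIGINT handler; that part is not shown here.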
    cli_client client; // always initialized
    std::optional<cli_server> server; // only set when no --server-base is given

    json messages = json::array();
    json pending_media = json::array(); // staged multimodal content parts
These should be pimpl-ed to avoid including the json header in other headers.
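The pimpl idea can be sketched as follows. `cli_state_sketch` and its methods are hypothetical, and a plain vector of string pairs stands in for the json members so that the example compiles without any json library. In a real split, the first part would live in the header and the second part in the .cpp file.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// ---- header part: no json include is needed here ----
class cli_state_sketch {
public:
    cli_state_sketch();
    ~cli_state_sketch(); // defined below, where impl is a complete type

    void add_message(const std::string & role, const std::string & content);
    std::size_t n_messages() const;

private:
    struct impl;             // forward declaration only
    std::unique_ptr<impl> p; // members hidden behind the pointer
};

// ---- source part: the heavy include (e.g. the json header) would go here ----
struct cli_state_sketch::impl {
    // Stand-in for the json array of messages (assumption for this sketch).
    std::vector<std::pair<std::string, std::string>> messages;
};

cli_state_sketch::cli_state_sketch() : p(std::make_unique<impl>()) {}
cli_state_sketch::~cli_state_sketch() = default;

void cli_state_sketch::add_message(const std::string & role, const std::string & content) {
    p->messages.emplace_back(role, content);
}

std::size_t cli_state_sketch::n_messages() const {
    return p->messages.size();
}
```

The destructor has to be defined after `impl` is complete, otherwise `std::unique_ptr<impl>` cannot be destroyed in translation units that only see the header.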
    std::string message = "\nAvailable models:";
    if (!models.empty()) {
        for (size_t i = 0; i < models.size(); ++i) {
            message += "\n " + std::to_string(i + 1) + ". " + models[i];
        }
    }
    message += "\n";
It would be useful to also list the aliases for each model config.
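A sketch of how the listing could include aliases, assuming each model is described by a small struct. `model_entry`, `format_model_list` and the `(aliases: ...)` output format are assumptions for illustration; the PR's snippet iterates over plain model names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-model record; the PR snippet only has the model name.
struct model_entry {
    std::string              name;
    std::vector<std::string> aliases;
};

// Builds the same kind of listing as the snippet above, with aliases appended.
inline std::string format_model_list(const std::vector<model_entry> & models) {
    std::string message = "\nAvailable models:";
    for (std::size_t i = 0; i < models.size(); ++i) {
        message += "\n " + std::to_string(i + 1) + ". " + models[i].name;
        const std::vector<std::string> & aliases = models[i].aliases;
        if (!aliases.empty()) {
            message += " (aliases: ";
            for (std::size_t j = 0; j < aliases.size(); ++j) {
                if (j > 0) {
                    message += ", ";
                }
                message += aliases[j];
            }
            message += ")";
        }
    }
    message += "\n";
    return message;
}
```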
    #include "server-context.h"
    #include "server-task.h"
    #include "cli-context.h"
    #include "cli-view.h"
This file seems to provide some UI primitives, so it would be better to call it cli-ui.h for example.
    // POST request with an SSE streaming response; on_data is invoked once
    // per "data:" event; the function returns after the stream is finished:
    // a null json on graceful exit (incl. cancellation via should_stop),
    // the error response json otherwise
    json post_sse(const std::string & path,
                  const json & body,
                  const std::function<bool()> & should_stop,
                  const std::function<void(const json &)> & on_data);
I don't think the client needs to be coupled to json. It can return raw strings and let the cli context decode the json.
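To illustrate the raw-string variant, here is a sketch of the line-splitting part of an SSE reader that hands each `data:` payload to the caller as a `std::string` and leaves decoding to the caller. `sse_drain` is a hypothetical helper, not part of the PR, and it simplifies the SSE format by treating every `data:` line as its own event rather than joining consecutive `data:` lines.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Feeds every complete "data:" line found in `buffer` to on_data as a raw
// string. Consumed bytes are removed from `buffer`; an incomplete trailing
// line is left in place so it can be completed by the next network chunk.
inline void sse_drain(std::string & buffer,
                      const std::function<void(const std::string &)> & on_data) {
    const std::string prefix = "data:";
    std::size_t pos;
    while ((pos = buffer.find('\n')) != std::string::npos) {
        std::string line = buffer.substr(0, pos);
        buffer.erase(0, pos + 1);
        if (!line.empty() && line.back() == '\r') {
            line.pop_back(); // tolerate CRLF line endings
        }
        if (line.compare(0, prefix.size(), prefix) != 0) {
            continue; // blank separators, comments and other fields are skipped
        }
        std::string payload = line.substr(prefix.size());
        if (!payload.empty() && payload.front() == ' ') {
            payload.erase(0, 1); // a single leading space is not part of the data
        }
        on_data(payload); // the caller decides how (or whether) to parse it
    }
}
```

With this shape the client header needs no json type at all; the cli context would parse each payload itself.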
Indeed, it's already possible with the current implementation: in I'm planning to add an example for that, i.e. allow using server in downstream app without going through the HTTP stack, but just wondering if we should firstly move |
Overview
Supersedes #21674

- --server-base argument that allows the CLI to connect to a remote llama-server instance

Design choice:

- cli-context holds the main context and state (list of messages, display states, etc.)
- cli-client is a thin wrapper around the HTTP client that provides an OAI-compatible client API
- cli-server is optional; it manages an owned llama-server instance (running in a thread) if a remote server is not specified
- cli-view is an abstraction for view state that provides some RAII display components (generic, reusable components)

Requirements