Skip to content

cli : move to HTTP-based implementation#24948

Open
ngxson wants to merge 10 commits into
masterfrom
xsn/cli_http_based
Open

cli : move to HTTP-based implementation#24948
ngxson wants to merge 10 commits into
masterfrom
xsn/cli_http_based

Conversation

@ngxson

@ngxson ngxson commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator

Overview

Supersede #21674

  • Add --server-base argument that allow CLI to connect to a remote llama-server instance
  • If not specified, CLI spawns a server instance via a random port (as a thread, NOT a dedicated process)
  • If remote server is in router mode, also ask which model to be used

Design choice:

  • cli-context holds the main context and state (list of messages, display states, etc)
  • cli-client is a thin wrapper around HTTP client, provide OAI-compat client API
  • cli-server is optional, to manage owned llama-server instance (running in a thread) if remote server is not specified
  • cli-view is an abstraction for view state, provide some RAII display component (generic, reusable components)

Requirements

@pwilkin pwilkin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just minor nits.

Comment thread tools/cli/cli-view.h
return matches;
}

// note: make this view implementation generic, so that we can move to TUI in the future if we want to

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, maybe instead of doing it like this, just make cli-view virtual and make this a cli-view-console class that implements it? That would just require providing a cli-view-tui to substitute the view class.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it's quite over-engineer at this stage: the current singleton design already works well enough, and converting it to polymorphism when we want to do TUI is trivial (and even if we arrive there, I doubt if we will want to keep both console:: and TUI; we will likely use one of the two, so go back to singleton)

Comment thread tools/server/server.cpp Outdated
Comment thread tools/server/server.cpp Outdated
ngxson and others added 3 commits June 23, 2026 22:48
Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
Comment thread tools/cli/cli-context.cpp
Comment on lines +381 to +385
std::string content((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());
cur_msg += "--- File: ";
cur_msg += fname;
cur_msg += " ---\n";
cur_msg += content;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kills the FIM separator functionality, can we add text media support to server so text files can be attached separately and handled on a per-model basis?

@ggerganov ggerganov left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I think at some point we have to decouple the HTTP layer from the server and client logics. The server and client should communicate with generic messages and how the messages are communicated would be implementation-specific. One way is HTTP of course, but we can also have other way - e.g. in-process message queue. This will allow us to integrate the server logic more broadly in applications without having to depend on the HTTP stack.

Comment thread tools/cli/cli-context.h
Comment on lines +17 to +18
// set by the SIGINT handler; cleared once the interrupt has been handled
extern std::atomic<bool> g_cli_interrupted;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cli context can return a reference to this flag - no need to make it global.

Comment thread tools/cli/cli-context.h
Comment on lines +23 to +27
cli_client client; // always initialized
std::optional<cli_server> server; // only set when no --server-base is given

json messages = json::array();
json pending_media = json::array(); // staged multimodal content parts

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be pimpl-ed to avoid including the json header in other headers.

Comment thread tools/cli/cli-context.cpp
Comment on lines +172 to +178
std::string message = "\nAvailable models:";
if (!models.empty()) {
for (size_t i = 0; i < models.size(); ++i) {
message += "\n " + std::to_string(i + 1) + ". " + models[i];
}
}
message += "\n";

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful to also list the aliases for each model config.

Comment thread tools/cli/cli.cpp
#include "server-context.h"
#include "server-task.h"
#include "cli-context.h"
#include "cli-view.h"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include "cli-view.h"

Comment thread tools/cli/cli-view.h

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file seems to provide some UI primitives, so it would be better to call it cli-ui.h for example.

Comment thread tools/cli/cli-client.h
Comment on lines +28 to +35
// POST request with an SSE streaming response; on_data is invoked once
// per "data:" event; the function returns after the stream is finished:
// a null json on graceful exit (incl. cancellation via should_stop),
// the error response json otherwise
json post_sse(const std::string & path,
const json & body,
const std::function<bool()> & should_stop,
const std::function<void(const json &)> & on_data);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the client needs to be coupled to json. It can return raw strings and let the cli context decode the json.

@ngxson

ngxson commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator Author

Generally, I think at some point we have to decouple the HTTP layer from the server and client logics. The server and client should communicate with generic messages and how the messages are communicated would be implementation-specific. One way is HTTP of course, but we can also have other way - e.g. in-process message queue. This will allow us to integrate the server logic more broadly in applications without having to depend on the HTTP stack.

Indeed, it's already possible with the current implementation: in server-http.h we already had server_http_context is an abstraction on top of HTTP request handling, and the current impl uses httplib. A downstream application can provide another impl without using httplib

I'm planning to add an example for that, i.e. allow using server in downstream app without going through the HTTP stack, but just wondering if we should firstly move server_context to a new tools/engine shared library (libllama-engine). After that, tools/server will only contain httplib, router, tools and non-inference stuff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants