cli : move to HTTP-based implementation by ngxson · Pull Request #24948 · ggml-org/llama.cpp

ngxson · 2026-06-23T14:53:52Z

Overview

Supersedes #21674

Add a --server-base argument that allows the CLI to connect to a remote llama-server instance
If it is not specified, the CLI spawns a server instance on a random port (as a thread, NOT a dedicated process)
If the remote server is in router mode, the CLI also asks which model to use
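The choice between a remote and an owned server can be sketched as follows. This is only an illustration: `cli_endpoint` and `resolve_endpoint` are hypothetical names, not code from this PR, and the loopback address is an assumption.

```cpp
#include <optional>
#include <string>

// Hypothetical result type: which base URL the CLI talks to, and whether
// the CLI itself started the server behind it.
struct cli_endpoint {
    std::string base_url;
    bool        owns_server;
};

// If a --server-base value is given, use it (minus trailing slashes).
// Otherwise point at a locally spawned server on the given port.
inline cli_endpoint resolve_endpoint(const std::optional<std::string> & server_base,
                                     int local_port) {
    if (server_base && !server_base->empty()) {
        std::string url = *server_base;
        while (!url.empty() && url.back() == '/') {
            url.pop_back();
        }
        return { url, false };
    }
    // assumption: the owned server listens on the loopback interface
    return { "http://127.0.0.1:" + std::to_string(local_port), true };
}
```

In this sketch the caller is expected to have picked `local_port` already (for example by binding to port 0 and reading back the assigned port).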

Design choices:

cli-context holds the main context and state (list of messages, display states, etc.)
cli-client is a thin wrapper around the HTTP client that provides an OAI-compatible client API
cli-server is optional; it manages an owned llama-server instance (running in a thread) when no remote server is specified
cli-view is an abstraction for view state that provides some generic, reusable RAII display components

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure:
- code copied from cli: add option to connect to server via http(s) #21674 --> not sure
- code added in this PR is human-written

Co-authored-by: Piotr Wilkin <ilintar@gmail.com>

pwilkin

Looks good, just minor nits.

pwilkin · 2026-06-23T15:55:09Z

+    return matches;
+}
+
+// note: make this view implementation generic, so that we can move to TUI in the future if we want to


Hm, maybe instead of doing it like this, just make cli-view virtual and make this a cli-view-console class that implements it? Moving to a TUI would then only require providing a cli-view-tui to substitute for the view class.

IMO that's quite over-engineered at this stage: the current singleton design already works well enough, and converting it to polymorphism when we want to do a TUI is trivial. Even if we get there, I doubt we will want to keep both console:: and the TUI; we will likely use one of the two, and so go back to a singleton.

Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>

CISC · 2026-06-24T07:38:44Z

+        std::string content((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());
+        cur_msg += "--- File: ";
+        cur_msg += fname;
+        cur_msg += " ---\n";
+        cur_msg += content;


This kills the FIM separator functionality. Can we add text media support to the server, so that text files can be attached separately and handled on a per-model basis?

ggerganov

Generally, I think at some point we have to decouple the HTTP layer from the server and client logic. The server and client should communicate with generic messages, and how the messages are transported would be implementation-specific. One way is HTTP of course, but we can also have other ways, e.g. an in-process message queue. This will allow us to integrate the server logic more broadly in applications without having to depend on the HTTP stack.
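One possible shape for such a decoupling, as a rough sketch: `message_transport` and `inproc_transport` are hypothetical names, and messages are modelled as plain strings purely for illustration.

```cpp
#include <functional>
#include <string>
#include <utility>

// Transport-agnostic interface: the client hands over a request message
// and gets a response message back, without knowing how it travelled.
struct message_transport {
    virtual ~message_transport() = default;
    virtual std::string send(const std::string & request) = 0;
};

// In-process variant: forwards the message straight to a handler function,
// with no HTTP stack involved. An HTTP-backed variant would implement the
// same interface.
struct inproc_transport : message_transport {
    using handler_t = std::function<std::string(const std::string &)>;

    explicit inproc_transport(handler_t h) : handler(std::move(h)) {}

    std::string send(const std::string & request) override {
        return handler(request);
    }

    handler_t handler;
};
```

A real in-process queue would also need to cover streaming responses and cancellation, which this sketch leaves out.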

ggerganov · 2026-06-24T07:41:19Z

+// set by the SIGINT handler; cleared once the interrupt has been handled
+extern std::atomic<bool> g_cli_interrupted;


The cli context can return a reference to this flag - no need to make it global.
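A minimal sketch of that idea, assuming the context owns the flag and exposes it through an accessor. `cli_context_sketch` and `interrupted()` are hypothetical names, not the PR's actual API.

```cpp
#include <atomic>

// The flag lives inside the context instead of being an extern global.
class cli_context_sketch {
public:
    // Callers that need to poll or clear the flag get a reference to it.
    std::atomic<bool> & interrupted() { return interrupted_; }

private:
    std::atomic<bool> interrupted_{false};
};
```

The SIGINT handler still needs some way to reach the context (for example a file-local pointer set at startup), but that pointer can stay private to one translation unit.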

ggerganov · 2026-06-24T07:42:35Z

+    cli_client client;                // always initialized
+    std::optional<cli_server> server; // only set when no --server-base is given
+
+    json messages      = json::array();
+    json pending_media = json::array(); // staged multimodal content parts


These should be pimpl-ed to avoid including the json header in other headers.
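For readers unfamiliar with the idiom, a minimal pimpl sketch follows. `chat_state` is a hypothetical class, and a `std::vector<std::string>` stands in for the json array so the example stays self-contained; the header and source parts are shown in one block.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// --- header part: only a forward declaration of the implementation ---
class chat_state {
public:
    chat_state();
    ~chat_state(); // defined in the source part, where impl is complete

    void        add_message(const std::string & text);
    std::size_t n_messages() const;

private:
    struct impl;
    std::unique_ptr<impl> p;
};

// --- source part: the heavy type (json in the real code) lives here ---
struct chat_state::impl {
    std::vector<std::string> messages; // stand-in for the json array
};

chat_state::chat_state() : p(std::make_unique<impl>()) {}
chat_state::~chat_state() = default;

void chat_state::add_message(const std::string & text) {
    p->messages.push_back(text);
}

std::size_t chat_state::n_messages() const {
    return p->messages.size();
}
```

With this layout, files that include the header never see the json header; only the source file that defines `impl` does.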

ggerganov · 2026-06-24T07:46:17Z

+    std::string message = "\nAvailable models:";
+    if (!models.empty()) {
+        for (size_t i = 0; i < models.size(); ++i) {
+            message += "\n  " + std::to_string(i + 1) + ". " + models[i];
+        }
+    }
+    message += "\n";


It would be useful to also list the aliases for each model config.
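A sketch of what listing aliases could look like, building on the loop in the diff above. `model_entry` and its `aliases` field are assumptions; how the router actually exposes aliases is not shown in this thread.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model description: a name plus optional aliases.
struct model_entry {
    std::string              name;
    std::vector<std::string> aliases;
};

// Same numbered list as in the diff, with aliases appended in parentheses
// when a model has any.
inline std::string format_model_list(const std::vector<model_entry> & models) {
    std::string message = "\nAvailable models:";
    for (std::size_t i = 0; i < models.size(); ++i) {
        message += "\n  " + std::to_string(i + 1) + ". " + models[i].name;
        const auto & aliases = models[i].aliases;
        if (!aliases.empty()) {
            message += " (aliases: ";
            for (std::size_t j = 0; j < aliases.size(); ++j) {
                if (j > 0) {
                    message += ", ";
                }
                message += aliases[j];
            }
            message += ")";
        }
    }
    message += "\n";
    return message;
}
```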

ggerganov · 2026-06-24T07:46:49Z

-#include "server-context.h"
-#include "server-task.h"
+#include "cli-context.h"
+#include "cli-view.h"


Suggested change

#include "cli-view.h"

ggerganov · 2026-06-24T07:49:57Z

This file seems to provide some UI primitives, so it would be better to call it cli-ui.h for example.

ggerganov · 2026-06-24T08:02:49Z

+    // POST request with an SSE streaming response; on_data is invoked once
+    // per "data:" event; the function returns after the stream is finished:
+    // a null json on graceful exit (incl. cancellation via should_stop),
+    // the error response json otherwise
+    json post_sse(const std::string & path,
+                  const json & body,
+                  const std::function<bool()> & should_stop,
+                  const std::function<void(const json &)> & on_data);


I don't think the client needs to be coupled to json. It can return raw strings and let the cli context decode the json.
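A rough sketch of a json-free variant: the client only splits the stream into raw "data:" payloads and leaves decoding to the caller. `dispatch_sse` is a hypothetical helper that works on an already-received body for simplicity (a real client would process chunks incrementally), and treating `[DONE]` as the end-of-stream sentinel is an assumption based on OAI-style streaming.

```cpp
#include <functional>
#include <sstream>
#include <string>

// Walks an SSE body line by line and passes each "data:" payload to
// on_data as a raw string. Returns the number of payloads delivered.
inline int dispatch_sse(const std::string & body,
                        const std::function<bool()> & should_stop,
                        const std::function<void(const std::string &)> & on_data) {
    static const std::string prefix = "data:";
    std::istringstream in(body);
    std::string line;
    int n_events = 0;
    while (std::getline(in, line)) {
        if (should_stop()) {
            break; // cancellation requested by the caller
        }
        if (!line.empty() && line.back() == '\r') {
            line.pop_back(); // tolerate CRLF line endings
        }
        if (line.compare(0, prefix.size(), prefix) != 0) {
            continue; // blank separator lines and other fields are skipped
        }
        std::string payload = line.substr(prefix.size());
        if (!payload.empty() && payload.front() == ' ') {
            payload.erase(0, 1); // one optional space after the colon
        }
        if (payload == "[DONE]") {
            break; // assumed end-of-stream sentinel
        }
        on_data(payload);
        ++n_events;
    }
    return n_events;
}
```

The cli context would then parse each payload as json in its `on_data` callback, keeping the json dependency out of the client.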

ngxson · 2026-06-24T11:49:16Z

> Generally, I think at some point we have to decouple the HTTP layer from the server and client logic. The server and client should communicate with generic messages, and how the messages are transported would be implementation-specific. One way is HTTP of course, but we can also have other ways, e.g. an in-process message queue. This will allow us to integrate the server logic more broadly in applications without having to depend on the HTTP stack.

Indeed, this is already possible with the current implementation: server-http.h already has server_http_context, an abstraction on top of HTTP request handling, and the current implementation uses httplib. A downstream application can provide another implementation that does not use httplib.

I'm planning to add an example for that, i.e. one that uses the server in a downstream app without going through the HTTP stack. I'm just wondering whether we should first move server_context to a new shared library under tools/engine (libllama-engine). After that, tools/server would contain only httplib, the router, tools and other non-inference code.

ngxson and others added 7 commits June 23, 2026 13:14

cli: move to HTTP-based implementation

5979767

wip

f7421ea

Merge branch 'master' into xsn/cli_http_based

90c111b

working

19296c1

remote server ok

85c58bb

cli support router mode

1401fc3

Co-authored-by: Piotr Wilkin <ilintar@gmail.com>

case: router with only one model

b093e46

ngxson requested a review from pwilkin June 23, 2026 14:53

ngxson requested review from a team as code owners June 23, 2026 14:53

ngxson mentioned this pull request Jun 23, 2026

cli: add option to connect to server via http(s) #21674

Closed

github-actions Bot added examples server labels Jun 23, 2026

pwilkin approved these changes Jun 23, 2026


ngxson and others added 3 commits June 23, 2026 22:48

Apply suggestions from code review

beef5cf

Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>

remove outdated comment

5d67f69

use destructor instead

a432e6f

CISC reviewed Jun 24, 2026


ggerganov approved these changes Jun 24, 2026


4 participants

ngxson commented Jun 23, 2026 •

edited

Loading