feat: add MiniMax as embedding provider #894
Add MiniMax AI as a built-in embedding provider alongside OpenAI, Gemini, AzureOpenAI, and Local.

MiniMax's embo-01 model produces 1536-dimensional embeddings via a native API at https://api.minimax.io/v1/embeddings. The API uses a different request/response format from OpenAI (texts array + type field for db/query distinction, vectors array in response).

Usage:
    embed!("text", "minimax:embo-01")        // storage embeddings
    embed!("text", "minimax:embo-01:query")  // search query embeddings

Configuration:
- Set MINIMAX_API_KEY environment variable
- Provider prefix: minimax:<model>[:<type>]
- type defaults to "db" (use "query" for search queries)

Includes 5 unit tests (parsing, API key validation) and 2 integration tests (ignored by default, require API key and network).
    Some(m) if m.starts_with("minimax:") => {
        let parts: Vec<&str> = m.splitn(2, ':').collect();
        let model_and_type = parts.get(1).unwrap_or(&"embo-01");
        let (model_name, embedding_type) = if model_and_type.contains(':') {
            let type_parts: Vec<&str> = model_and_type.splitn(2, ':').collect();
            (
                type_parts[0].to_string(),
                type_parts.get(1).unwrap_or(&"db").to_string(),
            )
        } else {
            (model_and_type.to_string(), "db".to_string())
        };

        Ok((EmbeddingProvider::MiniMax { embedding_type }, model_name))
    }
Dead unwrap_or fallback: default "embo-01" is never applied
Because the match guard already requires m.starts_with("minimax:"), calling m.splitn(2, ':') will always produce at least two parts: ["minimax", "<rest>"]. This means parts.get(1) always returns Some(...) — even Some("") when the input is "minimax:". The .unwrap_or(&"embo-01") fallback is therefore dead code and the intended default is never applied.
Concretely, "minimax:" results in model_name = "" (not "embo-01"), which will be sent verbatim to the MiniMax API and cause a server-side error rather than a clean default.
The same dead-fallback pattern exists in the Gemini arm (line 125), but this PR introduces it again for MiniMax.
A clean fix is to use strip_prefix and treat an empty suffix as the default:
    - Some(m) if m.starts_with("minimax:") => {
    -     let parts: Vec<&str> = m.splitn(2, ':').collect();
    -     let model_and_type = parts.get(1).unwrap_or(&"embo-01");
    -     let (model_name, embedding_type) = if model_and_type.contains(':') {
    -         let type_parts: Vec<&str> = model_and_type.splitn(2, ':').collect();
    -         (
    -             type_parts[0].to_string(),
    -             type_parts.get(1).unwrap_or(&"db").to_string(),
    -         )
    -     } else {
    -         (model_and_type.to_string(), "db".to_string())
    -     };
    -     Ok((EmbeddingProvider::MiniMax { embedding_type }, model_name))
    - }
    + Some(m) if m.starts_with("minimax:") => {
    +     let suffix = m.strip_prefix("minimax:").unwrap_or("embo-01");
    +     let suffix = if suffix.is_empty() { "embo-01" } else { suffix };
    +     let (model_name, embedding_type) = if suffix.contains(':') {
    +         let type_parts: Vec<&str> = suffix.splitn(2, ':').collect();
    +         (
    +             type_parts[0].to_string(),
    +             type_parts.get(1).unwrap_or(&"db").to_string(),
    +         )
    +     } else {
    +         (suffix.to_string(), "db".to_string())
    +     };
    +     Ok((EmbeddingProvider::MiniMax { embedding_type }, model_name))
    + }
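The suggested fix still leaves two inputs unhandled: "minimax::query" produces an empty model name, and an arbitrary type string is passed through unchecked. A minimal sketch of a stricter parser follows, assuming a standalone helper named parse_minimax_spec (hypothetical, not part of this PR) that returns plain strings rather than the crate's EmbeddingProvider enum:

```rust
/// Hypothetical helper, not part of the PR: parses "minimax:<model>[:<type>]"
/// into (model, embedding_type), applying defaults for empty segments.
fn parse_minimax_spec(spec: &str) -> Result<(String, String), String> {
    const DEFAULT_MODEL: &str = "embo-01";
    const DEFAULT_TYPE: &str = "db";

    let suffix = spec
        .strip_prefix("minimax:")
        .ok_or_else(|| format!("not a minimax spec: {}", spec))?;

    // Split at most once: "<model>" or "<model>:<type>".
    let (model, embedding_type) = match suffix.split_once(':') {
        Some((model, ty)) => (model, ty),
        None => (suffix, ""),
    };

    // Empty segments fall back to the documented defaults.
    let model = if model.is_empty() { DEFAULT_MODEL } else { model };
    let embedding_type = if embedding_type.is_empty() {
        DEFAULT_TYPE
    } else {
        embedding_type
    };

    // Only the two types described in this PR are accepted.
    match embedding_type {
        "db" | "query" => Ok((model.to_string(), embedding_type.to_string())),
        other => Err(format!("invalid MiniMax embedding type: {}", other)),
    }
}
```

With this shape, "minimax:" and "minimax::query" both resolve to the embo-01 model, and an unknown type such as "minimax:embo-01:bogus" is rejected at parse time instead of surfacing as an API error.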
            }
            _ => panic!("Expected MiniMax provider"),
        }
        assert_eq!(model, "");
test_parse_minimax_provider_empty_model asserts empty model rather than the documented default
This test asserts model == "" for input "minimax:". Combined with the dead unwrap_or(&"embo-01") in the parser, this test documents and entrenches the broken behaviour: a user who types "minimax:" expecting the default model will silently get an empty model string sent to the API. If the dead-fallback bug in the parser is fixed, this test should be updated to assert model == "embo-01".
    - assert_eq!(model, "");
    + assert_eq!(model, "embo-01"); // defaults to embo-01 when no model specified
MiniMax embedding API returns HTTP 200 even for errors (e.g. rate limits) with the actual error in base_resp.status_code. Check this field before attempting to parse the embedding vectors. Also fix integration tests to use #[tokio::test] for proper async runtime.
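The check described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: the structs stand in for an already-deserialized response body, the vectors and base_resp.status_code fields follow the response shape quoted in this PR, and the status_msg field and the extract_embedding helper are hypothetical.

```rust
/// Already-deserialized view of a MiniMax embeddings response.
/// `status_msg` is an assumed field used here only for the error text.
#[derive(Debug)]
struct BaseResp {
    status_code: i64,
    status_msg: String,
}

#[derive(Debug)]
struct MiniMaxResponse {
    vectors: Option<Vec<Vec<f64>>>,
    base_resp: Option<BaseResp>,
}

/// Hypothetical helper: returns the first embedding, or an error when the
/// body reports a failure even though the HTTP status was 200.
fn extract_embedding(resp: MiniMaxResponse) -> Result<Vec<f64>, String> {
    // Inspect the in-body status before touching `vectors`.
    if let Some(base) = &resp.base_resp {
        if base.status_code != 0 {
            return Err(format!(
                "MiniMax API error {}: {}",
                base.status_code, base.status_msg
            ));
        }
    }

    // One input text was sent, so the first vector is the embedding.
    resp.vectors
        .and_then(|v| v.into_iter().next())
        .ok_or_else(|| "MiniMax response contained no embedding vectors".to_string())
}
```

Checking the status first means a rate-limit response is reported with the code from the body rather than as a confusing "missing vectors" parse failure.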
Summary
- `embo-01` model (1536 dimensions)
- `texts` array + `type` field for db/query distinction, `vectors` response
- `base_resp.status_code` in the response body

Usage
Requires `MINIMAX_API_KEY` environment variable (or pass key directly).

Changes
- `helix-db/src/helix_gateway/embedding_providers/mod.rs`: add `MiniMax` variant to `EmbeddingProvider` enum, `minimax:` prefix parsing, API key resolution, and `fetch_embedding_async()` implementation
- `helix-db/src/helix_gateway/tests/embedding_providers.rs`
- `helix-db/src/helix_engine/tests/README.md`

MiniMax API Details
- Endpoint: POST https://api.minimax.io/v1/embeddings
- Model: `embo-01` (1536 dimensions)
- Request: `{"model": "embo-01", "texts": ["text"], "type": "db"}`
- Response: `{"vectors": [[...]], "base_resp": {"status_code": 0}}`
- Type: `"db"` for storage, `"query"` for search queries

Test plan
- Unit tests (`cargo test embedding_providers`)
- Integration tests (require `MINIMAX_API_KEY`)

Greptile Summary
This PR adds MiniMax as a new embedding provider following the existing pattern for OpenAI, Gemini, and Azure OpenAI, including proper API key resolution, a custom request/response format (`texts` array + `type` field + `vectors` response), and handling of MiniMax's HTTP-200-for-errors behaviour via `base_resp.status_code`.

Key points:
- `parts.get(1).unwrap_or(&"embo-01")` on the model-parsing path (line 173) is unreachable: after `splitn(2, ':')`, `parts.get(1)` always returns `Some("")` when the user writes `"minimax:"`, so the intended default `"embo-01"` is never applied and an empty model string is forwarded to the API. The accompanying test documents this broken behaviour by asserting `model == ""`.
- No `embedding_type` validation: the `type` field forwarded to MiniMax is not checked against the two accepted values (`"db"`, `"query"`); invalid types will only fail at runtime with an opaque API error.
- The new integration tests use `#[tokio::test]` + `fetch_embedding_async`, while all existing integration tests use plain `#[test]` + the synchronous `fetch_embedding` wrapper.

Important Files Changed
- Provider module: contains an `unwrap_or(&"embo-01")` default that never fires, causing `"minimax:"` to silently send an empty model name to the API.
- Tests: the empty-model test asserts the broken behaviour (`""` instead of `"embo-01"`), and integration tests use `#[tokio::test]` inconsistently with the rest of the test suite.

Sequence Diagram

    sequenceDiagram
        participant User
        participant embed_macro as embed! macro
        participant Parser as parse_provider_and_model
        participant EmbeddingModelImpl
        participant MiniMaxAPI as MiniMax API (api.minimax.io)
        User->>embed_macro: embed!(text, "minimax:embo-01[:query]")
        embed_macro->>Parser: parse_provider_and_model(Some("minimax:embo-01"))
        Parser-->>EmbeddingModelImpl: (MiniMax { embedding_type: "db" }, "embo-01")
        embed_macro->>EmbeddingModelImpl: fetch_embedding(text)
        EmbeddingModelImpl->>MiniMaxAPI: POST /v1/embeddings\n{model, texts:[text], type:"db"}
        MiniMaxAPI-->>EmbeddingModelImpl: HTTP 200 {vectors:[[...]], base_resp:{status_code:0}}
        Note over EmbeddingModelImpl: Check base_resp.status_code != 0\n(MiniMax returns HTTP 200 even for errors)
        EmbeddingModelImpl-->>embed_macro: Vec<f64> (1536 dims)
        embed_macro-->>User: embedding vector

Reviews (1): Last reviewed commit: "feat: add MiniMax as embedding provider"