Skip to main content
Luminy is not limited to the built-in provider list — you can point it at any server that speaks the OpenAI Chat Completions API or the Anthropic Messages API. This means LM Studio, vLLM, llama.cpp with an HTTP server, Azure OpenAI, corporate API gateways, and any other compatible endpoint all work out of the box.

Endpoint types

Use the openai-compat provider for any server that implements the OpenAI Chat Completions API (POST /v1/chat/completions).Model ID prefix: openai-compat:Example model ID:
openai-compat:my-model-name
Common compatible servers:
ServerDefault base URLNotes
LM Studiohttp://localhost:1234/v1Enable the local server in LM Studio’s settings
vLLMhttp://localhost:8000/v1Specify --served-model-name when launching
Ollama (OpenAI mode)http://localhost:11434/v1Alternative to the native Ollama integration
Azure OpenAIhttps://<resource>.openai.azure.com/openai/deployments/<deployment>Requires deployment name as model
llama.cpp serverhttp://localhost:8080/v1Start with --port 8080

Configuring a custom endpoint

1

Open Settings

Click the gear icon or press ⌘, (macOS) / Ctrl+, (Windows/Linux).
2

Navigate to the custom endpoint section

Scroll to Custom Endpoints (or the specific provider section — OpenAI-Compatible or Anthropic-Compatible).
3

Enter the base URL

Paste your server’s base URL, for example:
http://localhost:1234/v1
Do not include the specific path (e.g., /chat/completions) — Luminy appends the correct path automatically.
4

Enter the model name

Type the model name exactly as your server expects it, for example:
llama-3.1-8b-instruct
This becomes the model ID after the prefix: openai-compat:llama-3.1-8b-instruct.
5

Enter an API key (if required)

Some servers require a bearer token or API key. Paste it into the API Key field. If your server has no authentication, you can leave this blank or enter any placeholder string — the field is optional.
6

Save and select your model

Click Save. The custom model appears in the chat composer’s model selector. Select it and start a session.

Example: LM Studio

LM Studio runs a local OpenAI-compatible server on your machine.
1

Enable the LM Studio server

Open LM Studio, load a model, go to the Local Server tab, and click Start Server. It listens on http://localhost:1234 by default.
2

Configure in Luminy

In Settings → OpenAI-Compatible, set:
  • Base URL: http://localhost:1234/v1
  • Model: the exact model name shown in LM Studio (e.g., Meta-Llama-3.1-8B-Instruct-Q4_K_M)
  • API Key: leave blank or enter any value
3

Select the model

Choose openai-compat:Meta-Llama-3.1-8B-Instruct-Q4_K_M (or your model name) from the selector in the chat composer.

Use cases

LM Studio

Run quantized GGUF models locally with a polished UI. Connect Luminy via the built-in OpenAI-compatible server.

Private vLLM deployments

Deploy vLLM on a GPU server and expose it behind a private URL. Configure the base URL and optional bearer token in Luminy.

Azure OpenAI

Use your Azure OpenAI deployment endpoint and API key. Enter the full deployment URL as the base URL.

Corporate API gateways

Many enterprises proxy AI APIs through internal gateways. If the gateway is OpenAI-compatible, Luminy connects directly.

Tool calling requirement

Luminy’s agentic loop — file edits, terminal commands, multi-step task execution — depends on the model’s tool/function calling capability. Not every model or server supports it.Before using a custom endpoint for agentic tasks, verify that:
  • The model you are serving supports tool/function calling.
  • The server correctly implements the tools parameter in the Chat Completions or Messages API.
If tool calling is unsupported, Luminy falls back to text-only responses and the agentic features will be unavailable for that model.