Local AI with Ollama: Private Model Inference in Luminy

Ollama lets you download and run open-source AI models entirely on your own hardware. Because inference happens locally, there are no API keys to manage, no usage costs, and no conversation data ever leaves your machine. Luminy detects your running Ollama instance automatically and lists all pulled models directly in the chat composer’s model selector.

Prerequisites

Before connecting Luminy to Ollama, you need to:

Install Ollama — download the installer from ollama.com and follow the platform-specific setup steps.
Pull at least one model — open a terminal and run ollama pull <model-name> (see recommended models below).

Setting up Ollama with Luminy

Install Ollama

Download and install Ollama from ollama.com. Once installed, Ollama starts a local server automatically at http://localhost:11434.

Pull a coding model

Open a terminal and pull a model. For everyday coding tasks, qwen2.5-coder:7b is the recommended starting point:

ollama pull qwen2.5-coder:7b

For a larger, more capable alternative:

ollama pull deepseek-coder-v2

Verify Ollama is running

Confirm that the Ollama server is running and that the model downloaded successfully:

ollama list

You should see your pulled models listed with their sizes.
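The same check can be made over HTTP, which is closer to what a client application does. The following is a minimal sketch, assuming the server is at the default address and using Ollama's /api/tags endpoint, which returns the pulled models as JSON. The helper names parse_tags and list_local_models are illustrative and are not part of Ollama or Luminy.

```python
import json
import urllib.request


def parse_tags(payload: dict) -> list[tuple[str, float]]:
    """Return (model name, size in GB) pairs from an /api/tags response body."""
    return [
        (model["name"], model.get("size", 0) / 1e9)
        for model in payload.get("models", [])
    ]


def list_local_models(base_url: str = "http://localhost:11434") -> list[tuple[str, float]]:
    """Fetch the pulled models from a running Ollama server.

    Raises urllib.error.URLError if the server cannot be reached.
    """
    url = f"{base_url.rstrip('/')}/api/tags"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_tags(json.load(resp))


# Example (requires a running Ollama server):
#   for name, size_gb in list_local_models():
#       print(f"{name}\t{size_gb:.1f} GB")
```

If the request fails with a connection error, the server is most likely not running. An empty list means the server is up but no model has been pulled yet.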

Open Luminy Settings

In Luminy, go to Settings → Ollama. The default endpoint is pre-filled as http://localhost:11434 — leave it unchanged unless you are running Ollama on a different host or port.

Select your model in the chat composer

Click the model selector dropdown in the chat input area. Your locally pulled models appear automatically — select one and start chatting. No restart required.

Model names and the Ollama prefix

Ollama models use their bare model name in Luminy — you do not need to type a provider prefix. For example, the model you pulled as qwen2.5-coder:7b appears as qwen2.5-coder:7b in the selector, not ollama:qwen2.5-coder:7b. Luminy detects which models are Ollama-hosted automatically by querying the local Ollama server at startup and whenever a new model is selected.

Custom Ollama endpoint

If you are running Ollama on a remote server (for example, a GPU workstation on your local network or a cloud VM), you can point Luminy at it:

Open Settings → Ollama

Navigate to the Ollama section in Luminy Settings.

Update the endpoint URL

Replace http://localhost:11434 with your remote server’s address, for example:

http://192.168.1.50:11434

Save

Click Save. Luminy will connect to the new endpoint and refresh the model list.

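Make sure the remote Ollama instance is reachable from your machine and that any firewall rules allow traffic on port 11434. Ollama listens only on 127.0.0.1 by default, so the remote host must also be configured to accept external connections, for example by setting the OLLAMA_HOST environment variable to 0.0.0.0 before starting the server.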
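Before saving a remote endpoint, it can help to confirm that the host and port accept connections at all. The sketch below is one way to do that with the Python standard library. It only tests TCP reachability and does not verify that the listener is actually Ollama. The function names are hypothetical, and the address in the usage comment is the example address from this page.

```python
import socket
from urllib.parse import urlsplit


def endpoint_host_port(url: str) -> tuple[str, int]:
    """Split an endpoint URL into (host, port).

    If the URL has no explicit port, fall back to the scheme default
    (443 for https, 80 otherwise), not to Ollama's 11434.
    """
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"not a valid endpoint URL: {url!r}")
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the endpoint succeeds."""
    host, port = endpoint_host_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


# Example:
#   is_reachable("http://192.168.1.50:11434")
```

A False result usually points to a firewall rule, a wrong address, or a server that is bound only to the loopback interface.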

Recommended models for coding

Model	Approx. size	Best for
`qwen2.5-coder:7b`	~4 GB	Fast, everyday coding assistance
`qwen2.5-coder:32b`	~20 GB	Complex reasoning and large refactors
`deepseek-coder-v2:16b`	~10 GB	Balanced speed and code quality
`codellama:13b`	~8 GB	Code completion and generation

Hardware requirements: plan on at least 8 GB of free RAM for 7B models, and 32 GB or more for 32B models. For the best experience, run models on a machine with a discrete GPU or Apple Silicon — CPU-only inference on large models can be slow.
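The RAM guidance above can be approximated with simple arithmetic: weight size is roughly parameter count times bits per weight divided by 8. The sketch below assumes about 4.5 bits per weight, which is a rough figure for the 4-bit quantized builds commonly distributed, and a flat 2 GB allowance for context and runtime overhead. Both numbers are assumptions for illustration and not measured values, so treat the result as an estimate only.

```python
def estimate_weights_gb(params_billions: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of the model weights in GB (10^9 bytes).

    bits_per_weight=4.5 is an assumed average for 4-bit quantized models.
    """
    return params_billions * bits_per_weight / 8


def fits_in_memory(
    params_billions: float,
    free_gb: float,
    bits_per_weight: float = 4.5,
    overhead_gb: float = 2.0,  # assumed allowance for context and runtime
) -> bool:
    """Return True if the estimated footprint fits in the given free memory."""
    needed = estimate_weights_gb(params_billions, bits_per_weight) + overhead_gb
    return needed <= free_gb


# Example:
#   estimate_weights_gb(7)    # about 3.9 GB of weights
#   estimate_weights_gb(32)   # 18.0 GB of weights
#   fits_in_memory(32, free_gb=16)   # False under these assumptions
```

Under these assumptions a 7B model fits comfortably in 8 GB of free memory and a 32B model needs well over 16 GB, which is in line with the sizes in the table.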

Tool calling support

Luminy’s agentic features (file edits, terminal commands, multi-step tasks) depend on the model’s ability to call tools. Not every Ollama model supports tool/function calling.

If you select a model that does not support tool calling, Luminy’s agentic loop will be limited to text-only responses. Stick to the qwen2.5-coder or deepseek-coder-v2 series for full agentic capability — both families have reliable tool-use support.
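Tool support varies by model and by Ollama version, so it can be worth checking a model before relying on it for agentic work. The sketch below assumes a recent Ollama release whose /api/show response includes a capabilities list. Older servers may omit that field, in which case this check returns False even though the answer is really unknown. The helper names are hypothetical, and this is not a description of how Luminy performs the check.

```python
import json
import urllib.request


def supports_tools(show_response: dict) -> bool:
    """True if an /api/show response advertises the 'tools' capability.

    Returns False when the 'capabilities' field is absent (older servers).
    """
    return "tools" in show_response.get("capabilities", [])


def fetch_model_info(model: str, base_url: str = "http://localhost:11434") -> dict:
    """POST to /api/show and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/show",
        data=json.dumps({"model": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)


# Example (requires a running Ollama server with the model pulled):
#   supports_tools(fetch_model_info("qwen2.5-coder:7b"))
```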
