Local AI with Ollama

A complete tour of the local-AI integration: what it is, why it exists, how the setup wizard works, how a chat request travels from your keyboard to the model and back, how the persona is built, how effect commands are emitted and gated, and how to troubleshoot when something goes wrong.

What is "Local AI"?
Why local instead of cloud?
Architecture at a glance
The setup wizard
The request lifecycle
How the persona is built
Enrichment & structured output
Effect commands
Safety & permissions
Persistent chat memory
Warm-up & lifecycle
Error handling
Picking a model
Privacy
Troubleshooting
Advanced: remote Ollama
Glossary

TL;DR

Local AI runs a language model on your computer via Ollama. Nothing leaves your machine, no request limits, no login. Trade-off: ~6.6 GB of disk and slower first response. The setup wizard handles install, download, and verification. Once on, every Companion feature works exactly as it does with cloud AI - because both providers implement the same internal interface.

What is "Local AI"?

The Companion (the floating avatar that talks to you, reacts to your screen, and can trigger effects) has two possible brains:

Cloud AI - the default. Your chat lines go to a small proxy server we run, which forwards them to a hosted language model. Comes with daily limits (100/day for free, 1000/day for supporters), no install required.
Local AI - you run a language model on your own computer using Ollama. Nothing leaves your machine. No request limits. No subscription required. You pay in disk space, RAM, and (optionally) GPU cycles instead.

Both brains implement the same internal interface (IAiService) so every feature that talks to the Companion - chat, screen awareness, video reactions, lock screen, keyword catches - works identically with either backend. You can switch between them in Companion → AI at any time without restarting the app.

Why local instead of cloud?

You'd choose local AI if any of these matter to you:

Privacy. Chat lines, screen context, persona - none of it travels over the internet. Ollama is installed under %LOCALAPPDATA%\Programs\Ollama and serves the model on http://localhost:11434. There is no AI-related network egress from the app once it's set up.
No daily limits. The cloud proxy caps requests to keep costs sane. Local AI is bounded only by how fast your hardware can run the model.
No login. Cloud AI needs an account; local AI does not.
Customization. Want a 70B model? A roleplay-tuned fine-tune? A totally different persona? Just ollama pull a different tag and point the app at it.
Offline use. Once installed, local AI works with the network unplugged.

The tradeoffs are real:

Disk. The default qwen3.5:latest model is about 6.6 GB. Bigger models can be 20-40 GB.
First response is slow. Cold-start (model loading from disk into RAM or VRAM) is ~30-60 seconds for an 8B-class model on CPU. The app warms the model on startup to hide this, but the very first request after a fresh install still takes a moment.
Response time depends on hardware. On a GPU, replies feel snappy (1-3 seconds). On a CPU-only laptop, a chat reply can take 10-20 seconds. Reasoning models (qwen3, deepseek-r1) are slower still, which is why the app sends think:false to disable their internal reasoning phase by default.

Architecture at a glance

Everything in the Companion's brain lives under Services/AIService/. The shape:

IAiService                <- interface every provider implements
├── AiService             <- cloud-proxy provider (default)
└── LocalAiService        <- local-Ollama provider
AiServiceStrategy         <- routes calls to whichever provider the user picked
OllamaSetupService        <- detect / download / install / pull / smoke-test
LocalAiSetupWizard        <- the onboarding window the user sees
AiResponseParser          <- extracts text + effect commands from model output
Enrichment/
├── PromptService         <- builds the JSON-output instructions
└── KnowledgeService      <- loads assets/knowledge.json into the enrichment block
AiCommandService          <- gates and dispatches effect commands the AI emits

AiServiceStrategy is what the rest of the app talks to. It checks CompanionPrompt.UseLocalAi on every call and lazily constructs whichever provider is active. Switching providers at runtime is free - no restart, no re-init.

The cloud provider is stateless: every request includes the full system prompt and the user line; the proxy holds no state. The local provider holds a persistent chat history in memory and on disk so the Companion can remember you across sessions.
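
The routing idea can be sketched in a few lines of Python. This is an illustration only, not the app's code: the class bodies are hypothetical stand-ins, and `Settings.use_local_ai` merely mirrors the role of CompanionPrompt.UseLocalAi.

```python
from typing import Optional, Protocol


class AiProvider(Protocol):
    """Stand-in for IAiService: the one method every provider offers."""
    def reply(self, text: str) -> str: ...


class CloudProvider:
    def reply(self, text: str) -> str:
        return f"[cloud] {text}"   # placeholder for the proxy call


class LocalProvider:
    def reply(self, text: str) -> str:
        return f"[local] {text}"   # placeholder for the Ollama call


class Settings:
    use_local_ai = False           # hypothetical mirror of UseLocalAi


class Strategy:
    """Checks the setting on every call and builds providers lazily."""

    def __init__(self, settings: Settings) -> None:
        self._settings = settings
        self._cloud: Optional[CloudProvider] = None
        self._local: Optional[LocalProvider] = None

    def _active(self) -> AiProvider:
        if self._settings.use_local_ai:
            if self._local is None:
                self._local = LocalProvider()
            return self._local
        if self._cloud is None:
            self._cloud = CloudProvider()
        return self._cloud

    def reply(self, text: str) -> str:
        return self._active().reply(text)
```

Because the flag is read on each call, flipping it between two calls is enough to change which provider answers; nothing has to be torn down or restarted.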

The setup wizard, step by step

Opening Companion → Use Local AI for the first time launches the Local AI Setup Wizard. It's a single window that walks through every step needed to go from zero to a working local model.

1 Detect

Before showing anything, the wizard probes the machine: looks for ollama.exe under %LOCALAPPDATA%\Programs\Ollama\ and calls GET /api/tags on localhost:11434 with a 4-second timeout. Based on what it finds, it picks one of four entry points:

State	What it means	What happens next
Ready	Ollama is running, target model is pulled	Skip to smoke test
RunningNoModel	Ollama is running but the target model isn't there	Skip to pull
InstalledNotRunning	Ollama is installed but the HTTP server isn't up	Start `ollama serve` headlessly, then continue
NotInstalled	No Ollama at all	Show the consent screen
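
The table above boils down to a small decision function. A minimal Python sketch, assuming the probe has already produced two inputs: whether ollama.exe was found, and the parsed GET /api/tags body (or None if the server did not answer within the timeout). The function and enum names are hypothetical.

```python
from enum import Enum
from typing import Optional


class OllamaState(Enum):
    READY = "Ready"
    RUNNING_NO_MODEL = "RunningNoModel"
    INSTALLED_NOT_RUNNING = "InstalledNotRunning"
    NOT_INSTALLED = "NotInstalled"


def classify(exe_found: bool, tags: Optional[dict], target_model: str) -> OllamaState:
    """Pick the wizard entry point from the two probe results."""
    if tags is not None:
        # The server answered, so Ollama is running; check the model list.
        names = {m.get("name") for m in tags.get("models", [])}
        if target_model in names:
            return OllamaState.READY
        return OllamaState.RUNNING_NO_MODEL
    if exe_found:
        return OllamaState.INSTALLED_NOT_RUNNING
    return OllamaState.NOT_INSTALLED
```

Note that a responding server wins over the file check: if /api/tags answers, it does not matter where the binary lives.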

2 Consent

If Ollama isn't installed, the wizard asks for explicit consent before touching disk. It tells you Ollama is about to be downloaded from the official installer URL, that the default model is about 6.6 GB, and that you can change the model under "Advanced." There's also a "manual install" link that opens ollama.com/download for users who'd rather install it themselves.

3 Download installer

The wizard streams OllamaSetup.exe from ollama.com/download/OllamaSetup.exe to your %TEMP% folder. Progress is reported every 200 ms with a live rate (e.g. 240.5 MB / 700.0 MB (34%) · 18.3 MB/s). If you cancel mid-download, the partial file is cleaned up.
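
For reference, a progress line in that shape can be produced like this. A small Python sketch; the function name is hypothetical, and it assumes binary megabytes, which the wizard may or may not use.

```python
MB = 1024 * 1024  # assumption: the wizard could use decimal megabytes instead


def format_progress(done: int, total: int, bytes_per_sec: float) -> str:
    """Render 'done / total (percent) · rate' for a download in progress."""
    percent = done * 100 // total if total else 0
    return (f"{done / MB:.1f} MB / {total / MB:.1f} MB "
            f"({percent}%) · {bytes_per_sec / MB:.1f} MB/s")
```

Calling it with 240.5 MB done out of 700 MB at 18.3 MB/s yields the example string shown above.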

4 Silent install

The downloaded installer is launched with the NSIS silent flag (/S). No installer UI, no progress bar from Ollama itself - just a hidden process that puts Ollama under %LOCALAPPDATA%\Programs\Ollama\. After the process exits with code 0, the wizard polls /api/tags for up to 60 seconds waiting for the service to come up.

Two safety choices: the cancel button is disabled during install (Ollama's NSIS installer doesn't roll back cleanly if interrupted), and if the post-install auto-start doesn't bind port 11434 within 60 seconds, the wizard spawns ollama.exe serve itself with a hidden window. It deliberately avoids ollama app.exe because that's the GUI chat client in newer Ollama versions and would pop a window.

The wizard tracks any ollama serve process it spawns and terminates it on app exit. Servers started by the Ollama tray app or the installer's own auto-start are left alone.

On success, the OllamaSetup.exe in %TEMP% is deleted. On failure it's intentionally kept so you (or a retry) can inspect it without re-downloading 700 MB.

5 Pull the model

The wizard streams POST /api/pull with stream:true. Ollama sends back NDJSON: one JSON object per line, each reporting the current status of the pull. Lines for a layer download include a status, a digest, and completed/total byte counts, so the wizard can show real progress per layer.

HTTP client uses an infinite timeout - a 6.6 GB pull can exceed any reasonable per-request limit, and the NDJSON output is the heartbeat.
Ollama caches partial layers, so if you cancel and re-run, the pull picks up where it left off.
Errors come back as {"error":"..."} in the stream (usually for unknown model names) and the wizard surfaces them verbatim.
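
Reading that stream is mostly a matter of decoding one line at a time. A minimal Python sketch of the idea, working on any iterable of lines so it can be fed from an HTTP response or from canned test data. `PullError` and `pull_progress` are hypothetical names.

```python
import json
from typing import Iterable, Iterator


class PullError(RuntimeError):
    """Raised when the stream contains an {"error": ...} object."""


def pull_progress(lines: Iterable[str]) -> Iterator[str]:
    """Turn /api/pull NDJSON lines into human-readable progress strings."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue                              # skip blank keep-alive lines
        event = json.loads(raw)
        if "error" in event:
            raise PullError(event["error"])       # surfaced verbatim
        status = event.get("status", "")
        total = event.get("total")
        if total:                                 # a layer download line
            done = event.get("completed", 0)
            yield f"{status}: {done * 100 // total}%"
        else:                                     # e.g. manifest or success
            yield status
```

Lines without byte counts are passed through as plain status text, so the caller always has something to display.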

6 Smoke test

One tiny request to confirm the wiring:

POST /api/chat
{
  "model": "qwen3.5:latest",
  "messages": [{"role": "user", "content": "Say hi in one word."}],
  "stream": false,
  "think": false
}

If a message.content comes back, the wizard records the elapsed time and declares success. This both warms the model into RAM (so your first real chat is fast) and proves end-to-end that everything's wired up.
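
The same check is easy to reproduce by hand. A Python sketch using only the standard library; the helper names are hypothetical, and the 300-second timeout is an assumption borrowed from the client timeout mentioned elsewhere on this page.

```python
import json
import time
import urllib.request


def smoke_test_payload(model: str) -> dict:
    """Build the tiny non-streaming chat request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": "Say hi in one word."}],
        "stream": False,
        "think": False,
    }


def extract_content(body: dict) -> str:
    """Return message.content from a non-streaming /api/chat reply, or ''."""
    return (body.get("message") or {}).get("content") or ""


def run_smoke_test(model: str, host: str = "http://localhost:11434") -> float:
    """Send the request; return elapsed seconds or raise if no content came back."""
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(smoke_test_payload(model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    started = time.monotonic()
    with urllib.request.urlopen(req, timeout=300) as resp:
        body = json.load(resp)
    if not extract_content(body):
        raise RuntimeError("smoke test returned no message.content")
    return time.monotonic() - started
```

`run_smoke_test` needs a running Ollama to do anything useful; the two helpers above it are pure and can be exercised without one.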

7 Done

On success, the wizard writes two settings and saves:

CompanionPrompt.UseLocalAi = true
CompanionPrompt.AiModel = <whatever you picked>

From this point on, every Companion call routes through the local provider. If any step fails instead, the error screen offers a Retry button that re-runs detection from the top, because the right next step after a failure depends entirely on what state Ollama is now in.

The request lifecycle

What happens when you type a message into the Companion chat and hit enter, assuming local AI is selected. (Awareness reactions, video-done hints, lock-screen comments, and keyword catches all follow the same path with different system prompts and inputs.)

user input
   |
AiServiceStrategy.GetBambiReplyAsync(text, isUser:true)
   |
LocalAiService.GetAiResponseAsync
   |
build/refresh system prompt   <- BambiSprite.GetSystemPrompt()
inject enrichment block       <- PromptService.BuildEnrichmentMessage()
append user message
   |
POST http://localhost:11434/api/chat
{
  "model": "qwen3.5:latest",
  "messages": [system, enrichment?, ...history..., user],
  "stream": false,
  "think": false
}
   |
Ollama loads the model (cached after first call), generates tokens
   |
response body -> ExtractContent()
   |
AiResponseParser.Parse(content)
├── extract "response" text   -> goes to the speech bubble
└── extract "effects" array   -> AiCommandService dispatches each
   |
append assistant turn to history, persist to disk async
   |
return clean text -> avatar speech bubble

Concurrency control

A semaphore guarantees one in-flight request at a time. Behavior depends on who's asking:

User-triggered request while busy - drops the new call but returns a random "still thinking" phrase (e.g. "Bambi's thinking real hard right now...") so you don't get silence. Mods can supply their own thinking phrases.
Automated request while busy (awareness, video-done, etc.) - drops silently. Better to skip a passive reaction than queue them up and have the Companion fire stale comments seconds later.

Prompt freshness

The system message at index 0 is rebuilt on every call. Changes to your persona, knowledge base, mods, or content mode take effect immediately - no need to restart or clear history.

History rollback on failure

If the request errors or returns empty content, the just-appended user turn is popped off so it doesn't poison future requests with an unanswered turn.
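
Prompt freshness and rollback fit together naturally if the stored history holds only user and assistant turns. A Python sketch of one chat turn under that assumption; `chat_turn`, `build_system_prompt`, and `send` are hypothetical names, and the enrichment message is left out for brevity.

```python
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]


def chat_turn(history: List[Message], user_text: str,
              build_system_prompt: Callable[[], str],
              send: Callable[[List[Message]], str]) -> Optional[str]:
    """Send one user turn; history is only extended when a reply arrives."""
    history.append({"role": "user", "content": user_text})
    # The system message is rebuilt on every call and never stored.
    messages = [{"role": "system", "content": build_system_prompt()}] + history
    try:
        reply = send(messages)
    except Exception:
        reply = ""                 # treat transport errors like empty content
    if not reply:
        history.pop()              # roll back the unanswered user turn
        return None
    history.append({"role": "assistant", "content": reply})
    return reply
```

After a failed call the history is exactly what it was before, so the next request does not carry a dangling user message.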

The `think:false` flag

Reasoning models (qwen3, deepseek-r1, and their relatives) have an internal "thinking" phase where they output long chains of reasoning before the actual answer. For roleplay chat this adds 30-50 seconds of latency for no benefit. think:false cuts that out. Non-reasoning models ignore the flag.

How the persona is built

The system prompt sent to the model is assembled by BambiSprite from several layers. This is shared between the cloud and local providers - both end up sending the same system prompt structure. From outer to inner:

Persona block. The "Bad Influence Bestie" character description: tone (casual texting), topics (makeup, pink things, empty heads), role (tempt the user into being blank). If Slut Mode is on and the current personality preset defines a Slut Mode variant, that variant replaces the default - same character, spicier vibe.
Explicit reaction rules. How the Companion reacts when the user mentions explicit topics: flustered redirect rather than full roleplay. Can be overridden per-personality.
Knowledge base. Lists of audio playlists and videos the Companion is allowed to recommend, with strict instructions to use exact titles (otherwise the app can't auto-link them). For BambiCloud playlists, the AI is told to wrap titles in markdown link syntax with the exact URL.
Global knowledge base links. Anything the user added to the Knowledge Base Links list - extra videos, custom content packs, the user's own files.
HypnoTube link pool. If the user configured their own pool, those video names are appended. Names are resolved against the known-links map so the auto-linker can wrap them as clickable URLs.
Screen awareness rules. How to react to different categories (work, social, shopping, streaming, hypno content, idle). The Companion sees context as [Category: X | App: Y | Title: Z | Duration: N] and is expected to react appropriately.
Output rules. Length cap (typically ~15 words), emoji cap (1 per message), no bracket tags in the visible reply.
Quiz context (if you've taken the in-app quiz). The Companion sees your archetype and a short profile snippet, with instructions to reference it naturally ~20% of the time.
Mod-aware substitutions. If you're using a mod that renames the user ("Bambi" -> "Unit" for Drone mod, or your chosen term for Sissy Hypno mode), the entire prompt is run through a substitution pass.

Every layer can be customized independently. You can write a totally different persona while keeping the knowledge base intact, or vice versa.

The enrichment block and structured output

When "Allow AI to control effects" is on, the local provider inserts an extra context message right after the system prompt. This is the enrichment block. It's sent as a user-role message but clearly marked [CONTEXT BLOCK - NOT DIALOGUE] so the model treats it as operating instructions rather than something to reply to.

Forces structured JSON output

{ "response": "<your in-character text reply>", "effects": [ <zero or more effect commands> ] }

The block explicitly tells the model that any earlier persona instruction saying "no brackets" or "respond only with text" is overridden by this format. Many community personality presets include strict "no JSON, no tags, just text" rules, which would otherwise conflict with the effect-emission format. The override resolves the conflict in favor of the structured output, and the response field carries the plain-text reply the user actually sees.

Tells the model when to fire effects

The block lists supported commands and gives concrete examples of phrases that should trigger them:

User says	Effect to emit
"flash me" / "make me see flashes"	`flash_image`
"spawn bubbles" / "start bubbles"	`bubbles` (on)
"stop bubbles"	`bubbles` (off)
"subliminal X" / "flash the word X"	`subliminal`
"spiral" / "show me a spiral"	`spiral`
"pink filter" / "make my screen pink"	`pink`
"lock card" / "lock me with the mantra X"	`mantra_lockscreen`
"vibrate" / "buzz me" / "haptic"	`haptic`
"play X"	`audio` or `video`

Crucially, the block also says: when the user is just chatting, leave effects empty. Don't fire unprovoked. Combined with the per-effect permission gates, this is what keeps the AI from spam-firing flashes at you during normal conversation.

Provides live context

Two final blocks appear in the enrichment:

<time>2026-05-15 Friday 2:47:32 PM</time>
<data>[ ... knowledge.json contents as JSON ... ]</data>

The timestamp gives the model a sense of "now." The data block is the contents of assets/knowledge.json - a flat list of static facts the Companion is allowed to know (terminology, names, lore). If the file is missing, the data block is just [].
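
A Python sketch of how those two blocks could be assembled. The function names are hypothetical, and the timestamp layout is inferred from the example above rather than taken from the app's code.

```python
import json
from datetime import datetime
from pathlib import Path


def format_now(now: datetime) -> str:
    """Date, weekday name, then a 12-hour clock without a leading zero."""
    hour12 = now.hour % 12 or 12
    suffix = "AM" if now.hour < 12 else "PM"
    return f"{now:%Y-%m-%d %A} {hour12}:{now:%M:%S} {suffix}"


def load_knowledge(path: Path) -> list:
    """Return the knowledge list, or [] when the file is missing."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return []


def context_tail(now: datetime, knowledge: list) -> str:
    """Render the <time> and <data> blocks that close the enrichment."""
    return (f"<time>{format_now(now)}</time>\n"
            f"<data>{json.dumps(knowledge)}</data>")
```

Passing the clock in as an argument, rather than calling `datetime.now()` inside, keeps the formatting testable.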

Sets reply etiquette

Keep response short (the persona's word limit still applies).
Don't echo the user's request word-for-word.
When you DO trigger an effect, briefly acknowledge it ("Flashing for you, hot stuff~").
Don't trigger video unprompted - videos are disruptive.

Effects off? No enrichment block

When the master "Allow AI to control effects" toggle is off, the entire enrichment block is removed from the conversation. The model goes back to producing plain-text replies, no JSON wrapping. The parser falls back to treating any incidental JSON as garbage and stripping it out.

Effect commands: letting the AI control the app

The Companion can trigger 11 distinct effect types (plus a none no-op the parser ignores):

Command	What it does	Parameters
flash_image	Flash random images on-screen.	Amount, Duration, Size, Opacity
bubbles	Start/stop the bubble-popping minigame.	On, Frequency
subliminal	Show subliminal text.	Text, Opacity
mantra_lockscreen	Make the user chant a mantra.	Mantra, Amount
spiral	Spinning spiral overlay.	On, Intensity
pink	Pink color overlay.	On, Intensity
bounce	Bouncing text overlay.	(none listed)
haptic	Vibrate a connected toy.	Intensity (0-1), Duration
audio	Play an audio file.	Title, Path, Random
video	Play a video file.	Title, Path, Random
getbacktome	Schedule a follow-up after a delay.	Delay (seconds)

Tolerant parsing

The parsing pipeline is intentionally tolerant. Local models love to wrap JSON in markdown fences, mix prose and JSON, leave trailing commas, or close braces incorrectly:

If the response is wrapped in a ```json ... ``` fence, the fence is stripped.
If the response is pure JSON with a response field, it's parsed directly.
Otherwise the parser scans the text for {...} blocks, tries each, replaces any with a response field by their content (so JSON becomes prose), and collects any effects arrays.
A repair pass handles trailing commas, mismatched braces, and unquoted keys before parsing.
A sanitizer strips any leftover [Category: ...] or [Mode/Tag] tags the model copied from the input.
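
A much-simplified Python sketch of the first four steps. It is not the app's AiResponseParser: the repair pass here only handles trailing commas, the brace scan is naive (a brace inside a string value would confuse it), and `parse_reply` is a hypothetical name.

```python
import json
import re
from typing import List, Tuple

FENCE = re.compile(r"^```(?:json)?\s*(.*?)\s*```$", re.DOTALL)
TRAILING_COMMA = re.compile(r",\s*([}\]])")


def _try_load(text: str):
    """Parse JSON after dropping trailing commas; return None on failure."""
    try:
        return json.loads(TRAILING_COMMA.sub(r"\1", text))
    except json.JSONDecodeError:
        return None


def _brace_blocks(text: str) -> List[Tuple[int, int]]:
    """Find top-level {...} spans by simple depth counting."""
    spans, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth:
            depth -= 1
            if depth == 0:
                spans.append((start, i + 1))
    return spans


def parse_reply(raw: str) -> Tuple[str, list]:
    """Return (visible text, effects) from raw model output."""
    text = raw.strip()
    fenced = FENCE.match(text)
    if fenced:
        text = fenced.group(1)                    # strip the markdown fence
    whole = _try_load(text)
    if isinstance(whole, dict) and "response" in whole:
        return str(whole["response"]), list(whole.get("effects") or [])
    effects: list = []
    out, cursor = [], 0
    for start, end in _brace_blocks(text):        # mixed prose and JSON
        obj = _try_load(text[start:end])
        out.append(text[cursor:start])
        if isinstance(obj, dict):
            out.append(str(obj.get("response", "")))
            effects.extend(obj.get("effects") or [])
        cursor = end                              # unparseable blocks are dropped
    out.append(text[cursor:])
    return " ".join("".join(out).split()), effects
```

Plain prose passes through untouched, which is what makes the same function usable when the enrichment block is off and the model is not asked for JSON at all.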

Dispatch path

Each parsed command goes through AiCommandService.ExecuteCommand:

Validate against settings (master toggle + per-effect gate).
Enforce a per-response cap (max 3 commands per AI reply).
Append a human-readable line to the AI Brain → Live actions feed on the Companion tab.
Build and run the concrete command via the command factory.

The 3-commands-per-reply cap is hard. If the model emits five flash effects in one response (which happens with some models), only the first three execute.
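
The gate-then-cap order can be sketched as follows in Python. Several details are assumptions made for illustration: the "type" key on a command, the mapping from effect name to setting name, and the choice to count only executed commands toward the cap.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MAX_COMMANDS_PER_REPLY = 3

# Assumed mapping from effect name to its per-effect setting.
GATE_FOR_EFFECT = {
    "flash_image": "AllowAiFlash",
    "bubbles": "AllowAiBubbles",
    "subliminal": "AllowAiSubliminal",
    "spiral": "AllowAiOverlay",
    "pink": "AllowAiOverlay",
}


@dataclass
class Dispatcher:
    settings: Dict[str, bool]
    live_actions: List[str] = field(default_factory=list)

    def dispatch(self, commands: List[dict]) -> List[dict]:
        """Return the commands that pass every gate, in order."""
        executed: List[dict] = []
        if not self.settings.get("AllowAiToControlEffects", False):
            return executed                       # master toggle off
        for cmd in commands:
            if len(executed) >= MAX_COMMANDS_PER_REPLY:
                break                             # hard cap per reply
            gate = GATE_FOR_EFFECT.get(cmd.get("type", ""))
            if gate is None or not self.settings.get(gate, False):
                continue                          # unknown or disabled effect
            self.live_actions.append(f"AI fired {cmd['type']}")
            executed.append(cmd)
        return executed
```

A missing setting is treated as off, so an effect that nobody has explicitly enabled never runs.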

Safety: permissions, caps, and the master toggle

The defaults are conservative. Even after you turn on local AI, the Companion can't fire effects until you explicitly enable them.

Setting	Default	Notes
AllowAiToControlEffects	OFF	Master toggle. When off, no effect fires regardless of per-effect settings, and the enrichment block isn't even sent.
AllowAiBubbles	ON	Visual, passive.
AllowAiSubliminal	ON	Visual, passive.
AllowAiBounce	ON	Visual, passive.
AllowAiFlash	OFF	Intrusive.
AllowAiVideo	OFF	Disruptive.
AllowAiAudio	OFF	Disruptive.
AllowAiOverlay	OFF	Covers spiral + pink.
AllowAiLockCard	OFF	Intrusive.
AllowAiHaptic	OFF	Hardware. Plus a MaxAiHapticIntensity ceiling (default 0.6) regardless of AI-emitted value.
AllowAiGetBackToMe	OFF	Recursive (schedules another AI call).

The dispatcher checks, in order: master toggle, per-effect toggle, batch cap (3 per reply), and finally per-command field clamps applied at execution time. With these four layers, even a misbehaving or jailbroken model can't do something destructive - at worst it spams logs with rejected commands.
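
A field clamp is the simplest of those layers. A Python sketch for the haptic case, assuming the ceiling comes from MaxAiHapticIntensity; the function name is hypothetical.

```python
def clamp_haptic(requested: float, ceiling: float = 0.6) -> float:
    """Force an AI-requested intensity into the range [0, ceiling]."""
    return max(0.0, min(float(requested), ceiling))
```

Whatever number the model emits, the value handed to the hardware stays between zero and the user's ceiling.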

Persistent chat memory

The local provider remembers your conversation across app launches. This is one of the key differences from the cloud provider, which is stateless by design.

After every successful exchange, the user/assistant pair is appended to in-memory history.
An async write fires to flush the dialogue to %APPDATA%\ConditioningControlPanel\local_chat_history.json. Disk I/O is off the response path, so chat latency is unaffected.
On next launch, the history is read back. The system prompt and enrichment block are NOT persisted - they're rebuilt fresh on every call so prompt edits take effect immediately.

The persisted file is capped at 50 pairs (100 messages). When the cap is exceeded, the oldest pairs are dropped first. This keeps the file under ~200 KB in practice and bounds the context the model has to chew through.
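
Trimming to the newest pairs is a one-line slice if the history strictly alternates user and assistant turns. A Python sketch under that assumption; the on-disk layout (a flat JSON list of messages) and the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List

MAX_PAIRS = 50


def trim_history(history: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the newest MAX_PAIRS user/assistant pairs; oldest go first."""
    return history[-2 * MAX_PAIRS:]


def save_history(history: List[Dict[str, str]], path: Path) -> None:
    """Write the trimmed dialogue; system and enrichment messages are never stored."""
    path.write_text(json.dumps(trim_history(history)), encoding="utf-8")


def load_history(path: Path) -> List[Dict[str, str]]:
    """Read the dialogue back, or start empty if the file is missing or corrupt."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return []
```

Starting empty on a corrupt file is a deliberate choice in this sketch: losing chat memory is less harmful than failing to launch.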

You can turn this off in Companion → AI by unchecking "Remember chat between sessions" - that flips ChatMemoryEnabled to false and the provider stops both reading and writing the file. Clearing memory is also available; it deletes both the in-memory history and the on-disk file.

Warm-up, lifecycle, and shutdown

Warm-up on startup

At app startup, right after the AI strategy is constructed, a fire-and-forget warm-up sends POST /api/generate with the configured model, keep_alive=30m, and no prompt. Ollama interprets a prompt-less request as "load the model into memory but don't generate anything." The keep_alive value asks the model to stay resident longer than the default 5 minutes - without this, the model would unload after 5 minutes of inactivity and the next chat would pay the cold-start cost again.

Warm-up is silent on failure. If Ollama isn't running yet, it just logs and moves on - the next real chat will surface a clear error.
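
A Python sketch of such a warm-up call, assuming it is a /api/generate request that names the model and a keep_alive but carries no prompt. The function names and the 300-second timeout are illustrative choices, not the app's.

```python
import json
import urllib.error
import urllib.request


def build_warmup_request(model: str,
                         host: str = "http://localhost:11434") -> urllib.request.Request:
    """Build the preload request; it has no 'prompt' key on purpose."""
    body = {"model": model, "keep_alive": "30m"}
    return urllib.request.Request(
        f"{host.rstrip('/')}/api/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def warm_up(model: str) -> bool:
    """Best effort: return False instead of raising when Ollama is down."""
    try:
        with urllib.request.urlopen(build_warmup_request(model), timeout=300):
            return True
    except (urllib.error.URLError, OSError):
        return False
```

In a real app the call would run on a background thread or task so that startup never waits on the model load.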

Shutdown

If the wizard spawned ollama serve itself (because the post-install auto-start didn't fire), that process is tracked. On app exit, only that process is killed. Servers started by the Ollama tray app or the installer's own auto-start are left running - they belong to the user, not to us.

Host changes

If you change the Ollama host while the app is running (say, to point at a remote machine), the host check on every request notices the change, disposes the old HTTP client, and rebuilds one against the new base address. No restart needed.

Error handling and fallbacks

Local model failures look very different from cloud failures, so the local provider has dedicated error-to-text mapping:

Symptom	What you see	What it means
Connection refused	Can't reach Ollama at ... - looks like it isn't running. Start Ollama, or install it from ollama.com	HTTP server isn't bound. Ollama crashed or never started.
DNS failure	Can't reach Ollama host ... - check the host setting in Companion → AI	Wrong host name, almost always a typo in a remote-host config.
Timeout	Ollama took too long to respond. The first request after launch can take ~30-60s as the model loads - try once more.	Model was cold and didn't finish loading inside the 5-minute client timeout, or you picked a huge model.
404 / model not found	Ollama: model 'X' not found - check 'ollama list' or pull it	Settings point at a tag you don't have pulled.
Generic HTTP error	Ollama HTTP NNN: ...	Surfaces the structured `error` field from Ollama if present.

If a request returns 200 but with empty content (rare, seen with some models under heavy load), the user gets the mode-appropriate fallback line and the user turn is rolled back from history.
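
The mapping in the table can be sketched with Python's standard exception types. The message texts are abbreviated from the table, `describe_failure` is a hypothetical name, and the exception classes are Python's, not the ones the app sees.

```python
import socket
import urllib.error


def describe_failure(exc: Exception, host: str, model: str) -> str:
    """Map a transport or HTTP failure to user-facing text."""
    # HTTPError is a subclass of URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code == 404:
            return f"Ollama: model '{model}' not found - check 'ollama list' or pull it"
        return f"Ollama HTTP {exc.code}: {exc.reason}"
    # urllib wraps socket-level errors in URLError; unwrap to see the cause.
    cause = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    if isinstance(cause, socket.gaierror):
        return f"Can't reach Ollama host {host} - check the host setting in Companion → AI"
    if isinstance(cause, ConnectionRefusedError):
        return f"Can't reach Ollama at {host} - looks like it isn't running."
    if isinstance(cause, (TimeoutError, socket.timeout)):
        return "Ollama took too long to respond."
    return f"Ollama error: {cause}"
```

Keeping the classification in one pure function makes it easy to check every row of the table without a server to fail against.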

Picking a model

The default is qwen3.5:latest. It's a good fit because:

~6.6 GB (fits in most consumer setups).
Reasoning model - so it can follow the structured-output instructions reliably - but we send think:false to skip the slow reasoning phase during chat.
Strong on instruction-following and JSON output, which matters for the effect-command flow.

That said, the provider is model-agnostic. Anything you can pull through Ollama and chat with via /api/chat should work. To switch:

Pull the new tag manually: ollama pull mistral-nemo:latest.
Open Companion → AI and either re-run the setup wizard with an advanced model name, or edit the value in settings directly.
The strategy notices the change on the next chat - no restart needed.

Rough guidance

Tier	Examples	Notes
3B-12B params	qwen3.5, llama3.1:8b, mistral-nemo, gemma2:9b	~5-8 GB on disk. Best chat latency on consumer hardware. Start here.
13B-22B params	mistral-small	~10-14 GB. Noticeably better prose, much slower without a GPU.
30B+	large mixture-of-experts models	Real GPU with 24+ GB VRAM strongly recommended. Brutal warm-up.
Reasoning	qwen3, deepseek-r1	Work fine - we send `think:false` to keep latency reasonable.
Uncensored	dolphin, hermes, abliterated	Useful if you find the default too prudish about explicit roleplay.

List what's installed with ollama list, or visit http://localhost:11434/api/tags in a browser.

Privacy

Once local AI is set up, the only AI-related network traffic from the app is:

Ollama's own model downloads (only when you run ollama pull or use the wizard's pull step) - go directly to Ollama's CDN.
The Ollama installer download (once, from ollama.com) - only during the wizard's install step.

After that, every chat request goes to http://localhost:11434. The model itself runs entirely on your machine. The Companion's chat history is stored in %APPDATA%\ConditioningControlPanel\local_chat_history.json in plain JSON - readable by anything that can open a text file. If that matters, turn off "Remember chat between sessions" or use full-disk encryption.

The cloud provider, by contrast, sends each chat line + your system prompt + your screen-awareness context to the proxy, which forwards to a hosted model. We log request counts and basic auth state but do not log chat content. See the privacy policy for the full breakdown.

Troubleshooting

"Can't reach Ollama at http://localhost:11434/"

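Ollama isn't running. Start the Ollama tray app (Start menu → Ollama), or run ollama serve from a terminal. Then open http://localhost:11434 in a browser - you should see "Ollama is running." If you don't, Ollama isn't actually up. If ollama serve fails with "address already in use," another process (usually an Ollama instance that is already running) already holds port 11434. If the server is up but still unreachable, check Windows firewall.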

"Ollama took too long to respond"

The model loaded for the first time and exceeded the 5-minute timeout. Wait and try again, or pick a smaller model. ollama ps from a terminal shows what's loaded.

"model 'X' not found"

The tag in your settings isn't pulled. ollama pull X from a terminal, then try again. Or re-run the setup wizard.

The Companion replies with JSON or curly braces

You're seeing raw model output the parser couldn't clean. Usually means the model isn't producing the expected {response, effects} shape - try a different model. Small models (~1B) sometimes can't follow structured-output instructions reliably. Or you have a custom personality preset that aggressively forbids structured output; the enrichment block is supposed to override this but some models miss it. Either edit the preset or turn off "Allow AI to control effects" entirely.

Effects fire even though I told it not to

Check three places: the master toggle, the per-effect toggles, and the "Live actions" feed on the Companion tab (which shows what actually fired in the last 30 actions). If you see commands in the feed for effects you've disabled, file an issue with a copy of your logs/crash.log.

The AI is repetitive / boring

Chat history is the usual culprit. Try "Clear chat memory" from the Companion tab to wipe both in-memory and on-disk history. If a long conversation has driven the model into a rut, a fresh start often helps.

Effects feel laggy

Local model latency is real. A chat reply that triggers a bubble effect on a CPU-only laptop takes the chat latency (roughly 10-20 seconds) plus the effect dispatch (~50 ms). On a GPU, the chat call drops to 1-3 seconds and the lag becomes imperceptible. If you have a CUDA-capable GPU and Ollama isn't using it, check ollama ps - if the model shows "100% CPU," Ollama hasn't detected the GPU. Update your NVIDIA drivers, then reinstall Ollama.

Advanced: pointing at a remote Ollama

The Ollama host setting accepts any URL. If you have a beefier machine on your LAN (or a remote server you trust), point the app at its Ollama instead:

On the server: start Ollama with OLLAMA_HOST=0.0.0.0 so it binds to all interfaces. By default Ollama only listens on localhost.
Make sure the model you want is pulled on that machine.
In the app: edit Companion → AI → Ollama Host to http://your-server:11434/.
The strategy notices the change on the next request, rebuilds its HTTP client, and you're done.

Caveats

Ollama has no authentication - don't expose it to the public internet without a reverse proxy and auth in front of it. Network latency is added to every chat call (negligible on a LAN, dominant over WAN). The default 5-minute client timeout still applies; very slow remotes may need a smaller model or a closer server.

Glossary

Ollama	Local-model runner from ollama.com. Installs as a background HTTP server (port 11434), pulls models from a registry, and serves them via a chat-completion API.
Cloud AI / proxy	Our hosted service that forwards requests to a hosted model. The default option; needs a free account.
Local AI	Ollama running on your machine, used as a drop-in replacement for the cloud proxy.
Model / tag	A specific weight file Ollama can serve, named like `qwen3.5:latest` or `mistral-nemo:12b-instruct`.
System prompt	The character and rule description sent to the model at the start of every conversation. Built by `BambiSprite`.
Enrichment block	Extra context message inserted between the system prompt and the dialogue, telling the model to output structured JSON. Only present when "Allow AI to control effects" is on.
Effect command	JSON object the AI can emit in the `effects` array to trigger app features (flash, bubbles, haptic, etc.).
Master toggle	`AllowAiToControlEffects`. The single switch that controls whether the AI can trigger any effect at all.
Warm-up	Loading the model into RAM/VRAM ahead of time so the first chat doesn't pay the cold-start cost. Done with a prompt-less `/api/generate` call at app startup.
Persistent chat history	The `local_chat_history.json` file in `%APPDATA%\ConditioningControlPanel\`. Caps at 50 user/assistant pairs. Local provider only.

Questions, suggestions, or "this section is wrong" reports - open an issue at CC-Labs-llc/ccp-bugs or ping in the Discord. The integration shipped in v5.8.4 and the docs will evolve with it.
