Model Setup — OpenFlow Help

Parakeet (Recommended Speech Transcription)

OpenFlow supports NVIDIA Parakeet TDT 0.6B v3 through a persistent sherpa-onnx worker. Open Settings > Models and click Install beside Parakeet. OpenFlow then:

Downloads the pinned Windows sherpa-onnx runtime and multilingual INT8 model (about 510 MB total download).
Verifies both archives with SHA-256 before extracting them.
Stores the runtime under %APPDATA%\openflow\bin\parakeet\. The model is stored under %APPDATA%\openflow\models\parakeet\ when space allows. If that drive has less than 2.5 GB free, OpenFlow automatically uses the fixed drive with the most free space (for example E:\OpenFlow\models\parakeet\) and remembers that location.
Selects Parakeet and keeps its local server warm between dictations.

Parakeet automatically detects Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish, Russian, and Ukrainian. If a different language is selected, or if the Parakeet runtime fails, OpenFlow automatically falls back to Whisper.

The automatic Windows installation uses the CPU-compatible runtime for the most reliable setup. Advanced installations can provide another sherpa-onnx runtime with OPENFLOW_PARAKEET_SERVER, select its execution provider with OPENFLOW_PARAKEET_PROVIDER, or point to an existing model directory with OPENFLOW_PARAKEET_MODEL_DIR.

Whisper (Fallback Speech Transcription)

OpenFlow retains its local Whisper engine as a selectable transcription engine and as Parakeet's automatic fallback. Within the Whisper engine, OpenFlow tries whisper.cpp first, then faster-whisper, then the classic whisper CLI.

Default Model: Tiny

The Tiny model (75MB) is the default because it gives the lowest local latency. Use Base or Small if you want more accuracy and can tolerate a slightly slower turnaround.

Fastest Transcription Backend

For near-instant local transcription, package whisper.cpp with OpenFlow or install it locally. OpenFlow looks for whisper-cli / whisper-cpp in bundled resources, next to the app executable, in %APPDATA%\openflow\bin\, and on PATH. Put the matching model file in bundled resources or:

Windows: %APPDATA%\openflow\models\whisper\

OpenFlow looks for names like ggml-base.bin, ggml-small.bin, or ggml-large-v3.bin. To point to an exact model file, start OpenFlow from PowerShell with:

$env:OPENFLOW_WHISPER_CPP_MODEL = "C:\path\to\ggml-base.bin"
npm run tauri dev

faster-whisper is still a good fallback if you already have it installed in the same Python environment available to OpenFlow:

pip install faster-whisper

To force a backend for troubleshooting, set OPENFLOW_WHISPER_BACKEND to whisper.cpp, faster-whisper, or whisper before launching:

$env:OPENFLOW_WHISPER_BACKEND = "whisper.cpp"
npm run tauri dev

Supported Models

Model	Size	Description
Tiny	75 MB	Default, lowest-latency profile
Base	145 MB	Better balance of accuracy and speed
Small	460 MB	Balanced accuracy and latency
Medium	1.5 GB	Exceptional language translation
Large-v3	2.9 GB	Maximum precision, high resources

Model Storage Location

Models are stored in your app data directory:

Windows: %APPDATA%\openflow\models\whisper\

Selecting a Model

Open Settings > Models
Click "Download" next to the model you want
Once downloaded, click "Select" to make it active

Cleanup Model (Text Refinement)

OpenFlow uses a local LLM to clean up raw speech transcripts.

Recommended Model: Gemma 4 E2B it (GGUF)

Hugging Face Repo: unsloth/gemma-4-E2B-it-GGUF
Recommended Quantization: Q4_K_M or Q5_K_M
Purpose: Corrects filler words, false starts, and spoken punctuation

Option A: llama.cpp Server (Recommended)

Install llama.cpp or download the llama-server binary
Download a GGUF model file from the Hugging Face repo above
Start the server:

llama-server -hf unsloth/gemma-4-E2B-it-GGUF

Or with a local file:

llama-server -m path/to/gemma-4-e2b-it-Q4_K_M.gguf --port 8080

In OpenFlow, go to Settings > Models
Set the cleanup model provider to "Local llama.cpp server"
Set the base URL to http://127.0.0.1:8080/v1
The app will automatically connect and use it for text cleanup

Option B: Rule-Based Only (No LLM Required)

If you don't have a local LLM server running, OpenFlow automatically falls back to rule-based cleanup, which handles:

Filler word removal (um, uh, like, you know)
Correction handling (I mean, actually, scratch that)
Spoken punctuation (comma, period, question mark)
Formatting commands (new line, bullet point)
Basic capitalization and sentence punctuation

This works entirely offline with no additional setup.

Model Storage Location

Cleanup models are stored in:

Windows: %APPDATA%\openflow\models\cleanup\

Checking Model Status

The sidebar shows "Local Whisper Active" with the engine name when transcription is running. The Settings > Models page shows cleanup model status and connection state.