OpenFlow supports NVIDIA Parakeet TDT 0.6B v3 through a persistent sherpa-onnx worker. Open Settings > Models and click Install beside Parakeet. OpenFlow then:

  1. Downloads the pinned Windows sherpa-onnx runtime and multilingual INT8 model (about 510 MB total download).
  2. Verifies both archives with SHA-256 before extracting them.
  3. Stores the runtime under %APPDATA%\openflow\bin\parakeet\. The model is stored under %APPDATA%\openflow\models\parakeet\ when space allows. If that drive has less than 2.5 GB free, OpenFlow automatically uses the fixed drive with the most free space (for example E:\OpenFlow\models\parakeet\) and remembers that location.
  4. Selects Parakeet and keeps its local server warm between dictations.

Parakeet automatically detects Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish, Russian, and Ukrainian. If a different language is selected, or if the Parakeet runtime fails, OpenFlow automatically falls back to Whisper.

The automatic Windows installation uses the CPU-compatible runtime for the most reliable setup. Advanced installations can provide another sherpa-onnx runtime with OPENFLOW_PARAKEET_SERVER, select its execution provider with OPENFLOW_PARAKEET_PROVIDER, or point to an existing model directory with OPENFLOW_PARAKEET_MODEL_DIR.

Whisper (Fallback Speech Transcription)

OpenFlow retains its local Whisper engine as a selectable transcription engine and as Parakeet's automatic fallback. Within the Whisper engine, OpenFlow tries whisper.cpp first, then faster-whisper, then the classic whisper CLI.

Default Model: Tiny

The Tiny model (75MB) is the default because it gives the lowest local latency. Use Base or Small if you want more accuracy and can tolerate a slightly slower turnaround.

Fastest Transcription Backend

For near-instant local transcription, package whisper.cpp with OpenFlow or install it locally. OpenFlow looks for whisper-cli / whisper-cpp in bundled resources, next to the app executable, in %APPDATA%\openflow\bin\, and on PATH. Put the matching model file in bundled resources or:

Windows: %APPDATA%\openflow\models\whisper\

OpenFlow looks for names like ggml-base.bin, ggml-small.bin, or ggml-large-v3.bin. To point to an exact model file, start OpenFlow from PowerShell with:

$env:OPENFLOW_WHISPER_CPP_MODEL = "C:\path\to\ggml-base.bin"
npm run tauri dev

faster-whisper is still a good fallback if you already have it installed in the same Python environment available to OpenFlow:

pip install faster-whisper

To force a backend for troubleshooting, set OPENFLOW_WHISPER_BACKEND to whisper.cpp, faster-whisper, or whisper before launching:

$env:OPENFLOW_WHISPER_BACKEND = "whisper.cpp"
npm run tauri dev

Supported Models

ModelSizeDescription
Tiny75 MBDefault, lowest-latency profile
Base145 MBBetter balance of accuracy and speed
Small460 MBBalanced accuracy and latency
Medium1.5 GBExceptional language translation
Large-v32.9 GBMaximum precision, high resources

Model Storage Location

Models are stored in your app data directory:

Windows: %APPDATA%\openflow\models\whisper\

Selecting a Model

  1. Open Settings > Models
  2. Click "Download" next to the model you want
  3. Once downloaded, click "Select" to make it active

Cleanup Model (Text Refinement)

OpenFlow uses a local LLM to clean up raw speech transcripts.

  • Hugging Face Repo: unsloth/gemma-4-E2B-it-GGUF
  • Recommended Quantization: Q4_K_M or Q5_K_M
  • Purpose: Corrects filler words, false starts, and spoken punctuation
  1. Install llama.cpp or download the llama-server binary
  2. Download a GGUF model file from the Hugging Face repo above
  3. Start the server:
llama-server -hf unsloth/gemma-4-E2B-it-GGUF

Or with a local file:

llama-server -m path/to/gemma-4-e2b-it-Q4_K_M.gguf --port 8080
  1. In OpenFlow, go to Settings > Models
  2. Set the cleanup model provider to "Local llama.cpp server"
  3. Set the base URL to http://127.0.0.1:8080/v1
  4. The app will automatically connect and use it for text cleanup

Option B: Rule-Based Only (No LLM Required)

If you don't have a local LLM server running, OpenFlow automatically falls back to rule-based cleanup, which handles:

  • Filler word removal (um, uh, like, you know)
  • Correction handling (I mean, actually, scratch that)
  • Spoken punctuation (comma, period, question mark)
  • Formatting commands (new line, bullet point)
  • Basic capitalization and sentence punctuation

This works entirely offline with no additional setup.

Model Storage Location

Cleanup models are stored in:

Windows: %APPDATA%\openflow\models\cleanup\

Checking Model Status

The sidebar shows "Local Whisper Active" with the engine name when transcription is running. The Settings > Models page shows cleanup model status and connection state.