The hard part of local models is no longer getting them to run. Developers who use Ollama now routinely have a dozen models pulled and no fast way to tell which one handles a given task best.

Ollama Model Tester, an MIT-licensed CLI released by Ulysses Tenn on GitHub, solves that directly. Give it a prompt and it fires that prompt against whichever local models you select, repeats each run N times at a chosen temperature, and writes responses plus Ollama metadata (token counts, timing) to a structured folder keyed on the prompt. Same prompt, different model, same folder: comparison is the default output shape, not an afterthought.
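To make the workflow concrete, here is a minimal sketch of the same loop that uses only the Python standard library and Ollama's REST endpoint `/api/generate` on its default address, `http://localhost:11434`. This is not the tool's code. The function names, the hash-based folder key and the `run_NNN.json` layout are hypothetical stand-ins for whatever format Ollama Model Tester actually writes. The metadata fields it copies (`eval_count`, `prompt_eval_count`, `total_duration`, `eval_duration`) are fields Ollama returns on non-streaming responses, with durations in nanoseconds.

```python
import hashlib
import json
import pathlib
import urllib.request

# Ollama's default local endpoint for single-prompt generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def prompt_key(prompt):
    """Derive a stable folder name from the prompt text (hypothetical scheme)."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12]


def ollama_generate(model, prompt, temperature):
    """Send one non-streaming request to Ollama and return the parsed JSON reply."""
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # one JSON object instead of a token stream
        "options": {"temperature": temperature},
    }).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_comparison(prompt, models, runs, temperature, out_root,
                   generate=ollama_generate):
    """Run every model `runs` times on one prompt.

    Writes one JSON file per run under out_root/<prompt key>/<model>/ and
    returns the written paths, ordered by model and then by run number.
    """
    prompt_dir = pathlib.Path(out_root) / prompt_key(prompt)
    written = []
    for model in models:
        # Tags like "llama3.2:1b" or "user/model" are not safe directory names.
        model_dir = prompt_dir / model.replace(":", "_").replace("/", "_")
        model_dir.mkdir(parents=True, exist_ok=True)
        for i in range(1, runs + 1):
            result = generate(model, prompt, temperature)
            record = {
                "model": model,
                "run": i,
                "temperature": temperature,
                "response": result.get("response", ""),
                # Token counts and timings exactly as Ollama reported them.
                "prompt_eval_count": result.get("prompt_eval_count"),
                "eval_count": result.get("eval_count"),
                "total_duration": result.get("total_duration"),
                "eval_duration": result.get("eval_duration"),
            }
            path = model_dir / "run_{:03d}.json".format(i)
            path.write_text(json.dumps(record, indent=2), encoding="utf-8")
            written.append(path)
    return written
```

Against a live Ollama server, a call such as `run_comparison("Summarize this changelog in one line.", ["llama3.2", "mistral"], runs=3, temperature=0.7, out_root="results")` would leave one folder for the prompt with a subfolder per model, which is the comparison-first shape described above. The `generate` parameter exists so the HTTP call can be replaced by a stub when testing without a server.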

The tool requires only Python 3.7 and a running Ollama instance. No pip install. Skip the interactive setup and it is fully scriptable via flags.

This is the unglamorous infrastructure the local-models movement actually needs. Model releases are outpacing the tools for evaluating them on real workloads. If your team has a half-dozen fine-tunes or quantized weights in rotation, a structured empirical test beats reading leaderboards built from benchmarks that never ran on your hardware or your prompts.

Ulysses Tenn on GitHub (github.com/ulyssestenn/omt), 2026-06-04.