New LLM benchmark: llmtester

in #ailast month

image.png

I just published a new LLM Benchmark tool called llmtester.

Over 50,000 tests over 13 categories!

This is the easiest way to benchmark LLM models and see where they go wrong.

  • Interactive CLI - Keyboard-driven benchmark selection and configuration
  • Multi-Provider Support - OpenAI, Anthropic, Together.ai, Groq, Fireworks AI, Perplexity, OpenRouter, and any OpenAI-compatible API
  • LLM-as-Judge - Optional secondary model evaluation for code, math, SQL, bash, and truthfulness benchmarks
  • Progress Tracking - Resume interrupted evaluations from where you left off
  • Result Explorer - Built-in TUI to browse past results, filter by pass/fail, and inspect individual responses
  • Config Persistence - Saves provider, endpoint, and model settings between runs
  • Shuffle & Sampling - Run a percentage of each benchmark with optional shuffling for diverse distribution

Run the latest version without any install with npx llmtester

What makes llmtester so awesome?

50,000+ tests, second LLM judges results on more complex tests, ability to fully explore past tests.

I got tired of the doing everything manually, so built a full test runner.

Includes tests in many domains from Grade school math, advanced math, reasoning, programming, and even sql. Some of these tests are impossible to grade without a judge, llmtester will handle this all for you!

image.png

image.png

image.png

You can find the package on npm and github.

Sort:  

Why are you downvoting me?

Loading...
Loading...
Loading...

Hello @themarkymark, I hope you're doing well. I noticed I've been getting downvotes from @buildawhale, and I think I might be on your blacklist. I really want to understand what I did wrong so I can fix it. I have been trying to share original content on Hive, and I genuinely want to contribute positively to this community. I would greatly appreciate it if you could give me a chance to make things right. If there's anything specific you need from me, I'm ready to do it. Thank you for your time.
Cc: @hivewatcher

 10 days ago Reveal Comment
Loading...

Will this work with Local LLMs too?

Yes.

Loading...
Loading...
Loading...
Loading...
 last month Reveal Comment
 last month Reveal Comment
Loading...
 last month Reveal Comment
Loading...
Loading...
 last month Reveal Comment
 last month Reveal Comment
Loading...
Loading...
Loading...
Loading...
 last month Reveal Comment
Loading...
Loading...

I need to play around with LLMs a bit more. I was joking with my in laws who are getting ready to move to a community with an HOA that I was going to load the bylaws into an LLM so they could just ask questions about what is allowed and what isn't instead of thumbing through the whole boring thing.

Look into rag

 last month Reveal Comment
 last month Reveal Comment
 last month Reveal Comment
 last month Reveal Comment
 last month Reveal Comment
 last month Reveal Comment
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
 last month Reveal Comment

Tsc tsc you are playing with fire! The fire will consume you . Your house will burn.

Loading...
 24 days ago Reveal Comment
 24 days ago Reveal Comment
 24 days ago Reveal Comment
 24 days ago Reveal Comment
 23 days ago Reveal Comment
 22 days ago Reveal Comment
 22 days ago Reveal Comment
 22 days ago Reveal Comment
 14 days ago Reveal Comment
 13 days ago Reveal Comment
Loading...
 4 days ago Reveal Comment
 4 days ago Reveal Comment
 3 days ago Reveal Comment
Loading...
Loading...
Loading...
 26 days ago Reveal Comment