New LLM benchmark: llmtester

themarkymark in #ai • last month

I just published a new LLM Benchmark tool called llmtester.

More than 50,000 tests across 13 categories!

This is the easiest way to benchmark LLMs and see where they go wrong.

Interactive CLI - Keyboard-driven benchmark selection and configuration
Multi-Provider Support - OpenAI, Anthropic, Together.ai, Groq, Fireworks AI, Perplexity, OpenRouter, and any OpenAI-compatible API
LLM-as-Judge - Optional secondary model evaluation for code, math, SQL, bash, and truthfulness benchmarks
Progress Tracking - Resume interrupted evaluations from where you left off
Result Explorer - Built-in TUI to browse past results, filter by pass/fail, and inspect individual responses
Config Persistence - Saves provider, endpoint, and model settings between runs
Shuffle & Sampling - Run a percentage of each benchmark, with optional shuffling so the sample covers a more varied mix of tests
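
To make the Shuffle & Sampling idea concrete, here is a minimal TypeScript sketch of taking a percentage of a benchmark with an optional shuffle. It is not taken from llmtester's source: the function names (`mulberry32`, `samplePercent`), the `seed` parameter, and the choice to round up and always keep at least one test are assumptions made for illustration.

```typescript
// Hypothetical helpers for illustration only; not llmtester's actual code.

// Small seeded PRNG (the common "mulberry32" routine) so that a shuffled
// sample can be reproduced by reusing the same seed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Return `percent` percent of `items`. Without shuffling this is simply the
// first N items; with shuffling, a seeded Fisher-Yates pass runs first.
function samplePercent<T>(
  items: readonly T[],
  percent: number,
  shuffle: boolean,
  seed = 1,
): T[] {
  if (percent <= 0 || percent > 100) {
    throw new RangeError("percent must be in (0, 100]");
  }
  const pool = [...items]; // copy so the caller's array is not mutated
  if (shuffle) {
    const rand = mulberry32(seed);
    for (let i = pool.length - 1; i > 0; i--) {
      const j = Math.floor(rand() * (i + 1));
      [pool[i], pool[j]] = [pool[j], pool[i]];
    }
  }
  // Assumption: round up and keep at least one test when the pool is non-empty.
  const count = Math.max(1, Math.ceil((pool.length * percent) / 100));
  return pool.slice(0, count);
}
```

With a sketch like this, `samplePercent(tests, 10, true, 42)` would return the same 10% slice every time it is called with seed 42, which makes a partial run repeatable.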

Run the latest version without installing anything: npx llmtester

What makes llmtester so awesome?

50,000+ tests, a second LLM that judges results on the more complex tests, and the ability to fully explore past results.

I got tired of doing everything manually, so I built a full test runner.

It includes tests across many domains: grade school math, advanced math, reasoning, programming, and even SQL. Some of these tests are impossible to grade without a judge, and llmtester handles all of that for you!
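
For readers unfamiliar with the LLM-as-judge pattern, here is a rough TypeScript sketch of the general idea: a second model sees the question, a reference answer, and the candidate answer, and replies with a verdict. This is not llmtester's implementation. The type names, the prompt wording, and the PASS/FAIL convention are all assumptions, and the judge model is passed in as a plain function so the sketch does not depend on any particular provider API.

```typescript
// Hypothetical types and names for illustration; not llmtester's real API.

// Any function that sends a prompt to a model and returns its text reply.
type AskModel = (prompt: string) => Promise<string>;

interface TestCase {
  question: string;
  reference: string; // known-good answer the judge compares against
}

interface Verdict {
  pass: boolean;
  raw: string; // full judge reply, kept for later inspection
}

// Build the grading prompt shown to the judge model.
function buildJudgePrompt(test: TestCase, answer: string): string {
  return [
    "You are grading an answer to a benchmark question.",
    `Question: ${test.question}`,
    `Reference answer: ${test.reference}`,
    `Candidate answer: ${answer}`,
    "Reply with exactly PASS or FAIL on the first line.",
  ].join("\n");
}

// Only an explicit PASS on the first line counts as a pass; anything else,
// including an unparseable reply, is treated as a failure.
function parseVerdict(raw: string): Verdict {
  const firstLine = raw.trim().split(/\r?\n/)[0]?.trim().toUpperCase() ?? "";
  return { pass: firstLine === "PASS", raw };
}

// Ask the judge model to grade one candidate answer.
async function judge(
  test: TestCase,
  answer: string,
  askJudge: AskModel,
): Promise<Verdict> {
  return parseVerdict(await askJudge(buildJudgePrompt(test, answer)));
}
```

Treating every reply other than an explicit PASS as a failure is a deliberately conservative choice in this sketch: a confused judge lowers the score instead of inflating it.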

You can find the package on npm and GitHub.

#opensource #hive-engine #vyb #pob #cent #neoxian

jorgebgt 29 days ago

Why are you downvoting me?

aftabirshad 10 days ago

Hello @themarkymark, I hope you're doing well. I noticed I've been getting downvotes from @buildawhale, and I think I might be on your blacklist. I really want to understand what I did wrong so I can fix it. I have been trying to share original content on Hive, and I genuinely want to contribute positively to this community. I would greatly appreciate it if you could give me a chance to make things right. If there's anything specific you need from me, I'm ready to do it. Thank you for your time.
Cc: @hivewatcher
