Shaberi3 Benchmark
LLM Performance Comparison Dashboard
Upload CSV File
To visualize your own benchmark results, upload shaberi/results/totals.csv.
Select Models
chatgpt-4o-latest
claude-3-5-sonnet-20241022
o1-mini-2024-09-12
gpt-4o-2024-05-13
gemini-1.5-pro-exp-0827
EZO-Qwen2.5-72B-Instruct-Q8_0
Athene-V2-Agent:Q4_K_M
Athene-V2-Chat:Q4_K_M
claude-3-5-sonnet-20240620
claude-3-opus-20240229
EZO-Qwen2.5-72B-Instruct-Q4_K_M
qwen2.5:72b-instruct-q8_0
gemini-1.5-pro-002
Llama-3.1-70B-EZO-1.1-it-Q8_0
gpt-4o-2024-08-06
Llama-3.1-Nemotron-70B-Instruct-Q4_K_L
qwen2.5:32b-instruct-q8_0
claude-3-5-haiku-20241022
gpt-4o-mini-2024-07-18
gemini-1.5-pro
EZO-Qwen2.5-32B-Instruct-Q4_K_M
EZO-Qwen2.5-32B-Instruct-Q8_0
gemini-1.5-flash-002
Athene-70B-Q8_0
gemma-2-9B-it-SimPO-Q8_0
gemini-1.5-flash-exp-0827
EZO-Humanities-9B-gemma-2-it-f16
gemma-2-9B-it-SimPO-f16
Mistral-Large-Instruct-2407-Q4_K_M
qwen2:72b-instruct-q4_K_M
c4ai-aya-expanse-32b
gemma-2-27b-it.f16.Q8_0
EZO-Common-9B-gemma-2-it-f16
magnum-v2-72b-Q8_0
qwen2.5:14b-instruct-fp16
gemini-1.5-flash
SuperNova-Medius-f16
deepseek-chat
gemma-2-9b-it.f16.Q8_0
gemini-1.5-flash-8b
gemini-1.5-flash-8b-exp-0827
qwen2.5-coder:32b-instruct-q8_0
claude-3-haiku-20240307
Llama-3.1-Swallow-70B-Instruct-v0.1-Q4_K_M
qwen2.5-coder:32b-instruct-q4_K_M
c4ai-command-r-plus-Q4_K_M
c4ai-aya-expanse-8b
Tiger-Gemma-9B-v3-f16
qwen2.5:7b-instruct-fp16
gpt-3.5-turbo-0125
Llama-3.1-Swallow-8B-Instruct-v0.1-f16
llama3.2-vision:90b-instruct-q4_K_M
Llama-3.1-Swallow-8B-Instruct-v0.2-Q8_0
command-r-plus:104b-08-2024-q4_K_M
mistral-small:22b-instruct-2409-q8_0
Llama-3.1-Swallow-8B-Instruct-v0.2-Q4_K_M
mixtral:8x22b-instruct-v0.1-q4_K_M
llama3.1:70b-instruct-q4_K_M
EZO-Common-T2-2B-gemma-2-it-f16
Ministral-8B-Instruct-2410-HF-f16
gemma-2-baku-2b-it-f16
EZO-gemma-2-2b-jpn-it-f16
ministral-8b-instruct-2410_q4km
EZO-gemma-2-2b-jpn-it-f16-mod
lfm-40b
llm-jp-3-13b-instruct-f16
gemma2:2b-instruct-fp16
gemma-2-baku-2b-it-f16-mod
qwen2.5-coder:7b-instruct-fp16
gemma-2-2b-jpn-it-f16
Mistral-NeMo-Minitron-8B-Instruct-f16
mixtral:8x7b-instruct-v0.1-q8_0
llm-jp-3-3.7b-instruct-EZO-Humanities-f16
qwen2.5:3b-instruct-fp16
Borea-Phi-3.5-mini-Instruct-Common-f16
Borea-Phi-3.5-mini-Instruct-Jp-f16
gemma-2-2b-jpn-it-f16-mod
llm-jp-3-3.7b-instruct-EZO-Common-f16
Phi-3.5-mini-instruct-f16
llama3.1:8b-instruct-fp16
llm-jp-3-3.7b-instruct-f16
llama3.2-vision:11b-instruct-q4_K_M
granite3-dense:8b-instruct-fp16
llama3.2:3b-instruct-fp16
qwen2.5:1.5b-instruct-fp16
llm-jp-3-1.8b-instruct-f16
granite3-dense:2b-instruct-fp16
llm-jp-3-172b-beta1-instruct-Q3_K_S
raspberry-3B-f16
granite3-moe:3b-instruct-fp16
llama3.2:1b-instruct-fp16
qwen2.5:0.5b-instruct-fp16
granite3-moe:1b-instruct-fp16
SmolLM2-1.7B-Instruct:fp16
Select Metrics
Weighted Mean
ELYZA-tasks-100
MT-Bench
Tengu-Bench
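The dashboard does not state how Weighted Mean is derived from the three benchmark scores. As a rough sketch only, the snippet below computes a weighted average per model from CSV text and ranks the models by it. The column names (`model` plus one column per metric), the weights in `WEIGHTS`, and the rows in `SAMPLE` are all placeholders invented for illustration; they are not taken from totals.csv or from Shaberi itself.

```python
import csv
import io

# Placeholder weights, NOT the dashboard's actual weighting (which is not stated).
WEIGHTS = {"ELYZA-tasks-100": 2.0, "MT-Bench": 1.0, "Tengu-Bench": 1.0}


def weighted_mean(scores, weights=WEIGHTS):
    """Return sum(w * score) / sum(w) over the metrics present in `scores`."""
    used = [metric for metric in weights if metric in scores]
    if not used:
        raise ValueError("no known metrics in scores")
    total_weight = sum(weights[metric] for metric in used)
    return sum(weights[metric] * scores[metric] for metric in used) / total_weight


def rank_models(csv_text):
    """Parse CSV text (assumed columns: model + one per metric) and sort by weighted mean."""
    ranked = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Skip empty cells so a model missing one benchmark is still scored.
        scores = {m: float(row[m]) for m in WEIGHTS if row.get(m)}
        ranked.append((row["model"], weighted_mean(scores)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Made-up rows for illustration only; not real benchmark results.
SAMPLE = """model,ELYZA-tasks-100,MT-Bench,Tengu-Bench
model-a,8.0,9.0,7.0
model-b,6.0,7.0,8.0
"""

if __name__ == "__main__":
    for name, score in rank_models(SAMPLE):
        print(f"{name}: {score:.2f}")
```

To try it on real results, replace `SAMPLE` with the contents of your own file and adjust the column names and weights to match its actual layout.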
[Chart: Score by model for the selected metrics (Weighted Mean, ELYZA-tasks-100, MT-Bench, Tengu-Bench). Models shown: chatgpt-4o-latest, claude-3-5-sonnet-20241022, o1-mini-2024-09-12, gpt-4o-2024-05-13, gemini-1.5-pro-exp-0827. Score axis ticks: 7.5 to 9.0.]
Evaluator: gemini-1.5-flash-exp-0827