The original post: /r/localllama by /u/HauntingMoment on 2025-01-06 12:58:59.

Hi guys! I'm very excited to share Lighteval, the evaluation framework we use internally at Hugging Face.

Here is how to get started fast: evaluate Llama-3.1-70B-Instruct on the GSM8K benchmark and compare the results with OpenAI's o1-mini!

pip install "lighteval[vllm,litellm]"
lighteval vllm "pretrained=meta-llama/Llama-3.1-70B-Instruct,dtype=bfloat16" "lighteval|gsm8k|5|1" --use-chat-template
lighteval endpoint litellm "o1-mini" "lighteval|gsm8k|5|1" --use-chat-template
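A note on the task argument: it is four pipe-separated fields, which (as I understand the Lighteval docs, so treat this as an assumption) mean suite, task name, number of few-shot examples, and a 0/1 flag for whether few-shot examples may be truncated when the prompt gets too long. A quick bash sketch of how the string breaks down:

```shell
# Assumed task-spec layout: suite|task|num_fewshot|truncate_fewshot
spec="lighteval|gsm8k|5|1"

# Split on "|" into the four fields
IFS='|' read -r suite task fewshot truncate <<< "$spec"

echo "suite=$suite task=$task fewshot=$fewshot truncate=$truncate"
```

So swapping the "5" for a "0" would run GSM8K zero-shot instead of 5-shot.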

If you have strong opinions on evaluation and think things are still missing, don't hesitate to contribute; we'd be delighted to have your help building what it takes to get better and safer AI.