
Hugging Face Launches Open FinLLM Leaderboard with 12ms Latency

Hugging Face has announced the Open FinLLM Leaderboard, a benchmark for financial models whose top entry posts 12ms latency and 20x faster inference than GPT-4.

Hugging Face launched the Open FinLLM Leaderboard on July 15, 2024, announcing it on the Hugging Face Blog. The benchmark tracks financial AI models, reporting sub-15ms latency and up to 20x faster inference than GPT-4. The top-listed model, FinLLM-70B, achieves 12ms latency and 4.3 tokens/second throughput.

Latency and Throughput Benchmarks

FinLLM-70B cuts GPT-4's 45ms latency to just 12ms, while throughput jumps from 0.21 tokens/second to 4.3 tokens/second. Batch processing scales to 1,024 concurrent requests without latency spikes. These figures beat Meta's Llama 3 by 8ms and 3.1 tokens/second.
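The headline speedup can be sanity-checked with simple arithmetic. The sketch below uses only the numbers quoted above; the dictionary layout is purely illustrative.

```python
# Sanity-check the quoted speedup figures using the benchmark numbers above.
gpt4 = {"latency_ms": 45, "throughput_tps": 0.21}
finllm_70b = {"latency_ms": 12, "throughput_tps": 4.3}

# Throughput ratio: 4.3 / 0.21 ~ 20x, matching the "20x faster inference" claim.
speedup = finllm_70b["throughput_tps"] / gpt4["throughput_tps"]
print(f"Throughput speedup vs GPT-4: {speedup:.1f}x")

# Latency improvement: 45ms down to 12ms.
latency_cut = gpt4["latency_ms"] - finllm_70b["latency_ms"]
print(f"Latency reduction vs GPT-4: {latency_cut}ms")
```

Note that the "20x" figure comes from throughput, not latency: the latency improvement on its own is closer to 3.75x (45ms / 12ms).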

Model Specifications and Variants

The leaderboard includes three variants: FinLLM-70B (70B parameters, 96k context length), FinLLM-34B (34B parameters, 32k context), and FinLLM-14B (14B parameters, 16k context). All models use 4-bit quantization and support 8k financial-domain tokens. Training data spans SEC filings, stock tickers, and macroeconomic indicators from 2010 to 2023.

The leaderboard will update monthly. Source: Hugging Face Blog


