Tags: alerting, prometheus, secret management, management endpoints, ui, prompt management, finetuning, batch
New / Updated Models
- Mistral large pricing - https://github.com/BerriAI/litellm/pull/7452
- Cohere command-r7b-12-2024 pricing - https://github.com/BerriAI/litellm/pull/7553/files
- Voyage - new models, prices and context window information - https://github.com/BerriAI/litellm/pull/7472
- Anthropic - bump Bedrock claude-3-5-haiku max_output_tokens to 8192
General Proxy Improvements
- Health check support for realtime models
- Support calling Azure realtime routes via virtual keys
- Support custom tokenizer on /utils/token_counter - useful when checking token count for self-hosted models
- Request Prioritization - support on /v1/completion endpoint as well
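A minimal sketch of calling the /utils/token_counter endpoint for a self-hosted model. The endpoint path comes from the release note; the base URL, API key, and request field names ("model", "prompt") are assumptions - check the proxy docs for the exact schema.

```python
import json
import urllib.request

def build_token_counter_request(base_url: str, api_key: str,
                                model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request to the proxy's token counter endpoint.

    Field names are assumptions based on the release note.
    """
    body = json.dumps({"model": model, "prompt": prompt}).encode()
    return urllib.request.Request(
        f"{base_url}/utils/token_counter",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_token_counter_request(
    "http://localhost:4000",       # assumed local proxy address
    "sk-1234",                     # placeholder virtual key
    "my-self-hosted-model",        # model alias from your config
    "How many tokens is this?",
)
# urllib.request.urlopen(req) would return the proxy's JSON token count.
```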
LLM Translation Improvements
- Deepgram STT support. Start Here
- OpenAI Moderations - omni-moderation-latest support. Start Here
- Azure O1 - fake streaming support. This ensures the response is streamed whenever stream=true is passed. Start Here
- Anthropic - non-whitespace char stop sequence handling - PR
- Azure OpenAI - support Entra ID username + password based auth. Start Here
- LM Studio - embedding route support. Start Here
- WatsonX - ZenAPIKeyAuth support. Start Here
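A config sketch for the Azure OpenAI Entra ID username + password auth mentioned above. The `azure_username`/`azure_password` key names and `os.environ/` indirection are assumptions - consult the linked docs for the exact fields.

```yaml
model_list:
  - model_name: azure-gpt
    litellm_params:
      model: azure/my-deployment            # your Azure deployment name
      api_base: https://my-endpoint.openai.azure.com
      # Hypothetical field names for Entra ID username/password auth:
      azure_username: os.environ/AZURE_USERNAME
      azure_password: os.environ/AZURE_PASSWORD
```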
Prompt Management Improvements
- Langfuse integration
- HumanLoop integration
- Support for using load balanced models
- Support for loading optional params from prompt manager
Start Here
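One possible shape for wiring a prompt-manager-backed model into the proxy, assuming a provider-prefixed model route and a prompt identifier parameter - both field names are hypothetical, so verify them against the linked docs.

```yaml
model_list:
  - model_name: my-prompt-model
    litellm_params:
      model: langfuse/gpt-4o        # assumed "langfuse/" prefix routing
      prompt_id: "my-prompt-id"     # hypothetical prompt identifier field
```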
Finetuning + Batch APIs Improvements
- Improved unified endpoint support for Vertex AI finetuning - PR
- Add support for retrieving Vertex AI batch jobs - PR
NEW Alerting Integration
PagerDuty Alerting Integration.
Handles two types of alerts:
- High LLM API Failure Rate - configure a threshold of X failures within Y seconds to trigger an alert.
- High Number of Hanging LLM Requests - configure a threshold of X requests hanging for Y seconds to trigger an alert.
Start Here
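A config sketch for enabling the PagerDuty integration, modeled on how other alerting backends are configured in the proxy. The `"pagerduty"` value, environment variable name, and threshold keys are assumptions - the linked docs have the exact settings.

```yaml
general_settings:
  alerting: ["pagerduty"]                  # assumed value, mirrors existing alerting config
  # Hypothetical threshold keys for the two alert types described above:
  # alerting_args:
  #   failure_threshold: 5                 # X failures ...
  #   failure_threshold_window_seconds: 60 # ... within Y seconds

environment_variables:
  PAGERDUTY_API_KEY: "os.environ/PAGERDUTY_API_KEY"  # hypothetical variable name
```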
Prometheus Improvements