- Best-in-class reasoning and writing
- Strong ecosystem and integrations
- Advanced multimodal capabilities
Google is adding Lyria 3, an AI music generator, to the Gemini app. The model can generate 30-second music tracks with vocals.
Artificial Analysis has released version 2.0 of its AA-WER speech-to-text benchmark, which measures the accuracy of speech recognition models. In the overall ranking, ElevenLabs’ Scribe v2 takes first place with a word error rate of just 2.3%.
AI search engine Perplexity has introduced two new text-embedding models that aim to match or outperform Google’s and Alibaba’s offerings while using only a fraction of the usual memory footprint. Both models are open source.
OpenAI says the programming benchmark SWE-bench Verified has lost much of its value as a reliable measure of coding ability. The company cites two main reasons. First, an internal review found that at least 59.4% of the evaluated tasks were flawed, with tests rejecting correct solutions because they enforce specific implementation details or check for undocumented behavior.
With Gemini 3.1 Pro, Google aims to significantly strengthen the core intelligence of its model family. On a demanding reasoning benchmark, performance has more than doubled compared with its predecessor. That said, benchmarks are still just benchmarks.
Anthropic has released an updated mid-tier AI model, Sonnet, with a primary focus on stronger coding performance, better instruction-following, and improved computer-use capabilities.
Alibaba’s cloud division has unveiled a new open-source AI model, Qwen-3.5, according to the South China Morning Post (SCMP).
The Shanghai-based Chinese AI company MiniMax has released its new open-weights model, M2.5, under the MIT license.