Google Launches Gemini 3.1 Flash TTS with Audio Tags, 70+ Languages, and Multi-Speaker Support

By Alex Rowland; Category: Platforms; 3 min read; 16 April 2026

Google is rolling out its new text-to-speech model based on Gemini 3.1 Flash. According to Google, it delivers the company’s most natural and expressive speech synthesis so far. A key new feature is so-called audio tags, which let developers control speaking style, pace, tone, and accent through text prompts. The model supports more than 70 languages and can generate multi-speaker dialogues.

On the Artificial Analysis leaderboard, the model reaches an Elo score of 1,211 and is rated as offering a particularly strong price-to-quality ratio. In overall quality, it ranks ahead of ElevenLabs v3 and just behind Inworld 1.5 Max.

Gemini 3.1 Flash TTS includes a free tier, where Google may use the data for product improvement. In the paid tier, text input costs $1.00 per million tokens and audio output costs $20.00 per million tokens. In batch mode, pricing drops to $0.50 per million tokens for text input and $10.00 per million tokens for audio output. With the paid tier, data is not used for product improvement.
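To make the pricing concrete, here is a minimal sketch that estimates the cost of a job from the per-million-token rates quoted above. The rates come from the article; the `Rates` class, the `estimate_cost` helper, and the example token counts are illustrative assumptions, not part of any Google SDK, and actual billing may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per one million tokens (figures as quoted in the article)."""
    text_input_per_million: float
    audio_output_per_million: float


# Paid-tier and batch-mode rates as reported above.
PAID = Rates(text_input_per_million=1.00, audio_output_per_million=20.00)
BATCH = Rates(text_input_per_million=0.50, audio_output_per_million=10.00)


def estimate_cost(input_tokens: int, output_tokens: int, rates: Rates) -> float:
    """Return the estimated cost in USD for the given token counts."""
    total = (input_tokens * rates.text_input_per_million
             + output_tokens * rates.audio_output_per_million)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200k text input tokens, 1.5M audio output tokens.
    print(f"paid:  ${estimate_cost(200_000, 1_500_000, PAID):.2f}")
    print(f"batch: ${estimate_cost(200_000, 1_500_000, BATCH):.2f}")
```

For this hypothetical workload, audio output accounts for nearly all of the cost, and batch mode halves the total because both rates are cut by 50 percent.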

Gemini 3.1 Flash TTS is available now in preview through the Gemini API, Vertex AI for enterprise customers, and Google Vids for Workspace users. It can also be tested for free in Google AI Studio. All generated audio files are marked with Google’s SynthID watermark to help identify AI-generated content.

Google is positioning Gemini 3.1 Flash TTS as a strong developer-focused alternative in the AI voice market by combining expressive output, controllable speech parameters, and aggressive pricing. Its multilingual support, multi-speaker capability, and built-in watermarking make it especially relevant for scalable enterprise and media workflows.

About The Hosts

Alex Rowland

AI Industry Analyst

Alex Rowland is an AI industry analyst covering major AI platforms, enterprise adoption, and strategic moves by Big Tech companies. His work focuses on how AI systems are deployed at scale and how they reshape products, markets, and user behavior.
