Microsoft is expanding its roster of in-house AI models, releasing a new speech-to-text system and making two existing models broadly available to developers for the first time.
The moves by Microsoft AI (MAI) are part of a broader effort by the company to expand its proprietary AI capabilities beyond its partnership with OpenAI, giving Microsoft more control over its own destiny in the competition against Google, Amazon, and others.
Microsoft announced MAI-Transcribe-1 on Thursday, a speech-to-text model that it says is the most accurate currently available. The company also released its existing voice and image generation models, known as MAI-Voice-1 and MAI-Image-2, for broad commercial use.
It’s Microsoft’s first major model release since a March reorganization, announced by CEO Satya Nadella, in which Microsoft AI CEO Mustafa Suleyman shifted away from day-to-day Copilot oversight to focus on frontier model development and superintelligence.
Suleyman told The Verge that the transcription model runs at “half the GPU cost of the other state-of-the-art models.” He told VentureBeat that the model was built by a team of just 10 people, and that Microsoft plans to eventually build a frontier large language model to be “completely independent” if needed.
Microsoft also recently hired former Allen Institute for CEO Ali Farhadi and other top AI researchers from the Seattle-based institute to further bolster Suleyman’s team, as GeekWire reported last week.
MAI-Transcribe-1 is designed to handle noisy real-world conditions such as call centers and conference rooms, and Microsoft says it is testing integrations with Copilot and Teams. Microsoft says it offers the best price-performance of any large cloud provider, competing directly with OpenAI’s Whisper and Google’s Gemini on the FLEURS benchmark.
In a blog post, Suleyman called the model “not just the most accurate but also lightning fast.”
MAI-Voice-1 generates natural-sounding speech and now lets developers create custom voices from short snippets of sample audio. MAI-Image-2 ranks in the top three on the Arena.ai image generation leaderboard and is rolling out in Bing and PowerPoint.
All three are available on the Microsoft Foundry developer AI platform and MAI Playground.
Read the full article here

