MAI-Voice-1 generates “natural, realistic speech, rich with nuance, emotional range, and expression,” according to Microsoft, and was built to preserve speaker identity across long-form content. The model can generate a minute of audio in “a single second,” and its low GPU usage makes it speedy and affordable.
MAI-Image-2 has “turbocharged” image generation performance and speed on Copilot, according to Redmond. It debuted among the top three model families on the Arena.ai leaderboard, and will soon be rolled out in Bing and PowerPoint.
Microsoft said the model was created with the aid of photographers, designers, and visual storytellers who “demand natural lighting, accurate skin tones, and texture,” and who also require clear text for graphics, layouts, and diagrams.

