Stability AI’s newest image generation model, Stable Cascade, promises to be faster and more powerful than its industry-leading predecessor, Stable Diffusion, which is the basis of many other text-to-image generation AI tools.
Stable Cascade can generate photos, produce variations of an image it created, or try to increase an existing picture’s resolution. Other text-to-image editing features include inpainting and outpainting, where the model edits only a specific part of the image, as well as canny edge, where users can make a new photo using just the edges of an existing picture.
The new model is available on GitHub for researchers but not for commercial use, and it arrives as companies like Google and Apple release their own image generation models.
Unlike Stability’s flagship Stable Diffusion models, Stable Cascade isn’t one large model; it’s three different models that rely on the Würstchen architecture. The first stage, stage C, compresses text prompts into latents (small, compressed numerical representations) that are then passed to stages A and B to decode the request.
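The staged hand-off described above can be sketched as a toy illustration. Everything below is an assumption for demonstration purposes: the shapes are invented and the “stages” are simple NumPy upsampling stand-ins, not the actual Würstchen networks. The point is only to show why working in a tiny latent space saves memory and compute.

```python
import numpy as np

def stage_c(prompt: str) -> np.ndarray:
    """Stand-in for stage C: 'compress' the prompt into a tiny 16x16 latent.
    (A deterministic toy encoding; the real stage is a learned network.)"""
    seed = sum(ord(ch) for ch in prompt)
    return np.random.default_rng(seed).standard_normal((16, 16))

def stage_b(latent: np.ndarray) -> np.ndarray:
    """Stand-in for stage B: upsample the latent 4x (16x16 -> 64x64)."""
    return np.kron(latent, np.ones((4, 4)))

def stage_a(latent: np.ndarray) -> np.ndarray:
    """Stand-in for stage A: final 16x decode to a 1024x1024 'image'."""
    return np.kron(latent, np.ones((16, 16)))

latent = stage_c("a photo of a cat")
image = stage_a(stage_b(latent))
print(latent.shape, image.shape)  # (16, 16) (1024, 1024)
# The heavy, text-conditioned work happens on 16 * 16 = 256 values instead
# of 1024 * 1024 = 1,048,576 pixels: a roughly 4,096x smaller working set.
```

In the real model, the expensive diffusion steps run at stage C’s compressed resolution, which is why shrinking that latent pays off in both training cost and generation speed.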
Breaking requests into smaller pieces means the model needs less memory (and fewer hours of training on those hard-to-find GPUs) and runs faster, while performing better “in both prompt alignment and aesthetic quality.” Stable Cascade took about 10 seconds to create an image, compared with 22 seconds for the SDXL model currently in use.
Stability AI helped popularize the stable diffusion method and has also been the subject of several lawsuits alleging that Stable Diffusion was trained on copyrighted data without permission from rights holders; a UK lawsuit by Getty Images against Stability AI is scheduled to go to trial in December. Stability began offering commercial licenses through a subscription in December, which the company said was necessary to help fund its research.