Stability AI has announced the launch of an AI image generator. Stable Diffusion, image generation software that runs on consumer-level hardware, will soon be released to the public.
The announcement says the AI model will run on “under 10GB of VRAM on consumer GPUs.” In other words, you can run it on a 10GB Nvidia GeForce RTX 3080, an AMD Radeon RX 6700, or potentially something less powerful. However, there’s no information yet on the minimum graphics requirements. Even so, that’s in contrast with a lot of AI generation models, which tend to be hosted on servers because they take several Nvidia A100 GPUs to run.
Development of the image generator has been led by Robin Rombach of LMU Munich’s Machine Vision & Learning research group and Patrick Esser, who helped develop the video editing software Runway.
Stable Diffusion was trained on Stability AI’s 4,000 A100 Ezra-1 AI ultracluster, and more than 10,000 beta testers are generating 1.7 million images per day to explore the approach.
The main dataset for Stable Diffusion comes from LAION-Aesthetics, which uses a CLIP-based AI model to filter images based on how “beautiful” they are. I’m not exactly sure how beauty has been defined in this instance, however. LAION-Aesthetics selects its images from LAION 5B’s massive database, which was created to address the problem that datasets (such as the billions of image and text pairs used by Dall-E and CLIP) have not been made openly available.
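For what it’s worth, filtering of this kind generally comes down to giving each image a score and keeping the ones above a cutoff. Here is a minimal sketch of that idea in Python. It is not LAION’s actual pipeline: the `ScoredImage` type, the `filter_by_aesthetics` function, the example URLs, the scores and the cutoff of 5.0 are all invented for illustration, and the scores are assumed to have already been predicted by some model.

```python
from dataclasses import dataclass


@dataclass
class ScoredImage:
    url: str
    caption: str
    aesthetic_score: float  # hypothetical score, assumed to come from some predictor


def filter_by_aesthetics(images, min_score):
    """Keep only the images whose score is at least min_score."""
    return [img for img in images if img.aesthetic_score >= min_score]


# Made-up image and text pairs with made-up scores.
candidates = [
    ScoredImage("https://example.com/a.jpg", "a sunset over mountains", 6.8),
    ScoredImage("https://example.com/b.jpg", "blurry receipt", 3.1),
    ScoredImage("https://example.com/c.jpg", "oil painting of a harbor", 5.4),
]

# The cutoff is an arbitrary illustrative value.
kept = filter_by_aesthetics(candidates, min_score=5.0)
print([img.caption for img in kept])
```

Raising the cutoff shrinks the dataset and lowering it grows it, so whatever “beautiful” means here ends up being decided by the scoring model and the threshold chosen.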
The AI can create images at 512×512 pixel resolution in only a few seconds, though I assume upscaling to larger images may take a bit longer. There’s still a long way to go, with the Stability AI team continuing to research the current method of image generation.
The great news is that “this will provide the template for the release of many open models we are currently training to unlock human potential.”
What a time to be alive, hey?
“We look forward to the open ecosystem that will emerge around this and further models to truly explore the boundaries of latent space,” the announcement says.
There’s also a note at the bottom from LAION’s Organizational Lead & Researcher, Christoph Schuhmann, who says: “With this project we continue to pursue our mission to make state of the art machine learning accessible for people from all over the world. 100% open. 100% free.”