How AI image generators like DALL-E, Lensa and Stable Diffusion work




A strange and powerful collaborator is waiting for you. Offer it just a few words, and it will create an original scene, based on your description.

This is artificial-intelligence-generated imagery, a rapidly emerging technology now in the hands of anyone with a smartphone.

The results can be astonishing: crisp, beautiful, fantastical and sometimes eerily realistic. But they can also be muddy and grotesque: warped faces, gobbledygook street signs and distorted architecture.

How does it work? Keep scrolling to learn step by step how the process unfolds.

[Interactive: style options for the generated image: a photo, Van Gogh, stained glass, a magazine cover]

Like many frontier technologies, AI-generated artwork raises a host of knotty legal, ethical and moral issues. The raw data used to train the models is drawn straight from the internet, causing image generators to parrot many of the biases found online. That means they may reinforce incorrect assumptions about race, class, age and gender.

The data sets used for training also often include copyrighted images, which outrages some artists and photographers whose work is ingested without their permission or compensation.


Meanwhile, the risk of creating and amplifying disinformation is enormous, which is why it is important to understand how the technology actually works, whether it is used to create a Van Gogh that the artist never painted or a scene from the Jan. 6 attack on the U.S. Capitol that never appeared in any photographer’s viewfinder.

Artificial intelligence technologies are racing ahead faster than society can reckon with and resolve these issues.

About this story

The Washington Post generated the AI images shown in this article using Stable Diffusion 2.0. Each image was generated using the same settings and seed, meaning the “noise” used as the starting point was identical for each image.

The animations on this page show the actual de-noising process. We tweaked the Stable Diffusion code to save intermediate images as the de-noising process occurred.
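
For readers who want to try something similar, the sketch below shows how this kind of setup might be reproduced with the open-source Hugging Face diffusers library. It is not The Post’s actual code: the model name, prompt, seed and callback arguments are illustrative, and the callback API varies between library versions. Fixing the seed keeps the starting noise identical from one prompt to the next, and a step callback saves an image after every de-noising step.

```python
# A sketch (not The Post's actual code) using the open-source Hugging Face
# "diffusers" library. The model name, prompt, seed and callback arguments
# are illustrative; the callback API differs between library versions.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2", torch_dtype=torch.float16
).to("cuda")

def save_intermediate(step, timestep, latents):
    # Decode the partially de-noised latents to pixels and save a snapshot.
    with torch.no_grad():
        decoded = pipe.vae.decode(latents / pipe.vae.config.scaling_factor).sample
    images = (decoded / 2 + 0.5).clamp(0, 1).permute(0, 2, 3, 1).float().cpu().numpy()
    pipe.numpy_to_pil(images)[0].save(f"step_{step:03d}.png")

# Reusing the same seed means the starting "noise" is identical for every prompt.
generator = torch.Generator("cuda").manual_seed(1234)

result = pipe(
    "a sea otter in the style of Van Gogh",  # illustrative prompt
    generator=generator,
    callback=save_intermediate,  # runs after every de-noising step
    callback_steps=1,
)
result.images[0].save("final.png")
```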

In addition to interviewing researchers and examining the diffusion model in detail, The Washington Post analyzed the images used to train Stable Diffusion for the database section of this story. The images selected for this explainer were either drawn from Stable Diffusion’s training database and in the public domain or licensed by The Post, or closely resembled images in that database. The database used to train Stable Diffusion includes copyrighted images that we do not have the rights to publish.

Editing by Karly Domb Sadof, Reuben Fischer-Baum and Ann Gerhart. Copy-editing by Paola Ruano.