@PotomacBob In a way, all of the image 'existed'; it existed as part of thousands of other images. The software knows that thing X ('spaceship') is in these 10,000 images and not in those 999,990,000 images. It can figure out what's common to the 'spaceship' images and make a spaceship out of them. Add terms and the elements blend together (e.g. 'A spaceship in the middle of Times Square, in the style of Vincent Van Gogh' will probably get you about what you expect from a well-tuned generator).
Generators tend (right now) to be horrible at hands, but since that's a focus area, it'll probably change.
Audio fakery is much older (about a decade or more).
Face replacement is similar: given 100 or so images of someone's face, it's quite feasible to map their face and reliably put it on some other body. The resulting video plausibly shows that person doing something they never did.
It's the same kind of technology that was used in Jurassic Park, but it cost millions then. Now it's been scaled down.
Google 'Pope in a puffer jacket'. That image is a great cautionary tale.
Humans tend to assume that if you see something, it exists, but we've learned that's not true of movies. We're going to have to learn that it's not true of anything, and figure out how to create universes of trusted sources.
Or we're going to screw it all up and everyone's going to live in their own totally bogus information silos.
Optimist, pessimist.