@irvmull While picking a single seed is fine, the result is still random. Using a different seed would get you a different picture (as I demonstrated). There is no one 'Donald Trump' or 'Joe Biden' picture from Stable Diffusion; there are tens of thousands, and a very quick test verifies that different seeds produce wildly different pictures.
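For anyone who wants to reproduce that quick test, here is a minimal sketch assuming Hugging Face's `diffusers` library and one of the publicly hosted v1.5 checkpoints (both are my choices of tooling, not anything official, and the prompt is a placeholder rather than the one under dispute). The same prompt run with a handful of seeds yields visibly different images:

```python
# Minimal sketch, assuming the `diffusers` library and a publicly hosted
# v1.5 checkpoint (my choices, not anything official).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a portrait photo of a politician"  # placeholder, not the disputed prompt

for seed in (0, 1, 42, 1234, 99999):
    generator = torch.Generator("cuda").manual_seed(seed)
    image = pipe(prompt, generator=generator).images[0]
    image.save(f"seed_{seed}.png")  # compare the outputs side by side
```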
The existence of a picture that seems biased within a random range doesn't prove that the collection is biased. To conclusively prove bias, one would need to generate all possible pictures within the random seeding limits and compare the level of bias. Right now, we have the equivalent of walking into a library with ten thousand books, picking up a book at random, pointing to a picture of a person, and claiming that the library overrepresents whoever's picture has been found. Someone else walking in, picking up ten books, and finding ten pictures of other people doesn't disprove the first claim, because the library has such a large number of remaining books that the sample set is too small to have any statistical validity.
In my random subsample (which is statistically meaningless, since the range is far, far larger than the few dozen I generated) I found no biased pictures of either figure. That is proof of nothing, of course (as noted), but it does at least create an equal and opposite counterexample.
Again, my supposition is that the input data is factually skewed, since the algorithm was trained on easily accessible content on the internet over the past few years, and parody pictures of Donald Trump abound. Indeed, to avoid that sort of skew, someone would have needed to look at every training image and remove or retag the parody images. But they would have to do that for every political figure, or else the resulting model would be biased in favor of those whose parody images had been pruned or retagged. The mere use of a potentially skewed training database doesn't mean there's political bias on the part of anyone associated with Stable Diffusion, nor that the dataset was chosen in a biased manner. They chose the largest available inexpensive-to-obtain dataset.
Also, Stable Diffusion is a family of models and an architecture, not a single model. There are at least four different datasets (which produce significantly different results) which could plausibly be called 'official'. There are thousands of third-party models which fall within the family and which anyone could plausibly also call 'Stable Diffusion'. Anyone could train a model to be politically biased by adding a large number of negative images of a particular figure to the training data.
The overall point here is that there are so many possible sources of bias in the process, most of which are not intentionally political, that describing the model as 'politically active' when one particular random seed generates a parody image of one person is an enormous stretch. By that same logic, I could find an image that I perceive as negatively biased towards Donald Trump that has appeared on foxnews.com (there are plenty) and, on that basis, claim that foxnews.com is clearly 'politically active' against Donald Trump.
Note for the moderators: I hope this doesn't cross into 'politics,' notwithstanding that we are discussing politicians and biased images of them. The point of the discussion has nothing to do with 'politics,' except perhaps for the perception of bias. It has to do with understanding the models and what is and is not evidence of bias.
Mind you, it is certainly possible that, in fact, someone on the Stable Diffusion team intentionally trained the model on a large supply of images intended to reflect negatively on Donald Trump (or Joe Biden, or whoever), and that the resulting published model is more likely to generate negative images of that person. The problem is that it's nigh impossible to prove that it happened at all, who did it if it happened, why they did it, or anything else.
But the starting point would most likely be to generate enough images with a specific version of the model, using a fixed prompt and random seeds (I suspect sequential would be fine, but randomized would likely be superior), to create a statistically valid pool of images, inspect that pool for bias, then generate the same number of images of someone else, inspect that pool for bias, and report on the results. That would create at least a reasonable claim of bias, but would still not tell us whether the bias was 'politically active,' nor whether that political activity was on the part of the model trainers, the internet as a whole, or something else.
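To make that concrete, here is a rough sketch of the bookkeeping I have in mind, again assuming the `diffusers` library; the model ID, prompts, and sample size are hypothetical placeholders. The important parts are a fixed prompt per subject, logged random seeds, and equal sample sizes for both figures:

```python
# Rough sketch of the sampling protocol described above; model ID, prompts,
# and N are placeholders. The point is the bookkeeping, not the specifics.
import csv
import random
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

N = 400  # images per subject; pick a size you can defend statistically
prompts = {
    "subject_a": "a photo of Subject A",  # hypothetical placeholders
    "subject_b": "a photo of Subject B",
}

with open("samples.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["subject", "seed", "filename"])
    for subject, prompt in prompts.items():
        for _ in range(N):
            seed = random.randrange(2**32)  # randomized rather than sequential
            generator = torch.Generator("cuda").manual_seed(seed)
            image = pipe(prompt, generator=generator).images[0]
            fname = f"{subject}_{seed}.png"
            image.save(fname)
            writer.writerow([subject, seed, fname])  # seeds logged for reproducibility
```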
It might even be a false positive or negative; anyone who's paid attention to polling knows that a bad sample produces an inaccurate poll, and there is no guarantee that random seeds fed to Stable Diffusion actually produce a similarly representative sample of its potential outputs. Perhaps, for some unknown reason, the negative images collect in a certain range of seed values. If so, a sample including that range would show bias at an enormous rate, while any other sample would show no bias.
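Once both pools have been labelled (ideally by raters who don't know which hypothesis is being tested), the comparison itself is ordinary statistics, e.g. a two-proportion z-test on the fraction of images rated as negative parody. The counts below are invented purely for illustration, and, as just noted, even a significant result wouldn't rule out quirks like negative images clustering in a particular seed range:

```python
# Two-proportion z-test on the fraction of images labelled "negative parody".
# The counts below are made up solely for illustration.
from math import erfc, sqrt

def two_proportion_z(neg_a, n_a, neg_b, n_b):
    p_a, p_b = neg_a / n_a, neg_b / n_b
    p_pool = (neg_a + neg_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided, normal approximation
    return z, p_value

# e.g. 55 of 400 images of A rated negative vs. 38 of 400 for B (invented numbers)
z, p = two_proportion_z(55, 400, 38, 400)
print(f"z = {z:.2f}, p = {p:.3f}")
```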