@Switch Blayde What is "making a copy" that is a violation of copyright law in the internet age?
Every time I read an article, blog post, etc. on my computer, a copy is produced. This is not a violation of copyright law; it is expected. In the case of e-books, for example, my use may be constrained by license agreements and, in some cases, digital rights management software. But the point is, simply making a copy is not, by itself, a copyright violation.
As I understand it, the key issue in copyright challenges to using "works" to train AI is whether that use qualifies as "fair use," and specifically, whether the use is "transformative." My understanding is that these issues are all still unresolved in the various court cases -- there has been no final ruling, and certainly no controlling precedent from the Supreme Court.
However, the claim that "Because fair use is determined on a case-by-case basis, no broad statement can be made about when generative AI qualifies for fair use," seems mostly false as far as the use of materials to train AIs is concerned. The argument for fair use is that the use is transformative -- the process analyzes the material, extracts patterns and principles, then does the same with vast quantities of other material (much of which may not be copyrighted), and the end result is a large language model -- an "AI." I get that this is vastly oversimplified, but I think that is the gist. If this argument ultimately prevails, it will, indeed, be a broad statement that approves of using copyrighted materials to train AI.
And, if AI training is allowed, producing work with AI will not automatically be a copyright violation, even if it mimics the style of a particular artist. I suspect those cases would get into the weeds of "similarity," much like some of the music-similarity cases.
Going back to the training question, I suspect that this is going to come down to two things: the "transformative use" question and relative harm.
What is the balance between the harm to the public -- to society -- of hindering or even stopping development of this technology vs. the harm to any individual copyright holder of allowing it to go forward?
Since the use for training involves only one copy, the loss to the holder is minuscule. The loss to society is potentially huge, and the burden on the companies of compensating copyright holders individually would be huge. (And, of course, the companies and large copyright holders are already negotiating and making licensing agreements to address this issue.)
Of course, the horse is already out of the barn, and whether or not this affects the legal analysis, it will affect the decisions and strategies of the participants, especially the plaintiffs. Some will likely choose not to pursue a long and expensive legal case that would have little to no effect in the real world.
Edit: This is obviously from a United States perspective. While I suspect that what the US does will carry significant weight in shaping the final international framework, it will not necessarily be controlling. See, for example, how laws on privacy, "subversive" materials, etc. differ between countries and require compliance schemes by the big international players.