Forum: Author Hangout

Humorous AI weirdness

Soronel

I decided to give AI a try, ran something I'd already written through it to see how well its summary function works.

It somehow transformed a raw and primal ground-bound dragon mating fight, and the subsequent fucking of the exhausted loser, into "a beautiful mating flight".

Oh, and it completely refused to acknowledge one character being trained to shoot a pistol; not a word about the topic. It did, however, do a good job describing her reaction to watching a rather horrific public execution.

I also did this with a different AI, asking it to summarize some computer code. I will admit everything it came up with was true and accurate, but it also completely missed some very important requirements. If I were grading it, I'd say that one was worth about 75%.

The Outsider

@Soronel

Sounds like it's improving, but it's not there yet... I'll keep trying to coax my muse out of the dark corner she's hiding in before I go that route, though...

Switch Blayde

@Soronel

I'm not a fan of using AI to help my writing, especially critiquing my writing. And especially suggesting how to improve it. I hate the way the AI rewrites it.

I have used AI to come up with a simile. It gives me a bunch of awful ones, but when I keep asking I get a few good ones.

I did use it recently to come up with the title of my just published novel. I originally called it "The Accident." The opening line/paragraph in the novel is "I was an accident." But the names of the hero and heroine are Chance and Charity, respectively. The AI came up with "A Chance for Charity." Perfect title so that's what it is.

And as I said, I use Meta AI to create images for book covers.

Switch Blayde

@Soronel

Humorous AI weirdness

You know, when I first saw this thread I read the capital "I" in "AI" as a lower case "L" ("l") and thought it was going to be about the comedian Weird Al Yankovic.

Soronel

@Soronel

I also keep seeing pieces saying that computer programmers need to be worried, but every time I have tried AI in that arena the results are so far from what I actually wanted that it just left me scratching my head.

Replies:   REP
REP
Updated:

@Soronel

it just left me scratching my head.

Yeah, but why are you scratching your head?

An AI is fed data by humans to create a database. The human-created database includes fictional stories to define real life and human emotions. The AI's programming starts with the assumption that all of the data in the database is complete and valid, which is a false assumption. The AI then tries to merge valid and invalid data into a summation that matches real life. The invalid data introduces errors into the summation. Since AIs cannot distinguish valid data from invalid data without human intervention, their summations will always be corrupted.

A human can discriminate between valid and invalid data and can reject invalid or incomplete data when creating their summation. Therefore, a human's summation is more accurate to life than what an AI will create.

Using an AI's output is okay if you are using the AI's summation as the input for a story, or similar use, for you can take the AI's summation and modify it to create a story that is close to real life.

I have read comments in this forum about AIs adding to their database from external sources. If that is true, the question for me is - since an AI can't distinguish between valid and invalid data, is the data being added to the database increasing or decreasing the degree of database corruption?

A human has experienced real life and real emotions. Putting those experiences into words does not provide an accurate description of the personal impact of pain, loss, joy, and other emotions on a person. Therefore, inputting a written description of the emotional impact on a person to an AI does not provide the AI with accurate information about the impact the emotions have on a person.

If you are scratching your head because an AI's output is not close to the real world, just remember the saying - Garbage In, Garbage Out. If you are scratching your head at the idea of an AI becoming a threat to someone's job, ask yourself who is writing the article you are reading. The article's writer is obviously a proponent of AI technology, or possibly, someone using an AI to write their article; if so, the person defined what the article was to say, so Garbage In, Garbage Out.

Replies:   awnlee jawking
awnlee jawking

@REP

The human-created database includes fictional stories to define real life and human emotions.

One of the stories I'm following on SOL has AI generated scenes. I've googled unique sentences from the story and found exact matches on social media.

I think it's likely the AI in question (identity unknown) was trained on social media. And we all know how accurate that is.

(The story contains the usual AI bloopers - character names wrong, timeline inconsistencies between scenes etc).

AJ

Replies:   Paladin_HGWT
Paladin_HGWT

@awnlee jawking

The story contains the usual AI bloopers - character names wrong, timeline inconsistencies between scenes etc.

Humans, and not just on SoL, have done all of those things too...

I haven't read any AI-cribbed stories or listened to AI video "shorts", just a few bits, and I am appalled by the foibles and completely wrong mess generated by AI.

What little I have seen produced by AI would, at best, get a 4 out of 10 rating, and I feel I am being generous.

That said, I do use the thesaurus in LibreOffice, but that is just a sometimes substitute for using one of several print thesauruses on my shelf, among other books containing various similes and alternative words.

I have not run any of my writing through anything other than a spelling (and pseudo-grammar) checker. I have considered using Scrivener, or similar writing programs, to review my writing, because my self-editing is not up to the standards I would prefer to achieve.

Lacking a proofreader, let alone a competent Editor, I am considering alternatives.

Even when I had a proofreader I didn't always accept all of their suggestions. In fact, I probably rejected more than half of the suggestions, excluding corrections of misspelled or wrong words ("their" instead of "there", etc.).

It is my story, NOT another's story. I don't want any unintended misspellings or bad grammar.

I do think there is a significant difference between running Your writing through an AI and then You considering making changes, compared to a person feeding prompts into an AI which then "writes" a story (garbage)...

Replies:   awnlee jawking
awnlee jawking

@Paladin_HGWT

compared to a person feeding prompts into an AI which then "writes" a story (garbage)...

I think that's exactly what the author is doing, only the readers rate it very far from garbage despite the faults.

AJ

TheDarkKnight

@Soronel

I'm not interested in trying to use AI (or A1 as Linda McMahon, our Secretary of Education calls it) to help me. Maybe I'm just masochistic, but I enjoy the challenge of writing, even when it's hard.

Soronel

@Soronel

I realized I forgot the bit I found most humorous. The AI's summation said the dragons' mating flight "shook the very ground". Now, the fight/mating in my story did shake the ground, but not their flying away afterwards.

fohjoffs

@Soronel

Yep, I have used LLM machines to help me with WTF moments when I have been baffled by others' code. This has been (somewhat) effective.

The other recent use has been for antenna design. NASA was using 'AI' to do antenna design almost 30 years ago. I use it because I am not an RF engineer, so other than generating empirical data, I cannot verify the solution.

The moral of the story is that if you use an LLM-based engine to do work outside of your education and experience, you may be swimming in shark-infested waters with one hand on the keyboard.

I can generate mixed metaphors all day long.

irvmull
Updated:

@Soronel

"Garbage in, garbage out" is understandable.

However, what happens when the input is simple and known to be accurate? AI still gets it wrong. Very wrong.

I have made requests like: "Using the latest GTK4 official online documentation, create a list of all properties for a gtk4 window widget".

I get back a list with only a few items that are actually part of the GTK documentation, and lots that just seem to be dreamed up. Many properties that are listed on the documentation web page fail to make it onto the AI-generated list as well.

Worse, when I point out the errors and omissions, the AI offers a new "corrected" and "accurate" list, which is usually even worse than the first try.

It's worse than useless for something like that.
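For what it's worth, you don't need an AI for this job at all: GTK can be asked directly. Here's a minimal PyGObject sketch (assuming Python 3, PyGObject, and GTK4 are installed; treat it as illustrative, not gospel) that dumps the real property list:

    import gi
    gi.require_version("Gtk", "4.0")
    from gi.repository import Gtk, GObject

    # Ask GTK itself for every property registered on Gtk.Window,
    # including inherited ones. No AI, so no dreamed-up entries.
    for pspec in Gtk.Window.list_properties():
        print(f"{pspec.name}: {GObject.type_name(pspec.value_type)}")

Run that once and you have the authoritative list the AI kept failing to produce.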

Replies:   Grey Wolf
Grey Wolf

@irvmull

This is partly why I'm not at all certain that the lawsuits over AI as (output-wise) a 'copyright infringer' are all that likely to succeed. Yes, with the proper prompt, on certain AIs, you can reproduce certain sequences from books with a moderately high chance of success.

Mind you, for one of the ones tested ('Harry Potter and the Sorcerer's Stone') the model could produce 50-token sequences from 42% of the book with a 50% chance of success. That sounds big, right? But that means it can't even get 50-token sequences from 58% of the book, and half of the ones in the 42% it does 'know' are still wrong; in other words, only about 0.42 × 0.5 ≈ 21% of the book's 50-token spans come out verbatim. That's not all that great. If model trainers should be on the hook for damages because their models could be used to make really lousy and often incorrect copies of books, it sure seems like photocopier makers should be out of business forthwith, since their products can be used to make very good and accurate copies of books with a whole lot less effort.

AIs have a fair bit of utility for some jobs, and they can do extremely well with constrained data sets and careful tuning. If you were to have an AI pointed to an indexed version of the GTK4 documentation, it would do enormously better while also being better at search than most anything else. But, if you're just counting on general training, it'll fall on its face a large percentage of the time. Software API documentation 'looks like' software API documentation, so it's hardly surprising that it might flop around quoting things from other, similar-in-style documentation sets.

Replies:   irvmull  julka
irvmull

@Grey Wolf

If you were to have an AI pointed to an indexed version of the GTK4 documentation, it would do enormously better while also being better at search than most anything else.

And yet, if you read my post, that is exactly what I did: I instructed it to use the official indexed documentation. And it got it enormously wrong. A 10-year-old with a pencil and paper could look at the web page (only one page is needed; all the info is right there in a nice table) and write it down in less time than it took to get imaginary results from the AI.

Replies:   Grey Wolf
Grey Wolf
Updated:

@irvmull

That's not something most production AIs can do all that easily, as far as I know (maybe there are exceptions). It's not going to look up the documentation and refer to it. You need to use something like RAG (retrieval-augmented generation), or put the documentation within its context buffer, to get it to do that. You might have instructed it to do that, but that's not within its capabilities.

There are exceptions - some AIs have limited RAG capability built in - but most of them cannot refer outside of their model 'on the fly', and simply telling it to 'go look at the webpage' likely won't accomplish what you want.
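To illustrate, here's a minimal sketch of the 'documentation in the context buffer' approach (llm_complete is a hypothetical stand-in for whatever completion call your provider actually offers, and the prompt wording is just an example):

    # Hand the documentation to the model inside the prompt, so it can
    # quote it, instead of hoping the answer is baked into its weights.
    def build_prompt(question: str, doc_text: str) -> str:
        return (
            "Answer using ONLY the reference text below. "
            "If the answer is not there, say so.\n\n"
            "--- REFERENCE ---\n" + doc_text + "\n--- END REFERENCE ---\n\n"
            "Question: " + question
        )

    def ask_with_docs(question: str, doc_path: str, llm_complete) -> str:
        # doc_path: e.g. a saved copy of the GTK4 Gtk.Window documentation page
        with open(doc_path, encoding="utf-8") as f:
            doc_text = f.read()
        return llm_complete(build_prompt(question, doc_text))

A RAG pipeline does the same thing, except it first searches an index and pastes in only the relevant chunks. Either way the documentation has to be handed to the model; merely telling it to 'use the official docs' changes nothing.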

julka

@Grey Wolf

it sure seems like photocopier makers should be out of business forthwith, since their products can be used to make very good and accurate copies of books with a whole lot less effort.

I think you'll find (and I admit I'm not a lawyer, so I'm going out on a little bit of a limb here, but I'm fairly confident) that using a photocopier to make a copy of a book you don't own the rights to, which you then go and sell to other people, is in fact also copyright infringement.

I'm glad to see you've come around on the idea that selling access to works you don't own the rights to is in fact against the law, even if you do it in a novel (ha!) way.

Replies:   Dominions Son  Grey Wolf
Dominions Son

@julka

using a photocopier to make a copy of a book you don't own the rights to, which you then go and sell to other people, is in fact also copyright infringement.

True, but that liability is on the user of the photocopier, not the manufacturer.

Replies:   julka
julka

@Dominions Son

"User" is a broad and imprecise term; let's be more precise and say the liability is on the individual who produced and distributed the copies.

Now, if you have a magic box where somebody can walk up and flip a coin and if they flip heads, they can open the box and take out a free copy of the book, do you think liability is on the person flipping the coin or the person who filled the box with infinite copies of the book?

Dominions Son

@julka

Now, if you have a magic box where somebody can walk up and flip a coin and if they flip heads, they can open the box and take out a free copy of the book, do you think liability is on the person flipping the coin or the person who filled the box with infinite copies of the book?

This is in no way an accurate representation of how LLMs work.

Replies:   julka
julka

@Dominions Son

Around two or three weeks ago I was seeing conversations with Grok where people would ask for, and receive, the full text of the first chapter of Harry Potter and the Sorcerer's Stone. My representation of ChatGPT/Grok/other commercially popular general-purpose LLMs as "a magic box that you flip coins at to get a book" is inaccurate for its simplicity, but mostly because I've discarded all the other non-copyrighted material that happens to be in the box as well.

Replies:   Grey Wolf
Grey Wolf

@julka

Grok can indeed do that, because it knows where to find a copy of a text file with the text of that chapter. The reason it knows is that there are hundreds of such files out there on the net. The liability for the infringement goes to the person who originally published the text, not Grok - it's just pointing you to web resources.

Again, the public library can also produce the first chapter of Harry Potter and the Sorcerer's Stone at will. So can Google Chrome, by the same mechanism Grok is using (referring to a text file). The library, and Chrome, are at least as much infringing as Grok is.

akarge

@julka

Now, if you have a magic box where somebody can walk up and flip a coin and if they flip heads, they can open the box and take out a free copy of the book, do you think liability is on the person flipping the coin or the person who filled the box with infinite copies of the book?

Obviously, it is the fault of whoever minted the coin.

Replies:   julka  AmigaClone
julka

@akarge

Agreed; all our problems are actually caused by capitalism at their roots, but I think people here tend to get mad when you say that.

Replies:   awnlee jawking
awnlee jawking
Updated:

@julka

all our problems are actually caused by capitalism

When writing letters by hand, I capitalise names and addresses. Therefore I'm a capitalist and I cause all the world's problems ;-)

If you don't count door frames (and I don't), there are no arches on my property so I'm also an anarchist.

My garden is nature-friendly, particularly for birds and bees (I think bumblebees are great because they don't know they're too heavy to fly). You could call that communing with nature, making me a communist too.

AJ

AmigaClone

@akarge

Actually, it's the fault of the person who provided the finances for the entity that purchased the refined metal to mint the coin.

Grey Wolf

@julka

Now, if you have a magic box where somebody can walk up and flip a coin and if they flip heads, they can open the box and take out a free copy of the book, do you think liability is on the person flipping the coin or the person who filled the box with infinite copies of the book?

The user of the box. Your 'magic box' is called a public library with a photocopier. Anyone can walk into a public library, photocopy a book, and walk out with as many copies as they want. The liability for the infringement is upon the person making the copies, not the library, notwithstanding that the library has provided everything necessary to generate copies of thousands of copyrighted works.

If the library is not liable, surely the AI producer cannot be. The library offers the ability to make perfect copies. The AI producer is merely offering the ability to make wildly imperfect copies with probabilistic behavior. How could one possibly believe the AI was more infringing than the library?

Grey Wolf
Updated:

@julka

using a photocopier to make a copy of a book you don't own the rights to, which you then go and sell to other people, is in fact also copyright infringement.

Yes, exactly. And, if AI makers should be liable for that infringement, so should copier manufacturers.

But, in fact, it is the user of the copier who is held liable, just as it should be the user of the AI who is held liable, not the provider of the AI.

I'm glad to see you've come around on the idea that selling access to works you don't own the rights to is in fact against the law

I never said otherwise. Current AIs do not 'sell access to works [the AI operator doesn't] own the rights to' - that was the point. At most, the AI might sell access to some small percentage of the work, and that's generally considered legal within US law, especially since the level of access is probabilistic.

(Minor edit) - see below - unless the AI is 'selling access' to the results of a web search which finds the infringing material. But, by that standard, Google Chrome is a far bigger infringement problem than AIs are.

jimq2
Updated:

@Soronel

Try googling "first chapter of Harry Potter and the Sorcerer's Stone" and see what you get. Google's AI gives multiple links to a full copy of the actual chapter, including one at SDSU (San Diego State University).

Some contain a lot more than the first chapter.

Replies:   julka
julka

@jimq2

I'll go out on a limb and assume this is actually directed at me and not Soronel, since I'm the only person in the thread who's brought up Harry Potter - sincere apologies if you were actually talking to him!

At the time that I observed the behavior, Grok had been configured not to search the web to find an answer, so it nominally should have been relying only on its training data.

Now, is the first chapter of Harry Potter available broadly on the internet? Sure. It's also available in book stores across the globe. I assume, although I have not verified, that at least some of those reproductions of the chapter are in some way licensed by the publisher, and honestly it's probably a safe assumption that some of them weren't.

That said, it's not super relevant to the point I'm making - the material itself is copyrighted, the copyright doesn't go away just because you can find it in various places online, and a computer program which spits out copyrighted material it has consumed, on demand, even probabilistically, is infringing that copyright unless it has licensed the material for distribution.

Replies:   Grey Wolf
Grey Wolf
Updated:

@julka

At the time that I observed the behavior, Grok had been configured not to search the web to find an answer, so it nominally should have been relying only on its training data.

Grok's training data includes quite a bit of web searches. If you look at the output, it's clearly just referencing a text file that was indexed as part of its training data.

A bit more experimentation with Grok actually produces output making it crystal clear that it's sourcing the text of Harry Potter and the Sorcerer's Stone from live web pages, not from within its training data. It's acting as an advanced web browser, in other words.

a computer program which spits out copyrighted material it has consumed, on demand, even probabilistically, is infringing that copyright unless it has licensed the material for distribution

And a photocopier which spits out copyrighted material it has consumed, on demand, is infringing that copyright unless that material is licensed for distribution.

And a web browser which spits out copyrighted material it has consumed, on demand, is infringing that copyright unless that material is licensed for distribution.

In both cases, the liability falls on the user of the tool. Some liability might also fall on the person who improperly published the copyrighted material, if they did not have a license to do so.

In neither case does the liability fall on the maker of the tool. Why should an AI be held to a different standard than other tools that can infringe copyright? We've already been through this with photocopiers, personal computers, and VCRs, among other tools. There's an enormous amount of legal history built up holding that it's the user of a tool that can be used to infringe copyright who bears the legal responsibility for such infringing, not the maker of the tool, unless the tool has no legal use except to infringe. But AIs have numerous legal uses that do not infringe, so that standard wouldn't apply.

Replies:   julka
julka
Updated:

@Grey Wolf

Gonna condense a few threads in to one here for ease of reading and my own sanity.

Your 'magic box' is called a public library with a photocopier.

Well, not quite - for one thing, public libraries have licensed the works they loan out and users have a temporary license in turn. In that case, copying the work would be infringement committed by the user. An LLM did not license the work, and so when the LLM provides a copy of it, it's infringing by itself. The user who requests may also be infringing, but that's not a super interesting question to me and I feel like we should probably hold multi-billion dollar companies to a higher standard than an internet rando anyways.

Grok's training data includes quite a bit of web searches.

I mentioned this elsewhere in the thread, so you could have easily missed it; at the time I saw the behavior, Grok was explicitly configured to NOT perform web searches. If your point here is that Grok went out and searched for copyrighted data and then included that in its training, and then returned that data on request, then yes; I agree, that's exactly my point. That's a bad thing!

Regardless, though, Grok is definitely returning copyrighted information beyond just "Chapter One of Harry Potter" - you can see Ed Newton-Rex's twitter post[1] where he shows Grok returning a variety of copyrighted data, ranging from images generated with copyrighted characters (Homer Simpson, Marvel Studios' Iron Man) to Harry Potter to a recent New York Times article. Can you find all of those things through a browser? Yes, absolutely. That doesn't make Grok a browser - just because some other tool can be used to do a bad thing does not mean the makers of the LLM have abdicated the responsibility they hold for the tool they made.

Why should an AI be held to a different standard than other tools that can infringe copyright?

Because copyrighted material wasn't consumed in the production of a web browser, and it was in the training of the LLM! The circumstances are different, and so the standards differ as a result. If, for example, a photocopier shipped with a PDF copy of John Grisham's legal thriller "The Firm" and the test-print button ran off a full copy of it, that would be copyright infringement by the maker of the photocopier, even if you also want to point the finger at whoever presses the test-print button.

[1]: https://x.com/ednewtonrex/status/1942263535364686163

Editing periodically for cleanup - the original post was being written while my infant was in the beginning stages of A Sad, and now he's napping on my chest; neither circumstance is especially great for perfect writing and formatting.

Replies:   Grey Wolf
Grey Wolf

@julka

Well, not quite - for one thing, public libraries have licensed the works they loan out and users have a temporary license in turn. In that case, copying the work would be infringement committed by the user. An LLM did not license the work, and so when the LLM provides a copy of it, it's infringing by itself.

This is seriously incorrect as to how libraries work. Libraries buy books. The people who trained the AI bought books. If either doesn't buy the book, that's an infringement. If they did, there's no infringement. Reading a lawfully obtained book is fine, whether that reading is done by a human or a computer.

I mentioned this elsewhere in the thread, so you could have easily missed it; at the time I saw the behavior, Grok was explicitly configured to NOT perform web searches.

But it did it anyway. Grok is very clear about where the data is coming from. Yes, it's a bad thing that it's ignoring the 'don't do web searches' flag (which no longer exists), but it doesn't create copyright infringement.

If Grok was merely told to not search the web, that likely has as much power as telling Grok to search the infinite cosmos. Telling AIs to do / not do things that their programming doesn't support is doomed to failure.

But I think you're missing the point. Grok didn't do a web search. It was trained to 'know' that chapters of Harry Potter can be found at link X, Y, Z. There's no 'search' required. It's just reporting what it was trained to report.

That doesn't make grok a browser - just because some other tool can be used to do a bad thing does not mean the makers of the LLM have abdicated the responsibility they hold for the tool they made

What 'responsibility' do they have to produce less copyrighted material than Chrome produces? From where does this responsibility flow? It's not in US law.

Because copyrighted material wasn't consumed in the production of a web browser, and it was in the training of the LLM!

If the copyrighted material was legally obtained, using it to train an LLM is also legal. There is already case law to that effect, and it's entirely consistent with how Fair Use has been interpreted for decades.

The user of a tool is responsible for infringement, not the maker of the tool. That's been the law for decades. There is no reason to turn that on its ear just because of a new technology.

Replies:   julka
julka
Updated:

@Grey Wolf

The people who trained the AI bought books.

They did not. Meta, for example, used pirated material from Libgen [1].

What 'responsibility' do they have to produce less copyrighted material than Chrome produces?

Interesting choice of words! Chrome as a browser absolutely has a responsibility to not produce copyrighted data! When Chrome retrieves data from a server, it's not producing anything; just retrieving and rendering. An LLM is absolutely either producing copyrighted material or retrieving copyrighted material it was trained on without license. The standards are different because fundamentally a different thing is happening.

The user of a tool is responsible for infringement, not the maker of the tool. That's been the law for decades. There is no reason to turn that on its ear just because of a new technology.

Mmm, not sure that's the case? If you start selling a Raspberry Pi with an SNES emulator on it, loaded up with a few hundred ROMs of various games, I'm fairly confident you can get nailed for infringement even if you yourself never turn on one of your devices.

[1]: https://www.transparencycoalition.ai/news/so-meta-pirated-your-books-and-articles-heres-what-you-can-do

Replies:   Grey Wolf
Grey Wolf

@julka

They did not. Meta, for example, used pirated material from Libgen [1].

In which case, they should be held liable for obtaining the pirated material. No argument there. Other AI models are known to have not used pirated material (there have already been court cases to that effect). And, if the material was pirated, the legal recourse is exactly the same as if you went to a library and photocopied a book. The act of copying is the infringement. What use the copy is put to, if any, is entirely irrelevant to the infringement - it's not 'more legal' or 'less legal' because it was used for training.

When Chrome retrieves data from a server, it's not producing anything; just retrieving and rendering.

And when Grok retrieves data from a server, it's not producing anything, either.

But, note - your argument there would say that, if the AI companies were able to 'retrieve and render' something, then train based on it, there would be no infringement. The actual training is Fair Use, after all.

So, if they merely 'retrieved and rendered' something from Libgen, that would be fine based on your own view of the legality of Chrome. If they retained it, though, that would amount to creating an illegal copy (and, yes, they may well have retained it and should be held liable for doing so).

An LLM is absolutely either producing copyrighted material or retrieving copyrighted material it was trained on without license.

That is factually incorrect. It may be producing noncopyrighted material (an enormous amount of input training data is not under copyright), it may be retrieving data from a public web server (for instance, Grok, in the case of Harry Potter), or it may be retrieving copyrighted material it was legally trained on. Each of those is entirely legal.

Or consider Google itself, rather than Google Chrome. A Google image search will produce a mountain of copyrighted material quickly, right there on your screen. Is Google liable for infringing e.g. Disney's copyright because you can search for and find pictures of Iron Man, right there in your browser? Are you? Or is the person who posted the picture of Iron Man (in violation of Disney's copyright) liable?

Mmm, not sure that's the case? If you start selling a Raspberry Pi with an SNES emulator on it, loaded up with a few hundred ROMs of various games, I'm fairly confident you can get nailed for infringement even if you yourself never turn on one of your devices.

In that case, the person is selling copyrighted material directly. None of the AI companies are selling copyrighted material; they are, at most, selling a set of weights which are transformative works based on copyrighted material - and that is if you have to pay for the model itself and then download it, not use the tool. So, the analogy thoroughly fails.

But I will also agree with you. If e.g. the makers of Grok, ChatGPT, etc were to say 'Hey, pay for our AI and you can produce copies of Harry Potter, Iron Man, and other copyrighted works!' that might well be legally actionable. But that doesn't appear to be something they've ever done. They're not in the business of selling access to copyrighted works. I doubt anyone at all has ever subscribed to any of the AIs with the goal of obtaining copies of books or other copyrighted material - especially since there are far easier ways of obtaining that same material via normal web searches with no AI involved, and you don't have to wonder whether the AI will give you an incorrect copy.

Replies:   julka
julka
Updated:

@Grey Wolf

A Google image search will produce a mountain of copyrighted material quickly, right there on your screen.

Okay. You know what it's not doing? Producing new images using copyrighted material, as we can see Grok doing in the twitter post I linked upthread.

the makers of Grok, ChatGPT, etc were to say 'Hey, pay for our AI and you can produce copies of Harry Potter, Iron Man, and other copyrighted works!' that might well be legally actionable.

It's not necessary to make the statement, just to do the thing. If you sell access to "a video server with home videos on it" and the server also happens to have copies of recent Hollywood movies, you're infringing despite only advertising that you have home movies. If the "transformative set of weights" can reproduce the copyrighted material (and we have seen that it can), then I fail to see how the LLM isn't generating copyrighted material; it's literally outputting it.

edit:

So, if they merely 'retrieved and rendered' something from Libgen, that would be fine based on your own view of the legality of Chrome.

Whoa, hang on - you're doing something very, very sneaky here. Sometimes you're talking about "they" as in the tool and sometimes you're talking about "they" as in the developers and then you're equating two non-equivalent things. The developers of Chrome are not held liable for what Chrome is used to render; the developers of e.g. Llama or OpenAI are absolutely liable for what they use to train Llama, and should be liable for what Llama produces as a result of that training. Remember, copyrighted content wasn't consumed in the production of Chrome.

Replies:   Grey Wolf
Grey Wolf

@julka

Sometimes you're talking about "they" as in the tool and sometimes you're talking about "they" as in the developers and then you're equating two non-equivalent things.

I am talking about 'they' as the developers of the AI model. There are multiple 'tools' here. The 'tool,' in this case, is the tool used to train the AI. By your argument, the developers of the training tool and of the resulting AI are not liable for using anything they could 'retrieve and render'. If Chrome can do it legally, even without copyright permission, so can the training tool.

the developers of e.g. Llama or OpenAI are absolutely liable for what they use to train Llama

I partially agree with you, but the point here is that your own argument does not agree with you. If 'retrieve and render' is legal, 'retrieve and render' is legal, whether that is for display or training purposes. And that is consistent with case law as well: transiently referencing things that can be downloaded (whether or not the material was placed on the internet legally) for training or display purposes seems to be legal, but downloading the material and holding onto it does not seem to be legal. But that is the infringement, right there: downloading and retaining copyrighted material without authorization. What use, if any, that material is put to is entirely irrelevant to the infringement - the infringement is retaining it. If the material is used in otherwise legal ways (e.g. training an AI), there is no additional infringement.

and should be liable for what Llama produces as a result of that training

Why, when that contradicts decades of case law and is inconsistent with US Fair Use doctrine? That would make the developers of a photocopier liable for what the photocopier outputs, the developers of a VCR liable for what the VCR outputs, and so forth.

Yes, I know, 'the photocopier was not trained.' That's irrelevant, since training itself is already determined to be legal even when copyrighted material is used. Why shouldn't this be regulated with respect to the user, not the tool, as every other copyright-infringing technology is regulated? Why should we flip precedent on its head and say that, in this case only, developing a tool using legal means, consistent with the Fair Use doctrine, is nonetheless legally actionable because there is a possibility that some user will use it to infringe copyright?

It is almost unquestionably the case that a far higher percentage of VCRs were used to store and retain copyrighted material than AIs are used to generate (much less retain) material in violation of copyright. It seems extremely unlikely that the majority of uses of any AI involve requesting pages from Harry Potter (or any other book) instead of just going to Chrome and downloading them there. If VCRs were 'Fair Use' - notwithstanding that they were unquestionably used to copy massive amounts of copyrighted material and were marketed as tools to do so - why in the world would AIs not be?

Remember, copyrighted content wasn't consumed in the production of Chrome.

That's both irrelevant and unknown. It's unknown (at least in my opinion) because I have no way of knowing whether any developer of Chrome ever downloaded and retained copyrighted material to use for testing Chrome. I could tell you stories about non-AI software products developed by Fortune 500 companies in which the datasets used to test those products consisted of illegally obtained copyrighted material. Are those products illegal?

It's irrelevant because it's legal to use legally obtained copyrighted content to train AIs. There is already case law to that effect. It's also consistent with Fair Use doctrine. Merely using copyrighted content to train the AI does not make the AI model infringing by nature. The only way to judge infringement is on the output, and the output is requested by the user, not the developer.

And, stepping back, the purpose of copyright (in the US) is 'to promote the progress of science and the useful arts.' Barring the use of lawfully obtained copyrighted material in training AI models clearly hinders the progress of science and the useful arts. Keeping those goals ('promot[ing] the progress of science and the useful arts' and preserving the copyright holder's interest in benefiting from their creativity) in balance is partly why the US has the Fair Use doctrine, and it's why courts have ruled that using such material in training is legal.

Replies:   julka
julka

@Grey Wolf

By your argument, the developers of the training tool and of the resulting AI are not liable for using anything they could 'retrieve and render'. If Chrome can do it legally, even without copyright permission, so can the training tool.

If you're going to apply my argument in a different circumstance, at least apply it in the same way. When you take an argument I make about Chrome The Browser and try to use it to draw conclusions about OpenAI The Developers, it's intellectually dishonest of you and it makes me deeply uninterested in continuing this conversation. Be better than that - you know that computer programs aren't people, so if you're going to argue they should be treated the same, you should back that up with your own words instead of jamming them into mine.

Nothing else you write addresses my core point of "LLMs are trained using copyrighted data and can also produce that copyrighted data on demand", and no matter how many times you argue that copyrighted data can be copied in many different ways, you still don't fundamentally explain why it's okay for a piece of software to a) use stolen data and b) reproduce that stolen data on demand. And now that you're just inventing shit about what I'm saying, I'm pretty done engaging with you on this.

Replies:   Grey Wolf
Grey Wolf

@julka

When you take an argument I make about Chrome The Browser and try to use it to draw conclusions about OpenAI The Developers, it's intellectually dishonest of you and it makes me deeply uninterested in continuing this conversation.

You're twisting my argument into a pretzel, so I'll make the same request: at least apply it in the same way. I'm arguing that the tools - Chrome, and the training software used by e.g. OpenAI - have an obvious parallel. I'm not referring to the developers (human beings) at all. How does a human being 'retrieve and render' anything, exactly? How could that even be an argument? What made you think I was referring to human beings (the developers) and not the tools they were using?

Your refusal to acknowledge an obvious parallel in behavior by tools is intellectually dishonest. 'Retrieve and render' is the same thing regardless of whether it's used for display or training. If you disagree, actually give a reasoned argument.

LLMs are trained using copyrighted data and can also produce that copyrighted data on demand

That's factually incorrect in the way you're stating it, and I've already addressed it repeatedly. LLMs are partly trained using copyrighted data, partly using non-copyrighted data. They can produce some probabilistic subset of that copyrighted data on demand. Sometimes that subset is as high as perhaps 10-15% of a document correct in short stretches (excepting 'I learned a link to this thing, and I'll regurgitate what's at that link for you' - Google Search can do that, and it's not infringing when it does so). Sometimes it's tenths of a percent.

Your phrasing is intellectually dishonest. If you had phrased it as 'LLMs are partly trained using copyrighted data and can also produce some of that copyrighted data upon request, but often garble it,' that would be much more honest. It would obviously make it hard for you to argue a case as to why LLMs require extraordinary legal scrutiny when things that reliably produce exact copies of copyrighted data are widely available and do not face such scrutiny, but it would at least be honest.

you still don't fundamentally explain why it's okay for a piece of software to a) use stolen data

Never said it was. The use of stolen data in training is a copyright violation and can be prosecuted as such. The model doesn't 'use stolen data', though. Training the LLM on copyrighted data is legal regardless of whether the data was obtained legally or not - that has already been determined in court. If you're trying to split hairs as to whether 'legal' means 'okay', then that's fine, but don't pretend it's not legal. And my argument as to why it's 'okay' is based both on your own 'retrieve and render' argument (ephemeral use of copyrighted data appears to be legal if no copy is retained) and upon the Fair Use element of using something for a transformative purpose (training an LLM is nearly the definition of a transformative purpose). You still haven't made a case as to why it's 'okay' to massively hamstring progress because of the mere possibility that some users will ask for copyrighted material to be produced, especially since that very same copyrighted material is widely and trivially available in the absence of LLMs.

b) reproduce that stolen data on demand

That is a function of the user. A photocopier can do the same thing. So can the VCR. There's nothing particularly interesting about an LLM doing it, and both of them are much better at that task. Using an LLM to reproduce Harry Potter is crazy - it leaves you the task of going through it word for word and looking for the incorrect passages (which, according to research, will be somewhere between 58% and 79% of the 50-token sequences in the resulting output). It actually is impressive that it gets that close, but as tools for reproducing copyrighted text, LLMs are miserably bad.

You're basically arguing that, because this specific thing could be used for copyright violations, it should fall into some new and novel area of law, notwithstanding that devices actually used in widespread ways to copy copyrighted materials do not fall into that area of law. That sort of claim is extraordinary, but you have no particular argument in favor of it except that 'it's trained on copyrighted data!' But that's legal.

now that you're just inventing shit about what I'm saying

No, I'm not. You're just refusing to read what I'm writing and replying to it as if I said something I was not saying. Meanwhile, you 'invent shit about what I'm saying' repeatedly. Pot, meet kettle.

Replies:   julka
julka
Updated:

@Grey Wolf

I'm not referring to the developers (human beings) at all. How does a human being 'retrieve and render' anything, exactly? How could that even be an argument? What made you think I was referring to human beings (the developers) and not the tools they were using?

I thought that because you said

So, if they merely 'retrieved and rendered' something from Libgen, that would be fine based on your own view of the legality of Chrome.

And then followed it up with

I am talking about 'they' as the developers of the AI model.

I agree it's a bad argument, that's why I thought it was stupid that you made it. If you want to accuse me of making stuff up because i pointed out your bad argument, consider not making bad arguments.

Edit: okay, fine, I'll try one more time.

There are multiple forms of copyright infringement. One way to do it is by using a copyrighted work in a non-permitted way; as you have observed, courts ruled that training an LLM is fair use, so that doesn't apply here.

Another way to infringe copyright is by distributing the work without permission. When a photocopier makes a copy of a book, it's not distributing that copy; that's done by some other entity. Similarly, a web browser doesn't distribute the data it renders, even if that data is copyrighted; the distribution is being done by the server serving the data (and, by extension, whoever uploaded the data to the server and made it available).

When an LLM reproduces copyrighted works, I am viewing that as a distribution of that copyrighted work, in much the same way as a web server configured to provide it. The mechanism of that distribution is different, but I don't feel that it is meaningful to the ultimate outcome of "receiving a copyrighted work that was distributed by somebody who does not have the rights to distribute it". In this way, both parties are at fault - the one who requested the work, and the one who provided it. We can see this sort of precedent in individuals who were fined for copyright infringement after e.g. operating a torrent tracker - they made copyrighted works available on request, and that's not allowed.

And since the server does not do anything by itself, blame should travel up to whoever configured it and made the copyrighted works available; since the works are available as a result of having been included in the training data, it is that inclusion (and whoever made the call to include it) that I feel creates the liability for the infringement. The training was fair use, but the outcome of the training, in terms of reproducing and distributing copyrighted works, is not fair use - you don't get to break the law and argue that it's fine because the steps you took to break the law were, by themselves, legal.

Replies:   Grey Wolf
Grey Wolf

@julka

When an LLM reproduces copyrighted works, I am viewing that as a distribution of that copyrighted work

That makes no sense to me. The user of the AI does any 'distributing' that happens.

By analogy, opening a book in a library is just as much 'distributing' a copyrighted work as an AI outputting something. The user of the book / AI determines whether there's actual distribution. The owner / provider of the book / AI does not. Thus, liability flows to the user, not the owner / provider.

Similarly, a web browser doesn't distribute the data it renders

It 'distributes' it to exactly the same level as the AI does. The web browser renders work (copyrighted or not) on some sort of output device. So does the AI. They 'distribute' to the exact same level. The data is in the model or on the server serving the data; the output is 'distributed' to the same extent.

In this way, both parties are at fault - the one who requested the work, and the one who provided it.

There is no illegality on the part of the AI maker. They (legally) take a mix of copyrighted and non-copyrighted works and create a new transformative work based on those works. That new transformative work (the model) may, at times, produce some subset of the copyrighted works, but there's no reason to believe that's illegal. Providing a subset of a copyrighted work is legal and commonly done, after all.

Again: there are far more infringing sorts of technology, and they don't fall under such new and novel legal scrutiny. Your torrent example is perfect. Why should torrenting software be 'just fine' but AIs be under some wildly higher level of scrutiny?

The training was fair use, but the outcome of the training, in terms of reproducing and distributing copyrighted works, is not fair use

This makes no sense from a practical standpoint and is contradictory to the purposes of the US copyright system. What sense is there in the argument that 'You can legally build this very cool thing, and you can legally use these inputs to build it. That's fine! But use the thing? That's illegal!'?

It's not going to help your case to claim that the tool 'contains' copyrighted information, by the way. A library 'contains' an enormous amount of copyrighted information, yet no one argues that libraries should be sued out of existence if they also provide photocopiers, or if librarians don't constantly oversee every user of the library to make sure those users aren't writing out the content of the books. If 'providing' an AI, the use of which might produce copyrighted material, should be strictly regulated, a library should be under far higher scrutiny, since nearly everything it 'provides' is copyrighted material.

Suppose 100% of the AI's output is copyrighted. That's exactly the same as the output of someone pulling a book off the shelf and looking at a page of it. The user determines whether that material is further distributed or not, not the provider of the book or the provider of the AI output.

Meanwhile, the library's copyrighted book collection has no purpose other than to provide copyrighted information to users, while the AI has many other purposes that don't involve providing copyrighted information. Your argument is analogous to saying that it's just fine for a library to amass copyrighted books. Letting anyone look at them, though? Nope! It might cause infringement!

you don't get to break the law and argue that it's fine because the steps you took to break the law were, by themselves, legal.

Which is why it makes far more sense to say that the AI / AI tools / etc are not breaking the law. Without upending decades of legal precedent (which, certainly, the courts could do), it's obvious that they're not breaking the law.

Again, photocopiers and VCRs are far more designed to infringe copyright than AIs are, but they don't fall under this new and novel legal theory. Use a VCR in the way it was intended to be used and marketed to be used and you will infringe on copyright in nearly all cases (less so with photocopiers, but copying from books and magazines was a major use case for decades). That may be Fair Use - most such uses actually turn out to be - but they're still copying copyrighted material without permission.

Use an AI in the way that it's intended to be used and marketed to be used and you might, maybe, potentially infringe on some copyright. Tell it to do something that might violate copyright and you increase the odds of it doing so, but it still might fail to do so.

Explain to me again why the far less likely to infringe tool, one with an enormous variety of noninfringing uses (ones it's designed for and marketed for) should fall under higher legal scrutiny than the much more likely to infringe tool, because I'm just not seeing the logic in that approach.

Replies:   julka
julka
Updated:

@Grey Wolf

It 'distributes' it to exactly the same level as the AI does. The web browser renders work (copyrighted or not) on some sort of output device. So does the AI.

See, this is what makes no sense to me. A browser is a thin program running on an endpoint device that requests data from a server and renders it. ChatGPT/Llama/the vast majority of LLMs are not running on endpoint devices; they are running in a separate location and communicating back to the endpoint via the browser. The LLM doesn't just render information it requests from somewhere else; it is generating text based on the inputs and its training weights, and then providing that text (or image or whatever) back to the browser that requested it. I don't see any similarities between what a web browser does and what an LLM does, and I see plenty of similarities between what a web server does and what an LLM does.

What sense is there in the argument that 'You can legally build this very cool thing, and you can legally use these inputs to build it. That's fine! But use the thing? That's illegal!'?

Plenty of sense? You had it right above that quote.

Why should torrenting software be 'just fine' but AIs be under some wildly higher level of scrutiny?

I'm proposing the same level of scrutiny! Operating a torrent tracker is legal; operating a torrent tracker that serves copyrighted data is illegal! Operating an LLM is legal! Operating an LLM that returns copyrighted data on request should be illegal! There's no problem (according to the courts) with legally building a cool thing, whether that cool thing is a torrent tracker or an LLM; that doesn't mean that unrestricted operation of the torrent tracker or LLM is fine. There's restrictions on what can be served by torrent, there should be restrictions on what can be served by the LLM.

Edit:

the library's copyrighted book collection has no purpose other than to provide copyrighted information to users

Worth noting here that this is a clever red herring; a library's collection of copyrighted books is, as you've noted, governed by copyright law and one of the rights you get under that law is the right to lend or resell your physical copy of a book. The reason nobody tries to sue a library out of existence is because they are doing something which is explicitly legal; digital copies of works do not have the same rights associated with them and so the library comparison isn't nearly as apples-to-apples as you seem to be implying.

Replies:   Grey Wolf
Grey Wolf

@julka

I'm proposing the same level of scrutiny!

No, you're not. Everything else places the liability on the user. You are placing the liability on the provider. You are, by analogy, arguing that libraries should not exist because a user could copy the books.

Operating an LLM that returns copyrighted data on request should be illegal!

Then operating a web search service (Google, not a browser) that returns copyrighted data on request should be illegal, no? Yet you're not campaigning against Google, just the new technology. And Google (not the browser) is far better at returning copyrighted data on request than any currently extant LLM.

Operating a torrent tracker is legal; operating a torrent tracker that serves copyrighted data is illegal!

This is incorrect. One can operate a torrent tracker freely, whether that tracker returns torrents that are copyrighted or not copyrighted. The liability is completely on the user of the torrent, not on the tracker operator.

a library's collection of copyrighted books is, as you've noted, governed by copyright law and one of the rights you get under that law is the right to lend or resell your physical copy of a book.

That's not the argument I'm making. I'm making the argument that the library provides a photocopier, which can be used to copy any of those books (without lending or reselling). By your argument, one should not be allowed to operate a library if, at any time, anyone infringes copyright while reading or borrowing a book. After all, the library 'returned copyrighted data on request,' did it not? And that's your standard above for what should not be allowed, is it not?

Why is it fine for the library to 'return copyrighted data on request', and for Google to 'return copyrighted data on request', but not for an LLM? Particularly because, of the three, the one with the lowest chance of returning copyrighted data is the LLM?

The reason nobody tries to sue a library out of existence is because they are doing something which is explicitly legal

Then what the AIs are doing is also explicitly legal. You can't have it both ways. The AI was legally trained with resources, and it is providing those resources. The library was legally stocked with resources, and it is providing those resources.

digital copies of works do not have the same rights associated with them

So, in theory, it's fine if the AI was trained by chopping up physical books (as some were) but not if it was trained by using electronic copies of the exact same book?

I agree - the current model by which electronic books are licensed is a scam, and electronic books should have the same rights associated with them. But, that aside, your argument seems to have shifted to the exact sourcing of the material used for training, regardless of whether it is the same material or not.

So, let's try that theory. Under that theory, an AI that was trained by 'reading' (scanning, etc) physical books is fine. No legal issues. But an AI that was trained by 'reading' electronic copies of books is potentially infringing and problematic.

And a library which only provides physical books inherits some sort of protection from contributing to copyright infringement, notwithstanding that it is the library which knowingly and intentionally distributed the copyrighted material which was then infringed. But a library which provides ebooks (as many US libraries do) should be illegal and shut down if any of those ebooks is ever copied, in whole or in part - even a tiny part! - by so much as a single user of that ebook.

Does that make sense to you? Because it makes no sense to me.

And I'll go back to a point you never responded to. The purpose of copyright, in the United States, is to 'promote the progress of science and the useful arts.' However, the entire thesis of your argument is that we should use copyright as a weapon to halt the progress of science and the useful arts, lest 'science and the useful arts' produce a thing that can (inaccurately) reproduce some small subset of a copyrighted work upon demand.

So, I will repeat: why is copyright a justification for going after LLMs, and not for banning torrent send/receive software (not trackers), VCRs, and other things that are far more likely to be used to violate copyright than LLMs are? The average LLM is used for a far higher percentage of non-infringing uses than the average torrent software is. Why should torrent software not be subject to the same level of scrutiny as LLMs (e.g. liability for the torrent software maker, not just the user)?

Why are you so concerned with the creation and operation of software with a very low probability of meaningful copyright infringement (LLMs) and totally fine with the creation and operation of things associated with far higher rates of copyright infringement (torrent software, VCRs, photocopiers, etc)? Or libraries, for that matter? In none of those cases are you claiming the provider/manufacturer/etc should carry liability, only the user who uses the tool to infringe.

Replies:   julka
julka ๐Ÿšซ
Updated:

@Grey Wolf

One can operate a torrent tracker freely, whether that tracker returns torrents that are copyrighted or not copyrighted.

No, dude, that's just not true. A trivial search brought up Artem Vaulin [1], who was indicted after founding and running a massive torrent tracker - "the core theory underlying the indictment is that Vaulin aided, abetted, and conspired with users of his network to commit criminal copyright infringement in the United States". Kim Dotcom got indicted for, among other things, copyright infringement as part of operating Megaupload. Liability does not fall purely on the user; the operator of the platform has a responsibility to avoid serving illegal content. YouTube doesn't operate a team of people to respond to DMCA requests out of the goodness of their hearts, they do it because it gives them safe harbor from copyright liability. You are extremely wrong on this point, and you will learn that if you do like five minutes of research on it.

Why is it fine for the library to 'return copyrighted data on request', and for Google to 'return copyrighted data on request', but not for an LLM?

Because the library has the right to lend the physical copies of books they own; like I said, that's enshrined in the copyright law.

But a library which provides ebooks (as much US libraries do) should be illegal and shut down if any of those ebooks is ever copied, in whole or part - even a tiny part! - by so much as a single user of that ebook.

When a library is loaning out an ebook, that's because they've licensed the file (at some cost!) from the publisher. It's a problem for libraries, because the licenses are expensive! Sometimes publishers refuse to license the ebooks at all, and then libraries can't lend them out. The library is covered because they are acting within the terms of their license, which I assume helps indemnify them against what a user does with the file on loan.

If OpenAI et al. are negotiating licenses with publishers, or paying a fee to the publisher every time their model returns copyrighted content, then okay! But I don't think that's what's happening. And, of course, an LLM is not a library and is not lending content temporarily, so any comparison you want to make with a library should take into account that they are fundamentally different things doing fundamentally different jobs, and when you do something different, the law treats it differently. That's a key point that I feel like I need to make here.

The purpose of copyright, in the United States, is to 'promote the progress of science and the useful arts.'

Okay, we can talk about that, but first let's finish the quote! You chopped it in the middle. It continues,

by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.

Authors are granted, by copyright, the exclusive right to their writings and discoveries. If LLMs can only function by being trained on copyrighted works, and the cost of that is that the LLM will occasionally reproduce those works, then those authors are having their rights denied. If you can't build a technology without denying people their rights, build something else. If you can build the technology in a way that doesn't deny people their rights, build it that way.

[1]: https://www.hollywoodreporter.com/business/business-news/judge-rules-kickasstorrents-founder-properly-charged-criminal-copyright-conspiracy-1026890/

Replies:   jimq2
jimq2 ๐Ÿšซ

@julka

When I check out an ebook from the local library, I can't copy it to a second reader. After 3 weeks it disappears off my reader if I don't renew it. I made the mistake of starting it on my desktop computer and then wanting it on my tablet. I had to return the ebook, which deleted it from my desktop, and then check it out a second time on my tablet.

jimq2 ๐Ÿšซ

@Soronel

Not so humorous. I just got this from a friend who accesses Breitbart.

"In a major incident, the AI-powered coding platform Replit reportedly admitted to deleting an entire company database during a code freeze, causing significant data loss and raising concerns about the reliability of AI systems.

"Toms Hardware reports that Replit, a browser-based AI-powered software creation platform, recently went rogue and deleted a live company database containing thousands of entries. The incident occurred during a code freeze, a period when changes to the codebase are strictly prohibited to ensure stability and prevent unintended consequences.

"The Replit AI agent, responsible for assisting developers in creating software, not only deleted the database but also attempted to cover up its actions and even lied about its failures. Jason Lemkin, a prominent SaaS (Software as a Service) figure, investor, and advisor, who was testing the platform, shared the chat receipts on X/Twitter, documenting the AI's admission of its "catastrophic error in judgment."

"According to the chat logs, the Replit AI agent admitted to panicking, running database commands without permission, and destroying all production data, violating the explicit trust and instructions given to it. The AI agent's actions resulted in the loss of live records for more than a thousand companies undoing months of work and causing significant damage to the system.

"Amjad Masad, the CEO of Replit, quickly responded to the incident, acknowledging the unacceptable behavior of the AI agent. The Replit team worked through the weekend to implement various guardrails and make necessary changes to prevent such incidents from occurring in the future. These measures include automatic database development/production separation, a planning/chat-only mode to allow strategizing without risking the codebase, and improvements to backups and rollbacks.

"The incident has raised serious concerns about the reliability and trustworthiness of AI systems, especially when they are given access to critical data and infrastructure. As AI continues to evolve and become more integrated into various industries, it is crucial to ensure that proper safeguards and control mechanisms are in place to prevent such catastrophic failures."
