Recent Publication: Does AI have a copyright problem?
Here is another exploration of a different set of questions about AI and copyright that I recently wrote for the London School of Economics and Political Science Impact Blog (LSE Impact for short). It continues a conversation from some of my previous posts, including this one on academic fracking and the question of whether copyright has been violated, and it also connects to my own research and earlier work on the commons.
LSE Impact reached out to ask if I would write this and, funnily enough, I was already several paragraphs in when I got the email because, surprise, I had some thoughts. I like how this piece turned out because it captures a tension that I don't hear in all of this: the reasonable return of copyrighted work back to the commons. Under current law, that return cannot happen within our lifetimes for any work created today, which just seems disappointing. I keep thinking about how different the literary record would look if copyright law had existed in the past as it does today (something I cover in this talk).
Image Source: ChatGPT
A recent analysis by Alex Reisner shows that major AI companies have sourced training data from platforms like LibGen (Library Genesis) that contain copyrighted material. The suggestion is they have done this knowingly, perhaps reasoning that it is better to simply beg for forgiveness later rather than ask for permission. Reisner has also published a searchable database of copyrighted content on LibGen to help identify what has been used.
The anger that has since been directed toward AI companies is understandable. Many authors took to social media to decry the extent to which their work is being used to train AI tools. But the issue also raises some difficult questions about copyright. Chief among them: should we be concerned about material being used in this way if doing so ultimately makes information more accessible?
People vs profits
Reisner argues that while “LibGen and other such pirated libraries make information more accessible”, AI companies go further as “their goal is to absorb the work into profitable technology products that compete with the originals”. In essence, it is acceptable for people to benefit from resources like LibGen, but not companies.
The problem with this argument is that both individuals and companies are essentially doing the same thing: benefitting from illegally acquired copyrighted material. In both cases, they have decided that illegal means are more affordable and accessible than legal ones. They are both “ingesting” content that can be used for future creative, experiential or monetary purposes.
It is easier to criticise faceless companies than the countless individuals who illegally download millions of titles each month. Yet in essence, they are both using the work of authors without offering compensation. Indeed, the recent revelations have similarities to the outrage last year about “academic fracking”, where publishers sell access to material to AI companies. They represent yet another example of an industry built on exploiting the work of authors.
While Reisner states that society is still grappling with “how to manage the flow of knowledge and creative work in a way that benefits society most”, the truth is that this has already been decided. The “flow” goes to those who can control, afford, or steal material and very rarely do society’s interests enter the equation.
You can read the rest of the post either on the LSE Impact Blog or over on my Substack. Did you enjoy this read? Let me know your thoughts down below, or feel free to browse around and check out some of my other posts! You might also want to keep up to date with my blog by signing up to receive posts via email.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.