AI tapping copyrighted content to learn from it is not piracy

Published in the Australian Financial Review, 9 August 2025

The Productivity Commission announced this week that it was investigating how artificial intelligence models could be more easily trained on Australian copyrighted content. The backlash from our creative industry has been severe and instant.

For the past few years, AI labs have been accused in Australia and elsewhere of large-scale piracy. It is, we are told, outrageous that the PC would be providing moral cover for this theft.

But the PC is right to probe here. We need a copyright regime that reflects how AI models actually work, and our policymakers need to understand the full economic and geopolitical stakes that the AI revolution represents.

The PC’s report into data and digital technology is more modest than you would expect from the reaction of the creative industry. It is “seeking feedback about whether reforms are needed to better facilitate the use of copyrighted materials” for AI training.

But we need to be clear about how AI training actually works. AI models do not copy the content they are trained on. They learn from that content. Specifically, when they “read” a text, they identify patterns in it and relate those to patterns they’ve learnt from other texts.

If a person reads a book and learns from it – updating the weights in their own neural network – we do not accuse them of piracy. What we do when we learn, and what AI labs do when they train their models, is quite different from copying.

There are some legal subtleties here. In the US, courts have distinguished between how the models are trained and how the training data is collected.

Meta and Anthropic are accused of downloading large quantities of copyrighted books and papers from piracy websites to feed them into the training process.

If they were to do so in Australia that would probably be a violation of our copyright laws. But that doesn’t mean the training itself would necessarily be.

The PC notes that the process of AI training necessarily involves temporarily copying content onto the labs’ servers. But that proves too much. We do the same when we read anything on the internet. The moment we browse to a website, our computer downloads that website into a cache folder. But that downloading is a technical necessity, is not economically meaningful, and we don’t treat it as a violation of intellectual property.

All these subtleties around AI training were, of course, completely unforeseen by the parliaments that created our copyright regime decades ago.

We don’t need to review the economic upside of AI here. It has been interesting to watch the Albanese government over the past year realise that AI could be a Hail Mary pass. We might be able to fix our deep productivity problems without the need for tedious reform. AI presents the best chance we have right now to bring about a surge in economic growth.

But there are also real geopolitical reasons not to hamper AI development in Australia and the rest of the free world. We are in the middle of a great global technological contest around AI capability. The contest is of a larger scale and is more economically consequential than the space race of the 1950s and 1960s.

The Western world dominates AI chip development. This domination allows the US to exert a degree of influence over Chinese AI capabilities through export controls. But there is, almost certainly, a moment coming when Chinese chips will be competitive, and China will have full sovereign capability over the complete stack necessary for state-of-the-art AI.

Mark it: this will be a political shock in the West, much greater than when DeepSeek R1 was released in January. When it happens, I hope it will finally pop the sense of complacency that has allowed us to indulge the idea that US tech firms are the bad guys.

Some in the creative industry would like AI training to be a matter of negotiation between rights holders and AI labs, book by book, photo by photo.

Chinese AI labs do not share that view. In a statement published this year, one website hosting pirated books – such sites call themselves “shadow-libraries” – stated that while most US firms have shied away, “Chinese firms have enthusiastically embraced our collection, apparently untroubled by its legality”.

Other things being equal, the more data a model is trained on, the better it performs. We should not be trying to cripple AI in Australia while others rush ahead.

There are good reasons that authors and other creatives should want their work to be part of AI training sets. What writer would wish their work to be unknown by the first superintelligence?

But policymakers have a choice here. If they want Australia to shape the future of AI, they need to develop a policy regime that adapts to innovation, not a stagnant one that gives our geopolitical rivals an advantage.