Copyright is currently something of a minefield when it comes to AI, and a new report claims that Apple's generative AI – specifically its 'Ajax' large language model (LLM) – may be one of the few that is both legally and ethically trained. It is claimed that Apple is trying to meet privacy and legality standards by employing innovative training methods.
Copyright in the age of generative AI is difficult to deal with, and is becoming increasingly important as AI tools become more common. One of the most glaring issues that comes up time and time again is that many companies train their large language models (LLMs) using copyrighted works, typically not disclosing whether they license that training material. Sometimes the output of these models contains entire sections of copyrighted works.
The current justification offered by some of these companies for using copyrighted material so widely to train their LLMs is that these models, like humans, require a significant amount of information (called training data for LLMs) to learn and generate coherent and compelling responses – and as far as these companies are concerned, copyrighted materials are fair game.
Many critics of generative AI consider it copyright infringement for technology companies to use works in the training and output of LLMs without explicit agreements with copyright holders or their representatives. Yet these criticisms have not stopped tech companies from doing just that, and this is believed to be the case for most AI tools, creating increasing resentment towards the companies in the generative AI space.
The forest of legal battles and ethical dilemmas in generative AI
In fact, more and more legal challenges have emerged against these tech companies. OpenAI and Microsoft were sued by The New York Times for copyright infringement in December 2023, with the publisher accusing the two companies of training their LLMs on millions of New York Times articles. In September 2023, OpenAI and Microsoft were also sued by a number of leading authors, including George R.R. Martin, Michael Connelly, and Jonathan Franzen. By July 2023, more than 15,000 authors had signed an open letter targeting companies like Microsoft, OpenAI, Meta, Alphabet, and others, calling on technology industry leaders to protect writers and to properly credit and compensate authors whenever their works are used to train generative AI models.
In April this year, The Register reported that Amazon was hit with a lawsuit by a former employee who alleged she suffered abuse, discrimination, and harassment, testifying about her experiences with copyright infringement issues. This employee claims that she was told to deliberately ignore and violate copyright law to improve Amazon's products and make them more competitive, and that her supervisor told her that "everyone is doing it" when it comes to copyright violations. Apple Insider repeats this claim, stating that this appears to be an accepted industry standard.
As we have seen with many other new technologies, legislation and ethical frameworks tend to arrive only after an initial delay, but this is becoming an increasingly problematic aspect of generative AI models that the companies responsible for them will have to address.
The Apple approach to ethical AI training (what we know so far)
It seems that at least one major tech player is trying to take a more careful and deliberate route to avoid as many legal (and moral!) challenges as possible – and somewhat surprisingly, it's Apple. According to Apple Insider, in its search for AI training materials, Apple has diligently licensed the works of major news publications. Back in December, Apple petitioned several major publishers to license their archives as training material for its own LLM, known internally as Ajax.
There is speculation that Ajax will power the basic on-device functionality of future Apple products, while Apple could instead license software such as Google's Gemini for more advanced features, such as those that require an internet connection. Apple Insider writes that this would allow Apple to avoid certain liability for copyright infringement, as Apple would not be responsible for infringement committed by, for example, Google Gemini.
An article published in March detailed how Apple plans to train its internal LLM: on a carefully chosen selection of images, image-text pairs, and text-based input. In its methods, Apple prioritized better image captioning and multi-step reasoning while also paying attention to privacy preservation. The last of these is made all the more achievable by the fact that the Ajax LLM runs entirely on-device and therefore does not require an internet connection. There is a trade-off, however: this means that Ajax cannot check for copyrighted content and plagiarism on its own, as it cannot connect to online databases that store copyrighted material.
There's another caveat that Apple Insider reveals, based on conversations with sources familiar with Apple's AI testing environments: there currently don't appear to be many, if any, restrictions on users themselves inputting copyrighted material into these testing environments. It's also worth noting that Apple isn't technically the only company taking a rights-first approach: the AI art tool Adobe Firefly is also said to be fully copyright compliant, so hopefully more AI startups will be wise enough to follow the example of Apple and Adobe.
Personally, I welcome this approach from Apple, because I think human creativity is one of the most incredible capabilities we have, and it should be rewarded and celebrated – not fed to an AI. We'll have to wait to find out more about what Apple's policies on copyright and AI training look like, but I agree with Apple Insider's assessment that this definitely sounds like an improvement – especially since some AIs have been documented regurgitating copyrighted material word for word. We can look forward to learning more soon about Apple's generative AI efforts, which are expected to be a key focus of its developer-focused software conference, WWDC 2024.