GitHub’s new “Copilot” tool (built with OpenAI; GitHub itself is owned by Microsoft) offers autocompletion suggestions from an AI trained on code repositories. But can those suggestions violate the original coders’ licenses? Now the Free Software Foundation (FSF) is calling for a closer look at this and many other issues…
“We already know that Copilot as it stands is unacceptable and unjust, from our perspective,” they wrote in a blog post this week, arguing that Copilot “requires running software that is not free/libre (Visual Studio, or parts of Visual Studio Code), and Copilot is Service as a Software Substitute. These are settled questions as far as we are concerned.”
“However, Copilot raises many other questions which require deeper examination…”
“The Free Software Foundation has received numerous inquiries about our position on these questions. We can see that Copilot’s use of freely licensed software has many implications for an incredibly large portion of the free software community. Developers want to know whether training a neural network on their software can really be considered fair use. Others who may be interested in using Copilot wonder if the code snippets and other elements copied from GitHub-hosted repositories could result in copyright infringement. And even if everything might be legally copacetic, activists wonder if there isn’t something fundamentally unfair about a proprietary software company building a service off their work.
“With all these questions, many of them with legal implications that at first glance may have not been previously tested in a court of law, there aren’t many simple answers. To get the answers the community needs, and to identify the best opportunities for defending user freedom in this space, the FSF is announcing a funded call for white papers to address Copilot, copyright, machine learning, and free software.
“We will read the submitted white papers, and we will publish ones that we think help elucidate the problem. We will provide a monetary reward of $500 for the papers we publish.”
They add that the following questions are of particular interest:
- Is Copilot’s training on public repositories infringing copyright? Is it fair use?
- How likely is the output of Copilot to generate actionable claims of violations on GPL-licensed works?
- How can developers ensure that any code to which they hold the copyright is protected against violations generated by Copilot?
- Is there a way for developers using Copilot to comply with free software licenses like the GPL?
- If Copilot learns from AGPL-covered code, is Copilot infringing the AGPL?
- If Copilot generates code which does give rise to a violation of a free software licensed work, how can this violation be discovered by the copyright holder on the underlying work?
- Is a trained artificial intelligence (AI) / machine learning (ML) model resulting from machine learning a compiled version of the training data, or is it something else, like source code that users can modify by doing further training?
- Is the Copilot trained AI/ML model copyrighted? If so, who holds that copyright?
- Should ethical advocacy organizations like the FSF argue for change in copyright law relevant to these questions?