Description
This follows from a discussion on a policy for generative-AI-assisted contributions.
See that discussion for links to related discussions.
The use of generative AI brings two things into tension:
- A desire to honor the copyright of the code from which the AI training data has been derived, and
- The hope that generative AI will lead to more contributions and/or contributions of greater quality.
It would be useful and timely to set out the issues in some detail in preparation for:
- A SPEC offering guidance on the use of generative AI in contributions, and
- Collecting further specialist advice on copyright law and the nature of AI-generated code, relative to its training set.
At a first pass, the SPEC should cover the issues that Ralf laid out in his discussion post, but it would also be useful to cover:
- The distinction between legal liability and our desire, as project authors, to respect the copyright of our fellow authors.
- How, and whether, a policy should be shaped by the difficulty of enforcement. For example, one might argue that it would be practically impossible to be sure that a contributor had not used generative AI, while also accepting that a policy forbidding such use would likely reduce AI-generated code submissions.
- Whether there are uses of AI that cannot plausibly leak substantial copyrighted code; perhaps this is true of automated code review, for example.
Different projects will put different weights on these factors. For example, some may feel that copyright has effectively become unenforceable and should therefore no longer be a factor in accepting submissions, while others will put substantial weight on preserving copyright and be prepared to accept some loss of volume or quality as a result. The SPEC should therefore lay out these issues carefully and clearly, to help projects choose their own position on the factors involved.