Add an AI and Autonomous code contribution policy. #328
freakboy3742 wants to merge 4 commits into main from
Conversation
.github/pull_request_template.md
Outdated
- [ ] All new features have been documented
- [ ] I have read the **CONTRIBUTING.md** file
- [ ] I will abide by the code of conduct
- [ ] I have disclosed the use of any autonomous coding tooling
Expecting the author to tick "I have disclosed any", even if there isn't any, is likely to be confusing even to native English speakers.
Could we merge this with the "assisted-by" above; something like:
- [ ] This PR used AI coding assistance – <!-- If so, enter tool names here -->
After all, it does say "all the boxes that apply", not "all the boxes".
I think "AI assistance" is better than "autonomous coding tool", since some PRs may have been generated in a "one-shot" manner without any kind of autonomous activity, but still have many of the same problems. This may also apply to some places in the policy document.
Definitely agreed that the structure of the checkbox could be improved. I wasn't 100% happy with what is in the PR, but needed something to start a discussion.
I think there's value in preserving the Assisted-by: tag, because it means the text is easier to find - but I agree it would make sense to simplify the language and put the assistance closer to the checkbox.
I also have some concern about putting that content in <!-- --> markers, because evidence suggests many PR submitters don't recognise that as an HTML/Markdown comment - so they may update the content inside the comment, and it will be invisible to us as reviewers.
So - here's a revised suggestion:
- [ ] ~~I have disclosed the use of any autonomous coding tooling~~
- [ ] This PR was generated or assisted using an AI tool
<!-- update or delete the next line to reflect your usage -->
Assisted-by: <!--name of tool; e.g., Claude Opus 4.5-->
Avoiding "AI" as a term was somewhat deliberate - one of the pieces of feedback on beeware/beeware#621 was the suggestion to avoid specific references to "AI" as a form of future proofing. Right now, the tools call themselves "Generative AI"; next year, they might have a different name. We want the general pattern of "a human is responsible" to be the thing that we capture as policy, rather than enumerating specific tools or naming.
That said - I also found, in the process of drafting, that referencing "autonomous code generation" tools was linguistically awkward; and in that awkwardness, we run the risk of obscuring the point of the document.
Maybe the solution is to call the policy an "AI policy", and refer to "Generative AI tools", but put a paragraph in the preface that makes it clear that although this document refers to "AI", we're referring to any tool that is able to generate significant functionality, or can operate autonomously, based on limited prompted input from a human, regardless of the name that the specific technology may use to describe itself.
Thoughts?
> Maybe the solution is to call the policy an "AI policy", and refer to "Generative AI tools", but put a paragraph in the preface that makes it clear that although this document refers to "AI", we're referring to any tool that is able to generate significant functionality, or can operate autonomously, based on limited prompted input from a human, regardless of the name that the specific technology may use to describe itself.
I know I'm not @mhsmith, but since you ended with "thoughts"... I agree this would be a significant improvement: (1) the wording is more modern and less awkward, and (2) the scope is explicitly generalized.
## 4. Copyright & Legal

By submitting a contribution to BeeWare, you represent and warrant that:
Since most of this also applies to fully-human PRs, it should be in a more general location like CONTRIBUTING.md, and referenced from here. That way, the author implicitly agrees to it when they tick "I have read the CONTRIBUTING.md file".
The CONTRIBUTING.md files all link to a central page on the website, which is good, but the page contains about 20 sub-pages, and it's not reasonable to expect authors to read every single one before creating a PR. So maybe something as important as this should be specifically called out in the CONTRIBUTING.md file – but the full text should still be put on the website to ensure there's a single source of truth.
On a related note, why add the policy to the .github repository rather than the website?
Part of the challenge here is that the conventions for where this content goes are still being established. I went with AI_POLICY.md following the example from Mastodon - but I'm definitely open to a better place for this content.
I put it in .github because it felt more like a "core policy" document for the project, akin to the Code of Conduct. As noted in the PR description, I'd imagine the ultimate "user visible" location would be a link from the website and/or contribution guide, much as the code of conduct is linked today - I'd anticipate a "Policies" subheading towards the bottom of the contribution guide, with a smaller admonition in the "Can I contribute" section that draws attention to, and links to, that policy.
I don't think I agree that putting this content in CONTRIBUTING.md is the right approach. My understanding is that CONTRIBUTING.md is a good place to link off to important information that a contributor should know about, rather than a place for policy content itself. However, I do think we could make good use of CONTRIBUTING.md in this context. As part of (or possibly as a precursor to) this effort, we can/should come up with a better template for CONTRIBUTING.md for each project. (This PR does this to some extent, but there's almost certainly more we could do.)
The end goal would be for CONTRIBUTING.md to be a "one-pager" inviting people to contribute, and pointing people at the key resources and project policies they should be aware of as contributors:
- We have a Code of Conduct which you must follow (link)
- We have an AI policy you must follow (link)
- We have templates for PRs and issues that you have to use (link to the pages in the contribution guide on creating issues and PRs)
- We have a code style guide (link to the code style guide)
- We require all code to be tested (link to the contribution guide on how to run the tests)
- We require all features are documented (link to the contribution guide on how to build/preview docs, and the style guide)
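A minimal sketch of what that one-pager could look like, assuming this list of policies; all link targets here are illustrative placeholders, not real BeeWare URLs:

```markdown
# Contributing to BeeWare

Thanks for your interest in contributing! Before you start, please be aware
of the following project policies and resources:

- We have a [Code of Conduct](https://example.com/code-of-conduct) which you must follow.
- We have an [AI policy](https://example.com/ai-policy) you must follow.
- We have [templates for PRs and issues](https://example.com/templates) that you have to use.
- We have a [code style guide](https://example.com/code-style).
- We require all code to be [tested](https://example.com/testing).
- We require all features to be [documented](https://example.com/documentation).
```

The actual link targets would point at the relevant pages of the contribution guide on the website.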
The PR template essentially then has a checkbox for each of the things that they should have done if they've followed the contribution guide (acknowledging the CoC, acknowledging the AI policy, code is tested, code is documented, etc.)
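Concretely, the PR template body might then look something like the following sketch; the exact checkbox and trailer wording is still under discussion above, so treat this as illustrative only:

```markdown
## PR Checklist

- [ ] I have read and followed the contribution guide
- [ ] I will abide by the Code of Conduct
- [ ] All new code has been tested
- [ ] All new features have been documented
- [ ] This PR was generated or assisted using an AI tool

<!-- update or delete the next line to reflect your usage -->
Assisted-by: <!-- name of tool; e.g., Claude Opus 4.5 -->
```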
We can then roll that CONTRIBUTING.md file out across all repos (using a variant of the tooling we already have for the CoC). I'm not sure we necessarily want to do that for AI_POLICY.md in all repos - maybe just the "core" repos (Toga, Briefcase, Rubicon), where there's potentially value in having the content in the repo for an AI agent to ingest.
Does that make sense as an approach?
Good policy, especially the emphasis on responsibility. I feel like it was communicated very clearly.
I wonder if a more informal tl;dr section at the beginning would be useful, so people won't find this document overwhelming. After all, it is ~40-50 non-empty lines...
That's not a bad idea - to some extent, that's what the first line of the "accountability" section is anyway; moving that summary above the fold might make some sense.
I've incorporated updates reflecting the feedback that has been given to date. Barring significant additional feedback, my current plan is to put this to the core team for endorsement towards the end of this week.
# AI Policy

The BeeWare project neither encourages nor prohibits the use of AI tools when making contributions. However, if you do use an AI tool to support your work processes, this policy describes the conditions governing that tool use.

The one-paragraph summary of this policy: **The human contributor is the sole party responsible for any contribution.** That human is responsible for understanding all contributions they make, declaring the usage of any tools used, and ensuring compliance with all project guidelines and processes.
Reviewing this commit as the set of edits made on top of the original policy that influenced it was helpful.
Additions like this are super helpful for summarising the policy.
Generative AI and Large Language Models (as seen in tools such as ChatGPT, Codex, Claude, and Copilot) have become an unavoidable detail of modern software development. While we recognize that some people find these tools to be useful aids for software development, they also make it very easy to generate contributions without the submitter fully understanding how the code works, or the consequences of specific implementation choices. It is also possible for these tools to operate autonomously, without any human involvement or oversight.

The overhead associated with managing contributions that have been generated by autonomous or semi-autonomous AI tools, with little or no effort on the part of the human contributor, is not an effective use of the core team's limited time and resources.
More of a comment/query: you are using "autonomous" where I would probably have used "automated", which has a slightly different feel (things can be automated but not autonomous). I presume it's intentional, but it does feel like it leaves a gap for things like code written by (human-written) code generation tools (e.g., something which takes a C header file and spits out a ctypes wrapper, or a parser generator) which we would want the same sort of human responsibility to apply to (although the social problems around those are far less).
The use of "autonomous" was intentional. I'll admit it's a subtlety, but to my reading, an automated tool is something that is simple and programmable (GitHub Actions are an automated system - push a PR, run this sequence of instructions, etc.); an autonomous tool is less directly predictable, as it has some abstract "agency" that might not be directly predictable.
I can see the point you're raising, but personally I'm less concerned about an automated tool because they are less likely to be used en masse - a one-off PR that generates a meaningless ctypes wrapper is certainly annoying, but not that difficult to manage; and the automation can really only achieve one thing.
An "autonomous" AI tool on the other hand is likely to engage in an ongoing debate with you about the meaning of life, and then open another ticket tomorrow on a completely different topic.
> I can see the point you're raising, but personally I'm less concerned about an automated tool because they are less likely to be used en masse - a once off PR that generates a meaningless ctypes wrapper is certainly annoying, but not that difficult to manage; and the automation can really only achieve one thing.
I agree with this. Also, for automated generation: if, say, a C binding tool makes a mistake, then it's a bug in the upstream generation tool; blaming the user of the binding tool for failing to check over every single binding generated may be too much, because the binding tool behaves predictably, anyone can reproduce its output, and it doesn't add extra work for someone else to reproduce it. On the other hand, the output of entirely autonomous tools should be verified more rigorously, and by the users themselves; there's no way for anyone else to reproduce an autonomous run, and it would actively waste maintainers' time to figure out what went wrong with an autonomous system.
This is the first step in adding an AI policy for BeeWare: adding the actual policy.
Once ratified, links to this document will be added to the contribution guide.
It includes an updated pull request template, adding a checkbox for declaring AI tooling and a prompt for declaring that usage.
It also includes an update to the contribution guide that can be used as a template for other projects. This is a significant change to the contribution guide in this repository - the current version has a number of dead links. It replaces that content with references to the current contribution guide on the website. When rolled out to other projects, this content can be used as-is, or can have references to that project's contribution guide (for Briefcase, Toga etc).
Submitted in draft form to allow discussion and ratification by the core team.
PR Checklist: