Use a short SPDX license header for LLM-centered files #1489

Open
Dev-iL wants to merge 3 commits into apache:main from SummitSG-LLC:2602/spdx

Collaborator

@Dev-iL Dev-iL commented Feb 22, 2026

Following the approach from apache/airflow#62073 and apache/airflow#62145, files intended for LLM/agent consumption (and not distributed in releases) now use a minimal SPDX license identifier instead of the full Apache 2.0 header, to save LLM tokens.

See also:
https://lists.apache.org/thread/j1tn63r2lf13v3d1tnnqff8fkcl4nx53

Changes

  • Mark the .github folder as export-ignore.
  • Add short and long license templates.
  • Add pre-commit hooks to ensure the right license header exists in every file.
  • Add missing license headers to two PR templates.
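
As a rough illustration of what a header check like the one described above could look like, here is a minimal Python sketch. It is not the hook added in this PR (the hook configuration is not shown in this conversation). The function name `classify_header` and the 30-line cutoff are assumptions made for the example; the two marker strings are the standard SPDX identifier for Apache-2.0 and the opening words of the standard ASF source header.

```python
# Hypothetical sketch, not the PR's actual pre-commit hook.

# Standard SPDX short-form identifier for the Apache License 2.0.
SPDX_LINE = "SPDX-License-Identifier: Apache-2.0"
# Opening words of the standard long ASF source header.
LONG_HEADER_START = "Licensed to the Apache Software Foundation (ASF)"


def classify_header(text: str, max_lines: int = 30) -> str:
    """Return "short", "long", or "missing" for a file's license header.

    Only the first `max_lines` lines are inspected; the cutoff is an
    arbitrary choice for this sketch.
    """
    head = "\n".join(text.splitlines()[:max_lines])
    # Check the long header first so a file carrying both counts as "long".
    if LONG_HEADER_START in head:
        return "long"
    if SPDX_LINE in head:
        return "short"
    return "missing"


if __name__ == "__main__":
    sample = "<!-- SPDX-License-Identifier: Apache-2.0 -->\n# Agent instructions\n"
    print(classify_header(sample))
```

Because both markers fit on a single line, a plain substring test works regardless of the comment syntax (`#`, `//`, `<!-- -->`) that wraps the header.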

How I tested this

  • Hooks pass locally.

Notes

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

Comment on lines -2 to -3
# Copyright(c) Open Law Library. All rights reserved. #
# See ThirdPartyNotices.txt in the project root for additional notices. #
Collaborator Author

@Dev-iL Dev-iL Feb 22, 2026

Removing this might be incorrect in this case. Is this here because the code was vendored in from pygls?

Contributor

yeah I don't recall. so maybe revert?

Collaborator Author

@Dev-iL Dev-iL Feb 25, 2026

Can this be moved to a NOTICE file if the code in question is licensed under ALv2 too? CC: @potiuk

Member

Yes. It should be placed in the NOTICE file https://infra.apache.org/licensing-howto.html

Contributor

yeah so I think this one should stay, right?

Member

You don't move license info to the NOTICE. If you use 3rd party code that has a NOTICE then its NOTICE contents must be included in your NOTICE.

This whole issue is caused by the possibility that there is 3rd party code. @Dev-iL and I have looked at that file and if there ever was 3rd party code, it now appears to have been removed from the file.

If we can make a call on whether there is 3rd party code in that file - that is the most important decision here. Everything else depends on that starting block.

Member

Now I see this is another file. Not the conftest.py file. This does look like it has 3rd party source. The source header needs to be retained and our LICENSE needs to mention this file.

Can we take all of this license mess into its own issues and PRs and not try to deal with everything in one PR? It is really messy to have all this happening in one PR.

Collaborator Author

@pjfanning Indeed, this PR has accumulated more discussion than I'd have liked. That said, I think we're actually at the finish line now. The only "license mess" was the two LSP test files (conftest.py and ls_setup.py), and that's been resolved thanks to @skrawcz's analysis. Splitting this into separate PRs at this point would mean re-doing the pre-commit hook configuration and exclusion logic across multiple branches, which is likely more churn than just landing it as-is.

Let's use this as a learning experience for the future.

@Dev-iL Dev-iL force-pushed the 2602/spdx branch 3 times, most recently from 8f6cdd6 to c1f2647 Compare March 6, 2026 18:39
@Dev-iL Dev-iL force-pushed the 2602/spdx branch 2 times, most recently from d86280f to 8fa6989 Compare March 8, 2026 09:55
Dev-iL added 3 commits March 22, 2026 13:10
- Mark .github as export-ignore in .gitattributes
- Add short and long license templates for pre-commit hooks
- Add pre-commit hooks to enforce license headers (replaces CI scripts)
- Delete scripts/add_license_headers.py and scripts/check_license_headers.py
- Remove CI license check step from hamilton-lsp workflow
- Fix inconsistent license header indentation in several files
- Add missing license headers to PR templates
- Add vendored code attributions (Open Law Library, Palantir) to NOTICE file
- Exclude contrib/docs/ from markdown license hook (Docusaurus frontmatter)
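
The exclusion logic mentioned in the commit list can be pictured with a small Python sketch that maps a repository path to the header it would be expected to carry. Everything here is hypothetical: the function `expected_header`, the rule that files under `.github/` take the short header, and the default of the long header elsewhere are assumptions for illustration. Only the `.github` and `contrib/docs/` paths come from the commit messages above.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical rules for this sketch; the PR's real hook configuration is not shown.
SHORT_HEADER_DIRS = (".github",)             # assumed: export-ignore, so not shipped in releases
EXCLUDED_MARKDOWN_DIRS = ("contrib/docs",)   # Docusaurus frontmatter has to come first


def _is_under(path: PurePosixPath, directory: str) -> bool:
    """True if `path` lies inside `directory` (both relative to the repo root)."""
    return PurePosixPath(directory) in path.parents


def expected_header(path_str: str) -> Optional[str]:
    """Return "short", "long", or None (no header check) for a repo-relative path."""
    path = PurePosixPath(path_str)
    # Markdown under an excluded directory is skipped entirely.
    if path.suffix == ".md" and any(_is_under(path, d) for d in EXCLUDED_MARKDOWN_DIRS):
        return None
    # Files that are not part of the release archive get the short SPDX header.
    if any(_is_under(path, d) for d in SHORT_HEADER_DIRS):
        return "short"
    # Everything else keeps the full Apache 2.0 header.
    return "long"


if __name__ == "__main__":
    for p in (".github/PULL_REQUEST_TEMPLATE/feature.md", "contrib/docs/intro.md", "src/module.py"):
        print(p, "->", expected_header(p))
```

Keeping the rules in two small tuples means a new exclusion is a one-line change, which is roughly the kind of churn the discussion above wanted to avoid repeating across branches.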
4 participants