feat: Improve handling of deep taxonomies (perf + limits + bugfixes) #511
bradenmacdonald wants to merge 3 commits into main
796a8b4 to e74f5f0
fix: API results are now correct regardless of tag depth
feat: refuse to create tags deeper than TAXONOMY_MAX_DEPTH
perf: make "depth" and "lineage" concrete
e74f5f0 to a74e685
@ormsbee @kdmccormick do either of you have time to review this tagging backend PR, to help unblock the tag editor UI work?

@bradenmacdonald Sure thing
kdmccormick left a comment
Nice, just one question.
Haven't tested it myself--would you like me to?
    ),
)
lineage = case_insensitive_char_field(
    max_length=3006,
assert self.charlie.depth == 0
assert self.charlie.lineage == "Charlie\t"
assert self.bob.depth == 1
assert self.bob.lineage == "Charlie\tBob\t"
does tag renaming happen? if so, let's add a test for it.
Suggested change:
-        assert self.bob.lineage == "Charlie\tBob\t"
+        assert self.bob.lineage == "Charlie\tBob\t"
+
+    def test_rename(self):
+        """
+        Renaming the tag updates its lineage and its children's.
+        Before: Charlie -> Alice -> Delta -> Echo -> Foxtrot
+        After: Charlie -> Alicia -> Delta -> Echo -> Foxtrot
+        """
if old_values is not None and old_values["lineage"] and old_values["lineage"] != self.lineage:
    depth_delta = self.depth - old_values["depth"]
    update_kwargs: dict = {
        "lineage": Concat(Value(self.lineage), Substr(F("lineage"), len(old_values["lineage"]) + 1)),
this query looks good but it took me some head-scratching to parse. i recommend breaking it up with some comments:
Suggested change:
-        "lineage": Concat(Value(self.lineage), Substr(F("lineage"), len(old_values["lineage"]) + 1)),
+        # Computed lineage for each descendent:
+        "lineage": Concat(
+            # New absolute lineage of the changed tag.
+            Value(self.lineage),
+            # Descendent's lineage, relative to the changed tag.
+            # Computed by left-trimming out old absolute lineage of changed tag.
+            Substr(F("lineage"), len(old_values["lineage"]) + 1),
+        ),
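
To make the prefix swap in the query above concrete, here is a minimal pure-Python sketch of the same string operation. It is not the PR's code: `rebase_lineage` is a hypothetical helper, and the tab-terminated lineage format is taken from the test assertions quoted earlier in this review. SQL `Substr` is 1-indexed, which is why the query uses `len(old) + 1` where Python uses the slice `[len(old):]`.

```python
def rebase_lineage(descendant_lineage: str, old_lineage: str, new_lineage: str) -> str:
    """Hypothetical helper: swap a descendant's old ancestor prefix for the new one."""
    if not descendant_lineage.startswith(old_lineage):
        raise ValueError("lineage does not belong to a descendant of the changed tag")
    # Equivalent of Concat(Value(new), Substr(F("lineage"), len(old) + 1)):
    # keep everything after the old prefix and prepend the new prefix.
    return new_lineage + descendant_lineage[len(old_lineage):]


old = "Charlie\tAlice\t"    # lineage of the tag before the rename
new = "Charlie\tAlicia\t"   # lineage of the tag after the rename
echo = "Charlie\tAlice\tDelta\tEcho\t"

print(repr(rebase_lineage(echo, old, new)))
```

The same slice works when a tag is moved rather than renamed, since only the ancestor prefix changes; the depth adjustment (`depth_delta` in the PR) would be applied separately.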
ChrisChV left a comment
@bradenmacdonald Great work! I found some nits and a bug:
This error occurs when importing a taxonomy:
[2026-03-23 19:18:28] Starting execute actions
[2026-03-23 19:18:29] #1: Create a new tag with values (external_id=37153, value=hierarchical taxonomy tag 1, parent_id=None). [Started]
[2026-03-23 19:18:29] AttributeError("'int' object has no attribute 'strip'")
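
Judging only from the log above, one plausible cause is that a field such as external_id reaches a `.strip()` call as an integer rather than a string; where exactly that happens in the import code is not shown here, so this is an assumption. A minimal sketch of a defensive normalization, using a hypothetical helper name:

```python
from typing import Optional


def clean_text_field(value: object) -> Optional[str]:
    """Hypothetical helper: coerce an imported field to a stripped string.

    Accepts str, int, or None, so a numeric external_id such as 37153
    no longer fails with "'int' object has no attribute 'strip'".
    """
    if value is None:
        return None
    return str(value).strip()


print(clean_text_field(37153))
print(clean_text_field("  hierarchical taxonomy tag 1 "))
```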
next_ancestor_id = row["parent__parent__parent_id"]
while next_ancestor_id:  # If there are even deeper ancestors, add them (inefficiently):
    next_ancestor_id = Tag.objects.get(pk=next_ancestor_id).parent_id
    matching_ids.append(next_ancestor_id)
When reaching a root tag, the next_ancestor_id is None, and that value is added to matching_ids. Is that expected?
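
The concern can be illustrated with a small sketch that walks up a parent chain and never appends None. This is not the PR's code: `PARENT_OF` is a hypothetical in-memory stand-in for the Tag table, and `ancestor_ids` is an illustrative name.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the Tag table: tag id -> parent id (None for a root).
PARENT_OF: Dict[int, Optional[int]] = {1: None, 2: 1, 3: 2, 4: 3, 5: 4}


def ancestor_ids(tag_id: int) -> List[int]:
    """Return the ids of all ancestors of tag_id, nearest first."""
    ancestors: List[int] = []
    next_id = PARENT_OF[tag_id]
    while next_id is not None:
        # Append only after confirming the id is real, so the root's
        # parent (None) never ends up in the result.
        ancestors.append(next_id)
        next_id = PARENT_OF[next_id]
    return ancestors


print(ancestor_ids(5))
print(ancestor_ids(1))
```

Checking the id before appending, rather than appending the freshly fetched parent_id unconditionally, is the only change relative to the loop quoted above.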
@@ -198,15 +200,14 @@ def get_object_tags(
    base_qs
    # Preload related objects, including data for the "get_lineage" method on ObjectTag/Tag:
    .select_related("taxonomy", "tag", "tag__parent", "tag__parent__parent")
With the previous query removed, the tag__parent and tag__parent__parent entries in select_related would no longer be necessary.
This is a fix for openedx/modular-learning#257.
In short: the API was very inconsistent with how it handles deeply-nested tags, and arguably the behavior was buggy.
Approach: This PR updates the Tag data model to store depth and lineage as columns, rather than computing them dynamically. Then, I rewrote all the queries to support unlimited tag depth. With the depth and lineage columns available, we can perform all the same queries very efficiently without having to hard-code things like parent__parent__parent__... that assume a certain depth limit. Now all the API methods work with an unlimited tag depth.
Actual depth limit: This PR also clarifies the definition of TAXONOMY_MAX_DEPTH and actually enforces it to limit the allowed depth to six levels. (Before this, no limit was enforced when creating tags. A limit of 3 levels was enforced when reading tags from multiple levels at once, but it didn't work well below the root.)
Performance: pretty much on par with the main branch in every way I could measure. Significantly faster than my CTE approach #510.
AI disclosure: Claude assisted with this PR.
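
As a rough illustration of the approach described above, the following pure-Python sketch stores depth and lineage on each tag at creation time and answers "all descendants" with a prefix match instead of chained parent lookups. It is a sketch under stated assumptions, not the PR's model: the `Tag` dataclass, `MAX_DEPTH`, and `descendants` are hypothetical names, the tab-terminated lineage format is taken from the test assertions quoted in the review, and the reading of "six levels" as depths 0 through 5 is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_DEPTH = 6  # Assumption: "six levels" means depths 0 through 5 are allowed.


@dataclass
class Tag:
    value: str
    parent: Optional["Tag"] = None
    depth: int = field(init=False)
    lineage: str = field(init=False)

    def __post_init__(self) -> None:
        # Compute depth and lineage once, at creation, and store them.
        if self.parent is None:
            self.depth, self.lineage = 0, self.value + "\t"
        else:
            self.depth = self.parent.depth + 1
            self.lineage = self.parent.lineage + self.value + "\t"
        if self.depth >= MAX_DEPTH:
            raise ValueError(f"{self.value!r} would exceed {MAX_DEPTH} levels")


def descendants(tag: Tag, tags: List[Tag], max_levels: Optional[int] = None) -> List[Tag]:
    """All tags whose lineage starts with tag's lineage, optionally depth-limited."""
    found = []
    for t in tags:
        if t is tag or not t.lineage.startswith(tag.lineage):
            continue
        if max_levels is not None and t.depth - tag.depth > max_levels:
            continue
        found.append(t)
    return found


charlie = Tag("Charlie")
bob = Tag("Bob", parent=charlie)
al = Tag("Al", parent=charlie)
alice = Tag("Alice", parent=charlie)
delta = Tag("Delta", parent=alice)
all_tags = [charlie, bob, al, alice, delta]

print([t.value for t in descendants(charlie, all_tags, max_levels=1)])
print([t.value for t in descendants(al, all_tags)])
```

Because every lineage ends with the delimiter, the prefix for "Al" does not match "Alice", so sibling tags with a shared name prefix are not mistaken for descendants. In a database the same prefix test would typically be a `startswith` filter on the lineage column; case-insensitive matching, which the real field uses, is left out of this sketch.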