-
-
Notifications
You must be signed in to change notification settings - Fork 33.8k
gh-139899: Introduce MetaPathFinder.discover and PathEntryFinder.discover #139900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
8099249
6816705
31d1a8f
051cd1e
c343d32
a324d96
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1323,6 +1323,21 @@ def find_spec(cls, fullname, path=None, target=None): | |
| else: | ||
| return spec | ||
|
|
||
| @classmethod | ||
| def discover(cls, parent=None): | ||
| if parent is None: | ||
| path = sys.path | ||
| else: | ||
| path = parent.submodule_search_locations | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this can be
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How can it be a non-package module if it has a child? It could be a namespace package, but that's still a package, and Unless I am missing something? Do we support package-like module extensions? |
||
|
|
||
| for entry in path: | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. path can contain duplicate entries. should we dedupe?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, sure 👍 |
||
| if not isinstance(entry, str): | ||
| continue | ||
| if (finder := cls._path_importer_cache(entry)) is None: | ||
| continue | ||
| if discover := getattr(finder, 'discover', None): | ||
| yield from discover(parent) | ||
|
|
||
| @staticmethod | ||
| def find_distributions(*args, **kwargs): | ||
| """ | ||
|
|
@@ -1472,6 +1487,27 @@ def path_hook_for_FileFinder(path): | |
|
|
||
| return path_hook_for_FileFinder | ||
|
|
||
| def _find_children(self): | ||
| for entry in _os.scandir(self.path): | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. use a also consider handling exceptions similar to what document exception handling semantics of discover. I doubt callers ever expect to need be prepared for those?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. should this be cached or not? we should document the cache interaction behavior either way regardless.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think so? Because it could change at runtime, no? |
||
| if entry.name == _PYCACHE: | ||
| continue | ||
| # packages | ||
| if entry.is_dir() and '.' not in entry.name: | ||
| yield entry.name | ||
| # files | ||
| if entry.is_file(): | ||
| yield from [ | ||
| entry.name.removesuffix(suffix) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. entry.name could exist with multiple loader suffixes on the filesystem. dedupe to avoid redundant specs? |
||
| for suffix, _ in self._loaders | ||
| if entry.name.endswith(suffix) | ||
| ] | ||
|
|
||
| def discover(self, parent=None): | ||
| module_prefix = f'{parent.name}.' if parent else '' | ||
| for child_name in self._find_children(): | ||
| if spec := self.find_spec(module_prefix + child_name): | ||
| yield spec | ||
|
|
||
| def __repr__(self): | ||
| return f'FileFinder({self.path!r})' | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| Introduced :meth:`importlib.abc.MetaPathFinder.discover` | ||
| and :meth:`importlib.abc.PathEntryFinder.discover`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would this be the first time we have a public API
yielding things from importlib, do we actually want to do that or should this return a list?callers consuming the results might be writing code that makes changes that could impact future results... yielding could get messy.
what are the intended use cases for the API? if it's something we expect callers to short circuit and stop iterating on after the first match maybe yield makes sense, but then we should probably just have an explicit direct discover_first API for that instead.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something I considered.
The main use-case is finding a similar-named module to show as a hint on
ModuleNotFoundError(eg. "Did you meant numpy?", when trying to import numby), so I think it would make sense to make this new API a generator, or at least some kind of lazy container.For cases such as you describe, the user could just consume the full generator into a list to avoid any issue. Still leaving opportunity for the code that could leverage the benefit of this being a lazy API — scaning directories with a lot of files can take a while, not to mention the other exotic finders out there that may operate over the network or something like that.
I am not fundamentally opposed to make this method return a list, but I can't see the value in the trade-of if we document it properly.
While this is technically possible, I would find it extremely uncommon. And I think it should be reasonable to assume that people who are knowledgeable enough to do that, would probably be aware of the downsides of making changes to the import machinery, while consuming the API
And what would that look like? Would it take a predicate function and return the first entry that matches?
So, would have a warning in the documentation regarding your concern be a good enough compromise?