@fishmingyu

Motivation

In many repositories there are files we do not want to index, e.g., tests (test*), and a repository may also contain files that are broken or meaningless to index. Such files not only increase pyright-scip's processing time but can also make the run abort. One example is shown below: the failing log from indexing the sympy repo. I also attached the successful log after applying the new exclude patterns.

Summary

This PR adds the ability to exclude files and directories from SCIP indexing using command-line flags or a configuration file. The exclusion feature supports both exact paths and glob patterns (e.g., test_*), and works as a filter that gracefully handles non-matching patterns without errors.

Changes Made

1. MainCommand.ts

  • Added exclude?: string[] to the IndexOptions interface
  • Added excludeConfig?: string to the IndexOptions interface
  • Added --exclude <paths...> flag to accept multiple file/directory paths
  • Added --exclude-config <file> flag to accept a config file listing exclusion paths
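To illustrate the shape of the two new options, here is a sketch that parses them with Node's built-in util.parseArgs (Node 18.3+) instead of the CLI parser scip-python actually uses. IndexOptionsSubset and parseIndexFlags are hypothetical names for illustration only. One difference worth noting: parseArgs accepts a single value per flag, so the variadic form --exclude "test_*" "build/**" is rejected there and each pattern needs its own --exclude.

```typescript
import { parseArgs } from 'node:util';

// Hypothetical subset of IndexOptions: only projectName plus the two fields
// this PR adds. The real interface in MainCommand.ts has more fields.
interface IndexOptionsSubset {
    projectName?: string;
    exclude?: string[];
    excludeConfig?: string;
}

// Sketch only: Node's util.parseArgs stands in for the real CLI parser.
// With `multiple: true`, a repeated flag accumulates into an array,
// e.g. `--exclude a.py --exclude "test_*"` -> ['a.py', 'test_*'].
function parseIndexFlags(argv: string[]): IndexOptionsSubset {
    const { values } = parseArgs({
        args: argv,
        options: {
            'project-name': { type: 'string' },
            exclude: { type: 'string', multiple: true },
            'exclude-config': { type: 'string' },
        },
        // strict mode throws on unknown flags and on stray positionals
        strict: true,
    });
    return {
        projectName: values['project-name'],
        exclude: values.exclude,
        excludeConfig: values['exclude-config'],
    };
}
```

For example, parseIndexFlags(['--project-name=myproject', '--exclude', 'path/to/broken.py', '--exclude-config=.scipignore']) yields an object whose exclude is ['path/to/broken.py'] and whose excludeConfig is '.scipignore'.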

2. indexer.ts

  • Added import { minimatch } from 'minimatch' for glob pattern matching
  • Implemented exclusion logic after targetOnly filtering (lines 122-179):
      • Reads patterns from the --exclude flag
      • Reads patterns from the config file if --exclude-config is provided
      • Config file supports one path or pattern per line, with # comments (see the example under Usage)
      • Pattern matching features:
          • Exact path matching (original functionality)
          • Glob patterns: dir*, file*, tests/**, etc.
          • Relative and absolute paths
      • Works as a filter: patterns that match nothing don't cause errors
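A minimal, dependency-free sketch of such a filter is below. It assumes gitignore-like semantics that are not confirmed against the PR's indexer.ts: absolute patterns are made relative to the project root, a pattern without a slash is tested against the file's basename, a pattern with a slash against the root-relative path, and an exact file or directory path excludes itself and everything under it. The PR itself uses minimatch; here a small hand-written glob-to-RegExp converter stands in for it, and globToRegExp, isExcluded and applyExcludes are hypothetical names.

```typescript
import * as path from 'node:path';

// Stand-in for minimatch: convert a glob to an anchored RegExp.
// Supports `**` (any depth), `*` (anything except `/`) and `?` (one non-`/` char).
function globToRegExp(glob: string): RegExp {
    let re = '';
    for (let i = 0; i < glob.length; i++) {
        const c = glob[i];
        if (c === '*') {
            if (glob[i + 1] === '*') {
                if (glob[i + 2] === '/') {
                    re += '(?:.*/)?'; // `**/` matches zero or more directories
                    i += 2;
                } else {
                    re += '.*'; // trailing `**` matches everything below
                    i += 1;
                }
            } else {
                re += '[^/]*';
            }
        } else if (c === '?') {
            re += '[^/]';
        } else {
            re += c.replace(/[.+^${}()|[\]\\]/g, '\\$&'); // escape regex metacharacters
        }
    }
    return new RegExp(`^${re}$`);
}

// Hypothetical matcher implementing the assumed semantics described above.
function isExcluded(relPath: string, pattern: string, root: string): boolean {
    let p = path.isAbsolute(pattern) ? path.relative(root, pattern) : pattern;
    p = p.replace(/\\/g, '/').replace(/^\.\//, '').replace(/\/+$/, '');
    if (relPath === p || relPath.startsWith(p + '/')) return true; // exact file or directory
    const target = p.includes('/') ? relPath : path.posix.basename(relPath);
    return globToRegExp(p).test(target);
}

// Keep files that match no pattern; a pattern that matches nothing is a no-op.
function applyExcludes(files: string[], root: string, patterns: string[]): string[] {
    return files.filter((file) => {
        const rel = path.relative(root, file).replace(/\\/g, '/');
        return !patterns.some((pat) => isExcluded(rel, pat, root));
    });
}
```

Under this basename rule, test_* removes files such as sympy/core/tests/test_expr.py but keeps sympy/calculus/tests/__init__.py, which would be consistent with the file that still appears in the post-exclusion log.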

3. package.json

  • Added minimatch dependency for glob pattern matching

Usage

  • Exclude specific files/directories on the command line:
scip-python index --project-name=myproject --exclude path/to/broken.py --exclude path/to/circular/
  • Exclude using patterns:
scip-python index --project-name=myproject --exclude "test_*" "build/**"
  • Exclude using a config file:
scip-python index --project-name=myproject --exclude-config=.scipignore
  • Example config file format (.scipignore):

# Broken files
src/broken_module.py

# Directories with circular dependencies
src/experimental/
tests/broken/

# Glob patterns
test_*
build/**

# Another problematic file
lib/legacy.py
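A file like the one above could be read with a parser along these lines. This is a sketch under an assumed format (one path or glob per line, lines starting with # treated as comments, blank lines and surrounding whitespace ignored); parseExcludeConfig and collectExcludePatterns are hypothetical names, not the PR's actual functions.

```typescript
import { readFileSync } from 'node:fs';

// Hypothetical parser for an --exclude-config file such as .scipignore.
// Assumed format: one pattern per line, `#` comments, blank lines ignored.
function parseExcludeConfig(text: string): string[] {
    return text
        .split(/\r?\n/)
        .map((line) => line.trim())
        .filter((line) => line.length > 0 && !line.startsWith('#'));
}

// Merge patterns from --exclude flags with those read from the config file.
function collectExcludePatterns(exclude: string[] = [], excludeConfig?: string): string[] {
    const fromFile = excludeConfig ? parseExcludeConfig(readFileSync(excludeConfig, 'utf8')) : [];
    return [...exclude, ...fromFile];
}
```

Applied to the example .scipignore, parseExcludeConfig returns the six non-comment entries in file order, starting with src/broken_module.py and ending with lib/legacy.py.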

Benefits

  • Flexibility: Supports both exact paths and glob patterns
  • Robustness: Works as a filter - no errors if patterns match nothing
  • Usability: Config file support for managing complex exclusion rules
  • Consistency: Follows the same pattern as the existing --target-only flag

Testing

The feature can be tested by:

  1. Using --exclude with exact paths

  2. Using --exclude with glob patterns like test_*

  3. Using --exclude-config with a file containing mixed patterns and comments

  4. Verifying that non-matching patterns don't cause errors

Log when indexing the sympy repo directly (no exclusions)

(11:57:21) pyproject.toml file found at /home/zhongming/.codeminer/sympy_sympy.
(11:57:21) Loading pyproject.toml file at /home/zhongming/.codeminer/sympy_sympy/pyproject.toml
Assuming Python version 3.11
Assuming Python platform Linux
Auto-excluding **/node_modules
Auto-excluding **/__pycache__
Auto-excluding **/.*
(11:57:21) Total Project Files 1522
(11:57:21) Indexing /home/zhongming/.codeminer/sympy_sympy with version d293133e81194adc11177729af91c970f092a6e7
(11:57:21) Evaluating python environment dependencies
(11:57:21)   Gathering environment information
(11:57:22) Parse and search for dependencies
(11:57:32)   152 / 1522
(11:57:43)   211 / 1522
(11:57:57)   377 / 1522
(11:58:07)   577 / 1522
(11:58:17)   864 / 1522
(11:58:27)   958 / 1522
(11:58:51)   1084 / 1522
(11:59:01)   1276 / 1522
(11:59:11)   1419 / 1522
(11:59:14) Index workspace and track project files
(11:59:14) Analyze project and dependencies
(11:59:26)   76 / 1524
(11:59:37)   114 / 1524
(11:59:47)   165 / 1524
(11:59:57)   224 / 1524
(12:00:08)   264 / 1524
(12:00:33)   301 / 1524
(12:00:43)   432 / 1524
(12:00:53)   477 / 1524
(12:01:03)   526 / 1524
(12:01:13)   584 / 1524
(12:01:25)   614 / 1524
(12:01:37)   642 / 1524

<--- Last few GCs --->

[2024902:0x7c30a30]   258240 ms: Mark-Compact 3985.9 (4128.9) -> 3970.3 (4129.4) MB, 1982.51 / 0.00 ms  (average mu = 0.180, current mu = 0.020) allocation failure; scavenge might not succeed
[2024902:0x7c30a30]   260659 ms: Mark-Compact 3986.5 (4129.4) -> 3970.9 (4129.9) MB, 2370.20 / 0.00 ms  (average mu = 0.103, current mu = 0.020) allocation failure; scavenge might not succeed

<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----

1: 0xb8d0a3 node::OOMErrorHandler(char const*, v8::OOMDetails const&) [node]
2: 0xf06250 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
3: 0xf06537 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
4: 0x11180d5  [node]
5: 0x1118664 v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [node]
6: 0x112f554 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::internal::GarbageCollectionReason, char const*) [node]
7: 0x112fd6c v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
8: 0x1106071 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
9: 0x1107205 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
10: 0x10e4856 v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [node]
11: 0x1540686 v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [node]
12: 0x7ecdc3cd9ef6

Log after applying the exclude feature

INFO     Running in conda environment: ['scip-python', 'index', '--cwd', '/home/zhongming/.codeminer/sympy_sympy', '--project-name', 'test_swebench', '--output', '/home/zhongming/.codeminer/sympy__sympy-27223/index.scip', '--exclude', 'sympy/polys/numberfields/resolvent_lookup.py', '--exclude', 'test_*']

(13:28:16) No configuration file found.
(13:28:16) pyproject.toml file found at /home/zhongming/.codeminer/sympy_sympy.
(13:28:16) Loading pyproject.toml file at /home/zhongming/.codeminer/sympy_sympy/pyproject.toml
Assuming Python version 3.11
Assuming Python platform Linux
Auto-excluding **/node_modules
Auto-excluding **/__pycache__
Auto-excluding **/.*
(13:28:16) Total Project Files 915
(13:28:16) Indexing /home/zhongming/.codeminer/sympy_sympy with version d293133e81194adc11177729af91c970f092a6e7
(13:28:16) Evaluating python environment dependencies
(13:28:17)   Gathering environment information
(13:28:17) Parse and search for dependencies
(13:28:28)   101 / 915
(13:28:43)   226 / 915
(13:28:53)   515 / 915
(13:29:04)   591 / 915
(13:29:14)   786 / 915
(13:29:17) Index workspace and track project files
(13:29:17) Analyze project and dependencies
(13:29:27)   28 / 917
(13:29:37)   145 / 917
(13:29:49)   163 / 917
(13:29:59)   480 / 917
(13:30:11)   508 / 917
(13:30:21)   684 / 917
(13:30:28) Parse and emit SCIP
(13:30:29)   - (14/916): /home/zhongming/.codeminer/sympy_sympy/sympy/assumptions/facts.py
(13:30:30)   - (48/916): /home/zhongming/.codeminer/sympy_sympy/sympy/calculus/tests/__init__.py
(13:30:31)   - (74/916): /home/zhongming/.codeminer/sympy_sympy/sympy/combinatorics/free_groups.py
(13:30:33)   - (85/916): /home/zhongming/.codeminer/sympy_sympy/sympy/combinatorics/permutations.py
(13:30:34)   - (120/916): /home/zhongming/.codeminer/sympy_sympy/sympy/core/expr.py
(13:30:35)   - (129/916): /home/zhongming/.codeminer/sympy_sympy/sympy/core/multidimensional.py
(13:30:36)   - (177/916): /home/zhongming/.codeminer/sympy_sympy/sympy/functions/elementary/exponential.py
(13:30:38)   - (218/916): /home/zhongming/.codeminer/sympy_sympy/sympy/holonomic/holonomicerrors.py
(13:30:39)   - (264/916): /home/zhongming/.codeminer/sympy_sympy/sympy/logic/inference.py
(13:30:40)   - (339/916): /home/zhongming/.codeminer/sympy_sympy/sympy/ntheory/generate.py
(13:30:41)   - (355/916): /home/zhongming/.codeminer/sympy_sympy/sympy/parsing/autolev/_listener_autolev_antlr.py
(13:30:42)   - (381/916): /home/zhongming/.codeminer/sympy_sympy/sympy/parsing/latex/__init__.py
(13:30:43)   - (474/916): /home/zhongming/.codeminer/sympy_sympy/sympy/physics/quantum/qft.py
(13:30:44)   - (573/916): /home/zhongming/.codeminer/sympy_sympy/sympy/polys/polyconfig.py
(13:30:46)   - (581/916): /home/zhongming/.codeminer/sympy_sympy/sympy/polys/polyutils.py
(13:30:47)   - (654/916): /home/zhongming/.codeminer/sympy_sympy/sympy/polys/numberfields/galoisgroups.py
(13:30:48)   - (699/916): /home/zhongming/.codeminer/sympy_sympy/sympy/printing/pretty/pretty_symbology.py
(13:30:49)   - (771/916): /home/zhongming/.codeminer/sympy_sympy/sympy/solvers/solveset.py
(13:30:50)   - (808/916): /home/zhongming/.codeminer/sympy_sympy/sympy/stats/sampling/__init__.py
(13:30:51)   - (832/916): /home/zhongming/.codeminer/sympy_sympy/sympy/tensor/toperators.py
(13:30:52)   - (902/916): /home/zhongming/.codeminer/sympy_sympy/sympy/vector/deloperator.py
(13:30:53) Writing external symbols to SCIP index
(13:30:53) Sucessfully wrote SCIP index to /home/zhongming/.codeminer/sympy__sympy-27223/index.scip
