This guide helps developers adapt to the new repository structure introduced in the clean-up-repository epic.
The Clustrix repository has been reorganized from a cluttered structure to a clean, standards-compliant Python project layout. This migration improves maintainability, enables better tooling integration, and follows Python packaging best practices.
Before the reorganization:
- 100+ files in root directory
- Tests scattered throughout different locations
- Duplicate and orphaned files
- Large monolithic modules (>2000 lines)
- Mixed content types in single directories
After the reorganization:
- Root Directory: Only essential project files (README, setup.py, etc.)
- tests/: All tests organized by category
- clustrix/: Modular source code with focused responsibilities
- docs/: Comprehensive documentation structure
- scripts/: Essential utility scripts only
- Git History: Preserved through git mv operations
- Old: Tests in root, scripts/, and various subdirectories
+ New: Organized test structure
New Test Structure:
tests/
├── unit/ # Fast, isolated unit tests (run in CI)
├── integration/ # Integration tests (run in CI)
├── real_world/ # Tests requiring cluster access
├── comprehensive/ # Performance and edge case tests
└── infrastructure/ # Test infrastructure setup
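The directory layout above lends itself to automatic marking. The following is a minimal sketch, not the project's actual conftest.py, of how pytest's `pytest_collection_modifyitems` hook could tag each test with a marker based on its directory. The `DIRECTORY_MARKERS` mapping and the `marker_for` helper are hypothetical; in particular, mapping comprehensive/ to the slow marker is an assumption.

```python
from pathlib import Path

# Hypothetical mapping from a tests/ subdirectory to a registered marker name.
DIRECTORY_MARKERS = {
    "unit": "unit",
    "integration": "integration",
    "real_world": "real_world",
    "comprehensive": "slow",  # assumption: comprehensive tests count as slow
}


def marker_for(test_path, tests_root):
    """Return the marker name for a test file, or None if no rule applies."""
    try:
        relative = Path(test_path).relative_to(tests_root)
    except ValueError:
        return None  # file is not under tests/
    if len(relative.parts) < 2:
        return None  # file sits directly in tests/, not in a category directory
    return DIRECTORY_MARKERS.get(relative.parts[0])


def pytest_collection_modifyitems(config, items):
    """pytest hook: add a directory-based marker to every collected test."""
    tests_root = Path(config.rootpath) / "tests"
    for item in items:
        marker = marker_for(item.path, tests_root)
        if marker is not None:
            item.add_marker(marker)
```

If placed in tests/conftest.py, a hook like this would let `pytest -m unit` select tests by directory without decorating each test by hand. It relies on `item.path` and `config.rootpath`, which are available in recent pytest releases.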
Large modules have been broken into focused components:
notebook_magic.py (2883 lines → 5 modules):
- notebook_magic.py (88 lines) - Main entry point
- notebook_magic_config.py (213 lines) - Configuration handling
- notebook_magic_core.py (74 lines) - Core magic functionality
- notebook_magic_mocks.py (171 lines) - Mock objects
- notebook_magic_widget.py (1977 lines) - Widget implementation
executor.py (2362 lines → 7 modules):
- executor.py (39 lines) - Main interface
- executor_core.py (466 lines) - Core execution logic
- executor_connections.py (390 lines) - Connection management
- executor_schedulers.py (378 lines) - Scheduler interfaces
- executor_scheduler_status.py (651 lines) - Status monitoring
- executor_kubernetes.py (461 lines) - Kubernetes integration
- executor_cloud.py (470 lines) - Cloud provider support
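Because the original module names survive as thin entry points, one way to spot-check that the split did not drop a public name is a small import check. The sketch below is illustrative only: `find_missing_names` and `EXPECTED_NAMES` are hypothetical, and the listed names are simply the ones this guide imports from clustrix and clustrix.filesystem.

```python
import importlib

# Public names this guide imports; extend the mapping as needed.
EXPECTED_NAMES = {
    "clustrix": ["cluster", "configure", "ClusterConfig"],
    "clustrix.filesystem": ["cluster_ls", "cluster_find"],
}


def find_missing_names(expected):
    """Return {module: [missing names]}.

    A module that cannot be imported at all maps to None.
    Modules with nothing missing are left out of the result.
    """
    problems = {}
    for module_name, names in expected.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            problems[module_name] = None
            continue
        missing = [name for name in names if not hasattr(module, name)]
        if missing:
            problems[module_name] = missing
    return problems
```

Calling `find_missing_names(EXPECTED_NAMES)` in an environment where clustrix is installed should return an empty dict if every name still resolves.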
All imports remain backward compatible. The refactoring maintained public APIs:
# These imports continue to work unchanged
from clustrix import cluster, configure
from clustrix import ClusterConfig
from clustrix.filesystem import cluster_ls, cluster_find
Pytest configuration updated to properly discover tests:
# pyproject.toml
[tool.pytest.ini_options]
testpaths = ["tests/unit", "tests/integration"]
markers = [
"real_world: marks tests as real world tests",
"slow: marks tests as slow",
"unit: marks tests as unit tests",
"integration: marks tests as integration tests",
]
GitHub Actions workflows updated for new structure:
- Test paths fixed to use tests/unit/ and tests/integration/
- Coverage reporting configured for new layout
- Documentation build paths updated
- Real world test paths corrected
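Both the pytest configuration and the workflows assume that the configured test directories actually exist. As a rough sanity check, the sketch below reads `testpaths` from pyproject.toml and reports any entry that is not a directory. The `missing_testpaths` helper is hypothetical, and the `tomli` fallback assumes that backport is installed on Python versions older than 3.11.

```python
from pathlib import Path

try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: tomli backport is installed


def missing_testpaths(repo_root):
    """Return the configured pytest testpaths that are not existing directories."""
    repo_root = Path(repo_root)
    with open(repo_root / "pyproject.toml", "rb") as handle:
        config = tomllib.load(handle)
    # Walk down to [tool.pytest.ini_options], tolerating missing tables.
    ini_options = config.get("tool", {}).get("pytest", {}).get("ini_options", {})
    testpaths = ini_options.get("testpaths", [])
    return [path for path in testpaths if not (repo_root / path).is_dir()]
```

Run from the repository root, `missing_testpaths(".")` returning an empty list suggests the configured paths line up with the new layout.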
# Old workflow (no longer works)
pytest . # Tests scattered everywhere
python some_script.py # Scripts mixed with tests
# New workflow
pytest tests/unit/ tests/integration/ # Run CI tests
python scripts/check_quality.py # Quality validation
python scripts/run_real_world_tests.py # Real world tests
git pull origin master
# Reinstall in development mode
pip install -e ".[dev,test]"
# Test imports work
python -c "import clustrix; from clustrix import cluster; print('✅ Imports working')"
# Test CLI works
clustrix --help
# Run quick test
pytest tests/unit/test_dartmouth_network_detection.py -v
- Tests: Use tests/unit/ and tests/integration/ instead of root directory
- Quality Checks: Use python scripts/check_quality.py
- Documentation: Build with cd docs && make html
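The commands in this list can be chained into a single local pre-push check. The sketch below is one possible wrapper, not a script shipped with the repository: `run_checks` and `CHECKS` are hypothetical names, and `python -m pytest` is used in place of the bare `pytest` command so that the same interpreter runs every step.

```python
import subprocess
import sys

# Commands taken from this guide; a docs build step could be appended.
CHECKS = [
    [sys.executable, "scripts/check_quality.py"],
    [sys.executable, "-m", "pytest", "tests/unit/", "tests/integration/"],
]


def run_checks(commands, cwd=None):
    """Run each command in order.

    Returns the first command that exits with a non-zero status,
    or None when every command succeeds.
    """
    for command in commands:
        print("Running:", " ".join(command))
        completed = subprocess.run(command, cwd=cwd)
        if completed.returncode != 0:
            return command
    return None
```

A caller would typically run `run_checks(CHECKS)` from the repository root and treat any non-None result as a failed check.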
- Faster Test Discovery: Focused test directories reduce collection time
- Better IDE Support: Standard structure enables better code navigation
- Clearer Separation: Unit vs integration vs real-world tests clearly distinguished
- Improved Tooling: Better support from pytest, coverage, and linting tools
- Intuitive Structure: Standard Python project layout
- Clear Entry Points: Easy to understand where different functionality lives
- Better Documentation: Comprehensive guides and examples
- Reduced Confusion: No more duplicate or orphaned files
- Modular Code: Smaller, focused modules easier to maintain
- Better Testing: Clear test categorization enables better CI/CD
- Reduced Technical Debt: Cleanup removed 76MB of unnecessary files
- Prevention: Git configuration prevents future accumulation
If you encounter import errors:
# Reinstall package
pip uninstall clustrix
pip install -e ".[dev]"
If pytest can't find tests:
# Verify pytest configuration
pytest --collect-only tests/unit/
pytest --collect-only tests/integration/
If scripts can't find files:
- Update paths to use new structure
- Check that you're running from repository root
- Verify working directory in scripts
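One way to make scripts independent of the current working directory is to locate the repository root from the script's own path. The following sketch assumes pyproject.toml sits at the repository root; `find_repo_root` is a hypothetical helper, not part of clustrix.

```python
from pathlib import Path


def find_repo_root(start, marker="pyproject.toml"):
    """Walk upward from `start` until a directory containing `marker` is found."""
    start = Path(start).resolve()
    for candidate in [start, *start.parents]:
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"No {marker} found at or above {start}")
```

Inside a script, `REPO_ROOT = find_repo_root(__file__)` followed by paths such as `REPO_ROOT / "tests" / "unit"` avoids relying on where the script was launched from.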
If you have local changes conflicting with reorganization:
# Stash local changes
git stash
# Pull latest
git pull origin master
# Apply stash (resolve conflicts if any)
git stash pop
If you encounter issues with the migration:
- Check this guide for common solutions
- Search existing issues on GitHub for similar problems
- Create a new issue with details about your specific problem
- Tag issues with the migration label for quick response
After migration, verify everything works:
# Run comprehensive validation
python scripts/check_quality.py
# Test core functionality
python -c "
from clustrix import cluster, configure
configure(cluster_host=None) # Local execution
@cluster(cores=1)
def test():
    return 'success'
result = test()
print(f'✅ Migration successful: {result}')
"
The migration is successful when all quality checks pass and core functionality works without errors.