Thank you for your interest in contributing to Reader! This document provides guidelines and instructions for contributing.
- Node.js >= 18 (v22 recommended)
- npm for package management
- Git
Note: Always run scripts with Node.js (`npx tsx` or `node`), as Hero has ESM compatibility issues with other runtimes.
1. Fork the repository on GitHub
2. Clone your fork:
   ```bash
   git clone https://github.com/YOUR_USERNAME/reader.git
   cd reader
   ```
3. Install dependencies:
   ```bash
   npm install
   ```
4. Verify setup:
   ```bash
   npm run typecheck
   npm run build
   ```
5. Test the CLI:
   ```bash
   npx tsx src/cli/index.ts scrape https://example.com
   ```
```
src/
├── index.ts              # Public API exports
├── client.ts             # ReaderClient - main API entry point
├── scraper.ts            # Scraper class - main scraping logic
├── crawler.ts            # Crawler class - link discovery
├── types.ts              # TypeScript types for scraping
├── crawl-types.ts        # TypeScript types for crawling
│
├── browser/
│   ├── pool.ts           # BrowserPool - manages Hero instances
│   ├── hero-config.ts    # Hero configuration
│   └── types.ts          # Pool types
│
├── cloudflare/
│   ├── detector.ts       # Challenge detection
│   ├── handler.ts        # Challenge resolution
│   └── types.ts          # Cloudflare types
│
├── formatters/
│   ├── markdown.ts       # Markdown formatter
│   ├── html.ts           # HTML formatter
│   ├── json.ts           # JSON formatter
│   ├── text.ts           # Text formatter
│   └── index.ts          # Re-exports
│
├── utils/
│   ├── content-cleaner.ts     # HTML content cleaning
│   ├── metadata-extractor.ts  # Metadata extraction
│   ├── url-helpers.ts         # URL utilities
│   ├── rate-limiter.ts        # Rate limiting
│   └── logger.ts              # Logging
│
├── proxy/
│   └── config.ts         # Proxy configuration
│
├── daemon/
│   ├── index.ts          # Module exports
│   ├── server.ts         # DaemonServer - HTTP server with browser pool
│   └── client.ts         # DaemonClient - connects CLI to daemon
│
└── cli/
    └── index.ts          # CLI implementation
```
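The `utils/` modules above are small and single-purpose. As an illustration of the kind of logic involved, here is a minimal token-bucket rate limiter sketch; this is hypothetical and not the actual implementation in `src/utils/rate-limiter.ts`:

```typescript
// Minimal token-bucket rate limiter sketch (illustrative only).
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,        // max tokens held at once
    private readonly refillPerSecond: number, // tokens added per second
    now: number = Date.now()
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  /** Try to consume one token; returns true if the call is allowed. */
  tryAcquire(now: number = Date.now()): boolean {
    // Refill based on elapsed time, capped at capacity.
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

Passing `now` explicitly makes the refill logic deterministic and easy to test without real clocks.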
```bash
# Run CLI directly
npx tsx src/cli/index.ts scrape https://example.com

# With verbose output
npx tsx src/cli/index.ts scrape https://example.com -v

# Show browser window
npx tsx src/cli/index.ts scrape https://example.com --show-chrome
```

```bash
# Start daemon with browser pool
npx tsx src/cli/index.ts start --pool-size 5

# Check daemon status
npx tsx src/cli/index.ts status

# Run commands (auto-connects to daemon)
npx tsx src/cli/index.ts scrape https://example.com

# Force standalone mode (bypass daemon)
npx tsx src/cli/index.ts scrape https://example.com --standalone

# Stop daemon
npx tsx src/cli/index.ts stop
```

Run these commands before submitting a PR:
```bash
# Type checking
npm run typecheck

# Linting
npm run lint

# Auto-fix lint issues
npm run lint:fix

# Format code
npm run format

# Check formatting
npm run format:check

# Build
npm run build
```

Track outstanding work:

```bash
npm run todo
```

Use one of these branch name prefixes:

- `feature/description` - New features
- `fix/description` - Bug fixes
- `docs/description` - Documentation updates
- `refactor/description` - Code refactoring
Write clear, concise commit messages:
```
type: short description

Longer description if needed.
```

Types: `feat`, `fix`, `docs`, `refactor`, `test`, `chore`

Examples:

```
feat: add support for custom user agents
fix: resolve timeout issue with Cloudflare challenges
docs: update proxy configuration guide
refactor: simplify browser pool recycling logic
```
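This convention is easy to check mechanically. A hedged sketch (the regex and type list mirror the examples above; this is not an enforced commit hook in the repo):

```typescript
// Allowed commit types, matching the convention described above.
const COMMIT_TYPES = ["feat", "fix", "docs", "refactor", "test", "chore"];

/** Check that a commit message's first line matches "type: short description". */
function isValidCommitSubject(message: string): boolean {
  const subject = message.split("\n")[0];
  const match = subject.match(/^([a-z]+): (.+)$/);
  if (!match) return false;
  return COMMIT_TYPES.includes(match[1]);
}
```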
1. Create a new branch from `main`
2. Make your changes
3. Run all checks:
   ```bash
   npm run lint
   npm run format:check
   npm run typecheck
   npm run build
   ```
4. Push your branch and create a PR
5. Fill out the PR template
6. Wait for review
1. Create `src/formatters/newformat.ts`:
   ```typescript
   export function formatToNewFormat(
     pages: Page[],
     baseUrl: string,
     scrapedAt: string,
     duration: number,
     metadata?: WebsiteMetadata
   ): string {
     // Implementation
   }
   ```
2. Export from `src/formatters/index.ts`
3. Add to the format type in `src/types.ts`
4. Call the formatter in `src/scraper.ts`
5. Update CLI validation in `src/cli/index.ts`
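As an illustration, a hypothetical formatter following this signature might look like the sketch below. The `Page` and `WebsiteMetadata` shapes here are simplified stand-ins; use the real types from `src/types.ts`:

```typescript
// Simplified stand-ins for illustration; the real types live in src/types.ts.
interface Page {
  url: string;
  title: string;
  content: string;
}
interface WebsiteMetadata {
  description?: string;
}

// Hypothetical formatter: a header line plus one "title <url>" bullet per page.
export function formatToUrlList(
  pages: Page[],
  baseUrl: string,
  scrapedAt: string,
  duration: number,
  metadata?: WebsiteMetadata
): string {
  const header = `# ${baseUrl} (scraped ${scrapedAt}, ${duration}ms)`;
  const lines = pages.map((p) => `- ${p.title} <${p.url}>`);
  return [header, ...lines].join("\n");
}
```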
1. Add to the `ScrapeOptions` interface in `src/types.ts`
2. Add a default in `DEFAULT_OPTIONS`
3. Use in the `Scraper` class via `this.options.newOption`
4. Add a CLI flag in `src/cli/index.ts` if applicable
5. Update documentation
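The steps above can be sketched end to end. The option name `maxRetries` and the field values here are invented for illustration and do not reflect the actual interfaces:

```typescript
// Illustrative only: a hypothetical "maxRetries" option threaded through the layers.
interface ScrapeOptions {
  timeout?: number;
  maxRetries?: number; // step 1: new option on the interface (src/types.ts)
}

const DEFAULT_OPTIONS: Required<ScrapeOptions> = {
  timeout: 30_000,
  maxRetries: 2, // step 2: a default in DEFAULT_OPTIONS
};

class Scraper {
  private options: Required<ScrapeOptions>;

  constructor(options: ScrapeOptions = {}) {
    // Merge user-supplied options over the defaults.
    this.options = { ...DEFAULT_OPTIONS, ...options };
  }

  retriesAllowed(): number {
    return this.options.maxRetries; // step 3: used inside the class
  }
}
```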
- Detection patterns: `src/cloudflare/detector.ts`
- Resolution logic: `src/cloudflare/handler.ts`
- Test with known Cloudflare-protected sites
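To give a feel for what challenge detection involves, here is a hedged heuristic sketch. The markers and status codes are common traits of Cloudflare interstitials, not the project's actual detection patterns, which live in `src/cloudflare/detector.ts`:

```typescript
// Illustrative heuristic only; see src/cloudflare/detector.ts for the real patterns.
function looksLikeChallenge(html: string, statusCode: number): boolean {
  // Cloudflare interstitials commonly return 403/503 plus telltale markup.
  const markers = ["cf-chl", "challenge-platform", "Just a moment..."];
  const hasMarker = markers.some((m) => html.includes(m));
  return (statusCode === 403 || statusCode === 503) && hasMarker;
}
```

Requiring both the status code and a content marker reduces false positives on ordinary error pages.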
- Default config: `src/browser/types.ts`
- Pool logic: `src/browser/pool.ts`
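The core contract of a pool can be sketched generically. This is a simplified model; the real `BrowserPool` additionally handles Hero lifecycle, health checks, and instance recycling:

```typescript
// Simplified generic pool model (illustrative; not the actual BrowserPool).
class SimplePool<T> {
  private idle: T[] = [];
  private total = 0;

  constructor(
    private readonly maxSize: number,
    private readonly create: () => T
  ) {}

  /** Hand out an idle instance, or create one if under the size cap. */
  acquire(): T | null {
    const existing = this.idle.pop();
    if (existing !== undefined) return existing;
    if (this.total < this.maxSize) {
      this.total += 1;
      return this.create();
    }
    return null; // pool exhausted; a real pool would queue the caller
  }

  /** Return an instance to the idle list for reuse. */
  release(instance: T): void {
    this.idle.push(instance);
  }
}
```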
Currently testing is done manually. When adding new features:
1. Test basic functionality:
   ```bash
   npx tsx src/cli/index.ts scrape https://example.com
   ```
2. Test Cloudflare-protected sites:
   ```bash
   npx tsx src/cli/index.ts scrape https://cloudflare-protected-site.com -v
   ```
3. Test different output formats:
   ```bash
   npx tsx src/cli/index.ts scrape https://example.com -f markdown,html,json,text
   ```
4. Test crawling:
   ```bash
   npx tsx src/cli/index.ts crawl https://example.com -d 2 -m 10
   ```
5. Test batch scraping:
   ```bash
   npx tsx src/cli/index.ts scrape url1 url2 url3 -c 3 -v
   ```
6. Test daemon mode:
   ```bash
   # Start daemon
   npx tsx src/cli/index.ts start --pool-size 3

   # Test scraping via daemon
   npx tsx src/cli/index.ts scrape https://example.com

   # Check status
   npx tsx src/cli/index.ts status

   # Stop daemon
   npx tsx src/cli/index.ts stop
   ```
The `examples/` folder contains working examples:

```bash
cd examples
npm install

# Basic examples
npx tsx basic/basic-scrape.ts
npx tsx basic/batch-scrape.ts
npx tsx basic/crawl-website.ts

# AI integration examples (requires API keys)
export OPENAI_API_KEY="sk-..."
npx tsx ai-tools/openai-summary.ts https://example.com

# Production server
npx tsx production/express-server/src/index.ts
```

- Use TypeScript for all new code
- Follow existing patterns in the codebase
- Use async/await instead of callbacks
- Prefer explicit types over `any`
- Use meaningful variable and function names
- Add JSDoc comments for public APIs
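To illustrate the preferred style, a small hypothetical example combining explicit types, async/await, and JSDoc (the function names are invented, not project code):

```typescript
/** Extract the <title> text from an HTML string ("" if absent). */
function extractTitle(html: string): string {
  const match = html.match(/<title>(.*?)<\/title>/i);
  return match ? match[1] : "";
}

/** Preferred: async/await with explicit types, rather than nested callbacks. */
async function titleOf(loadHtml: () => Promise<string>): Promise<string> {
  const html = await loadHtml(); // await instead of a callback pyramid
  return extractTitle(html);
}
```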
When making changes:
- Update relevant markdown files in `docs/`
- Update README.md if adding new features
- Add JSDoc comments to new public functions
- Update CLAUDE.md for AI context if architecture changes
| File | Purpose |
|---|---|
| README.md | Main documentation, quick start |
| CONTRIBUTING.md | This file |
| docs/getting-started.md | Detailed setup guide |
| docs/api-reference.md | Complete API docs |
| docs/architecture.md | System design |
| docs/troubleshooting.md | Common issues |
| docs/guides/ | Feature guides |
| docs/deployment/ | Deployment guides |
When reporting bugs, please include:
- Operating system and version
- Node.js version (`node --version`)
- Reader version
- Steps to reproduce
- Expected vs actual behavior
- Error messages and stack traces
- Verbose output (`-v` flag)
- Be respectful and inclusive
- Focus on constructive feedback
- Help others learn and grow
- Follow project guidelines
By contributing, you agree that your contributions will be licensed under the Apache 2.0 License.
By using Reader, you agree to the following:
- You are solely responsible for respecting websites' policies when scraping and crawling
- You will adhere to applicable privacy policies and terms of use before initiating scraping activities
- Reader respects robots.txt directives by default, but ultimate compliance is your responsibility
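To make the robots.txt point concrete, here is a minimal sketch of a disallow check. It is deliberately simplified (real parsers handle user-agent groups, `Allow` precedence, and wildcards) and is not Reader's actual implementation:

```typescript
/** Very simplified robots.txt check: is `path` under any Disallow rule? */
function disallowedFor(robotsTxt: string, path: string): boolean {
  const rules = robotsTxt
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.toLowerCase().startsWith("disallow:"))
    .map((line) => line.slice("disallow:".length).trim())
    .filter((rule) => rule.length > 0); // an empty Disallow permits everything
  return rules.some((rule) => path.startsWith(rule));
}
```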
- Check the documentation
- Search GitHub Issues
- Ask in Discord
- Open a new issue or discussion
Thank you for contributing!