Merged
16 changes: 15 additions & 1 deletion CHANGELOG.md
@@ -7,6 +7,19 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

## [0.4.0] - 2025-12-28

### Added
- **Python feedparser compatibility improvements**:
  - Field alias mappings for deprecated field names (`description` → `subtitle`, `guid` → `id`, etc.)
  - Dict-style access on feed objects (`d['feed']['title']`, `d['entries'][0]['link']`)
  - Container aliases (`channel` → `feed`, `items` → `entries`)
  - Auto-URL detection in `parse()` (URLs are fetched automatically when the `http` feature is enabled)
  - Optional HTTP parameters (`etag`, `modified`, `user_agent`) for `parse()` and `parse_with_limits()`

### Changed
- `parse_with_limits()` now takes `limits` as a keyword-only parameter for consistency

## [0.3.0] - 2025-12-18

### Added
@@ -147,7 +160,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Comprehensive test coverage
- Documentation with examples

[Unreleased]: https://github.com/bug-ops/feedparser-rs/compare/v0.3.0...HEAD
[Unreleased]: https://github.com/bug-ops/feedparser-rs/compare/v0.4.0...HEAD
[0.4.0]: https://github.com/bug-ops/feedparser-rs/compare/v0.3.0...v0.4.0
[0.3.0]: https://github.com/bug-ops/feedparser-rs/compare/v0.2.1...v0.3.0
[0.2.1]: https://github.com/bug-ops/feedparser-rs/compare/v0.2.0...v0.2.1
[0.2.0]: https://github.com/bug-ops/feedparser-rs/compare/v0.1.8...v0.2.0
7 changes: 4 additions & 3 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 2 additions & 1 deletion Cargo.toml
@@ -7,7 +7,7 @@ members = [
resolver = "2"

[workspace.package]
version = "0.3.0"
version = "0.4.0"
edition = "2024"
rust-version = "1.88.0"
authors = ["bug-ops"]
@@ -29,6 +29,7 @@ memchr = "2.7"
mockito = "1.6"
napi = "3.7"
napi-derive = "3.4"
once_cell = "1.20"
pyo3 = "0.27"
quick-xml = "0.38"
regex = "1.11"
22 changes: 16 additions & 6 deletions README.md
@@ -18,7 +18,7 @@ High-performance RSS/Atom/JSON Feed parser written in Rust, with Python and Node
- **Conditional GET** — ETag/Last-Modified support for bandwidth-efficient polling
- **Podcast support** — iTunes and Podcast 2.0 namespace extensions
- **Multi-language bindings** — Native Python (PyO3) and Node.js (napi-rs) bindings
- **Familiar API** — Inspired by Python's feedparser, easy to migrate existing code
- **feedparser drop-in** — Dict-style access, field aliases, same API patterns as Python feedparser

## Supported Formats

@@ -146,18 +146,28 @@ See [Node.js API documentation](crates/feedparser-rs-node/README.md) for complet
### Python

```python
import feedparser_rs
import feedparser_rs as feedparser # Drop-in replacement

# Parse from bytes or string
d = feedparser_rs.parse(b'<rss>...</rss>')
# Parse from bytes, string, or URL (auto-detected)
d = feedparser.parse(b'<rss>...</rss>')
d = feedparser.parse('https://example.com/feed.xml') # URL auto-detected

# Attribute-style access
print(d.version) # 'rss20'
print(d.feed.title)
print(d.bozo) # True if parsing had issues
print(d.entries[0].published_parsed) # time.struct_time

# Dict-style access (feedparser-compatible)
print(d['feed']['title'])
print(d['entries'][0]['link'])

# Deprecated field aliases work
print(d.feed.description) # → d.feed.subtitle
print(d.channel.title) # → d.feed.title
```

> [!NOTE]
> Python bindings provide `time.struct_time` for date fields, matching feedparser's API for easy migration.
> Python bindings provide full feedparser compatibility: dict-style access, field aliases, and `time.struct_time` for date fields.

## Cargo Features

2 changes: 1 addition & 1 deletion crates/feedparser-rs-node/package.json
@@ -1,6 +1,6 @@
{
"name": "feedparser-rs",
"version": "0.3.0",
"version": "0.4.0",
"description": "High-performance RSS/Atom/JSON Feed parser for Node.js",
"main": "index.js",
"types": "index.d.ts",
1 change: 1 addition & 0 deletions crates/feedparser-rs-py/Cargo.toml
@@ -18,6 +18,7 @@ crate-type = ["cdylib"]
feedparser-rs = { path = "../feedparser-rs-core" }
pyo3 = { workspace = true, features = ["extension-module", "chrono"] }
chrono = { workspace = true, features = ["clock"] }
once_cell = { workspace = true }

[features]
default = ["http"]
82 changes: 59 additions & 23 deletions crates/feedparser-rs-py/README.md
@@ -14,7 +14,7 @@ High-performance RSS/Atom/JSON Feed parser for Python with feedparser-compatible
- **Tolerant parsing**: Bozo flag for graceful handling of malformed feeds
- **Multi-format**: RSS 0.9x/1.0/2.0, Atom 0.3/1.0, JSON Feed 1.0/1.1
- **Podcast support**: iTunes and Podcast 2.0 namespace extensions
- **Familiar API**: Inspired by feedparser, easy migration path
- **feedparser-compatible**: Dict-style access, field aliases, same API patterns
- **DoS protection**: Built-in resource limits

## Installation
@@ -33,15 +33,20 @@ pip install feedparser-rs
```python
import feedparser_rs

# Parse from string or bytes
# Parse from string, bytes, or URL (auto-detected)
d = feedparser_rs.parse('<rss>...</rss>')
d = feedparser_rs.parse(b'<rss>...</rss>')
d = feedparser_rs.parse('https://example.com/feed.xml') # URL auto-detected

# Access data
# Attribute-style access (feedparser-compatible)
print(d.feed.title)
print(d.version) # "rss20", "atom10", etc.
print(d.bozo) # True if parsing errors occurred

# Dict-style access (feedparser-compatible)
print(d['feed']['title'])
print(d['entries'][0]['link'])

for entry in d.entries:
    print(entry.title)
    print(entry.published_parsed) # time.struct_time
@@ -55,35 +60,63 @@ for entry in d.entries:
```python
import feedparser_rs

# Fetch and parse in one call
# Option 1: Auto-detection (recommended)
d = feedparser_rs.parse('https://example.com/feed.xml')

# Option 2: Explicit URL function
d = feedparser_rs.parse_url('https://example.com/feed.xml')

print(d.feed.title)
print(f"Fetched {len(d.entries)} entries")
# With conditional GET for efficient polling
d = feedparser_rs.parse(
    'https://example.com/feed.xml',
    etag=cached_etag,
    modified=cached_modified
)
if d.status == 304:
    print("Feed not modified")

# With custom limits
limits = feedparser_rs.ParserLimits(max_entries=100)
d = feedparser_rs.parse_url_with_limits('https://example.com/feed.xml', limits)
d = feedparser_rs.parse_with_limits('https://example.com/feed.xml', limits=limits)
```

> [!TIP]
> `parse_url` supports automatic compression (gzip, deflate, brotli) and follows redirects.
> URL fetching supports automatic compression (gzip, deflate, brotli) and follows redirects.
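
The conditional GET pattern above can be wrapped in a small polling helper. The following is a minimal sketch, not part of this change: `poll_feed`, the `cache` dict, and `fake_parse` are hypothetical names, and the only assumption about the library is that the result exposes `.status`, `.etag`, and `.modified` as the README describes. The parse function is passed in so the sketch runs without network access.

```python
from types import SimpleNamespace


def poll_feed(parse, url, cache):
    """Fetch `url` with conditional GET, reusing validators stored in `cache`.

    `parse` is any callable accepting parse(source, etag=None, modified=None).
    `cache` is a plain dict (hypothetical; use whatever storage fits the app).
    Returns the parsed result, or None when the server answers 304.
    """
    result = parse(url, etag=cache.get("etag"), modified=cache.get("modified"))

    # 304 means the cached copy is still current, so there is nothing new.
    if getattr(result, "status", None) == 304:
        return None

    # Remember the validators for the next poll, if the server sent any.
    cache["etag"] = getattr(result, "etag", None)
    cache["modified"] = getattr(result, "modified", None)
    return result


def fake_parse(source, etag=None, modified=None):
    """Stand-in for the real parse function, used only to exercise the helper."""
    if etag == "v1":
        return SimpleNamespace(status=304, etag="v1", modified=None, entries=[])
    return SimpleNamespace(status=200, etag="v1", modified=None, entries=["entry"])


cache = {}
first = poll_feed(fake_parse, "https://example.com/feed.xml", cache)
second = poll_feed(fake_parse, "https://example.com/feed.xml", cache)
print(first.status, second)  # 200 None
```

In real use the first argument would be `feedparser_rs.parse`, and `cache` would be persisted between runs.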

## Migration from feedparser

feedparser-rs is designed as a drop-in replacement for Python feedparser:

```python
# Option 1: alias import
# Drop-in replacement
import feedparser_rs as feedparser
d = feedparser.parse(feed_content)

# Option 2: direct import
import feedparser_rs
d = feedparser_rs.parse(feed_content)
# Same API patterns work
d = feedparser.parse('https://example.com/feed.xml')
print(d.feed.title)
print(d['feed']['title']) # Dict-style access works too
print(d.entries[0].link)

# Option 3: URL fetching (new!)
d = feedparser_rs.parse_url('https://example.com/feed.xml')
# Deprecated field names supported
print(d.feed.description) # → d.feed.subtitle
print(d.channel.title) # → d.feed.title
print(d.items[0].guid) # → d.entries[0].id
```

### Supported Field Aliases

| Old Name | Maps To |
|----------|---------|
| `feed.description` | `feed.subtitle` or `feed.summary` |
| `feed.tagline` | `feed.subtitle` |
| `feed.copyright` | `feed.rights` |
| `feed.modified` | `feed.updated` |
| `channel` | `feed` |
| `items` | `entries` |
| `entry.guid` | `entry.id` |
| `entry.description` | `entry.summary` |
| `entry.issued` | `entry.published` |
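
To illustrate how alias lookup of this kind can behave, here is a pure-Python sketch. It is not the binding's implementation (which lives in Rust and is not shown in this diff): `AliasDict` and `FEED_ALIASES` are hypothetical names, and the table copies only the feed-level rows above. It shows attribute and dict-style access sharing one fallback path.

```python
# Hypothetical alias table mirroring the feed-level rows of the table above.
FEED_ALIASES = {
    "description": ("subtitle", "summary"),
    "tagline": ("subtitle",),
    "copyright": ("rights",),
    "modified": ("updated",),
}


class AliasDict(dict):
    """Dict with attribute access and fallback lookup through an alias table."""

    def __init__(self, data, aliases):
        super().__init__(data)
        self._aliases = aliases

    def __missing__(self, key):
        # dict.__getitem__ calls this only when `key` is absent.
        for target in self._aliases.get(key, ()):
            if target in self:
                return dict.__getitem__(self, target)
        raise KeyError(key)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


feed = AliasDict({"title": "Example", "summary": "A feed"}, FEED_ALIASES)
print(feed.title)            # Example
print(feed.description)      # A feed (subtitle is absent, so summary is used)
print(feed["description"])   # A feed
```

Trying the targets in order is one way to model a row such as `feed.subtitle` or `feed.summary`; the order the library actually uses is an assumption here.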

## Advanced Usage

### Custom Resource Limits
@@ -98,7 +131,7 @@ limits = feedparser_rs.ParserLimits(
max_links_per_entry=50,
)

d = feedparser_rs.parse_with_limits(feed_data, limits)
d = feedparser_rs.parse_with_limits(feed_data, limits=limits)
```
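
The switch to `limits=limits` matters because of how keyword-only parameters bind. The sketch below uses a hypothetical pure-Python stand-in, `parse_with_limits_sketch`, with the parameter names from the API reference; the `*` marker is an assumption based on the changelog's "keyword-only" wording. How the real binding reacts to the old positional call is not shown in this diff.

```python
def parse_with_limits_sketch(source, etag=None, modified=None, user_agent=None, *, limits=None):
    """Stand-in that only reports how its arguments were bound."""
    return {"source": source, "etag": etag, "limits": limits}


# New call form: `limits` is passed by keyword and lands where intended.
new_style = parse_with_limits_sketch("<rss/>", limits="my-limits")
print(new_style["limits"])  # my-limits

# Old call form: the second positional argument now binds to `etag`,
# so `limits` silently stays None in this pure-Python stand-in.
old_style = parse_with_limits_sketch("<rss/>", "my-limits")
print(old_style["etag"], old_style["limits"])  # my-limits None
```

Callers migrating from 0.3.0 should therefore update every positional `limits` argument instead of relying on an error to flag it.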

### Format Detection
@@ -132,20 +165,23 @@ for entry in d.entries:

### Functions

- `parse(source)` — Parse feed from bytes or str
- `parse_url(url)` — Fetch and parse feed from URL
- `parse_with_limits(source, limits)` — Parse with custom resource limits
- `parse_url_with_limits(url, limits)` — Fetch and parse with custom limits
- `parse(source, etag=None, modified=None, user_agent=None)` — Parse feed from bytes, str, or URL (auto-detected)
- `parse_url(url, etag=None, modified=None, user_agent=None)` — Fetch and parse feed from URL
- `parse_with_limits(source, etag=None, modified=None, user_agent=None, limits=None)` — Parse with custom resource limits
- `parse_url_with_limits(url, etag=None, modified=None, user_agent=None, limits=None)` — Fetch and parse with custom limits
- `detect_format(source)` — Detect feed format without full parsing
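
Since `parse()` now accepts content or a URL, callers may want to predict which path a given input takes. The diff does not show the detection rule, so the following is only one plausible heuristic, written as a hypothetical helper `looks_like_url`; the scheme check and the treatment of bytes as content are both assumptions.

```python
def looks_like_url(source):
    """Guess whether `source` is a URL to fetch rather than feed content."""
    # Assumption: bytes input is always treated as feed content.
    if isinstance(source, (bytes, bytearray)):
        return False
    if not isinstance(source, str):
        return False
    candidate = source.strip().lower()
    return candidate.startswith(("http://", "https://"))


print(looks_like_url("https://example.com/feed.xml"))  # True
print(looks_like_url("<rss>...</rss>"))                # False
print(looks_like_url(b"https://example.com/feed.xml")) # False
```

When the input type is known in advance, calling `parse_url()` directly avoids depending on any detection rule.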

### Classes

- `FeedParserDict` — Parsed feed result
  - `.feed` — Feed metadata
  - `.entries` — List of entries
- `FeedParserDict` — Parsed feed result (supports both attribute and dict-style access)
  - `.feed` / `['feed']` — Feed metadata
  - `.entries` / `['entries']` — List of entries
  - `.bozo` — True if parsing errors occurred
  - `.version` — Feed version string
  - `.encoding` — Character encoding
  - `.status` — HTTP status code (for URL fetches)
  - `.etag` — ETag header (for conditional GET)
  - `.modified` — Last-Modified header (for conditional GET)

- `ParserLimits` — Resource limits configuration

2 changes: 1 addition & 1 deletion crates/feedparser-rs-py/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "maturin"

[project]
name = "feedparser-rs"
version = "0.3.0"
version = "0.4.0"
description = "High-performance RSS/Atom/JSON Feed parser with feedparser-compatible API"
readme = "README.md"
license = { text = "MIT OR Apache-2.0" }