Fix sibling selection logic and bbc-reader-bug test #1008

Open
Prasadzoman wants to merge 2 commits into mozilla:main from Prasadzoman:fix-bbc-reader-bug

Prasadzoman commented Apr 30, 2026

Fixes #1005
This PR restores the canonical Mozilla sibling selection logic in _grabArticle
and fixes the bbc-reader-bug test case.

Changes:

  • Replaced the modified sibling scoring logic with the original proportional contentBonus
  • Restored the paragraph fallback thresholds (text length 80, link density 0.25)
  • Removed non-standard heuristics that caused intro blocks to override the main content
  • Updated test fixtures to ensure consistent parsing across jsdom and JSDOMParser
  • Ensured the excerpt is correctly derived from the meta description
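
For reviewers who want the sibling rule in one place, here is a rough sketch of the selection logic the first two bullets describe. It is not the code in this PR: plain objects stand in for DOM nodes, and `shouldAppendSibling`, `siblingScoreThreshold`, and the field names are hypothetical. The 80 and 0.25 thresholds come from the description above. The 0.2 bonus factor, the floor of 10, and the short-paragraph rule are assumptions based on my reading of upstream Readability.js and should be checked against the diff.

```javascript
// Simplified sketch of sibling selection around the top candidate.
// Plain objects stand in for DOM nodes; all names here are hypothetical.

function siblingScoreThreshold(topCandidate) {
  // Assumed from upstream: a sibling must reach 20% of the top
  // candidate's score, with a floor of 10.
  return Math.max(10, topCandidate.contentScore * 0.2);
}

function shouldAppendSibling(sibling, topCandidate) {
  if (sibling === topCandidate) {
    return true;
  }

  // Proportional bonus (assumed factor 0.2): a sibling that shares the
  // top candidate's non-empty class name gets a share of its score.
  let contentBonus = 0;
  if (sibling.className !== "" && sibling.className === topCandidate.className) {
    contentBonus = topCandidate.contentScore * 0.2;
  }

  if (
    typeof sibling.contentScore === "number" &&
    sibling.contentScore + contentBonus >= siblingScoreThreshold(topCandidate)
  ) {
    return true;
  }

  // Paragraph fallback: thresholds 80 (length) and 0.25 (link density)
  // are the ones named in the PR description.
  if (sibling.tagName === "P") {
    const length = sibling.text.length;
    if (length > 80 && sibling.linkDensity < 0.25) {
      return true;
    }
    // Assumed from upstream: a short, link-free paragraph that contains
    // a sentence-ending period is also kept.
    if (
      length > 0 &&
      length < 80 &&
      sibling.linkDensity === 0 &&
      /\.( |$)/.test(sibling.text)
    ) {
      return true;
    }
  }

  return false;
}
```

Under these assumptions, an unscored intro paragraph is appended only through the paragraph fallback, so it cannot displace the top candidate itself.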

Result:
All tests passing (1997/1997)
No regressions in existing test suite

- Restore canonical Mozilla sibling scoring logic
- Fix paragraph fallback thresholds
- Update test fixtures to ensure consistent parsing across jsdom and JSDOMParser
- Ensure excerpt comes from meta description

Prasadzoman reopened this Apr 30, 2026

Development

Successfully merging this pull request may close these issues.

BBC news articles not containing full article body