Deep Reader: Anti-Slop Content Extractor

By George Freedom

A minimalist, high-focus reading tool built with Streamlit, designed to strip away digital noise, advertisements, and "AI-slop" from web articles.

This project serves as a Digital Noise Razor utility: capturing raw HTML from any URL, isolating the core intellectual signal using the Trafilatura engine, and rendering a clean, distraction-free Markdown interface.

It is a practical implementation of the Rugged IT philosophy—prioritizing information integrity and mental focus over modern web bloat and digital overload.

📖 Live Reader HQ

Access the distraction-free reading environment directly through the web interface.

Live App: Deep Reader on Streamlit Cloud
Core Mission: This application implements a "Read-Only" protocol. It demonstrates how to bypass engagement-driven web design to reclaim focus and ensure data-driven consumption of information.

🚀 Key Features

Zero-Distraction Interface: Complete removal of pop-ups, scripts, menus, and tracking pixels.
Markdown-First Rendering: Converts complex web layouts into clean, structured Markdown for optimal legibility.
High-Efficiency Extraction: Uses the Trafilatura library to identify the main article text and discard the surrounding page boilerplate.
Metadata Integrity: Automatically captures source URLs and timestamps to maintain a clean digital archive.
Mobile-Responsive Design: Access your focused reading list from a laptop in the office or a phone on the go.

⚙️ System Architecture

The application follows a Minimalist Modular Architecture. By decoupling the extraction engine from the UI, the tool remains lightweight, resilient, and easy to deploy on low-power hardware such as a Raspberry Pi.

🌐 Signal Ingestion (The Interface)

URL Handshaking: Securely fetching raw HTML data from remote servers.
Resilience Protocols: Handling failed requests and script-heavy sites with graceful error signaling.
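
The ingestion step above can be sketched as follows. This is a minimal illustration, not the code in app.py: the helper name `fetch_html` and its injectable `fetcher` parameter are assumptions made for this example. It relies on Trafilatura's `fetch_url`, which returns the page as a string or `None` when the download fails.

```python
from typing import Callable, Optional


def fetch_html(
    url: str,
    fetcher: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Return raw HTML for `url`, or None if it cannot be fetched.

    `fetch_html` and `fetcher` are hypothetical names for this sketch.
    """
    # Reject anything that is not a plain web URL before touching the network.
    if not url.lower().startswith(("http://", "https://")):
        return None
    if fetcher is None:
        # Imported lazily so the helper can be exercised without the dependency.
        from trafilatura import fetch_url
        fetcher = fetch_url
    try:
        # trafilatura.fetch_url returns None on a failed download.
        return fetcher(url)
    except Exception:
        # Any unexpected network or decoding error becomes a clean "no result".
        return None
```

Returning `None` instead of raising lets the UI layer decide how to signal the failure to the reader.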

🧠 The Cleaning Engine (The Filter)

Content Distillation: Identification of the primary text body while discarding structural noise (menus, ads, footers).
Format Conversion: Transforming raw HTML into human-centric Markdown.
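
A possible shape for the distillation and conversion steps is shown below. It is a sketch under stated assumptions: `distill_to_markdown` and the `extractor` parameter are hypothetical names, and `output_format="markdown"` assumes a Trafilatura release that supports Markdown output (older releases may only offer plain text, XML, or JSON).

```python
from typing import Callable, Optional


def distill_to_markdown(
    html: str,
    extractor: Optional[Callable[..., Optional[str]]] = None,
) -> Optional[str]:
    """Return the main article text as Markdown, or None if nothing usable is found.

    `distill_to_markdown` and `extractor` are hypothetical names for this sketch.
    """
    if not html or not html.strip():
        return None
    if extractor is None:
        # Lazy import: trafilatura is only needed when no extractor is injected.
        from trafilatura import extract
        extractor = extract
    # Assumption: the installed Trafilatura version accepts "markdown" here.
    text = extractor(
        html,
        output_format="markdown",
        include_comments=False,  # drop reader comment sections
        include_tables=True,     # keep tables that belong to the article
    )
    return text.strip() if text else None
```

Menus, ads, and footers are discarded by the extractor itself; the wrapper only normalizes empty input and empty results to `None`.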

💻 Focus Workspace (The HQ)

Streamlit UI: A centered, high-contrast interface designed for long-form reading.
Session Persistence: Fast "Clear & Rerun" flow for processing multiple sources in sequence.
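
The workspace could be wired together roughly like this. The sketch is illustrative only and is not taken from app.py: `build_page` and `main` are hypothetical names, and the metadata line format is an assumption. It uses standard Streamlit calls (`st.set_page_config`, `st.text_input`, `st.button`, `st.markdown`, `st.error`).

```python
from datetime import datetime, timezone


def build_page(markdown_body: str, source_url: str) -> str:
    """Prefix the article with its source URL and a UTC retrieval timestamp."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return f"*Source: {source_url} | Retrieved: {stamp}*\n\n---\n\n{markdown_body}"


def main() -> None:
    # Lazy imports keep build_page usable without the UI dependencies installed.
    import streamlit as st
    import trafilatura

    st.set_page_config(page_title="Deep Reader", layout="centered")
    st.title("Deep Reader")

    url = st.text_input("Article URL")
    if st.button("Read") and url:
        html = trafilatura.fetch_url(url)  # None when the download fails
        # Assumption: the installed Trafilatura supports Markdown output.
        text = trafilatura.extract(html, output_format="markdown") if html else None
        if text:
            st.markdown(build_page(text, url))
        else:
            st.error("Could not extract readable text from this URL.")


# In a Streamlit script, finish the file with a call to main().
```

Because Streamlit reruns the script on every interaction, submitting a new URL replaces the previous article without extra state handling.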

System Diagram

graph TB;
    subgraph "The Web (Digital Overload)"
        URL[Raw Article URL]
    end

    subgraph "Deep Reader (The Filter)"
        direction TD
        Fetch[Trafilatura Fetcher]
        Clean[Content Distiller]
        MD[Markdown Converter]
    end

    subgraph "User (Focus Mode)"
        Display[Clean Text Interface]
    end

    %% Flow
    URL -- Raw HTML --> Fetch;
    Fetch -- DOM Signal --> Clean;
    Clean -- Refined Data --> MD;
    MD -- Structured Text --> Display;

File Structure

deep-reader/
│
├── app.py                  # Main application & UI Orchestrator
├── requirements.txt        # Python dependencies (Streamlit, Trafilatura)
├── .gitignore              # Files ignored by Git
└── README.md               # This file

💡 Development Philosophy & AI Collaboration

This project was built using the "Human-Architect, AI-Builder" methodology, reflecting the core of the Builder Mindset.

Human-led Strategy: Defining the need for a "Digital Hygiene" tool to combat information overload and designing the minimalist UX.

AI-assisted Implementation: Leveraging AI to rapidly prototype the extraction logic and streamline the Streamlit UI components.

Human-driven Refactoring: Enforcing the "Cut the Fat" rule—removing unnecessary sidebars and complex CSS in favor of a lean, rugged script.

⚙️ Setup and Running

Clone the Repository:

git clone https://github.com/GeorgeFreedomTech/deep-reader.git
cd deep-reader

Create and Activate a Virtual Environment:

python -m venv venv
# On Windows: venv\Scripts\activate
# On macOS/Linux: source venv/bin/activate

Install Dependencies:
```
pip install -r requirements.txt
```
Run the App:
```
streamlit run app.py
```

🔗 Let's Connect:

Visit my website: https://GeorgeFreedom.com
Connect on LinkedIn: https://www.linkedin.com/in/georgefreedom/
Let's talk: https://cal.com/georgefreedom

📜 License:

This project is licensed under:

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License

We build for the Future!
