A minimalist, high-focus reading tool built with Streamlit, designed to strip away digital noise, advertisements, and "AI-slop" from web articles.
This project serves as a Digital Noise Razor utility: it captures raw HTML from any URL, isolates the core intellectual signal using the Trafilatura engine, and renders the result in a clean, distraction-free Markdown interface.
It is a practical implementation of the Rugged IT philosophy—prioritizing information integrity and mental focus over modern web bloat and digital overload.
Access the distraction-free reading environment directly through the web interface.
- Live App: Deep Reader on Streamlit Cloud
- Core Mission: This application implements a "Read-Only" protocol. It demonstrates how to bypass engagement-driven web design to reclaim focus and ensure data-driven consumption of information.
- Zero-Distraction Interface: Complete removal of pop-ups, scripts, menus, and tracking pixels.
- Markdown-First Rendering: Converts complex web layouts into clean, structured Markdown for optimal legibility.
- High-Efficiency Extraction: Uses the Trafilatura library to identify the main article text with high precision.
- Metadata Integrity: Automatically captures source URLs and timestamps to maintain a clean digital archive.
- Mobile-Responsive Design: Access your focused reading list from a laptop in the office or a phone on the go.
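The Metadata Integrity feature could be sketched roughly as follows. This is a minimal illustration, not the code in `app.py`: the function name `with_metadata`, the blockquote header layout, and the timestamp format are all assumptions made for the example.

```python
from datetime import datetime, timezone
from typing import Optional


def with_metadata(markdown_body: str, source_url: str,
                  fetched_at: Optional[datetime] = None) -> str:
    """Prepend the source URL and a UTC timestamp to extracted Markdown.

    Hypothetical helper: the name and header layout are illustrative only.
    `fetched_at` should be timezone-aware; it defaults to the current time.
    """
    moment = fetched_at or datetime.now(timezone.utc)
    # Normalise to UTC so archived articles carry comparable timestamps.
    stamp = moment.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    header = f"> Source: {source_url}\n> Retrieved: {stamp}\n"
    return f"{header}\n{markdown_body.strip()}\n"
```

Passing an explicit `fetched_at` keeps the output reproducible, which helps when the same article is archived more than once.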
The application follows a Minimalist Modular Architecture. Because the extraction engine is decoupled from the UI, the tool stays lightweight and resilient, and it is easy to deploy on low-power hardware such as a Raspberry Pi.
- URL Handshaking: Securely fetching raw HTML data from remote servers.
- Resilience Protocols: Handling failed requests and script-heavy sites with graceful error signaling.
- Content Distillation: Identification of the primary text body while discarding structural noise (menus, ads, footers).
- Format Conversion: Transforming raw HTML into human-centric Markdown.
- Streamlit UI: A centered, high-contrast interface designed for long-form reading.
- Session Persistence: Fast "Clear & Rerun" flow for processing multiple sources in sequence.
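The fetch, resilience, and distillation steps listed above might look something like the sketch below. It is an illustration under stated assumptions rather than the project's actual code: `distill`, `ExtractionError`, and the two `_trafilatura_*` wrappers are hypothetical names, and `output_format="markdown"` is assumed to be supported by the installed Trafilatura version.

```python
from typing import Callable, Optional


class ExtractionError(Exception):
    """Hypothetical error type: the page could not be fetched or distilled."""


def _trafilatura_fetch(url: str) -> Optional[str]:
    # Imported lazily so the pipeline logic can be exercised without the library.
    import trafilatura
    return trafilatura.fetch_url(url)  # returns None when the download fails


def _trafilatura_extract(html: str) -> Optional[str]:
    import trafilatura
    # Assumption: this Trafilatura version accepts output_format="markdown".
    return trafilatura.extract(html, output_format="markdown",
                               include_comments=False)


def distill(url: str,
            fetch: Callable[[str], Optional[str]] = _trafilatura_fetch,
            extract: Callable[[str], Optional[str]] = _trafilatura_extract) -> str:
    """Fetch a page and return its main text as Markdown, or raise ExtractionError."""
    if not url.startswith(("http://", "https://")):
        raise ExtractionError("Please enter a full http(s) URL.")
    html = fetch(url)
    if not html:
        raise ExtractionError("The page could not be downloaded.")
    text = extract(html)
    if not text:
        # Script-heavy pages often ship no article text in the raw HTML.
        raise ExtractionError("No readable article text was found.")
    return text
```

Because `fetch` and `extract` are plain parameters, the engine can be swapped or stubbed without touching the UI, which is one way to read the decoupling described above.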
graph TB;
subgraph "The Web (Digital Overload)"
URL[Raw Article URL]
end
subgraph "Deep Reader (The Filter)"
direction TD
Fetch[Trafilatura Fetcher]
Clean[Content Distiller]
MD[Markdown Converter]
end
subgraph "User (Focus Mode)"
Display[Clean Text Interface]
end
%% Flow
URL -- Raw HTML --> Fetch;
Fetch -- DOM Signal --> Clean;
Clean -- Refined Data --> MD;
MD -- Structured Text --> Display;
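The last arrow in the diagram, from structured text to the display, can be sketched as a small rendering step. This is only an assumption about how such a step could be written: `show_article` is a hypothetical name, `ui` stands for any object exposing `caption`, `markdown`, and `error` methods (the `streamlit` module itself provides all three), and `distiller` is any function that turns a URL into Markdown.

```python
from typing import Callable


def show_article(ui, url: str, distiller: Callable[[str], str]) -> bool:
    """Render one article; return True on success, False if an error was shown.

    Hypothetical helper: `ui` is duck-typed so the display logic stays
    independent of the extraction engine and can be exercised with a stub.
    """
    try:
        markdown_text = distiller(url)
    except Exception as exc:
        # Broad on purpose: any failure becomes a readable message, not a traceback.
        ui.error(f"Could not read this page: {exc}")
        return False
    ui.caption(f"Source: {url}")
    ui.markdown(markdown_text)
    return True
```

In a Streamlit script, `ui` could simply be the imported `streamlit` module, so the same function serves both the live app and a test stub.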
deep-reader/
│
├── app.py # Main application & UI Orchestrator
├── requirements.txt # Python dependencies (Streamlit, Trafilatura)
├── .gitignore # Files ignored by Git
└── README.md # This file
This project was built using the "Human-Architect, AI-Builder" methodology, reflecting the core of the Builder Mindset.
- Human-led Strategy: Defining the need for a "Digital Hygiene" tool to combat information overload and designing the minimalist UX.
- AI-assisted Implementation: Leveraging AI to rapidly prototype the extraction logic and streamline the Streamlit UI components.
- Human-driven Refactoring: Enforcing the "Cut the Fat" rule by removing unnecessary sidebars and complex CSS in favor of a lean, rugged script.
- Clone the Repository:
git clone https://github.com/GeorgeFreedomTech/deep-reader.git
cd deep-reader
- Create and Activate a Virtual Environment:
python -m venv venv
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
- Install Dependencies:
pip install -r requirements.txt
- Run the App:
streamlit run app.py
- Visit my website: https://GeorgeFreedom.com
- Connect on LinkedIn: https://www.linkedin.com/in/georgefreedom/
- Let's talk: https://cal.com/georgefreedom
Copyright (c) 2025 Jiří Svoboda (George Freedom) / George Freedom Tech
This project is licensed under:
- Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License
We build for the Future!