Copilot AI commented Dec 7, 2025

ABP blocklists use ||domain^ syntax to indicate domain-level blocks. The current regex extracts bare domains, which don't achieve subdomain blocking at the DNS level. For DNS blockers, these need to be converted to *.domain wildcard format.

Changes:

  • extractDomains() function: Added isAbp parameter. When enabled, converts extracted domains to wildcard format (*.domain) and strips path/parameter suffixes
  • ABP regex pattern: Simplified from complex pattern that missed rules with paths/parameters to r'^\|\|([a-zA-Z0-9][a-zA-Z0-9-_.]+)' which reliably matches all ||domain lines
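To make the difference between the two patterns concrete, here is a minimal sketch that runs both against a few ABP-style rules. The old pattern is copied from the `blocklists/download.py` snippet quoted in the original prompt; the helper name `captured` and the sample rules are made up for illustration and are not part of the PR.

```python
import re

# Old pattern, as quoted from blocklists/download.py in the original prompt.
OLD_RGX = r'^(\|\||[a-zA-Z0-9])([a-zA-Z0-9][a-zA-Z0-9-_.]+)((\^[a-zA-Z0-9\-\|\$\.\*]*)|(\$[a-zA-Z0-9\-\|\.])*|(\\[a-zA-Z0-9\-\||\^\.]*))$'
# New pattern, as given in the change list.
NEW_RGX = r'^\|\|([a-zA-Z0-9][a-zA-Z0-9-_.]+)'


def captured(rgx, line, group):
    """Hypothetical helper: return one capture group for a line, or None."""
    m = re.search(rgx, line, re.M)
    return m.group(group) if m else None


# Illustrative rules only; not taken from a real blocklist.
rules = [
    "||example.com^",
    "||example.com/path^",
    "||example.com^$third-party",
    "@@||example.com^",
    "! a comment line",
]

for rule in rules:
    # The domain sits in group 2 of the old pattern and group 1 of the new one.
    print(f"{rule!r}: old={captured(OLD_RGX, rule, 2)!r} new={captured(NEW_RGX, rule, 1)!r}")
```

Because the new pattern is anchored only at the start of the line, anything after the domain (path, `^`, `$` options) no longer affects whether the rule matches, while exception rules (`@@`) and comments (`!`) still fail to match.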

Example:

# Before: ||example.com/path^ → example.com (incomplete blocking)
# After:  ||example.com/path^ → *.example.com (blocks all subdomains)

The simpler regex captures the domain portion from any ABP rule starting with ||, regardless of trailing paths, parameters, or modifiers. Domain cleanup (stripping /, $, ^ suffixes) happens in post-processing.
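The flow described here (capture, clean up, prefix) could look roughly like the sketch below. This is not the PR's actual code: the function name `extract_domains_sketch`, the `is_abp` keyword, and the sorted output are assumptions made for illustration, and the sample input is invented.

```python
import re

# New ABP pattern, as given in the change list.
ABP_RGX = r'^\|\|([a-zA-Z0-9][a-zA-Z0-9-_.]+)'


def extract_domains_sketch(txt, rgx, groupindex, is_abp=False):
    """Hypothetical variant of extractDomains() with an ABP switch."""
    domains = set()
    for match in re.finditer(rgx, txt, re.M):
        groups = match.groups()
        # Skip matches that lack the requested capture group.
        if len(groups) <= groupindex or groups[groupindex] is None:
            continue
        domain = groups[groupindex].strip()
        if is_abp:
            # Post-processing: cut at the first '/', '$', or '^' if one is present.
            domain = re.split(r'[/$^]', domain, maxsplit=1)[0]
        # As in the existing function, drop empty entries and names ending in '.'.
        if not domain or domain.endswith('.'):
            continue
        domains.add('*.' + domain if is_abp else domain)
    # Sorted here only to make the output deterministic (an assumption of this sketch).
    return "\n".join(sorted(domains))


sample = (
    "! illustrative list, not real data\n"
    "||example.com/path^\n"
    "||ads.example.net^$third-party\n"
    "@@||allowed.example.org^\n"
)
print(extract_domains_sketch(sample, ABP_RGX, 0, is_abp=True))
```

With this particular pattern the cleanup step is mostly defensive, since the capture group's character class already excludes `/`, `$`, and `^`; it matters only if a looser pattern is passed in.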

Tested with the RPiList malware ABP list (293k domains extracted, all correctly prefixed with `*.`). Non-ABP formats are unchanged.

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • easylist-downloads.adblockplus.org
    • Triggering command: /usr/bin/curl curl -s REDACTED (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

This section details the original issue you should resolve

<issue_title>Improve the ABP regex handling</issue_title>
<issue_description>Creating an issue while the pull request is closed as it would be easier to discuss in an issue.

I found where the ABP regex actually needs to be modified; the place I changed previously was the wrong location.

I'm assuming the code here is where the change needs to be.

It could possibly be changed to…

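import re

def extractDomains(txt, rgx, groupindex):
    domainlist = set()
    regexc = re.compile(rgx, re.M)

    for match in regexc.finditer(txt):
        g = match.groups()
        # Skip matches that do not have the requested capture group.
        if len(g) <= groupindex or g[groupindex] is None:
            continue
        g2 = g[groupindex].strip()

        if g2.startswith("||") and g2.endswith("^"):
            # ABP domain rule: drop the leading "||" and trailing "^",
            # then prefix "*." to form a wildcard entry.
            domain = g2[2:-1].replace("^", "*")
            domain = "*." + domain
            domainlist.add(domain)
        else:
            # Any other capture: keep it unless it is empty or ends with a dot.
            if g2 and g2[-1] != '.':
                domainlist.add(g2)

    if len(domainlist) <= 0:
        return ""

    return "\n".join(domainlist)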

This replacement should replace || with *. and remove the ^ (or at least that is the intent), so the rule is treated like a standard wildcard entry. While this doesn’t address rules that end in options such as $thirdparty, it will address the standard DNS ABP cases.</issue_description>

<agent_instructions>ABP is the Adblock Plus format for specifying content blocking both at the domain-name level and at the level of the content a domain serves, typically web content rendered inside a browser. Some blocklists (as specified in the config.json field "format") are in the ABP format. Our purpose is to extract exact domain blocks and wildcard domain blocks. ABP lists may also contain other rules that are irrelevant to a DNS-based blocker like ours; we simply need to discard those.

ABP filter how-to: https://help.adblockplus.org/hc/en-us/articles/360062733293-How-to-write-filters

ABP list url used in our config.json:

The code that probably isn't working as it should to extract domain rules from ABP is here:

blocklists/download.py

Lines 318 to 321 in c33a28a

elif format == "abp":
    domains = extractDomains(response,
                             r'^(\|\||[a-zA-Z0-9])([a-zA-Z0-9][a-zA-Z0-9-_.]+)((\^[a-zA-Z0-9\-\|\$\.\*]*)|(\$[a-zA-Z0-9\-\|\.])*|(\\[a-zA-Z0-9\-\||\^\.]*))$',
                             1)
</agent_instructions>

Comments on the Issue (you are @copilot in this section)



…format

Co-authored-by: ignoramous <852289+ignoramous@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Improve ABP regex handling" to "Convert ABP ||domain^ patterns to *.domain wildcard format for DNS blocking" on Dec 7, 2025
Copilot AI requested a review from ignoramous December 7, 2025 23:08