IP Knowledge Layer


Open IP enrichment knowledge layer for cloud infrastructure, crawler networks, Tor, ASN attribution, and VPN-adjacent network intelligence.

The repository publishes normalized machine-readable datasets intended for SIEM pipelines, fraud systems, enrichment services, gateways, analytics stacks, and operational network tooling.

Primary outputs:

  • ip-knowledge.jsonl
  • ip-knowledge.csv
  • cloud-prefixes.csv
  • asn-signals.csv
  • cidr-tags.txt

Overview

Most public IP datasets focus on a single domain:

  • cloud ranges
  • Tor exits
  • crawler infrastructure
  • ASN ownership
  • VPN signals

IP Knowledge Layer consolidates those signals into a unified enrichment layer with normalized metadata, provider attribution, confidence scoring, and source provenance.

The goal is operational context: every CIDR or ASN resolves to a layer, provider, service, tags, a confidence value, and a source.

CIDR / ASN
    -> layer
    -> provider
    -> service
    -> tags
    -> confidence
    -> source

Instead of only identifying a prefix, consumers can classify infrastructure characteristics and attach explainable metadata to network events.
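
As a minimal sketch of attaching that metadata to a network event, the snippet below matches an event's IP against a list of records shaped like the JSONL example in this README. The function name `enrich_event`, the `src_ip` event field, and the `ip_context` output key are hypothetical and not part of the project. It uses a linear scan for clarity; a production consumer would likely use a prefix tree.

```python
import ipaddress

# Fields copied from a matching record into the event (names taken from the README's record example).
CONTEXT_FIELDS = ("layer", "provider", "service", "tags", "confidence", "source_id")

def enrich_event(event, records, ip_field="src_ip"):
    """Return a copy of `event` with an 'ip_context' list of matching records.

    `ip_field` and 'ip_context' are illustrative names, not a project convention.
    """
    addr = ipaddress.ip_address(event[ip_field])
    context = []
    for rec in records:
        prefix = rec.get("prefix")
        if not prefix:
            continue  # skip records without a CIDR (assumed possible for ASN-only entries)
        net = ipaddress.ip_network(prefix, strict=False)
        if addr.version == net.version and addr in net:
            context.append({k: rec.get(k) for k in CONTEXT_FIELDS})
    return {**event, "ip_context": context}

records = [{
    "prefix": "104.16.0.0/13", "layer": "hosting-cloud", "provider": "Cloudflare",
    "service": "edge", "tags": ["cdn", "edge", "proxy"], "confidence": 0.99,
    "source_id": "cloudflare-v4",
}]
enriched = enrich_event({"src_ip": "104.17.1.1", "path": "/login"}, records)
print(enriched["ip_context"][0]["provider"])  # Cloudflare
```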


Current Dataset Snapshot

Metric            Value
Records           113,349
Prefix records    111,419
ASN signals       1,930
Sources           12
Collector errors  0

Layer Distribution

Layer          Records
hosting-cloud  97,973
anonymity      11,615
asn-signal     1,930
crawler-bot    1,831

Top Providers

Provider      Records
Azure         73,422
AWS           15,675
Tor           11,615
GitHub        6,677
Oracle Cloud  1,078
Oracle Cloud 1,078

Architecture

                    Public Sources
                           │
        ┌──────────────────┼──────────────────┐
        │                  │                  │
        ▼                  ▼                  ▼
   Cloud Ranges      Crawler Feeds       Tor Signals
        │                  │                  │
        └──────────────┬───┴──────────────────┘
                       ▼
              Normalization Layer
              CIDR + metadata merge
                       ▼
               Attribution Engine
            provider / tags / confidence
                       ▼
                 Export Pipeline
        JSONL / CSV / TXT / summaries
                       ▼
              Operational Consumers
      SIEM / WAF / Fraud / Analytics
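
One plausible reading of the normalization step ("CIDR + metadata merge") is sketched below. It is not the repository's actual code: the functions `normalize` and `merge`, and the choice of `(prefix, provider, source_id)` as the merge key, are assumptions made for illustration.

```python
import ipaddress

def normalize(raw_prefix, layer, provider, service, tags, confidence, source_id):
    """Canonicalize a CIDR string and wrap it in the README's record shape."""
    # strict=False masks off host bits, so "104.16.0.1/13" becomes "104.16.0.0/13".
    net = ipaddress.ip_network(raw_prefix.strip(), strict=False)
    return {
        "prefix": str(net),
        "layer": layer,
        "provider": provider,
        "service": service,
        "tags": sorted(set(tags)),
        "confidence": confidence,
        "source_id": source_id,
    }

def merge(records):
    """Collapse duplicates that share (prefix, provider, source_id), unioning their tags."""
    merged = {}
    for rec in records:
        key = (rec["prefix"], rec["provider"], rec["source_id"])  # assumed merge key
        if key in merged:
            merged[key]["tags"] = sorted(set(merged[key]["tags"]) | set(rec["tags"]))
        else:
            merged[key] = dict(rec)
    return list(merged.values())

a = normalize(" 104.16.0.1/13 ", "hosting-cloud", "Cloudflare", "edge", ["cdn"], 0.99, "cloudflare-v4")
b = normalize("104.16.0.0/13", "hosting-cloud", "Cloudflare", "edge", ["edge", "proxy"], 0.99, "cloudflare-v4")
print(merge([a, b]))
```

Records from different providers with the same prefix stay separate under this key, which is consistent with the note that overlapping provider ranges are retained.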

Layers

hosting-cloud

Official cloud, CDN, edge, and developer-platform infrastructure ranges.

Providers currently include:

  • AWS
  • Azure
  • Google Cloud
  • Cloudflare
  • Fastly
  • GitHub
  • Oracle Cloud

crawler-bot

Crawler, AI bot, monitoring, scanner, SEO, and preview infrastructure derived from:

  • CrawlerScope

anonymity

Tor relay and exit infrastructure derived from:

  • Tor-Radar

asn-signal

Aggregate, ASN-level attribution for VPN-adjacent networks.

This layer intentionally publishes ASN evidence only, not raw VPN endpoint inventories.


Files

File                Description
ip-knowledge.jsonl  Full normalized enrichment layer
ip-knowledge.csv    Tabular export for analytics/SIEM tooling
cloud-prefixes.csv  Cloud/CDN/developer platform prefixes
asn-signals.csv     ASN-level VPN-adjacent signals
cidr-tags.txt       Lightweight CIDR-to-tags feed
summary.json        Build metadata and aggregate statistics
source-index.json   Source inventory and provenance

Download

BASE="https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current"

curl -fsSLO "$BASE/ip-knowledge.jsonl"
curl -fsSLO "$BASE/cloud-prefixes.csv"
curl -fsSLO "$BASE/asn-signals.csv"
curl -fsSLO "$BASE/cidr-tags.txt"

Record Format

Example JSONL record:

{
  "prefix": "104.16.0.0/13",
  "layer": "hosting-cloud",
  "provider": "Cloudflare",
  "service": "edge",
  "tags": [
    "cdn",
    "edge",
    "proxy"
  ],
  "confidence": 0.99,
  "source_id": "cloudflare-v4"
}
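
A consumer might validate records while loading them. The sketch below assumes every prefix record has the fields shown in the example above and that `confidence` lies in the range 0 to 1 (inferred from the sample value 0.99, not stated by the project). The `Record` class and `parse_record` function are hypothetical names, and asn-signal records may use a different shape that this does not cover.

```python
import ipaddress
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    prefix: str
    layer: str
    provider: str
    service: str
    confidence: float
    source_id: str
    tags: tuple = ()

def parse_record(line):
    """Parse one JSON object (one JSONL line) into a validated Record."""
    obj = json.loads(line)
    ipaddress.ip_network(obj["prefix"], strict=False)  # raises ValueError on a malformed CIDR
    confidence = float(obj["confidence"])
    if not 0.0 <= confidence <= 1.0:  # assumed range
        raise ValueError(f"confidence out of range: {confidence}")
    return Record(
        prefix=obj["prefix"],
        layer=obj["layer"],
        provider=obj["provider"],
        service=obj.get("service", ""),
        confidence=confidence,
        source_id=obj["source_id"],
        tags=tuple(obj.get("tags", [])),
    )

line = ('{"prefix": "104.16.0.0/13", "layer": "hosting-cloud", "provider": "Cloudflare", '
        '"service": "edge", "tags": ["cdn", "edge", "proxy"], "confidence": 0.99, '
        '"source_id": "cloudflare-v4"}')
rec = parse_record(line)
print(rec.provider, rec.tags)
```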

Usage Examples

The examples below reuse the BASE variable defined under Download.

Extract Cloudflare prefixes

curl -fsSL "$BASE/cloud-prefixes.csv" \
  | awk -F, '$3 == "Cloudflare" { print }'

Extract Tor exits

curl -fsSL "$BASE/ip-knowledge.jsonl" \
  | jq -r 'select(.layer=="anonymity" and .service=="exit") | .prefix'

Extract AI crawler infrastructure

curl -fsSL "$BASE/ip-knowledge.jsonl" \
  | jq -r 'select(.tags | index("ai-crawler")) | .prefix'

Find ASN signals for a provider

curl -fsSL "$BASE/asn-signals.csv" \
  | awk -F, '$3 == "NordVPN" { print }'
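
The awk filters above split on every comma, so they would misread a quoted field that itself contains a comma. A hedged Python alternative using the standard csv module is sketched below. It keeps the awk examples' assumption that the provider sits in the third column; the helper name `rows_where` and the sample header and rows are illustrative, not the published schema.

```python
import csv
import io

def rows_where(lines, column, value):
    """Yield CSV rows whose 1-based `column` equals `value`, honoring quoted fields."""
    for row in csv.reader(lines):
        if len(row) >= column and row[column - 1] == value:
            yield row

# Made-up sample; the real column layout is not documented in this README.
sample = io.StringIO(
    'prefix,layer,provider,service\n'
    '104.16.0.0/13,hosting-cloud,Cloudflare,edge\n'
    '192.0.2.0/24,hosting-cloud,"Example, Inc.",compute\n'
)
for row in rows_where(sample, 3, "Cloudflare"):
    print(row)
```

To run it against a downloaded file, pass an open file handle, for example `open("cloud-prefixes.csv", newline="", encoding="utf-8")`.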

Operational Use Cases

Domain              Usage
Fraud Detection     VPN/Tor/datacenter scoring
SIEM Enrichment     Infrastructure attribution
WAF Pipelines       Cloud and crawler classification
Threat Hunting      Network context correlation
Bot Management      AI crawler visibility
Internal Analytics  Infrastructure intelligence

Local Update

python3 scripts/update.py

Preferred local enrichment sources:

../crawler-scope/data/current/crawlers.json
../tor-radar/data/current/network.json
../release/analysis/data/provider_asn.csv

If local datasets are unavailable, the collector falls back to public upstream sources.
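
The local-first behavior described above can be pictured roughly as follows. This is a sketch, not the logic in scripts/update.py: the function `resolve_source` is hypothetical, and the upstream URL is a placeholder rather than a real endpoint.

```python
from pathlib import Path

def resolve_source(local_path, upstream_url):
    """Return ('local', path) when the file exists, otherwise ('upstream', url)."""
    path = Path(local_path)
    if path.is_file():
        return ("local", str(path))
    return ("upstream", upstream_url)

# Placeholder URL for illustration only; the real upstream locations are not listed here.
kind, location = resolve_source(
    "../crawler-scope/data/current/crawlers.json",
    "https://example.invalid/crawlers.json",
)
print(kind, location)
```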


GitHub Actions

Dataset builds run every 6 hours from this workflow:

.github/workflows/ip-knowledge-layer.yml

Only the current datasets are stored in full; historical snapshots are kept compact to limit repository growth.


Notes

  • CIDRs are kept as prefixes and are not expanded into individual IPv4 addresses
  • Overlapping provider ranges are intentionally retained
  • Confidence reflects source reliability, not maliciousness
  • ASN VPN signals are aggregate indicators, not endpoint dumps
  • The project avoids mass RDAP/WHOIS crawling during CI builds
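
Because overlapping ranges are retained, a single address can match several records. The sketch below shows one way a consumer could order those matches, most specific prefix first. The function `ranked_matches` is a hypothetical helper, and the records are made up for illustration using a documentation range. Confidence is carried along but not used for ranking, since it describes source reliability rather than risk.

```python
import ipaddress

def ranked_matches(ip, records):
    """Return every record whose prefix contains `ip`, longest prefix first."""
    addr = ipaddress.ip_address(ip)
    hits = []
    for rec in records:
        net = ipaddress.ip_network(rec["prefix"], strict=False)
        if net.version == addr.version and addr in net:
            hits.append((net.prefixlen, rec))
    hits.sort(key=lambda pair: pair[0], reverse=True)  # stable sort, keyed on prefix length only
    return [rec for _, rec in hits]

# Illustrative, made-up records (198.51.100.0/24 is reserved for documentation).
records = [
    {"prefix": "198.51.100.0/24", "provider": "ExampleCloud", "confidence": 0.9},
    {"prefix": "198.51.100.128/25", "provider": "ExampleCDN", "confidence": 0.8},
]
for rec in ranked_matches("198.51.100.200", records):
    print(rec["prefix"], rec["provider"], rec["confidence"])
```

Membership is tested against the prefix itself, so no range is ever expanded into individual addresses.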

Roadmap

Planned additions:

  • ASN rollup datasets
  • Prefix overlap analysis
  • Historical diff exports
  • Provider metadata index
  • Compressed ASN-to-prefix layers
  • Confidence weighting improvements

License

CC0-1.0. See LICENSE.


Disclaimer

This repository publishes operational network enrichment data compiled from public infrastructure sources and from datasets derived from them. Consumers are responsible for validating suitability within their own environments.
