Open IP enrichment knowledge layer for cloud infrastructure, crawler networks, Tor, ASN attribution, and VPN-adjacent network intelligence.
The repository publishes normalized machine-readable datasets intended for SIEM pipelines, fraud systems, enrichment services, gateways, analytics stacks, and operational network tooling.
Primary outputs:
- ip-knowledge.jsonl
- ip-knowledge.csv
- cloud-prefixes.csv
- asn-signals.csv
- cidr-tags.txt
Most public IP datasets focus on a single domain:
- cloud ranges
- Tor exits
- crawler infrastructure
- ASN ownership
- VPN signals
IP Knowledge Layer consolidates those signals into a unified enrichment layer with normalized metadata, provider attribution, confidence scoring, and source provenance.
The goal is operational context.
CIDR / ASN
-> layer
-> provider
-> service
-> tags
-> confidence
-> source
Instead of only identifying a prefix, consumers can classify infrastructure characteristics and attach explainable metadata to network events.
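As a rough illustration of attaching that metadata to a network event, the sketch below loads JSONL records and returns every matching prefix for an event's source IP, most specific first. The field names follow the example record in this README; the `enrich` and `load_records` helpers, the `src_ip` and `ip_context` keys, and the second sample record are assumptions made for the example, not part of the published dataset or tooling.

```python
import ipaddress
import json

# Sample lines shaped like the JSONL example in this README.
# The second record is invented for illustration only.
SAMPLE_JSONL = """\
{"prefix": "104.16.0.0/13", "layer": "hosting-cloud", "provider": "Cloudflare", "service": "edge", "tags": ["cdn", "edge", "proxy"], "confidence": 0.99, "source_id": "cloudflare-v4"}
{"prefix": "192.0.2.0/24", "layer": "anonymity", "provider": "Tor", "service": "exit", "tags": ["tor"], "confidence": 0.9, "source_id": "example-tor"}
"""


def load_records(text):
    """Parse JSONL text and precompute a network object per prefix record."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if "prefix" not in rec:
            # Skip records without a prefix (for example ASN-level signals).
            continue
        rec["_net"] = ipaddress.ip_network(rec["prefix"], strict=False)
        records.append(rec)
    return records


def enrich(event, records):
    """Return a copy of the event with all matching records attached."""
    addr = ipaddress.ip_address(event["src_ip"])
    matches = [
        r for r in records
        if r["_net"].version == addr.version and addr in r["_net"]
    ]
    # Most specific prefix first; overlapping matches are all kept.
    matches.sort(key=lambda r: r["_net"].prefixlen, reverse=True)
    enriched = dict(event)
    enriched["ip_context"] = [
        {k: v for k, v in r.items() if k != "_net"} for r in matches
    ]
    return enriched


records = load_records(SAMPLE_JSONL)
print(enrich({"src_ip": "104.17.1.1", "path": "/login"}, records))
```

A linear scan is fine for a handful of records; for the full dataset a radix tree or sorted-interval lookup would likely be a better fit.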
| Metric | Value |
|---|---|
| Records | 113,349 |
| Prefix records | 111,419 |
| ASN signals | 1,930 |
| Sources | 12 |
| Collector errors | 0 |
| Layer | Records |
|---|---|
| hosting-cloud | 97,973 |
| anonymity | 11,615 |
| asn-signal | 1,930 |
| crawler-bot | 1,831 |
| Provider | Records |
|---|---|
| Azure | 73,422 |
| AWS | 15,675 |
| Tor | 11,615 |
| GitHub | 6,677 |
| Oracle Cloud | 1,078 |

                      Public Sources
                             │
          ┌──────────────────┼──────────────────┐
          │                  │                  │
          ▼                  ▼                  ▼
    Cloud Ranges       Crawler Feeds       Tor Signals
          │                  │                  │
          └──────────────┬───┴──────────────────┘
                         ▼
                Normalization Layer
               CIDR + metadata merge
                         ▼
                Attribution Engine
           provider / tags / confidence
                         ▼
                  Export Pipeline
           JSONL / CSV / TXT / summaries
                         ▼
               Operational Consumers
          SIEM / WAF / Fraud / Analytics

Official cloud, CDN, edge, and developer-platform infrastructure ranges.
Providers currently include:
- AWS
- Azure
- Google Cloud
- Cloudflare
- Fastly
- GitHub
- Oracle Cloud
Crawler, AI bot, monitoring, scanner, SEO, and preview infrastructure derived from:
- CrawlerScope
Tor relay and exit infrastructure derived from:
- Tor-Radar
ASN-level VPN-adjacent aggregate attribution.
This layer intentionally publishes ASN evidence only, not raw VPN endpoint inventories.
| File | Description |
|---|---|
| ip-knowledge.jsonl | Full normalized enrichment layer |
| ip-knowledge.csv | Tabular export for analytics/SIEM tooling |
| cloud-prefixes.csv | Cloud/CDN/developer platform prefixes |
| asn-signals.csv | ASN-level VPN-adjacent signals |
| cidr-tags.txt | Lightweight CIDR-to-tags feed |
| summary.json | Build metadata and aggregate statistics |
| source-index.json | Source inventory and provenance |

    BASE="https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current"
    curl -fsSLO "$BASE/ip-knowledge.jsonl"
    curl -fsSLO "$BASE/cloud-prefixes.csv"
    curl -fsSLO "$BASE/asn-signals.csv"
    curl -fsSLO "$BASE/cidr-tags.txt"

Example JSONL record:

    {
      "prefix": "104.16.0.0/13",
      "layer": "hosting-cloud",
      "provider": "Cloudflare",
      "service": "edge",
      "tags": [
        "cdn",
        "edge",
        "proxy"
      ],
      "confidence": 0.99,
      "source_id": "cloudflare-v4"
    }

    curl -fsSL "$BASE/cloud-prefixes.csv" \
      | awk -F, '$3 == "Cloudflare" { print }'

    curl -fsSL "$BASE/ip-knowledge.jsonl" \
      | jq -r 'select(.layer=="anonymity" and .service=="exit") | .prefix'

    curl -fsSL "$BASE/ip-knowledge.jsonl" \
      | jq -r 'select(.tags | index("ai-crawler")) | .prefix'

    curl -fsSL "$BASE/asn-signals.csv" \
      | awk -F, '$3 == "NordVPN" { print }'

| Domain | Usage |
|---|---|
| Fraud Detection | VPN/Tor/datacenter scoring |
| SIEM Enrichment | Infrastructure attribution |
| WAF Pipelines | Cloud and crawler classification |
| Threat Hunting | Network context correlation |
| Bot Management | AI crawler visibility |
| Internal Analytics | Infrastructure intelligence |
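For consumers that run these use cases inside a service rather than a shell pipeline, the jq selections shown earlier translate fairly directly to Python. The sketch below is one possible equivalent; the helper names and the sample prefixes (taken from documentation address ranges) are hypothetical, and only the field names and the `ai-crawler` tag come from this README.

```python
import json


def iter_records(lines):
    """Yield one parsed record per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def tor_exit_prefixes(records):
    """Mirror of: select(.layer=="anonymity" and .service=="exit") | .prefix"""
    return [
        r["prefix"] for r in records
        if r.get("layer") == "anonymity"
        and r.get("service") == "exit"
        and "prefix" in r
    ]


def prefixes_with_tag(records, tag):
    """Mirror of: select(.tags | index(tag)) | .prefix"""
    return [
        r["prefix"] for r in records
        if tag in r.get("tags", []) and "prefix" in r
    ]


# Invented sample lines; real input would be the lines of ip-knowledge.jsonl.
SAMPLE_LINES = [
    '{"prefix": "198.51.100.0/24", "layer": "anonymity", "service": "exit", "tags": ["tor"]}',
    '{"prefix": "203.0.113.0/24", "layer": "crawler-bot", "service": "crawler", "tags": ["ai-crawler"]}',
]

# Materialize once so both filters can iterate over the same records.
sample_records = list(iter_records(SAMPLE_LINES))
print(tor_exit_prefixes(sample_records))
print(prefixes_with_tag(sample_records, "ai-crawler"))
```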

    python3 scripts/update.py

Preferred local enrichment sources:

    ../crawler-scope/data/current/crawlers.json
    ../tor-radar/data/current/network.json
    ../release/analysis/data/provider_asn.csv

If local datasets are unavailable, the collector falls back to public upstream sources.
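The local-first behaviour described above could look roughly like the following. This is a minimal sketch, not the code in scripts/update.py: the `read_source` and `http_get` names, the `fetch` parameter, and the example URL are all assumptions introduced for illustration.

```python
from pathlib import Path
import urllib.request


def http_get(url):
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def read_source(local_path, fallback_url, fetch=http_get):
    """Return (data, origin), preferring the local file when it exists.

    `fetch` is injectable so the fallback can be exercised without
    touching the network.
    """
    path = Path(local_path)
    if path.is_file():
        return path.read_bytes(), "local"
    return fetch(fallback_url), "upstream"


# Demo with a stubbed fetcher; the URL below is a placeholder, not a real source.
data, origin = read_source(
    "no-such-dir/crawlers.json",
    "https://example.invalid/crawlers.json",
    fetch=lambda url: b"[]",
)
print(origin, data)
```

Returning the origin alongside the data makes it straightforward to record in build metadata whether a given run used a local or an upstream copy.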
Dataset builds run every 6 hours.
Workflow: .github/workflows/ip-knowledge-layer.yml
Only current datasets are stored in full. Historical snapshots remain compact to avoid repository growth.
- CIDRs are preserved without full IPv4 expansion
- Overlapping provider ranges are intentionally retained
- Confidence reflects source reliability, not maliciousness
- ASN VPN signals are aggregate indicators, not endpoint dumps
- The project avoids mass RDAP/WHOIS crawling during CI builds
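Because overlapping ranges are retained and CIDRs are never expanded, a consumer that cares about overlaps has to detect them with prefix arithmetic. The sketch below shows one way to do that with the standard `ipaddress` module; the `find_overlaps` helper and the sample records with their provider names are hypothetical.

```python
import ipaddress
from itertools import combinations


def find_overlaps(records):
    """Yield (broader, narrower) record pairs whose prefixes overlap.

    Works on network objects only, so no prefix is expanded into
    individual addresses.
    """
    nets = [(ipaddress.ip_network(r["prefix"], strict=False), r) for r in records]
    for (net_a, rec_a), (net_b, rec_b) in combinations(nets, 2):
        if net_a.version != net_b.version:
            continue
        if net_a.overlaps(net_b):
            # For CIDR blocks, overlap means one contains the other.
            if net_a.prefixlen <= net_b.prefixlen:
                yield rec_a, rec_b
            else:
                yield rec_b, rec_a


# Invented records for illustration.
SAMPLE_RECORDS = [
    {"prefix": "10.0.0.0/8", "provider": "ProviderA"},
    {"prefix": "10.1.0.0/16", "provider": "ProviderB"},
    {"prefix": "192.0.2.0/24", "provider": "ProviderC"},
]

for broader, narrower in find_overlaps(SAMPLE_RECORDS):
    print(narrower["prefix"], "is inside", broader["prefix"])
```

The pairwise comparison is quadratic, so it only suits small samples; at the size of the full dataset, sorting by network address or using a prefix tree would be more practical.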
Planned additions:
- ASN rollup datasets
- Prefix overlap analysis
- Historical diff exports
- Provider metadata index
- Compressed ASN-to-prefix layers
- Confidence weighting improvements
CC0-1.0. See LICENSE.
This repository publishes operational network enrichment data compiled from public infrastructure sources and from datasets derived from them. Consumers are responsible for validating suitability within their own environments.