Feeds
Synthient publishes every data stream as Parquet snapshots (hourly rolls plus a daily rollup) alongside the real-time NDJSON streams. Snapshots are served through presigned R2 URLs that stay valid for 24 hours, and each one is accompanied by full metadata: size, row count, SHA-256 checksum, and Parquet schema.
Streams
The same seven stream identifiers are used across exports and real-time streaming.
| Stream | Description |
|---|---|
| proxies | Proxy IP observations: residential, datacenter, and mobile. |
| anonymizers | VPN, Tor, and relay-class anonymizer ranges. |
| torrents | DHT and tracker peer sightings with info hash, metadata, and peers. |
| honeypot_http | HTTP request captures from Helios honeypot sensors. |
| honeypot_https | TLS ClientHello captures from Helios honeypot sensors. |
| honeypot_dns | DNS resolution observations from Helios honeypot tunnels. |
| honeypot_adb | Android Debug Bridge shell commands captured by Helios honeypot sensors. |
Authentication
All endpoints require your API key in the x-api-key header and are served from https://api.synthient.com under /api/v4. Each stream is gated by its own *_FEED scope, such as PROXY_FEEDS, ANONYMIZERS_FEED, or HONEYPOT_HTTP_FEED.
curl -G https://api.synthient.com/api/v4/feeds/proxies/export \
-H "x-api-key: $API_KEY"
Hourly snapshots and rollups
Hourly snapshots are addressable at /api/v4/feeds/{stream}/export/{date}/{hour} (with a matching /meta variant). Only the current UTC date is addressable for hourlies: at 00:30 UTC each day, the previous day's hourlies are rolled up into the daily snapshot at /export/{date} and the per-hour artifacts are deleted. Use the snapshot id latest to track the most recent hourly without committing to a specific hour.
The Helios sensors live under their helio/ URL prefix, for example /api/v4/feeds/helio/http/export/{date} (and /feeds/helio/https/... for TLS captures), but they appear in the list endpoint under the honeypot_http, honeypot_https, honeypot_dns, and honeypot_adb stream identifiers.
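The addressing rules above can be sketched as a small path builder. This is an illustration, not an official client: the function name snapshot_path and its parameters are hypothetical, the hour is zero-padded on the assumption that paths follow the YYYY-MM-DD/HH id format, and placing /meta after the hour segment is an assumption based on the "matching /meta variant" wording. The stream argument is the URL segment, so Helios streams would be passed as, for example, helio/http.

```python
import re
from datetime import date, datetime, timezone
from typing import Optional

API_PREFIX = "/api/v4/feeds"
_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def snapshot_path(stream: str, day: str, hour: Optional[int] = None,
                  meta: bool = False, today: Optional[date] = None) -> str:
    """Build an export path for a daily rollup, an hourly snapshot, or `latest`.

    `stream` is the URL segment (e.g. "proxies" or "helio/http").
    `today` defaults to the current UTC date; it is a parameter only to ease testing.
    """
    if day != "latest":
        if not _DATE_RE.fullmatch(day):
            raise ValueError("day must be YYYY-MM-DD or 'latest'")
        parsed = date.fromisoformat(day)  # also rejects impossible dates

    path = f"{API_PREFIX}/{stream}/export/{day}"

    if hour is not None:
        if day == "latest":
            # Assumption: 'latest' already names one hourly, so an hour makes no sense.
            raise ValueError("omit hour when day is 'latest'")
        if not 0 <= hour <= 23:
            raise ValueError("hour must be in 0..23")
        current = today or datetime.now(timezone.utc).date()
        if parsed != current:
            raise ValueError("hourlies are only addressable for the current UTC date")
        # Assumption: zero-padded, mirroring the YYYY-MM-DD/HH id format.
        path += f"/{hour:02d}"

    return path + "/meta" if meta else path
```

Validating locally mirrors the conditions the server reports as 400, which saves a round trip; when in doubt, prefer the download_path values returned by the list endpoint over hand-built paths.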
List snapshots
Returns one page of available daily and hourly Parquet snapshots for the given stream. Pages are ordered newest-first and capped at 500 rows; pass next_cursor from the response back as cursor to fetch the next page.
Path parameters
| Name | Type | Description |
|---|---|---|
| stream | string | One of proxies, anonymizers, torrents, honeypot_http, honeypot_https, honeypot_dns, or honeypot_adb. |
Query parameters
| Name | Type | Description |
|---|---|---|
| limit | integer | Page size. Defaults to 100. Values above 500 are clamped. |
| cursor | string | Opaque pagination token returned in next_cursor of the previous response. Omit on the first page. |
Response
| Name | Type | Description |
|---|---|---|
| stream | string | The stream identifier the snapshots belong to. |
| feeds | array<object> | Page of snapshots, newest-first. |
| feeds[].kind | string | Either hourly or daily. |
| feeds[].date | string | UTC YYYY-MM-DD of the snapshot. |
| feeds[].hour | integer | 0–23 for hourly snapshots; omitted for daily rollups. |
| feeds[].size_bytes | integer | Parquet file size in bytes. |
| feeds[].row_count | integer | Number of rows in the Parquet file. |
| feeds[].checksum | string | Hex-encoded SHA-256 of the Parquet file bytes. |
| feeds[].id | string | Stable identifier: YYYY-MM-DD for daily rollups, YYYY-MM-DD/HH for past hourlies, and the literal latest for the most recent hourly snapshot. |
| feeds[].created_at | integer | Unix timestamp in seconds when the snapshot was indexed server-side. |
| feeds[].download_path | string | Relative API path to follow for the 307 download redirect. |
| next_cursor | string | Pagination token. Absent on the final page. |
Request
curl -G https://api.synthient.com/api/v4/feeds/proxies/export \
-H "x-api-key: $API_KEY" \
--url-query "limit=50"
Response
{
"stream": "proxies",
"feeds": [
{
"kind": "hourly",
"date": "2026-05-07",
"hour": 22,
"size_bytes": 620273951,
"row_count": 43141039,
"checksum": "fd6c002ad6c6ae73344c2fdf1cb535a303d90edf9252358e0d30a44231649d36",
"id": "latest",
"created_at": 1778195206,
"download_path": "/api/v4/feeds/proxies/export/latest"
},
{
"kind": "hourly",
"date": "2026-05-07",
"hour": 21,
"size_bytes": 612481020,
"row_count": 42811027,
"checksum": "d3fb2ec3de2bbf6af66b6028afe668e365c4113133757153ad975be93609d1ea",
"id": "2026-05-07/21",
"created_at": 1778192818,
"download_path": "/api/v4/feeds/proxies/export/2026-05-07/21"
},
{
"kind": "daily",
"date": "2026-05-06",
"size_bytes": 14721234567,
"row_count": 1024411203,
"checksum": "a3e4b804e1112ed2dd11e4b01a25a637ceb3abae23fa1f7c8503d567639a11a2",
"id": "2026-05-06",
"created_at": 1778115429,
"download_path": "/api/v4/feeds/proxies/export/2026-05-06"
}
],
"next_cursor": "eyJkIjoiMjAyNi0wNS0wNyIsImgiOjIxfQ.GECAibkM_hEUt0ixRMKzzQ"
}
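As a rough illustration of consuming a list response like the one above, the sketch below condenses one page into a few totals. The helper name summarize_page and the keys of the summary it returns are hypothetical; only the response fields it reads (stream, feeds, kind, id, hour, size_bytes, row_count, created_at, next_cursor) come from the documented shape.

```python
from datetime import datetime, timezone


def summarize_page(page: dict) -> dict:
    """Condense one page of the list response into a few headline numbers."""
    feeds = page.get("feeds", [])
    dailies = [f for f in feeds if f["kind"] == "daily"]
    # The most recent hourly carries the literal id "latest".
    latest = next((f for f in feeds if f["id"] == "latest"), None)
    return {
        "stream": page["stream"],
        "snapshots": len(feeds),
        "total_bytes": sum(f["size_bytes"] for f in feeds),
        "total_rows": sum(f["row_count"] for f in feeds),
        # Pages are ordered newest-first, so the first daily is the newest one.
        "newest_daily": dailies[0]["id"] if dailies else None,
        "latest_hour": latest.get("hour") if latest else None,
        "latest_indexed_at": (
            datetime.fromtimestamp(latest["created_at"], tz=timezone.utc).isoformat()
            if latest else None
        ),
        # next_cursor is absent on the final page.
        "has_more": "next_cursor" in page,
    }
```

Summing size_bytes before downloading is a cheap way to check available disk space, since a single daily rollup in the example is roughly 14.7 GB.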
Download a snapshot
Returns a 307 redirect to a presigned R2 URL that is valid for 24 hours. The 307 (rather than 301) is intentional: the URL is minted per request and expires, so intermediaries must not cache it. Follow the redirect to download the Parquet file. Use the date string latest for the most recent hourly snapshot, or append a specific hour as /{date}/{hour} for a particular hourly within the current UTC day.
Path parameters
| Name | Type | Description |
|---|---|---|
| stream | string | One of the seven stream identifiers. |
| date | string | Either a YYYY-MM-DD UTC date for a daily rollup, or the literal string latest for the most recent hourly snapshot. |
| hour | integer | Optional. 0–23 UTC hour, addressed as /api/v4/feeds/{stream}/export/{date}/{hour}. Only the current UTC date is addressable for hourlies. |
Responses
| Code | Meaning |
|---|---|
| 307 | Redirect to a 24-hour presigned download URL. |
| 400 | Bad date format, hour out of range (0..23), or hour requested for a non-current UTC date. |
| 401 | Missing or invalid API key. |
| 403 | API key lacks the per-stream feed scope. |
| 404 | No snapshot exists for the requested date/stream. |
Request
# Follow the 307 with -L and write to a file
curl -L -o proxies-2026-05-06.parquet \
https://api.synthient.com/api/v4/feeds/proxies/export/2026-05-06 \
-H "x-api-key: $API_KEY"
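One possible Python counterpart to the curl command above, using only the standard library. It is a sketch under two assumptions: that the presigned URL needs no extra headers because its authorization travels in the URL itself, and that you would rather not forward x-api-key to the storage host, so the code stops at the 307, reads Location, and issues a second request without the key. The names download_snapshot and _NoRedirect are illustrative and not part of any Synthient client.

```python
import shutil
import urllib.error
import urllib.request
from urllib.parse import urljoin


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the 307 can be handled by hand."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 307


def download_snapshot(url: str, api_key: str, dest: str, timeout: float = 60.0) -> str:
    """Resolve the 307, stream the presigned URL to `dest`, return that URL."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, headers={"x-api-key": api_key})
    try:
        opener.open(request, timeout=timeout).close()
    except urllib.error.HTTPError as err:
        if err.code != 307:
            raise  # 400, 401, 403, 404 and others surface to the caller
        target = err.headers.get("Location")
        err.close()
        if not target:
            raise RuntimeError("307 response carried no Location header")
        location = urljoin(url, target)
    else:
        raise RuntimeError("expected a 307 redirect, got a direct response")

    # Second request deliberately omits the API key.
    with urllib.request.urlopen(location, timeout=timeout) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out, length=1024 * 1024)  # stream in 1 MiB chunks
    return location
```

Streaming with copyfileobj keeps memory flat for multi-gigabyte rollups. Because the presigned URL expires after 24 hours, store the API path rather than the returned URL if you need to re-download later.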
Snapshot metadata
Returns JSON metadata for a Parquet snapshot: SHA-256 checksum, byte size, row count, Parquet schema with column names and types, and the canonical date.
Response
| Name | Type | Description |
|---|---|---|
| stream | string | The stream identifier the snapshot belongs to. |
| kind | string | Either "hourly" or "daily". |
| hour | integer | 0–23 for hourly snapshots; omitted for daily rollups. |
| id | string | Stable identifier: YYYY-MM-DD for daily rollups, YYYY-MM-DD/HH for past hourlies, latest for the most recent hourly snapshot. |
| format | string | Always "parquet". |
| date | integer | Unix timestamp in seconds for the snapshot instant. Daily rollups use the day's midnight UTC; hourly snapshots use the hour mark. |
| created_at | integer | Unix timestamp in seconds when the snapshot was indexed server-side. |
| size | integer | File size in bytes. |
| rows | integer | Number of rows in the Parquet file. |
| checksum | string | Hex-encoded SHA-256 of the Parquet file bytes. |
| schema.fields | array<object> | One entry per column in the Parquet footer. |
| schema.fields[].name | string | Column name. |
| schema.fields[].type | string | Column type: one of string, int64, uint32, uint64, bool, bytes. |
Request
curl -G https://api.synthient.com/api/v4/feeds/proxies/export/latest/meta \
-H "x-api-key: $API_KEY"
Response
{
"stream": "proxies",
"kind": "hourly",
"hour": 22,
"id": "latest",
"format": "parquet",
"date": 1778191200,
"created_at": 1778195206,
"size": 620273951,
"rows": 43141039,
"checksum": "fd6c002ad6c6ae73344c2fdf1cb535a303d90edf9252358e0d30a44231649d36",
"schema": {
"fields": [
{ "name": "ip", "type": "string" },
{ "name": "provider", "type": "string" },
{ "name": "type", "type": "string" },
{ "name": "timestamp", "type": "int64" },
{ "name": "country_code", "type": "string" },
{ "name": "asn", "type": "uint32" }
]
}
}
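The size and checksum fields make it possible to verify a downloaded file before loading it. A minimal sketch, assuming the file has already been saved locally and that meta is the parsed JSON body of the metadata response; the function name verify_snapshot is hypothetical.

```python
import hashlib
import os


def verify_snapshot(path: str, meta: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True when the file's size and SHA-256 match the metadata response."""
    # Cheap check first: a size mismatch means a truncated or wrong file.
    if os.path.getsize(path) != meta["size"]:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Hash in chunks so large rollups never have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == meta["checksum"].lower()
```

The same function works with an entry from the list endpoint if you first map its size_bytes field to size, since the two endpoints name that field differently.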
Paginating snapshot listings
The list endpoint returns up to 500 snapshots per page, newest-first, with a next_cursor you pass back as cursor on the next call. When the response no longer carries a next_cursor, you've reached the end.
Page through every snapshot
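import os
import requests

URL = "https://api.synthient.com/api/v4/feeds/proxies/export"
HEADERS = {"x-api-key": os.environ["API_KEY"]}

def handle(snap):
    # Placeholder: replace with your own processing.
    print(snap["id"], snap["download_path"])

cursor = None
while True:
    # Send cursor only after the first page.
    params = {"limit": 500, **({"cursor": cursor} if cursor else {})}
    resp = requests.get(URL, headers=HEADERS, params=params, timeout=30)
    resp.raise_for_status()  # surface 4xx/5xx instead of a KeyError below
    page = resp.json()
    for snap in page["feeds"]:
        handle(snap)
    cursor = page.get("next_cursor")
    if not cursor:
        break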
Response Codes
| Status Code | Description |
|---|---|
| 200 - Success | Snapshot list or metadata returned. |
| 307 - Redirect | Follow the Location header to a 24-hour presigned download URL. |
| 400 - Bad Request | Invalid cursor, limit, or date format. |
| 401 - Unauthorized | Missing or invalid API key. |
| 403 - Forbidden | API key lacks the per-stream feed scope. |
| 404 - Not Found | No snapshot exists for the requested date/stream. |
| 500 - Internal Server Error | Unexpected server-side error; reach out to support if the issue persists. |
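A client might translate the table above into a small dispatch helper. The sketch below is one way to do that; the Action names, classify, and backoff_delays are all hypothetical. Retrying a 500 with exponential backoff is an assumption on my part, since the table only says to contact support if the error persists.

```python
from enum import Enum


class Action(Enum):
    USE_BODY = "use_body"                # 200: parse the JSON body
    FOLLOW_LOCATION = "follow_location"  # 307: fetch the Location header
    FIX_REQUEST = "fix_request"          # 400: bad cursor, limit, or date
    FIX_CREDENTIALS = "fix_credentials"  # 401/403: key missing, invalid, or unscoped
    NOT_AVAILABLE = "not_available"      # 404: no snapshot for that date/stream
    RETRY_LATER = "retry_later"          # 500: assumed transient, retry with backoff


_ACTIONS = {
    200: Action.USE_BODY,
    307: Action.FOLLOW_LOCATION,
    400: Action.FIX_REQUEST,
    401: Action.FIX_CREDENTIALS,
    403: Action.FIX_CREDENTIALS,
    404: Action.NOT_AVAILABLE,
    500: Action.RETRY_LATER,
}


def classify(status: int) -> Action:
    """Map a documented status code to the action a client would take."""
    try:
        return _ACTIONS[status]
    except KeyError:
        raise ValueError(f"undocumented status code: {status}") from None


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list:
    """Exponential delays in seconds (base, 2*base, 4*base, ...) capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```

Treating 401 and 403 as non-retryable avoids hammering the API with a key that cannot succeed, while 404 on a daily path shortly after midnight UTC may simply mean the 00:30 rollup has not run yet.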