Feeds

Synthient publishes every data stream as Parquet snapshots (hourly rolls plus a daily rollup) alongside the real-time NDJSON streams. Snapshots are served as 24-hour presigned R2 URLs and are accompanied by full metadata: size, row count, SHA-256 checksum, and Parquet schema.

Streams

The same seven stream identifiers are used across exports and real-time streaming.

Stream           Description
proxies          Proxy IP observations: residential, datacenter, and mobile.
anonymizers      VPN, Tor, and relay-class anonymizer ranges.
torrents         DHT and tracker peer sightings with info hash, metadata, and peers.
honeypot_http    HTTP request captures from Helios honeypot sensors.
honeypot_https   TLS ClientHello captures from Helios honeypot sensors.
honeypot_dns     DNS resolution observations from Helios honeypot tunnels.
honeypot_adb     Android Debug Bridge shell commands captured by Helios honeypot sensors.

Authentication

All endpoints require your API key in the x-api-key header and are served from https://api.synthient.com under /api/v4. Each stream is gated by its own *_FEED scope, such as PROXY_FEEDS, ANONYMIZERS_FEED, or HONEYPOT_HTTP_FEED.

curl -G https://api.synthient.com/api/v4/feeds/proxies/export \
  -H "x-api-key: $API_KEY"
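
The same header can be attached from Python. The following is a minimal sketch using only the standard library; the helper name feed_request and its parameters are illustrative assumptions, not part of the Synthient API.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.synthient.com"  # host named in this section


def feed_request(path, api_key, params=None):
    """Build a GET request for an /api/v4 path with the x-api-key header set.

    feed_request is a hypothetical helper used only for illustration.
    """
    url = BASE_URL + path
    if params:
        # Encode query parameters such as limit or cursor.
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"x-api-key": api_key})


# Typical use (requires network access and a valid key):
#   req = feed_request("/api/v4/feeds/proxies/export", api_key, {"limit": 50})
#   with urllib.request.urlopen(req, timeout=30) as resp:
#       body = resp.read()
```

Building the request separately from sending it keeps the key handling in one place and makes the URL easy to inspect before any call is made.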

Hourly snapshots and rollups

Hourly snapshots are addressable at /api/v4/feeds/{stream}/export/{date}/{hour} (with a matching /meta variant). Only the current UTC date is addressable for hourlies: at 00:30 UTC each day, the previous day's hourlies are rolled up into the daily snapshot at /export/{date} and the per-hour artifacts are deleted. Use the snapshot id latest to track the most recent hourly without committing to a specific hour.

The Helios sensors live under their helio/ URL prefix (for example, /api/v4/feeds/helio/http/export/{date}, and /feeds/helio/https/... for TLS captures), but they appear in the list endpoint under the honeypot_http, honeypot_https, honeypot_dns, and honeypot_adb stream identifiers.
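
The addressing rules above can be sketched as a small path builder. This is an illustration under stated assumptions: the function name export_path is hypothetical, the hour is zero-padded to two digits on the assumption that paths follow the YYYY-MM-DD/HH id format, and only the two Helios segments shown in this section are mapped (the dns and adb segments are not spelled out here, so they are left out rather than guessed).

```python
import re

# Helios URL segments that appear in this section. The dns and adb segments
# are not shown, so they are deliberately omitted instead of being guessed.
HELIOS_SEGMENTS = {
    "honeypot_http": "helio/http",
    "honeypot_https": "helio/https",
}
PLAIN_STREAMS = {"proxies", "anonymizers", "torrents"}
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def export_path(stream, date="latest", hour=None):
    """Return the export path for a daily rollup, an hourly snapshot, or latest."""
    if stream in PLAIN_STREAMS:
        segment = stream
    elif stream in HELIOS_SEGMENTS:
        segment = HELIOS_SEGMENTS[stream]
    else:
        raise ValueError(f"no known URL segment for stream {stream!r}")

    if date != "latest" and not DATE_RE.match(date):
        raise ValueError("date must be YYYY-MM-DD or 'latest'")

    path = f"/api/v4/feeds/{segment}/export/{date}"
    if hour is not None:
        if date == "latest":
            raise ValueError("'latest' already names an hourly snapshot")
        if not 0 <= hour <= 23:
            raise ValueError("hour must be in 0..23")
        # Assumption: two-digit hour, mirroring the YYYY-MM-DD/HH id format.
        path += f"/{hour:02d}"
    return path
```

Validating the date and hour locally avoids a round trip that would end in a 400. It cannot tell whether an hourly has already been rolled up, so a request for a past date's hour will still be rejected by the server.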


GET /api/v4/feeds/{stream}/export

List snapshots

Returns one page of available daily and hourly Parquet snapshots for the given stream. Pages are ordered newest-first and capped at 500 rows; pass next_cursor from the response back as cursor to fetch the next page.

Path parameters

  • Name
    stream
    Type
    string
    Description

    One of proxies, anonymizers, torrents, honeypot_http, honeypot_https, honeypot_dns, or honeypot_adb.

Query parameters

  • Name
    limit
    Type
    integer
    Description

    Page size. Defaults to 100. Values above 500 are clamped.

  • Name
    cursor
    Type
    string
    Description

    Opaque pagination token returned in next_cursor of the previous response. Omit on the first page.

Response

  • Name
    stream
    Type
    string
    Description

    The stream identifier the snapshots belong to.

  • Name
    feeds
    Type
    array<object>
    Description

    Page of snapshots, newest-first.

  • Name
    feeds[].kind
    Type
    string
    Description

    Either hourly or daily.

  • Name
    feeds[].date
    Type
    string
    Description

    UTC YYYY-MM-DD of the snapshot.

  • Name
    feeds[].hour
    Type
    integer
    Description

    0-23 for hourly snapshots; omitted for daily rollups.

  • Name
    feeds[].size_bytes
    Type
    integer
    Description

    Parquet file size in bytes.

  • Name
    feeds[].row_count
    Type
    integer
    Description

    Number of rows in the parquet file.

  • Name
    feeds[].checksum
    Type
    string
    Description

    Hex-encoded SHA-256 of the parquet file bytes.

  • Name
    feeds[].id
    Type
    string
    Description

    Stable identifier: YYYY-MM-DD for daily rollups, YYYY-MM-DD/HH for past hourlies, and the literal latest for the most recent hourly snapshot.

  • Name
    feeds[].created_at
    Type
    integer
    Description

    Unix timestamp in seconds when the snapshot was indexed server-side.

  • Name
    feeds[].download_path
    Type
    string
    Description

    Relative API path to follow for the 307 download redirect.

  • Name
    next_cursor
    Type
    string
    Description

    Pagination token. Absent on the final page.

Request

GET /api/v4/feeds/{stream}/export
curl -G https://api.synthient.com/api/v4/feeds/proxies/export \
  -H "x-api-key: $API_KEY" \
  --url-query "limit=50"

Response

{
  "stream": "proxies",
  "feeds": [
    {
      "kind": "hourly",
      "date": "2026-05-07",
      "hour": 22,
      "size_bytes": 620273951,
      "row_count": 43141039,
      "checksum": "fd6c002ad6c6ae73344c2fdf1cb535a303d90edf9252358e0d30a44231649d36",
      "id": "latest",
      "created_at": 1778195206,
      "download_path": "/api/v4/feeds/proxies/export/latest"
    },
    {
      "kind": "hourly",
      "date": "2026-05-07",
      "hour": 21,
      "size_bytes": 612481020,
      "row_count": 42811027,
      "checksum": "d3fb2ec3de2bbf6af66b6028afe668e365c4113133757153ad975be93609d1ea",
      "id": "2026-05-07/21",
      "created_at": 1778192818,
      "download_path": "/api/v4/feeds/proxies/export/2026-05-07/21"
    },
    {
      "kind": "daily",
      "date": "2026-05-06",
      "size_bytes": 14721234567,
      "row_count": 1024411203,
      "checksum": "a3e4b804e1112ed2dd11e4b01a25a637ceb3abae23fa1f7c8503d567639a11a2",
      "id": "2026-05-06",
      "created_at": 1778115429,
      "download_path": "/api/v4/feeds/proxies/export/2026-05-06"
    }
  ],
  "next_cursor": "eyJkIjoiMjAyNi0wNS0wNyIsImgiOjIxfQ.GECAibkM_hEUt0ixRMKzzQ"
}
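
A response page like the one above can be split by kind to find the newest daily rollup and the newest hourly. The following is a minimal sketch; the function name summarize is hypothetical, and the PAGE literal is a trimmed copy of the sample response with only the fields the function reads.

```python
# Trimmed copy of the sample response above (only the fields used below).
PAGE = {
    "stream": "proxies",
    "feeds": [
        {"kind": "hourly", "date": "2026-05-07", "hour": 22,
         "size_bytes": 620273951, "id": "latest"},
        {"kind": "hourly", "date": "2026-05-07", "hour": 21,
         "size_bytes": 612481020, "id": "2026-05-07/21"},
        {"kind": "daily", "date": "2026-05-06",
         "size_bytes": 14721234567, "id": "2026-05-06"},
    ],
}


def summarize(page):
    """Split one list-response page by kind and total the listed bytes."""
    feeds = page["feeds"]
    dailies = [f for f in feeds if f["kind"] == "daily"]
    hourlies = [f for f in feeds if f["kind"] == "hourly"]
    return {
        # YYYY-MM-DD strings sort chronologically, so max() picks the newest.
        "newest_daily": max(dailies, key=lambda f: f["date"], default=None),
        # Daily rollups omit "hour", so only hourlies are compared on it.
        "newest_hourly": max(hourlies, key=lambda f: (f["date"], f["hour"]), default=None),
        "total_bytes": sum(f["size_bytes"] for f in feeds),
    }
```

Because daily entries omit the hour field, code that reads hour should filter on kind first rather than index every entry.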

GET /api/v4/feeds/{stream}/export/{date}

Download a snapshot

Returns a 307 redirect to a presigned R2 URL that is valid for 24 hours. The 307 (rather than 301) is intentional: the URL is minted per request and expires, so intermediaries must not cache it. Follow the redirect to download the parquet file. Use the date string latest for the most recent hourly snapshot, or append a specific hour as /{date}/{hour} for a particular hourly within the current UTC day.

Path parameters

  • Name
    stream
    Type
    string
    Description

    One of the seven stream identifiers.

  • Name
    date
    Type
    string
    Description

    Either a YYYY-MM-DD UTC date for a daily rollup, or the literal string latest for the most recent hourly snapshot.

  • Name
    hour
    Type
    integer
    Description

    Optional. UTC hour, 0-23, addressed as /api/v4/feeds/{stream}/export/{date}/{hour}. Only the current UTC date is addressable for hourlies.

Responses

Code   Meaning
307    Redirect to a 24-hour presigned download URL.
400    Bad date format, hour out of range (0..23), or hour requested for a non-current UTC date.
401    Missing or invalid API key.
403    API key lacks the per-stream feed scope.
404    No snapshot exists for the requested date/stream.

Request

GET /api/v4/feeds/{stream}/export/{date}
# Follow the 307 with -L and write to a file
curl -L -o proxies-2026-05-06.parquet \
  https://api.synthient.com/api/v4/feeds/proxies/export/2026-05-06 \
  -H "x-api-key: $API_KEY"
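
After the download completes, the file can be checked against the size and SHA-256 checksum that the list and metadata endpoints report. The following is a minimal sketch using only the standard library; the function names sha256_of_file and verify_snapshot are hypothetical.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 so large snapshots never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_snapshot(path, expected_checksum, expected_size):
    """Return True when the local file matches the listed size and checksum."""
    # The size comparison is cheap, so do it before hashing the whole file.
    if os.path.getsize(path) != expected_size:
        return False
    return sha256_of_file(path) == expected_checksum.lower()


# Typical use, with values taken from the list response for the same snapshot:
#   ok = verify_snapshot("proxies-2026-05-06.parquet", snap["checksum"], snap["size_bytes"])
```

Daily rollups in the sample response run to several gigabytes, which is why the hash is computed in chunks instead of reading the file in one call.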

GET /api/v4/feeds/{stream}/export/{date}/meta

Snapshot metadata

Returns JSON metadata for a parquet snapshot: SHA-256 checksum, byte size, row count, parquet schema with column names and types, and the canonical date.

Response

  • Name
    stream
    Type
    string
    Description

    The stream identifier the snapshot belongs to.

  • Name
    kind
    Type
    string
    Description

    Either "hourly" or "daily".

  • Name
    hour
    Type
    integer
    Description

    0-23 for hourly snapshots; omitted for daily rollups.

  • Name
    id
    Type
    string
    Description

    Stable identifier: YYYY-MM-DD for daily rollups, YYYY-MM-DD/HH for past hourlies, and latest for the most recent hourly snapshot.

  • Name
    format
    Type
    string
    Description

    Always "parquet".

  • Name
    date
    Type
    integer
    Description

    Unix timestamp in seconds for the snapshot instant. Daily rollups use the day's midnight UTC; hourly snapshots use the hour mark.

  • Name
    created_at
    Type
    integer
    Description

    Unix timestamp in seconds when the snapshot was indexed server-side.

  • Name
    size
    Type
    integer
    Description

    File size in bytes.

  • Name
    rows
    Type
    integer
    Description

    Number of rows in the parquet file.

  • Name
    checksum
    Type
    string
    Description

    Hex-encoded SHA-256 of the parquet file bytes.

  • Name
    schema.fields
    Type
    array<object>
    Description

    One entry per column in the parquet footer.

  • Name
    schema.fields[].name
    Type
    string
    Description

    Column name.

  • Name
    schema.fields[].type
    Type
    string
    Description

    Column type: one of string, int64, uint32, uint64, bool, bytes.

Request

GET /api/v4/feeds/{stream}/export/{date}/meta
curl -G https://api.synthient.com/api/v4/feeds/proxies/export/latest/meta \
  -H "x-api-key: $API_KEY"

Response

{
  "stream": "proxies",
  "kind": "hourly",
  "hour": 22,
  "id": "latest",
  "format": "parquet",
  "date": 1778191200,
  "created_at": 1778195206,
  "size": 620273951,
  "rows": 43141039,
  "checksum": "fd6c002ad6c6ae73344c2fdf1cb535a303d90edf9252358e0d30a44231649d36",
  "schema": {
    "fields": [
      { "name": "ip",           "type": "string" },
      { "name": "provider",     "type": "string" },
      { "name": "type",         "type": "string" },
      { "name": "timestamp",    "type": "int64" },
      { "name": "country_code", "type": "string" },
      { "name": "asn",          "type": "uint32" }
    ]
  }
}
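
The metadata fields above are easy to turn into values a pipeline can act on. The following is a minimal sketch: the function names describe and missing_columns are hypothetical, and META is a copy of the sample response without the checksum.

```python
from datetime import datetime, timezone

# Copy of the sample metadata response above, minus the checksum.
META = {
    "stream": "proxies", "kind": "hourly", "hour": 22, "id": "latest",
    "format": "parquet", "date": 1778191200, "created_at": 1778195206,
    "size": 620273951, "rows": 43141039,
    "schema": {"fields": [
        {"name": "ip", "type": "string"},
        {"name": "provider", "type": "string"},
        {"name": "type", "type": "string"},
        {"name": "timestamp", "type": "int64"},
        {"name": "country_code", "type": "string"},
        {"name": "asn", "type": "uint32"},
    ]},
}


def describe(meta):
    """Convert the Unix timestamps to UTC datetimes and index columns by name."""
    snapshot_at = datetime.fromtimestamp(meta["date"], tz=timezone.utc)
    indexed_at = datetime.fromtimestamp(meta["created_at"], tz=timezone.utc)
    return {
        "snapshot_at": snapshot_at,
        # Seconds between the snapshot instant and server-side indexing.
        "seconds_to_index": (indexed_at - snapshot_at).total_seconds(),
        "bytes_per_row": meta["size"] / meta["rows"],
        "columns": {f["name"]: f["type"] for f in meta["schema"]["fields"]},
    }


def missing_columns(meta, expected):
    """Return the expected column names that the snapshot schema lacks."""
    present = {f["name"] for f in meta["schema"]["fields"]}
    return sorted(set(expected) - present)
```

Checking missing_columns before downloading lets a consumer detect a schema change from the metadata alone, without fetching the parquet file.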

Paginating snapshot listings

The list endpoint returns up to 500 snapshots per page, newest-first, with a next_cursor you pass back as cursor on the next call. When the response no longer carries a next_cursor, you've reached the end.

Page through every snapshot

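import os

import requests

URL = "https://api.synthient.com/api/v4/feeds/proxies/export"
HEADERS = {"x-api-key": os.environ["API_KEY"]}


def handle(snap):
    # Placeholder: replace with your own per-snapshot processing.
    print(snap["id"], snap["kind"], snap["size_bytes"])


cursor = None
while True:
    # Send cursor only after the first page; 500 is the maximum page size.
    params = {"limit": 500}
    if cursor:
        params["cursor"] = cursor
    resp = requests.get(URL, headers=HEADERS, params=params, timeout=30)
    resp.raise_for_status()  # surface 4xx/5xx instead of parsing an error body
    page = resp.json()
    for snap in page["feeds"]:
        handle(snap)
    cursor = page.get("next_cursor")  # absent on the final page
    if not cursor:
        break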

Response Codes

Status Code                   Description
200 - Success                 Snapshot list or metadata returned.
307 - Redirect                Follow the Location header to a 24-hour presigned download URL.
400 - Bad Request             Invalid cursor, limit, or date format.
401 - Unauthorized            Missing or invalid API key.
403 - Forbidden               API key lacks the per-stream feed scope.
404 - Not Found               No snapshot exists for the requested date/stream.
500 - Internal Server Error   Unexpected server-side error; reach out to support if the issue persists.
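
One way a client might act on these codes is sketched below. The action names, the choice to retry only on server errors, and the backoff schedule are all assumptions made for illustration; this section does not document a retry policy.

```python
def next_action(status):
    """Map a documented status code to a client-side action (illustrative names)."""
    if status == 200:
        return "parse"              # snapshot list or metadata body
    if status == 307:
        return "follow_redirect"    # Location header carries the presigned URL
    if status == 400:
        return "fix_request"        # invalid cursor, limit, or date format
    if status in (401, 403):
        return "check_credentials"  # missing key, bad key, or missing feed scope
    if status == 404:
        return "no_snapshot"        # nothing exists for that date/stream
    if status >= 500:
        return "retry"              # assumption: treat server errors as transient
    return "unexpected"


def backoff_delays(attempts, base=1.0, cap=60.0):
    """Exponential backoff delays in seconds, capped (assumed policy)."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```

Keeping 404 separate from 400 is useful in practice: a 404 for a recent date may simply mean the snapshot has not been published yet, while a 400 means the request itself needs to change.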