Episode stream (firehose)

The episode stream is a long-lived Server-Sent Events (SSE) connection that pushes episodes to you the moment they reach a chosen stage of ingestion — no polling. Open one connection, pick the level of hydration you care about, optionally filter to the podcasts you follow, and receive each new episode as it crosses that point.

The episode stream is an Enterprise feature. Access requires an API key belonging to an organization on the Enterprise plan. Authenticate exactly as you do elsewhere — the X-API-Key header (recommended) or an Authorization: Bearer token.

GET  https://api.particle.pro/v1/podcasts/episodes/stream
POST https://api.particle.pro/v1/podcasts/episodes/stream

Use GET for no filter, a popularity threshold, or a small explicit set of podcasts. Use POST (with a JSON body) when you want to filter on a large podcast_ids set — a long list would exceed query-string limits on a GET. The two forms are otherwise identical.

Pick a milestone

Episodes move through ingestion in stages. You subscribe to exactly one milestone and receive each episode once, when it reaches that stage. The milestones are strictly ordered — each builds on the previous — so picking a later milestone means you wait longer but the episode arrives with more data already populated.

`milestone`	Delivered when…	What’s populated on the episode
`discovered`	the episode first appears in the feed	Title, URL, publish date, podcast, basic metadata. No transcript/segments/clips yet.
`transcribed` (default)	speech-to-text + speaker identification finish	Full diarized transcript and identified speakers.
`segmented`	the transcript is broken into structural segments	Segments (intros, ad reads, topic blocks).
`fully_ingested`	ingestion is complete	Clips and the full enrichment set. This is the terminal contract: anything added to “fully ingested” in future automatically flows to subscribers of this milestone.

If you don’t pass milestone, you get transcribed. Choose the single milestone that matches the data you need — a later one implies all earlier stages already happened. Expect real latency between stages: transcription and enrichment take minutes to hours.
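One way to make that choice mechanical is to list the data you need and take the latest stage that any of it requires. The sketch below assumes the stage ordering and the table above; the data-kind labels (`metadata`, `speakers`, and so on) and the helper name `milestone_for` are illustrative, not part of the API.

```python
MILESTONES = ["discovered", "transcribed", "segmented", "fully_ingested"]

# First milestone at which each kind of data is populated (from the table above).
# The keys are hypothetical labels chosen for this sketch.
POPULATED_AT = {
    "metadata": "discovered",
    "transcript": "transcribed",
    "speakers": "transcribed",
    "segments": "segmented",
    "clips": "fully_ingested",
}


def milestone_for(needs: set) -> str:
    """Return the earliest milestone that guarantees every item in `needs`."""
    unknown = set(needs) - set(POPULATED_AT)
    if unknown:
        raise ValueError(f"unknown data kinds: {sorted(unknown)}")
    # Milestones are strictly ordered, so the latest required stage wins.
    return max(
        (POPULATED_AT[n] for n in needs),
        key=MILESTONES.index,
        default="discovered",
    )
```

For example, a consumer that needs both metadata and segments would subscribe to `segmented` and accept the extra latency.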

Filter the podcasts

By default the stream delivers every episode in the catalog. You can narrow it in two ways, which combine as a union (an episode is delivered if it matches either filter):

podcast_ids — an explicit set of podcasts, each given as a slug (pivot) or ID. An episode is delivered if its podcast is in the set. Unknown values are ignored, so a single bad slug won’t break the stream — but if none of the supplied ids match a known podcast, the request fails immediately with an error event rather than leaving you waiting on a stream that can never produce anything.
popularity_threshold — a number in (0, 1). Podcast popularity is normalized 0–1 across the catalog (a percentile), so 0.9 ≈ the top 10% most popular podcasts. Use this to follow “the popular stuff” without enumerating ids.

Pass a large podcast_ids set via the POST body (see below). On GET, podcast_ids is capped at 100; beyond that you’ll get an error event telling you to use POST.
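The GET-versus-POST decision can be made in code before connecting. The following is a minimal sketch that assumes only the limits stated on this page (100 ids on GET, 1000 on POST, threshold in the open interval (0, 1)); the function name `build_stream_request` and its return shape are hypothetical, and the checks are client-side conveniences rather than a description of server behavior.

```python
from typing import Any, Optional

GET_MAX_IDS = 100    # documented cap for podcast_ids on GET
POST_MAX_IDS = 1000  # documented cap for podcast_ids on POST


def build_stream_request(
    podcast_ids: Optional[list] = None,
    popularity_threshold: Optional[float] = None,
    milestone: str = "transcribed",
) -> tuple:
    """Return (method, query_params, json_body) for the stream endpoint."""
    ids = list(podcast_ids or [])
    if popularity_threshold is not None and not 0 < popularity_threshold < 1:
        raise ValueError("popularity_threshold must be in the open interval (0, 1)")
    if len(ids) > POST_MAX_IDS:
        raise ValueError(f"podcast_ids is capped at {POST_MAX_IDS} on POST")

    # milestone is always a query parameter, on GET and POST alike.
    params: dict = {"milestone": milestone}

    if len(ids) <= GET_MAX_IDS:
        # Small or empty filter: everything fits in the query string.
        if ids:
            params["podcast_ids"] = ",".join(ids)
        if popularity_threshold is not None:
            params["popularity_threshold"] = str(popularity_threshold)
        return "GET", params, None

    # Large id set: both filters move into the JSON body.
    body: dict = {"podcast_ids": ids}
    if popularity_threshold is not None:
        body["popularity_threshold"] = popularity_threshold
    return "POST", params, body
```

The returned tuple maps directly onto an HTTP client call: the method, the query parameters, and an optional JSON body.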

Parameters

milestone, cursor, since, and include are always query parameters. podcast_ids and popularity_threshold are query parameters on GET and JSON body fields on POST.

Parameter	Description
`milestone`	One of `discovered`, `transcribed`, `segmented`, `fully_ingested`. Defaults to `transcribed`.
`podcast_ids`	Slugs or IDs to filter to (union with `popularity_threshold`). GET: comma-separated, ≤100. POST: JSON array, ≤1000.
`popularity_threshold`	Number in `(0, 1)`. Deliver only podcasts at or above this popularity percentile.
`cursor`	Opaque resume token from a previously received event. See Resuming.
`since`	ISO 8601 date or date-time to backfill from when you have no `cursor`. Ignored if `cursor` is set.
`include`	Heavy relations to embed in each episode (comma-separated): `transcript`, `segments`, `clips`, or `all`. Omitted by default. See Hydrate the payload.

Open the stream

A simple GET — all transcribed episodes, live:

# -N disables curl's output buffering so events print as they arrive.
curl -N "https://api.particle.pro/v1/podcasts/episodes/stream?milestone=transcribed" \
  -H "X-API-Key: $PARTICLE_API_KEY"

import os

import httpx

# timeout=None keeps the long-lived connection from hitting a read timeout.
with httpx.stream(
    "GET",
    "https://api.particle.pro/v1/podcasts/episodes/stream",
    params={"milestone": "transcribed"},
    headers={"X-API-Key": os.environ["PARTICLE_API_KEY"]},
    timeout=None,
) as res:
    for line in res.iter_lines():
        ...  # parse SSE events

Filtered by popularity, or by a handful of shows (GET):

# Top ~10% most popular podcasts, at the segmented milestone:
curl -N "https://api.particle.pro/v1/podcasts/episodes/stream?milestone=segmented&popularity_threshold=0.9" \
  -H "X-API-Key: $PARTICLE_API_KEY"

# A few specific shows by slug:
curl -N "https://api.particle.pro/v1/podcasts/episodes/stream?podcast_ids=pivot,lex-fridman,all-in" \
  -H "X-API-Key: $PARTICLE_API_KEY"

A large explicit set (POST with a JSON body):

curl -N -X POST "https://api.particle.pro/v1/podcasts/episodes/stream?milestone=transcribed" \
  -H "X-API-Key: $PARTICLE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "podcast_ids": ["pivot", "lex-fridman", "QpMz7GYKfSNuUa6zKXA4Q", "... up to 1000 ..."] }'

With no cursor or since, the stream is live-only: you receive episodes that reach your milestone from the moment you connect forward.

Event format

Each message is an SSE event. There are two event types.

event: episode is sent when an episode reaches your milestone. The data is a JSON envelope:

event: episode
data: {
  "milestone": "transcribed",
  "cursor": "g3Qk9m8...",          // opaque resume token — store this
  "episode": {
    "id": "78cgekLUjCJBUZbj3s5K8Y",
    "title": "WHCD Shooting Aftermath, Musk and Altman Face-Off…",
    "podcast": { "id": "QpMz7GYKfSNuUa6zKXA4Q", "title": "Pivot" },
    "published_at": "2026-04-28T10:00:00Z",
    "has_transcript": true
    // …same shape as a /v1/podcasts/episodes list entry
  }
}

The episode object is the same list-shaped representation returned by list episodes and the feed, hydrated to the level implied by your milestone (has_transcript, segment_count, etc. reflect the stage reached). For the full per-episode detail (topics, all entities, videos), fetch GET /v1/podcasts/episodes/{id}, or embed the heavy relations inline with include.

event: error signals a terminal error (e.g. a filter that matched no podcasts, too many ids for a GET, or an invalid cursor). The server sends one and closes the connection:

event: error
data: { "message": "no podcasts matched the provided filter (podcast_ids / popularity_threshold)" }
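Reading these two event types off the wire takes only a small line parser. The sketch below follows the general SSE framing rules (a blank line ends an event, lines starting with a colon are comments, multiple data lines are joined with newlines); the names `parse_sse`, `decode_event`, and `StreamError` are hypothetical, and whether this server sends comment lines is not something this page states.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple]:
    """Yield (event_name, data) pairs from decoded SSE lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # SSE comment line; carries no payload
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one optional space after the colon
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


class StreamError(Exception):
    """Raised when the server sends a terminal `event: error`."""


def decode_event(event: str, data: str) -> dict:
    """Return the JSON envelope, or raise StreamError for an error event."""
    payload = json.loads(data)
    if event == "error":
        raise StreamError(payload.get("message", "unknown stream error"))
    return payload
```

Feeding it the lines from an HTTP client's line iterator yields one pair per event; `decode_event` then turns an `episode` pair into the envelope with its `cursor` and `episode` fields.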

Hydrate the payload

By default each episode carries only its metadata, counts, and flags (has_transcript, segment_count, clip_count) — the heavy relations are not shipped, so a consumer that only needs to know an episode reached a milestone never pays for transcript bytes. To embed those relations directly — and avoid a follow-up request per delivered episode — pass include:

`include` value	Embeds	Available at milestone
`transcript`	`episode.transcript` — the dialogue transcript, identical to `GET /v1/podcasts/episodes/{id}/transcript?format=dialogue`	`transcribed`
`segments`	`episode.segments`	`segmented`
`clips`	`episode.clips`	`fully_ingested`
`all`	everything available at the chosen milestone	—

Combine values with commas: include=transcript,clips. A relation can only be embedded at a milestone that guarantees it. Each relation becomes available at the milestone listed for it in the table above, and because milestones are ordered, you can embed only what your milestone has already reached. Asking for clips at milestone=transcribed is a contradiction (the event would fire before clips exist) and is rejected with a terminal error event. all is milestone-relative: it expands to exactly the relations your milestone guarantees, so it never conflicts (e.g. all at transcribed embeds just the transcript).
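These rules can be checked client-side before opening a connection, which avoids a round trip that would end in a terminal error. The following sketch encodes the table above; `resolve_include` is a hypothetical helper, and the server remains the authority on what it accepts.

```python
MILESTONES = ["discovered", "transcribed", "segmented", "fully_ingested"]

# Earliest milestone at which each relation is guaranteed (from the table above).
AVAILABLE_AT = {
    "transcript": "transcribed",
    "segments": "segmented",
    "clips": "fully_ingested",
}


def resolve_include(include: str, milestone: str = "transcribed") -> list:
    """Expand and validate an include string for the given milestone."""
    if milestone not in MILESTONES:
        raise ValueError(f"unknown milestone: {milestone!r}")
    reached = MILESTONES.index(milestone)
    allowed = [r for r, m in AVAILABLE_AT.items() if MILESTONES.index(m) <= reached]

    requested = [v.strip() for v in include.split(",") if v.strip()]
    if "all" in requested:
        return allowed  # milestone-relative, so it can never conflict

    for rel in requested:
        if rel not in AVAILABLE_AT:
            raise ValueError(f"unknown include value: {rel!r}")
        if rel not in allowed:
            raise ValueError(f"{rel!r} is not available at milestone {milestone!r}")
    return requested
```

With this helper, `resolve_include("all", "transcribed")` yields only the transcript, while asking for clips at `transcribed` raises before any request is made.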

# Segments at the segmented milestone, embedded in each event:
curl -N "https://api.particle.pro/v1/podcasts/episodes/stream?milestone=segmented&include=segments" \
  -H "X-API-Key: $PARTICLE_API_KEY"

# Everything available at the terminal milestone:
curl -N "https://api.particle.pro/v1/podcasts/episodes/stream?milestone=fully_ingested&include=all" \
  -H "X-API-Key: $PARTICLE_API_KEY"

Word-level transcripts are paginated and can’t be embedded inline; fetch them from GET /v1/podcasts/episodes/{id}/transcript/words.

Manage the stream lifecycle

Programming against the stream is mostly about three things: store the cursor, dedupe on episode id, and reconnect.

Disconnections are normal, so design for automatic reconnection from day one. A long-lived stream will be interrupted periodically, and most often the cause is not your network: we ship frequently, and every deploy does a rolling restart of the serving pods, which closes all open streams. Idle proxies and load balancers also recycle long connections. This is routine operation, not an error and not data loss: the durable log plus your cursor guarantee a gap-free resume.

Treat “the connection ended” as an ordinary, expected event your client handles silently, not an exception to alert on. Build the reconnect-with-backoff loop in from your very first implementation (see A resilient consumer); a client that assumes one connection stays open indefinitely will break during the next deploy.

The cursor

Every episode event carries an opaque cursor. Treat it as a black box — don’t parse it. Persist the cursor of the last event you have fully processed. It’s your resume point.
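Where the cursor lives is up to you. As one possibility, the sketch below stores it in a small JSON file and replaces that file atomically, so a crash mid-write cannot leave a truncated cursor behind. The file layout and the names `save_cursor` and `load_cursor` are assumptions of this example; a database row or key-value store would serve equally well.

```python
import json
import os
import tempfile
from typing import Optional


def save_cursor(path: str, cursor: str) -> None:
    """Atomically persist the cursor of the last fully processed event."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".cursor-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump({"cursor": cursor}, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach the disk
        os.replace(tmp, path)  # atomic rename: readers see old or new, never partial
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def load_cursor(path: str) -> Optional[str]:
    """Return the stored cursor, or None on first run."""
    try:
        with open(path) as f:
            return json.load(f)["cursor"]
    except FileNotFoundError:
        return None
```

The cursor is stored verbatim as a string and never inspected, in keeping with its opaque nature.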

Delivery is at-least-once

You may occasionally receive the same episode more than once — most commonly right after a reconnect. Dedupe on episode.id and make your processing idempotent. You will not silently miss episodes (see below), but you should expect the rare duplicate rather than assume exactly-once.
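A long-running consumer cannot remember every id forever, so one option is a bounded window of recently seen ids. The sketch below is an assumption about how a client might do this, not something the API prescribes: `RecentIds` and its default capacity are made up for illustration. A duplicate older than the window would slip through, which is why idempotent processing still matters.

```python
from collections import OrderedDict


class RecentIds:
    """Remember the most recent `capacity` episode ids, evicting the oldest."""

    def __init__(self, capacity: int = 10_000) -> None:
        self.capacity = capacity
        self._ids = OrderedDict()

    def first_time(self, episode_id: str) -> bool:
        """Return True the first time an id is seen, False for a duplicate."""
        if episode_id in self._ids:
            self._ids.move_to_end(episode_id)  # refresh its position in the window
            return False
        self._ids[episode_id] = None
        if len(self._ids) > self.capacity:
            self._ids.popitem(last=False)  # drop the oldest entry
        return True
```

In a consumer loop, call `first_time(episode["id"])` and skip processing when it returns False.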

Resuming after a disconnect

Connections end — network blips, your deploys, our rolling restarts. To resume without gaps, reconnect and pass the last cursor you stored as ?cursor=:

GET /v1/podcasts/episodes/stream?milestone=transcribed&cursor=g3Qk9m8...

The stream first replays every episode after that cursor (catch-up), then transitions seamlessly to live. (On POST, send the same body and the updated ?cursor=.) If you’ve never connected before and want history, use since instead of cursor.
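The precedence between cursor and since can be captured in a small helper that builds the query parameters for every connect or reconnect. This is a sketch under the rules stated on this page (since is ignored when cursor is set; with neither, the stream is live-only); `resume_params` is a hypothetical name.

```python
from typing import Optional


def resume_params(
    milestone: str,
    cursor: Optional[str] = None,
    since: Optional[str] = None,
) -> dict:
    """Query parameters for a (re)connect; a stored cursor wins over since."""
    params = {"milestone": milestone}
    if cursor:
        # Resume exactly after the last fully processed event.
        params["cursor"] = cursor
    elif since:
        # First run with backfill: an ISO 8601 date or date-time.
        params["since"] = since
    # With neither set, the stream is live-only.
    return params
```

On POST the same parameters go in the query string while the filter body stays unchanged.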

If your consumer falls too far behind to keep up, the server ends the connection deliberately. This is not data loss: reconnect from your last stored cursor and the catch-up replay fills the gap. The golden rule is simple: always reconnect from your last processed cursor.

A resilient consumer

The pattern in any language: connect → on each episode event, dedupe and process, then store its cursor → on error or disconnect, back off and reconnect with the stored cursor. Use exponential backoff with jitter, capped at a ceiling (e.g. 1s → 30s), and reset the delay to its minimum after a connection stays up and delivers — so a routine deploy reconnects within a second or two, while a sustained outage doesn’t hammer the API.
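The backoff schedule on its own can be expressed as a small generator. The sketch below uses the example bounds from the paragraph above (1s base, 30s ceiling) and scales each delay into the range from half to all of its nominal value; the name `backoff_delays` and the injectable `rng` parameter are conveniences of this example.

```python
import random


def backoff_delays(base: float = 1.0, cap: float = 30.0, rng=random.random):
    """Yield jittered reconnect delays in seconds: base, 2*base, ... up to cap."""
    delay = base
    while True:
        # Jitter keeps many clients from reconnecting in lockstep after a deploy.
        yield min(delay, cap) * (0.5 + rng() / 2)
        delay = min(delay * 2, cap)
```

To reset after a healthy connection, discard the generator and create a fresh one, so the next disconnect starts again from the base delay.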

JavaScript

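// loadSavedCursor, saveCursor, handleEpisode, and parseSSE are supplied by your application.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

let cursor = await loadSavedCursor(); // null on first run
const seen = new Set();
let backoff = 1000; // ms; grows on repeated failure, resets on success
const MAX_BACKOFF = 30000;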

while (true) {
  const url = new URL("https://api.particle.pro/v1/podcasts/episodes/stream");
  url.searchParams.set("milestone", "transcribed");
  if (cursor) url.searchParams.set("cursor", cursor);

  try {
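    const res = await fetch(url, {
      headers: { "X-API-Key": process.env.PARTICLE_API_KEY },
    });
    // Don't try to parse a non-2xx response as an event stream; back off and retry.
    if (!res.ok) throw new Error(`stream request failed: HTTP ${res.status}`);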
    for await (const evt of parseSSE(res.body)) {
      if (evt.event === "error") break; // terminal; reconnect from `cursor`
      if (evt.event !== "episode") continue;

      const { episode, cursor: next } = JSON.parse(evt.data);
      if (!seen.has(episode.id)) {
        seen.add(episode.id);
        await handleEpisode(episode); // idempotent
      }
      cursor = next;          // advance only after successful processing
      await saveCursor(cursor);
      backoff = 1000;         // healthy connection — reset backoff
    }
  } catch (err) {
    // network error / stream closed (e.g. a deploy) — fall through to reconnect
  }
  // Exponential backoff with jitter, capped. The disconnect itself is expected;
  // this just avoids reconnect storms during a longer outage.
  const delay = Math.min(backoff, MAX_BACKOFF) * (0.5 + Math.random() / 2);
  await sleep(delay);
  backoff = Math.min(backoff * 2, MAX_BACKOFF);
}

parseSSE is any standard SSE line parser (split on blank lines; read event: and data: fields). Persisting cursor to durable storage lets you resume cleanly across process restarts, not just transient drops.
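The same loop can be sketched in Python with the transport and storage injected, which also makes it testable without a network. Everything passed in here is hypothetical: `open_stream(cursor)` is assumed to return an iterable of (event name, decoded payload) pairs and to raise ConnectionError or TimeoutError when the connection drops, and `max_connections` and `sleep` exist only so the loop can be bounded and sped up in tests.

```python
import random
import time
from typing import Callable, Iterable, Optional


def consume(
    open_stream: Callable[[Optional[str]], Iterable[tuple]],
    handle_episode: Callable[[dict], None],
    load_cursor: Callable[[], Optional[str]],
    save_cursor: Callable[[str], None],
    max_connections: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Connect, process, store the cursor, and reconnect with backoff."""
    cursor = load_cursor()  # None on first run
    seen = set()
    backoff, max_backoff = 1.0, 30.0  # seconds
    connections = 0

    while max_connections is None or connections < max_connections:
        connections += 1
        try:
            for name, payload in open_stream(cursor):
                if name == "error":
                    break  # terminal; reconnect from the stored cursor
                if name != "episode":
                    continue
                episode = payload["episode"]
                if episode["id"] not in seen:
                    seen.add(episode["id"])
                    handle_episode(episode)  # should be idempotent
                cursor = payload["cursor"]  # advance only after processing
                save_cursor(cursor)
                backoff = 1.0  # healthy connection: reset backoff
        except (ConnectionError, TimeoutError):
            pass  # expected disconnect (e.g. a deploy); fall through to reconnect

        # Exponential backoff with jitter, capped.
        sleep(min(backoff, max_backoff) * (0.5 + random.random() / 2))
        backoff = min(backoff * 2, max_backoff)
```

With an HTTP client, `open_stream` would wrap the streaming request and an SSE parser, translating the client's own transport exceptions into ConnectionError.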

Stream vs. poll

Not on Enterprise, or prefer polling to a long-lived connection? The episode feed is the all-plans pull alternative — the same episodes, milestones, and filters, returned by a resumable GET you poll on your own schedule. Reach for the stream when you want push-based, low-latency delivery without managing a poll loop. For plain catalog browsing, list episodes (which also accepts fully_ingested=true) is simpler still.

Related

Episodes: the same episode shape, by query or by ID
Transcripts: dialogue available once an episode reaches transcribed
Segments & clips: available at segmented and fully_ingested
