GET /v1/podcasts/episodes/search finds dialogue inside podcast episodes. Each result is a segment of an episode, returned with bounded transcript windows centered on the highest-relevance dialogue lines (flagged is_match: true) and any highlight clips that overlap the segment.
Available to MCP agents as particle_podcast_search_transcripts.
When to use Episode search vs Mentions
The Particle podcast surface ships two complementary dialogue-search endpoints. Pick by what you’re really asking.
| You want… | Use this |
|---|---|
| Dialogue that means something — paraphrase tolerant | /v1/podcasts/episodes/search?semantic_search=… |
| Dialogue containing exact tokens or phrases (BM25) | /v1/podcasts/episodes/search?keyword_search=… |
| Both: rank by an idea while boosting — or, quoted, requiring — exact terms | /v1/podcasts/episodes/search?semantic_search=…&keyword_search=… |
| Every line where a person or company is mentioned | /v1/podcasts/mentions?entity_id=… |
| Conceptual search scoped to a person/company | /v1/podcasts/episodes/search?semantic_search=…&entity_id=… |
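As a rough illustration of the table above, the following sketch builds a request URL for each kind of question. The paths and parameter names come from the table; the host `https://api.example.com` and the helper names `episode_search_url` and `mentions_url` are placeholders assumed for this example, not part of the documented API.

```python
from urllib.parse import urlencode

# Assumption: placeholder host. Substitute the real API base URL.
BASE_URL = "https://api.example.com"


def episode_search_url(semantic=None, keyword=None, **filters):
    """Build an episode-search URL from optional semantic/keyword queries plus filters."""
    params = {}
    if semantic:
        params["semantic_search"] = semantic
    if keyword:
        params["keyword_search"] = keyword
    # Extra filters (entity_id, role, ...) are passed through; None values are skipped.
    params.update({k: v for k, v in filters.items() if v is not None})
    return f"{BASE_URL}/v1/podcasts/episodes/search?{urlencode(params)}"


def mentions_url(entity_id, **filters):
    """Build a Mentions URL for 'every line about X' questions."""
    params = {"entity_id": entity_id}
    params.update({k: v for k, v in filters.items() if v is not None})
    return f"{BASE_URL}/v1/podcasts/mentions?{urlencode(params)}"


# Exact tokens: keyword leg only.
keyword_url = episode_search_url(keyword="NVDA H100")
# Conceptual search scoped to one person: semantic leg plus an entity filter.
scoped_url = episode_search_url(
    semantic="how founders think about pricing", entity_id="sam-altman"
)
# Entity question: use Mentions instead of episode search.
guest_url = mentions_url("sam-altman", role="guest")
```

`urlencode` percent-encodes the values, so spaces and double quotes in a query are safe to pass as plain Python strings.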
semantic_search — search by meaning
Vector search is the right tool when the surface words in dialogue might not match the surface words in your query. Two speakers can discuss the same idea using totally different vocabulary, and a lexical engine misses any line that doesn’t share your query’s words. Express your query the way you’d describe the topic to a colleague; full sentences are welcome.
semantic_search is not good at:
| Don’t ask it… | Use this instead |
|---|---|
| "Every line about Sam Altman": that’s an entity question, not a content question. | /v1/podcasts/mentions?entity_id=sam-altman |
| "Podcasts where Sam Altman is a guest": structural metadata about an episode, not its dialogue. | /v1/podcasts/mentions?entity_id=sam-altman&role=guest |
| "NVDA H100": when a specific token (ticker, model number) must appear verbatim, BM25 is more reliable than vector similarity. | keyword_search=NVDA H100 |
| "AGI AND not safety": Boolean logic isn’t supported. Express the underlying intent in natural language; add a quoted keyword_search phrase if a literal term must appear in every result. | semantic_search=…&keyword_search="…" |
keyword_search — search by exact tokens (BM25)
Use this when the exact form of a token matters: company tickers, drug names, model numbers, hashtags. Tokens are matched after the same normalization the index applies (lowercased, English tokenizer, no stemming); punctuation splits tokens, and single-letter tokens without a digit are dropped, so spell terms out rather than abbreviating to one character. Multi-word queries are matched as a bag of tokens, ranked by BM25.
To require an exact ordered phrase rather than independent tokens, wrap it in double quotes — keyword_search="machine learning". Multiple quoted phrases must all appear ("central bank" "interest rates"), and unquoted tokens alongside them still contribute to the BM25 ranking.
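The sketch below illustrates the two ideas above: roughly how a query might be normalized, and how to assemble a keyword_search value that mixes required phrases with unquoted ranking tokens. `approx_tokens` is only an ASCII approximation written from the description here (the service's real English tokenizer may differ, for example in how it treats non-ASCII text), and both function names are hypothetical.

```python
import re
from urllib.parse import urlencode


def approx_tokens(text):
    """Approximate the described normalization (an assumption, not the real tokenizer):
    lowercase, split on anything that is not a letter or digit, no stemming,
    and drop one-character tokens unless that character is a digit."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if len(t) > 1 or t.isdigit()]


def keyword_query(phrases=(), tokens=()):
    """Wrap each required phrase in double quotes; leave ranking tokens unquoted."""
    quoted = [f'"{p}"' for p in phrases]
    return " ".join(quoted + list(tokens))


# Punctuation splits tokens, and the lone letters "s" and "r" are dropped.
example_tokens = approx_tokens("NVDA's H100, R-squared")

# Both phrases must appear; "inflation" only contributes to BM25 ranking.
query = keyword_query(phrases=["central bank", "interest rates"], tokens=["inflation"])
encoded = urlencode({"keyword_search": query})
```

The dropped "r" in the example is the reason to spell terms out rather than abbreviate them to a single character.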
Hybrid: semantic_search + keyword_search
When you want to rank by an idea and a specific term at once, send both. Each leg runs independently — vector similarity and BM25 — and the two result sets are fused via reciprocal rank fusion. The fusion is a union, not an intersection: a segment that scores strongly on either leg can surface, so an unquoted keyword is a ranking boost, not a guarantee that it appears in every result. To require a term or phrase, wrap it in double quotes — quoted phrases are hard filters applied to both legs, so every result contains them no matter which leg it came from.
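To make the union behaviour concrete, here is a minimal sketch of reciprocal rank fusion over two ranked lists of segment IDs. It uses the commonly cited formula, score = sum of 1 / (k + rank), with k = 60; that constant and the segment IDs are assumptions for illustration, since the service's actual fusion parameters are not documented here.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list, best first.

    Each list contributes 1 / (k + rank) for every ID it contains (rank starts at 1).
    An ID missing from a list simply gets no contribution from that list.
    """
    scores = {}
    for ranking in rankings:
        for rank, seg_id in enumerate(ranking, start=1):
            scores[seg_id] = scores.get(seg_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable order.
    return sorted(scores, key=lambda seg_id: (-scores[seg_id], seg_id))


# Hypothetical per-leg rankings.
semantic_leg = ["seg_a", "seg_b", "seg_c"]
keyword_leg = ["seg_b", "seg_d"]

fused = reciprocal_rank_fusion([semantic_leg, keyword_leg])
```

Here `seg_b` ranks first because both legs returned it, while `seg_a`, `seg_d`, and `seg_c` still appear even though each came from only one leg. That is the union behaviour: an unquoted keyword boosts segments that contain it without excluding the others.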
Scoping a ranked search to an entity
entity_id and company_id here are filters — they narrow ranked candidates to episodes featuring the resolved entity. The ranking still comes from semantic_search / keyword_search. To read every line about an entity, use Mentions instead.
Response
windows is bounded — never the whole segment. Each window centers on one or more high-relevance lines (flagged is_match: true) padded with surrounding context. A single segment can produce multiple non-overlapping windows when the top-scored lines are far apart inside it. When the line-scoring path can’t pinpoint a match (e.g., a degraded embedding service or no individual line scored above zero), the window falls back to the segment’s opening lines and is flagged is_preview: true.
match.source is semantic, keyword, or hybrid — branch on it when rendering. clips is omitted when no highlight clip overlaps the segment. The page-level entity block appears when an entity_id or company_id filter was provided and resolved successfully — for company_id it is the company’s linked entity — and a company block additionally appears alongside it when the filter was a company_id. A reference that can’t be resolved at all returns an empty data array with both blocks omitted. A company_id that resolves to a company with no linked entity also returns empty data, but still echoes the company block (without entity) so you can render what you matched.
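The following sketch shows one way a client might walk a single result defensively. Only the names `windows`, `is_match`, `is_preview`, `match.source`, and `clips` come from the description above; the `lines` and `text` keys, the nesting, and the sample values are assumptions made for illustration, so check them against a real response before relying on them.

```python
def summarize_result(result):
    """Collect matched lines, the preview flag, and clips from one search result."""
    matched_lines = []
    has_preview_window = False
    for window in result.get("windows", []):
        # A preview window holds the segment's opening lines, not a pinpointed match.
        if window.get("is_preview"):
            has_preview_window = True
        for line in window.get("lines", []):  # assumed key: "lines"
            if line.get("is_match"):
                matched_lines.append(line["text"])  # assumed key: "text"
    return {
        "source": result["match"]["source"],  # semantic, keyword, or hybrid
        "matched_lines": matched_lines,
        "has_preview_window": has_preview_window,
        # clips is omitted when no highlight clip overlaps the segment.
        "clips": result.get("clips", []),
    }


# Hypothetical sample result with two windows and no clips key.
sample_result = {
    "match": {"source": "hybrid"},
    "windows": [
        {"lines": [
            {"text": "Context before.", "is_match": False},
            {"text": "The matched line.", "is_match": True},
        ]},
        {"is_preview": True, "lines": [{"text": "Opening line.", "is_match": False}]},
    ],
}

summary = summarize_result(sample_result)
```

Reading optional keys with `.get` and a default keeps the renderer from failing when `clips` is absent.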
Filters
| Param | Notes |
|---|---|
| podcast_id | Slug, ID, or numeric iTunes ID. |
| episode_id | Restrict to a single episode. |
| entity_id / company_id | Filter (not the primary query). For “every line about X” use Mentions. |
| entity_type | Entity category slug (e.g. company, school, book). Narrows to episodes that mention any entity of that category. Ignored when entity_id or company_id resolves a specific entity (those are strictly narrower). When entity_type is doing the filtering (no entity_id/company_id), it cannot be combined with role: speakers are always people, so a role + category combination never matches. Slugs come from GET /v1/entities/types. |
| role | guest, host, panelist, correspondent, mention. Requires entity_id or company_id. |
| type | Segment type filter (e.g. INTERVIEW). |
| since / until | Episode published_at window. ISO date or date-time. A bare date is interpreted as midnight UTC, so until=2024-06-01 excludes episodes published later that day; pass the next day (until=2024-06-02) or an explicit date-time for an inclusive end of day. |
| sort | relevance (default) or recency. |
| context | Lines of surrounding dialogue around each matched line (1–15, default 1). Widens each transcript window in place; ask for more context instead of fetching the full transcript. |
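Because a bare date means midnight UTC, an inclusive date range needs `until` set to the day after the last day you want. A small sketch of that adjustment, with a hypothetical helper name:

```python
from datetime import date, timedelta


def inclusive_date_window(first_day, last_day):
    """Return since/until values that cover both calendar days in full (UTC).

    until is pushed to the following day because a bare date is read as midnight UTC.
    """
    if last_day < first_day:
        raise ValueError("last_day must not be before first_day")
    return {
        "since": first_day.isoformat(),
        "until": (last_day + timedelta(days=1)).isoformat(),
    }


# Episodes published from 1 May 2024 through the end of 1 June 2024.
window = inclusive_date_window(date(2024, 5, 1), date(2024, 6, 1))
```

The returned dictionary can be merged into the other query parameters before the request is encoded.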
Pagination
Standard limit (1–100, default 25) plus an opaque cursor. Pass the cursor from the previous response back as ?cursor=… to fetch the next page. Treat cursors as opaque strings; don’t parse them.
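A minimal sketch of a cursor loop follows. It assumes the next cursor arrives in a top-level `cursor` field that is absent or empty on the last page, and that results sit in a top-level `data` array; the text above does not specify the cursor's field name, so treat both as assumptions. `fetch_page` is a hypothetical callable that performs the actual GET, replaced here by an in-memory stand-in so the example runs without a network.

```python
def iter_results(fetch_page, limit=25):
    """Yield every result across pages, passing each cursor back unchanged."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    cursor = None
    while True:
        params = {"limit": limit}
        if cursor:
            params["cursor"] = cursor  # opaque: never inspected or modified
        page = fetch_page(params)
        yield from page.get("data", [])
        cursor = page.get("cursor")  # assumed field name for the next cursor
        if not cursor:
            break


# In-memory stand-in for the HTTP call: two pages keyed by incoming cursor.
FAKE_PAGES = {
    None: {"data": ["s1", "s2"], "cursor": "abc"},
    "abc": {"data": ["s3"]},
}


def fake_fetch(params):
    return FAKE_PAGES[params.get("cursor")]


all_items = list(iter_results(fake_fetch, limit=2))
```

Keeping the HTTP call behind `fetch_page` lets the loop be tested offline and reused with any HTTP client.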
Related
- Podcast search — find a podcast by name, not dialogue inside episodes.
- Mentions — every line where a person or company is mentioned, episode-grouped.
- Transcripts → mentions in one episode — every entity mentioned in a single episode.
- Episodes — episode-level recall when you don’t need dialogue.