Re - feat(pipecat-sdk): add speech-to-speech model support (Gemini Live) #683
Conversation
Deploying with

| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ⛔ Deployment terminated (View logs) | supermemory-app | 0a8c5fa | Commit Preview URL / Branch Preview URL | Jan 21 2026, 04:19 AM |
How to use the Graphite Merge Queue: Add the label Main to this PR to add it to the merge queue. You must have a Graphite account in order to use the merge queue. Sign up using this link. An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub, as this will restart CI on PRs being processed by the merge queue. This stack of pull requests is managed by Graphite. Learn more about stacking.
Code review: No issues found. Checked for bugs and CLAUDE.md compliance.
Merge activity
feat(pipecat-sdk): add speech-to-speech model support (Gemini Live) (#683)

#### RE-RAISING Pipecat live speech PR

### Added native speech-to-speech model support

### Summary:
- Speech-to-speech support
- Auto-detect audio frames and inject memories into the system prompt for native audio models (Gemini Live, etc.)
- Fix memory bloating - replace memories each turn using XML tags instead of accumulating them
- Add temporal context - show recency on search results ([2d ago], [15 Jan])
- New inject_mode param - auto (default), system, or user

### Docs update
- Update the docs for native speech-to-speech models
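To make the memory-bloating fix and the temporal-context items above concrete, here is a minimal sketch of the general idea. It is not the SDK's actual implementation: the `<supermemory>` tag name and the `format_recency` / `inject_memories` helpers are assumptions introduced purely for illustration.

```python
import re
from datetime import datetime, timezone

# Hypothetical tag name; the SDK's actual XML tag may differ.
SUPERMEMORY_BLOCK = re.compile(r"<supermemory>.*?</supermemory>", re.DOTALL)


def format_recency(created_at: datetime, now: datetime | None = None) -> str:
    """Return a short recency label such as '[2d ago]' or '[15 Jan]'."""
    now = now or datetime.now(timezone.utc)
    age_days = (now - created_at).days
    if age_days < 7:
        return f"[{age_days}d ago]"
    return f"[{created_at.strftime('%d %b')}]"


def inject_memories(system_prompt: str, memories: list[str]) -> str:
    """Replace the tagged memory block each turn instead of appending to it."""
    block = "<supermemory>\n" + "\n".join(memories) + "\n</supermemory>"
    if SUPERMEMORY_BLOCK.search(system_prompt):
        # Overwrite the previous turn's memories so the prompt does not grow unbounded.
        return SUPERMEMORY_BLOCK.sub(lambda _: block, system_prompt)
    return f"{system_prompt}\n\n{block}"
```

Replacing a delimited block keeps the system prompt the same size across turns, which is the essence of the bloating fix described above; the recency labels are simply prefixed to each search result before injection.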
245ea24 to 0a8c5fa (Compare)
| """Utility functions for Supermemory Pipecat integration.""" | ||
|
|
||
| from typing import Dict, List | ||
| from datetime import datetime, timezone | ||
| from typing import Any, Dict, List, Union | ||
|
|
||
|
|
||
| def get_last_user_message(messages: List[Dict[str, str]]) -> str | None: |
Bug: The get_last_user_message function doesn't handle multimodal message content (lists), which will cause memory retrieval to fail for those messages.
Severity: MEDIUM
Suggested Fix
Update get_last_user_message to handle cases where message['content'] is a list. Check if the content is a list and, if so, iterate through its parts to extract and join the text content into a single string, similar to the implementation in supermemory_openai/utils.py.
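A minimal sketch of what the suggested fix could look like, assuming the OpenAI-style content-parts shape (`{"type": "text", "text": ...}`) that `supermemory_openai/utils.py` is said to handle; the exact part structure of Pipecat messages is an assumption here.

```python
from typing import Any, Dict, List, Optional


def get_last_user_message(messages: List[Dict[str, Any]]) -> Optional[str]:
    """Return the text of the most recent user message, or None if there is none."""
    for message in reversed(messages):
        if message.get("role") != "user":
            continue
        content = message.get("content")
        if isinstance(content, str):
            return content
        if isinstance(content, list):
            # Multimodal content: join the text parts into a single query string.
            texts = [
                part.get("text", "")
                for part in content
                if isinstance(part, dict) and part.get("type") == "text"
            ]
            joined = " ".join(t for t in texts if t)
            return joined or None
        return None
    return None
```

Returning a plain string keeps the downstream `_retrieve_memories` call working for both text-only and multimodal turns.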
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid.
Location: packages/pipecat-sdk-python/src/supermemory_pipecat/utils.py#L1-L7
Potential issue: The function `get_last_user_message` in `supermemory_pipecat/utils.py` assumes that the `content` of a message is always a string. However, Pipecat messages can also contain a list for multimodal content, a scenario more likely in the speech-to-speech pipelines this pull request supports. When a message with list content is processed, the function returns the list instead of a string. That list is then passed to `_retrieve_memories`, which expects a string query, causing the memory retrieval API call to fail. The failure is caught and logged, but it silently breaks the memory feature for any multimodal user input.