
Unpublish track failure causes deadlock #565

@timothyaufdermauer


In participant.py, unpublish_track contains:

        try:
            resp = FfiClient.instance.request(req)
            cb: proto_ffi.FfiEvent = await queue.wait_for(
                lambda e: e.unpublish_track.async_id == resp.unpublish_track.async_id
            )

            if cb.unpublish_track.error:
                raise UnpublishTrackError(cb.unpublish_track.error)

            publication = self._track_publications.pop(track_sid)
            publication._track = None
            queue.task_done()  # skipped if UnpublishTrackError is raised above
        finally:
            self._room_queue.unsubscribe(queue)

If UnpublishTrackError is raised, queue.task_done() is never called.
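The effect of the skipped task_done() can be reproduced with plain asyncio, without LiveKit. This is a simplified sketch, not the SDK's code: unpublish() and UnpublishTrackError here are hypothetical stand-ins that mirror the quoted structure (take the callback event, raise on error, call task_done() only on success). The fixed=True branch moves task_done() into a finally block, which is one possible fix and not a confirmed patch.

```python
import asyncio


class UnpublishTrackError(Exception):
    """Hypothetical stand-in for the SDK's exception type."""


async def unpublish(queue: asyncio.Queue, fixed: bool) -> None:
    # Wait for the "callback" event, as the quoted code does with wait_for().
    cb = await queue.get()
    if not fixed:
        # Original shape: task_done() is only reached on the success path.
        if cb["error"]:
            raise UnpublishTrackError(cb["error"])
        queue.task_done()
        return
    try:
        if cb["error"]:
            raise UnpublishTrackError(cb["error"])
    finally:
        # Possible fix: mark the item done whether or not an error is raised.
        queue.task_done()


async def join_completes(fixed: bool) -> bool:
    """Return True if queue.join() finishes after a failed unpublish."""
    queue: asyncio.Queue = asyncio.Queue()
    queue.put_nowait({"error": "simulated failure"})
    try:
        await unpublish(queue, fixed)
    except UnpublishTrackError:
        pass
    try:
        # The timeout exists only so this demo terminates.
        await asyncio.wait_for(queue.join(), timeout=0.1)
        return True
    except asyncio.TimeoutError:
        return False


print(asyncio.run(join_completes(fixed=False)))  # False: join() would block forever
print(asyncio.run(join_completes(fixed=True)))   # True
```

The short timeout on asyncio.wait_for is only there so the demo exits; a join() awaited without a timeout, as in a listener loop, would simply never return in the first case.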

If Room._listen_task in room.py has already called await self._room_queue.join() before the unpublish fails, that join() already holds a reference to the unpublish queue, so the unsubscribe in the finally block does not help: the queue can never be marked done, and join() blocks indefinitely.
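To see why this would stall every later room callback, here is a toy model of the broadcast queue and listener described above. BroadcastQueue, listen_task and failing_unpublish are hypothetical names, and the model assumes that join() snapshots the current subscribers and awaits each subscriber queue's join(), as the root-cause description implies. It is not the SDK's actual implementation.

```python
import asyncio
from typing import List


class BroadcastQueue:
    """Toy model of a broadcast queue; hypothetical, not the SDK's class."""

    def __init__(self) -> None:
        self._subscribers: List[asyncio.Queue] = []

    def subscribe(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers.append(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self._subscribers.remove(queue)

    def put_nowait(self, item: str) -> None:
        # Every subscriber gets its own copy of each event.
        for queue in self._subscribers:
            queue.put_nowait(item)

    async def join(self) -> None:
        # Assumption: snapshot the subscribers, then wait on each queue.
        for queue in list(self._subscribers):
            await queue.join()


async def main() -> List[str]:
    room_queue = BroadcastQueue()
    events: asyncio.Queue = asyncio.Queue()
    dispatched: List[str] = []

    async def listen_task() -> None:
        # Models Room._listen_task: forward one event, then wait for subscribers.
        while True:
            event = await events.get()
            dispatched.append(event)
            room_queue.put_nowait(event)
            await room_queue.join()

    async def failing_unpublish() -> None:
        queue = room_queue.subscribe()
        try:
            await queue.get()  # the callback event arrives...
            raise RuntimeError("unpublish failed")  # ...task_done() is skipped
        finally:
            room_queue.unsubscribe(queue)  # too late: join() still awaits this queue

    unpublisher = asyncio.create_task(failing_unpublish())
    await asyncio.sleep(0)  # let failing_unpublish() subscribe first
    listener = asyncio.create_task(listen_task())
    events.put_nowait("unpublish_track")
    events.put_nowait("participant_disconnected")
    await asyncio.sleep(0.1)

    # Clean up so the demo exits; the listener would otherwise wait forever.
    listener.cancel()
    await asyncio.gather(listener, unpublisher, return_exceptions=True)
    return dispatched


print(asyncio.run(main()))  # ['unpublish_track']
```

Only the first event is dispatched; the simulated participant_disconnected event stays queued behind the blocked join(), which would be consistent with the disconnect handler never running.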

Take the above root cause with a grain of salt: Cursor helped me find the "why", and it may or may not be correct.

My actual problem was that if I looped through my agent's tracks and unpublished all of them before calling BackgroundAudioPlayer.aclose, the room callbacks stopped being processed. As a result, when the caller hung up, the AgentSession did not close automatically, even though I have RoomInputOptions.close_on_disconnect = True.

To reproduce, put this in an agent entrypoint:

    @ctx.room.on("participant_disconnected")
    def _on_participant_disconnected(event):
        # In the failing case, this never prints after the caller hangs up
        print("=============================disconnected=============================")

    hold_music_player = BackgroundAudioPlayer()
    await hold_music_player.start(room=ctx.room, agent_session=session)

    # Unpublish every track the local participant currently has published
    track_pubs = list(ctx.room.local_participant.track_publications.values())
    for track_pub in track_pubs:
        if track_pub.track:
            track_sid = track_pub.track.sid
            await ctx.room.local_participant.unpublish_track(track_sid)

    # aclose() runs after the loop above has unpublished the agent's tracks
    await hold_music_player.aclose()

The disconnect print statement never runs when you hang up.
