Daily's toolkits for AI-powered workflows
Daily offers three different ways to build real-time voice and video interaction with AI:
- Pipecat: An open source framework for building voice and multimodal conversational agents. It supports several different transport layers, including Daily WebRTC as well as audio over WebSockets.
- Daily Python SDK: Our lower-level SDK for connecting to Daily calls from Python apps.
- Batch Processor: APIs for post-processing call media to generate transcripts, summaries, and clinical notes.
Pipecat: The easiest way to build real-time interactive AI
Pipecat is an open-source framework for building voice (and multimodal) conversational agents. It supports use cases where an LLM talks in real time (whether with humans or another LLM!). Developers are building customer support bots, healthcare workflows like patient intake and scheduling, meeting assistants, social companions, and more.
It's the easiest way to get started with many of the best and most popular AI services, such as OpenAI and Google Gemini for language models, ElevenLabs and Cartesia for speech synthesis, Fal.ai for image models, and more.
Pipecat represents many of the best practices we've found as we've built real-time AI apps alongside a variety of customers. If you're looking for the quickest way to get started building Daily-powered AI chatbots, we recommend starting with Pipecat.
Daily-python: Flexibility for all kinds of "bot participants"
daily-python is our Client SDK for Python. It powers the Daily transport layer in Pipecat, and it enables you to build video and audio calling functionality into your Python desktop and server applications.
This SDK is well suited to building AI applications on the server side, as it integrates easily with well-known Python libraries such as OpenAI, Deepgram, YOLOv5, PyTorch, OpenCV, and many more.
The SDK's core features include:
- Joining a Daily meeting as a participant
- As a meeting participant, configuring inputs, publishing, and subscription settings
- Receiving video frames from other participants
- Receiving raw audio from individual participants or mixed from all participants
- Sending video into a meeting
- Sending raw audio into a meeting
This functionality can be applied to several AI use cases, including:
- Performing object or face detection on the server side
- Sending meeting audio to a Speech-To-Text platform for transcription
- Sending audio from a Text-To-Speech platform into a meeting
- Sending video and audio tracks to a content moderation platform
- Using generative AI to inject video content into a meeting
Batch Processor: Post-processing APIs
The Batch Processor is an API that performs post-processing jobs on your call media.
It accepts a recording ID from a Daily meeting or a URL to a video/audio file (e.g. mp4, mp3) and can produce a transcript of the audio, a summary of the transcript, or a SOAP note generated from the audio.
Use our Batch Processor to generate transcripts, summaries, and clinical notes for call media.
Read more about the Batch Processor API here.
Reference documentation
You can find API reference documentation, installation steps, and links to demos for Daily's AI libraries at:
Getting started
Pipecat guides and demos
To get started quickly, follow this example to run a voice agent locally on your computer.
For more demos and examples, check out our example projects.
Daily-python guide
This guide will provide an overview of the functionality supported by Daily's Python SDK as well as examples of how to use methods. For details on these methods and events, visit the Python reference documentation.
Initializing
The first thing we need to do before using the SDK is to initialize the Daily context. See daily.Daily.init for more details.
Creating a call client
Most of the functionality of the SDK lies in the daily.CallClient class. A call client is used to join a meeting, handle meeting events, send and receive audio and video, and more.
In order to create a client (after the SDK is initialized) we can simply do:
See daily.CallClient for more details.
Releasing a call client
Once the call client is no longer needed (e.g. after leaving the meeting), we want to explicitly release any remaining internal resources as follows:
See daily.CallClient.release for more details.
Joining a meeting
The next step is to join a Daily meeting using a Daily meeting URL:
You might also need to pass a meeting token, for example, to join a private room, or if you are the meeting owner. Meeting tokens provide access to private rooms, and can pass some user-specific properties into the room.
See daily.CallClient.join for more details.
Leaving a meeting
It is important to leave the meeting in order to clean up resources (e.g. network connections).
See daily.CallClient.leave for more details.
Setting the user name
It is also possible to change the user name of our client. The user name is what other participants might see as a description of you (e.g. Jane Doe).
See daily.CallClient.set_user_name for more details.
Setting client permissions
A meeting owner has the ability to set permissions for other participants in a meeting. Owners can control whether participants:
- have presence in a meeting (e.g. whether they appear in a participant list for other participants)
- can send media (e.g. camera or microphone tracks)
- can perform admin tasks (e.g. manage other participants in a call)
Permissions can be either set dynamically via an owner or admin from within a meeting or configured in advance of the meeting with a meeting token.
A common use case for a Daily Python client is to export media from a call. In this case, you want the client to join the meeting as a hidden participant. To accomplish that, you can create a meeting token using the meeting-tokens REST API endpoint, where the permissions object is set with the hasPresence property as false. In POSTing to the endpoint, you'll receive a JWT in response. That JWT can be included in the join() call to initialize the client as a hidden participant.
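As a sketch, such a token could be requested like this. The endpoint and the permissions/hasPresence property come from the description above; the helper name and placeholder API key are illustrative:

```python
import json
import urllib.request

DAILY_API_KEY = "YOUR_API_KEY"  # placeholder: your Daily API key

# hasPresence set to false makes the bot join as a hidden participant.
payload = {"properties": {"permissions": {"hasPresence": False}}}

def create_hidden_participant_token():
    # POST to Daily's REST API; the response JSON contains the JWT as "token".
    req = urllib.request.Request(
        "https://api.daily.co/v1/meeting-tokens",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {DAILY_API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["token"]
```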
Completion callbacks
Some daily.CallClient methods are asynchronous. In order to know whether those methods finished successfully or with an error, you can optionally register a callback at invocation time.
For example, below we will register a callback to know when a meeting join succeeds.
Handling events
During a meeting (or even before) events can be generated, for example when a participant joins or leaves a meeting, when a participant changes their tracks or when an app message is received.
To subscribe to events we need to subclass daily.EventHandler. This can be done by the main application class (if there is one) or by simply creating a new class.
We can then implement any of the event handlers defined by daily.EventHandler that we are interested in. For example, we could handle the event when a participant joins by using daily.EventHandler.on_participant_joined:
Finally, we need to register the event handler when creating a daily.CallClient. For example:
In this last example, we can see that there is a circular dependency between MyApp and the newly created client. Therefore, to make sure things get properly cleaned up, it is important to release the call client (by calling daily.CallClient.release) as we saw in a previous section.
Inputs and publishing settings
Inputs and publishing settings specify whether media can be sent and how it is sent. Although related, they are not the same.
Inputs deal with video and audio devices. With inputs we can update the desired resolution of a camera, enable or disable the camera, and select our desired microphone.
With publishing settings we can specify whether the video from the input camera is being sent, as well as the quality (e.g. bitrate) of the video we are sending. Note, however, that a camera can be enabled via inputs but not published (i.e. sent).
See daily.CallClient.inputs and daily.CallClient.publishing for more details.
Subscriptions and subscription profiles
It is possible to receive both audio and video from all participants or from individual participants. This is done via the subscriptions and subscription profiles functionality.
A subscription defines how we want to receive media. For example, at which quality we want to receive video.
A subscription profile gives a name to a set of subscription media settings. There is a predefined base subscription profile, which subscribes to all remote participants' camera and microphone streams. Subscription profiles can be assigned to participants and can even be updated for a specific participant.
Updating subscription profiles
We can update the predefined base profile to subscribe to only microphone streams:
Unless otherwise specified (i.e. overridden for a specific participant), this will apply to all participants.
A more complicated example would be to define two profiles: lower and higher. The lower profile can be used to receive the lowest video quality and the higher profile to receive the maximum video quality:
These profiles can then be assigned to particular participants. For example, the participants that are shown as thumbnails can use the lower profile and the active speaker can use the higher profile.
See daily.CallClient.update_subscription_profiles for more details.
Assigning subscription profiles to participants
Now that we have seen how subscription profiles work, let's see how we can assign a subscription profile to a participant:
In the example above, we have updated the base profile by unsubscribing from both camera and microphone. Then, we have assigned the base profile to participant eb762a39-1850-410e-9b31-92d7b21d515c and subscribed to the camera stream only for that participant.
See daily.CallClient.update_subscriptions for more details.
Video and audio virtual devices
A call client can specify virtual video and audio devices which can then be used as simulated cameras, speakers or microphones.
Cameras
Cameras are used to send video into the meeting. A camera is a live stream, so it needs to generate images at a certain framerate.
To start, we need to create a virtual camera with a certain width, height, and an optional color format (frames written to the camera should be in this color format). Once the camera is created, we need to choose it as our default camera input. This is done through the call client input settings.
Finally, we can just write frames to the camera which are then sent as the call client video stream. In the following example, we load a PNG file (using the Pillow library) in RGB format and we send it 30 times per second.
See daily.Daily.create_camera_device and daily.CallClient.update_inputs for more details.
Speakers and microphones
We can create speaker and microphone devices. Speakers are used to receive audio from the meeting and microphones are used to send audio into the meeting. Currently, the audio from all participants is received mixed into a single speaker device.
In the following example we will create a new speaker device and set it as our default speaker. After selecting the speaker device, we will be able to receive audio from the meeting by reading audio frames from the device.
Microphones are created in a similar way, but they are selected differently, via the call client input settings.
Once a microphone has been selected as an audio input (and we have joined a meeting) we can send audio by writing audio frames to it. Those audio frames will be sent as the call client participant audio.
See daily.Daily.create_speaker_device, daily.Daily.create_microphone_device, daily.Daily.select_speaker_device, and daily.CallClient.update_inputs for more details.
Multiple microphone devices
Multiple microphones can be created, but only one can be active at the same time. With a single call client this is easy to achieve, since we can simply set it as the call client audio input as we saw before:
However, if multiple microphones are created and different call clients select different microphones (all in the same application), we will certainly get undesired behavior. For this particular use case, the recommended solution is to create multiple processes.
Sending and receiving raw media
It is possible to receive video from a participant or send audio to the meeting. In the following sections we will see how we can send and receive raw media.
Receiving video from a participant
Once we have created a call client we can register a callback to be called each time a video frame is received from a specific participant.
where on_video_frame must be a function or a class method such as:
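For example (a sketch; the attribute names assume a daily.VideoFrame carrying the raw image plus width/height metadata):

```python
def on_video_frame(participant_id, video_frame):
    # video_frame is a daily.VideoFrame: raw image bytes plus metadata
    # such as its width and height.
    print(f"Frame {video_frame.width}x{video_frame.height} "
          f"from {participant_id}")
```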
and where video_frame is a daily.VideoFrame.
See daily.CallClient.set_video_renderer for more details.
Receiving audio
Audio can be received from an individual participant or from all meeting participants in a single mixed track.
Receiving audio from a participant
First, let's look at how to receive audio from a participant. Once we have created a call client, we can register a callback to be called each time audio data is received from a specific participant.
where on_audio_data must be a function or a class method such as:
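For example (a sketch; the audio_frames attribute name is our assumption for the raw PCM payload of a daily.AudioData):

```python
def on_audio_data(participant_id, audio_data):
    # audio_data is a daily.AudioData: raw 16-bit PCM frames plus metadata
    # such as the sample rate.
    print(f"{len(audio_data.audio_frames)} bytes from {participant_id}")
```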
and where audio_data is a daily.AudioData.
See daily.CallClient.set_audio_renderer for more details.
Receiving audio from a meeting
Next, let's look at how to receive audio from the entire meeting. To do so, we need to initialize the SDK, create a virtual speaker device, and select it before using it. Finally, after having joined a meeting, we can read audio frames from the speaker (e.g. every 10ms):
The audio format is 16-bit linear PCM.
See daily.VirtualSpeakerDevice.read_frames for more details.
Sending audio to a meeting
To send audio into a meeting, we initialize the SDK as before, create a microphone device, and tell our client to use the new microphone device as its audio input. Finally, after joining a meeting, we can write audio frames to the microphone device:
The audio format is 16-bit linear PCM.
See daily.VirtualMicrophoneDevice.write_frames for more details.
Transcribing a meeting
Room owners or transcription admins (those with the 'transcription' value in their canAdmin permission) can start transcription services.
You can start the service by calling start_transcription() on your call client.
Optionally, you can pass configuration options:
Name | Type | Description |
---|---|---|
language | str | See Deepgram's documentation for language |
model | str | See Deepgram's documentation for model |
tier | str | This field is deprecated, use model instead |
profanity_filter | bool | See Deepgram's documentation for profanity_filter |
redact | bool or list | See Deepgram's documentation for redact |
extra | dict | Specify additional parameters. See Deepgram's documentation for available streaming options |
includeRawResponse | bool | Whether Deepgram's raw response should be included in all transcription messages |
Transcription message data is passed from the transcription service to the Daily call client via the 'on_transcription_message' event. By listening for 'on_transcription_message' events, your application can receive and handle the transcribed text.
You can stop transcribing a meeting by simply calling stop_transcription:
Clients can listen for 'on_transcription_started' and 'on_transcription_stopped' events to know when a transcription service is running during a call.
Attach a credit card to your Daily account to start using this feature.
Recording a meeting
If recording is enabled for the room, you can start a recording which captures all meeting participants that have their cameras and/or microphones on. Devices that are off are not recorded: if a participant's camera is off, no video is recorded for them, and the same applies to a muted microphone.
This API call has no effect if recording is not enabled for the corresponding room.
Multiple recording sessions (up to max_streaming_instances_per_room on your Daily domain) can be started by specifying a unique stream_id, which should be a valid UUID string. Each instance can have a different layout, participants, lifetime, and update rules.
Contact us to configure max_streaming_instances_per_room for your domain.
For more details on controlling the recording layouts, check out our reference docs.
Sending messages to Daily Prebuilt
When users are meeting with Daily Prebuilt, your daily-python client can send messages to Daily Prebuilt's chat using the send_prebuilt_chat_message method. A sent message must contain the message text itself, and can include an optional user name and an optional completion callback.
Batch processor guide
To get started quickly with the Batch processor, here are the things you need to know:
- Submit a processor job by POSTing to the /batch-processor endpoint. In your POST message, include the preset and other input parameters.
- Listen for batch-processor events to see if your job finished or encountered an error.
- Get the output from your job.
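The steps above can be sketched as follows. The /batch-processor endpoint and the idea of a preset come from the list above, but the payload field names here are illustrative assumptions; check the Batch processor reference docs for the exact parameters:

```python
import json
import urllib.request

DAILY_API_KEY = "YOUR_API_KEY"  # placeholder: your Daily API key

# Illustrative job payload: "preset" and the input field names are assumptions.
job = {
    "preset": "transcript",
    "inParams": {"sourceType": "recordingId", "recordingId": "RECORDING_ID"},
}

def submit_job():
    req = urllib.request.Request(
        "https://api.daily.co/v1/batch-processor",
        data=json.dumps(job).encode(),
        headers={
            "Authorization": f"Bearer {DAILY_API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        # The response contains the job details used to poll for output.
        return json.load(resp)
```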
For more information refer to Batch processor and Webhooks reference docs.