Introduction to spatialization
In this guide, we will present some considerations you might face when building spatialization into your video or audio call application with Daily. This guide covers high-level considerations and the most relevant parts of Daily and other web APIs which you'll likely use during development.
In the context of this guide, spatialization will refer to manipulating the behavior of a video or audio call in relation to some measure of participants' proximity to each other. This can include toggling video and audio based on participant proximity as well as applying more advanced effects. For example, you can fade video and audio based on distance, or pan audio depending on where the speaker is positioned relative to the listener.
The first thing you will want to consider is how you want to define your world space and what your navigation should look like. There are many ways to approach this. We will focus on two examples:
- A "gamified", traversable virtual world
- Click-based participant grouping
If your design calls for users to walk around a gamified virtual space using their arrow or WASD keys, you'll want to consider how participants will send updated coordinates to each other.
If you'd like more direct control or need to store additional persistent data for your world, you may also consider running a server with your own websocket.
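Whichever transport you choose, it can help to limit how often coordinates are sent. The following is a minimal sketch, not a definitive implementation: `createPositionBroadcaster`, its options, and the message shape are hypothetical names chosen for illustration, and `send` stands in for whatever function actually delivers the message (for example, your own websocket's send).

```javascript
// Hypothetical helper: wraps any `send(payload)` function and limits how
// often position updates are passed to it.
function createPositionBroadcaster(send, { minIntervalMs = 100, minDistance = 1 } = {}) {
  let lastSent = null; // { x, y, time } of the last update actually sent

  return function broadcast(x, y, now = Date.now()) {
    if (lastSent !== null) {
      const moved = Math.hypot(x - lastSent.x, y - lastSent.y);
      const elapsed = now - lastSent.time;
      // Skip updates that are too small or too frequent.
      if (moved < minDistance || elapsed < minIntervalMs) return false;
    }
    lastSent = { x, y, time: now };
    send({ type: "position", x, y });
    return true;
  };
}
```

Note that this sketch drops small or rapid movements entirely, so a fuller version would likely also flush the final position once the user stops moving.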
Having gamified navigation in a virtual world can result in more load to handle between clients as you send coordinates back and forth. It also isn't the only way to make use of proximity features in a video or audio call application.
Another option can be introducing click-based navigation and grouping in your application.
One example that represents proximity grouping is letting users click on a DOM element to join a group. Users are considered to be in close proximity by nature of being assigned the same group ID.
For a code example, see our spatialization demo, which uses both approaches within the same application. User proximity is calculated based on users' distance from each other while they are traversing the world. When they join a dedicated desk zone, they are grouped into proximity by zone ID.
Another aspect to consider is whether you will be using a single room or have users join different rooms as they traverse the space or join different groups. There are pros and cons to both approaches, and many applications will likely lean toward a mix of both.
One such scenario might involve users landing in a common "lobby" room. As they move to different parts of the world or start up conversations with smaller groups, those participants can be redirected to dedicated rooms. New Daily rooms can either be created at runtime or set up in advance via the Daily dashboard and linked from your world.
You might want to design a persistent office space in which there are some pre-made meeting rooms for users to join. At the same time, you may want users to be able to jump into a brand new call on the fly after meeting up in the lobby. In this case, you can use the Daily /rooms REST endpoint to generate a room for them.
In some cases, you can use Daily Prebuilt in your proximity video or audio app. If you have users navigating in a space outside of a Daily call and then have them form a group to jump into a separate room, you can simply load up Daily Prebuilt for them on room join.
But for the more advanced and immersive spatialization applications (such as fading user tracks in and out as they navigate in a world), consider using the Daily call object. Call object mode provides more flexibility in how you visualize and control the Daily call.
Once you've decided what "proximity" means in your context, you will want to toggle video and audio based on how close participants are to each other.
If you've gone with the gamified world approach where users have coordinates in a virtual space, you can use Euclidean distance to calculate their proximity.
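As a minimal sketch of the distance check, assuming each participant is represented by an object with 2D `x` and `y` world coordinates (the function names and the `earshot` threshold are hypothetical and would depend on the scale of your world):

```javascript
// Euclidean distance between two points: sqrt((ax - bx)^2 + (ay - by)^2).
function distanceBetween(a, b) {
  return Math.hypot(a.x - b.x, a.y - b.y);
}

// Hypothetical proximity rule: two participants can see and hear each
// other when they are no further apart than `earshot` world units.
function isWithinEarshot(a, b, earshot = 300) {
  return distanceBetween(a, b) <= earshot;
}
```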
Alternatively, if you're taking a group-based-proximity approach, you can simply decide that everyone in a matching group ID is within close enough proximity to see and hear each other.
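A group-based check can be even simpler. This sketch assumes your application stores a `groupId` on each participant record (for instance, assigned when they click to join a group); the field and function names are hypothetical.

```javascript
// Two participants are "close" when they share a non-empty group ID.
function inSameGroup(a, b) {
  return a.groupId != null && a.groupId === b.groupId;
}

// Everyone the local participant should currently see and hear.
function groupMembers(participants, local) {
  return participants.filter((p) => p.id !== local.id && inSameGroup(local, p));
}
```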
Depending on how many people you expect to be in a single room, for performance reasons you will likely want to limit which other participants' video and audio tracks each user is subscribed to. There can be little point in subscribing to the tracks of a user who is on the other side of the world until they come closer or join the same proximity group. This is where Daily track subscriptions come in.
To manage track subscriptions in a single room, take a look at the following parts of the Daily API:
- The DailyCall configuration property that controls automatic track subscription. This is true by default, but for a proximity-based application it often makes sense to disable automatic subscription.
- updateParticipant() track subscriptions, which is how you would subscribe to another user's video, audio, screen video, and screen audio tracks on demand. You can read more about track states in the track subscription section of our large meeting guide.
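Putting these together, a subscription pass might look roughly like the sketch below. It assumes `call` is a Daily call object with automatic subscription disabled, and that your application keeps its own list of remote participants with `sessionId`, `x`, `y`, and a `subscribed` flag. The function name, the record shape, and the `earshot` threshold are hypothetical; only `updateParticipant()` with `setSubscribedTracks` comes from the Daily API.

```javascript
// Subscribe to nearby participants and unsubscribe from distant ones.
// Only calls updateParticipant() when the desired state actually changes.
function syncSubscriptions(call, local, remotes, earshot = 300) {
  for (const remote of remotes) {
    const near = Math.hypot(local.x - remote.x, local.y - remote.y) <= earshot;
    if (near !== Boolean(remote.subscribed)) {
      call.updateParticipant(remote.sessionId, {
        setSubscribedTracks: { audio: near, video: near },
      });
      remote.subscribed = near; // remember what we last asked for
    }
  }
}
```

You would typically run a pass like this whenever a position update arrives, rather than on every animation frame.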
If you're using a breakout room approach, you will want to check out the following parts of the Daily API:
- /rooms POST request to create a new room.
- /rooms/:name DELETE request to delete a room.
- The daily-js method to join a room.
- The daily-js method to leave a room (in case users start out in a lobby room and then hop to a new room).
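As a rough sketch of the two REST calls, the helpers below only build the request descriptions so they can be passed to `fetch`. The helper names are hypothetical, and the base URL and bearer-token header are assumptions to verify against the Daily REST API reference. Because the API key is a secret, requests like these would normally be made from your server rather than from the browser.

```javascript
// Assumed base URL for the Daily REST API.
const DAILY_API_BASE = "https://api.daily.co/v1";

// Describes a POST /rooms request that creates a room with the given name.
function buildCreateRoomRequest(apiKey, roomName) {
  return {
    url: `${DAILY_API_BASE}/rooms`,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ name: roomName }),
    },
  };
}

// Describes a DELETE /rooms/:name request that removes the room again.
function buildDeleteRoomRequest(apiKey, roomName) {
  return {
    url: `${DAILY_API_BASE}/rooms/${encodeURIComponent(roomName)}`,
    options: {
      method: "DELETE",
      headers: { Authorization: `Bearer ${apiKey}` },
    },
  };
}

// Usage (server side):
//   const { url, options } = buildCreateRoomRequest(apiKey, "corner-table");
//   const response = await fetch(url, options);
```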
If the design of your world calls for it, it's possible to define the constraints with which video tracks are sent to call participants.
- sendSettings, a call object property allowing you to customize your simulcast layers.
- The userMediaVideoConstraints call object property, which allows you to set track constraints for the sender. Note that this sets constraints for the captured track itself and in turn affects the highest simulcast layer settings. This means you are setting the maximum possible resolution/frame rate that other participants can receive. Therefore, if using lower constraints along with multiple simulcast layers, receivers may experience undesirably low video quality if/when they drop to those lower layers. This is why you may want to customize your layers accordingly, by using the sendSettings property mentioned above.
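To see why lower capture constraints can hurt the lower layers, it can help to work out the resolution each layer ends up with. The sketch below is illustrative arithmetic only: `layerResolutions` is a hypothetical helper, and the layer names and scale-down factors (4, 2, 1) are assumed example values, not Daily's defaults.

```javascript
// Given the captured resolution and a scale-down factor per simulcast
// layer, compute the resolution receivers would get on each layer.
function layerResolutions(capture, scaleFactors) {
  const layers = {};
  for (const [name, scale] of Object.entries(scaleFactors)) {
    layers[name] = {
      width: Math.round(capture.width / scale),
      height: Math.round(capture.height / scale),
    };
  }
  return layers;
}

const factors = { low: 4, medium: 2, high: 1 }; // assumed example factors

// Capturing at 1280x720 leaves a usable 320x180 low layer...
const hd = layerResolutions({ width: 1280, height: 720 }, factors);
// ...but capturing at 320x180 shrinks the low layer to 80x45.
const small = layerResolutions({ width: 320, height: 180 }, factors);
```

In a case like the second one, you would likely reduce the scale-down factors or the number of layers when customizing your simulcast settings.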
Depending on the size of your world space and your expected participant load, you might also want to eventually consider additional optimizations, such as breaking up the world into smaller "zones" and only processing position data for participants who are in the same zone.
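One simple way to form zones is a fixed-size grid. The following is a sketch under the assumption that participants have 2D `x` and `y` coordinates; `zoneKey`, `groupByZone`, and the `zoneSize` default are hypothetical.

```javascript
// Map a position to the key of the grid cell ("zone") that contains it.
function zoneKey(position, zoneSize = 500) {
  const col = Math.floor(position.x / zoneSize);
  const row = Math.floor(position.y / zoneSize);
  return `${col}:${row}`;
}

// Bucket participants by zone so that distance checks only need to run
// between participants that share a bucket.
function groupByZone(participants, zoneSize = 500) {
  const zones = new Map();
  for (const p of participants) {
    const key = zoneKey(p, zoneSize);
    if (!zones.has(key)) zones.set(key, []);
    zones.get(key).push(p);
  }
  return zones;
}
```

Participants standing near a zone border can still be close to someone in the neighboring zone, so a fuller version would probably also check adjacent cells.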
Please don't hesitate to contact us if you would like some assistance in optimizing your spatialized application.
In addition to enabling and disabling video and audio tracks based on proximity, you can also manipulate audio effects based on where the listener is in relation to each speaker.
One example of such audio manipulation could be panning. If the speaker is positioned on the local participant's right hand side, you can have their audio favor the local participant's right speaker. You can also vary the volume of the audio based on the speaker's distance from the listener.
If using a PannerNode, you will want to configure your listener, because a PannerNode's position is set relative to the listener.
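One possible way to handle this, sketched below, is to leave the Web Audio listener at its default position and orientation (at the origin, facing down the negative Z axis) and place each speaker's panner at that speaker's offset from the local participant. The function name and the 2D position objects are hypothetical, and the mapping assumes screen-style coordinates where `y` grows downward, so a speaker higher up on screen ends up in front of the listener.

```javascript
// `panner` is assumed to be a Web Audio PannerNode (or anything exposing
// positionX/positionY/positionZ params with a `value` field).
function positionPanner(panner, listenerPos, speakerPos) {
  // Left/right offset maps directly onto the X axis.
  panner.positionX.value = speakerPos.x - listenerPos.x;
  // The world is flat, so keep everyone at the same height.
  panner.positionY.value = 0;
  // Up/down on the 2D map becomes in front of/behind the listener.
  panner.positionZ.value = speakerPos.y - listenerPos.y;
}
```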
If using a StereoPannerNode, together with a GainNode for volume, configuring a listener should not be required as they are not spatialized nodes. These nodes allow you to control the volume and pan of the output directly, regardless of position in space. Of course, you can determine the gain and pan values themselves based on participants' positions in your world.
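A minimal sketch of deriving those values from positions might look like this. The linear falloff, the `maxDistance` default, and the function name are assumptions chosen for illustration; you may prefer a different curve for your world.

```javascript
// Compute a volume (0..1) and stereo pan (-1..1) for one speaker,
// based on where they stand relative to the listener.
function panAndGain(listener, speaker, maxDistance = 300) {
  const dx = speaker.x - listener.x;
  const dy = speaker.y - listener.y;
  const distance = Math.hypot(dx, dy);
  // Linear fade: full volume at distance 0, silent at maxDistance and beyond.
  const gain = Math.max(0, 1 - distance / maxDistance);
  // Pan follows the horizontal offset, clamped to the -1 (left) to 1 (right)
  // range that StereoPannerNode expects.
  const pan = Math.max(-1, Math.min(1, dx / maxDistance));
  return { gain, pan };
}

// Usage with Web Audio nodes created elsewhere:
//   const { gain, pan } = panAndGain(localPos, speakerPos);
//   gainNode.gain.value = gain;
//   stereoPannerNode.pan.value = pan;
```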
Depending on your target environment, you might encounter some Chromium issues when working with the Web Audio API. One especially relevant example is the lack of acoustic echo cancellation for remote streams, which can require you to implement an RTCPeerConnection loopback approach. Unfortunately, not all nodes seem to be compatible with this workaround.
As you can see, there are many considerations and decisions to make related to building an app with spatialization. In this guide, we went through the main concepts you'll want to look into when creating a spatial video/audio application with Daily. We also pointed out the most relevant parts of the Daily and Web Audio APIs, proximity definition options, and some design considerations to keep in mind.