A short introduction to the architecture of a video call

Daily is built on WebRTC, an open, secure standard for streaming media in browsers. There is no single WebRTC technology: WebRTC is a collection of standards, protocols, and APIs that together form an open-source framework for real-time communication and data transfer.

Simply put, WebRTC lets developers integrate real-time communication into any website or app. From a product perspective, this means WebRTC can turn video calling into a built-in feature of any product.

Daily's API makes this as fast and flexible as possible.

Rethinking the familiar: video rooms

The notion of a room, which you’re already familiar with from video chat tools, is central to Daily as a video chat API. A room is a virtual space where users (whom we'll call "participants") meet and exchange audio and video data in real time.

[Figure: abstract illustration of a room as a set of minimalist windows, chairs, plants, and lamps]

A room can be reused for different "sessions" (that is, separate video calls) over time, but it can host only one session at a time.

Track subscriptions

Video rooms are based on a publish/subscribe model. In a one-on-one video call, for example, each participant publishes audio and video tracks from their mic and camera. The two participants are subscribed to each other’s tracks, meaning each receives the media the other publishes.
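To make the publish/subscribe relationship concrete, here is a toy TypeScript model of a one-on-one call. The `Participant` and `Subscription` types and the `subscribeToAll` function are hypothetical names for illustration only; they are not part of Daily's API, which manages subscriptions for you.

```typescript
// Toy model of publish/subscribe in a video room.
// All names here are hypothetical illustrations, not Daily APIs.
type TrackKind = "audio" | "video";

interface Participant {
  id: string;
  // Tracks this participant publishes from their mic and camera.
  publishes: TrackKind[];
}

interface Subscription {
  subscriber: string; // who receives the media
  publisher: string;  // whose track is received
  kind: TrackKind;
}

// Subscribe every participant to every track published by everyone else.
function subscribeToAll(participants: Participant[]): Subscription[] {
  const subs: Subscription[] = [];
  for (const subscriber of participants) {
    for (const publisher of participants) {
      if (publisher.id === subscriber.id) continue; // never subscribe to yourself
      for (const kind of publisher.publishes) {
        subs.push({ subscriber: subscriber.id, publisher: publisher.id, kind });
      }
    }
  }
  return subs;
}

// One-on-one call: both participants publish audio and video.
const oneOnOne: Participant[] = [
  { id: "alice", publishes: ["audio", "video"] },
  { id: "bob", publishes: ["audio", "video"] },
];

const subs = subscribeToAll(oneOnOne);
console.log(subs.length); // 4: each participant receives the other's audio and video
```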

Depending on the type of call architecture in place, participants may or may not need to be subscribed to every other participant’s tracks. Take the example of a conference broadcast with 100,000 people. This type of call only requires select audio and video tracks — those of the presenters, and maybe those of certain audience members asking questions. The remaining audience members do not need to publish tracks, and they do not need to be subscribed to one another's tracks.

In these situations, use Daily track subscriptions to subscribe participants dynamically to only the audio and video streams they need.
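The sketch below shows one way an app might decide which tracks a viewer in a large broadcast should receive: presenters always, audience members only while they are asking a question. The `role` and `askingQuestion` fields and the `chooseSubscriptions` function are hypothetical. In daily-js, applying such a selection would typically involve turning off automatic subscriptions (`setSubscribeToTracksAutomatically(false)`) and calling `updateParticipant` with a `setSubscribedTracks` option per participant; treat those as pointers and check the daily-js reference for exact signatures. The code below only computes the selection and does not call Daily.

```typescript
// Hypothetical participant metadata; Daily does not define these fields.
type Role = "presenter" | "audience";

interface BroadcastParticipant {
  sessionId: string;
  role: Role;
  askingQuestion: boolean;
}

interface TrackSelection {
  audio: boolean;
  video: boolean;
}

// Decide which remote participants' tracks a viewer should receive.
// Presenters are always subscribed; audience members only while they
// are asking a question. Everyone else stays unsubscribed.
function chooseSubscriptions(
  viewerId: string,
  participants: BroadcastParticipant[],
): Map<string, TrackSelection> {
  const selection = new Map<string, TrackSelection>();
  for (const p of participants) {
    if (p.sessionId === viewerId) continue; // skip the viewer's own tracks
    const wanted = p.role === "presenter" || p.askingQuestion;
    selection.set(p.sessionId, { audio: wanted, video: wanted });
  }
  return selection;
}

// One presenter, one audience member asking a question, 1,000 silent viewers.
const broadcast: BroadcastParticipant[] = [
  { sessionId: "host", role: "presenter", askingQuestion: false },
  { sessionId: "q1", role: "audience", askingQuestion: true },
];
for (let i = 0; i < 1000; i++) {
  broadcast.push({ sessionId: `viewer-${i}`, role: "audience", askingQuestion: false });
}

const forViewer = chooseSubscriptions("viewer-0", broadcast);
const subscribed = Array.from(forViewer.values()).filter((t) => t.video).length;
console.log(subscribed); // 2: the presenter and the audience member asking a question
```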

Note: track subscriptions are only available on calls over an SFU network connection (more on what that is below).

The architecture of a room: P2P vs. SFU calls

On the Daily platform, calls with fewer than five participants connect over a peer-to-peer (P2P) connection by default. In a P2P call, each participant is connected directly to every other participant. Depending on the call, P2P calls can offer higher quality and faster connections, and they are encrypted end to end.

Kwin, our CEO and lead dev, discusses the advantages of P2P in depth in this blog post, which is also a good resource to share with teammates who want more background on P2P calls and SFU mode.

While they can offer superior call quality, P2P calls are hard to scale: each participant sends a separate copy of its media to every other participant, so the upstream bandwidth each participant uses scales as n − 1, where n is the number of participants. For this reason, when more than four participants are connected to a call on the Daily platform, the call switches to a cloud connection through a Selective Forwarding Unit (SFU).

An SFU is a media server that processes, re-encrypts, and routes media tracks to the correct destinations. Participants in this type of call publish their tracks once to the SFU and receive other participants' tracks from it, so no participant has to send a separate copy of its media to everyone else.

[Figure: P2P vs. SFU connections. In P2P, participants (circles) connect directly to one another; in SFU, every participant connects to a central server.]

A call routed through an SFU can use as little as 200 kbps of upstream bandwidth, no matter how many people are on the call. Contrast that with a P2P call, which needs at least 200 kbps upstream for each peer connection.
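The arithmetic behind these figures can be sketched as follows, assuming a flat 200 kbps per outgoing connection, the rough figure quoted above. The `upstreamKbps` function and the flat rate are illustrative assumptions; real bitrates vary with resolution and network conditions.

```typescript
// Rough planning figure from the text: ~200 kbps per outgoing connection.
const UPSTREAM_PER_CONNECTION_KBPS = 200;

type Topology = "p2p" | "sfu";

// Approximate upstream bandwidth one participant needs, in kbps.
function upstreamKbps(topology: Topology, participants: number): number {
  if (participants < 1) throw new RangeError("a call needs at least one participant");
  // P2P: one outgoing copy of your media per other participant (n - 1).
  // SFU: a single outgoing copy, sent to the server.
  const outgoing = topology === "p2p" ? participants - 1 : 1;
  return outgoing * UPSTREAM_PER_CONNECTION_KBPS;
}

for (const n of [2, 4, 10]) {
  console.log(n, upstreamKbps("p2p", n), upstreamKbps("sfu", n));
}
// 2 200 200
// 4 600 200
// 10 1800 200
```

The P2P figure grows linearly with the size of the call while the SFU figure stays flat, which is the scaling argument made above.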

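While routing calls through an SFU reduces the load on participants' bandwidth and CPU, it also has notable downsides: media takes an extra hop through the server, which adds latency, and because the SFU re-encrypts tracks, the call is no longer encrypted end to end. This is why Daily defaults to a P2P connection when available.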

One of the advantages of using Daily is that we switch between P2P and SFU during a call as participants join and leave. You don't have to decide in advance whether a call is P2P or SFU: once the meeting reaches five participants, we automatically switch from a P2P connection to an SFU. If people leave, the call returns to a P2P connection by default once there are fewer than five participants.

Daily's platform is unique in this regard — many other platforms require developers to decide ahead of time whether a room has a P2P or SFU connection and do not offer this mid-call switching.
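A minimal sketch of the default switching rule, assuming the threshold described on this page (calls with more than four participants use an SFU). `topologyFor` and `topologyChanges` are hypothetical helpers that only model the decision; Daily makes the switch for you, and nothing here calls Daily's API.

```typescript
type Topology = "p2p" | "sfu";

// Assumed default from this page: five or more participants means SFU.
const SFU_THRESHOLD = 5;

function topologyFor(participantCount: number): Topology {
  return participantCount >= SFU_THRESHOLD ? "sfu" : "p2p";
}

// Replay participant counts as people join and leave,
// recording each point where the topology changes.
function topologyChanges(counts: number[]): string[] {
  const changes: string[] = [];
  let current: Topology | undefined;
  for (const count of counts) {
    const next = topologyFor(count);
    if (next !== current) {
      changes.push(`${count} participants -> ${next}`);
      current = next;
    }
  }
  return changes;
}

const changes = topologyChanges([2, 3, 4, 5, 6, 5, 4, 3]);
for (const change of changes) console.log(change);
// 2 participants -> p2p
// 5 participants -> sfu
// 4 participants -> p2p
```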

Call constraints and cameras

Video calls are constrained by two main factors:

  • Latency, which is especially constraining for mobile clients.
  • Bandwidth: video calls are even more demanding than watching streamed video. Streamed media can be buffered ahead of time, whereas a video call has to send and receive media in real time.

One of the advantages of using Daily is that you don’t have to worry too much about this. In the background, Daily works with the underlying network to probe available bandwidth and the continuously changing parameters of the audio and video streams, then dynamically adjusts processing to suit network conditions before delivering media tracks to subscribed participants in real time.

However, when you begin building, it’s important to keep these two factors in mind, which is why you'll see them repeated in several places in these docs.

The biggest factor in call quality, affecting both the bandwidth a call requires and the latency of track delivery, is the number of cameras turned on during the call. While connecting to an SFU stabilizes a user's upstream connection, the strain on their downstream connection depends on how many cameras are on:

  • On an SFU call, a Daily client needs about 75 kbps downstream for each other participant who is sending video, and only about 200 kbps upstream in total, no matter how many participants are in the meeting.
  • So in a 10-participant meeting with every camera on, you need about 200 kbps upstream and about 675 kbps downstream (nine incoming video streams at about 75 kbps each).
  • While newer, more powerful laptops can easily handle 30 simultaneous video streams, users on older devices will experience poor call quality beyond about 12 video streams.
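These rules of thumb can be turned into a quick planning estimate. This sketch assumes the figures from the list above (about 75 kbps downstream per incoming video stream, about 200 kbps upstream in total on an SFU call, and roughly 12 or 30 streams as the comfort limit for older or newer devices). `estimateSfuCall` and the two-way device classification are hypothetical simplifications, not a Daily API.

```typescript
// Rough figures from the list above (assumptions, not guarantees).
const DOWNSTREAM_PER_VIDEO_KBPS = 75;
const UPSTREAM_TOTAL_KBPS = 200;

// Approximate stream counts before quality suffers; "newer" vs "older"
// is a deliberate simplification of real device capabilities.
const MAX_COMFORTABLE_STREAMS = { newer: 30, older: 12 } as const;
type DeviceClass = keyof typeof MAX_COMFORTABLE_STREAMS;

// Estimate one participant's bandwidth on an SFU call, given how many
// *other* participants are sending video.
function estimateSfuCall(remoteVideoStreams: number, device: DeviceClass) {
  return {
    upstreamKbps: UPSTREAM_TOTAL_KBPS,
    downstreamKbps: remoteVideoStreams * DOWNSTREAM_PER_VIDEO_KBPS,
    likelyDegraded: remoteVideoStreams > MAX_COMFORTABLE_STREAMS[device],
  };
}

// A 10-participant meeting with every camera on: 9 remote video streams.
const est = estimateSfuCall(9, "older");
console.log(est.upstreamKbps);   // 200
console.log(est.downstreamKbps); // 675
console.log(est.likelyDegraded); // false
```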

All of this also takes a toll on participants' CPUs. For these reasons, users on older machines or mobile clients will start to experience poorer call quality much sooner than users on newer devices.

As a developer, how you set up video rooms determines how Daily accounts for these factors. As you make these choices, keep in mind which users you’re serving; we offer different build modes for older devices. If you’re unsure about your audience, it’s best to be conservative!

To summarize

  • P2P calls have lower latency and can offer higher call quality for 1:1 calls and small group calls (fewer than five people).
  • Connecting users to an SFU lightens the load on their CPU, bandwidth, and battery.
  • One of Daily's advantages is its flexibility: unlike on other platforms, calls switch seamlessly between P2P and SFU connections as participants join and leave.
  • While Daily works to optimize track delivery, video calls need a certain amount of bandwidth and CPU to function. As a developer, your choices influence how much bandwidth and CPU a call demands, which ultimately determines the end user's call quality.