Video Call Architecture (Quick Intro)

Daily is built on WebRTC, the open standard that powers real-time audio and video in the browser. Understanding a few key concepts — what WebRTC does, how media moves between participants, and why the network architecture matters — will help you build better experiences with Daily.

WebRTC: real-time media in the browser

WebRTC is a collection of standards, protocols, and APIs built into every modern browser. It handles several hard problems for you:

No plugins or installs required. WebRTC is native to Chrome, Firefox, Safari, and Edge.
Encryption by default. All WebRTC media is encrypted in transit using DTLS and SRTP.
Adaptive to network conditions. WebRTC continuously probes available bandwidth and adjusts bitrates, frame rates, and resolutions to maintain the best possible quality.

But WebRTC is a low-level set of primitives, not a complete video calling solution. Building directly on raw WebRTC means solving a long list of additional problems yourself:

Signaling. WebRTC defines how to send media once a connection exists, but not how two peers find each other or negotiate that connection in the first place. You need to build and operate a signaling layer.
NAT traversal and TURN servers. Most devices are behind NATs or firewalls that block direct peer connections. WebRTC uses ICE, STUN, and TURN protocols to work around this — but you need to run TURN server infrastructure, which relays media when direct connections fail. Without it, calls fail for a significant portion of real-world users.
SFU infrastructure. Direct P2P connections don’t scale. To support calls with more than a couple of participants, you need to build, operate, and maintain SFU media servers — globally distributed if you want low latency for users in different regions.
Device and track management. Handling camera and microphone permissions, device switching mid-call, dealing with browser inconsistencies, managing track lifecycle across join/leave events — all of this falls on you.
Call quality. Detecting and adapting to degraded network conditions, implementing simulcast, managing receive-side quality — these require deep, ongoing work to get right across the range of real-world devices and networks.
Cross-browser and cross-platform differences. Every browser implements WebRTC and media APIs slightly differently, with different constraints, quirks, and bugs — and they keep changing. Safari handles audio contexts differently than Chrome. Mobile browsers impose their own restrictions. New browser releases routinely introduce regressions. Staying on top of this is a continuous maintenance burden.
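To give a sense of just one item on this list, here is a minimal sketch of the ICE configuration a raw WebRTC app would have to supply itself. The object shape follows the standard RTCConfiguration dictionary; the hostnames and credentials are hypothetical placeholders, and a real deployment would also need to run and secure the TURN service behind them.

```javascript
// Build the configuration object a raw WebRTC app would pass to
// `new RTCPeerConnection(config)`. Hostnames and credentials are hypothetical.
function buildIceConfig(turnUsername, turnCredential) {
  return {
    iceServers: [
      // STUN: lets a peer discover its public address for direct connections.
      { urls: 'stun:stun.example.com:3478' },
      // TURN: relays media when no direct path can be established.
      {
        urls: 'turn:turn.example.com:3478',
        username: turnUsername,
        credential: turnCredential,
      },
    ],
  };
}
```

This covers only NAT traversal settings; signaling, SFU routing, and device handling would each need their own code and infrastructure.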

Daily handles all of this. The SDK gives you a clean API for participants, tracks, permissions, and call events, backed by globally distributed infrastructure — so you can focus on building your product instead of the plumbing beneath it.

Rooms and participants

The core building block in Daily is a room: a virtual space where participants meet to exchange audio and video in real time. Rooms are configurable: you can set privacy rules, recording preferences, permissions, and more via the REST API. A room persists over time and can host many sessions (individual calls), but only one session at a time. When the last participant leaves, the session ends, but the room remains available for future calls.
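The relationship between rooms and sessions can be sketched as a small conceptual model. This is not the Daily SDK or REST API: the Room class and its fields are hypothetical, and the idea that a session starts when the first participant joins is an assumption made for illustration.

```javascript
// Conceptual model only: this is not the Daily SDK or REST API.
class Room {
  constructor(name) {
    this.name = name;
    this.participants = new Set();
    this.sessionCount = 0;      // how many sessions this room has hosted
    this.sessionActive = false; // a room hosts at most one session at a time
  }

  join(participantId) {
    // Assumption: the first participant to arrive starts a new session.
    if (!this.sessionActive) {
      this.sessionActive = true;
      this.sessionCount += 1;
    }
    this.participants.add(participantId);
  }

  leave(participantId) {
    this.participants.delete(participantId);
    // When the last participant leaves, the session ends; the room persists.
    if (this.participants.size === 0) {
      this.sessionActive = false;
    }
  }
}

const standup = new Room('standup');
standup.join('alice');
standup.join('bob');
standup.leave('alice');
standup.leave('bob');  // first session ends here
standup.join('carol'); // same room, second session
console.log(standup.sessionCount, standup.sessionActive); // 2 true
```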

The architecture of a room: P2P vs. SFU calls

How media travels between participants depends on the call’s network topology. Daily supports two models.

In a peer-to-peer (P2P) call, each participant’s device connects directly to every other participant’s device, and media flows between peers with no central server involved. Because each participant must upload a separate stream to every other participant, a call with n participants requires n-1 upstream copies from each device, so upstream bandwidth grows linearly with call size. This makes P2P hard to scale beyond very small calls.

In a Selective Forwarding Unit (SFU) call, participants send their media to a central media server instead. The SFU processes, re-encrypts, and routes each track to the correct recipients. Because each participant uploads only once, to the SFU, a call routed this way can use as little as 200 kbps upstream regardless of how many people are on the call.
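The difference in upstream load can be made concrete with a small helper. This is an illustrative sketch, not part of any Daily API: the function name is hypothetical, and it counts upstream copies of a participant's media rather than measuring real bandwidth.

```javascript
// Number of upstream copies of its media that one participant sends.
// 'peer' and 'sfu' are used here simply as labels for the two topologies.
function uploadStreamsPerParticipant(participantCount, topology) {
  if (!Number.isInteger(participantCount) || participantCount < 1) {
    throw new RangeError('participantCount must be a positive integer');
  }
  if (topology === 'peer') {
    // P2P: one separate copy for every other participant.
    return participantCount - 1;
  }
  if (topology === 'sfu') {
    // SFU: a single upload to the media server, whatever the call size.
    return 1;
  }
  throw new Error(`unknown topology: ${topology}`);
}

for (const n of [2, 4, 10]) {
  console.log(
    n,
    uploadStreamsPerParticipant(n, 'peer'),
    uploadStreamsPerParticipant(n, 'sfu')
  );
}
// 2 1 1
// 4 3 1
// 10 9 1
```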

[Figure: abstract representation of P2P vs. SFU connections. In P2P, circles are connected to each other by arrows; in SFU, each circle points an arrow to a server icon.]

Daily uses what we call a mesh SFU: each participant connects to the Daily server geographically closest to them, and traffic is then routed between servers over fast backbone networks to reach servers near the other participants. As a result, a Daily SFU call can have lower latency than a P2P call when participants are in different locations, because backbone routing can outperform a direct peer connection over the public internet. For this reason, Daily defaults to SFU for all calls. The SFU model also offers:

More reliable connections
More control over send and receive settings (including track subscriptions)

P2P is still available and may be preferable for 1:1 calls where both participants are known to be geographically close and end-to-end encryption is required. To keep a call on P2P, use the REST API to set the sfu_switchover room property to a value at or above the number of participants that should stay on P2P (2 for a 1:1 call), or call setNetworkTopology({ topology: 'peer' }) from the Daily JS SDK.
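Both options might look roughly like the sketch below. The sfu_switchover property and the setNetworkTopology call come from the text above; the endpoint URL, the `properties` wrapper, the Bearer authorization header, and both helper names are assumptions to check against the REST API and SDK references. The first helper only describes the request and does not send it.

```javascript
// Describe (but do not send) a REST request that sets sfu_switchover on a room.
// Assumed, not confirmed here: the endpoint path, the `properties` wrapper,
// and Bearer-token authorization.
function buildSwitchoverRequest(roomName, apiKey, maxPeerParticipants) {
  return {
    url: `https://api.daily.co/v1/rooms/${encodeURIComponent(roomName)}`,
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      properties: { sfu_switchover: maxPeerParticipants },
    }),
  };
}

// Client-side alternative. `call` is a Daily call object created elsewhere.
function requestPeerTopology(call) {
  return call.setNetworkTopology({ topology: 'peer' });
}
```

For a 1:1 call, buildSwitchoverRequest('my-room', apiKey, 2) yields a request description whose url, method, headers, and body could then be handed to an HTTP client such as fetch.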

Track subscriptions

Video rooms are based on a publish/subscribe model: participants publish audio and video tracks from their mic and camera, and subscribe to tracks published by others. By default, Daily subscribes each participant to every other participant’s tracks automatically. This works well for small calls, but in large calls (a webinar with hundreds of attendees, for example) most participants don’t need to receive every other participant’s video. Subscribing to tracks you’re not displaying wastes bandwidth and CPU. Daily track subscriptions let you control this precisely: subscribe to the tracks you’re displaying, stage the tracks on nearby pages so they are ready to display quickly, and unsubscribe from the rest. This is the foundation of features like pagination, breakout rooms, and large-scale broadcasts.

Track subscriptions are only available on SFU calls.
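For a paginated grid, the subscribe/stage/unsubscribe decision can be sketched as a pure function. The helper name and its parameters are hypothetical. The output shape assumes the Daily JS SDK's updateParticipants() accepts a map of participant IDs to { setSubscribedTracks: true | 'staged' | false }, and that automatic subscription has been turned off first; confirm both against the SDK reference.

```javascript
// Build a per-participant subscription plan for a paginated video grid.
// `page` is zero-based; `participantIds` is in display order.
function buildSubscriptionPlan(participantIds, page, pageSize) {
  const plan = {};
  const start = page * pageSize; // first index on the visible page
  const end = start + pageSize;  // one past the last visible index
  participantIds.forEach((id, index) => {
    let value = false; // far off-screen: unsubscribe
    if (index >= start && index < end) {
      value = true; // on the visible page: subscribe
    } else if (index >= start - pageSize && index < end + pageSize) {
      value = 'staged'; // on the previous or next page: stage
    }
    plan[id] = { setSubscribedTracks: value };
  });
  return plan;
}

const plan = buildSubscriptionPlan(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'], 0, 2);
console.log(plan.a.setSubscribedTracks, plan.c.setSubscribedTracks, plan.e.setSubscribedTracks);
// true staged false
```

Under the stated assumption, the returned plan would be passed to call.updateParticipants(plan) each time the user changes pages.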

Call quality and bandwidth

The biggest factor affecting call quality is the number of active video streams. Each video stream a participant receives requires bandwidth to download and CPU to decode. As a rough guide:

Each incoming video stream requires approximately 75 kbps downstream
A participant’s total upstream is only ~200 kbps, regardless of how many others are on the call (thanks to the SFU)
So a 10-person call needs roughly 200 kbps up and about 675 kbps down per participant (nine remote streams at 75 kbps each; budgeting 750 kbps leaves some headroom)
Most modern laptops handle 30 simultaneous streams; older devices and mobile clients start to struggle around 12
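The arithmetic in this rough guide can be written as a small helper. The function and constant names are hypothetical, and the figures are the approximate ones quoted above, not measured values. Note that in a 10-person call each participant receives nine remote streams, so a 750 kbps figure corresponds to budgeting for ten.

```javascript
const KBPS_PER_INCOMING_VIDEO = 75; // approximate per-stream downstream cost
const KBPS_UPSTREAM_ON_SFU = 200;   // approximate upstream cost on an SFU call

// Rough per-participant bandwidth estimate for an SFU call.
function estimateBandwidthKbps(incomingVideoStreams) {
  return {
    up: KBPS_UPSTREAM_ON_SFU, // constant: one upload to the SFU
    down: incomingVideoStreams * KBPS_PER_INCOMING_VIDEO,
  };
}

console.log(estimateBandwidthKbps(9).down);  // 675
console.log(estimateBandwidthKbps(10).down); // 750
```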

Daily continuously adapts to network conditions — probing bandwidth, adjusting bitrates, and tuning resolution and frame rate in real time. But there’s a floor below which the experience degrades regardless of adaptation. As a developer, the most impactful thing you can do is limit how many streams each participant needs to decode at once. Track subscriptions, pagination, and simulcast layer control are the main tools for this.
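One way to turn this advice into a UI decision is to cap the number of video tiles by both device capability and available downstream bandwidth. The sketch below is illustrative only: the function name and the 'laptop' device label are hypothetical, and treating 30 and 12 as hard caps is an assumption based on the rough figures above.

```javascript
// Upper bound on simultaneously decoded video tiles for one participant.
function maxVideoTiles(downstreamKbps, deviceClass) {
  // Assumed caps: about 30 streams on a modern laptop, about 12 elsewhere.
  const decodeCap = deviceClass === 'laptop' ? 30 : 12;
  // Assumed cost: about 75 kbps downstream per incoming video stream.
  const bandwidthCap = Math.floor(downstreamKbps / 75);
  return Math.max(0, Math.min(decodeCap, bandwidthCap));
}

console.log(maxVideoTiles(1000, 'laptop')); // 13
console.log(maxVideoTiles(5000, 'mobile')); // 12
```

The result is a natural input for a page size when paginating a grid, with track subscriptions used to drop everything outside the current page.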

Summary

WebRTC handles encryption, adaptive bitrate, and real-time media transport — Daily builds on it so you don’t have to.
Daily defaults to SFU for all calls. The mesh SFU connects participants to nearby servers and routes traffic over backbone networks, which can give lower latency than P2P when participants are in different locations.
P2P is an option for 1:1 calls where participants are geographically close and E2E encryption is needed.
Track subscriptions let you control exactly which streams each participant receives — essential for calls with more than a handful of participants.
Bandwidth and CPU are finite. The fewer streams a participant has to decode, the better their experience. Design your UI with this in mind.
