Author’s note: Firefox landed support for multistream and renegotiation in Firefox 38. This article talks about how the team at Jitsi Videobridge, a WebRTC service, collaborated with the Firefox WebRTC team to get Jitsi’s multi-party video conferencing working well in Firefox. In the process, several issues were identified and fixed on both sides of the system. Firefox 40 (our newly released Developer Edition) and later versions include all those fixes. This post, written by Jitsi engineer George Politis, assumes some basic knowledge of WebRTC and how it works.
Firefox is the first browser to implement the spec-compliant “Unified Plan” for multistream support, which Chrome will be moving to, but hasn’t implemented yet. Thus, services that currently work on Chrome will need some modifications to work on Firefox. I encourage all service providers who have or are thinking of adding multistream support to give Firefox 40 or later a try and let us know how it works for you. Thanks.
Engineering Manager, Web RTC
Many of you WebRTC developers out there have probably already come across the name Jitsi Videobridge. Multi-party video conferencing is arguably one of the most popular use cases for WebRTC and once you start looking for servers that allow you to implement it, Jitsi’s name is among the first you stumble upon.
The problem was that, until recently, applications using Jitsi Videobridge only worked on a limited set of browsers: Chromium, Chrome, and Opera.
This limitation is now gone!
After a few months of hard work by Mozilla and Jitsi developers, both Firefox and Jitsi have added the missing pieces and can now work together.
While this wasn’t the most difficult project on Earth, it wasn’t quite a walk in the park either. In this post we’ll tell you more about the nitty-gritty details of our collaborative adventure.
From a WebRTC perspective, every browser establishes exactly one PeerConnection with the videobridge. The browser sends and receives all audio and video data to and from the bridge over that one PeerConnection.
In a Jitsi Videobridge-based conference, all signaling goes through a separate server-side application called the Focus. It is responsible for managing media sessions between each of the participants and the videobridge. Communication between the Focus and a given participant is done through Jingle, and between the Focus and the Jitsi Videobridge through COLIBRI.
When discussing interoperability between Firefox and Chrome for multi-party video conferences, it is impossible not to talk a little bit (or a lot!) about the Unified Plan and Plan B. These were two competing IETF drafts for the negotiation and exchange of multiple media sources (i.e., MediaStreamTracks or MSTs) between WebRTC endpoints. Unified Plan has been incorporated into the JSEP draft and Bundle negotiation draft, which are on their way to becoming IETF standards. Plan B expired in 2013 and nobody should care about it anymore … at least in theory.
In reality, Plan B lives on in Chrome and its derivatives, like Chromium and Opera. There’s actually an issue in the Chromium bug tracker to add support for Unified Plan in Chromium, but that’ll take some time. Firefox, on the other hand, has recently implemented Unified Plan.
Developers who implement many-to-many WebRTC-based videoconferencing solutions and want to support both Firefox and Chrome have to deal with this situation and implement some kind of interoperability layer between Chrome and Firefox. Jitsi Meet is no exception, of course; in the beginning it was a no-brainer to assume Plan B because that’s what Chrome implements and Firefox didn’t have multistream support. As a result, most of Jitsi’s abstractions were built around this assumption.
The most substantial difference between Unified Plan and Plan B is how they represent media stream tracks. Unified Plan extends the standard way of encoding this information in SDP, which is to have each RTP flow (i.e., SSRC) appear on its own m-line. So, each media stream track is represented by its own unique m-line. This is a strict one-to-one mapping; a single media stream track cannot be spread across several m-lines, nor may a single m-line represent multiple media stream tracks.
Plan B takes a different approach, and creates a hierarchy within SDP; an m= line defines an “envelope”, specifying codec and transport parameters, and a=ssrc lines are used to describe individual media sources within that envelope. So, typically, a Plan B SDP has three channels, one for audio, one for video, and one for the data.
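To make the difference concrete, here is a small sketch that lists the m-lines of two SDP fragments together with the SSRCs found under each one. The fragments are hypothetical and heavily abbreviated (the data section is left out, and one SSRC is treated as one track), and summarizeMediaSections is an illustrative helper, not part of any library.

```javascript
// Hypothetical, heavily abbreviated SDP fragments for two senders (alice, bob).
// Real SDP carries many more attributes (ICE, DTLS, codecs, msid, ...).
const planBSdp = [
  'v=0',
  'm=audio 9 UDP/TLS/RTP/SAVPF 111',
  'a=mid:audio',
  'a=ssrc:1001 cname:alice',
  'a=ssrc:1002 cname:bob',
  'm=video 9 UDP/TLS/RTP/SAVPF 100',
  'a=mid:video',
  'a=ssrc:2001 cname:alice',
  'a=ssrc:2002 cname:bob',
].join('\r\n');

const unifiedPlanSdp = [
  'v=0',
  'm=audio 9 UDP/TLS/RTP/SAVPF 111',
  'a=mid:audio-1001',
  'a=ssrc:1001 cname:alice',
  'm=audio 9 UDP/TLS/RTP/SAVPF 111',
  'a=mid:audio-1002',
  'a=ssrc:1002 cname:bob',
  'm=video 9 UDP/TLS/RTP/SAVPF 100',
  'a=mid:video-2001',
  'a=ssrc:2001 cname:alice',
  'm=video 9 UDP/TLS/RTP/SAVPF 100',
  'a=mid:video-2002',
  'a=ssrc:2002 cname:bob',
].join('\r\n');

// List each m-line with the distinct SSRCs described beneath it.
function summarizeMediaSections(sdp) {
  const sections = [];
  for (const line of sdp.split(/\r?\n/)) {
    if (line.startsWith('m=')) {
      sections.push({ media: line.slice(2).split(' ')[0], ssrcs: [] });
    } else if (line.startsWith('a=ssrc:') && sections.length > 0) {
      const ssrc = line.slice('a=ssrc:'.length).split(' ')[0];
      const current = sections[sections.length - 1];
      if (!current.ssrcs.includes(ssrc)) current.ssrcs.push(ssrc);
    }
  }
  return sections;
}

// Plan B: two sections (audio, video), each listing two SSRCs.
console.log(summarizeMediaSections(planBSdp));
// Unified Plan: four sections, each listing exactly one SSRC.
console.log(summarizeMediaSections(unifiedPlanSdp));
```

The same four sources appear in both fragments; only the grouping changes, which is what an interoperability layer has to translate.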
On the Jitsi side, it was obvious from the beginning that all the magic should happen in the client. The Focus communicates with the clients using Jingle, which is in turn transformed into SDP, and then handed over to the browser. There’s no SDP going around on the wire. Furthermore, there’s no signaling communication between the endpoints and the Jitsi Videobridge; it’s the Focus that mediates this procedure using COLIBRI. So the question for the Jitsi team was: “What’s the easiest way to go from Jingle to Unified Plan for Firefox, given that we have code that assumes Plan B in all imaginable places?”
In its first few attempts, the Jitsi team tried to provide general abstractions wherever there was Plan B specific code. This could have worked, but at the same time Jitsi Meet was undergoing some massive refactoring and the inbound Unified Plan patches were constantly broken. On top of that, with multistream support in Firefox in its very early stages, Firefox was breaking more often than it worked. Result: 0 progress. One could even argue that the progress was negative, because of the wasted time.
It was time to change course. The Jitsi team decided to try a more general solution to the problem and deal with it at a lower level. The idea was to build a PeerConnection adapter that would feed the right SDP to the browser, i.e. Unified Plan to Firefox and Plan B to Chrome, and that would give a Plan B SDP to the application. Enter sdp-interop.
sdp-interop is a reusable npm module that offers two simple methods:
- toUnifiedPlan(sdp), which takes an SDP string and transforms it into a Unified Plan SDP.
- toPlanB(sdp), which, not surprisingly, takes an SDP string and transforms it into a Plan B SDP.
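To give a feel for what such transformations involve, here is a toy pair of converters. This is not the sdp-interop implementation: the names toyToUnifiedPlan and toyToPlanB are made up for illustration, the code assumes one SSRC per track, and it ignores everything a real converter must handle (ssrc-group, msid, BUNDLE, direction attributes, and so on). Only audio and video sections that carry a=ssrc lines are handled meaningfully.

```javascript
// Toy converters between the two SDP shapes. NOT the sdp-interop implementation.

// Split an SDP string into session-level lines and media sections.
function parseSdp(sdp) {
  const session = [];
  const media = [];
  for (const line of sdp.split(/\r?\n/).filter(Boolean)) {
    if (line.startsWith('m=')) media.push({ mLine: line, attrs: [] });
    else if (media.length === 0) session.push(line);
    else media[media.length - 1].attrs.push(line);
  }
  return { session, media };
}

const kindOf = (mLine) => mLine.slice(2).split(' ')[0];
const ssrcOf = (line) =>
  line.startsWith('a=ssrc:') ? line.slice('a=ssrc:'.length).split(' ')[0] : null;
// "Shared" attributes belong to the envelope: neither a=ssrc nor a=mid lines.
const isShared = (line) => ssrcOf(line) === null && !line.startsWith('a=mid:');
const serialize = (lines) => lines.join('\r\n') + '\r\n';

// Plan B -> Unified Plan: emit one m-line per SSRC, copying the shared attributes.
function toyToUnifiedPlan(sdp) {
  const { session, media } = parseSdp(sdp);
  const out = [...session];
  for (const { mLine, attrs } of media) {
    const ssrcs = [...new Set(attrs.map(ssrcOf).filter(Boolean))];
    if (ssrcs.length === 0) {
      out.push(mLine, ...attrs); // sections without SSRCs are copied unchanged
      continue;
    }
    for (const ssrc of ssrcs) {
      out.push(mLine, `a=mid:${kindOf(mLine)}-${ssrc}`, ...attrs.filter(isShared),
        ...attrs.filter((line) => ssrcOf(line) === ssrc));
    }
  }
  return serialize(out);
}

// Unified Plan -> Plan B: fold all m-lines of one media type into a single envelope.
// The envelope's mid is reset to the media type ("audio", "video").
function toyToPlanB(sdp) {
  const { session, media } = parseSdp(sdp);
  const envelopes = new Map();
  for (const { mLine, attrs } of media) {
    const kind = kindOf(mLine);
    if (!envelopes.has(kind)) {
      envelopes.set(kind, { mLine, shared: attrs.filter(isShared), ssrcLines: [] });
    }
    envelopes.get(kind).ssrcLines.push(...attrs.filter((line) => ssrcOf(line) !== null));
  }
  const out = [...session];
  for (const [kind, { mLine, shared, ssrcLines }] of envelopes) {
    out.push(mLine, `a=mid:${kind}`, ...shared, ...ssrcLines);
  }
  return serialize(out);
}

// Hypothetical, abbreviated Plan B input with two senders.
const planB = serialize([
  'v=0', 's=-', 't=0 0',
  'm=audio 9 UDP/TLS/RTP/SAVPF 111', 'a=mid:audio', 'a=rtpmap:111 opus/48000/2',
  'a=ssrc:1001 cname:alice', 'a=ssrc:1002 cname:bob',
  'm=video 9 UDP/TLS/RTP/SAVPF 100', 'a=mid:video', 'a=rtpmap:100 VP8/90000',
  'a=ssrc:2001 cname:alice', 'a=ssrc:2002 cname:bob',
]);

const unified = toyToUnifiedPlan(planB);
console.log(unified);                       // one m-line per SSRC
console.log(toyToPlanB(unified) === planB); // the toy round trip restores the input
```

Even in this toy form, the conversion is a regrouping of the same source descriptions, which is why it can sit in a thin layer between the application and the browser.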
The PeerConnection adapter wraps the setLocalDescription() and setRemoteDescription() methods, and the success callbacks of the createAnswer() and createOffer() methods. If the browser is Chrome, the adapter does nothing. If, on the other hand, the browser is Firefox, the PeerConnection adapter does the following:
- Calls the toUnifiedPlan() method of the sdp-interop module prior to calling the setLocalDescription() or setRemoteDescription() methods, thus converting the Plan B SDP from the application to a Unified Plan SDP that Firefox can understand.
- Calls the toPlanB() method prior to calling the createAnswer() or createOffer() success callback, thus converting the Unified Plan SDP from Firefox to a Plan B SDP that the application can understand.
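The wrapping described above can be sketched roughly as follows. This is a minimal illustration, not Jitsi’s actual adapter: adaptPeerConnection, its parameters, and the stand-in objects are hypothetical names, the callback-style PeerConnection signatures are assumed, and interop is taken to be any object exposing toUnifiedPlan(sdp) and toPlanB(sdp) on SDP strings (check the sdp-interop documentation for its exact call shape). The sketch covers the standard description-handling methods of the PeerConnection API: setLocalDescription(), setRemoteDescription(), createOffer() and createAnswer().

```javascript
// Sketch of the adapter idea using the callback-style PeerConnection API.
// `pc`, `interop` and `isFirefox` are supplied by the caller (hypothetical names).
function adaptPeerConnection(pc, interop, isFirefox) {
  if (!isFirefox) return pc; // Chrome already speaks Plan B: nothing to do.

  const asUnifiedPlan = (desc) => ({ type: desc.type, sdp: interop.toUnifiedPlan(desc.sdp) });
  const asPlanB = (desc) => ({ type: desc.type, sdp: interop.toPlanB(desc.sdp) });

  return {
    // Application -> browser: convert Plan B to Unified Plan before Firefox sees it.
    setLocalDescription: (desc, onSuccess, onFailure) =>
      pc.setLocalDescription(asUnifiedPlan(desc), onSuccess, onFailure),
    setRemoteDescription: (desc, onSuccess, onFailure) =>
      pc.setRemoteDescription(asUnifiedPlan(desc), onSuccess, onFailure),
    // Browser -> application: convert Unified Plan to Plan B before the callback runs.
    createOffer: (onSuccess, onFailure, options) =>
      pc.createOffer((offer) => onSuccess(asPlanB(offer)), onFailure, options),
    createAnswer: (onSuccess, onFailure, options) =>
      pc.createAnswer((answer) => onSuccess(asPlanB(answer)), onFailure, options),
  };
}

// Stand-ins so the sketch can run outside a browser.
const seenByBrowser = [];
const fakePc = {
  createOffer: (onSuccess) => onSuccess({ type: 'offer', sdp: 'offer-sdp' }),
  createAnswer: (onSuccess) => onSuccess({ type: 'answer', sdp: 'answer-sdp' }),
  setLocalDescription: (desc, onSuccess) => { seenByBrowser.push(desc.sdp); onSuccess(); },
  setRemoteDescription: (desc, onSuccess) => { seenByBrowser.push(desc.sdp); onSuccess(); },
};
const fakeInterop = {
  toUnifiedPlan: (sdp) => `unified(${sdp})`,
  toPlanB: (sdp) => `planB(${sdp})`,
};

const adapted = adaptPeerConnection(fakePc, fakeInterop, true);
adapted.createOffer((offer) => {
  console.log(offer.sdp); // planB(offer-sdp)
  adapted.setLocalDescription(offer, () => console.log(seenByBrowser[0]));
  // unified(planB(offer-sdp))
});
```

A real adapter would also forward every other PeerConnection member untouched; the sketch returns only the four wrapped methods to keep the conversion points visible.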