Architecting a VOIP Cross-Platform Mobile App using WebRTC with Unity 3D

Dhilip Seshadri Kumar
5 min read · Jul 19, 2020

Prelude:

I realize the title packs in a lot of technologies, and that is deliberate. The fascination with P2P cannot be downplayed. Ever since the advent of the internet and file-sharing applications, the case for P2P communication has been growing louder. It has become especially popular during the recent pandemic, when we rely heavily on VOIP for remote communication. That surge in popularity has also raised concerns about security and privacy. WebRTC encrypts the media exchanged between two peers, which makes it one of the more secure options for VOIP, hence the drive to use it. Switching gears to Unity: cross-platform development is another trending topic, and it promises to let us write once and deploy everywhere. We don't generally associate Unity with mobile applications other than games, but there are inherent advantages to using it as a cross-platform development tool, which we will analyze later in this article.

WebRTC:

For folks who are new to WebRTC, I will try to explain what it is and how it works very briefly. WebRTC is an open-source framework to establish real-time communication between peers.

WebRTC relies on two main server entities to complete the initial handshake between peers. The first is the STUN server, which lets each peer discover its own public-facing IP address and port. The second is the Signalling Server, which relays the handshake messages (session descriptions and network candidates) between the two parties. Once the initial handshake is done, the communication becomes P2P. For development purposes, there are many publicly available STUN and Signalling servers, although I would recommend hosting open-source server projects on your own infrastructure. Keep in mind that they consume very little bandwidth, as they are only used for the handshake.
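To make the STUN step less abstract, here is a minimal sketch of the message a client sends and the address it gets back. It follows the STUN wire format (a 20-byte header and the XOR-MAPPED-ADDRESS attribute); the function names are my own, only IPv4 is handled, and a WebRTC library normally does all of this for you.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442      # fixed value defined by the STUN specification
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    """Return a 20-byte STUN Binding Request with no attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)
    # Header: type (2 bytes), body length (2), magic cookie (4), transaction ID (12).
    return struct.pack("!HHI12s", BINDING_REQUEST, 0, MAGIC_COOKIE, transaction_id)


def parse_xor_mapped_address(response):
    """Extract (ip, port) from a Binding Success Response (IPv4 only in this sketch)."""
    msg_type, body_len, cookie = struct.unpack("!HHI", response[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN Binding Success Response")
    offset, end = 20, 20 + body_len
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[offset:offset + 4])
        value = response[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS:
            _, family, xport = struct.unpack("!BBH", value[:4])
            if family != 0x01:
                raise ValueError("only IPv4 is handled in this sketch")
            # Port and address are XOR-ed with the magic cookie on the wire.
            port = xport ^ (MAGIC_COOKIE >> 16)
            (xaddr,) = struct.unpack("!I", value[4:8])
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, port
        # Attributes are padded to a multiple of 4 bytes.
        offset += 4 + attr_len + (-attr_len % 4)
    raise ValueError("no XOR-MAPPED-ADDRESS attribute found")
```

To try it against a real server, you would send the request over UDP to your STUN server (3478 is the standard port) and pass the reply to `parse_xor_mapped_address`. The result is the address the rest of the internet sees for you, which is what gets shared with the other peer during signalling.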

Finally, the dreaded component: the “TURN server”. A TURN server is a relay that acts as a fallback when the peers are behind a symmetric NAT and the public address obtained through STUN cannot be used to reach them. In this case the connection is no longer P2P: all traffic is relayed through the TURN server, which adds to cost and bandwidth. Thankfully, most use cases can avoid the TURN server. There are open-source implementations of TURN that can be deployed easily. We can also rely on ready-made solutions provided by third-party vendors (please refer to the references section).
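Why TURN ends up as the last resort can be seen in how ICE (the mechanism WebRTC uses to pick a route) scores candidate addresses. The sketch below uses the priority formula and the recommended type preferences from the ICE specification. It is a simplification: real ICE ranks candidate pairs and runs connectivity checks, and the helper names here are my own.

```python
# Recommended type preferences from the ICE specification:
# direct (host) addresses score highest, relayed (TURN) addresses lowest.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type, local_preference=65535, component_id=1):
    """priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)."""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_preference << 8) + (256 - component_id)


def rank_candidates(cand_types):
    """Order candidate types from most to least preferred."""
    return sorted(cand_types, key=candidate_priority, reverse=True)


ranking = rank_candidates(["relay", "srflx", "host"])
# host (local address) first, srflx (address learned via STUN) next, relay (TURN) last
```

With these defaults the relayed candidate always sorts below the host and STUN-derived ones, so the TURN path is only used when nothing more direct passes the connectivity checks.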

Unity as a Development Tool:

Unity is not the first thing that comes to mind when we think about cross-platform development tools for mobile applications, but it has its own merits. Unity proves useful if we plan to extend support to devices such as HoloLens or to Mac apps, and especially when we want to add AR capabilities to our VOIP application. Its built-in physics raycasting makes it quite easy to place models on detected planes.

Unity also has third-party WebRTC plugins that help us add the VOIP functionality, and of course we can develop both ARKit and ARCore features in Unity. We can aim for roughly 75% common codebase and 25% platform-specific code.

Application Development:

So now that we have an ecosystem in place with the STUN server, Signalling Server, TURN server, Unity IDE, and WebRTC plugin, the missing piece is how to put it all together in a mobile application and build a workable solution.

To brainstorm the architecture and build a solution, there are two key aspects we should consider further.

  1. Exchange of keys between peers
  2. Integrating the Push Notification Mechanism for a seamless experience

The WebRTC plugin requires a unique identifier to be exchanged between the callee and the caller. This is usually done outside the scope of the application. We could generate a unique ID when we register a session with WebRTC. This is the ID that other peers use to connect (imagine it as a meeting room code). We can rely on iMessage or text messages to send the ID to other clients. We can also register a URI scheme so that tapping the link in a text message opens our application directly.
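As a rough illustration of the ID exchange, the sketch below generates a hard-to-guess room code and wraps it in a deep link that could be pasted into a text message. The scheme `myvoipapp` and the `join`/`room` link layout are hypothetical names chosen for this example; they are not something the plugin prescribes.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit

APP_SCHEME = "myvoipapp"  # hypothetical URI scheme registered by the mobile app


def new_room_id():
    """Generate a random, URL-safe room code (16 characters)."""
    return secrets.token_urlsafe(12)


def build_invite_link(room_id):
    """Wrap the room code in a deep link suitable for a text message."""
    return f"{APP_SCHEME}://join?{urlencode({'room': room_id})}"


def parse_invite_link(link):
    """Return the room code from an invite link, or raise ValueError."""
    parts = urlsplit(link)
    if parts.scheme != APP_SCHEME or parts.netloc != "join":
        raise ValueError("not an invite link for this app")
    rooms = parse_qs(parts.query).get("room", [])
    if len(rooms) != 1:
        raise ValueError("invite link must carry exactly one room code")
    return rooms[0]
```

The host would call `new_room_id()` when registering the session and share `build_invite_link(...)`. The receiving app would run the equivalent of `parse_invite_link` when the operating system hands it the tapped URL.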

Next is notifying the callee about an incoming call. There are scenarios where the host application is in the background or closed while it waits for clients to join, so we need to ensure that the host is notified appropriately when clients join. We could use push notifications to inform the callee about incoming calls and also integrate the platform-specific call UI such as CallKit. Push notifications can be delivered by a simple push notification server or by Firebase.
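If Firebase is used, the notification server's job boils down to building a small JSON message for the callee's device. The sketch below shows what such a data message might look like in the shape used by the Firebase Cloud Messaging HTTP v1 API. The `type`, `room`, and `caller` keys are my own illustrative choices, and authentication and the actual HTTP call are left out.

```python
import json


def build_incoming_call_message(device_token, room_id, caller_name):
    """Build a data-only FCM message announcing an incoming call."""
    return {
        "message": {
            "token": device_token,           # registration token of the callee's device
            "data": {                        # FCM data values must all be strings
                "type": "incoming_call",     # hypothetical key names for this example
                "room": room_id,
                "caller": caller_name,
            },
            "android": {"priority": "high"},  # ask for prompt delivery on Android
        }
    }


def to_request_body(message):
    """Serialize the message as the JSON body of the send request."""
    return json.dumps(message)
```

The server would POST this body to the project's `messages:send` endpoint with an OAuth 2.0 access token. On iOS, incoming-call pushes that drive CallKit are typically delivered as VoIP pushes through PushKit, which is a separate path not shown here.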

Now that we have thought through everything that needs to be in place, let us arrange those pieces into building blocks of the application. The high-level system can be envisioned as follows.
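One way to get a feel for how the pieces talk to each other is a toy model of the handshake. The sketch below is an in-memory stand-in for the signalling server, written purely for illustration: the class, the message fields, and the placeholder strings are all assumptions, and it does not reflect the API of any particular plugin or server.

```python
class SignalingRelay:
    """Toy in-memory stand-in for a signalling server (no networking involved)."""

    def __init__(self):
        self._rooms = {}  # room_id -> {peer_name: list of pending (sender, message)}

    def join(self, room_id, peer):
        self._rooms.setdefault(room_id, {})[peer] = []

    def send(self, room_id, sender, message):
        # Forward the message to every other peer in the same room.
        for peer, inbox in self._rooms[room_id].items():
            if peer != sender:
                inbox.append((sender, message))

    def receive(self, room_id, peer):
        # Hand over and clear everything queued for this peer.
        inbox = self._rooms[room_id][peer]
        pending = list(inbox)
        inbox.clear()
        return pending


def simulate_call(room_id="room-123"):
    relay = SignalingRelay()
    relay.join(room_id, "host")     # host registers the room and waits
    relay.join(room_id, "caller")   # caller opens the invite link and joins

    # Placeholder strings stand in for real session descriptions and candidates.
    relay.send(room_id, "caller", {"type": "offer", "sdp": "<caller sdp>"})
    offer = relay.receive(room_id, "host")

    relay.send(room_id, "host", {"type": "answer", "sdp": "<host sdp>"})
    relay.send(room_id, "host", {"type": "candidate", "candidate": "<host address from STUN>"})
    replies = relay.receive(room_id, "caller")
    return offer, replies
```

After this exchange the relay has nothing more to do: the peers hold each other's session description and candidate addresses, and the media flows directly between them (or through TURN when a direct route fails).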

The Denouement:

Thus a simple and effective VOIP communication ecosystem can be envisioned by identifying the right components. WebRTC plays an increasingly major role in VOIP communication, and it is easy and fun to integrate and to build various solutions on top of it. Unity’s physics engine is well suited to a variety of AR needs. Further information on all these components can be found in the references section below, which can help kickstart your development.

References:

  1. WebRTC Unity Plugin — https://www.because-why-not.com/
  2. Signaling Server — https://github.com/because-why-not/awrtc_signaling
  3. STUN/TURN — https://github.com/coturn/coturn
  4. TURN 3rd party — https://www.twilio.com/stun-turn
  5. Push Notifications — https://firebase.google.com/products/cloud-messaging
