In the previous blog post we looked at the relevant protocols used for VoIP (SIP, SDP, RTP) and some libraries you might use to develop a SIP softphone.
Equipped with these basics, let us dive into some details of our implementation after the following disclaimer:
To the uninitiated in the dark arts of Voice over IP we want to apologize in advance for the acronym soup you are (unless you stop reading here) inevitably about to enter. You can find explanations of all the used jargon in the first part of the blog post.
The necessary work for an initial SIP plugin consists of signalling code and media pipeline code, which are largely independent of each other, plus some glue code (responsible e.g. for plugging negotiated codecs into the media pipeline).
Some notable limitations at the time (including, but not limited to):
~/.config/calls/sip-account.cfg
gnome-calls sip:some_user@some_host.tld
One of the first improvements was supporting multiple audio codecs.
Implementing support for multiple provider plugins was another important milestone: it allows VoIP calls over SIP and traditional cellular calls using the modem to be available at the same time.
The feature of managing accounts in the UI made sure that users would not have to manually tinker with credentials in a key file any more.
The first thing a SIP client typically does is register with a server called the registrar.
The registrar is responsible for accounts in its domain (e.g. domain A in the SIP trapezoid), challenges you to authenticate when you want to log in, and maintains a list of currently connected clients per account (after all, you can be connected to the same account from multiple devices). The registrar can often be reached at the same address as your SIP server, e.g. sip:example.org.
A simple invocation of
nua_register (registration_handle,
              NUTAG_M_USERNAME ("alice"),
              NUTAG_REGISTRAR ("sip:example.org"),
              TAG_END ());
will cause a message to be sent that looks something like:
REGISTER sip:example.org SIP/2.0
Via: SIP/2.0/UDP 192.168.1.23:47888;rport;branch=z9hG4bKBB6516vD98Z7F
From: <sip:alice@example.org>;tag=BByp2m992N4gc
To: <sip:alice@example.org>
Call-ID: 0df10634-7364-123b-bdb1-18c04d8376ad
CSeq: 966403932 REGISTER
Contact: <sip:alice@192.168.1.23:47888>
Expires: 180
User-Agent: calls sofia-sip/1.12.11devel
Content-Length: 0
We now receive a message challenging us for authentication:
SIP/2.0 401 Unauthorized
Via: SIP/2.0/UDP 192.168.1.23:47888;rport=47888;received=192.168.1.23;branch=z9hG4bKBB6516vD98Z7F
Call-ID: 0df10634-7364-123b-bdb1-18c04d8376ad
From: <sip:alice@example.org>;tag=BByp2m992N4gc
To: <sip:alice@example.org>;tag=z9hG4bKBB6516vD98Z7F
CSeq: 966403932 REGISTER
WWW-Authenticate: Digest realm="asterisk",nonce="1656626926/25ce7bee572bdfa3eba02f41552405a3",opaque="71c8fa0e05df2db4",algorithm=md5,qop="auth"
Server: Asterisk PBX 16.16.1~dfsg-1+deb11u1
Content-Length: 0
invoking our callback (see Sofia SIP section in Part 1) for the nua_r_register (response to register) event. This ends up being processed in the following code path
if (status == 401 || status == 407) {
  /* g_autofree releases the credentials string when it goes out of scope */
  g_autofree char *auth = g_strdup_printf ("%s:%s:%s:%s",
                                           scheme, realm, username, password);
  nua_authenticate (nh, NUTAG_AUTH (auth), TAG_END ());
}
causing us to answer the authentication challenge with our credentials:
REGISTER sip:example.org SIP/2.0
Via: SIP/2.0/UDP 192.168.1.23:47888;rport;branch=z9hG4bKcmZy31DH6HptB
From: <sip:alice@example.org>;tag=BByp2m992N4gc
To: <sip:alice@example.org>
Call-ID: 0df10634-7364-123b-bdb1-18c04d8376ad
CSeq: 966403933 REGISTER
Contact: <sip:alice@192.168.1.23:47888>
Authorization: Digest username="alice", realm="asterisk", nonce="1656626926/25ce7bee572bdfa3eba02f41552405a3", cnonce="DfLWRHNkEjuxvRjATYN2rQ", opaque="71c8fa0e05df2db4", algorithm=MD5, uri="sip:example.org", response="3bf175376f577552babf42c4e6382a07", qop=auth, nc=00000001
Content-Length: 0
And this time we get:
SIP/2.0 200 OK
Via: SIP/2.0/UDP 192.168.1.23:47888;rport=47888;received=192.168.1.23;branch=z9hG4bKcmZy31DH6HptB
Call-ID: 0df10634-7364-123b-bdb1-18c04d8376ad
From: <sip:alice@example.org>;tag=BByp2m992N4gc
To: <sip:alice@example.org>;tag=z9hG4bKcmZy31DH6HptB
CSeq: 966403933 REGISTER
Date: Thu, 30 Jun 2022 22:08:47 GMT
Contact: <sip:alice@192.168.1.23:47888;user=phone>;expires=179
Expires: 180
Server: Asterisk PBX 16.16.1~dfsg-1+deb11u1
Content-Length: 0
The server accepted our registration and we are now online. Hooray!
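The realm that ends up in the credentials string comes from the WWW-Authenticate challenge shown above. As a rough illustration of what is inside that header, here is a minimal sketch in plain C that pulls a single parameter out of a Digest challenge. The helper name challenge_get_param is hypothetical and not part of Sofia-SIP, which ships its own header parsing; the sketch also ignores corner cases such as parameter names appearing inside quoted values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a Sofia-SIP API): copy the value of parameter
 * `name` from a Digest challenge into `out`, stripping surrounding quotes.
 * Returns 0 on success, -1 if the parameter is missing or does not fit. */
static int
challenge_get_param (const char *challenge, const char *name,
                     char *out, size_t out_len)
{
  size_t name_len;
  const char *p = challenge;

  assert (challenge != NULL && name != NULL && out != NULL);
  name_len = strlen (name);

  while ((p = strstr (p, name)) != NULL) {
    /* Only accept a match that starts a parameter and is followed by '=' */
    int at_boundary = (p == challenge) || p[-1] == ' ' || p[-1] == ',';

    if (at_boundary && p[name_len] == '=') {
      const char *value = p + name_len + 1;
      size_t len;

      if (*value == '"') {
        const char *end;

        value++;
        end = strchr (value, '"');
        if (end == NULL)
          return -1;
        len = (size_t) (end - value);
      } else {
        /* Unquoted token, e.g. algorithm=md5 */
        len = strcspn (value, ", ");
      }
      if (len >= out_len)
        return -1;
      memcpy (out, value, len);
      out[len] = '\0';
      return 0;
    }
    p += name_len;
  }
  return -1;
}
```

Called with the challenge from the 401 response and the name "realm", the helper fills the output buffer with asterisk.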
Now that we are online, sending an INVITE (to actually call bob on the other end) is as simple as
char *local_sdp = construct_sdp ();
nua_invite (call_handle,
            SOATAG_USER_SDP_STR (local_sdp),
            SIPTAG_TO_STR ("sip:bob@example.org"),
            TAG_END ());
resulting in the following message:
INVITE sip:bob@example.org SIP/2.0
Via: SIP/2.0/UDP 192.168.1.23:56164;rport;branch=z9hG4bKX02p8U7yy6QHr
From: <sip:alice@example.org>;tag=t39mvBvF17gSa
To: <sip:bob@example.org>
Call-ID: 4ad1af21-7363-123b-4484-18c04d8376ad
CSeq: 966403849 INVITE
Contact: <sip:6001@192.168.1.23:56164>
Content-Type: application/sdp
Content-Length: 237

v=0
o=- 5090247275169452773 8778832019085077134 IN IP4 192.168.1.23
s=-
c=IN IP4 192.168.1.23
t=0 0
m=audio 49270 RTP/AVP 9 8 0 3
a=rtpmap:9 G722/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:3 GSM/8000
a=rtcp:46118
Let us now take another look at the SDP containing the codecs we want to negotiate. For now we only support a subset of static payload definitions, but extending them is on our radar.
The codecs are encoded in the following struct:
typedef struct {
  guint payload_id;
  char *codec_name;  /* e.g. PCMA */
  gint  clock_rate;  /* sampling rate */
  gint  channels;    /* mono, stereo */
  char *gstreamer_payloader_element_name;
  char *gstreamer_depayloader_element_name;
  char *gstreamer_encoder_element_name;
  char *gstreamer_decoder_element_name;
  char *gstreamer_filename_plugin;
} MediaCodecInfo;
which includes the information necessary for constructing the SDP as well as the names of the GStreamer elements we plug into our pipeline.
The currently supported codecs are hardcoded to:
static MediaCodecInfo gst_codecs[] = {
  {8, "PCMA", 8000, 1, "rtppcmapay", "rtppcmadepay", "alawenc", "alawdec", "libgstalaw.so"},
  {0, "PCMU", 8000, 1, "rtppcmupay", "rtppcmudepay", "mulawenc", "mulawdec", "libgstmulaw.so"},
  {3, "GSM", 8000, 1, "rtpgsmpay", "rtpgsmdepay", "gsmenc", "gsmdec", "libgstgsm.so"},
  {9, "G722", 8000, 1, "rtpg722pay", "rtpg722depay", "avenc_g722", "avdec_g722", "libgstlibav.so"},
};
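Once the remote side has answered, the glue code needs to map a payload type back to one of these table entries. The following is a minimal sketch in plain C (no GLib) of how such a lookup might work. The CodecEntry type and both function names are assumptions made for this illustration and are not taken from the Calls source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Trimmed-down, hypothetical version of the codec table above */
typedef struct {
  unsigned int payload_id;
  const char  *codec_name;
  int          clock_rate;
} CodecEntry;

static const CodecEntry codec_table[] = {
  { 8, "PCMA", 8000 },
  { 0, "PCMU", 8000 },
  { 3, "GSM",  8000 },
  { 9, "G722", 8000 },
};

/* Return the table entry for a static payload type, or NULL if unknown */
static const CodecEntry *
codec_by_payload_id (unsigned int payload_id)
{
  for (size_t i = 0; i < sizeof codec_table / sizeof codec_table[0]; i++) {
    if (codec_table[i].payload_id == payload_id)
      return &codec_table[i];
  }
  return NULL;
}

/* Walk the payload types from the peer's m= line in order of preference
 * and return the first one we know how to handle */
static const CodecEntry *
codec_pick_first_supported (const unsigned int *offered, size_t n_offered)
{
  assert (offered != NULL || n_offered == 0);

  for (size_t i = 0; i < n_offered; i++) {
    const CodecEntry *entry = codec_by_payload_id (offered[i]);

    if (entry != NULL)
      return entry;
  }
  return NULL;
}
```

Honouring the order of the peer's list is only one possible policy; a real client might prefer its own ordering instead.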
The following code iterates over the supported codecs and builds the SDP string from them:
for (GList *node = supported_codecs; node != NULL; node = node->next) {
  MediaCodecInfo *codec = node->data;

  /* the payload type is listed in the m= line ... */
  g_string_append_printf (media_line, " %u", codec->payload_id);
  /* ... and described by a matching a=rtpmap attribute;
   * clock_rate is a gint, hence %d */
  g_string_append_printf (attribute_lines,
                          "a=rtpmap:%u %s/%d\r\n",
                          codec->payload_id,
                          codec->codec_name,
                          codec->clock_rate);
}
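To make the result of that loop easier to see, here is a self-contained sketch in plain C that produces the m= line followed by the a=rtpmap attributes. It uses snprintf instead of GString so that it compiles without GLib. The names SdpCodec, append and sdp_build_audio_section are invented for this example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef struct {
  unsigned int payload_id;
  const char  *codec_name;
  int          clock_rate;
} SdpCodec;

/* Append formatted text at buf + *used.
 * Returns 0 on success, -1 if the text does not fit. */
static int
append (char *buf, size_t buf_len, size_t *used, const char *fmt, ...)
{
  va_list args;
  int n;

  va_start (args, fmt);
  n = vsnprintf (buf + *used, buf_len - *used, fmt, args);
  va_end (args);
  if (n < 0 || (size_t) n >= buf_len - *used)
    return -1;
  *used += (size_t) n;
  return 0;
}

/* Write the audio media section for the given codecs into buf.
 * Returns the number of characters written, or -1 if buf is too small. */
static int
sdp_build_audio_section (char *buf, size_t buf_len, unsigned int rtp_port,
                         const SdpCodec *codecs, size_t n_codecs)
{
  size_t used = 0;

  assert (buf != NULL && buf_len > 0);

  if (append (buf, buf_len, &used, "m=audio %u RTP/AVP", rtp_port) < 0)
    return -1;
  /* First pass: list all payload types on the m= line */
  for (size_t i = 0; i < n_codecs; i++) {
    if (append (buf, buf_len, &used, " %u", codecs[i].payload_id) < 0)
      return -1;
  }
  if (append (buf, buf_len, &used, "\r\n") < 0)
    return -1;
  /* Second pass: one a=rtpmap attribute per payload type */
  for (size_t i = 0; i < n_codecs; i++) {
    if (append (buf, buf_len, &used, "a=rtpmap:%u %s/%d\r\n",
                codecs[i].payload_id, codecs[i].codec_name,
                codecs[i].clock_rate) < 0)
      return -1;
  }
  return (int) used;
}
```

For the codecs G722 and PCMA on port 49270 this yields the same m= and a=rtpmap lines as the corresponding part of the INVITE shown earlier.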
When we started experimenting with RTP in GStreamer we were using two separate gst-launch-1.0 invocations, one per direction:
For sending:
gst-launch-1.0 rtpbin name=rtpbin \
  pulsesrc ! avenc_g722 ! rtpg722pay ! rtpbin.send_rtp_sink_0 \
  rtpbin.send_rtp_src_0 ! udpsink host=${REMOTE} port=5002 \
  rtpbin.send_rtcp_src_0 ! udpsink host=${REMOTE} port=5003 sync=false async=false \
  udpsrc port=5007 ! rtpbin.recv_rtcp_sink_0
For receiving:
gst-launch-1.0 -v rtpbin name=rtpbin \
  udpsrc caps="application/x-rtp,media=(string)audio,clock-rate=(int)8000,encoding-name=(string)G722" \
  port=5002 ! rtpbin.recv_rtp_sink_1 \
  rtpbin. ! rtpg722depay ! avdec_g722 ! pulsesink \
  udpsrc port=5003 ! rtpbin.recv_rtcp_sink_1 \
  rtpbin.send_rtcp_src_1 ! udpsink host=${REMOTE} port=5007 sync=false async=false
rtpbin is the main actor here: It can manage multiple RTP streams and their corresponding RTCP streams and create source and sink pads as needed (e.g. recv_rtp_sink_0 and send_rtp_src_0). These are appropriately linked to udpsink and udpsrc to send and receive data over the network via UDP, as well as to pulsesink for playing the audio back to you and pulsesrc for recording from your microphone.
The media pipeline code was closely modelled after these sorts of commands.
Note that while these gst-launch-1.0 commands will work within the same network, they likely won’t work (as presented) over the internet, but more on that in the next section.
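To illustrate how the element names stored per codec map onto such commands, here is a small sketch in plain C that assembles the codec-dependent branches of a pipeline description as text. This is only an illustration: the CodecElements type and the describe_* functions are hypothetical, and the actual media pipeline code need not build strings at all.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of the per-codec element names */
typedef struct {
  const char *payloader;
  const char *depayloader;
  const char *encoder;
  const char *decoder;
} CodecElements;

/* Microphone -> encoder -> payloader -> rtpbin.
 * Returns the length of the description, or -1 on error. */
static int
describe_send_branch (char *buf, size_t buf_len, const CodecElements *codec)
{
  int n;

  assert (buf != NULL && codec != NULL);
  if (strlen (codec->encoder) == 0 || strlen (codec->payloader) == 0)
    return -1;
  n = snprintf (buf, buf_len, "pulsesrc ! %s ! %s ! rtpbin.send_rtp_sink_0",
                codec->encoder, codec->payloader);
  return (n < 0 || (size_t) n >= buf_len) ? -1 : n;
}

/* rtpbin -> depayloader -> decoder -> speaker.
 * Returns the length of the description, or -1 on error. */
static int
describe_recv_branch (char *buf, size_t buf_len, const CodecElements *codec)
{
  int n;

  assert (buf != NULL && codec != NULL);
  if (strlen (codec->depayloader) == 0 || strlen (codec->decoder) == 0)
    return -1;
  n = snprintf (buf, buf_len, "rtpbin. ! %s ! %s ! pulsesink",
                codec->depayloader, codec->decoder);
  return (n < 0 || (size_t) n >= buf_len) ? -1 : n;
}
```

With the G722 entry this reproduces the second line of the sending command and the fourth line of the receiving command above; swapping in another codec only changes the four element names.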
In the IPv4 world users typically sit behind a router which performs network address translation (NAT) between the private network and the internet.
The router keeps track of connections and knows to which client in the private network responses from the internet should be forwarded.
This does of course mean that our media pipeline needs to reuse the socket from outgoing RTP traffic for the incoming connections as well.
It should be noted that this is a rather simple approach, but it works for most cases. In the future, however, we plan to support NAT traversal using the Interactive Connectivity Establishment (ICE) technique.
You can find the relevant issue here.
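The idea of reusing the outgoing socket can be shown without GStreamer. Below is a minimal sketch using plain POSIX UDP sockets: the client sends from a bound socket and later reads the reply from that very same socket, while the peer answers to whatever source address it observed (which, behind a NAT, would be the router's mapping). All function names are made up for this illustration, and the sketch sticks to IPv4.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a UDP socket bound to local_ip:port (port 0 lets the kernel pick).
 * Returns the file descriptor, or -1 on error. */
static int
udp_socket_bound (const char *local_ip, unsigned short port)
{
  struct sockaddr_in addr;
  int fd;

  assert (local_ip != NULL);
  fd = socket (AF_INET, SOCK_DGRAM, 0);
  if (fd < 0)
    return -1;

  memset (&addr, 0, sizeof addr);
  addr.sin_family = AF_INET;
  addr.sin_port = htons (port);
  if (inet_pton (AF_INET, local_ip, &addr.sin_addr) != 1 ||
      bind (fd, (struct sockaddr *) &addr, sizeof addr) < 0) {
    close (fd);
    return -1;
  }
  return fd;
}

/* Send a datagram from fd; replies will arrive on this same fd */
static ssize_t
udp_send_to (int fd, const struct sockaddr_in *remote,
             const void *data, size_t len)
{
  return sendto (fd, data, len, 0,
                 (const struct sockaddr *) remote, sizeof *remote);
}

/* Peer side: receive one datagram and send it back to the source address
 * it arrived from, rather than to a port announced elsewhere */
static ssize_t
udp_echo_once (int fd)
{
  char buf[1500];
  struct sockaddr_in from;
  socklen_t from_len = sizeof from;
  ssize_t n;

  n = recvfrom (fd, buf, sizeof buf, 0, (struct sockaddr *) &from, &from_len);
  if (n < 0)
    return -1;
  return sendto (fd, buf, (size_t) n, 0, (struct sockaddr *) &from, from_len);
}
```

Because the reply travels back along the path the outgoing packet opened, no second listening socket (and no extra port mapping) is needed on the client side.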
With Secure RTP (SRTP) we can secure our media streams.
SRTP can be used to provide confidentiality, message authentication and replay protection to both RTP and RTCP traffic.
Encrypting the RTP data using a block cipher mode (e.g. AES Counter Mode or Galois/Counter Mode) provides confidentiality, while a keyed hash (e.g. HMAC-SHA1) is used for message authentication.
For the security minded we must draw attention to the fact that we currently only support SDP Security Descriptions for Media Streams (SDES), which embeds the keys used for SRTP directly in the signalling (as part of the exchanged SDP) like this:
m=audio 49170 RTP/SAVP 0
a=crypto:1 AES_CM_128_HMAC_SHA1_32 inline:NzB4d1BINUAvLEw6UzF3WSJ+PSdFcGdUJShpX1Zj
The string after inline: is the base64 encoded representation of the key. This is why you can only enable SRTP when using TLS for the signalling. Even with TLS enabled you should be aware that your server (and potentially any proxies your messages pass through) can still see the keys with this particular method for key exchange.
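To show how little stands between the signalling and the key, here is a small sketch in plain C that extracts and decodes the inline value of an a=crypto line. For the AES_CM_128 suites the decoded material is 30 bytes long: a 16 byte master key followed by a 14 byte master salt. The function names are invented for this example, and optional lifetime or MKI fields after a | are simply cut off.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map one base64 character to its 6 bit value, or -1 if invalid */
static int
b64_value (char c)
{
  if (c >= 'A' && c <= 'Z') return c - 'A';
  if (c >= 'a' && c <= 'z') return c - 'a' + 26;
  if (c >= '0' && c <= '9') return c - '0' + 52;
  if (c == '+') return 62;
  if (c == '/') return 63;
  return -1;
}

/* Decode base64 text up to the first '=' or NUL into out.
 * Returns the number of bytes written, or -1 on error. */
static int
b64_decode (const char *in, unsigned char *out, size_t out_len)
{
  unsigned int acc = 0;
  int bits = 0;
  size_t n = 0;

  assert (in != NULL && out != NULL);
  for (; *in != '\0' && *in != '='; in++) {
    int v = b64_value (*in);

    if (v < 0)
      return -1;
    acc = (acc << 6) | (unsigned int) v;
    bits += 6;
    if (bits >= 8) {
      /* A full byte is available in the upper part of the accumulator */
      bits -= 8;
      if (n >= out_len)
        return -1;
      out[n++] = (unsigned char) ((acc >> bits) & 0xff);
    }
  }
  return (int) n;
}

/* Find "inline:" in an a=crypto line and decode the key material.
 * Returns the number of decoded bytes, or -1 on error. */
static int
crypto_line_get_key_material (const char *line,
                              unsigned char *out, size_t out_len)
{
  char b64[128];
  const char *p;
  size_t len;

  assert (line != NULL);
  p = strstr (line, "inline:");
  if (p == NULL)
    return -1;
  p += strlen ("inline:");
  len = strcspn (p, "| \r\n");
  if (len >= sizeof b64)
    return -1;
  memcpy (b64, p, len);
  b64[len] = '\0';
  return b64_decode (b64, out, out_len);
}
```

Anyone who can read the SDP can run the equivalent of this and obtain the key material, which is exactly why SDES only makes sense over an encrypted signalling channel.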
Supporting DTLS-SRTP and ZRTP is on our radar.
With this disclaimer out of the way, let us return to some technical details:
The bulk of changes took place in these two merge requests.
We decomposed the work needed into the following tasks:
- GstSrtpEnc and GstSrtpDec in the media pipeline (key change)
- a=crypto SDP attributes (key change)
- a=crypto attributes in the SDP offer/answer model (key changes)
The fruits of our labour can be seen in this last pipeline graph we want to show you: Note the presence (and base64 encoded “key” property) of GstSrtpEnc and GstSrtpDec inside the GstRtpBin!
I think it is fair to say that we covered quite a lot of ground: From the main protocols (SIP, SDP, RTP) and some code examples to a simple implementation, iterative improvements and finally supporting encrypted calls.
Additionally, we want to point out that the knowledge (and code) we’ve developed will surely come in handy for implementing support for audio calls in Chatty using Matrix, or video calls using the cameras of the Librem 5 (libcamera will provide easy to use GStreamer elements).
Finally, we hope you enjoyed reading this write-up as much as we enjoyed writing it (never mind the code).
The human behind this blog post is very happy to be able to make a living developing free software and being part of the community.
My thanks go to Purism, for the work that I am able to do here, the team that I have the absolute pleasure of working with and the vision that it pursues:
I can honestly say that the Librem 5 is the smartphone I always wanted to have (but more on that in the future)!
Lastly, special thanks also to everyone who contributes to the ecosystem by writing code, providing translations, filing issues or being a user:
You are awesome!