algesten / str0m
A synchronous sans I/O WebRTC implementation in Rust.
License: MIT License
Currently there is no support for DTX (discontinuous transmission).
Add:
Support for DTX in FormatParams
Change the Opus packetizer to be able to write the marker bit.
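As a first step for DTX, the Opus `usedtx` parameter (RFC 7587) has to be read from the `a=fmtp` line. This is a minimal sketch; `OpusParams` and `parse_opus_fmtp` are hypothetical names and not str0m's actual `FormatParams` API:

```rust
/// Hypothetical subset of Opus format parameters (not str0m's FormatParams).
#[derive(Debug, Default, PartialEq)]
struct OpusParams {
    min_ptime: Option<u8>,
    use_inband_fec: Option<bool>,
    use_dtx: Option<bool>,
}

/// Parses the parameter part of an `a=fmtp:<pt> ...` line,
/// e.g. "minptime=10;useinbandfec=1;usedtx=1".
fn parse_opus_fmtp(params: &str) -> OpusParams {
    let mut p = OpusParams::default();
    for kv in params.split(';') {
        let mut it = kv.trim().splitn(2, '=');
        let (Some(k), Some(v)) = (it.next(), it.next()) else { continue };
        match k {
            "minptime" => p.min_ptime = v.parse().ok(),
            "useinbandfec" => p.use_inband_fec = Some(v == "1"),
            "usedtx" => p.use_dtx = Some(v == "1"),
            _ => {} // parameters this sketch doesn't model are ignored
        }
    }
    p
}
```

Keeping `use_dtx` as an `Option` distinguishes "not mentioned" from an explicit `usedtx=0`, which matters when echoing parameters back in an answer.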
This might belong in the SCTP crate, but I will put it here for now for visibility.
When running data channels in reliable mode, I saw HashMap operations at the top of the performance profile.
Switching the data channel to unreliable made it better. The reliable channel was created like this:
_ = self.rtc.direct_api().create_data_channel(ChannelConfig {
label: "".into(),
negotiated: Some(1337),
ordered: true,
reliability: Reliability::Reliable,
..Default::default()
});
Make a full chat room where each incoming voice/camera stream is reflected out to all other connected clients.
Turns out I had misunderstood how multiple data channels are represented in the SDP. At the SDP level there is one m-line for the SCTP association, and there is only one association for the entire RTC session. I.e. the single m-line is inserted when the first data channel is added to the session, and any additional channels are multiplexed over the same association.
There are multiple issues running the examples in Safari.
A single video stream typically works, but not always. Firing up 3 incoming videos typically takes the entire browser down: it starts consuming 60-100% CPU and keeps doing so even after all windows are closed, with the process still running. Only force quitting stops it.
Running Chrome with --enable-logging=stderr --vmodule="*/webrtc/*=2"
gives us log rows like these:
[20362:40195:0111/182232.831763:WARNING:common_header.cc(73)] Invalid RTCP header: Padding bit set but 0 padding size specified.
[20362:40195:0111/182232.831784:WARNING:rtcp_receiver.cc(462)] Incoming invalid RTCP packet
[20362:40195:0111/182232.831802:WARNING:common_header.cc(73)] Invalid RTCP header: Padding bit set but 0 padding size specified.
[20362:40195:0111/182232.831814:WARNING:rtcp_receiver.cc(462)] Incoming invalid RTCP packet
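Per RFC 3550, when the P bit is set, the last octet of the packet holds the number of padding octets including itself, so a count of 0 is invalid, which is exactly what Chrome reports. A minimal sketch of padding an RTCP packet correctly, assuming padding is applied only to the last packet of a compound and that `align` (a hypothetical parameter, e.g. a cipher block size) is a multiple of 4:

```rust
/// Pads a serialized RTCP packet to a multiple of `align` bytes (RFC 3550).
/// If padding is added, the P bit is set and the last byte holds the number
/// of padding bytes, including itself. If none is needed, P is cleared.
fn pad_rtcp(buf: &mut Vec<u8>, align: usize) {
    assert!(buf.len() >= 4 && buf.len() % 4 == 0, "RTCP packets are 32-bit aligned");
    assert!(align >= 4 && align % 4 == 0 && align <= 256, "padding count must fit in a byte");

    let pad = (align - buf.len() % align) % align;
    if pad == 0 {
        buf[0] &= !0x20; // never advertise padding we didn't add
        return;
    }
    buf.resize(buf.len() + pad - 1, 0); // zero-filled padding octets
    buf.push(pad as u8); // last octet: padding count, always >= 4 here
    buf[0] |= 0x20; // P bit

    // Length field: packet length in 32-bit words, minus one.
    let words = (buf.len() / 4 - 1) as u16;
    buf[2..4].copy_from_slice(&words.to_be_bytes());
}
```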
The media API currently works only on "Sample" level, i.e. full H264/VP8/etc packets. The RTP level is an internal concern to str0m, and there are built-in packetizers/depacketizers to go from sample level to RTP level: https://github.com/algesten/str0m/tree/main/packet
The WebRTC-related RFCs sometimes seem to treat an SFU (Selective Forwarding Unit) as something that just takes incoming RTP packets and routes them to other clients. That only solves a fairly narrow use case, where few peers participate in the same session.
RTP packets can't pass through an SFU unaltered, because many RTP header fields depend on the situation:
The above requires at least some level of rewriting of the RTP header.
Bad link ┌─────────────┐
│ │ │
│ ┌──────│ Sender 1 │
│ │ │ │
┌─────────────┐ │ ┌─────────────┐ │ └─────────────┘
│ │ ▼ │ │ │
│ Receiver │◀ ─ ─ ─ │ SFU │◀─────┤
│ │ │ │ │
└─────────────┘ └─────────────┘ │ ┌─────────────┐
│ │ │
└──────│ Sender 2 │
│ │
└─────────────┘
The receiver here will miss different packets from Sender 1 and Sender 2. In a naive SFU, the NACK, PLI and FIR would be forwarded back to the source. That increases bandwidth pressure for all peers, not just the Receiver.
To make the SFU smarter, we can implement various levels of RTP buffering.
The above doesn't mean it's impossible to make an RTP level API. It just means the RTP level can't be thought of as "simply write this RTP packet to the wire". At the very least all headers must be fixed up to be correct given the SDP, TWCC etc.
I think we want a method to switch either an m-line or the entire session into "RTP mode", which means that instead of emitting Event::MediaData(MediaData), we have another event such as Event::RtpData(RtpData), and also MediaWriter::write_rtp for the other direction.
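For the smarter SFU with RTP buffering described above, one minimal form of buffering is a per-stream history of recently forwarded packets, so a NACK from the Receiver can be answered by the SFU without involving the sender. This is a hypothetical sketch (`RtpHistory` is not a str0m type), assuming packets are keyed by their 16-bit sequence number:

```rust
/// Fixed-size history of forwarded RTP packets, indexed by sequence number,
/// so the SFU can answer a NACK itself instead of forwarding it upstream.
struct RtpHistory {
    slots: Vec<Option<(u16, Vec<u8>)>>,
}

impl RtpHistory {
    fn new(capacity: usize) -> Self {
        RtpHistory { slots: vec![None; capacity] }
    }

    /// Store a forwarded packet, evicting whatever occupied its slot.
    fn insert(&mut self, seq: u16, packet: Vec<u8>) {
        let idx = seq as usize % self.slots.len();
        self.slots[idx] = Some((seq, packet));
    }

    /// Look up a packet for retransmission. `None` means it was evicted,
    /// in which case the NACK could still be forwarded to the sender.
    fn get(&self, seq: u16) -> Option<&[u8]> {
        let idx = seq as usize % self.slots.len();
        match &self.slots[idx] {
            // Compare the stored seq so an evicted slot is never mistaken
            // for the requested packet.
            Some((s, p)) if *s == seq => Some(p.as_slice()),
            _ => None,
        }
    }
}
```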
We should support these RTP header extensions:
urn:3gpp:video-orientation
urn:ietf:params:rtp-hdrext:ssrc-audio-level
And potentially also:
http://www.webrtc.org/experiments/rtp-hdrext/color-space
http://www.webrtc.org/experiments/rtp-hdrext/video-content-type
http://www.webrtc.org/experiments/rtp-hdrext/video-timing
http://www.webrtc.org/experiments/rtp-hdrext/playout-delay
To make it possible to provide these, we might want to go back to a builder pattern for Media::write().
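Before any of these extensions can be written, the negotiated id for each URI has to be picked up from the `a=extmap` lines. A minimal sketch, where the `Ext` enum and `parse_extmap` are hypothetical names rather than str0m's API:

```rust
/// The header extensions listed above, plus a catch-all.
#[derive(Debug, PartialEq)]
enum Ext {
    VideoOrientation,
    AudioLevel,
    ColorSpace,
    VideoContentType,
    VideoTiming,
    PlayoutDelay,
    Unknown(String),
}

impl Ext {
    fn from_uri(uri: &str) -> Ext {
        match uri {
            "urn:3gpp:video-orientation" => Ext::VideoOrientation,
            "urn:ietf:params:rtp-hdrext:ssrc-audio-level" => Ext::AudioLevel,
            "http://www.webrtc.org/experiments/rtp-hdrext/color-space" => Ext::ColorSpace,
            "http://www.webrtc.org/experiments/rtp-hdrext/video-content-type" => Ext::VideoContentType,
            "http://www.webrtc.org/experiments/rtp-hdrext/video-timing" => Ext::VideoTiming,
            "http://www.webrtc.org/experiments/rtp-hdrext/playout-delay" => Ext::PlayoutDelay,
            other => Ext::Unknown(other.to_string()),
        }
    }
}

/// Parses "a=extmap:<id>[/direction] <uri> [attributes]" into (id, extension).
fn parse_extmap(line: &str) -> Option<(u8, Ext)> {
    let rest = line.strip_prefix("a=extmap:")?;
    let mut parts = rest.split_whitespace();
    let id_part = parts.next()?;
    let uri = parts.next()?; // trailing attributes such as "vad=on" are ignored
    let id = id_part.split('/').next()?.parse::<u8>().ok()?;
    Some((id, Ext::from_uri(uri)))
}
```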
The impls have a few Vec<u8> arguments. It would be good to look through these instances to see where we can reuse buffers (or use ring buffers), and where we can use lifetimes to avoid internal allocation.
Zero allocations is a non-goal, but we can do better than we currently do.
If we multiplex all traffic over the server socket, we avoid having to listen on multiple UDP sockets. This would probably reduce complexity in the example.
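Multiplexing on one socket works because STUN, DTLS and RTP/RTCP can be told apart by their first byte. This sketch uses the ranges from RFC 7983 and the RTP/RTCP split from RFC 5761; `PacketKind` and `classify` are illustrative names, not str0m's API:

```rust
#[derive(Debug, PartialEq)]
enum PacketKind {
    Stun,
    Dtls,
    Rtp,
    Rtcp,
    Unknown,
}

/// Classify an incoming UDP datagram on a single multiplexed socket.
fn classify(datagram: &[u8]) -> PacketKind {
    match datagram.first().copied() {
        Some(0..=3) => PacketKind::Stun,
        Some(20..=63) => PacketKind::Dtls,
        Some(128..=191) => match datagram.get(1).copied() {
            // RFC 5761: RTCP packet types 192-223 occupy the second byte;
            // RTP avoids this range by not using payload types 64-95.
            Some(192..=223) => PacketKind::Rtcp,
            Some(_) => PacketKind::Rtp,
            None => PacketKind::Unknown,
        },
        _ => PacketKind::Unknown,
    }
}
```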
Currently, if the receive register is in "probation", we drop the incoming RTP packet.
This is not great, because we miss the first packets of the first sample. We need to reconsider the probation mechanism (do we need it at all?). It's in the RFC for RTP, but libwebrtc doesn't seem to implement it.
Another strategy would be to retain incoming RTP packets until probation is over and "replay" them into the depacketizing buffer.
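The replay strategy can be sketched with the probation rule from RFC 3550 appendix A.1 (a source is valid after `MIN_SEQUENTIAL` = 2 in-sequence packets). `ProbationBuffer` is hypothetical, and dropping the held packets on a sequence break is an assumption of this sketch:

```rust
/// RFC 3550, appendix A.1: sequential packets required to validate a source.
const MIN_SEQUENTIAL: u32 = 2;

/// Holds packets while the source is on probation instead of dropping them,
/// and releases them all once probation ends.
struct ProbationBuffer {
    probation: u32,
    max_seq: Option<u16>,
    held: Vec<(u16, Vec<u8>)>,
}

impl ProbationBuffer {
    fn new() -> Self {
        ProbationBuffer { probation: MIN_SEQUENTIAL, max_seq: None, held: Vec::new() }
    }

    /// Returns the packets that may now be passed to the depacketizer.
    fn push(&mut self, seq: u16, payload: Vec<u8>) -> Vec<(u16, Vec<u8>)> {
        if self.probation == 0 {
            return vec![(seq, payload)];
        }
        let in_sequence = self.max_seq.map_or(false, |m| seq == m.wrapping_add(1));
        if in_sequence {
            self.probation -= 1;
        } else {
            // Sequence broken: restart probation from this packet, as the
            // RFC does. Held packets are discarded in this sketch.
            self.probation = MIN_SEQUENTIAL - 1;
            self.held.clear();
        }
        self.max_seq = Some(seq);
        self.held.push((seq, payload));
        if self.probation == 0 {
            std::mem::take(&mut self.held) // replay everything held so far
        } else {
            Vec::new()
        }
    }
}
```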
Following the advice from here, I am opening a separate issue!
Here is what we want to do: have libp2p applications running in the browser connect to server nodes that don't have a valid TLS certificate. Being in the browser, we don't have control over the WebRTC stack but use the RTCPeerConnection API.
The entire protocol is described here: https://github.com/libp2p/specs/blob/master/webrtc/webrtc-direct.md#browser-to-public-server
We also don't have separate STUN servers. Instead, the client knows the server's address through an out-of-band mechanism. We have a self-descriptive address format that looks like this:
/ip4/172.28.0.1/udp/9999/webrtc/certhash/uEiD8c9GWyEurSjh9UkY-YWXaXBJZIHo179zx9IpgDFFKgw/p2p/12D3KooWDpJ7As7BWAwRMfu1VU2WCqNjvq387JEYKDBj4kx6nXTN
The component after the /certhash part (uEiD8c9GWyEurSjh9UkY-YWXaXBJZIHo179zx9IpgDFFKgw) is the server's certificate fingerprint. This gives us enough information to connect directly to the server.
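To use that component as an SDP `a=fingerprint`, it has to be unwrapped: the leading `u` is the multibase prefix for unpadded base64url, and the decoded bytes are a multihash whose first two bytes are the hash code (0x12 = sha2-256) and digest length (0x20 = 32). A self-contained sketch; the function names are hypothetical, and it only handles the single-byte varints that sha2-256 uses:

```rust
/// Decode unpadded base64url (RFC 4648, section 5).
fn b64url_decode(s: &str) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(s.len() * 3 / 4);
    let mut acc: u32 = 0;
    let mut bits: u32 = 0;
    for c in s.bytes() {
        let v = u32::from(match c {
            b'A'..=b'Z' => c - b'A',
            b'a'..=b'z' => c - b'a' + 26,
            b'0'..=b'9' => c - b'0' + 52,
            b'-' => 62,
            b'_' => 63,
            _ => return None,
        });
        acc = (acc << 6) | v;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
            acc &= (1 << bits) - 1; // keep only the unconsumed bits
        }
    }
    Some(out)
}

/// Turn a certhash component into an SDP fingerprint value.
fn certhash_to_fingerprint(certhash: &str) -> Option<String> {
    let encoded = certhash.strip_prefix('u')?; // multibase 'u' = base64url, no padding
    let bytes = b64url_decode(encoded)?;
    match bytes.as_slice() {
        // multihash: <code 0x12 = sha2-256><length 0x20 = 32><digest>
        [0x12, 0x20, digest @ ..] if digest.len() == 32 => {
            let hex: Vec<String> = digest.iter().map(|b| format!("{:02X}", b)).collect();
            Some(format!("sha-256 {}", hex.join(":")))
        }
        _ => None,
    }
}
```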
We want to use str0m for the server part of this setup. The node will always be publicly reachable, hence we are in ICE-lite mode. We only ever need data channels, no media.
Concrete questions I have:
sdp_api. I think I achieved the same thing with the direct_api and by adding candidates myself. Is there something I should be aware of when doing it that way?
More general questions:
str0m?
If an ICE agent doesn't receive any UDP traffic at all, i.e. no candidate is ever nominated, the agent never times out.
That means we get forever-hanging clients when there is no UDP connectivity.
H264 codecs proposed by the browser are not matched by str0m. This is because of a bug in parsing the profile-level-id in the a=fmtp line.
Another bug is that the PT is not locked down to what the browser proposes.
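For reference, profile-level-id (RFC 6184) is three hex-encoded bytes: profile_idc, profile-iop (the constraint flags) and level_idc. For example `42e01f` is profile_idc 0x42 (66), iop 0xe0, level_idc 0x1f (31, i.e. level 3.1). A minimal, case-insensitive parser sketch with hypothetical names; the codec matching rules built on top of it are not shown:

```rust
#[derive(Debug, PartialEq)]
struct ProfileLevelId {
    profile_idc: u8,
    profile_iop: u8, // constraint_set flags
    level_idc: u8,
}

/// Parse e.g. "42e01f" or "42E01F" (browsers may send either case).
fn parse_profile_level_id(s: &str) -> Option<ProfileLevelId> {
    if s.len() != 6 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let v = u32::from_str_radix(s, 16).ok()?;
    Some(ProfileLevelId {
        profile_idc: (v >> 16) as u8,
        profile_iop: (v >> 8) as u8,
        level_idc: v as u8,
    })
}
```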
While rtc.channel(cid) returns None until the channel is ready, it would also be nice to get an event once it is, such as str0m::Event::ChannelOpen(channel_id, label).
Simulcast is the function of sending multiple RTP streams for the same track at different bitrate levels. An SFU can choose which bitrate level is appropriate given the bandwidth.
Bad link -> send low bitrate
│
│
│
┌─────────────┐ │ ┌─────────────┐ High ┌─────────────┐
│ │ ▼ │ │◀────────────│ │
│ Receiver │◀ ─ ─ ─ │ SFU │ │ Sender 1 │◀───── Camera
│ │ │ │◀────────────│ │
└─────────────┘ └─────────────┘ Low └─────────────┘
The idea is that Event::MediaData(MediaData) will also carry information about the simulcast level. The MediaWriter should have a builder pattern where we can add the simulcast level for sending.
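On the SFU side, the choice in the diagram above boils down to picking a layer per receiver. A minimal sketch, assuming each incoming layer is known by its rid and an approximate bitrate, and that some bandwidth estimate exists for the receiver (`Layer` and `select_layer` are hypothetical):

```rust
/// One incoming simulcast layer, identified by its rid (e.g. "h" or "l").
struct Layer {
    rid: &'static str,
    bitrate_bps: u64,
}

/// Choose the highest-bitrate layer that fits the receiver's estimated
/// bandwidth, falling back to the lowest layer on a bad link.
fn select_layer(layers: &[Layer], estimate_bps: u64) -> Option<&Layer> {
    layers
        .iter()
        .filter(|l| l.bitrate_bps <= estimate_bps)
        .max_by_key(|l| l.bitrate_bps)
        .or_else(|| layers.iter().min_by_key(|l| l.bitrate_bps))
}
```

A real SFU would also only switch layers at a keyframe of the target layer, since the receiver cannot decode a layer from mid-stream.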
It's a simple bug. Because ice-lite means we never send binding requests, we simply don't schedule any timeouts. That isn't right: connections that stop receiving incoming binding requests never get purged.
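The fix amounts to still scheduling a wake-up in ice-lite mode, at the earliest time any pair could go stale. A sans I/O style sketch; `CandidatePair`, `LiteAgent` and the single fixed `timeout` are assumptions for illustration, not str0m's internals:

```rust
use std::time::{Duration, Instant};

/// Hypothetical state for a candidate pair on an ice-lite agent.
struct CandidatePair {
    id: u32,
    last_binding_request: Instant,
}

struct LiteAgent {
    pairs: Vec<CandidatePair>,
    timeout: Duration,
}

impl LiteAgent {
    /// Even though ice-lite never sends binding requests, it still needs a
    /// wake-up so stale pairs get purged.
    fn poll_timeout(&self) -> Option<Instant> {
        self.pairs.iter().map(|p| p.last_binding_request + self.timeout).min()
    }

    /// Drop pairs that have not seen an incoming binding request in time.
    fn handle_timeout(&mut self, now: Instant) {
        let timeout = self.timeout;
        self.pairs.retain(|p| now.duration_since(p.last_binding_request) < timeout);
    }
}
```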
MIDs can be any length (well... there's probably some limit in the standard... but it's a lot more than 3). Lengths of 1 or 2 are probably the most common (libwebrtc just increments "0", "1", and it's good to stay small for the header extension value). So maybe it will never matter in practice, but it doesn't seem to make sense to me to make the MID type a [u8; 3]. It's probably best to use String or Vec, or if perf around it really did matter somehow, Arc or smallvec or CompactString or something fancy.
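Besides String, Vec or smallvec, another option is a fixed inline capacity sized to what the wire allows: a one-byte RTP header extension (RFC 8285) carries at most 16 bytes of value, so a 16-byte buffer holds any MID that can be sent in that form. A hypothetical sketch of such a type:

```rust
/// A MID stored inline without heap allocation, up to 16 bytes: the largest
/// value a one-byte RTP header extension (RFC 8285) can carry.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct Mid {
    len: u8,
    bytes: [u8; 16],
}

impl Mid {
    fn new(s: &str) -> Option<Mid> {
        let b = s.as_bytes();
        if b.is_empty() || b.len() > 16 {
            return None;
        }
        let mut bytes = [0u8; 16];
        bytes[..b.len()].copy_from_slice(b);
        Some(Mid { len: b.len() as u8, bytes })
    }

    fn as_str(&self) -> &str {
        // Built from a &str in `new`, so the bytes are valid UTF-8.
        std::str::from_utf8(&self.bytes[..self.len as usize]).unwrap()
    }
}
```

The type stays `Copy`, like `[u8; 3]`, while lifting the 3-byte limit; MIDs longer than 16 bytes would need the two-byte extension form and are rejected here.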
To build a sender pacer that doesn't overwhelm the receiving client, we need bandwidth estimation. Quinn-proto has some clean implementations that can be used for inspiration: https://github.com/quinn-rs/quinn/tree/main/quinn-proto/src/congestion
The corresponding code in libwebrtc is here https://chromium.googlesource.com/external/webrtc/+/refs/heads/main/modules/congestion_controller/
We need both events for incoming PLI/FIR and the ability to send them, per media.
Probably globally too, since otherwise we can't control tracks opened by the peer.
A possible way forward is to copy the sctp_proto branch from webrtc-rs. This branch is a sans I/O implementation of SCTP: https://github.com/webrtc-rs/sctp/tree/proto
By default we should disallow DTLS 1.0. To do this we need a PR to land in the rust-openssl crate: sfackler/rust-openssl#1886
The OpenSSL man page (https://www.openssl.org/docs/man3.1/man3/DTLSv1_2_method.html) tells us DTLSv1_2_method is deprecated. The way to limit the DTLS version (or TLS, for that matter) is to use SSL_CTX_set_min_proto_version. In the Rust openssl wrapper this corresponds to https://docs.rs/openssl/0.10.50/openssl/ssl/struct.SslContextBuilder.html#method.set_min_proto_version, however the SslVersion constants lack the values we need: https://docs.rs/openssl/0.10.50/openssl/ssl/struct.SslVersion.html
Given #3, implement a pacer for outgoing traffic. It needs to prioritise some types of data over others. The order is probably
The current value of MediaData::network_time is the receive time of the first packet that makes up the MediaData. However, due to reordering, this is not necessarily the first packet to have been received. It's also not clear whether this value should be when the first or the last packet was received.
We should decide which it is and adjust the code.
Also, I think it would be good if MediaData contained a field reflecting the delay caused by building the sample, i.e. the difference between when the first and the last packet arrived.
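Computing both values independently of reordering only needs the minimum and maximum receive time over the packets of a sample, rather than the time of the packet that comes first in sequence-number order. A sketch, assuming receive times are available as microseconds (`SampleTiming` is a hypothetical type):

```rust
#[derive(Debug, PartialEq)]
struct SampleTiming {
    first_us: u64,          // earliest arrival among the sample's packets
    last_us: u64,           // latest arrival: when the sample became complete
    assembly_delay_us: u64, // time spent waiting for the whole sample
}

/// `arrivals_us` holds the receive times of the RTP packets that make up one
/// sample, in any order (reordering does not affect the result).
fn sample_timing(arrivals_us: &[u64]) -> Option<SampleTiming> {
    let first_us = *arrivals_us.iter().min()?;
    let last_us = *arrivals_us.iter().max()?;
    Some(SampleTiming { first_us, last_us, assembly_delay_us: last_us - first_us })
}
```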
As I use this project in our production app, I'm going to document which types aren't exposed from the public API of str0m.
CandidateKind
Candidate::new (cannot create a Server Reflexive candidate in any way)
I am trialing this library for rust-libp2p as a replacement for webrtc-rs. The protocol we implement using WebRTC uses ice-lite, and we rely on SDP munging.
The SDP offer I am trying to feed into str0m errors in the parsing step.
This is my SDP message:
v=0
o=- 0 0 IN IP4 172.17.0.1
s=-
c=IN IP4 172.17.0.1
t=0 0
m=application 9999 UDP/DTLS/SCTP webrtc-datachannel
a=mid:0
a=ice-options:ice2
a=ice-ufrag:libp2p+webrtc+v1/a75469cf670c4079f8c06af4a963c8a1:libp2p+webrtc+v1/a75469cf670c4079f8c06af4a963c8a1
a=ice-pwd:libp2p+webrtc+v1/a75469cf670c4079f8c06af4a963c8a1:libp2p+webrtc+v1/a75469cf670c4079f8c06af4a963c8a1
a=fingerprint:sha-256 FF:FF:FF:FF:FF:FF:FF:FF:FF:FF:FF:FF:FF:FF:FF:FF:FF:FF:FF:FF:FF:FF:FF:FF:FF:FF:FF:FF:FF:FF:FF:FF
a=setup:actpass
a=sctp-port:5000
a=max-message-size:16384
The parser fails with:
Unexpected `a`\nExpected `c`\nsdp line\n
We have this code working in multiple other languages and browser bindings. From my limited research, this SDP message is still correct. Is the parser perhaps too strict here?
Any help is much appreciated!
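RFC 4566 (section 5.7) allows the "c=" line either in each media description or once at session level, which is what this offer does. A parser can accept both by letting a media-level line win and falling back to the session-level one. A minimal line-based sketch (not str0m's parser):

```rust
/// Resolve the connection line for each m-section: a media-level "c=" wins,
/// otherwise the session-level one applies (RFC 4566, section 5.7).
fn connection_per_media(sdp: &str) -> Vec<Option<String>> {
    let mut session_c: Option<String> = None;
    let mut media: Vec<Option<String>> = Vec::new();
    for line in sdp.lines() {
        let line = line.trim_end();
        if line.starts_with("m=") {
            media.push(None); // start of a new media section
        } else if let Some(c) = line.strip_prefix("c=") {
            match media.last_mut() {
                Some(slot) => *slot = Some(c.to_string()),
                None => session_c = Some(c.to_string()),
            }
        }
    }
    media.into_iter().map(|m| m.or_else(|| session_c.clone())).collect()
}
```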
SSRC allocation needs to be dynamic and function in two "modes". There are a number of scenarios that need to be handled.
In classic mode, if the direction of an m-line is such that "I might send", the SDP contains the following lines (regardless of offer or answer).
a=rtpmap:98 VP9/90000
a=rtpmap:99 rtx/90000
a=fmtp:99 apt=98
…
a=ssrc-group:FID 482634678 553121561
a=ssrc:482634678 cname:7zASaW9vFA6IT6Qe
a=ssrc:482634678 msid:xTscQI56sDGGHeMvl27qpDQTlVg7Sv1GrFc0 1dcf2d05-d3a4-482a-8bbe-6c6b217a9f3e
a=ssrc:553121561 cname:7zASaW9vFA6IT6Qe
a=ssrc:553121561 msid:xTscQI56sDGGHeMvl27qpDQTlVg7Sv1GrFc0 1dcf2d05-d3a4-482a-8bbe-6c6b217a9f3e
rtx and VP9 tell us we are using a separate RTX stream for repairs.
a=ssrc-group:FID tells us 553121561 repairs the stream 482634678.
a=ssrc:482634678 tells us this is the main SSRC, because it comes first (order is important!).
a=ssrc:553121561 tells us this is the repair SSRC, because it comes second.
With simulcast, Chrome started doing this in a new way. Instead of advertising in advance the SSRCs it's going to use, it relies on RTP header extensions to communicate the role of each packet. The receiver still arrives at the same relationship as in Classic mode, but must discover it dynamically.
a=extmap:9 urn:ietf:params:rtp-hdrext:sdes:mid
a=extmap:10 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id
a=extmap:11 urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id
…
a=rtpmap:98 VP9/90000
a=rtpmap:99 rtx/90000
a=fmtp:99 apt=98
...
a=rid:h send
a=rid:l send
a=simulcast:send h;l
Instead of giving us any a=ssrc lines, we negotiate the header extensions mid, rtp-stream-id and repaired-rtp-stream-id. This means that when an RTP packet arrives:
a=rid lines and a=simulcast tell us the expected header values for rtp-stream-id.
mid + rtp-stream-id tells us the SSRC of the packet is the main stream.
mid + repaired-rtp-stream-id tells us the SSRC of the packet is the repair stream.
It appears Chrome will send us a couple of empty repair packets as "info" on stream start.
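Dynamic discovery in Rid mode can be sketched as a per-m-line table from SSRC to role, filled in the first time a packet carries a recognised rid or repaired rid, and consulted for later packets that no longer carry the extensions. `RidDiscovery` and `StreamRole` are hypothetical names; the sketch assumes the mid has already matched this m-line:

```rust
use std::collections::HashMap;

/// Role of an incoming SSRC, discovered from header extensions in Rid mode.
#[derive(Debug, Clone, PartialEq)]
enum StreamRole {
    Main { rid: String },
    Repair { rid: String },
}

#[derive(Default)]
struct RidDiscovery {
    expected_rids: Vec<String>,      // from a=rid / a=simulcast
    roles: HashMap<u32, StreamRole>, // ssrc -> discovered role
}

impl RidDiscovery {
    /// `rid` is rtp-stream-id, `rrid` is repaired-rtp-stream-id.
    fn on_packet(&mut self, ssrc: u32, rid: Option<&str>, rrid: Option<&str>) -> Option<&StreamRole> {
        if !self.roles.contains_key(&ssrc) {
            let expected = |r: &str| self.expected_rids.iter().any(|e| e == r);
            let role = match (rid, rrid) {
                (_, Some(r)) if expected(r) => StreamRole::Repair { rid: r.to_string() },
                (Some(r), None) if expected(r) => StreamRole::Main { rid: r.to_string() },
                _ => return None, // not classifiable (yet)
            };
            self.roles.insert(ssrc, role);
        }
        self.roles.get(&ssrc)
    }
}
```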
Before the Rid way, simulcast was triggered by munging the SDP and adding a line like this, together with a=ssrc lines for all the SSRCs. This told us these 3 SSRCs were used for different simulcast levels.
a=ssrc-group:SIM 659652645 1982135572 3604909222
It's unclear if we need to support this mode at all. It might just have been an experiment that can be forgotten.
We need to support both Classic mode and Rid mode. It might be that Chrome eventually only uses Rid mode. We need dynamic detection and appropriate handling.
Any m-line may start in one direction and then switch to the opposite (or sendrecv). When we detect that the direction allows sending packets, we either need to allocate SSRCs straight away (Classic mode), or possibly later (Rid mode). The direction change results in an SDP negotiation, and if we are in Classic mode, the SSRCs we intend to use must be communicated as a=ssrc lines straight away.
Since sequence counters and sender reports are tracked per SSRC, we need to represent the SSRC allocation and structure in a non-confusing, robust way.
For simulcast the most usual setup is that the browser sends multiple streams (with Rid), and the SFU will forward one of those to another peer. Sending simulcast from the SFU is unusual, but needs to be supported.
The current situation works, but the code to handle the situations is too spread out and the logic is not obvious.
Chrome complains like so:
[41956:36355:0116/203007.888959:WARNING:fir.cc(59)] Packet is too small to be a valid FIR packet.
[41956:36355:0116/203007.888998:WARNING:rtcp_receiver.cc(557)] 1 RTCP blocks were skipped due to being malformed or of unrecognized/unsupported type, during the past 10 second period.
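For comparison, RFC 5104 (section 4.3.1) defines FIR as a payload-specific feedback packet (PT 206) with FMT 4, where the "media source SSRC" field is unused and set to 0, and each FCI entry is 8 bytes: the target SSRC, an 8-bit command sequence number and 24 reserved bits. A FIR shorter than this, for example one missing its FCI entry, is a plausible cause of Chrome's warning. A serialization sketch with a hypothetical function name:

```rust
/// Serialize an RTCP Full Intra Request (RFC 5104, section 4.3.1).
/// `targets` holds (media SSRC, command sequence number) pairs.
fn build_fir(sender_ssrc: u32, targets: &[(u32, u8)]) -> Vec<u8> {
    // Length field = total length in 32-bit words minus one:
    // 3 words of header + 2 words per FCI entry, minus one.
    let len_words = 2 + 2 * targets.len();
    let mut buf = Vec::with_capacity(4 * (len_words + 1));
    buf.push(0x80 | 4); // V=2, P=0, FMT=4
    buf.push(206); // PT = PSFB
    buf.extend_from_slice(&(len_words as u16).to_be_bytes());
    buf.extend_from_slice(&sender_ssrc.to_be_bytes());
    buf.extend_from_slice(&0u32.to_be_bytes()); // media source SSRC: unused, 0
    for &(ssrc, seq) in targets {
        buf.extend_from_slice(&ssrc.to_be_bytes());
        buf.push(seq);
        buf.extend_from_slice(&[0, 0, 0]); // reserved
    }
    buf
}
```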
https://www.rfc-editor.org/rfc/rfc3551.html#section-4.1
2023-04-04T14:40:20.162098Z TRACE str0m::session: Poll RTP: RtpHeader { version: 2, has_padding: false, has_extension: true, marker: true, payload_type: Pt(111), sequence_number: 13128, timestamp: 438720, ssrc: Ssrc(3618840797), ext_vals: ExtensionValues { mid: 1 abs_send_time: 3889608020.161961 voice_activity: true audio_level: -30 transport_cc: 90 }, header_len: 24 }
In general we need to set the marker bit to 0 for audio, and set it to 1 only to mark the start of a talkspurt.
For this issue, though, ensuring that the marker bit is 0 is enough.
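The talkspurt rule from RFC 3551 (section 4.1) can be sketched by watching the RTP timestamp: a jump of more than one frame means packets were not sent in between (e.g. DTX), so the next packet starts a talkspurt. `AudioMarker` is hypothetical; it assumes a fixed frame duration (960 ticks for 20 ms Opus at 48 kHz) and treats the very first packet as a talkspurt start:

```rust
/// Decides the RTP marker bit for outgoing audio packets.
struct AudioMarker {
    last_ts: Option<u32>,
    frame_ts: u32, // RTP ticks per frame, assumed constant
}

impl AudioMarker {
    fn marker_for(&mut self, rtp_ts: u32) -> bool {
        let marker = match self.last_ts {
            None => true, // first packet sent
            // More than one frame since the previous packet: silence was
            // skipped, so this packet begins a new talkspurt.
            Some(prev) => rtp_ts.wrapping_sub(prev) > self.frame_ts,
        };
        self.last_ts = Some(rtp_ts);
        marker
    }
}
```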
Sender and Receiver sit under Media (and media corresponds to an m-line).
Any m-line can at any point change direction. Even if something starts RecvOnly, we can later become SendRecv (or even SendOnly).
If we want to send a PLI for a RecvOnly m-line, the RTCP packet requires both the incoming media SSRC and some local SSRC (which would be the "Sender SSRC"). For now we've used SSRC 0, but this is pretty bad since the SSRC is a component of the (AES crypto) IV in SRTCP, and 0 being guessable and static is pretty crap.
Currently, when applying an SDP offer, we create corresponding Senders for each remote a=ssrc.
However, this breaks down when enabling simulcast, since that makes Chrome stop sending a=ssrc and instead requires us to dynamically create Receivers using the rid/repair-rid RTP header extensions. In this scenario we don't create the corresponding Senders, and thus lack a "Sender SSRC" for feedback packets.
Rather than loosely matching Sender/Receiver, let's make them hard paired in a new struct called something like Transceiver or Pair. This way we get a static guarantee they always come in pairs and we know which one is allocated to match which.
The final piece of the puzzle is knowing which SSRCs (both in and out) are definitely okay, and which might need recreating due to SSRC conflicts. For old-school a=ssrc, we can simply avoid the SSRCs the other side declares, but for dynamic allocation there is this risk:
rid header extension - also create corresponding Sender.
Sender SSRC in 2 could conflict with incoming SSRC in 3.
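A possible shape for the hard pairing and the conflict handling, as a sketch only: `Pair` and `SsrcAllocator` are hypothetical, and the linear congruential generator merely stands in for a proper random source. Every local SSRC avoids all SSRCs seen on either side, and when a dynamically discovered remote SSRC collides with one of our senders, that sender is re-allocated:

```rust
use std::collections::HashSet;

/// Hard pairing of receive and send SSRC, so feedback such as a PLI always
/// has a real (non-zero) "Sender SSRC".
#[derive(Debug, Clone, Copy, PartialEq)]
struct Pair {
    rx_ssrc: u32,
    tx_ssrc: u32,
}

/// Tracks every SSRC in use by either side of the session.
struct SsrcAllocator {
    used: HashSet<u32>,
    next: u32, // stand-in for randomness; use a real RNG in practice
}

impl SsrcAllocator {
    /// Pick a non-zero local SSRC not used by either side.
    fn alloc_local(&mut self) -> u32 {
        loop {
            let candidate = self.next;
            self.next = self.next.wrapping_mul(1_103_515_245).wrapping_add(12_345);
            if candidate != 0 && self.used.insert(candidate) {
                return candidate;
            }
        }
    }

    /// A remote SSRC discovered dynamically (e.g. from a rid packet).
    /// Any of our senders using it is re-allocated before pairing.
    fn on_remote(&mut self, pairs: &mut [Pair], remote: u32) -> Pair {
        for p in pairs.iter_mut() {
            if p.tx_ssrc == remote {
                p.tx_ssrc = self.alloc_local(); // conflict: move our sender
            }
        }
        self.used.insert(remote);
        Pair { rx_ssrc: remote, tx_ssrc: self.alloc_local() }
    }
}
```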
Small optimization to save bits on the wire: once the receiver has mapped the Mid to an SSRC, we can stop sending the header extension.
I am trying to use str0m in VS Code on Windows 10. When I cargo run, it appears some DLL files are not found:
exit code: 0xc0000135, STATUS_DLL_NOT_FOUND
full output:
Compiling str0m-chat v0.1.0 (C:\Users\douga\Documents2\code\RUST-projects\str0m-chat)
Finished dev [unoptimized + debuginfo] target(s) in 1m 00s
Running `target\debug\str0m-chat.exe`
error: process didn't exit successfully: `target\debug\str0m-chat.exe` (exit code: 0xc0000135, STATUS_DLL_NOT_FOUND)
That is in VS Code's PowerShell terminal only. When I run cargo run in Bash (MINGW64), it runs fine. So I ran ldd in Bash, and it shows me the DLLs required by the executable:
$ ldd target/debug/str0m-chat.exe
ntdll.dll => /c/WINDOWS/SYSTEM32/ntdll.dll (0x7ffb1fd10000)
KERNEL32.DLL => /c/WINDOWS/System32/KERNEL32.DLL (0x7ffb1fb70000)
KERNELBASE.dll => /c/WINDOWS/System32/KERNELBASE.dll (0x7ffb1d5f0000)
bcrypt.dll => /c/WINDOWS/System32/bcrypt.dll (0x7ffb1d410000)
ucrtbase.dll => /c/WINDOWS/System32/ucrtbase.dll (0x7ffb1dab0000)
VCRUNTIME140.dll => /c/WINDOWS/SYSTEM32/VCRUNTIME140.dll (0x7ffb03f60000)
libssl-1_1-x64.dll => /mingw64/bin/libssl-1_1-x64.dll (0x7ffb014d0000)
msvcrt.dll => /c/WINDOWS/System32/msvcrt.dll (0x7ffb1fad0000)
libcrypto-1_1-x64.dll => /mingw64/bin/libcrypto-1_1-x64.dll (0x7ffad8050000)
ADVAPI32.dll => /c/WINDOWS/System32/ADVAPI32.dll (0x7ffb1e950000)
sechost.dll => /c/WINDOWS/System32/sechost.dll (0x7ffb1fc30000)
RPCRT4.dll => /c/WINDOWS/System32/RPCRT4.dll (0x7ffb1f0c0000)
USER32.dll => /c/WINDOWS/System32/USER32.dll (0x7ffb1e7a0000)
win32u.dll => /c/WINDOWS/System32/win32u.dll (0x7ffb1da80000)
GDI32.dll => /c/WINDOWS/System32/GDI32.dll (0x7ffb1eaa0000)
gdi32full.dll => /c/WINDOWS/System32/gdi32full.dll (0x7ffb1dc50000)
msvcp_win.dll => /c/WINDOWS/System32/msvcp_win.dll (0x7ffb1dbb0000)
WS2_32.dll => /c/WINDOWS/System32/WS2_32.dll (0x7ffb1fa60000)
Two of them are in MINGW64 instead of SYSTEM32:
libssl-1_1-x64.dll => /mingw64/bin/libssl-1_1-x64.dll (0x7ffb014d0000)
libcrypto-1_1-x64.dll => /mingw64/bin/libcrypto-1_1-x64.dll (0x7ffad8050000)
So I figure this is some sort of OpenSSL install issue on Windows.
Has anyone else run into this and discovered any ways around it?
If we add a data channel in the initial OFFER, there's a risk we allocate the wrong SCTP channel id.
// RFC 8831
// Unless otherwise defined or negotiated, the
// streams are picked based on the DTLS role (the client picks even
// stream identifiers, and the server picks odd stream identifiers).
Allocating odd/even means we must know whether we're client or server in the SCTP setup. The setup is controlled by the DTLS a=setup:active attribute, which we only know once we've completed the very first OFFER/ANSWER exchange. Hence we have a chicken-and-egg situation: we can't allocate the SCTP channel id before the OFFER/ANSWER, but we also want to create the initial channel already in the first OFFER.
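One way out of the chicken-and-egg problem is to decouple creating a channel from numbering it: the m-line goes into the first OFFER, while stream ids are assigned only once the DTLS role is known. A sketch with hypothetical types, applying the RFC 8831 rule quoted above (client even, server odd):

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum DtlsRole {
    Client,
    Server,
}

/// Data channel registry that defers stream-id allocation until the role
/// is known from the first OFFER/ANSWER.
#[derive(Default)]
struct Channels {
    role: Option<DtlsRole>,
    next: u16,
    pending: Vec<String>,     // labels waiting for an id
    open: Vec<(u16, String)>, // (stream id, label)
}

impl Channels {
    fn create(&mut self, label: &str) {
        self.pending.push(label.to_string());
        self.assign();
    }

    fn set_role(&mut self, role: DtlsRole) {
        self.role = Some(role);
        self.next = match role {
            DtlsRole::Client => 0, // even ids
            DtlsRole::Server => 1, // odd ids
        };
        self.assign();
    }

    fn assign(&mut self) {
        if self.role.is_none() {
            return;
        }
        for label in self.pending.drain(..) {
            self.open.push((self.next, label));
            self.next += 2;
        }
    }
}
```

Deferring the id costs nothing in practice: the channel cannot be opened on the wire before SCTP is up, and SCTP runs over DTLS, which already requires the role to be settled.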
We will only publish str0m to crates.io, and we want all types that appear in the public API to be re-exported. The task is to audit that all the relevant types from the project's subcrates are re-exported in str0m.
What to do with them?
We already handle TWCC; what do we do with the "old school" RTCP receiver reports (RR)?
https://www.rfc-editor.org/rfc/rfc3550#page-42
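If we do produce receiver reports, the loss fields can follow the algorithm in RFC 3550 appendix A.3. A sketch, assuming extended sequence numbers are already tracked (`RecvStats` is hypothetical, and capping the fraction at 255 so a fully lost interval doesn't wrap to 0 in the 8-bit field is a choice made here):

```rust
/// Per-source reception statistics for one RR report block.
struct RecvStats {
    base_seq: u32,       // first extended sequence number seen
    max_ext_seq: u32,    // highest extended sequence number seen
    received: u32,       // packets received in total
    expected_prior: u32, // `expected` at the previous report
    received_prior: u32, // `received` at the previous report
}

impl RecvStats {
    /// Returns (fraction lost, cumulative packets lost) for the next RR.
    fn report(&mut self) -> (u8, i32) {
        let expected = self.max_ext_seq - self.base_seq + 1;
        let cumulative_lost = expected as i64 - self.received as i64;

        let expected_interval = expected - self.expected_prior;
        let received_interval = self.received - self.received_prior;
        self.expected_prior = expected;
        self.received_prior = self.received;

        let lost_interval = expected_interval as i64 - received_interval as i64;
        let fraction = if expected_interval == 0 || lost_interval <= 0 {
            0
        } else {
            ((lost_interval << 8) / expected_interval as i64).min(255) as u8
        };
        // The cumulative field is a signed 24-bit value.
        (fraction, cumulative_lost.clamp(-0x80_0000, 0x7F_FFFF) as i32)
    }
}
```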
The code in question is here:
Lines 553 to 554 in e6b01fd
a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
The SDP configures which feedback mechanisms to use for each PT. We currently ignore these, but should obey them (apart from goog-remb, which we haven't implemented).
Notice the config is per PT, which causes some headaches for session-wide TWCC feedback. Should we send TWCC if just one PT has it enabled? Or do we need to see traffic on that PT? Unclear.
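Obeying these lines starts with collecting them per PT. A minimal sketch with hypothetical names; the wildcard form `a=rtcp-fb:*` is ignored, and `session_wants_twcc` encodes just one of the possible answers to the open question above (any PT enabling TWCC turns it on), not a decision:

```rust
use std::collections::HashMap;

/// Feedback mechanisms enabled for one payload type.
#[derive(Debug, Default, PartialEq)]
struct Feedback {
    nack: bool,
    nack_pli: bool,
    ccm_fir: bool,
    transport_cc: bool,
    goog_remb: bool,
}

/// Collect "a=rtcp-fb:<pt> <mechanism>" lines into per-PT settings.
fn parse_rtcp_fb<'a>(lines: impl Iterator<Item = &'a str>) -> HashMap<u8, Feedback> {
    let mut map: HashMap<u8, Feedback> = HashMap::new();
    for line in lines {
        let Some(rest) = line.strip_prefix("a=rtcp-fb:") else { continue };
        let Some((pt, fb)) = rest.split_once(' ') else { continue };
        let Ok(pt) = pt.parse::<u8>() else { continue }; // skips "*"
        let entry = map.entry(pt).or_default();
        match fb.trim() {
            "nack" => entry.nack = true,
            "nack pli" => entry.nack_pli = true,
            "ccm fir" => entry.ccm_fir = true,
            "transport-cc" => entry.transport_cc = true,
            "goog-remb" => entry.goog_remb = true,
            _ => {}
        }
    }
    map
}

/// One possible policy: send TWCC if any negotiated PT enabled it.
fn session_wants_twcc(map: &HashMap<u8, Feedback>) -> bool {
    map.values().any(|f| f.transport_cc)
}
```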
Before publishing to crates.io, we want to have at least rudimentary docs for all public types and functions. Ideally we also have lots of examples, but this can come later.