Add Lockless DTLS #51
base: main
Conversation
…. There is a test that doesn't pass that concerns me.
… highlights a bug in lockless DTLS
Mukikaizoku left a comment
So far I've just looked at the work distribution and not the DTLS processing code (I presume it's just migrated code and only needs spot checking).
Really excited that we're moving forward with making things lockless! Overall I'd still recommend going with a thread-affinity approach, as I think it would best preserve message-processing "fairness". On a more detailed note, having single receive/work queues per thread may then make it easier to manage thread workload/sleep cycling.
Nonetheless, I think the non-affinity model here could be viable, and I have some comments on how it could be improved.
Bottom line: whatever is done, the most important thing here is to measure performance at various loads. In particular, performance could be great at certain loads but start to deteriorate past a certain CCU. Measuring the statistical distribution of latencies will give us confidence in the methodology.
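For example, a rough sketch of the kind of latency measurement I mean (all names here are illustrative, not from this PR):

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;

// Record per-message processing delays and report percentiles rather than
// averages, so tail behavior at high CCU stays visible.
public static class LatencyStats
{
    private static readonly ConcurrentBag<double> samples = new ConcurrentBag<double>();

    public static void Record(Stopwatch perMessageTimer) =>
        samples.Add(perMessageTimer.Elapsed.TotalMilliseconds);

    public static void Report()
    {
        double[] sorted = samples.OrderBy(x => x).ToArray();
        if (sorted.Length == 0) return;

        double Percentile(double p) => sorted[(int)(p * (sorted.Length - 1))];
        Console.WriteLine(
            $"p50={Percentile(0.50):F2}ms p99={Percentile(0.99):F2}ms p99.9={Percentile(0.999):F2}ms");
    }
}
```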
LocklessDtlsServerConnection connection;
if (!this.allConnections.TryGetValue(remoteEP, out connection))
{
    lock (this.allConnections)
I don't think we need this lock and double TryGet anymore - previously, the Add was happening from multiple threads, but now that it's on a single receive thread, we should be fine.
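Roughly what I'm picturing (a sketch assuming allConnections is a ConcurrentDictionary; the constructor call is just illustrative):

```csharp
// The single receive thread owns all Adds, so the lock and the second
// TryGetValue inside it become unnecessary.
LocklessDtlsServerConnection connection;
if (!this.allConnections.TryGetValue(remoteEP, out connection))
{
    // No other thread can insert remoteEP between the TryGetValue
    // and this add, so a plain add is safe here.
    connection = new LocklessDtlsServerConnection(remoteEP); // illustrative ctor
    this.allConnections.TryAdd(remoteEP, connection);
}
```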
var myTid = Thread.CurrentThread.ManagedThreadId;
while (this.receiveQueue.TryTake(out var sender))
{
    if (Interlocked.CompareExchange(ref sender.LockedBy, myTid, 0) != 0)
Hmmm - I'm trying to wrap my head around a few comments that are all connected wrt:
- This compare exchange
- The concurrent dictionary within ConcurrentSetQueue
- The monitor pulses (maybe)
I believe the goal here is to do thread-non-affinity lockless handling of a connection. I think the criteria that we are looking for is even work distribution. Essentially, without affinity, we are creating a "check-out" system, which effectively locks a particular connection to a particular thread temporarily.
The ConcurrentSetQueue will be loaded when receives/sends have triggered for that connection, meaning the SetQueue may note there's new work for a connection even while a thread is still processing it. This is guarded by the compare exchange.
On one side, I'd recommend making the compare exchange a feature of ConcurrentSetQueue, so a worker thread is guaranteed to get a connection on a TryTake (if a free one exists) or block/sleep. Considering receives/sends happen constantly, we can guess there might be a lot of exchange "misses" here, and the internal dictionary keeps filling with connections that are already being processed.
At the same time, AFAIK the concurrent dictionary isn't order-guaranteed, so I don't think we get queue-like behavior here.
So stepping back, I think we can avoid these quirks by switching up how the "connection check-out" works with ConcurrentSetQueue. Instead of queuing connections on receive/send, queue them on connection creation. Then have threads pull/return connections with FIFO behavior (maybe BlockingCollection?). Don't re-queue disconnected connections; if a connection is disconnected on dequeue, just drop it and dequeue the next one. A sketch of this is below.
We then don't need the monitors to synchronize the threads either. If we do need signaling, it might be cleaner to use one of the signaling objects .NET provides (ManualResetEvent? I forget which one works for this exact case).
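A rough sketch of that check-out model (all types and member names here are invented for illustration):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Connections are queued once at creation; workers take/return them FIFO.
// BlockingCollection over a ConcurrentQueue gives a blocking Take, so no
// Monitor pulses are needed to park idle workers.
public interface IConnection
{
    bool IsDisconnected { get; }
    void ProcessPendingWork(); // drain this connection's receive/send queues
}

public sealed class ConnectionPool
{
    private readonly BlockingCollection<IConnection> queue =
        new BlockingCollection<IConnection>(new ConcurrentQueue<IConnection>());

    // Called once when a connection is created, not on every receive/send.
    public void Register(IConnection connection) => this.queue.Add(connection);

    // Worker loop: check a connection out, process it, check it back in.
    public void WorkerLoop(CancellationToken token)
    {
        try
        {
            while (true)
            {
                IConnection connection = this.queue.Take(token); // blocks when empty
                if (connection.IsDisconnected)
                {
                    continue; // drop it; the next Take pulls another connection
                }

                connection.ProcessPendingWork();
                this.queue.Add(connection); // return to the back of the FIFO
            }
        }
        catch (OperationCanceledException)
        {
            // shutdown requested
        }
    }
}
```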
}
}

while (sender.PacketsSent.TryDequeue(out var message))
Just reiterating that I think these two receive and send loops will need maximum processing counts.
Also, while I think the monitors will take care of sleeping the thread occasionally, if the TryTake doesn't induce a Wait, a thread could spin quite a bit - wondering if it'll be necessary to sleep the thread every 100 or 1000 total messages.
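For illustration, a capped send loop might look roughly like this (MaxMessagesPerPass, HandleOutgoing, and the re-queue call are all assumptions, not code from this PR):

```csharp
// Illustrative cap on how much queued work one connection can drain per pass.
const int MaxMessagesPerPass = 100;

int processed = 0;
while (processed < MaxMessagesPerPass
    && sender.PacketsSent.TryDequeue(out var message))
{
    HandleOutgoing(message); // stand-in for the existing send handling
    processed++;
}

// If we hit the cap there may be messages left, so put the connection back
// in the work queue instead of letting it monopolize this thread.
if (processed == MaxMessagesPerPass)
{
    this.receiveQueue.TryAdd(sender); // assumed re-queue method on ConcurrentSetQueue
}
```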
Force-pushed from f30e956 to ff5f86a
I'll describe this in more detail when I get a chance, but it's based on @Mukikaizoku's outline for message queues. The big refactor is actually combining the PeerData and DtlsServerConnection classes, which gives a lot more certainty around adding and removing connections during the handshake. There are also a couple of new concurrent set structures that let us either cycle connections across multiple threads or assert thread affinity - a rough sketch of the idea is below.
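To give a rough idea of the structure (this is an illustrative sketch, not the actual ConcurrentSetQueue implementation in this PR):

```csharp
using System.Collections.Concurrent;

// Set-backed queue: each item appears at most once, so a connection that is
// signaled many times while pending is only queued once, while the backing
// ConcurrentQueue keeps FIFO ordering for fairness across connections.
public sealed class ConcurrentSetQueue<T>
{
    private readonly ConcurrentQueue<T> queue = new ConcurrentQueue<T>();
    private readonly ConcurrentDictionary<T, byte> present =
        new ConcurrentDictionary<T, byte>();

    // Enqueue only if the item isn't already pending.
    public bool TryAdd(T item)
    {
        if (!this.present.TryAdd(item, 0))
        {
            return false; // already queued
        }

        this.queue.Enqueue(item);
        return true;
    }

    public bool TryTake(out T item)
    {
        if (this.queue.TryDequeue(out item))
        {
            // From here the item can be re-added. A TryAdd racing in just
            // before this remove is deduplicated, so the taker should drain
            // all pending work for the item after taking it.
            this.present.TryRemove(item, out _);
            return true;
        }

        return false;
    }
}
```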
WIP stuff: