nanomsg / nng

nanomsg-next-generation -- light-weight brokerless messaging

Home Page: https://nng.nanomsg.org

License: MIT License

C 93.21% CMake 2.91% Shell 0.54% C++ 2.34% Go 1.00%

nng's Introduction

nng - nanomsg-next-gen

Please see here for an important message for the people of Russia.

ℹ️	If you are looking for the legacy version of nanomsg, please see the nanomsg repository.

This project is a rewrite of the Scalability Protocols library known as libnanomsg, and adds significant new capabilities, while retaining compatibility with the original.

It may help to think of this as "nanomsg-next-generation".

NNG: Lightweight Messaging Library

NNG, like its predecessors nanomsg (and to some extent ZeroMQ), is a lightweight, broker-less library, offering a simple API to solve common recurring messaging problems, such as publish/subscribe, RPC-style request/reply, or service discovery. The API frees the programmer from worrying about details like connection management, retries, and other common considerations, so that they can focus on the application instead of the plumbing.

NNG is implemented in C, requiring only C99 and CMake to build. It can be built as a shared or a static library, and is readily embeddable. It is also designed to be easy to port to new platforms if your platform is not already supported.

License

NNG is licensed under a liberal and commercially friendly MIT license. The goal of the license is to minimize friction in adoption, use, and contribution.

Enhancements (Relative to nanomsg)

Here are areas where this project improves on "nanomsg":

Reliability	NNG is designed for production use from the beginning. Every error case is considered, and it is designed to avoid crashing except in cases of gross developer error. (Hopefully we don’t have any of these in our own code.)
Scalability	NNG scales out to engage multiple cores using a bespoke asynchronous I/O framework, using thread pools to spread load without exceeding typical system limits.
Maintainability	NNG’s architecture is designed to be modular and easily grasped by developers unfamiliar with the code base. The code is also well documented.
Extensibility	Because it avoids ties to file descriptors, and avoids confusing interlocking state machines, it is easier to add new protocols and transports to NNG. This was demonstrated by the addition of the TLS and ZeroTier transports.
Security	NNG provides TLS 1.2 and ZeroTier transports, offering support for robust and industry standard authentication and encryption. In addition, it is hardened to be resilient against malicious attackers, with special consideration given to use in a hostile Internet.
Usability	NNG eschews slavish adherence to parts of the more complex and less well understood POSIX APIs, while adopting the semantics that are familiar and useful. New APIs are intuitive, and the optional support for separating protocol context and state from sockets makes creating concurrent applications vastly simpler than previously possible.

Compatibility

This project offers both wire compatibility and API compatibility, so most nanomsg users can begin using NNG right away.

Existing nanomsg and mangos applications can inter-operate with NNG applications automatically.

That said, there are some areas where legacy nanomsg still offers capabilities NNG lacks — specifically enhanced observability with statistics, and tunable prioritization of different destinations are missing, but will be added in a future release.

Additionally, some API capabilities that are useful for foreign language bindings are not implemented yet.

Some simple single-threaded, synchronous applications may perform better under legacy nanomsg than under NNG. (We believe that these applications are the least commonly deployed, and the least interesting from a performance perspective. NNG’s internal design is slightly less efficient in such scenarios, but it benefits greatly when concurrency, multiple sockets, or multiple network peers are involved.)

Supported Platforms

NNG supports Linux, macOS, Windows (Vista or better), illumos, Solaris, FreeBSD, Android, and iOS. Most other POSIX platforms should work out of the box but have not been tested. Very old versions of otherwise supported platforms might not work.

Requirements

To build this project, you will need a C99 compatible compiler and CMake version 3.13 or newer.

We recommend using the Ninja build system (pass "-G Ninja" to CMake) when you can. (And not just because Ninja sounds like "NNG" — it’s also blindingly fast and has made our lives as developers measurably better.)

If you want to build with TLS support you will also need Mbed TLS. See docs/BUILD_TLS.adoc for details.

Quick Start

With a Linux or UNIX environment:

  $ mkdir build
  $ cd build
  $ cmake -G Ninja ..
  $ ninja
  $ ninja test
  $ ninja install

API Documentation

The API documentation is provided in Asciidoc format in the docs/man subdirectory, and also online. The nng(7) page provides a conceptual overview and links to manuals for various patterns. The libnng(3) page is a good starting point for the API reference.

You can also purchase a copy of the NNG Reference Manual. (It is published in both electronic and printed formats.) Purchases of the book help fund continued development of NNG.

Example Programs

Some demonstration programs have been created to help serve as examples. These are located in the demo directory.

Legacy Compatibility

A legacy libnanomsg compatible API is available, and while it offers less capability than the modern NNG API, it may serve as a transition aid. Please see nng_compat(3) for details.

Commercial Support

Commercial support for NNG is available.

Please contact Staysail Systems to inquire further.

Commercial Sponsors

The development of NNG has been made possible through the generous sponsorship of Capitar IT Group BV and Staysail Systems, Inc.

nng's People

reqshark techniware adityamarella roscopecoltran mwpowellhtx liamstask brehm raymond-sun potatogim staysail a-j-k myneworder dengf kousgroup bertrand- tveric kamalgs nonnenmacher lihongguang linecode toppk cuiopen goldstar111 laszlo-kiss protoblock mark-r-stevens sg777 orrgal1 dehorsley wilesun andy-amoy mbush00 winbuilds zplus camerondruyor proffan moonsimon mzipay babeloff ame89 svenefftinge codypiersall surgams cerestong nothrow charmerx jimjag wndhangzhou awakecoding 19317362 neachdainn selvamkrish moneytech dndn1011 bgat stevexucd gregorburger bobdeng1974 tawawhite tjrong zagor trevor211 marchon tidesq mjgigli steve-scott 7thtool krattai leelnfei tigercl indigos33k3r a83376750 shadowwalker2718 denniswind xwbeast kuncao moshohayeb bgraf jungkwangho vk-coder docwyatt2001 marius-plv topooo philwil haohuixin shawvi kjx98 webee pickledgator nariakiiwatani kamingli1st the-other-james biscand ian2009 yssource eaybek bsirang q1q2q3-q4 slyusar003 jeikabu

nng's Issues

Double free found in stress testing

Stress testing the ipc test program in a tight loop, we found this:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./ipc'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f764ee08428 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:54
54	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f764a2f1700 (LWP 18825))]
(gdb) bt
#0  0x00007f764ee08428 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007f764ee0a02a in __GI_abort () at abort.c:89
#2  0x00007f764ee4a7ea in __libc_message (do_abort=do_abort@entry=2, 
    fmt=fmt@entry=0x7f764ef63e98 "*** Error in `%s': %s: 0x%s ***\n")
    at ../sysdeps/posix/libc_fatal.c:175
#3  0x00007f764ee5337a in malloc_printerr (ar_ptr=<optimized out>, 
    ptr=<optimized out>, 
    str=0x7f764ef63fa8 "double free or corruption (out)", action=3)
    at malloc.c:5006
#4  _int_free (av=<optimized out>, p=<optimized out>, have_lock=0)
    at malloc.c:3867
#5  0x00007f764ee5753c in __GI___libc_free (mem=<optimized out>)
    at malloc.c:2968
#6  0x0000000000411853 in nni_free (ptr=0x7f763c0008c0, size=176)
    at /home/parallels/Projects/nng/src/platform/posix/posix_alloc.c:27
#7  0x000000000041aecd in nni_posix_pipedesc_fini (pd=0x7f763c0008c0)
    at /home/parallels/Projects/nng/src/platform/posix/posix_pipedesc.c:334
#8  0x0000000000419e74 in nni_plat_ipc_pipe_fini (p=0x7f763c0008c0)
    at /home/parallels/Projects/nng/src/platform/posix/posix_ipc.c:159
#9  0x00000000004136d4 in nni_ipc_pipe_fini (arg=0x7f76340008c0)
    at /home/parallels/Projects/nng/src/transport/ipc/ipc.c:93
#10 0x000000000040db71 in nni_pipe_destroy (p=0x7f7634000da0)
    at /home/parallels/Projects/nng/src/core/pipe.c:60
#11 0x000000000040dd48 in nni_pipe_reap (p=0x7f7634000da0)
    at /home/parallels/Projects/nng/src/core/pipe.c:136
#12 0x00000000004106ff in nni_taskq_thread (self=0xa46c30)
    at /home/parallels/Projects/nng/src/core/taskq.c:44
#13 0x0000000000410f52 in nni_thr_wrap (arg=0xa46c38)
    at /home/parallels/Projects/nng/src/core/thread.c:88
#14 0x0000000000412265 in nni_plat_thr_main (arg=0xa46c38)
    at /home/parallels/Projects/nng/src/platform/posix/posix_thread.c:185
#15 0x00007f764f1a46ba in start_thread (arg=0x7f764a2f1700)
    at pthread_create.c:333
#16 0x00007f764eeda3dd in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Eliminate the separate timer

The timer logic we have in src/core/timer.c could probably go away now. The main timeouts we have are using aios and their builtin timers, and it seems that the timer.c code is now duplicate; it also forces yet another thread to be created.

Consider using synchronous completions sometimes

Performance-wise, we could eliminate a number of context switches if we used synchronous completions when we are already in a context where it is safe to do so. For example, any of the callbacks executed from pollers or the IO completion threads can reasonably safely assume that no locks are held, and therefore could execute completions without going through a taskq.

Probably the best way to do this is to create a nni_aio_finish_synch() routine that does the completion logic right in the context of the caller. Then call sites inside individual providers could be modified as needed. Note that some modes of this need to be called asynch -- such as any context where we are on a user's call stack.
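To make the difference concrete, here is a minimal sketch of the two completion styles. Everything in it is hypothetical (the toy_aio type, the single-threaded stand-in for a taskq, and all the function names); it is not the nng implementation, only an illustration of deferring a callback versus running it on the caller's stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified aio: a callback, its argument, and a result. */
typedef struct toy_aio {
    void (*cb)(void *);
    void           *arg;
    int             result;
    struct toy_aio *next; /* link used only by the deferred queue */
} toy_aio;

/* Stand-in for a taskq: a FIFO of completions to be run later. */
static toy_aio *deferred_head;
static toy_aio *deferred_tail;

/* Asynchronous completion: queue the callback; it runs in toy_taskq_run(). */
void toy_aio_finish(toy_aio *aio, int result)
{
    aio->result = result;
    aio->next   = NULL;
    if (deferred_tail != NULL) {
        deferred_tail->next = aio;
    } else {
        deferred_head = aio;
    }
    deferred_tail = aio;
}

/* Synchronous completion: run the callback right here.
 * Only safe when the caller holds no locks and is not on a user's stack. */
void toy_aio_finish_synch(toy_aio *aio, int result)
{
    aio->result = result;
    aio->cb(aio->arg);
}

/* Drain the deferred queue; returns the number of callbacks run. */
int toy_taskq_run(void)
{
    int n = 0;
    while (deferred_head != NULL) {
        toy_aio *aio  = deferred_head;
        deferred_head = aio->next;
        if (deferred_head == NULL) {
            deferred_tail = NULL;
        }
        aio->cb(aio->arg);
        n++;
    }
    return n;
}

/* Example callback: counts how many times it was invoked. */
void toy_count_cb(void *arg)
{
    (*(int *) arg)++;
}
```

With the synchronous variant the callback has already run when the call returns; with the deferred variant nothing happens until the queue is drained, which in a real system costs a context switch.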

Endpoint close should be synchronous

We don't need to create a reaper thread.

We always have user context -- either via socket_close, or endpoint_close, or indirectly via nng_fini().

(Pipes still have async cleanups and need the reaper thing.)

We do need to use reference counting though, so that we can create an API for accessing endpoints. This will be handled much like the socket API.

Make transports pluggable

The current transports are hard-wired into the library. We need to make these pluggable.

This will mean that transports will need to have access to the guts of the library, and we will need to change the fixed array of transports into a linked list with registration routines.

It should not be necessary to support unregistering transports.
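A registration list of this kind could look roughly like the sketch below. The toy_tran structure and both function names are made up for illustration, and locking is omitted; the real transport descriptor would carry an ops vector rather than just a scheme string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical transport descriptor; the field names are illustrative. */
typedef struct toy_tran {
    const char      *scheme; /* URL scheme, e.g. "tcp" */
    struct toy_tran *next;   /* link in the registry */
} toy_tran;

static toy_tran *toy_tran_list; /* head of the registry, no unregister */

/* Register a transport. Returns 0 on success, -1 if the scheme is taken. */
int toy_tran_register(toy_tran *t)
{
    for (toy_tran *p = toy_tran_list; p != NULL; p = p->next) {
        if (strcmp(p->scheme, t->scheme) == 0) {
            return -1;
        }
    }
    t->next       = toy_tran_list;
    toy_tran_list = t;
    return 0;
}

/* Find the transport for a URL such as "tcp://host:port", or NULL. */
toy_tran *toy_tran_find(const char *url)
{
    for (toy_tran *p = toy_tran_list; p != NULL; p = p->next) {
        size_t n = strlen(p->scheme);
        if (strncmp(url, p->scheme, n) == 0 &&
            strncmp(url + n, "://", 3) == 0) {
            return p;
        }
    }
    return NULL;
}
```

Because nothing is ever removed, lookups never race with frees, which is one reason skipping unregistration keeps this simple.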

Dynamic option numbering desired.

As we make transports and protocols pluggable, it's likely that the framework will not know the option numbers a priori, and unless we want to get into managing the identifiers for option numbers, we should contemplate a mechanism whereby option ids are "registered" dynamically, allowing applications to select options using a string. (The only other option is to create and manage a registry, which is a PITA and not scalable.)

I am proposing an API like this:

// gets the same value if option string is already known.  can return NNG_ENOMEM
// No unregister; but nng_fini() cleans them all up, hence protocols and transports
// need to watch these.  Knowing the scope allows us to avoid passing transport options
// to protocols, and vice versa.  (Segregate by ID.)  (XXX: maybe we should register
// other information to help with discovery?)
enum { NNI_OPTION_SCOPE_PROTO, NNI_OPTION_SCOPE_TRANSPORT };
int nni_option_register(int *valuep, int scope, const char *name);

// Public API (nng.h)
int nng_option_lookup(int *valuep, const char *name);

Providers and transports will be expected to use names that are specific to them to avoid
collisions by scoping the option names with a "/". E.g. a TCP option could be "TCP/NoDelay".
Names without a "/" are assumed to be common, socket scoped.

Using a two step lookup will allow the hot code paths to just use integers like they always have done.
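As a rough illustration of the two-step lookup, the sketch below keeps a list of name-to-id mappings. The toy_ prefixed names, the error codes, and the list-based storage are all assumptions made for this example; they only mirror the shape of the proposed API and are not an existing implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { TOY_OPT_SCOPE_PROTO, TOY_OPT_SCOPE_TRANSPORT };
enum { TOY_OK = 0, TOY_ENOMEM = -1, TOY_ENOENT = -2 }; /* hypothetical codes */

typedef struct toy_opt {
    char           *name;
    int             scope;
    int             id;
    struct toy_opt *next;
} toy_opt;

static toy_opt *toy_opt_list;
static int      toy_opt_next_id = 1;

static toy_opt *toy_opt_find(const char *name)
{
    for (toy_opt *e = toy_opt_list; e != NULL; e = e->next) {
        if (strcmp(e->name, name) == 0) {
            return e;
        }
    }
    return NULL;
}

/* Register a name; yields the same id if the name is already known. */
int toy_option_register(int *valuep, int scope, const char *name)
{
    toy_opt *e = toy_opt_find(name);
    if (e == NULL) {
        size_t len = strlen(name) + 1;
        if ((e = malloc(sizeof(*e))) == NULL) {
            return TOY_ENOMEM;
        }
        if ((e->name = malloc(len)) == NULL) {
            free(e);
            return TOY_ENOMEM;
        }
        memcpy(e->name, name, len);
        e->scope     = scope;
        e->id        = toy_opt_next_id++;
        e->next      = toy_opt_list;
        toy_opt_list = e;
    }
    *valuep = e->id;
    return TOY_OK;
}

/* Application-facing lookup: string in, integer id out. */
int toy_option_lookup(int *valuep, const char *name)
{
    toy_opt *e = toy_opt_find(name);
    if (e == NULL) {
        return TOY_ENOENT;
    }
    *valuep = e->id;
    return TOY_OK;
}
```

An application would call the lookup once at startup, cache the integer, and then use that integer on every get or set, so the string comparison stays off the hot path.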

Sockets should be uint32_t's (handles) not pointers.

We want to convert our external API to one that uses socket handles (integers) rather than pointers.

This will let us hold reference counts, and also detect use after free (particularly if we try not to reuse socket IDs too often). Our idhash code lets us allocate ids dynamically quickly and easily, and we can use this to get a quick handle on sockets.

Use of the global table of socket IDs will also allow us to cleverly detect when there are no more active sockets, and finalize the platform, meaning we can close any leaks provided that the application properly does nng_close() on sockets.
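The sketch below shows the general idea with a small fixed table instead of our idhash code. All names are hypothetical, locking and reference counts are left out, and id wraparound is ignored; the point is only that a stale handle fails lookup instead of dereferencing freed memory, and that the table knows when no sockets remain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_SOCKS 16

/* One table slot; id 0 means the slot is free. */
typedef struct {
    uint32_t id;
    void    *sock;
} toy_slot;

static toy_slot toy_slots[TOY_MAX_SOCKS];
static uint32_t toy_next_id = 1; /* never reused (wraparound ignored here) */
static int      toy_active;

/* Allocate a handle for sock; returns 0 if the table is full. */
uint32_t toy_sock_open(void *sock)
{
    for (int i = 0; i < TOY_MAX_SOCKS; i++) {
        if (toy_slots[i].id == 0) {
            toy_slots[i].id   = toy_next_id++;
            toy_slots[i].sock = sock;
            toy_active++;
            return toy_slots[i].id;
        }
    }
    return 0;
}

/* Resolve a handle; returns NULL for closed or never-issued handles. */
void *toy_sock_find(uint32_t id)
{
    if (id == 0) {
        return NULL;
    }
    for (int i = 0; i < TOY_MAX_SOCKS; i++) {
        if (toy_slots[i].id == id) {
            return toy_slots[i].sock;
        }
    }
    return NULL;
}

/* Close a handle; returns -1 if it is stale (e.g. a double close). */
int toy_sock_close(uint32_t id)
{
    if (id == 0) {
        return -1;
    }
    for (int i = 0; i < TOY_MAX_SOCKS; i++) {
        if (toy_slots[i].id == id) {
            toy_slots[i].id   = 0;
            toy_slots[i].sock = NULL;
            toy_active--;
            return 0;
        }
    }
    return -1;
}

/* Number of open sockets; 0 would be the cue to finalize the platform. */
int toy_sock_active(void)
{
    return toy_active;
}
```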

autoscale based on CPUs available

We need a platform-specific way to inquire as to how many CPUs are available. This should be used to automatically tune the number of threads used for the system wide taskq; it may be used to drive other decisions later.
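On POSIX-like systems one common way to ask is sysconf() with _SC_NPROCESSORS_ONLN, which is widely available on Linux, macOS and the BSDs although it is an extension rather than part of the base standard. The sketch below assumes such a platform; the function names and the sizing policy (two threads per CPU, clamped between 2 and 16) are invented for illustration only.

```c
#include <assert.h>
#include <unistd.h>

/* Number of online CPUs, falling back to 1 if the platform cannot say. */
int toy_ncpu(void)
{
#ifdef _SC_NPROCESSORS_ONLN
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    if (n >= 1) {
        return (int) n;
    }
#endif
    return 1;
}

/* Hypothetical policy: two taskq threads per CPU, clamped to [2, 16]. */
int toy_taskq_threads(int ncpu)
{
    int n = ncpu * 2;
    if (n < 2) {
        n = 2;
    }
    if (n > 16) {
        n = 16;
    }
    return n;
}
```

Other platforms would need their own query behind the same function (Windows, for example, reports a processor count through GetSystemInfo()).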

SSH based transport

Jason Aten has proposed the creation of an SSH transport. This could be a lot easier to work with than, say, TLS, and so it's worth investigating. We should try to use a 3rd party SSH library though.

FreeRTOS support

A question has come up about integration in FreeRTOS. It would be meaningful to demonstrate that the code works for FreeRTOS as an example non-mainstream platform.

tcp sometimes fails to get a port

We see occasional errors from the TCP test at travis where it seems to fail to obtain a port, and hence listen() fails.

Probably we need to avoid hitting the same port over and over again -- my theory is that we are hitting port reuse and the mandatory TCP TIME_WAIT delay.

REQ round-robin load-balancing

We need to implement round-robin load balancing for REQ. (Note that mangos has this bug too.) Right now we are just letting pipes race to get the pipe, and relying on the scheduler to provide fair access. Under load this works out, but during lighter conditions it can lead to less optimal scheduling. Instead, we should aim to distribute the work evenly.
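A minimal sketch of an explicit round-robin picker follows. The toy_rr structure, its fixed-size ready array, and the function name are assumptions for this example; a real version would walk the socket's list of pipes under its lock.

```c
#include <assert.h>

#define TOY_MAX_PIPES 8

/* Hypothetical per-socket state for picking the next pipe. */
typedef struct {
    int ready[TOY_MAX_PIPES]; /* nonzero if pipe i can accept a request */
    int npipes;               /* number of pipes in use */
    int next;                 /* index to try first on the next pick */
} toy_rr;

/* Return the index of the next ready pipe in rotation, or -1 if none. */
int toy_rr_pick(toy_rr *rr)
{
    for (int i = 0; i < rr->npipes; i++) {
        int idx = (rr->next + i) % rr->npipes;
        if (rr->ready[idx]) {
            rr->next = (idx + 1) % rr->npipes;
            return idx;
        }
    }
    return -1;
}
```

Starting each search just past the previously chosen pipe is what spreads requests evenly under light load, instead of leaving the choice to whichever pipe wins a race.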

Add kqueue support for Darwin/BSD

We would like to have kqueue based polling for improved scalability.

Add other UNIX systems

The CMake and platform support is pretty explicitly just supporting Windows, Linux, and MacOS right now.

The code can easily support any modern UNIX, and we should try to ensure that the various BSDs, Solaris, illumos, HP-UX, and AIX are supported.

Security attributes support

nanomsg has a framework for setting up security_attributes (passing through via a property). We should add the same support.

consider abandoning macOS on travis

Given that macOS is by far the slowest of the platforms for the CI, and given that we actually develop and test natively on macOS, it may be worth considering eliminating the use of macOS at Travis, at least for non-tagged builds. (Tagged releases should probably still be built and tested on macOS.)

Sure would be nice if we had a nice light weight container solution for macOS....

SURVEY hang in Travis

Travis occasionally hangs in timeout on the Survey. We need to understand why this occurs (race?) and fix. Note that we think we've seen a similar problem with the pipeline test too.

TLS transport

We can implement a TLS transport now, pretty easily, I think. We just need to choose a TLS framework (mbed TLS?) and implement it.

Restore the old idhash logic for sockets

The new-fangled object hash turned out not to be such a great idea, and in particular the deferred destruction is really obtuse. As we stopped using object hashes for pipes or eps as a matter of improving clarity of the code, we should do the same here.

pipeline leaks a pipe

After some level of testing, it would appear that we have a situation where valgrind reports leaks in the pipeline test. The leak is apparently a dialing pipe, and occurs when we close the socket immediately after starting a dial.

expose aio to applications

The AIO paradigm for async IO is super useful and powerful -- much much better than other things we might come up with. We should expose this directly to user programs. In fact, we think this may be superior to the notification hacks we have.

open protocol by "name" (symbol) instead of number

In support of #38 we should convert to a better API, using a pattern like we have for mangos.

Imagine that instead of nng_open(nng_socket *, uint16_t proto) we just have something more like nng_pairv0_open(nng_socket *);

This means that the protocols will need to have a different way to pass their ops vector into some kind of internal "socket_open_proto()" call. This has the benefit that applications need never know about the protocol number.

This is really good, because most applications don't need to know the protocol numbers at all; this also allows for pluggable protocols to be built by allowing them to supply their ops vectors at call time, without requiring any kind of startup constructor/initialization to register them into the main list of protocols.

(Sadly, we still need to do this for transports, for now at least.)
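The shape of this could be sketched as below. The toy_socket and toy_proto_ops types, the function names, and the protocol number value are stand-ins chosen for illustration; the real ops vector would contain the protocol's entry points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical protocol ops vector; real ones carry function pointers. */
typedef struct {
    uint16_t    number; /* wire protocol number, hidden from applications */
    const char *name;
} toy_proto_ops;

/* Hypothetical socket: just remembers which protocol it speaks. */
typedef struct {
    const toy_proto_ops *ops;
} toy_socket;

/* Internal open: the protocol hands in its ops vector at call time. */
int toy_socket_open_proto(toy_socket *s, const toy_proto_ops *ops)
{
    if (s == NULL || ops == NULL) {
        return -1;
    }
    s->ops = ops;
    return 0;
}

/* Owned by the pair protocol; 0x10 is an illustrative value only. */
static const toy_proto_ops toy_pairv0_ops = { 0x10, "pair" };

/* Public, per-protocol entry point; no protocol number in the signature. */
int toy_pairv0_open(toy_socket *s)
{
    return toy_socket_open_proto(s, &toy_pairv0_ops);
}
```

A third-party protocol could ship its own ops vector and its own open function in the same way, without touching any central list.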

NNG_FLAG_SYNCH should be the default

Arguably, NNG_FLAG_SYNCH behavior is the preferred behavior in all circumstances. Users using the new API should have to request ASYNC behavior explicitly (NOWAIT flag?)

Convert string/errno statements into a table

One of the poorer code coverage reports relates to all the various cases for various error codes. We should move these into an initialized table then use a simple loop over them. This would greatly improve the code coverage reports.
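For example, a table-driven version might look like the following sketch. The error codes, messages, and the toy_strerror name are placeholders invented for this example rather than the library's actual values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder error codes for the example. */
enum { TOY_EINTR = 1, TOY_ENOMEM = 2, TOY_EINVAL = 3, TOY_ETIMEDOUT = 5 };

/* One initialized table replaces a long switch statement. */
static const struct {
    int         code;
    const char *msg;
} toy_errors[] = {
    { 0, "OK" },
    { TOY_EINTR, "Interrupted" },
    { TOY_ENOMEM, "Out of memory" },
    { TOY_EINVAL, "Invalid argument" },
    { TOY_ETIMEDOUT, "Timed out" },
};

/* Look up the message with a simple loop over the table. */
const char *toy_strerror(int code)
{
    size_t n = sizeof(toy_errors) / sizeof(toy_errors[0]);
    for (size_t i = 0; i < n; i++) {
        if (toy_errors[i].code == code) {
            return toy_errors[i].msg;
        }
    }
    return "Unknown error";
}
```

A single test that hits one known code and one unknown code then covers every line of the function, regardless of how many entries the table has.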

Consider making Windows use threadpools

Thread Pools offer some attractive benefits over raw I/O completion ports, including a simpler programming model and better automatic scaling. We might be able to use these to handle taskq type tasks too!

Make pipe and endpoint structures private

From a cleanliness/architecture perspective, the pipe and endpoint structures should
ideally not be exposed outside of their implementation.

Crash in IPC (POSIX)

The IPC test occasionally crashes, due to a race condition in the posix poller.

Essentially, pollq makes an assumption that nothing will have rearmed the file descriptor while it is running the callback. This is not quite true, since a pipe accessing the other direction at the wrong time can indeed cause a problem here.

The pollq logic for poll() is a bit obtuse, so it might be a great idea to go back and rethink this to use simpler registration of pollq items that records outstanding aios, and uses aios to trigger completion.

Address determinations should be via nng_sockaddr

We should pass nng_sockaddr internally instead of URLs.

Occasional orphaned pipe or endpoint?

When running the ipc test in a tight loop, we will occasionally see failures like this:

Failures:

  * Listen and accept (Assertion Failed)
  File: /home/parallels/Projects/nng/tests/trantest.h
  Line: 92
  Test: nng_listen(tt->repsock, tt->addr, &ep, NNG_FLAG_SYNCH) == 0

(Sometimes it can be a dial failure too.) I'm fairly certain that what is happening here is that the /tmp/nng_ipc_test pipe isn't getting cleaned up (and I think I've seen a similar problem with TCP), resulting in EADDRINUSE. After the program exits, the thing is gone.

It takes many runs to trigger this failure. Usually several hundred at least. This is on an Ubuntu vm, but I think I've seen failures in MacOS too.

I added a slight delay in my loop, and that didn't fix the problem (but it took a lot longer to hit, of course.) (I was hoping that maybe this was an operating system level race, where close() didn't free the resource entirely... no such luck.)

More work to track this down is needed.

Consider a heap/priority queue for the aio timeouts

The AIO timeout logic uses a really naive sorted list for ordering timeouts. This can lead to some pretty unfortunate performance impacts (O(n)). A priority heap would be faster for insertion by far. (Note that running the timeout thread is still O(1) with the linked list, and would remain so with any sane heap implementation.)
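For reference, an array-backed binary min-heap keyed on expiration time could be sketched as follows. The fixed capacity, the toy_heap type, and the function names are assumptions for this example; a real version would store aio pointers and support removal of arbitrary entries on cancellation.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_HEAP_MAX 64

/* Binary min-heap of expiration times; the earliest is at index 0. */
typedef struct {
    uint64_t expire[TOY_HEAP_MAX];
    int      count;
} toy_heap;

/* Insert in O(log n) by sifting up. Returns -1 if the heap is full. */
int toy_heap_push(toy_heap *h, uint64_t expire)
{
    if (h->count == TOY_HEAP_MAX) {
        return -1;
    }
    int i = h->count++;
    while (i > 0) {
        int parent = (i - 1) / 2;
        if (h->expire[parent] <= expire) {
            break;
        }
        h->expire[i] = h->expire[parent];
        i            = parent;
    }
    h->expire[i] = expire;
    return 0;
}

/* Remove the earliest expiration by sifting down. Returns -1 if empty. */
int toy_heap_pop(toy_heap *h, uint64_t *out)
{
    if (h->count == 0) {
        return -1;
    }
    *out          = h->expire[0];
    uint64_t last = h->expire[--h->count];
    int      i    = 0;
    for (;;) {
        int child = 2 * i + 1;
        if (child >= h->count) {
            break;
        }
        if (child + 1 < h->count &&
            h->expire[child + 1] < h->expire[child]) {
            child++;
        }
        if (last <= h->expire[child]) {
            break;
        }
        h->expire[i] = h->expire[child];
        i            = child;
    }
    h->expire[i] = last;
    return 0;
}
```

Looking at the earliest timeout is just a read of index 0, so the timeout thread's check for "is anything due yet" stays constant time.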

Timer priority list improvements

The timer logic uses a logical minheap, implemented as a sorted linked list. Insertions into this list, which can happen reasonably often, are thus O(n).

That said, most of the timeouts are kind of FIFO like... we can pop from the beginning of the list and add to the end of the list -- in general a new timer will have a later timeout than some earlier added ones.

There may be some interesting exceptions to this. For example, survey or req retry timeouts might be quite long, while individual send/recv timeouts might be relatively short.

There are at least three kinds of optimization to consider:

a) Converting to a minheap. This would convert insert into O(log n), but it may have the bad effect of making removal of the min also O(log n). (It also makes removal of an arbitrary node O(log n).)

b) Breaking the list/heap into per-CPU based ones, and running a separate timeout thread for each real CPU. This may reduce some lock contention, and it would lead to generally shorter queues on the lists. (Note that it may break slightly the property that two events, both with timeouts very near to each other, run in a specific order. I do not think we have any requirement that timeouts be strictly ordered.)

c) Breaking the list up into near and long term expirations. Basically two threads (or more) using a binning strategy; this would probably avoid some unpleasant mixing of timeouts coming from very different sources causing longer list traversals, and might insulate the current linked list approach so that it tends to much more often behave in the optimal O(1) case.

My initial instinct is that we probably want some combination of b and/or c. I feel pretty strongly that the O(log(n)) behavior of a true minheap is going to perform poorly.

Note that item "c" could still be backed by a single thread (or single thread per CPU), just looking at the minimum of both lists instead of just one.

A very nice project for an external contributor to do would be to model all these and see how they really behave.
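One cheap experiment that builds on the FIFO-like observation above is to keep the sorted list but search for the insertion point from the tail. The sketch below uses hypothetical toy_timer and toy_timerq types and returns the number of nodes stepped over, so a model can measure how often the O(1) case is actually hit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct toy_timer {
    uint64_t          expire;
    struct toy_timer *prev;
    struct toy_timer *next;
} toy_timer;

/* Doubly linked list kept sorted by expire, earliest at the head. */
typedef struct {
    toy_timer *head;
    toy_timer *tail;
} toy_timerq;

/* Insert scanning backwards from the tail; returns nodes stepped over.
 * A timer later than everything queued costs zero steps. */
int toy_timerq_insert(toy_timerq *q, toy_timer *t)
{
    int        steps = 0;
    toy_timer *after = q->tail;
    while (after != NULL && after->expire > t->expire) {
        after = after->prev;
        steps++;
    }
    t->prev = after;
    t->next = (after != NULL) ? after->next : q->head;
    if (t->next != NULL) {
        t->next->prev = t;
    } else {
        q->tail = t;
    }
    if (after != NULL) {
        after->next = t;
    } else {
        q->head = t;
    }
    return steps;
}

/* Remove and return the earliest timer in O(1), or NULL if empty. */
toy_timer *toy_timerq_pop(toy_timerq *q)
{
    toy_timer *t = q->head;
    if (t == NULL) {
        return NULL;
    }
    q->head = t->next;
    if (q->head != NULL) {
        q->head->prev = NULL;
    } else {
        q->tail = NULL;
    }
    t->next = NULL;
    t->prev = NULL;
    return t;
}
```

Short send/recv timeouts arriving behind long survey or req retry timers are the case where the step count grows, which is the mix that option c tries to separate.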

Transport ops vector should be versioned

With #37 coming in, we need to remember that ops vectors should be versioned.
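One simple way to do this, sketched under the assumption that the version is the first field of the vector and is checked at registration time, is shown below. The structure, the constant, and the function name are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Version the core was built against; bumped on incompatible changes. */
#define TOY_TRAN_VERSION 1u

/* Hypothetical ops vector: the version comes first so it can always be
 * read, even if later fields change between versions. */
typedef struct {
    uint32_t    version;
    const char *scheme;
    /* function pointers would follow here */
} toy_vtran_ops;

/* Accept only ops vectors built against the same version as the core. */
int toy_vtran_check(const toy_vtran_ops *ops)
{
    if (ops == NULL || ops->version != TOY_TRAN_VERSION) {
        return -1;
    }
    return 0;
}
```

Rejecting a mismatched vector up front is safer than calling through function pointers whose layout may have shifted.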

Make protocols "pluggable", or at least optional

We should contemplate ways to remove the hardwired list of protocols. Probably some of these need to be compile time builtin by default, but arguably even then it can be controlled via conditional compilation.

It would also be nice to make it possible for 3rd parties to extend the protocols by giving them a way to register new protocols with the framework. Probably the API needs to be versioned, because the internal guts of the communication between the core and the protocols is not very "clean", and it might be hard to separate them fully. (That said, some of the work there is already done.)

make device() use aios directly

We could save several extra context switches by using AIOs directly from within nni_device.

If we implement #45 this becomes even easier. This can also possibly benefit from #22.

Device (nng_device(), nn_device()) implementation

We need to implement nng_device and co.

Move DNS out of tcp transport

DNS resolver stuff really needs to be done not inside the specific transports, but rather outside. We should think about how best to make use of this in more generic fashion.

Websocket transport

We need a websocket transport.

The websocket client support is straightforward, and providing equivalent server functionality to nanomsg would be easy. It might be worth looking at (thinking about) richer integration into extant web frameworks. That said, it's unclear how that would work with C.

Add epoll() support on Linux

We need an epoll() based poller to increase scalability.

Complete the endpoint API

We specified an API for endpoints, that allowed endpoints to be created, giving the ability to change the endpoint properties, before starting up listen/connect. We need to implement that.

Close pipe by ID

For messages, we could store the PIPE ID as a message property. This could then be used by the application to conditionally close the underlying pipe (if it still exists). Using IDs here would solve questions about use-after-free, and avoid needing to hold any locks or long term reference counts.

Performance needs work

We greatly improved performance recently, but it still needs a lot more work. We are several multiples slower than stock nanomsg and mangos. Lock contention is suspected.

nanomsg compatibility layer

We need to implement an API compatibility shim for nanomsg.

The things we will have to add are:

a) message allocation etc. (NN_MSG)
b) integer socket descriptors (and a socket descriptor table)
c) NN_RECVFD, NN_SENDFD, etc.

Plus a whole bunch of other more straight-forward stuff.