This page contains a high-level overview of the overall organization of roscpp's internals, starting from the bottom up.
General Philosophy
The philosophy for roscpp is that only a very specific set of APIs is externally visible. This means that none of the public headers are allowed to include the private ones, unless those private ones require templating. This means type erasure happens early and often.
In general we try to minimize the number of external headers included. For example, no Boost.Thread headers are allowed in the common public APIs (CallbackQueue is an exception, and should be fixed), and in the future we'd like to remove Boost.Bind as well, since it adds significant compile time.
xmlrpc
We use a modified 3rdparty library called xmlrpcpp. It unfortunately does not allow us to use our own polling code for its sockets, so we spin up a separate thread for it to do its work in.
While we do use xmlrpcpp, all use of it goes through the XMLRPCManager singleton, and all communication with the master goes through the master::execute function.
PollSet/PollManager
At the lowest level, all the non-xmlrpc networking is done using poll(), and is managed by the PollSet class. There is currently one PollSet used, which is managed by the PollManager singleton. PollSet lets you register file descriptors with it, and add events (read or write) to be listened for.
Connections and Transports
The transport system attempts to make transport types (e.g. TCP, UDP) generic at the lowest level. There is a base class, Transport, that is extended by TransportTCP and TransportUDP which provide methods like read() and write(). At a higher level, the Connection class provides a callback-based way of reading and writing data to a transport. The Connection class manages tracking how much data has been sent/received and calling callbacks when the operation has finished.
Connections are tracked by the ConnectionManager class, which mainly manages incoming TCP/UDP connections, as well as keeping pointers to Connection objects when they're dropped until they're allowed to be deleted. When the ConnectionManager receives a new incoming TCP or UDP connection, it will create either a TransportSubscriberLink or ServiceClientLink depending on whether it is a topic or service connection.
Initialization
This is explained in the Overview as well, but I might as well go over it here. There are two parts to roscpp's initialization, the init() method and the start() method. init() does very little -- it simply parses the environment and command line for things like the master URI, local hostname/ip address, remappings, etc. init() is not allowed to actually do any communication -- this is so that the user can manually do things like check if the master is up and have custom behavior if it's not. init() also does not start any threads.
start() performs most of the actual initialization. start() can either be called manually, in which case you are required to call shutdown(), or it will automatically be called by the first NodeHandle created, and then shutdown() will automatically be called when the last NodeHandle is destroyed.
start() will...
- initialize all of the various singletons
- install the SIGINT signal handler if it's supposed to
- create the rosout log4cxx appender 
- create the ~get_loggers and ~set_logger_level services 
- check the /use_sim_time parameter and if it's set, subscribe to /clock 
Topics
Topics are managed by the TopicManager, which keeps the list of Subscriptions and Publications. By the time anything arrives at the TopicManager, any compile-time time information has been erased. This means that, for example, TopicManager::subscribe() only has access to whatever is in the SubscribeOptions class.
Subscriptions
Subscriptions to a topic are managed by the Subscription class. Only one Subscription class is allowed for a topic, and multiple subscribers to the same topic will share a single connection, preventing bandwidth wastage.
The Subscription object manages N PublisherLink objects, which each connect to a different publisher on that topic. There are two types of PublisherLink objects. TransportPublisherLink connects to a publisher using the transport layer. IntraprocessPublisherLink connects to a local, intraprocess publisher, and allows those to skip the transport layer entirely (and provide no-copy message passing when possible).
publisherUpdate
When a new publisher registers with the master, the node will receive a "publisherUpdate" xmlrpc call, which gets forwarded to Subscription::pubUpdate on the right topic. For non-intraprocess connections, this then starts an asynchronous xmlrpc request to the new publisher. When that xmlrpc request goes through, the Subscription::pendingConnectionDone() method is called. Once this request goes through, a new TransportPublisherLink is created for that publisher.
Retry
A TCP TransportPublisherLink that gets disconnected from its publisher will attempt to retry its connection. Its first retry is scheduled for 100ms after the disconnect, and that time doubles every time it fails to connect until it hits 20 seconds, where it is capped.
Subscription Callbacks and Deserialization
When a message is received, it's pushed into a SubscriptionQueue, which handles throwing away old messages if the queue fills up. An important thing to note is that the message has not been deserialized yet. Instead of pushing the message itself onto the queue, a MessageDeserializer object is pushed onto it, and this object is shared by multiple callbacks if possible (if their rtti info matches). This means that:
- Messages that are thrown away because their queue fills up are never deserialized
- Messages are not deserialized until the first callback associated with that subscription is called
Publications
Publications on a topic are managed by the Publication class. Only one Publication class is allowed for a topic, and multiple publishers on the same topic share connections to their subscribers.
The Publication object manages N SubscriberLink objects, which are each connected to a different subscriber on that topic. There are two types of SubscriberLink objects. TransportSubscriberLink connects to a subscriber using the transport layer (and is created by the ConnectionManager), while IntraprocessSubscriberLink connects to an intraprocess subscriber, and skips the transport layer entirely.
No-Copy Intraprocess Support
roscpp uses a combination of boost::shared_ptr and rtti to provide mostly-safe no-copy intraprocess message passing. It's only mostly safe because it relies on the publisher to never modify the message object it has just published. Note that this only applies if the message is passed in as a boost::shared_ptr. Otherwise it currently still requires a serialize()/deserialize(), though it will skip the transport layer.
The path of an intraprocess message is as follows:
- Publisher::publish() 
- TopicManager::publish() - calls Publication::getPublishTypes() to determine what kind of subscribers we have 
- If we have any "serialization required" subscribers, serialize the message
- If we don't have any "no-copy" subscribers, immediately clear out the message pointer and rtti information 
 
- Publication::publish() - Finds the IntraprocessSubscriberLink in this Publication, if any 
- Directly enqueues a message to it, only allowing "no-copy" publication (this may not be necessary)
 
- IntraprocessSubscriberLink::enqueueMessage() 
- IntraprocessPublisherLink::handleMessage() 
- Subscription::handleMessage() - For each subscriber, checks if the rtti info matches. If so, adds the message to its subscription queue.
 
There are some portions of this that aren't really necessary. For performance reasons it's probably a good idea to get rid of IntraprocessPublisherLink/IntraprocessSubscriberLink and just directly connect Publication and Subscription.
Services
Services are managed by the ServiceManager singleton.
Service Servers
A service server is managed by the ServicePublication class, which tracks a number of ServiceClientLink objects. Since only a single server for a given service is allowed at any given time, the same is true in a roscpp node.
The ServiceClientLink object (created by the ConnectionManager) handles a connection to a single service client.
Service Clients
A connection to a service server is managed by a ServiceServerLink. A new ServiceServerLink is created for each service client.
User API
The user API is almost entirely included through the ros.h header, with the exception of callback_queue.h that needs some cleaning up to remove extra includes before it can be included there (it exposes Boost.Thread).
Most of the user API should be pretty self-explanitory, so I won't go over it. The main thing worth going over is the callback mechanism that allows multiple different callback function signatures.
Multiple Callback Signatures
Subscription callbacks allow multiple callback signatures, e.g.:
The mechanism that allows this is called the ParameterAdapter, which is a template class specialized for different types of parameters. For example, the standard (const MsgConstPtr&) signature looks like this:
   1 template<typename M>
   2 struct ParameterAdapter<const boost::shared_ptr<M const>& >
   3 {
   4   typedef typename boost::remove_reference<typename boost::remove_const<M>::type>::type Message;
   5   typedef ros::MessageEvent<Message const> Event;
   6   typedef const boost::shared_ptr<Message const> Parameter;
   7   static const bool is_const = true;
   8 
   9   static Parameter getParameter(const Event& event)
  10   {
  11     return event.getMessage();
  12   }
  13 };
It provides a Message typedef, which is guaranteed non-const and non-reference. It provides the MessageEvent type, as an Event typedef. It provides a Parameter typedef, which returns the actual value to be passed to the callback, and an is_const static that tells us if the message is constant and not allowed to change.
ParameterAdapter is templated on the function signature (not just the message type), and extracts the message type and other information from that.
ParameterAdapter is used in conjunction with SubscriptionCallbackHelperT (also templated on the function signature), which calls the callback like so:
   1 callback_(ParameterAdapter<P>::getParameter(event));
Adding new callback signatures, as long as they're still single-parameter, is as simple as creating a new ParameterAdapter specialization (which can even be done outside the ROS source if necessary).
Callback Queue
The CallbackQueue class is one of the trickiest cases in roscpp, because it has to support so many corner cases. The current implementation supports:
- Ensuring that any callbacks with a specific owner (specified with the removal id passed to addCallback()) will never be called once that removeByID has returned 
- Removing the callback that is currently in progress
- Recursively calling (calling CallbackQueue::callAvailable() or callOne() from within a callback that is getting called from callAvailable() or callOne() on the same queue 
- Threadsafe calls to callAvailable() and callOne() 
These requirements add a lot of complexity to CallbackQueue, and make it not as efficient as it could be. A queue could be written that is much faster and less complex than the current one, if it chooses either a different interface or a different set of requirements.
Removing (or changing the semantics of) callAvailable() could make CallbackQueue much simpler, and potentially remove the need for a lot of the complexity (such as the need for thread-local storage).
Future
Transports
The transport layer was conceived when we only had a TCP transport, with the idea that any necessary changes would happen when UDP was implemented. This didn't happen, and it turns out the way the transport layer was designed was a mistake, and only supports the TCP case well. It is the cause of our lack of good large-message support over UDP.
The main reason the transport layer doesn't work well right now is that it doesn't know about messages. Currently, for example, receiving messages is a "pull" model. TransportPublisherLink asks its Connection for a 4-byte length, and when that arrives, parses, allocates space, and asks for the message itself. Instead, the transport should be pushing messages to the PublisherLink, and the Connection class can be done away with entirely. Essentially, I designed a more generic networking transport system, when I should have designed a message-passing transport system.
Sub/Pub
While roscpp does support no-copy intraprocess message passing, it's not the fastest of implementations. This is mainly because roscpp began not supporting intraprocess message passing at all, and has grown into its current form. Ideally this would swap around, with the intraprocess case becoming the main one, with the network side hooking into it. With some enhancement of the lock-free structures in the lockfree package, it should even be possible to do it in a lock-free manner, though you'd need a lockfree CallbackQueue implementation as well. That's possible, but maybe not with the same guarantees the current CallbackQueue makes -- this is why the CallbackQueue is abstracted into CallbackQueueInterface.
NodeHandle
IMHO a mistake was made with NodeHandle, in that it does far too much, and everything you can do in roscpp ended up as a method in it. This means that now even if all you need is its namespace/remapping functionality, you drag along a lot of baggage. After a fair amount of thought, I'd prefer a Context class that only deals with namespacing and remapping (and has a notion of both its public and private namespaces). I also wish I had made fewer overloads of each function type, and moved those into the XOptions structures, using the named parameter idiom (like TransportHints). Ideally normal subscribers would also follow the message_filters pattern.
Subscribing to a topic might then look something like this:
Note also the use of context.name() above, which should return a Name object rather than a string. This may be going too far, but there are benefits to not using strings directly (like knowing if a name has been remapped already or not). If a string is passed directly, a global context would be used.
Single Master Assumption
There is currently an assumption made in all of roscpp that there is a single master. This is mainly a holdover from the old ros::Node days, which had the same assumption. The main thing that would need to change to allow multiple masters (other than on the user API side) is internally storing the master along with any topic/service. A Name class would help here, as it could include the master.
On the user API side I've been thinking of something like:
If no MasterContext is specified, it would fall back on the global one (created out of the environment/command line), just like it does now.
Global Singletons
There are a number of global singletons in existence right now. I'd like to clean them up into just a single singleton or global accessor, with a class that manages their lifetime, e.g.:
