Change the structure of the configuration file (in a backwards
compatible manner) to allow specifying configurations for multiple
domains in a file. (Listing multiple files in CYCLONEDDS_URI was
already supported.) A configuration specifies an id, with a default of
"any"; configurations for an incompatible id are ignored.
If the application specifies an id other than DDS_DOMAIN_DEFAULT in the
call to create_participant, then only configuration specifications for
Domain elements with that id or with id "any" will be used. If the
application does specify DDS_DOMAIN_DEFAULT, then the id will be taken
from the first Domain element that specifies an id. If none do, the
domain id defaults to 0. Each applicable domain specification is taken
as a separate source and may override settings made previously.
All settings moved from the top-level CycloneDDS element to the
CycloneDDS/Domain element. The CycloneDDS/Domain/Id element moved to
become the "id" attribute of CycloneDDS/Domain. The old locations still
work, with appropriate deprecation warnings.
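As an illustration of the id selection, a minimal sketch using the
existing dds_create_participant API (error handling reduced to the bare
minimum):

    #include "dds/dds.h"

    int main (void)
    {
      /* With an explicit id (e.g., 3), only Domain elements with id 3 or
         with id "any" contribute to the configuration.  With
         DDS_DOMAIN_DEFAULT, the id is taken from the first Domain element
         that specifies one, falling back to 0 if none do. */
      dds_entity_t pp = dds_create_participant (DDS_DOMAIN_DEFAULT, NULL, NULL);
      if (pp < 0)
        return 1;
      (void) dds_delete (pp);
      return 0;
    }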
Signed-off-by: Erik Boasson <eb@ilities.com>
The default participant QoS/plist that is used for defaulting received
QoS and for determining which QoS/plist entries to send in discovery
data was mixed up with the one that contains local process information
such as hostname and process id.
Moreover, it was modified after the protocol stack had been started, and
hence after discovery of remote participants. While unlikely, this could
trigger an assertion failure in plist_or_xqos_mergein_missing.
Signed-off-by: Erik Boasson <eb@ilities.com>
A QoS change can happen at the same time that a new reader for a
built-in topic is provisioned with historical data, which can result in
reading an inconsistent QoS, a use-after-free or other fun things.
During QoS matching it is also necessary to guarantee that the QoS
doesn't change: QoS changes affecting matching will be supported at some
point, and manipulating complex data structures, where bitmasks
determine which parts are defined, while reading the same data
concurrently is a recipe for disaster.
Signed-off-by: Erik Boasson <eb@ilities.com>
The big issue is that there is still only a single log output, which
gets opened on creating a domain and closed on deleting one; but
otherwise at least this minimal test works.
The other issue is that the GC waits until threads in all domains have
made sufficient progress, rather than just the threads in its own
domain.
Signed-off-by: Erik Boasson <eb@ilities.com>
This commit moves all but a handful of the global variables into the
domain object, in particular including the DDSI configuration, globals
and all transport internal state.
The goal of this commit is not to produce the nicest code possible, but
to get a working version that can support multiple simultaneous domains.
Various choices are driven by this desire and it is expected that some
of the changes will have to be undone. (E.g., passing the DDSI globals
into address set operations and locator printing because there is no
other way to figure out what transport to use for a given locator;
storing the transport pointer inside the locator would solve that.)
Signed-off-by: Erik Boasson <eb@ilities.com>
Thread liveliness monitoring moves to dds_global and there is one
monitor running if there is at least one domain that requests it. The
need to synchronize on freeing the thread name when reaping the thread
state is gone, now that the thread name is no longer dynamically
allocated.
Signed-off-by: Erik Boasson <eb@ilities.com>
This moves DDSI stack initialisation and finalisation to the creating
and deleting of a domain, and modifies the related code to trigger all
that from creating/deleting participants.
Built-in topic generation is partially domain-dependent, so that moves
as well. The underlying ddsi_sertopics are domain-independent and can
be created without initialising DDSI, which necessitates moving the IID
generation (and thus its init/fini) out of the DDSI stack and into what
will remain global data.
Signed-off-by: Erik Boasson <eb@ilities.com>
This makes it possible to use different RHC implementations for
different readers and removes the need for the RHC interface to be part
of the global state.
Signed-off-by: Erik Boasson <eb@ilities.com>
The payload in a struct serdata_default is assumed to be at a 64-bit
offset for conversion to/from a dds_{i,o}stream_t and getting padding
calculations in the serialised representation correct. The definition
did not guarantee this and got it wrong on a 32-bit release build.
This commit computes the required padding at compile time and verifies
that the assumption holds where it matters.
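A sketch of the idea; the struct and macro names here are illustrative,
not the actual definitions, and C11 static_assert stands in for whatever
compile-time check the code base uses:

    #include <assert.h>
    #include <stddef.h>

    /* Compute at compile time the padding needed to bring the payload to
       the next 64-bit boundary. */
    #define ALIGN8_PAD(n) ((8u - ((n) % 8u)) % 8u)

    struct serdata_like {
      unsigned int size;      /* 4 bytes */
      unsigned short kind;    /* 2 bytes */
      char pad[ALIGN8_PAD (sizeof (unsigned int) + sizeof (unsigned short))];
      char data[];            /* serialised representation, now at offset 8 */
    };

    /* Verify the assumption where it matters: the payload must sit at a
       64-bit offset for the dds_{i,o}stream_t conversions and padding
       calculations to come out right, on 32-bit builds as well. */
    static_assert (offsetof (struct serdata_like, data) % 8 == 0,
                   "serialised payload must be at a 64-bit offset");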
Signed-off-by: Erik Boasson <eb@ilities.com>
Signed-off-by: Thijs Sassen <thijs.sassen@adlinktech.com>
Adjusted the close method so that it is not expanded by the lwIP close macro, and added a check for DDSI_INCLUDE_SSM to match the correct pid table size.
Signed-off-by: Thijs Sassen <thijs.sassen@adlinktech.com>
Having multiple writers for a single instance is pretty rare, so it makes sense
to lazily allocate the tables for keeping track of them. The more
elegant solution would be to have a single lock-free table.
Signed-off-by: Erik Boasson <eb@ilities.com>
Rather than allocate a HH_HOP_RANGE large array of buckets, allocate
just 1 if the initial size is 1, then jump to HH_HOP_RANGE as soon as a
second element is added to the table. There are quite a few cases where
hash tables are created that will never hold more than 1 (or even 0)
elements (e.g., a writer without readers, a reader for a keyless
topic).
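In pseudo-C, the sizing rule amounts to the following sketch; the
function name is illustrative and 32 is only the customary value of
HH_HOP_RANGE:

    #include <stdint.h>

    #define HH_HOP_RANGE 32u  /* hopscotch neighbourhood; illustrative value */

    /* Bucket-array size as a function of the requested initial size and
       the current number of elements: start with a single bucket when the
       table is created with initial size 1, and jump to HH_HOP_RANGE
       buckets as soon as a second element is added. */
    static uint32_t hh_bucket_count (uint32_t init_size, uint32_t nelems)
    {
      return (init_size == 1 && nelems <= 1) ? 1 : HH_HOP_RANGE;
    }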
Signed-off-by: Erik Boasson <eb@ilities.com>
There were inconsistencies in the order in which entity locks were taken
when multiple entities needed to be locked at the same time. In most
cases, the order was first locking entity X, then locking the parent
entity of X. However, in some cases the order was reversed, a likely
cause of deadlocks.
This commit sorts out these problems, in particular for operations that
propagate into the children. The entity refcount is now part of the
handle administration, so it is no longer necessary to lock an entity to
determine whether it is still allowed to be used (previously the CLOSED
flag had to be checked after locking). This allows recursing into the
children while keeping the handles and the underlying objects alive, but
without violating the lock order.
Attendant changes that would warrant their own commits but are too hard
to split off:
* Children are now no longer in a singly linked list, but in an AVL
tree; this was necessary at some intermediate stage to allow unlocking
an entity and restarting iteration over all children at the "next"
child (all thanks to the eternally unique instance handle);
* Waitsets shifted to using arrays of attached entities instead of
linked lists; this was a consequence of dealing with some locking
issues in reading triggers and considering which operations on the
"triggered" and "observed" sets are actually needed.
* Entity status flags and waitset/condition trigger counts are now
handled using atomic operations. Entities are now classified as
having a "status" with a corresponding mask, or as having a "trigger
count" (conditions). As there are fewer than 16 status bits, the
status and its mask can be squeezed into the same 32 bits as the
trigger count. These atomic updates avoid the need for a separate lock
just for the trigger/status values and result in a significant speedup
with waitsets (see the sketch after this list).
* Create topic now has a more rational behaviour when multiple
participants attempt to create the same topic: each participant now
gets its own topic definition, but the underlying type representation
is shared.
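A sketch of the status/mask packing referred to above, using the ddsrt
atomics; the 16/16 split, the macro values and the helper names are
illustrative, not the actual layout:

    #include <stdint.h>
    #include "dds/ddsrt/atomics.h"

    #define SW_STATUS_MASK   0x0000ffffu   /* low half: status bits */
    #define SW_ENABLED_SHIFT 16            /* high half: enabled-status mask */

    /* Raise a status bit with a single atomic OR; no lock needed. */
    static void signal_status (ddsrt_atomic_uint32_t *sw, uint32_t bit)
    {
      ddsrt_atomic_or32 (sw, bit & SW_STATUS_MASK);
    }

    /* A status is only visible to the application if it is both set and
       enabled in the mask stored in the same word. */
    static int status_visible (ddsrt_atomic_uint32_t *sw, uint32_t bit)
    {
      const uint32_t v = ddsrt_atomic_ld32 (sw);
      return (v & bit & SW_STATUS_MASK) != 0
          && (v & (bit << SW_ENABLED_SHIFT)) != 0;
    }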
Signed-off-by: Erik Boasson <eb@ilities.com>
Add the instance handle to the DDSC entity type, initialize it properly
for all types, and remove the per-type handling of
dds_get_instance_handle. Those entities that have a DDSI variant take
the instance handle from DDSI (which plays tricks to get the instance
handles of the entities matching the built-in topics). For those that
do not have a DDSI variant, a unique identifier is generated using the
same generator that DDSI uses.
Signed-off-by: Erik Boasson <eb@ilities.com>
Thread sanitizer warns about reads and writes of variables that are
meant to be read without holding a lock:
* Global "keep_going" is now a ddsrt_atomic_uint32_t
* Thread "vtime" is now a ddsrt_atomic_uint32_t
Previously the code relied on the assumption that a 32-bit int would be
treated as atomic; now all of that is wrapped in ddsrt_atomic_{ld,st}32.
As these are inline functions doing exactly the same thing, there is no
functional change, but it does allow annotating the loads and stores via
function attributes on the ddsrt_atomic_{ld,st}X.
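For example, using the ddsrt_atomic_{ld,st}32 wrappers named above (the
static initializer macro name is an assumption):

    #include "dds/ddsrt/atomics.h"

    /* Previously a plain uint32_t read and written without a lock; now the
       same accesses go through the atomic wrappers so the intent is
       visible to thread sanitizer. */
    static ddsrt_atomic_uint32_t keep_going = DDSRT_ATOMIC_UINT32_INIT (1);

    static void request_stop (void) { ddsrt_atomic_st32 (&keep_going, 0); }
    static int  still_going (void)  { return ddsrt_atomic_ld32 (&keep_going) != 0; }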
The concurrent hashtable implementation is replaced by a locked version
of the non-concurrent implementation if thread sanitizer is used. This
change eliminates the scores of problems signalled by thread sanitizer
in the GUID-to-entity translation and the key-to-instance id lookups.
Other than that, a flag used in a waitset test case has been changed
into a ddsrt_atomic_uint32_t as well.
Signed-off-by: Erik Boasson <eb@ilities.com>
* fix calls to ddsrt_memdup and ddsrt_strdup with a null pointer (they
handle it gracefully, but the interface forbids it ...)
* replacement of all pre-C99 emulations of flexible arrays (i.e.,
declaring a member as array[1], then mallocing extra space and using it
as if it were array[N]) by C99 flexible array members (see the sketch
after this list).
* also add a missing null-pointer test in dds_dispose_ts, and fix the
test cases that pass a null pointer and a non-writer handle to it to
instead pass an invalid address.
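The flexible-array replacement, sketched with an illustrative struct
(not one from the code base):

    #include <stddef.h>
    #include <stdlib.h>

    /* Pre-C99 idiom being replaced:
         struct seq_old { unsigned n; int xs[1]; };
         ... malloc (offsetof (struct seq_old, xs) + n * sizeof (int)) ...
       C99 flexible array member as now used: */
    struct seq_new {
      unsigned n;
      int xs[];               /* flexible array member */
    };

    static struct seq_new *seq_new_alloc (unsigned n)
    {
      struct seq_new *s =
        malloc (offsetof (struct seq_new, xs) + n * sizeof (int));
      if (s != NULL)
        s->n = n;
      return s;
    }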
Signed-off-by: Erik Boasson <eb@ilities.com>
* dds_set_allocator
* dds_set_aligned_allocator
The intent behind them is good, but the approach is too primitive ... there
is far more work to be done for managing dynamic allocation in a
meaningful way.
Signed-off-by: Erik Boasson <eb@ilities.com>
Missing prototypes for exported functions cause a really huge issue on
Windows. Enabling the "missing prototypes" warning makes it much easier
to catch this problem. Naturally, any warnings caused by this have been
fixed.
Signed-off-by: Erik Boasson <eb@ilities.com>
This commit adds support for changing all mutable QoS except those that
affect reader/writer matching (i.e., deadline, latency budget and
partition). This is simply because the recalculation of the matches
hasn't been implemented yet; it is not a fundamental limitation.
Implementing this basically forced fixing up a bunch of inconsistencies
in handling QoS in entity creation. A silly multi-process ping-pong
test built on changing the value of user data has been added.
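For example, changing the user data (one of the now-mutable QoS) on an
existing entity; a minimal sketch with error handling elided:

    #include <string.h>
    #include "dds/dds.h"

    static void set_userdata (dds_entity_t entity, const char *value)
    {
      dds_qos_t *qos = dds_create_qos ();
      dds_qset_userdata (qos, value, strlen (value));
      (void) dds_set_qos (entity, qos);  /* fails for immutable QoS */
      dds_delete_qos (qos);
    }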
Signed-off-by: Erik Boasson <eb@ilities.com>
These topics are generated internally and never sent over the wire.
Performing full discovery for these is therefore a significant waste of
effort.
Signed-off-by: Erik Boasson <eb@ilities.com>
The old parameter list parsing was a mess of custom code with tons of
duplicated checks, even though parameter list parsing really is a fairly
straightforward affair. This commit changes it to a mostly table-driven
implementation, where the vast majority of the settings are handled by a
generic deserializer and the irregular ones (like reliability, locators)
are handled by custom functions. The crazy ones (IPv4 address and port)
rely on additional state and are completely special-cased.
Given these tables, the serialization, finalisation, validation,
merging, unalias'ing can all be handled by a very small amount of custom
code and an appropriately defined generic function for the common cases.
This also makes it possible to have all QoS validation in place, and so
removes the need for the specialized implementations for the various
entity kinds in the upper layer.
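The shape of such a table, as a hedged sketch; the names and fields are
assumptions, not the actual plist table definition:

    #include <stddef.h>
    #include <stdint.h>

    /* One entry per regular parameter: where the field lives, what its
       in-memory type is, and which "present" bit records that it was set.
       A single generic routine can then deserialise, validate, merge,
       free and serialise all of these; only the irregular parameters need
       hand-written functions. */
    enum pfield_type { PF_UINT32, PF_DURATION, PF_STRING };

    struct pfield_desc {
      uint16_t parameterid;   /* wire-level parameter id */
      enum pfield_type type;  /* how to (de)serialise the field */
      size_t offset;          /* offsetof the field within the plist/QoS */
      uint64_t present_flag;  /* bit in the "present" mask for this field */
    };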
QoS settings inapplicable to an entity were previously ignored, allowing one to
have invalid values set in a QoS object when creating an entity,
provided that the invalid values are irrelevant to that entity. Whether
this is a good thing or not is debatable, but certainly it is a good
thing to avoid copying in inapplicable QoS settings. That in turn means
the behaviour of the API can remain the same.
It does turn out that the code used to return "inconsistent QoS" also
for invalid values. That has now been rectified, and it returns
"inconsistent QoS" or "bad parameter" as appropriate. Tests have been
updated accordingly.
Signed-off-by: Erik Boasson <eb@ilities.com>
There are some cases where "int" or "unsigned" actually makes sense, but
in a large number of cases it is really supposed to be either a 32-bit
integer, or, in some cases, at least a 32-bit integer. It is much better
to be explicit about this.
Another reason is that at least some embedded platforms define, e.g.,
int32_t as "long" instead of "int". For the ones I am aware of, "int"
and "long" are actually the same 32-bit integer, but the distinction can
cause trouble with printf format specifications. So again a good reason
to be consistent in avoiding the implementation-defined types.
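A small example of the printf pitfall that consistent use of the
fixed-width types (plus the inttypes.h macros) avoids:

    #include <inttypes.h>
    #include <stdio.h>

    static void print_count (int32_t count)
    {
      /* "%d" is only correct if int32_t happens to be "int"; PRId32 always
         matches whatever type the platform chose for int32_t. */
      printf ("count = %" PRId32 "\n", count);
    }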
Signed-off-by: Erik Boasson <eb@ilities.com>
The functions did not touch the callback pointer if a null pointer was
passed in for the listener. That means one would have to initialize the
out parameter before the call or manually check the listener pointer to
know whether the callback pointer has a defined value following the call.
That's asking for trouble.
Thus, the decision to return a callback of 0 when no listener object is
passed in.
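With that, code along the following lines works without pre-initialising
the output; a sketch using one of the listener getters, assuming (as
described above) that a null listener yields a callback of 0:

    #include <stdbool.h>
    #include "dds/dds.h"

    static bool has_data_available_cb (const dds_listener_t *listener)
    {
      dds_on_data_available_fn fn;
      dds_lget_data_available (listener, &fn); /* fn == 0 if listener is NULL */
      return fn != 0;
    }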
Signed-off-by: Erik Boasson <eb@ilities.com>
All this duplication was rather useless: the values are standardized
anyway and the conversion was a simple type cast without any check.
This commit unifies the definitions.
* DDSI now uses the definitions of the various QoS "kind" values from
the header file;
* The durations in the QoS objects are no longer in wire-format
representation; the conversions now happen only when converting to/from
the wire format;
* The core DDSI stack no longer uses DDSI time representations for time
stamps, instead using the "native" one;
* QoS policy ids duplication has been eliminated, again using the IDs
visible in the API -- the actual values are meaningless to the DDSI
stack anyway.
Signed-off-by: Erik Boasson <eb@ilities.com>
Code formatting was quite a mess (different indentation, completely
different ideas on where opening braces should go, spacing in various
places, early out versus single return or goto-based error handling,
&c.). This commit cleans it up.
A few doxygen comment fixes allowed turning on Clang's warnings for
doxygen comments, so those are now enabled by default, at least on
Xcode-based builds.
Signed-off-by: Erik Boasson <eb@ilities.com>
* Remove dds_return_t / dds_retcode_t distinction (now there is only
dds_return_t and all error codes are always negative)
* Remove Q_ERR_... error codes and replace them by DDS_RETCODE_...
ones so that there is only one set of error codes
* Replace a whole bunch "int" return types that were used to return
Q_ERR_... codes by "dds_return_t" return types
Signed-off-by: Erik Boasson <eb@ilities.com>
Two bits of the DDSI encoding "options" field are used by the XTypes
spec to indicate the amount of padding that had to be added at the end
to reach the nearest 4-byte boundary as required by the DDSI message
format.
These bits are now set in accordance with the spec, and for received
samples, the padding is subtracted from the inferred size of the data so
that, e.g., a struct T { octet x; } will never deserialise as a struct S
{ octet x, y; }.
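On the receive side the adjustment amounts to something like this
sketch, assuming (as in XTypes) that the two least-significant bits of
the options field carry the pad count:

    #include <stdint.h>

    static uint32_t cdr_effective_size (uint16_t options, uint32_t size_incl_pad)
    {
      const uint32_t pad = options & 0x3u;   /* 0..3 bytes of padding */
      return (pad <= size_incl_pad) ? size_incl_pad - pad : 0u;
    }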
Signed-off-by: Erik Boasson <eb@ilities.com>
The CDR deserializer failed to check it was staying within the bounds of
the received data, and it turns out it also was inconsistent in its
interpretation of the (undocumented) serializer instructions. This
commit adds some information on the instruction format obtained by
reverse engineering the code and studying the output of the IDL
preprocessor, and furthermore changes a lot of the types used in the
(de)serializer code to have some more compiler support. The IDL
preprocessor is untouched and the generated instructions do exactly the
same thing (except where change was needed).
The bulk of this commit replaces the implementation of the
(de)serializer. It is still rather ugly, but at least the very long
functions with several levels of nested conditions and switch statements
have been split out into multiple functions. Most of these have single
call-sites, so the compiler hopefully inlines them nicely.
The other important thing is that it adds a "normalize" function that
validates the structure of the CDR and performs byteswapping if
necessary. This means the deserializer can now assume a well-formed
input in native byte-order. Checks and conditional byteswaps have been
removed accordingly.
It changes some types to make a compile-time distinction between
read-only, native-endianness input, a native-endianness output, and a
big-endian output for dealing with key hashes. This should reduce the
risk of accidentally mixing endianness or modifying an input stream.
The preprocessor has been modified to indicate the presence of unions in
a topic type in the descriptor flags. If a union is present, any
memory allocated in a sample is freed first and the sample is zero'd out
prior to deserializing the new value. This is to prevent reading
garbage pointers for strings and sequences when switching union cases.
The test tool has been included in the commit but it does not get run by
itself. Firstly, it requires the presence of OpenSplice DDS as an
alternative implementation to check the CDR processing against.
Secondly, it takes quite a while to run and is of no interest unless one
changes something in the (de)serialization.
Finally, I have no idea why there was a "CDR stream" interface among the
public functions. The existing interfaces are fundamentally broken by
the removal of arbitrary-endianness streams, and the interfaces were
already incapable of proper error notification. So, they have been
removed.
Signed-off-by: Erik Boasson <eb@ilities.com>
In all cases where read/take allocates memory for storing samples but
the result turns out to be an empty set, the (observable) state of the
system should end up unchanged.
Several cases need to be considered:
* application supplies buffers (i.e., buf[0] != NULL): no memory
allocated, so no issue.
* reader has no cached set ("m_loan" in the current code): read/take
allocated memory, cached the address and marked it as in use
("m_loan_out"), and modified buf[0] (and subsequent entries).
To undo this on returning an empty set, it now: resets the
"m_loan_out" flag to allow the cached buffer to be reused, and sets
buf[0] back to NULL.
* reader has a cached set, but it is not marked in use: read/take
marked it as in use and modified buf[0] (and subsequent entries).
To undo this, it now resets "m_loan_out" to indicate the cached buffer
is not in use, and sets buf[0] back to NULL.
* reader has a cached set that is currently in use: read/take allocated
memory and updated buf[0] (and subsequent entries) but left the cached
state alone.
To undo this, it now frees the memory and sets buf[0] back to NULL.
With this, in any path where the application lets dds_read/dds_take
allocate memory for the samples:
* it can still safely call dds_return_loan with buf[0] and the actual
return value of read/take (even if an error code), and whatever memory
was allocated will not be leaked;
* but it no longer has to do so when the result was empty (or error).
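In other words, the following pattern is always safe and leak-free,
whatever read/take returns (a sketch; "10" is an arbitrary batch size):

    #include "dds/dds.h"

    static void drain (dds_entity_t reader)
    {
      void *buf[10] = { NULL };     /* buf[0] == NULL: let the library allocate */
      dds_sample_info_t info[10];
      const dds_return_t n = dds_take (reader, buf, info, 10, 10);
      if (n > 0)
      {
        /* process buf[0] .. buf[n-1] ... */
      }
      /* Safe for n > 0, n == 0 and error codes alike; nothing leaks.  When
         the result was empty (or an error) this call is now optional. */
      (void) dds_return_loan (reader, buf, n);
    }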
Signed-off-by: Erik Boasson <eb@ilities.com>
The primary reason is that this allows the implementer of the sertopic
to freely select an allocation strategy, instead of being forced to
allocate the sertopic itself and the names it contains in the common
header with ddsrt_malloc. The secondary reason is that it brings it in
line with the serdata.
Signed-off-by: Erik Boasson <eb@ilities.com>
The name parameter and the name in the sertopic parameter had to match
because one was used as the key in a lookup checking whether the topic
already exists, and the other as the key for the nodes in that index. As
the name is (currently) included in the sertopic, it shouldn't be passed
in separately as well.
Signed-off-by: Erik Boasson <eb@ilities.com>
The entirely historical "DDSI2E" element within the CycloneDDS
configuration element is herewith eliminated. All settings contained in
that element (such as General, Discovery, Tracing) are now subelements
of the CycloneDDS top-level element. Old configurations continue to
work but will print a deprecation warning:
//CycloneDDS/DDSI2E: settings moved to //CycloneDDS
Any warnings/errors related to an element //CycloneDDS/DDSI2E/x will be
reported as if for the new location, that is, for //CycloneDDS/x.
As the "settings moved" warning always precedes any other such warning,
confusion will hopefully be avoided.
Signed-off-by: Erik Boasson <eb@ilities.com>
These settings all constitute settings from the long history of the DDSI
stack predating Eclipse Cyclone DDS and can reasonably be presumed never
to have been used in Cyclone. Their removal is therefore not expected
to break backwards compatibility (which would anyway be limited to
Cyclone complaining about undefined settings at startup):
* Tracing/Timestamps[@absolute]: has always been ignored
* Tracing/Timestamps: has always been ignored
* General/EnableLoopback: ignored for quite some time; before that,
changing it from the default resulted in crashes.
* General/StartupModeDuration: it did what it advertised (retain data in
the history caches of volatile writers as-if they were transient-local
with a durability history setting of keep-last 1 for the first few
seconds after startup of the DDSI stack) but had no purpose other than
complicating things as the volatile readers ignored the data anyway.
* General/StartupModeCoversTransient: see previous -- and besides,
transient data is not supported yet in Cyclone.
* Compatibility/RespondToRtiInitZeroAckWithInvalidHeartbeat: arguably a
good setting given that DDSI < 2.3 explicitly requires that all
HEARTBEAT messages sent by a writer advertise the existence of at least
1 sample, but this has been fixed in DDSI 2.3. As this requirement was
never respected by most DDSI implementations, there is no point in
retaining the setting, while removing it does away with a rather tricky problem
immediately after writer startup involving the conjuring up of a
sample that was annihilated immediately before it could have been
observed.
That conjuring up (as it turns out) can cause a malformed message to go
out (one that is harmless in itself). Fixing the generation of that
malformed message while the entire point of the trick is moot in DDSI
2.3 is a bit silly.
Note that full DDSI 2.3 compliance needs a bit more work, so not
bumping the DDSI protocol version number yet.
* Compatibility/AckNackNumbitsEmptySet: changing it from 0 breaks
compatibility with (at least) RTI Connext, and its reason for
existence disappears with a fix in DDSI 2.3.
* Internal/AggressiveKeepLastWhc: changing the setting from the default
made no sense whatsoever in Cyclone -- it would only add flow-control
and potentially block a keep-last writer where the spec forbids that.
* Internal/LegacyFragmentation: a left-over from almost a decade ago when
it was discovered that the specification was inconsistent in the use
of the message header flags for fragmented data, and this stack for a
while used the non-common interpretation. There is no reasonable way of
making the two modes compatible, and this setting merely existed to
deal with the compatibility issue with some ancient OpenSplice DDS
version.
* Durability/Encoding: historical junk.
* WatchDog and Lease: never had any function in Cyclone.
Signed-off-by: Erik Boasson <eb@ilities.com>
High sample rates require rather high rates of allocating and freeing
WHC nodes, serialised samples (serdata), and RTPS message fragments
(xmsg). A bunch of dedicated parallel allocators help take some
pressure off the regular malloc/free calls. However, these used to
gobble up memory like crazy, in part because of rather generous limits,
and in part because there was no restriction on the size of the samples
that would be cached, and it could end up caching large numbers of
multi-MB samples. It should be noted that there is no benefit to
caching large samples anyway, because the sample rate will be that much
lower.
This commit reduces the maximum number of entries for all three cases;
it furthermore limits the maximum size of a serdata or xmsg that can be
cached; and finally, instead of instantiating a separate allocator for
WHC nodes per WHC, one allocator is now shared across all WHCs. Total
memory use
should now be limited to a couple of MB.
The caching can be disabled by setting ``FREELIST_TYPE`` to
``FREELIST_NONE`` in ``q_freelist.h``.
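That is, disabling the caching amounts to selecting, in ``q_freelist.h``:

    #define FREELIST_TYPE FREELIST_NONE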
Signed-off-by: Erik Boasson <eb@ilities.com>