There were inconsistencies in the order in which entity locks were taken
when multiple entities needed to be locked at the same time. In most
cases, the order was first locking entity X, then locking the parent
entity of X. However, in some cases the order was reversed, a likely
cause of deadlocks.
This commit sorts out these problems, in particular for operations that
propagate into children. The entity refcount is now part of the handle
administration, so it is no longer necessary to lock an entity to
determine whether it may still be used (previously the CLOSED flag had
to be checked afterward). This makes it possible to recurse into the
children while the handles keep the underlying objects alive, without
violating the lock order.
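A minimal sketch of the discipline this enables (names invented for
illustration, not the actual Cyclone DDS internals): pinning a handle
bumps a refcount and checks validity without taking any entity lock, so
an iteration can keep a child alive, drop the parent lock, and then lock
the child, only ever locking parent-before-child:

    #include <stdbool.h>
    #include <stdint.h>

    struct entity {
      uint32_t hdl_refcount;  /* part of the handle administration */
      bool hdl_closed;        /* validity check needs no entity lock */
      /* ... mutex, parent, children ... */
    };

    /* Pin: keep the entity alive across a window where no lock is held;
       the real code would do this with a single atomic operation. */
    static bool entity_pin (struct entity *e)
    {
      if (e->hdl_closed)
        return false;
      e->hdl_refcount++;
      return true;
    }

    static void entity_unpin (struct entity *e)
    {
      e->hdl_refcount--;
    }

Resuming the iteration at the "next" child after re-locking the parent
is what the AVL tree in the list below makes possible.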
Attendant changes that would warrant their own commits but are too hard
to split off:
* Children are now no longer in a singly linked list, but in an AVL
tree; this was necessary at some intermediate stage to allow unlocking
an entity and restarting iteration over all children at the "next"
child (all thanks to the eternally unique instance handle);
* Waitsets shifted to using arrays of attached entities instead of
linked lists; this was a consequence of dealing with some locking
issues in reading triggers and considering which operations on the
"triggered" and "observed" sets are actually needed.
* Entity status flags and waitset/condition trigger counts are now
handled using atomic operations. Entities are now classified as
having a "status" with a corresponding mask, or as having a "trigger
count" (conditions). As there are fewer than 16 status bits, the
status and its mask can be squeezed into the same 32 bits as the
trigger count. These atomic updates avoid the need for a separate
lock just for the trigger/status values and result in a significant
speedup with waitsets (see the sketch following this list).
* Create topic now has a more rational behaviour when multiple
participants attempt to create the same topic: each participant now
gets its own topic definition, but the underlying type representation
is shared.
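A hedged sketch of the packing described in the third bullet (field and
constant names are illustrative, not the actual code): the low half of
the 32-bit word holds the status bits, the high half the enabled mask,
and conditions reuse the whole word as a trigger count:

    #include <stdatomic.h>
    #include <stdint.h>

    #define SAM_MASK_SHIFT 16u  /* low 16 bits: status; high 16: mask */

    typedef struct { _Atomic uint32_t status_and_mask; } status_word_t;
    typedef struct { _Atomic uint32_t trigger_count; } trigger_word_t;

    /* Set a status bit; returns nonzero if the corresponding mask bit
       is enabled, in which case the caller signals attached waitsets. */
    static int status_set (status_word_t *s, uint16_t bit)
    {
      uint32_t old = atomic_fetch_or (&s->status_and_mask, (uint32_t) bit);
      return ((old >> SAM_MASK_SHIFT) & bit) != 0;
    }

    /* Conditions count triggers instead; nonzero means "triggered". */
    static void condition_trigger (trigger_word_t *c)
    {
      atomic_fetch_add (&c->trigger_count, 1u);
    }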
Signed-off-by: Erik Boasson <eb@ilities.com>
The old parameter list parsing was a mess of custom code with tons of
duplicated checks, even though parameter list parsing really is a fairly
straightforward affair. This commit changes it to a mostly table-driven
implementation, where the vast majority of the settings are handled by a
generic deserializer and the irregular ones (like reliability, locators)
are handled by custom functions. The crazy ones (IPv4 address and port)
rely on additional state and are completely special-cased.
Given these tables, serialization, finalisation, validation, merging,
and unaliasing can all be handled by a very small amount of custom code
and an appropriately defined generic function for the common cases.
This also makes it possible to have all QoS validation in place, and so
removes the need for the specialized implementations for the various
entity kinds in the upper layer.
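An illustrative sketch of the table-driven approach (the entry layout,
PID value, and names are invented for exposition, not Cyclone's actual
plist tables): one generic deserializer interprets regular entries, and
irregular settings carry a custom function instead:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    enum ptype { PT_U32, PT_DURATION /* , ... */ };

    struct qos { uint32_t ownership_strength; /* ... */ };

    struct plist_entry {
      uint16_t pid;     /* parameter id on the wire */
      enum ptype type;  /* drives the generic deserializer */
      size_t offset;    /* where the value lives in the QoS object */
      int (*special) (struct qos *q, const void *src, size_t sz);
    };

    static const struct plist_entry plist_table[] = {
      /* regular: one table row, no custom code */
      { 0x0006, PT_U32, offsetof (struct qos, ownership_strength), NULL }
      /* reliability, locators, ... would carry a `special` function */
    };

    static int deser_generic (struct qos *q, const struct plist_entry *e,
                              const void *src, size_t sz)
    {
      if (e->special)
        return e->special (q, src, sz);
      switch (e->type) {
        case PT_U32:
          if (sz < 4) return -1;  /* validation comes for free */
          memcpy ((char *) q + e->offset, src, 4);
          return 0;
        default:
          return -1;
      }
    }

Serialization, validation, and merging can walk the same table, which
is what removes the per-entity specializations in the upper layer.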
QoS settings inapplicable to an entity were previously ignored, allowing
one to have invalid values set in a QoS object when creating an entity,
provided that the invalid values are irrelevant to that entity. Whether
this is a good thing or not is debatable, but certainly it is a good
thing to avoid copying in inapplicable QoS settings. That in turn means
the behaviour of the API can remain the same.
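A small sketch of the "copy only what applies" idea (masks and names
hypothetical):

    #include <stdint.h>

    #define QP_HISTORY  (1u << 0)
    #define QP_LIFESPAN (1u << 1)
    /* ... one bit per policy ... */

    struct qos { uint32_t present; /* plus the policy values */ };

    /* Inapplicable policies never make it into the entity's QoS, so an
       invalid-but-irrelevant value in the source stays harmless. */
    static void qos_copy_applicable (struct qos *dst, const struct qos *src,
                                     uint32_t applicable)
    {
      dst->present = src->present & applicable;
      /* ... copy the corresponding policy values ... */
    }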
It does turn out that the code used to return "inconsistent QoS" also
for invalid values. That has now been rectified, and it returns
"inconsistent QoS" or "bad parameter" as appropriate. Tests have been
updated accordingly.
Signed-off-by: Erik Boasson <eb@ilities.com>
All this duplication was rather useless: the values are standardized
anyway and the conversion was a simple type cast without any check.
This commit unifies the definitions.
* DDSI now uses the definitions of the various QoS "kind" values from
the header file;
* The durations in the QoS objects are no longer in wire-format
representation; the conversions now happen only when translating
to/from the wire format (a sketch follows this list);
* The core DDSI stack no longer uses DDSI time representations for time
stamps, instead using the "native" one;
* The duplication of QoS policy IDs has been eliminated, again using
the IDs visible in the API -- the actual values are meaningless to the
DDSI stack anyway.
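A sketch of confining the conversion to the wire-format boundary
(assuming the DDSI seconds + 2^-32-fraction wire representation and
nanosecond native durations; nonnegative values only, for brevity):

    #include <stdint.h>

    typedef int64_t dds_duration_t;  /* native: nanoseconds */
    typedef struct { int32_t seconds; uint32_t fraction; } wire_duration_t;

    #define NSECS_IN_SEC 1000000000LL

    /* Only (de)serialization calls this; QoS objects stay native. */
    static wire_duration_t duration_to_wire (dds_duration_t d)
    {
      wire_duration_t w;
      w.seconds = (int32_t) (d / NSECS_IN_SEC);
      /* fraction in units of 2^-32 s: ns * 2^32 / 10^9 (fits 64 bits) */
      w.fraction = (uint32_t) (((d % NSECS_IN_SEC) << 32) / NSECS_IN_SEC);
      return w;
    }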
Signed-off-by: Erik Boasson <eb@ilities.com>
A long-standing bug of Cyclone is that a sample written immediately
after a publication-matched event may never arrive at the reader that
was just matched. This happened because the reader need not have
completed discovery of the writer by the time the writer discovers the
reader, at which point the reader ignores the sample because it either
doesn't know the writer at all, or it hasn't yet seen a Heartbeat from
it.
That Heartbeat arrives shortly after, but by then it is too late: the
reader simply decides to accept the next sample to be written by the
writer. (It has no choice, really: either you risk losing some data, or
you request all historical data, which is emphatically not what a
volatile reader is about ...)
A related issue is the handling of historical data for transient-local
readers: it used to deliver this out-of-order, but that is firstly
against the specification, and secondly, against reasonable expectations
of those who use DDS as a mere publish-subscribe messaging system. To
add insult to injury, it didn't completely handle some reordering issues
with disposes ...
This commit changes the way writers respond to a request for
retransmission from volatile proxy readers and the way the
in-sync/out-of-sync setting of a reader with respect to a proxy-writer
is used. The first makes it safe for a Cyclone reader to ask a Cyclone
writer for all data (these details are not covered in the specs, so it
errs on the reasonable side for other vendors, though that may cause the
data loss mentioned above): the writer simply sends a Gap message to the
reader for all the sequence numbers prior to the matching.
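A sketch of the writer-side rule (names invented; the real code sits in
the retransmit-request handling):

    #include <stdint.h>

    typedef uint64_t seqno_t;

    struct proxy_reader {
      seqno_t match_start_seq;  /* first sequence number after matching */
      int is_volatile;
    };

    static void send_gap (struct proxy_reader *prd, seqno_t s, seqno_t e)
    { (void) prd; (void) s; (void) e; /* ... */ }
    static void retransmit_from (struct proxy_reader *prd, seqno_t s)
    { (void) prd; (void) s; /* ... */ }

    /* Anything requested from before the match point is answered with a
       Gap, so a reader may safely request everything from seqno 1. */
    static void handle_retransmit_request (struct proxy_reader *prd,
                                           seqno_t first)
    {
      if (prd->is_volatile && first < prd->match_start_seq) {
        send_gap (prd, first, prd->match_start_seq - 1);
        first = prd->match_start_seq;
      }
      retransmit_from (prd, first);
    }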
The second changes the rule for switching from out-of-sync to in-sync:
that transition is now simply once the next sequence number to be
delivered to the reader equals the next sequence number that will be
delivered directly from the proxy writer object to all readers. (I.e.,
a much more intuitive notion than reaching some seemingly arbitrary
sequence number.)
To avoid duplicates the rule for delivery straight from a proxy writer
has changed: where samples were delivered from the proxy writer to all
matching readers, they are now delivered only to the matching readers
that are in-sync. To avoid ordering problems, the idea that historical
data can be delivered through the asynchronous delivery path even when
the regular data goes through the synchronous delivery path has been
abandoned. All data now always follows the same path.
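A sketch combining the two rules above (illustrative structures, not the
actual ones): an out-of-sync reader switches over exactly when it has
caught up, and direct delivery from the proxy writer skips readers that
are not yet in-sync, so nothing is ever delivered twice:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    typedef uint64_t seqno_t;

    struct proxy_writer { seqno_t next_deliv_seq; /* next direct deliv */ };
    struct rd_match { bool in_sync; seqno_t next_seq; struct rd_match *next; };

    static void update_sync_state (struct rd_match *m,
                                   const struct proxy_writer *pwr)
    {
      if (!m->in_sync && m->next_seq == pwr->next_deliv_seq)
        m->in_sync = true;
    }

    static void deliver_direct (struct proxy_writer *pwr, struct rd_match *ms,
                                void (*deliver) (struct rd_match *))
    {
      for (struct rd_match *m = ms; m != NULL; m = m->next)
        if (m->in_sync)
          deliver (m);
      pwr->next_deliv_seq++;
    }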
As these same mechanisms are used for getting historical data into
transient-local readers, the ordering problem for the historical data
also disappeared.
The test stuff in src/core/xtests/initsampledeliv covers a lot of the
interesting cases: data published before the existence of a reader, after
it, mixes of volatile and transient-local. Running them takes quite a
bit of time, and they are not yet integrated in the CI builds (if ever,
because of that time).
Note: the "conservative built-in startup" option has been removed,
because it really makes no sense to keep a vague compatibility option
added a decade ago "just in case" that has never been used ...
Note: the workaround in the src/mpt/tests/basic/procs/hello.c (use
transient-local to ensure delivery of data) has been removed, as has
been its workaround for the already-fixed #146.
Signed-off-by: Erik Boasson <eb@ilities.com>
The CDR deserializer failed to check that it stayed within the bounds of
the received data, and it turns out it was also inconsistent in its
interpretation of the (undocumented) serializer instructions. This
commit adds some information on the instruction format obtained by
reverse engineering the code and studying the output of the IDL
preprocessor, and furthermore changes a lot of the types used in the
(de)serializer code to have some more compiler support. The IDL
preprocessor is untouched and the generated instructions do exactly the
same thing (except where a change was needed).
The bulk of this commit replaces the implementation of the
(de)serializer. It is still rather ugly, but at least the very long
functions with several levels of nested conditions and switch statements
have been split out into multiple functions. Most of these have single
call-sites, so the compiler hopefully inlines them nicely.
The other important thing is that it adds a "normalize" function that
validates the structure of the CDR and performs byteswapping if
necessary. This means the deserializer can now assume a well-formed
input in native byte-order. Checks and conditional byteswaps have been
removed accordingly.
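A minimal sketch of the normalize step (names illustrative): bounds-check
first, byteswap in place, so everything downstream can assume well-formed
native-endian data:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    static uint32_t bswap32 (uint32_t x)
    {
      return (x >> 24) | ((x >> 8) & 0xff00u) |
             ((x << 8) & 0xff0000u) | (x << 24);
    }

    /* Normalize one uint32 at *off; returns false on truncated input. */
    static bool normalize_uint32 (uint8_t *data, size_t size, size_t *off,
                                  bool byteswap)
    {
      *off = (*off + 3u) & ~(size_t) 3u;  /* CDR 4-byte alignment */
      if (size < 4u || *off > size - 4u)
        return false;
      if (byteswap) {
        uint32_t v;
        memcpy (&v, data + *off, 4);
        v = bswap32 (v);
        memcpy (data + *off, &v, 4);
      }
      *off += 4u;
      return true;
    }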
It changes some types to make a compile-time distinction between a
read-only, native-endianness input stream, a native-endianness output
stream, and a big-endian output stream for dealing with key hashes. This
should reduce the risk of accidentally mixing endianness or modifying an
input stream.
The preprocessor has been modified to indicate the presence of unions in
a topic type in the descriptor flags. If a union is present, any
memory allocated in a sample is freed first and the sample is zero'd out
prior to deserializing the new value. This is to prevent reading
garbage pointers for strings and sequences when switching union cases.
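A sketch of that precaution (flag and names illustrative):

    #include <stddef.h>
    #include <string.h>

    #define TYPE_CONTAINS_UNION 0x1u

    struct type_desc { unsigned flags; size_t size; };

    static void free_sample_contents (const struct type_desc *td, void *s)
    {
      (void) td; (void) s;  /* walk the type, free owned strings/seqs */
    }

    static void prepare_for_deserialize (const struct type_desc *td,
                                         void *sample)
    {
      if (td->flags & TYPE_CONTAINS_UNION) {
        free_sample_contents (td, sample);  /* release old case's memory */
        memset (sample, 0, td->size);       /* discriminant, ptrs -> 0 */
      }
      /* ... deserialize into sample ... */
    }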
The test tool has been included in the commit but it does not get run by
itself. Firstly, it requires the presence of OpenSplice DDS as an
alternative implementation to check the CDR processing against.
Secondly, it takes quite a while to run and is of no interest unless one
changes something in the (de)serialization.
Finally, I have no idea why there was a "CDR stream" interface among the
public functions. The existing interfaces are fundamentally broken by
the removal of arbitrary-endianness streams, and the interfaces were
already incapable of proper error notification. So, they have been
removed.
Signed-off-by: Erik Boasson <eb@ilities.com>
As was the plan with the introduction of ddsrt, this renames the
identifiers to match the capitalization style and removes old junk.
Signed-off-by: Erik Boasson <eb@ilities.com>
Remove all the "if asleep then awake ..." stuff from the code by making
awake/asleep calls nestable, whereas before "awake; awake" really meant
a transition through "asleep". This self-evidently necessitates fixing
those places where the old behaviour was relied upon, but fortunately
those are few.
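A sketch of the nesting (illustrative bookkeeping, not the actual vtime
machinery): only the outermost pair changes the real state, so
"awake; awake" no longer implies a pass through "asleep":

    struct thread_state { unsigned awake_depth; };

    static void thread_state_awake (struct thread_state *ts)
    {
      if (ts->awake_depth++ == 0) {
        /* actually transition to awake */
      }
    }

    static void thread_state_asleep (struct thread_state *ts)
    {
      if (--ts->awake_depth == 0) {
        /* actually transition to asleep */
      }
    }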
Signed-off-by: Erik Boasson <eb@ilities.com>
- Replace os_result by dds_retcode_t and move DDS return code defines down.
Eliminates the need to convert between different return code types.
- Move dds_time_t down and remove os_time.
Eliminates the need to convert between different time representations and
reduces code duplication.
- Remove use of the Microsoft source-code annotation language (SAL).
SAL annotations are Microsoft-specific and not very well documented,
which makes them very difficult for contributors to write.
- Rearrange the abstraction layer to be feature-based. The previous layout
falsely assumed that the operating system dictates which implementation is
best suited. For general purpose operating systems this is mostly true, but
embedded targets require a slightly different approach and may not even offer
all features. The new layout makes it possible to mix-and-match feature
implementations and allows for features to not be implemented at all.
- Replace the os prefix by ddsrt to avoid name collisions.
- Remove various portions of unused and unwanted code.
- Export thread names on all supported platforms.
- Return native thread identifier on POSIX compatible platforms.
- Add a timed wait for condition variables that takes an absolute time
(a usage sketch follows this list).
- Remove system abstraction for errno. The os_getErrno and os_setErrno were
incorrect. Functions that might fail now simply return a DDS return code
instead.
- Remove thread-specific memory abstraction. os_threadMemGet and accompanying
functions were a mess and their use has been eliminated by other changes in
this commit.
- Replace attribute (re)defines by ddsrt_ prefixed equivalents to avoid name
collisions and problems with faulty __nonnull__ attributes.
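A hedged usage sketch for the absolute-time condition wait (the ddsrt
names follow the prefix convention above, but treat the exact signatures
as assumptions to be checked against the ddsrt headers): the deadline is
computed once, and spurious wakeups simply re-wait until that deadline:

    #include <stdbool.h>
    #include <stdint.h>

    typedef int64_t dds_time_t;                /* nanoseconds */
    typedef struct ddsrt_mutex ddsrt_mutex_t;  /* opaque in this sketch */
    typedef struct ddsrt_cond ddsrt_cond_t;

    dds_time_t dds_time (void);
    bool ddsrt_cond_waituntil (ddsrt_cond_t *cv, ddsrt_mutex_t *mtx,
                               dds_time_t abstime);

    /* Wait for *flag (mutex held) with a single absolute deadline. */
    static bool wait_for_flag (ddsrt_cond_t *cv, ddsrt_mutex_t *mtx,
                               const bool *flag, dds_time_t reltimeout)
    {
      const dds_time_t deadline = dds_time () + reltimeout;
      while (!*flag) {
        if (!ddsrt_cond_waituntil (cv, mtx, deadline))
          return false;  /* deadline passed, flag still not set */
      }
      return true;
    }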
Signed-off-by: Jeroen Koekkoek <jeroen@koekkoek.nl>
The "rhc" test runs a random sequence of operations (writes, reads, &c.)
through an RHC with conditions attached to it. All possible state masks
are used, and query conditions are tried with a condition that only
tests the key value, and one that tests attribute values. It depends on
the internal checking logic of the RHC, which is currently enabled only
in Debug builds because of the associated run-time overhead.
Signed-off-by: Erik Boasson <eb@ilities.com>