When no arguments are given, read input from stdin. Any line that is
not indented relative to the start of the current test starts a new
test. Comments start with #. A "literate programming" mode, enabled
with the "-l" option, causes it to ignore all lines not starting with
"> ".
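The line classification described above can be sketched in C. The helper names (`literate_payload`, `is_comment_line`, `starts_new_test`) are hypothetical. Treating only lines whose first non-blank character is `#` as comments is an assumption, and the actual tool may handle trailing comments or blank lines differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* In literate mode only lines starting with "> " carry input; returns the
   text after the marker, or NULL if the line is to be ignored. */
static const char *literate_payload (const char *line)
{
  return (strncmp (line, "> ", 2) == 0) ? line + 2 : NULL;
}

/* Number of leading blanks (spaces or tabs) on a line. */
static size_t indentation (const char *line)
{
  return strspn (line, " \t");
}

/* Assumption: a comment is a line whose first non-blank character is '#'. */
static bool is_comment_line (const char *line)
{
  return line[indentation (line)] == '#';
}

/* A line that is not indented relative to the first line of the current
   test starts a new test. */
static bool starts_new_test (const char *line, size_t test_indent)
{
  return indentation (line) <= test_indent;
}
```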
Signed-off-by: Erik Boasson <eb@ilities.com>
The dds_security_fsm_current_state function has a problem in that it
already returns the next state while the callbacks related to the state
transition to that state are still in progress. The interface is only
used in test code for waiting for a certain state to occur before
generating an event and for waiting until the FSM has reached the end
state.
This commit firstly changes the tests so that the first usage is
replaced by some internal state in the tests and a condition variable,
and secondly replaces the "current_state" function with a function that
returns whether the FSM is still "running", and that only returns false
once the FSM has reached the end state and the associated callbacks
have been called.
This eliminates some race conditions in the test code.
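The test-side pattern that replaces polling the FSM state can be sketched with POSIX threads: an FSM callback records the state it has reached and signals a condition variable, and the test thread waits on that. The names (`test_state`, `test_state_set`, `test_state_wait`, `fake_fsm_thread`) are hypothetical and not taken from the Cyclone test code.

```c
#include <pthread.h>

/* State owned by the test, updated from FSM callbacks. */
struct test_state {
  pthread_mutex_t lock;
  pthread_cond_t cond;
  int state;
};

static void test_state_init (struct test_state *ts, int initial)
{
  pthread_mutex_init (&ts->lock, NULL);
  pthread_cond_init (&ts->cond, NULL);
  ts->state = initial;
}

/* Called from an FSM state-entry callback, i.e., only after the work of
   the transition has actually been done. */
static void test_state_set (struct test_state *ts, int state)
{
  pthread_mutex_lock (&ts->lock);
  ts->state = state;
  pthread_cond_broadcast (&ts->cond);
  pthread_mutex_unlock (&ts->lock);
}

/* Blocks the test until the callback has reported the expected state. */
static void test_state_wait (struct test_state *ts, int state)
{
  pthread_mutex_lock (&ts->lock);
  while (ts->state != state)
    pthread_cond_wait (&ts->cond, &ts->lock);
  pthread_mutex_unlock (&ts->lock);
}

/* Example: stands in for the FSM thread reaching state 2. */
static void *fake_fsm_thread (void *arg)
{
  test_state_set (arg, 2);
  return NULL;
}
```

A real test would probably bound the wait with `pthread_cond_timedwait`, so that a missed transition fails the test instead of hanging it.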
Signed-off-by: Erik Boasson <eb@ilities.com>
Functions associated with a state transition are called with the FSM
mutex unlocked and this gave dds_security_fsm_free a chance of freeing
the memory while the FSM thread was still accessing the state.
This marks the FSM as "busy" while the mutex is unlocked for invoking
these functions and defers freeing the memory until it is no longer
busy.
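A minimal sketch of the "busy" marking, assuming POSIX threads and a hypothetical `struct fsm` reduced to a current state, a busy flag and a single state-entry callback (this is not the actual dds_security_fsm layout). The same flag also lets a "running" check keep reporting true until the callback of the final transition has returned.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

#define FSM_END (-1) /* hypothetical marker for the end state */

struct fsm {
  pthread_mutex_t lock;
  pthread_cond_t cond;
  int current; /* current state, FSM_END once finished */
  bool busy;   /* true while a callback runs with the lock released */
  void (*on_enter) (struct fsm *fsm, int state);
};

static struct fsm *fsm_new (int initial, void (*on_enter) (struct fsm *, int))
{
  struct fsm *fsm = malloc (sizeof (*fsm));
  if (fsm == NULL)
    return NULL;
  pthread_mutex_init (&fsm->lock, NULL);
  pthread_cond_init (&fsm->cond, NULL);
  fsm->current = initial;
  fsm->busy = false;
  fsm->on_enter = on_enter;
  return fsm;
}

/* Running until the end state is reached and its callback has returned. */
static bool fsm_running (struct fsm *fsm)
{
  pthread_mutex_lock (&fsm->lock);
  bool running = (fsm->current != FSM_END || fsm->busy);
  pthread_mutex_unlock (&fsm->lock);
  return running;
}

static void fsm_transition (struct fsm *fsm, int to)
{
  pthread_mutex_lock (&fsm->lock);
  fsm->current = to;
  fsm->busy = true;
  pthread_mutex_unlock (&fsm->lock);
  if (fsm->on_enter)
    fsm->on_enter (fsm, to); /* lock not held while the callback runs */
  pthread_mutex_lock (&fsm->lock);
  fsm->busy = false;
  pthread_cond_broadcast (&fsm->cond);
  pthread_mutex_unlock (&fsm->lock);
}

/* Waits until no callback is in progress before releasing the memory. */
static void fsm_free (struct fsm *fsm)
{
  pthread_mutex_lock (&fsm->lock);
  while (fsm->busy)
    pthread_cond_wait (&fsm->cond, &fsm->lock);
  pthread_mutex_unlock (&fsm->lock);
  pthread_cond_destroy (&fsm->cond);
  pthread_mutex_destroy (&fsm->lock);
  free (fsm);
}

/* Example callback: records what fsm_running reports mid-callback. */
static bool running_seen_in_callback;
static void record_running (struct fsm *fsm, int state)
{
  (void) state;
  running_seen_in_callback = fsm_running (fsm);
}
```

In the example, `fsm_running` still returns true inside the callback of the transition to `FSM_END`, because `busy` is set even though `current` already holds the end state.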
Signed-off-by: Erik Boasson <eb@ilities.com>
This changes the status argument of the listener call to a local copy of
the entity's status field, fixing a race between dds_get_xxx_status and
the xxx listener invocations. In that case there was a window during
which the "change" fields could be reset by the former prior to the
latter getting invoked.
One symptom of this particular race condition is the (very rare) failure
of the liveliness tests for 0 and 1ns lease durations, while waiting for
the writers to all become not alive. In that particular scenario, the
liveliness changed listener observes alive_count_change and
not_alive_count_change both 0, which it (rightly) considers an error.
A regression test is added that reliably reproduces the problem.
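A minimal sketch of the fix, assuming POSIX threads and a hypothetical reader type holding a subset of the liveliness-changed status. The listener receives a copy taken while the lock is held, so resetting the "change" fields in the entity (as `reader_get_status` does here) can no longer affect what the listener sees. Resetting the change fields when the listener is invoked is an assumption of this sketch.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical subset of the liveliness-changed status. */
struct lv_status {
  int32_t alive_count, not_alive_count;
  int32_t alive_count_change, not_alive_count_change;
};

typedef void (*lv_listener) (const struct lv_status *status, void *arg);

struct reader {
  pthread_mutex_t lock;
  struct lv_status status;
  lv_listener listener;
  void *arg;
};

static void reader_init (struct reader *rd, lv_listener listener, void *arg)
{
  pthread_mutex_init (&rd->lock, NULL);
  rd->status = (struct lv_status) { 0, 0, 0, 0 };
  rd->listener = listener;
  rd->arg = arg;
}

/* Updates the status and invokes the listener with a snapshot taken under
   the lock. */
static void reader_liveliness_changed (struct reader *rd, int32_t d_alive, int32_t d_not_alive)
{
  struct lv_status copy;
  pthread_mutex_lock (&rd->lock);
  rd->status.alive_count += d_alive;
  rd->status.alive_count_change += d_alive;
  rd->status.not_alive_count += d_not_alive;
  rd->status.not_alive_count_change += d_not_alive;
  copy = rd->status;
  /* Assumption: invoking the listener counts as reading the status. */
  rd->status.alive_count_change = 0;
  rd->status.not_alive_count_change = 0;
  pthread_mutex_unlock (&rd->lock);
  if (rd->listener)
    rd->listener (&copy, rd->arg);
}

/* Plays the role of dds_get_xxx_status: returns the status and resets
   the change fields. */
static struct lv_status reader_get_status (struct reader *rd)
{
  pthread_mutex_lock (&rd->lock);
  struct lv_status st = rd->status;
  rd->status.alive_count_change = 0;
  rd->status.not_alive_count_change = 0;
  pthread_mutex_unlock (&rd->lock);
  return st;
}

/* Example listener: keeps the last status it was given. */
static struct lv_status last_seen;
static void record_status (const struct lv_status *status, void *arg)
{
  (void) arg;
  last_seen = *status;
}
```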
Signed-off-by: Erik Boasson <eb@ilities.com>
Content filtering is possible in current Cyclone by setting a callback
function on a topic, but this is not an interface that is intended to
survive. Still, it can save the day. This commit improves the
interface a bit by allowing an argument to the filter function. It also
adds some tests to verify that it does work.
It also points out in the header file that this really is not an
interface that will be around for long and that it needs to be used with
care.
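A sketch of what an argument to the filter function buys: the filter's parameters (here a threshold) can be passed in instead of living in globals. All names and the signature are hypothetical stand-ins, not the declarations in the Cyclone DDS header.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical filter signature: the sample plus a caller-supplied argument. */
typedef bool (*topic_filter_arg_fn) (const void *sample, void *arg);

struct topic {
  topic_filter_arg_fn filter;
  void *filter_arg;
};

static void topic_set_filter (struct topic *tp, topic_filter_arg_fn filter, void *arg)
{
  tp->filter = filter;
  tp->filter_arg = arg;
}

/* A sample is delivered only if there is no filter or the filter accepts it. */
static bool topic_accepts (const struct topic *tp, const void *sample)
{
  return tp->filter == NULL || tp->filter (sample, tp->filter_arg);
}

/* Example: the threshold is passed via the argument instead of a global. */
struct reading { int id; double value; };

static bool above_threshold (const void *sample, void *arg)
{
  const struct reading *r = sample;
  const double *threshold = arg;
  return r->value > *threshold;
}
```

Because the filter only holds a pointer, changing the threshold variable changes the filter's behavior. That convenience also illustrates why such an interface needs to be used with care.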
Signed-off-by: Erik Boasson <eb@ilities.com>
At least part of the flakiness of this test was caused by failures
checking the number of remote writers that were still alive: sometimes
the number of writers that had lost their liveliness was as expected, at
which point the number that had retained their liveliness should also be
as expected, and yet the second read for the latter gave an unexpected
value.
The hypothesis is that the transitions happened in the short space of
time between reading the two. Some factors appear to be in play:
* Sometimes a test stressing the discovery path gets scheduled in
parallel, which can delay the liveliness updates, or perhaps even
cause them to be lost in transit.
* If the latter is the case, the relatively short lease duration
(200ms) leaves little time for retransmits if the timers are at their
default values and the lost packet happens to be the last one carrying
those updates (in that case, the only trigger is a 100ms timer).
* The multiplication factor of exactly 2 means that the lease expiry is
likely to happen around the time these checks get done, and so can
easily cause a very odd-looking result.
This commit restarts the test with longer lease durations in case either
of the two counts is wrong, where before it would restart if the "not
alive" count was wrong, but fail immediately if the "alive" count was.
It also slightly stretches the check duration. Finally it prints all
the actual and expected counts that cause the test to pass or fail,
improving the chances of making sense of future failures.
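The revised check can be sketched as follows: both counts are compared, the actual and expected values are always printed, and a mismatch in either one triggers a rerun with a longer lease. Doubling the lease per attempt, the type names and the `flaky_scenario` stand-in are assumptions for illustration; the test's actual durations are not given here.

```c
#include <stdbool.h>
#include <stdio.h>

/* Counts of remote writers as observed by the test (hypothetical type). */
struct lv_counts { int alive; int not_alive; };

/* Runs one attempt of the scenario with the given lease duration. */
typedef struct lv_counts (*lv_scenario) (int lease_ms);

/* Prints actual and expected counts; true only if both match. */
static bool counts_ok (struct lv_counts actual, struct lv_counts expected)
{
  bool ok = (actual.alive == expected.alive && actual.not_alive == expected.not_alive);
  printf ("alive %d (expected %d), not alive %d (expected %d): %s\n",
          actual.alive, expected.alive, actual.not_alive, expected.not_alive,
          ok ? "ok" : "mismatch");
  return ok;
}

/* Restarts with a longer lease if either count is wrong. */
static bool run_with_retries (lv_scenario run, struct lv_counts expected, int lease_ms, int max_attempts)
{
  for (int i = 0; i < max_attempts; i++, lease_ms *= 2)
  {
    if (counts_ok (run (lease_ms), expected))
      return true;
  }
  return false;
}

/* Example scenario that only behaves with leases of at least 400ms. */
static struct lv_counts flaky_scenario (int lease_ms)
{
  return (lease_ms >= 400) ? (struct lv_counts) { 2, 2 } : (struct lv_counts) { 1, 3 };
}
```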
Signed-off-by: Erik Boasson <eb@ilities.com>
The configuration was changed in b25f10ff to consider the DDSSecurity
tag deprecated (there is no other), which gave rise to warnings in test
output.
Signed-off-by: Erik Boasson <eb@ilities.com>
Some of the tests wait until reader/writer matching has completed in
multiple steps and/or interleave entity creation and waiting for
matching. This commit moves the waiting to the end and uses a (mostly
arbitrary) absolute timeout rather than arbitrary relative ones that
could (but fortunately never, or only rarely, did) add up to durations
long enough to cause the overall test to time out.
For the specific case of the access control permission expiry test (a
particularly problematic case), it uses the first expiry time and gives
a bit more time for the discovery. This appears to significantly reduce
the number of failures.
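The difference between one absolute deadline and a relative timeout per step can be sketched with a simulated clock, so that the example runs instantly. The types and the polling loop are hypothetical; a real test would read a monotonic clock and sleep between polls.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated clock standing in for a monotonic time source. */
struct sim_clock { int64_t now_ms; };

/* Hypothetical: the time at which each reader/writer pair becomes matched. */
struct match_cond { int64_t matched_at_ms; };

/* Waits for all pairs against one absolute deadline, instead of giving
   each wait its own relative timeout (n waits of T each could add up to
   n*T). */
static bool wait_all_matched (struct sim_clock *clk, const struct match_cond *conds, int n,
                              int64_t deadline_ms, int64_t poll_ms)
{
  for (int i = 0; i < n; i++)
  {
    while (clk->now_ms < conds[i].matched_at_ms)
    {
      if (clk->now_ms >= deadline_ms)
        return false;
      clk->now_ms += poll_ms; /* stands in for sleeping poll_ms */
    }
  }
  return true;
}
```

However many pairs there are, the total waiting time is bounded by the single deadline.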
Signed-off-by: Erik Boasson <eb@ilities.com>
This is based on a sense that the interval perhaps grows too quickly when
combined with network hiccups, and on a suspicion that this may be the
cause of some test failures.
Signed-off-by: Erik Boasson <eb@ilities.com>
The writer may well block at some point while retransmitting (and
certainly when retransmits are prioritised), and so there are two
reasons for sending a HEARTBEAT: one is that the reader needs it before
it can request more missing samples, the other is that the writer may
require an acknowledgement.
The change only affects the cases where the most recently transmitted
sample (there may be later ones currently being packed into a message)
had to be retransmitted.
Signed-off-by: Erik Boasson <eb@ilities.com>
The emphasis is on "do": with this commit it does it even when there is
no data available in the writer. These were suppressed previously
because of a quirk in the DDSI specification in versions prior to 2.3,
where it was impossible for a writer to send a valid heartbeat if its
history cache was empty.
Not sending them has negative consequences, as establishing a reliable
connection then becomes dependent on the reader sending a pre-emptive
ACKNACK message. Usually, this makes no observable difference, but if
the writer temporarily disconnects from the reader (but not vice-versa)
it may require the publishing of a sample to resynchronize the two.
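To illustrate the quirk, here is a sketch of the HEARTBEAT sequence-number validity check. It assumes that the older rule is lastSN >= firstSN, which leaves no way to describe an empty history cache, and that the relaxed rule is lastSN >= firstSN - 1, which lets a writer with an empty cache advertise the range [n, n-1]. `heartbeat_valid` and `heartbeat_range` are hypothetical helpers, not Cyclone functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* first_sn/last_sn as carried in a HEARTBEAT; allow_empty selects the
   relaxed rule. */
static bool heartbeat_valid (int64_t first_sn, int64_t last_sn, bool allow_empty)
{
  if (first_sn <= 0 || last_sn < 0)
    return false;
  return allow_empty ? (last_sn >= first_sn - 1) : (last_sn >= first_sn);
}

/* Range a writer advertises; next_seq is the sequence number the next
   sample will get, so an empty cache yields [next_seq, next_seq - 1]. */
static void heartbeat_range (bool whc_empty, int64_t min_seq, int64_t max_seq, int64_t next_seq,
                             int64_t *first_sn, int64_t *last_sn)
{
  if (whc_empty)
  {
    *first_sn = next_seq;
    *last_sn = next_seq - 1;
  }
  else
  {
    *first_sn = min_seq;
    *last_sn = max_seq;
  }
}
```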
Signed-off-by: Erik Boasson <eb@ilities.com>
This adds a set of functions:
* dds_create_statistics
* dds_refresh_statistics
* dds_delete_statistics
* dds_lookup_statistic
to poll entities for information on their state, returned as a set of
name-value pairs. The interface and the selection (and naming) of the
statistics are all provisional, and for this reason the
dds/ddsc/dds_statistics.h file is not included by dds.h.
Currently, the only statistics available relate to retransmits and are
optionally output by ddsperf.
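The create/refresh/lookup life cycle can be illustrated with a self-contained stand-in. This is not the Cyclone API, whose signatures are not given here: `struct stats`, `stats_create`, `stats_refresh`, `stats_lookup` and the statistic names are hypothetical. The sketch has no delete function because it allocates nothing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical entity counters that the statistics are taken from. */
struct writer_counters { uint64_t rexmit_bytes; uint32_t throttle_count; };

struct stat_kv { const char *name; uint64_t value; };
struct stats { const struct writer_counters *source; size_t count; struct stat_kv kv[2]; };

/* "create": fixes the set of names once. */
static void stats_create (struct stats *st, const struct writer_counters *src)
{
  st->source = src;
  st->count = 2;
  st->kv[0] = (struct stat_kv) { "rexmit_bytes", 0 };
  st->kv[1] = (struct stat_kv) { "throttle_count", 0 };
}

/* "refresh": copies the current values, so lookups see a snapshot. */
static void stats_refresh (struct stats *st)
{
  st->kv[0].value = st->source->rexmit_bytes;
  st->kv[1].value = st->source->throttle_count;
}

/* "lookup": finds a value by name, NULL if there is no such statistic. */
static const struct stat_kv *stats_lookup (const struct stats *st, const char *name)
{
  for (size_t i = 0; i < st->count; i++)
    if (strcmp (st->kv[i].name, name) == 0)
      return &st->kv[i];
  return NULL;
}
```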
Signed-off-by: Erik Boasson <eb@ilities.com>
* Bandwidth usage is now printed in Mb/s if no reference rate is given
* Trailing average rate over the last 10s (approximated as the last 10
lines of output) is printed
* An option to wait until the expected number of peers is present
* The test script now pushes data to the remotes, instead of using the
first remote as the publisher
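The trailing average amounts to a ring buffer over the last 10 per-line rates. Here is a sketch with hypothetical names, assuming "Mb/s" means 10^6 bits per second:

```c
#include <stddef.h>

#define TRAILING_LINES 10 /* the last 10 output lines approximate the last 10s */

struct trailing_avg {
  double samples[TRAILING_LINES];
  size_t next, count;
  double sum;
};

/* Adds one per-line rate, evicting the oldest once the window is full. */
static void trailing_add (struct trailing_avg *ta, double rate)
{
  if (ta->count == TRAILING_LINES)
    ta->sum -= ta->samples[ta->next];
  else
    ta->count++;
  ta->samples[ta->next] = rate;
  ta->sum += rate;
  ta->next = (ta->next + 1) % TRAILING_LINES;
}

static double trailing_mean (const struct trailing_avg *ta)
{
  return ta->count ? ta->sum / (double) ta->count : 0.0;
}

/* Bytes per second to megabits per second (1 Mb = 10^6 bits). */
static double bytes_per_s_to_mbps (double bytes_per_s)
{
  return bytes_per_s * 8.0 / 1e6;
}
```

After the rates 1 through 12 have been added, the window holds 3 through 12 and the trailing mean is 7.5.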
Signed-off-by: Erik Boasson <eb@ilities.com>
Overly aggressive sending of ACKNACKs eats bandwidth and causes
unnecessary retransmits and lowers performance; but overly timid sending
of them also reduces performance. This commit reduces the
aggressiveness.
* It keeps more careful track of what ACKNACK (or NACKFRAG) was last
sent and when, suppressing ACKs that don't provide new information for
a few milliseconds and suppressing NACKs for the NackDelay
setting. (The setting was there all along, but it wasn't honored when
the writer asked for a response.)
* It ignores the NackDelay when all that was requested has arrived, or
when it receives a directed heartbeat from a Cyclone peer. The latter
is taken as an indication that no more is following, and allows the
recipient to ask for arbitrary amounts of data and rely on the sender
to limit the retransmit to what seems reasonable. (For NACKFRAG one
can do it in the recipient, but for ACKNACK one cannot, and so one
might as well do it at the sender always.)
* Sufficient state is maintained in the match object for the ACKNACK
generator to decide whether or not to send an ACKNACK following the
rules, and it may decide to send just an ACK even though there is data
missing, or nothing at all.
* If HEARTBEAT processing requires an immediate response, the response
message is generated by the receive thread, but still queued for
transmission. If a delayed response is required, it schedules the
ACKNACK event.
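The suppression rules in the first and third points can be sketched as a single decision function. Everything here is hypothetical: the state kept, the parameter names and the exact rules are simplified from the description above rather than taken from the implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* What the reader remembers about the last ACKNACK it sent. */
struct acknack_state {
  int64_t last_base;    /* first missing sequence number it reported */
  int64_t last_sent_ms; /* when it was sent */
};

enum acknack_decision { AN_NOTHING, AN_ACK_ONLY, AN_NACK };

/* base: first sequence number not yet received; missing: whether data is
   known to be missing; cyclone_directed_hb: the trigger was a heartbeat
   directed at this reader by a Cyclone peer. */
static enum acknack_decision decide_acknack (const struct acknack_state *st, int64_t now_ms,
                                             int64_t base, bool missing, bool cyclone_directed_hb,
                                             int64_t ack_suppress_ms, int64_t nack_delay_ms)
{
  int64_t since = now_ms - st->last_sent_ms;
  if (missing && (cyclone_directed_hb || since >= nack_delay_ms))
    return AN_NACK;
  /* Within NackDelay (or nothing missing): an ACK is only worth sending
     if it acknowledges more than last time, or enough time has passed. */
  if (base > st->last_base || since >= ack_suppress_ms)
    return AN_ACK_ONLY;
  return AN_NOTHING;
}
```

The last case in the example shows the reader sending just an ACK although data is still missing, because the ACK carries new information while the NACK is still held back by the NackDelay.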
Signed-off-by: Erik Boasson <eb@ilities.com>
This adds tracking of whether a heartbeat should be generated, deferring
its generation until processing of the message is complete or an ACKNACK
or NACKFRAG from another reader requires a response. This way, an
ACKNACK + NACKFRAG pair does not trigger multiple heartbeat messages.
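A sketch of the deferral: requests for a HEARTBEAT only set a flag, and the flag is flushed at the end of the message or when a different reader's request comes in. The tracker type and function names are hypothetical, and the sketch counts heartbeats instead of generating them.

```c
#include <stdbool.h>

/* Hypothetical per-writer bookkeeping while processing one incoming message. */
struct hb_tracker {
  bool pending;       /* a HEARTBEAT is owed */
  int pending_reader; /* reader it is owed to */
  int sent;           /* number of HEARTBEATs actually generated */
};

static void hb_flush (struct hb_tracker *t)
{
  if (t->pending)
  {
    t->sent++; /* stands in for generating the HEARTBEAT */
    t->pending = false;
  }
}

/* An ACKNACK or NACKFRAG from `reader` asks for a HEARTBEAT: record it,
   but flush first if a different reader's request is still pending. */
static void hb_request (struct hb_tracker *t, int reader)
{
  if (t->pending && t->pending_reader != reader)
    hb_flush (t);
  t->pending = true;
  t->pending_reader = reader;
}

/* Called once the whole message has been processed. */
static void hb_end_of_message (struct hb_tracker *t)
{
  hb_flush (t);
}
```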
Signed-off-by: Erik Boasson <eb@ilities.com>
The DDSI spec version 2.3 allows empty bit sets, so the malformed GAPs
caused by a bug in the code for avoiding those are most easily fixed by
generating GAPs with an empty bit set.
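A GAP marks the range [gapStart, gapList.bitmapBase) as irrelevant, plus any sequence numbers set in the gapList bitmap. With empty bit sets allowed, a contiguous gap therefore needs no bitmap at all. The sketch below uses simplified, hypothetical types rather than the on-the-wire encoding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified SequenceNumberSet: bit i (MSB first within each 32-bit word)
   stands for sequence number base + i. */
struct seqnum_set {
  int64_t base;
  uint32_t numbits;
  uint32_t bits[8];
};

struct gap {
  int64_t gap_start;
  struct seqnum_set gap_list;
};

/* A contiguous gap [first, last] is covered entirely by the range
   [gap_start, gap_list.base), leaving the bit set empty. */
static struct gap make_contiguous_gap (int64_t first, int64_t last)
{
  struct gap g = { first, { last + 1, 0, { 0 } } };
  return g;
}

static bool gap_covers (const struct gap *g, int64_t sn)
{
  if (sn >= g->gap_start && sn < g->gap_list.base)
    return true;
  if (sn >= g->gap_list.base && sn - g->gap_list.base < (int64_t) g->gap_list.numbits)
  {
    uint32_t i = (uint32_t) (sn - g->gap_list.base);
    return (g->gap_list.bits[i / 32] >> (31 - i % 32)) & 1u;
  }
  return false;
}
```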
Signed-off-by: Erik Boasson <eb@ilities.com>
This changes a few intertwined things at the same time:
* It allows configuring sending a partial message for large messages,
with a maximum derived from the discovered receive buffer sizes;
* It uses a different message size limit for datagrams that include
retransmits than for those that don't. The argument here is that,
having seen flaky networks where large datagrams cause trouble, it
makes sense to default to sending retransmits as datagrams that fit in
individual packets.
* The best performance is generally obtained using the maximum datagram
size, but the benefits do fall off quite quickly once they are
largish. For flaky networks, it doesn't make sense to go for 64kB
datagrams. This tries to find a reasonable compromise.
* It now packs multiple fragments into a single DATAFRAG message to
eliminate the cost of using small fragment sizes.
The changes in buffer sizes cause the ddsperf sanity check to fail:
* The larger amounts of unacknowledged data cause the used memory to be
higher, failing the RSS check. Raising the limit seems
reasonable (the alternative would be to configure it back to the old
values, but it is all empirically determined anyway).
* The same also causes the publisher thread to get to run more and the
ping/pong bit gets less of a chance. Using fixed-frequency bursts
helps with this.
This therefore also adjusts the test configuration and the thresholds a
bit.
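A little arithmetic shows how packing several fragments into one DATAFRAG interacts with the separate limit for datagrams containing retransmits. The configuration fields and the fixed per-datagram overhead are hypothetical simplifications:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sizing parameters, not Cyclone's configuration names. */
struct frag_config {
  uint32_t fragment_size;       /* bytes of payload per fragment */
  uint32_t max_msg_size;        /* limit for datagrams without retransmits */
  uint32_t max_rexmit_msg_size; /* smaller limit for datagrams with retransmits */
  uint32_t overhead;            /* headers per datagram, assumed fixed here */
};

static uint32_t div_round_up (uint32_t a, uint32_t b)
{
  return (a + b - 1) / b;
}

/* Fragments packed into one DATAFRAG; at least one, even if a single
   fragment exceeds the limit. */
static uint32_t frags_per_msg (const struct frag_config *c, bool rexmit)
{
  uint32_t limit = rexmit ? c->max_rexmit_msg_size : c->max_msg_size;
  uint32_t n = (limit > c->overhead) ? (limit - c->overhead) / c->fragment_size : 0;
  return n > 0 ? n : 1;
}

/* Number of datagrams needed to send a sample of sample_size bytes. */
static uint32_t msgs_for_sample (const struct frag_config *c, uint32_t sample_size, bool rexmit)
{
  uint32_t nfrags = div_round_up (sample_size, c->fragment_size);
  return div_round_up (nfrags, frags_per_msg (c, rexmit));
}
```

With 1000-byte fragments, a 10100-byte limit and 100 bytes of overhead, a 25000-byte sample takes 3 datagrams when sent normally but 25 when retransmitted under a 1200-byte limit.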
Signed-off-by: Erik Boasson <eb@ilities.com>
An asymmetrical disconnect where the reader undiscovers and rediscovers
the writer, but the reader remains alive all the time for the writer
results in the "count" field of NACKFRAGs restarting. According to the
spec these must be ignored to protect against multi-pathing, but in this
scenario, ignoring them results in ignoring valid retransmit requests
until the "count" value catches up, which can take a very long time.
The same problem exists for ACKNACKs and HEARTBEATs, but there it was
already handled by accepting backward jumps after some time has passed.
This reuses the same logic for NACKFRAGs.
This also changes the "count" fields to uint32_t throughout: the spec
defines them as int32_t, requires them to be strictly monotonically
increasing and omits any mention of a valid range or at what value the
counter should start. Thus, everything in [-2^31,2^31-1] is allowed,
and switching to a uint32_t merely shifts the range. It also appears that
all implementations start at 0 or 1. The "strictly monotonically" part
was impossible to do without disconnecting anyway.
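A sketch of accepting backward jumps after some time has passed. The tracker type, the use of plain unsigned comparison and the choice to measure the time since the last accepted message are all assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-peer state for one kind of counted message. */
struct count_tracker {
  bool have_last;
  uint32_t last_count;
  int64_t last_accept_ms;
};

/* Accepts a strictly larger count. A count that does not increase is
   normally dropped as a duplicate or stale message, but is accepted
   anyway once backward_ok_ms have passed since the last accepted one, so
   that a peer that restarted its counter is not ignored for a very long
   time. */
static bool accept_count (struct count_tracker *t, uint32_t count, int64_t now_ms, int64_t backward_ok_ms)
{
  bool accept = !t->have_last || count > t->last_count ||
                now_ms - t->last_accept_ms >= backward_ok_ms;
  if (accept)
  {
    t->have_last = true;
    t->last_count = count;
    t->last_accept_ms = now_ms;
  }
  return accept;
}
```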
Signed-off-by: Erik Boasson <eb@ilities.com>
It is done by "do_locator" after it has decided that the locator is
well-formed and, crucially, not to be ignored. Setting it when there
are only ignored locators (of the unicast/multicast, data/metadata
variety) causes further processing to rely on uninitialized memory.
Signed-off-by: Erik Boasson <eb@ilities.com>
Reuse unicast data socket in MSM_NO_UNICAST, just like it did in all
modes before the extra socket was introduced in
d1ed8df9f3. This restores support for the
"raw ethernet" transport on Linux by no longer requiring the transport
to create a socket with an arbitrary "port".
Signed-off-by: Erik Boasson <eb@ilities.com>
The deinitialize would happen on most errors, but in all those cases it
would not have been initialized yet.
Signed-off-by: Erik Boasson <eb@ilities.com>
This removes the special handling of IP addresses in adding peer
locators from the configuration, instead relying on the general
string-to-locator conversion routines.
* This extends the common IP handling code to handle the optional
presence of a port and the use of brackets, allowing them always for
IPv6 addresses, but requiring them only when needed for disambiguating
numerical IPv6 addresses when a port is present.
* The "multicast generator" format is now handled in UDPv4 code.
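A sketch of the bracket and port rules for the host part of an address string. `split_host_port` is a hypothetical helper. It does not validate the address itself; it only shows why brackets are needed when a numerical IPv6 address is followed by a port.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Splits "host", "host:port", "[host]" or "[host]:port". An unbracketed
   string with more than one ':' is taken to be an IPv6 address without a
   port. *port is set to -1 if no port is given. */
static bool split_host_port (const char *s, char *host, size_t hostsize, int *port)
{
  const char *hb = s, *he, *ps = NULL;
  if (s[0] == '[')
  {
    const char *close = strchr (s, ']');
    if (close == NULL)
      return false;
    hb = s + 1;
    he = close;
    if (close[1] == ':')
      ps = close + 2;
    else if (close[1] != '\0')
      return false;
  }
  else
  {
    const char *colon = strchr (s, ':');
    if (colon != NULL && strchr (colon + 1, ':') == NULL)
    {
      he = colon; /* exactly one ':', so host:port */
      ps = colon + 1;
    }
    else
    {
      he = s + strlen (s); /* no ':', or a bare IPv6 address */
    }
  }
  size_t n = (size_t) (he - hb);
  if (n == 0 || n >= hostsize)
    return false;
  memcpy (host, hb, n);
  host[n] = '\0';
  *port = -1;
  if (ps != NULL)
  {
    char *end;
    if (*ps < '0' || *ps > '9')
      return false;
    long v = strtol (ps, &end, 10);
    if (*end != '\0' || v > 65535)
      return false;
    *port = (int) v;
  }
  return true;
}
```

For example, "::1" yields host "::1" with no port, whereas attaching port 7400 requires "[::1]:7400", because "::1:7400" would itself be read as an IPv6 address.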
Signed-off-by: Erik Boasson <eb@ilities.com>
The src/core/ddsi/tests/locators.c test directly includes the header
files related to DDSI support for TCP and this pulled in openssl/ssl.h,
which in turn results in a build error in some environments because the
file can't be found.
There was no good reason for this dependency: the definitions that
relied on it were used only in the implementation of the TCP and TLS
support.
Signed-off-by: Erik Boasson <eb@ilities.com>
OpenSSL doesn't support using BIOs of the "fd" or "file" type when it is
built as a DLL and the executable didn't provide it with access to the
executable's CRT. Requiring all applications that wish to use security
to worry about this "applink.c" thing is too onerous a requirement.
* Check for the existence of "applink.c" in the OpenSSL include
directory, adding it to the security tests if it exists. This way,
all of OpenSSL can be used by the tests.
* Include it in the security core and built-in plugin tests. This way,
the test code can use the entirety of OpenSSL.
* In the authentication and access-control plugins, load X509 and
private keys from files by first reading them into a "mem" type BIO,
then reading them from that BIO.
* Take care not to call ddsrt_free on OpenSSL-allocated memory, either
by calling OPENSSL_free, or by allocating the memory using
ddsrt_malloc and letting OpenSSL fill that buffer.
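A sketch of the first step of loading a certificate or key without an OpenSSL "file" BIO: the file is read with the application's own C runtime into a buffer the application allocated itself. The buffer could then be wrapped with OpenSSL's `BIO_new_mem_buf` and parsed with `PEM_read_bio_X509` or `PEM_read_bio_PrivateKey`. That part is omitted so the sketch compiles without OpenSSL. `read_file` is a hypothetical helper that uses plain `malloc` where the source mentions `ddsrt_malloc`.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads a whole file into a malloc'd, NUL-terminated buffer. The caller
   owns the buffer and releases it with free (), never with an OpenSSL
   deallocator. */
static char *read_file (const char *path, size_t *len)
{
  FILE *fp = fopen (path, "rb");
  if (fp == NULL)
    return NULL;
  char *buf = NULL;
  long sz = 0;
  if (fseek (fp, 0, SEEK_END) == 0 && (sz = ftell (fp)) >= 0 && fseek (fp, 0, SEEK_SET) == 0 &&
      (buf = malloc ((size_t) sz + 1)) != NULL)
  {
    if (fread (buf, 1, (size_t) sz, fp) == (size_t) sz)
    {
      buf[sz] = '\0';
      *len = (size_t) sz;
    }
    else
    {
      free (buf);
      buf = NULL;
    }
  }
  fclose (fp);
  return buf;
}
```

Because the application allocates the buffer, the application also frees it. This keeps memory allocated by OpenSSL separate from memory allocated by the caller.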
Signed-off-by: Erik Boasson <eb@ilities.com>