391 commits

Author SHA1 Message Date
Erik Boasson
d700657cb7 ddsperf latency should include median, 90% and 99%
Signed-off-by: Erik Boasson <eb@ilities.com>
2019-05-02 20:53:20 +08:00
Erik Boasson
d693d8eac9 limit WHC, serdata, xmsg freelist memory use (#168)
High sample rates require rather high rates of allocating and freeing
WHC nodes, serialised samples (serdata), and RTPS message fragments
(xmsg).  A bunch of dedicated parallel allocators help take some
pressure off the regular malloc/free calls.  However, these used to
gobble up memory like crazy, in part because of rather generous limits,
and in part because there was no restriction on the size of the samples
that would be cached, and it could end up caching large numbers of
multi-MB samples.  It should be noted that there is no benefit to
caching large samples anyway, because the sample rate will be that much
lower.

This commit reduces the maximum number of entries for all three cases,
it furthermore limits the maximum size of a serdata or xmsg that can be
cached, and finally instead of instantiating a separate allocator for
WHC nodes per WHC, it now shares one across all WHCs.  Total memory use
should now be limited to a couple of MB.

The caching can be disabled by setting ``FREELIST_TYPE`` to
``FREELIST_NONE`` in ``q_freelist.h``.

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-05-02 20:53:20 +08:00
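The two limits described in commit d693d8eac9 (a cap on the number of cached entries and a cap on the size of a cached object) can be illustrated with a minimal sketch. The names and the limit values below are hypothetical and are not taken from ``q_freelist.h``; locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limits, chosen for illustration only. */
#define SKETCH_MAX_ENTRIES 4
#define SKETCH_MAX_CACHED_SIZE 256

struct sketch_freelist {
  void *entries[SKETCH_MAX_ENTRIES];
  size_t count;
};

/* Return a cached block, or NULL if the cache is empty. */
static void *sketch_freelist_pop(struct sketch_freelist *fl)
{
  assert(fl != NULL);
  return (fl->count > 0) ? fl->entries[--fl->count] : NULL;
}

/* Try to cache a block: returns 1 if it was cached, 0 if the caller must
   free it (block too large, or the cache is already full). */
static int sketch_freelist_push(struct sketch_freelist *fl, void *blk, size_t size)
{
  assert(fl != NULL && blk != NULL);
  if (size > SKETCH_MAX_CACHED_SIZE || fl->count == SKETCH_MAX_ENTRIES)
    return 0;
  fl->entries[fl->count++] = blk;
  return 1;
}
```

With both checks in place, the worst-case memory held by one such list is bounded by the product of the two limits, regardless of sample size.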
Erik Boasson
6011422566 ddsperf: fix calculation of data rate in Mb/s
Multiplying time-in-ns since previous output line by 1e9 instead of
dividing it by 1e9 resulted in bit rate showing up as 0Mb/s.

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-05-02 20:53:20 +08:00
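The unit conversion fixed in commit 6011422566 can be sketched as follows. This is not the ddsperf code; the function name and parameters are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bit rate in Mb/s for `bytes` received over an interval of `dt_ns`
   nanoseconds.  The interval is converted to seconds by dividing by 1e9;
   multiplying instead makes the result vanishingly small. */
static double sketch_rate_mbps(uint64_t bytes, int64_t dt_ns)
{
  assert(dt_ns > 0);
  const double seconds = (double) dt_ns / 1e9;
  return ((double) bytes * 8.0) / seconds / 1e6;
}
```

For example, 125000000 bytes in an interval of 1000000000 ns corresponds to 1000 Mb/s.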
Erik Boasson
fc5a349a72 out-of-bounds write nn_bitset_one w multiple of 32
nn_bitset_one sets the specified number of bits by first memset'ing the
words, then clearing bits set in a final partial word.  It mishandled
the case where the number of bits is a multiple of 32, clearing the
entire word following the last one it was to touch.

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-05-02 20:53:20 +08:00
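The boundary case from commit fc5a349a72 can be illustrated with a minimal sketch. It is not the actual nn_bitset_one: the function name is hypothetical, and the bit order used here (bit 0 is the least significant bit of word 0) is an assumption that may differ from Cyclone's.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Set the first `numbits` bits of `bits`, leaving all other words alone. */
static void sketch_bitset_one(uint32_t numbits, uint32_t *bits)
{
  assert(bits != NULL);
  const uint32_t nwords = (numbits + 31) / 32;
  memset(bits, 0xff, nwords * sizeof(*bits));
  /* Only a partial final word has excess bits to clear.  When numbits is a
     multiple of 32 there is no such word, and bits[numbits / 32] lies past
     the last word this function is allowed to touch. */
  if ((numbits % 32) != 0)
    bits[numbits / 32] &= (UINT32_C(1) << (numbits % 32)) - 1;
}
```

Without the guard on `numbits % 32`, a call with 64 bits would clear the third word of the array.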
Jeroen Koekkoek
c9d827e420 Fix warnings related to fixed type integers
Signed-off-by: Jeroen Koekkoek <jeroen@koekkoek.nl>
2019-04-29 19:22:11 +02:00
YuSheng
ca35c7afb2 add RPATH for compiled tools to find the libddsc.so (#153)
* add RPATH for compiled tools to find the libddsc.so

Signed-off-by: YuSheng <hello@cwyark.me>
2019-04-29 19:09:40 +02:00
Erik Boasson
b686ba858c make internal header files more C++ friendly
Generally one doesn't need to include any internal header files in an
application, but the (unstable) interface for application-defined sample
representation and serialization does require including some.  It turns
out a keyword clash had to be resolved (typename => type_name) and that
a whole bunch of them were missing the #ifdef __cplusplus / extern "C"
bit.

It further turned out that one had to pull in nearly all of the type
definitions, including some typedefs that are illegal in C++, e.g.,

  typedef struct os_sockWaitset *os_sockWaitset;

C++ is right to forbid this, but Cyclone's header files were wrong to
force inclusion of so much irrelevant stuff.  This commit leaves these
typedefs in place, but eliminates a few header file inclusions to avoid
the problem.

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-29 11:15:41 +02:00
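The two header fixes mentioned in commit b686ba858c follow a common pattern, sketched below. The function is hypothetical and only serves to show the guard and the renamed parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Give the declarations C linkage when the header is included from C++. */
#if defined (__cplusplus)
extern "C" {
#endif

/* `typename` is a C++ keyword, so the parameter is spelled `type_name`. */
static int sketch_has_type_name(const char *type_name)
{
  assert(type_name != NULL);
  return type_name[0] != '\0';
}

#if defined (__cplusplus)
}
#endif
```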
Erik Boasson
d2ebbbc880 address a handful of compiler warnings in ddsperf
These are fortunately all false positives.

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-29 11:15:41 +02:00
eboasson
bf79e12e10 Merge pull request #152 from martinbremmer/mptproto3
Multi Process Testing framework
2019-04-25 21:56:52 +02:00
Martin Bremmer
7a705eabf0 Removed expand_envvars.h
Signed-off-by: Martin Bremmer <martin.bremmer@adlinktech.com>
2019-04-25 13:29:11 +02:00
Martin Bremmer
e9f6ec6f48 Be sure to not trigger the SIGCHLD
Signed-off-by: Martin Bremmer <martin.bremmer@adlinktech.com>
2019-04-24 15:13:30 +02:00
Martin Bremmer
74ca68e550 Improved mpt default timeout.
Signed-off-by: Martin Bremmer <martin.bremmer@adlinktech.com>
2019-04-24 15:00:37 +02:00
Martin Bremmer
44ce20ebe0 Fixed proc compile warning.
Signed-off-by: Martin Bremmer <martin.bremmer@adlinktech.com>
2019-04-24 15:00:37 +02:00
Martin Bremmer
973ae87e17 Moved expand_envvars.
Signed-off-by: Martin Bremmer <martin.bremmer@adlinktech.com>
2019-04-24 15:00:37 +02:00
Martin Bremmer
17f9c361ea Multi Process Testing framework
Signed-off-by: Martin Bremmer <martin.bremmer@adlinktech.com>
2019-04-24 14:46:46 +02:00
Martin Bremmer
0269774a60 Rudimentary process management.
Signed-off-by: Martin Bremmer <martin.bremmer@adlinktech.com>
2019-04-24 14:46:46 +02:00
Erik Boasson
d146716d1d remove Lease element from test config
The element has long been meaningless and got deprecated in commit
c3dca32a2f.

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-24 14:09:30 +02:00
Erik Boasson
06245d0d4a initial version of performance/network check tool
The current situation for performance measurements and checking network
behaviour is rather unsatisfactory, as the only tools available are
``pubsub`` and the ``roundtrip`` and ``throughput`` examples.  The first
can do many things thanks to its thousand-and-one options, but its
purpose really is to be able to read/write arbitrary data with arbitrary
QoS -- though the arbitrary data bit was lost in the hacked conversion
from the original code.  The latter two have a terrible user interface,
don't perform any verification that the measurement was successful and
do not provide the results in a convenient form.

Furthermore, the abuse of the two examples as the primary means for
measuring performance has resulted in a reduction of their value as an
example, e.g., they can do waitset- or listener-based reading (and the
throughput one also polling-based), but that kind of complication does
not help a new user understand what is going on.  Especially not given
that these features were simply hacked in.

Hence the need for a new tool, one that integrates the common
measurements and can be used to verify that the results make sense.  It
is not quite done yet, in particular it is lacking in a number of
aspects:

* no measurement of CPU- and network load, memory usage and context
  switches yet;
* very limited statistics (min/max/average, if you're lucky; no
  interesting things such as jitter on a throughput test yet);
* it can't yet gather the data from all participants in the network
  using DDS;
* it doesn't output the data in a convenient file format yet;
* it doesn't allow specifying boundaries within which the results
  must fall for the run to be successful.

What it does verify is that all the endpoint matches that should exist
given the discovered participant do in fact come into existence,
reporting an error (and exiting with an exit status code of 1) if they
don't, as well as checking the number of participants.  With the way the
DDSI protocol works, this is a pretty decent network connectivity check.

The raw measurements needed for the desired statistics (apart from
system-level measurements) are pretty much made, so the main thing that
still needs to be done is exploit them and output them.  It can already
replace the examples for most benchmarks (only the 50%/90%/99%
percentiles are still missing for a complete replacement).

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-24 14:09:30 +02:00
Erik Boasson
46f61e09f5 missing m_observer_lock on (re)setting statuses
Most of the places where the status flags were reset, this happened
without holding m_observer_lock protecting these status flags.  For most
of these statuses, they are only ever set/reset while also holding the
entity lock, but this is not true for all of them (DATA_AVAILABLE for
example), and thus there are some cases where retrieving the status
could lead to losing the raising of a status (at least DATA_AVAILABLE).

The problem was introduced in ba46cb1140.

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-24 14:09:30 +02:00
Erik Boasson
1a3d5c7aba Fix DATA_AVAILABLE race condition
The DATA_AVAILABLE status was reset by read and take while holding the
upper-layer reader lock, but after completing the read/take operation on
the RHC.  As data can be written into the RHC without holding the
upper-layer reader lock, new data could arrive in between the
reading/taking and the resetting of the DATA_AVAILABLE status, leading
to a missed detection.  Resetting DATA_AVAILABLE prior to accessing the
RHC solves this.

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-24 14:09:30 +02:00
Erik Boasson
1ecad3c047 remove "Error occurred on locking entity" messages
Those should not be printed to stderr (or wherever), there are errors
returned in these cases ...

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-24 14:09:30 +02:00
Erik Boasson
9c1a739559 suppress EHOSTUNREACH and EHOSTDOWN errors in log
Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-24 14:09:30 +02:00
Erik Boasson
1672268481 always append 0 byte to user/group/topic data
Changes the semantics of dds_qget_{user,group,topic}data to always
append a 0 byte to any non-empty value without counting it in the size.
(An empty value is always represented by a null pointer and a size of
0).  The advantage is that any code treating the data as the octet
sequence it formally is will do exactly the same, but any code written
with the knowledge that it should be a string can safely interpret it as
one.

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-24 14:09:30 +02:00
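The semantics described in commit 1672268481 can be sketched as a copy helper. This is an illustration under assumed names, not the implementation behind dds_qget_{user,group,topic}data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy `sz` octets and append a 0 byte that is not counted in `sz`.
   An empty value yields a null pointer.  The caller frees the result. */
static void *sketch_copy_with_terminator(const void *src, size_t sz)
{
  assert(src != NULL || sz == 0);
  if (sz == 0)
    return NULL;
  unsigned char *dst = malloc(sz + 1);
  if (dst == NULL)
    return NULL;
  memcpy(dst, src, sz);
  dst[sz] = 0;
  return dst;
}
```

Code that treats the result as an octet sequence of length `sz` is unaffected, while code that knows the value is a string can pass it to functions such as strlen.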
Erik Boasson
6c171a890d move util library into ddsrt
As was the plan with the introduction of ddsrt; this includes renaming
the identifiers to match the capitalization style and removes old junk.

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-24 14:09:30 +02:00
Erik Boasson
e965df5db7 add participant instance handle to builtin topics
Extend the endpoint built-in topic data with the participant instance
handle (the GUID was already present).  Having the instance handle
available makes it trivial to look up the participant, whereas a lookup
of the GUID is rather impractical.

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-24 14:09:30 +02:00
Erik Boasson
5735b5775d add setter for partition QoS for a single name
This adds dds_qset_partition1 as a convenience function to set the
partition QoS to a single name.

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-24 14:09:30 +02:00
Erik Boasson
7fb9ef2ab0 publish built-in topics prior to matching
The built-in topics for readers and writers should be published before a
subscription or publication matched listener is invoked, otherwise the
instance handle provided to the listener is not yet available in a
reader for the corresponding topic.

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-24 14:09:30 +02:00
Erik Boasson
4778d6c5df add QoS to ignore local readers/writers (#78)
Adds a new "ignorelocal" QoS to the readers/writers to ignore local
matching readers/writers, with three settings:

* DDS_IGNORELOCAL_NONE: default
* DDS_IGNORELOCAL_PARTICIPANT: ignores readers/writers in the same
  participant
* DDS_IGNORELOCAL_PROCESS: ignores readers/writers in the same process

These can be set/got using dds_qset_ignorelocal and
dds_qget_ignorelocal.

If a matching reader or writer is ignored because of this setting, it is
as-if that reader or writer doesn't exist.  No traffic will be generated
or data retained on its behalf.

There are no consequences for interoperability as this is (by
definition) a local affair.

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-24 14:09:30 +02:00
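The effect of the three settings in commit 4778d6c5df on matching can be sketched as a decision function. The enum mirrors the names from the commit message with a different prefix; the function and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_ignorelocal {
  SKETCH_IGNORELOCAL_NONE,
  SKETCH_IGNORELOCAL_PARTICIPANT,
  SKETCH_IGNORELOCAL_PROCESS
};

/* Decide whether a matching peer endpoint is treated as non-existent. */
static bool sketch_ignore_peer(enum sketch_ignorelocal kind, bool same_participant, bool same_process)
{
  /* Endpoints of one participant necessarily live in one process. */
  assert(!same_participant || same_process);
  switch (kind) {
    case SKETCH_IGNORELOCAL_NONE: return false;
    case SKETCH_IGNORELOCAL_PARTICIPANT: return same_participant;
    case SKETCH_IGNORELOCAL_PROCESS: return same_process;
  }
  return false;
}
```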
Erik Boasson
a6b5229510 crash invoking data available on built-in reader
The DDSI reader/writer pointers are now returned as out parameters
instead of as a return value, so that the upper-layer reference is set
before any listener can be invoked.

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-24 14:09:30 +02:00
Erik Boasson
31b8baa03b block signals in ddsrt_thread_create
Signal handling in multi-threaded processes is bad enough at the best of
times, and as we don't really use any signals in the Cyclone code, it
makes more sense to create all threads with most signals blocked.  That
way an application that wants to handle signals using sigwait() need not
block all signals prior to creating a participant.

Note that instead of blocking all signals, we block all except SIGXCPU.
The reason is that the liveliness monitoring and stack trace dumping
code currently relies on that signal.

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-24 14:09:30 +02:00
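The approach in commit 31b8baa03b relies on a new POSIX thread inheriting the signal mask of its creator. A minimal sketch, assuming a POSIX system and using hypothetical function names rather than the ddsrt_thread_create code:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>

/* Create a thread with all signals blocked except SIGXCPU, then restore
   the caller's own mask. */
static int sketch_thread_create(pthread_t *tid, void *(*fn)(void *), void *arg)
{
  sigset_t set, old;
  assert(tid != NULL && fn != NULL);
  sigfillset(&set);
  sigdelset(&set, SIGXCPU);
  pthread_sigmask(SIG_BLOCK, &set, &old);
  const int rc = pthread_create(tid, NULL, fn, arg);
  pthread_sigmask(SIG_SETMASK, &old, NULL);
  return rc;
}

/* Thread body for a usage example: stores 1 in *arg if SIGINT is blocked
   and SIGXCPU is not, 0 otherwise. */
static void *sketch_check_mask(void *arg)
{
  sigset_t cur;
  pthread_sigmask(SIG_SETMASK, NULL, &cur);
  *(int *) arg = sigismember(&cur, SIGINT) == 1 && sigismember(&cur, SIGXCPU) == 0;
  return NULL;
}
```

An application thread that calls sigwait() then remains the only place where the blocked signals are handled.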
Erik Boasson
0202039f61 remove dds_rhc_fini abomination
It was called strangely early in the deleting of the reader, even before
the DDSI reader was no longer being accessed by other threads.  The
immediate and obvious problem is that it resets the pointer to the
upper-layer entity even though this can still be dereferenced in
invoking a listener, resulting in a crash.

Secondly it blocks until there are no listener calls any more (and the
resetting of that pointer will prevent any further listener
invocations), but a similar piece of logic is already in generic entity
code that resets the mask and then waits for all listener invocations to
complete.  Having both is a problem.

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-24 14:09:30 +02:00
Erik Boasson
d6edfada81 fix deadlock between listener, deleting reader, &c
If a (proxy) writer delivers data to a reader that has a data_available
listener calling read/take while that reader is being deleted, blocked
in set_listener waiting for the listeners to complete, then a deadlock
can occur:

* listener calling read/take then attempts to lock reader;
* deleting the reader locks the reader, then waits for the listeners to
  complete while holding the lock.

This commit unlocks the reader before waiting for the listeners to
complete.

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-24 14:09:30 +02:00
Erik Boasson
2dd20c4273 add dds_entity_release counterpart to entity_claim
Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-24 14:09:30 +02:00
Erik Boasson
6227fe00b3 eliminate clang static analyzer false positive
Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-21 16:05:06 +02:00
Erik Boasson
ec0062542c defer triggering dqueue thread until end-of-packet
There appears to be a minor performance benefit to not waking up the
delivery thread (if used) immediately upon enqueueing the first sample,
but rather to wait (typically) until the end of the packet.  In a
latency measurement it probably makes little difference: one shouldn't
use asynchronous delivery if one needs the lowest possible latency, and
the end of the packet is reached rather quickly normally.

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-21 16:05:06 +02:00
Erik Boasson
c92820677d enable printf format checking for dds_log
Also remove superfluous parameters in a TRACE statement and fix a format
specification in pong.c.

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-21 16:05:06 +02:00
Erik Boasson
c3dca32a2f nestable calls to thread_[state_]awake
Remove all the "if asleep then awake ..." stuff from the code by making
awake/asleep calls nestable, whereas before "awake ; awake" really
meant a transition through "asleep".  This self-evidently necessitates
fixing those places where the old behaviour was relied upon, but
fortunately those are few.

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-21 16:05:06 +02:00
Erik Boasson
9b3a71e1ab lift limits on handle allocation and reuse (#95)
The old entity handle mechanism suffered from a number of problems, the
most terrible one being that it would only ever allocate 1000 handles
(not even have at most 1000 in use at the same time).  Secondarily, it
was protected by a single mutex that actually does show up as a limiting
factor in, say, a polling-based throughput test with small messages.
Thirdly, it tried to provide for various use cases that don't exist in
practice but add complexity and overhead.

This commit totally rewrites the mechanism, by replacing the old array
with a hash table and allowing a near-arbitrary number of handles as
well as reuse of handles.  It also removes the entity "kind" bits in the
most significant bits of the handles, because they only resulted in
incorrect checking of argument validity.  All that is taken out, but
there is still more cleaning up to be done.  It furthermore removes an
indirection in the handle-to-entity lookup by embedding the
"dds_handle_link" structure in the entity.

Handle allocation is randomized to have a high probability of
quickly finding an available handle (the total number of handles is
limited to a number much smaller than the domain from which they are
allocated).  The likelihood of handle reuse is still dependent on the
number of allocated handles -- the fewer handles there are, the longer
the expected time to reuse.  Non-randomized handles would give a few
guarantees more, though.

It moreover moves the code from the "util" to the "core/ddsc" component,
because it really is only used for entities, and besides the new
implementation relies on the deferred freeing (a.k.a. garbage collection
mechanism) implemented in the core.

The actual handle management has two variants, selectable with a macro:
the preferred embodiment uses a concurrent hash table, the actually used
one performs all operations inside a single mutex and uses a
non-concurrent version of the hash table.  The reason the
less-preferred embodiment is used is that the concurrent version
requires the freeing of entity objects to be deferred (much like the
GUID-to-entity hash tables in DDSI function, or indeed the key value to
instance handle mapping).  That is a fair bit of work, and the
non-concurrent version is a reasonable intermediate step.

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-21 16:05:06 +02:00
Erik Boasson
58c0cb2317 fix trace print of tkmap_instance address
Fix the trace to contain a print of the address of the tkmap_instance
(along with the instance id), rather than the address of the stack
variable pointing to the tkmap_instance.

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-21 16:05:06 +02:00
Erik Boasson
6f35d88d54 install core/ddsi and util header files
Some of the former are required to implement alternative serialisation
methods; the latter is just generally useful. For the time being these
are not part of the formal API and not subject to backwards
compatibility. Still, they have value for quickly building tools that
use Cyclone and happen to need any of these functions.

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-21 16:05:06 +02:00
Erik Boasson
6e87841ea5 move MT19937 random generator to ddsrt
Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-21 16:05:06 +02:00
Erik Boasson
dd9aceb713 small performance improvement in RHC
The introduction of properly functioning query conditions adds some
overhead, this commit removes some of that cost by avoiding some calls
to update_conditions when there are no query conditions.

It also removes the has_changed field from the instance, instead using a
local boolean to track whether DATA_AVAILABLE should be raised or not.

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-21 16:05:06 +02:00
Erik Boasson
62a71a870f fix race: delete reader & delete writer (#159)
Adding and removing reader/writer matches can be done by multiple
threads, and this can result in two threads simultaneously trying to do
this on a single reader/writer pair.  The code therefore always checks
first whether the pair is (not) matched before proceeding.

However, removing a reader from a proxy writer had part of the code
outside this check.  Therefore, if both entities are being deleted
simultaneously, there is a risk that local_reader_ary_remove is called
twice for the same argument, and in that case, it asserts in one of them
because the reader can no longer be found.  The counting of the number
of matched reliable readers suffers from the same race condition.

This commit eliminates these race conditions by moving these operations
into the block guarded by the aforementioned check.

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-20 18:25:09 +02:00
Jeroen Koekkoek
0b106cc186 Remove JAVA_HOME regarding registry from .travis.yml
Signed-off-by: Jeroen Koekkoek <jeroen@koekkoek.nl>
2019-04-18 19:28:46 +02:00
Jeroen Koekkoek
4a60000e58 Remove dependency on jdk8 Chocolatey package
Signed-off-by: Jeroen Koekkoek <jeroen@koekkoek.nl>
2019-04-18 19:28:46 +02:00
Erik Boasson
671e73ec98 set DATA_AVAILABLE when deleting writer (#148)
Deleting a writer causes unregisters (and possibly disposes) in the rest
of the network, and these updates to the instances should trigger
DATA_AVAILABLE.

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-11 10:09:35 +02:00
Erik Boasson
b14663c173 ignore data until a heartbeat is received (#146)
When data arrives before a heartbeat has been received, it is impossible
to know whether this is a new "live" sample or a retransmit, and for
this reason the requesting of historical data is delayed until a
heartbeat arrives that informs the readers of the range of sequence
numbers to request as historical data.

However, by this time, and without this new condition in place, the
reader may have already received some data directly, and may
consequently request some data twice.  That's not right.

Requiring a heartbeat to have been received before delivering the data
avoids this problem, but potentially delays receiving data after a new
writer/reader pair has been matched.  The delay caused by a full
handshake at that point seems less bad than the odd case of stuttering
where that isn't expected.  There are almost certainly some tricks
possible to avoid that delay in the common cases, but there are more
important things to do ...

Best-effort readers on a reliable proxy writer are a bit special: if
there are only best-effort readers, there is no guarantee that a
heartbeat will be received, and so the condition does not apply.  This
commit attempts to deal with that by only requiring a heartbeat if some
reliable readers exist, but that doesn't allow a smooth transition from
"only best-effort readers" to "some reliable readers".

One could moreover argue that this condition should not be imposed on
volatile readers (at worst you get a little bit of data from before the
match), but equally well that it should (there's no guarantee that no
sample would be skipped in the case of a keep-all writer, if the first
sample happened to be a retransmit).

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-11 10:09:35 +02:00
Jeroen Koekkoek
3bdd2a140d Move md5 from ddsi to ddsrt
Signed-off-by: Jeroen Koekkoek <jeroen@koekkoek.nl>
2019-04-11 10:04:06 +02:00
Jeroen Koekkoek
63a5c87baf Fix format strings and signatures for fixed size integers
Signed-off-by: Jeroen Koekkoek <jeroen@koekkoek.nl>
2019-04-11 10:04:06 +02:00
Erik Boasson
638cab9291 ignore all-zero durability service QoS in SEDP
For compatibility with TwinOaks CoreDX, ignore an all-zero durability
service QoS received over SEDP for volatile and transient-local
endpoints.

Signed-off-by: Erik Boasson <eb@ilities.com>
2019-04-08 20:07:29 +02:00