Use ddsperf in perf scripts and update README

* the scripted throughput test originally used for the throughput graph
  in the README now uses ddsperf;

* a scripted latency test has been added;

* updated the README with the results of these tests (and so now gives
  easy access not only to throughput, but also to latency and memory
  usage, as well as to latency over GbE.

Signed-off-by: Erik Boasson <eb@ilities.com>
This commit is contained in:
Erik Boasson 2019-08-02 09:06:39 +02:00 committed by eboasson
parent 952029dba0
commit af19c5681c
7 changed files with 554 additions and 189 deletions

145
README.md
View file

@ -1,9 +1,12 @@
# Eclipse Cyclone DDS
Eclipse Cyclone DDS is by far the most performant and robust DDS implementation available on the
market. Moreover, Cyclone DDS is developed completely in the open as an Eclipse IoT project
Eclipse Cyclone DDS is a very performant and robust open-source DDS implementation. Cyclone DDS is developed completely in the open as an Eclipse IoT project
(see [eclipse-cyclone-dds](https://projects.eclipse.org/projects/iot.cyclonedds)).
* [Getting Started](#getting-started)
* [Performance](#performance)
* [Configuration](#configuration)
# Getting Started
## Building Eclipse Cyclone DDS
@ -106,7 +109,76 @@ also need to add switches to select the architecture and build type, e.g., ``con
arch=x86_64 -s build_type=Debug ..`` This will automatically download and/or build CUnit (and, at
the moment, OpenSSL).
## Configuration
## Documentation
The documentation is still rather limited, and at the moment only available in the sources (in the
form of restructured text files in ``docs`` and Doxygen comments in the header files), or as
a
[PDF](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/pdf/CycloneDDS-0.1.0.pdf). The
intent is to automate the process of building the documentation and have them available in more
convenient formats and in the usual locations.
## Building and Running the Roundtrip Example
We will show you how to build and run an example program that measures latency. The examples are
built automatically when you build Cyclone DDS, so you don't need to follow these steps to be able
to run the program, it is merely to illustrate the process.
$ cd cyclonedds/examples/roundtrip
$ mkdir build
$ cd build
$ cmake ..
$ make
On one terminal start the application that will be responding to pings:
$ ./RoundtripPong
On another terminal, start the application that will be sending the pings:
$ ./RoundtripPing 0 0 0
# payloadSize: 0 | numSamples: 0 | timeOut: 0
# Waiting for startup jitter to stabilise
# Warm up complete.
# Latency measurements (in us)
# Latency [us] Write-access time [us] Read-access time [us]
# Seconds Count median min 99% max Count median min Count median min
1 28065 17 16 23 87 28065 8 6 28065 1 0
2 28115 17 16 23 46 28115 8 6 28115 1 0
3 28381 17 16 22 46 28381 8 6 28381 1 0
4 27928 17 16 24 127 27928 8 6 27928 1 0
5 28427 17 16 20 47 28427 8 6 28427 1 0
6 27685 17 16 26 51 27685 8 6 27685 1 0
7 28391 17 16 23 47 28391 8 6 28391 1 0
8 27938 17 16 24 63 27938 8 6 27938 1 0
9 28242 17 16 24 132 28242 8 6 28242 1 0
10 28075 17 16 23 46 28075 8 6 28075 1 0
The numbers above were measured on Mac running a 4.2 GHz Intel Core i7 on December 12th 2018. From
these numbers you can see how the roundtrip is very stable and the minimal latency is now down to 17
micro-seconds (used to be 25 micro-seconds) on this HW.
# Performance
Reliable message throughput is over 1MS/s for very small samples and is roughly 90% of GbE with 100
byte samples, and latency is about 30us when measured using [ddsperf](src/tools/ddsperf) between two
Intel(R) Xeon(R) CPU E3-1270 V2 @ 3.50GHz (that's 2012 hardware ...) running Ubuntu 16.04, with the
executables built on Ubuntu 18.04 using gcc 7.4.0 for a default (i.e., "RelWithDebInfo") build.
<img src="https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/throughput-async-listener-rate.png" alt="Throughput" height="400"><img src="https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/latency-sync-listener.png" alt="Throughput" height="400">
This is with the subscriber in listener mode, using asynchronous delivery for the throughput
test. The configuration is a marginally tweaked out-of-the-box configuration: an increased maximum
message size and fragment size, and an increased high-water mark for the reliability window on the
writer side. For details, see the [scripts](examples/perfscript) directory,
the
[environment details](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/config.txt) and
the
[throughput](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/sub.log) and
[latency](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/ping.log) data
underlying the graphs. These also include CPU usage ([thoughput](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/throughput-async-listener-cpu.png) and [latency](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/latency-sync-listener-bwcpu.png)) and [memory usage](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/throughput-async-listener-memory.png).
# Configuration
The out-of-the-box configuration should usually be fine, but there are a great many options that can
be tweaked by creating an XML file with the desired settings and defining the ``CYCLONEDDS_URI`` to
@ -161,73 +233,6 @@ The configurator tool ``cycloneddsconf`` can help in discovering the settings, a
dump. Background information on configuring Cyclone DDS can be
found [here](https://docs/manual/config.rst).
## Documentation
The documentation is still rather limited, and at the moment only available in the sources (in the
form of restructured text files in ``docs`` and Doxygen comments in the header files), or as
a
[PDF](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/pdf/CycloneDDS-0.1.0.pdf). The
intent is to automate the process of building the documentation and have them available in more
convenient formats and in the usual locations.
## Performance
Median small message throughput measured using the Throughput example between two Intel(R) Xeon(R)
CPU E3-1270 V2 @ 3.50GHz (that's 2012 hardware ...) running Linux 3.8.13-rt14.20.el6rt.x86_64,
connected via a quiet GbE and when using gcc-6.2.0 for a default (i.e., "RelWithDebInfo") build is:
<img src="https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/throughput-polling.png" alt="Throughput" height="400">
This is with the subscriber in polling mode. Listener mode is marginally slower; using a waitset the
message rate for minimal size messages drops to 600k sample/s in synchronous delivery mode and about
750k samples/s in asynchronous delivery mode. The configuration is an out-of-the-box configuration,
tweaked only to increase the high-water mark for the reliability window on the writer side. For
details, see the scripts in the ``performance`` directory and
the
[data](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/throughput.txt).
There is some data on roundtrip latency below.
## Building and Running the Roundtrip Example
We will show you how to build and run an example program that measures latency. The examples are
built automatically when you build Cyclone DDS, so you don't need to follow these steps to be able
to run the program, it is merely to illustrate the process.
$ cd cyclonedds/examples/roundtrip
$ mkdir build
$ cd build
$ cmake ..
$ make
On one terminal start the application that will be responding to pings:
$ ./RoundtripPong
On another terminal, start the application that will be sending the pings:
$ ./RoundtripPing 0 0 0
# payloadSize: 0 | numSamples: 0 | timeOut: 0
# Waiting for startup jitter to stabilise
# Warm up complete.
# Round trip measurements (in us)
# Round trip time [us] Write-access time [us] Read-access time [us]
# Seconds Count median min 99% max Count median min Count median min
1 28065 17 16 23 87 28065 8 6 28065 1 0
2 28115 17 16 23 46 28115 8 6 28115 1 0
3 28381 17 16 22 46 28381 8 6 28381 1 0
4 27928 17 16 24 127 27928 8 6 27928 1 0
5 28427 17 16 20 47 28427 8 6 28427 1 0
6 27685 17 16 26 51 27685 8 6 27685 1 0
7 28391 17 16 23 47 28391 8 6 28391 1 0
8 27938 17 16 24 63 27938 8 6 27938 1 0
9 28242 17 16 24 132 28242 8 6 28242 1 0
10 28075 17 16 23 46 28075 8 6 28075 1 0
The numbers above were measured on Mac running a 4.2 GHz Intel Core i7 on December 12th 2018. From
these numbers you can see how the roundtrip is very stable and the minimal latency is now down to 17
micro-seconds (used to be 25 micro-seconds) on this HW.
# Trademarks
* "Eclipse Cyclone DDS" and "Cyclone DDS" are trademarks of the Eclipse Foundation.

153
examples/perfscript/latency-test Executable file
View file

@ -0,0 +1,153 @@
#!/bin/bash
export nwif=eth0
bandwidth=1e9
remotedir="$PWD"
provision=false
asynclist="sync async"
modelist="listener waitset"
sizelist="0 20 50 100 200 500 1000 2000 5000 10000 20000 50000 100000 200000 500000 1000000"
timeout=30
loopback=true
resultdir="latency-result"
usage () {
cat >&2 <<EOF
usage: $0 [OPTIONS] user@remote [user@remote...]
OPTIONS
-i IF use network interface IF (default: $nwif)
-b 100|1000|B network bandwidth (100Mbps/1000Gbps or B bits/sec) for
calculating load % given load in bytes/second (default: 1000)
-d DIR use DIR on remote (default: PWD)
-p provision required binaries in DIR (default: $provision)
first ssh's in to try mkdir -p DIR, then follows up with scp
-t DUR run for DUR seconds per size (default $timeout)
-a ASYNCLIST run for delivery async settings ASYNCLIST (default:
"$asynclist")
-m MODELIST run with subscriber mode settings MODELIST (default:
"$modelist")
-s SIZELIST run for sizes in SIZELIST (default: "$sizelist")
if SIZELIST is empty, it uses ddsperf's OU topic instead
-l LOOPBACK enable/disable multicast loopback (true/false, default: $loopback)
-o DIR store results in dir (default: $resultdir)
Local host runs ddsperf in ping mode, remotes run pong. It assumes these are
available in DIR/bin. It also assumes that ssh user@remote works without
requiring a password.
EOF
exit 1
}
while getopts "i:d:pa:m:s:t:o:l:" opt ; do
case $opt in
i) nwif="$OPTARG" ;;
b) case "$OPTARG" in
100) bandwidth=1e8 ;;
1000) bandwidth=1e9 ;;
*) bandwidth="$OPTARG" ;;
esac ;;
d) remotedir="$OPTARG" ;;
p) provision=true ;;
a) asynclist="$OPTARG" ;;
m) modelist="$OPTARG" ;;
s) sizelist="$OPTARG" ;;
l) loopback="OPTARG" ;;
t) timeout="$OPTARG" ;;
o) resultdir="$OPTARG" ;;
h) usage ;;
esac
done
shift $((OPTIND-1))
if [ $# -lt 1 ] ; then usage ; fi
cfg=cdds-simple.xml
cat >$cfg <<EOF
<CycloneDDS>
<Domain>
<Id>17</Id>
</Domain>
<General>
<NetworkInterfaceAddress>$nwif</NetworkInterfaceAddress>
<EnableMulticastLoopback>$loopback</EnableMulticastLoopback>
<MaxMessageSize>65500B</MaxMessageSize>
<FragmentSize>4000B</FragmentSize>
</General>
<Internal>
<Watermarks>
<WhcHigh>500kB</WhcHigh>
</Watermarks>
<SynchronousDeliveryPriorityThreshold>\${async:-0}</SynchronousDeliveryPriorityThreshold>
<LeaseDuration>3s</LeaseDuration>
</Internal>
<Tracing>
<Verbosity>config</Verbosity>
</Tracing>
</CycloneDDS>
EOF
if [ ! -x bin/ddsperf ] ; then
echo "bin/ddsperf not found on the local machine" >&2
exit 1
fi
[ -d $resultdir ] || { echo "output directory $resultdir doesn't exist" >&2 ; exit 1 ; }
if $provision ; then
echo "provisioning ..."
for r in $pubremote "$@" ; do
ssh $r mkdir -p $remotedir $remotedir/bin $remotedir/lib
scp lib/libddsc.so.0 $r:$remotedir/lib
scp bin/ddsperf $r:$remotedir/bin
done
fi
topic=KS
[ -z "$sizelist" ] && topic=OU
export CYCLONEDDS_URI=file://$PWD/$cfg
for r in "$@" ; do
scp $cfg $r:$remotedir || { echo "failed to copy $cfg to $remote:$PWD" >&2 ; exit 1 ; }
done
for async_mode in $asynclist ; do
case "$async_mode" in
sync) async=0 ;;
async) async=1 ;;
*) echo "$async_mode: invalid setting for ASYNC" >&2 ; continue ;;
esac
export async
for sub_mode in $modelist ; do
echo "======== ASYNC $async MODE $sub_mode ========="
cat > run-pong.tmp <<EOF
export CYCLONEDDS_URI=file://$remotedir/$cfg
export async=$async
cd $remotedir
nohup bin/ddsperf -T $topic pong $sub_mode > /dev/null &
echo \$!
EOF
killpongs=""
for r in "$@" ; do
scp run-pong.tmp $r:$remotedir
rpongpid=`ssh $r ". $remotedir/run-pong.tmp"`
killpongs="$killpongs ssh $r kill -9 $rpongpid &"
done
outdir=$resultdir/$async_mode-$sub_mode
mkdir $outdir
touch $outdir/ping.log
tail -f $outdir/ping.log & xpid=$!
for size in ${sizelist:-0} ; do
echo "size $size"
bin/ddsperf -d $nwif:$bandwidth -c -D $timeout -T $topic ping size $size $sub_mode >> $outdir/ping.log
sleep 5
done
eval $killpongs
sleep 1
kill $xpid
wait
done
done

View file

@ -0,0 +1,95 @@
#!/usr/bin/perl -w
# Note: this is specialized for async delivery, listener mode because of the way it deals with
# thread names
use strict;
my %res = ();
my %meas;
while (<>) {
next unless s/^\[\d+\] \d+\.\d+\s+//;
if (s/^[^\@:]+:\d+\s+size (\d+) //) {
# size is always the first line of an output block
# ddsperf doesn't print CPU loads, RSS, bandwidth if it is zero
my %tmp = %meas;
push @{$res{$meas{size}}}, \%tmp if %meas;
%meas = (size => $1,
rawxmitbw => 0, rawrecvbw => 0,
subrss => 0, pubrss => 0,
subcpu => 0, subrecv => 0,
pubcpu => 0, pubrecv => 0);
$meas{$1} = $2 while s/^(mean|min|max|\d+%)\s+(\d+\.\d+)us\s*//;
die unless /cnt \d+$/;
} elsif (s/^(\@[^:]+:\d+\s+)?rss:(\d+\.\d+)([kM])B//) {
my $side = defined $1 ? "pub" : "sub";
$meas{"${side}rss"} = $2 / ($3 eq "k" ? 1024.0 : 1);
$meas{"${side}cpu"} = cpuload (($side eq "pub") ? "pub" : "dq.user", $_);
$meas{"${side}recv"} = cpuload ("recvUC", $_);
} elsif (/xmit\s+(\d+)%\s+recv\s+(\d+)%/) {
$meas{rawxmitbw} = $1;
$meas{rawrecvbw} = $2;
}
}
push @{$res{$meas{size}}}, \%meas if %meas;
die "no data found" unless keys %res > 0;
print "#size mean min 50% 90% 99% max rawxmitbw rawrecvbw pubrss subrss pubcpu pubrecv subcpu subrecv\n";
my @sizes = sort { $a <=> $b } keys %res;
for my $sz (@sizes) {
my $ms = $res{$sz};
my $min = min ("min", $ms);
my $max = max ("max", $ms);
my $mean = mean ("mean", $ms); # roughly same number of roundtrips, so not too far off
my $median = max ("50%", $ms); # also not quite correct ...
my $p90 = max ("90%", $ms);
my $p99 = max ("99%", $ms);
my $rawxmitbw = median ("rawxmitbw", $ms);
my $rawrecvbw = median ("rawrecvbw", $ms);
my $pubrss = max ("pubrss", $ms);
my $subrss = max ("subrss", $ms);
my $pubcpu = median ("pubcpu", $ms);
my $pubrecv = median ("pubrecv", $ms);
my $subcpu = median ("subcpu", $ms);
my $subrecv = median ("subrecv", $ms);
print "$sz $mean $min $median $p90 $p99 $max $rawxmitbw $rawrecvbw $pubrss $subrss $pubcpu $pubrecv $subcpu $subrecv\n";
}
sub cpuload {
my ($thread, $line) = @_;
$thread =~ s/\./\\./g;
if ($line =~ /$thread:(\d+)%\+(\d+)%/) {
return $1+$2;
} else {
return 0;
}
}
sub max {
my $v;
for (extract (@_)) { $v = $_ unless defined $v; $v = $_ if $_ > $v; }
return $v;
}
sub min {
my $v;
for (extract (@_)) { $v = $_ unless defined $v; $v = $_ if $_ < $v; }
return $v;
}
sub mean {
my $v = 0;
my @xs = extract (@_);
$v += $_ for @xs;
return $v / @xs;
}
sub median {
my @xs = sort { $a <=> $b } (extract (@_));
return (@xs % 2) ? $xs[(@xs - 1) / 2] : ($xs[@xs/2 - 1] + $xs[@xs/2]) / 2;
}
sub extract {
my ($key, $msref) = @_;
return map { $_->{$key} } @$msref;
}

View file

@ -0,0 +1,46 @@
#!/bin/bash
`dirname $0`/latency-test-extract "$@" > data.txt
gnuplot <<\EOF
set term pngcairo size 1024,768
set output "latency-sync-listener.png"
set st d lp
set st li 1 lw 2
set st li 2 lw 2
set st li 3 lw 2
set st li 4 lw 2
set st li 5 lw 2
set multiplot
set logscale xy
set title "Latency"
set ylabel "[us]"
set grid xtics ytics mytics
set xlabel "payload size [bytes]"
p "data.txt" u 1:3 ti "min", "" u 1:4 ti "median", "" u 1:5 ti "90%", "" u 1:6 ti "99%", "" u 1:7 ti "max"
unset logscale y
unset xlabel
unset ylabel
unset title
set grid nomytics
set origin .1, .43
set size .55, .5
clear
p [10:1000] "data.txt" u 1:3 ti "min", "" u 1:4 ti "median", "" u 1:5 ti "90%", "" u 1:6 ti "99%", "" u 1:7 ti "max"
unset multiplot
unset origin
unset size
unset logscale
set logscale x
set output "latency-sync-listener-bwcpu.png"
set title "Latency: network bandwidth and CPU usage"
set y2tics
set ylabel "[Mbps]"
set y2label "CPU [%]"
set xlabel "payload size [bytes]"
set key at graph 1, 0.7
p "data.txt" u 1:(10*$8) ti "GbE transmit bandwidth (left)", "" u 1:(10*$9) ti "GbE receive bandwidth (left)", "" u 1:13 axes x1y2 ti "ping CPU (right)", "" u 1:15 axes x1y2 ti "pong CPU (right)"
EOF

114
examples/perfscript/throughput-test Normal file → Executable file
View file

@ -1,45 +1,53 @@
#!/bin/bash
export nwif=eth0
bandwidth=1e9
remotedir="$PWD"
provision=false
asynclist="sync async"
modelist="listener polling waitset"
sizelist="0 20 50 100 200 500 1000 2000 5000 10000 20000 50000 100000 200000 500000 1000000"
timeout=30
loopback=true
resultdir="throughput-result"
usage () {
cat >&2 <<EOF
usage: $0 [OPTIONS] user@remote [user@remote...]
OPTIONS
-i IF use network interface IF (default: eth0)
-b 100|1000 network bandwidth (100Mbps/1000Gbps) for calculating load
% given load in bytes/second (default: 1000)
-i IF use network interface IF (default: $nwif)
-b 100|1000|B network bandwidth (100Mbps/1000Gbps or B bits/sec) for
calculating load % given load in bytes/second (default: 1000)
-d DIR use DIR on remote (default: PWD)
-p provision required binaries in DIR (default: false)
-p provision required binaries in DIR (default: $provision)
first ssh's in to try mkdir -p DIR, then follows up with scp
-t DUR run for DUR seconds per size (default 20)
-a ASYNCLIST run for delivery async settings ASYNCLIST (default: "0 1")
-m MODELIST run with subscriber mode settings MODELIST (default: "-1 0 1")
-s SIZELIST run for sizes in SIZELIST (default: "0 16 32 64 128 256")
-l LOOPBACK enable/disable multicast loopback (true/false, default: true)
-o DIR store results in dir (default: throughput-result)
-t DUR run for DUR seconds per size (default $timeout)
-a ASYNCLIST run for delivery async settings ASYNCLIST (default:
"$asynclist")
-m MODELIST run with subscriber mode settings MODELIST (default:
"$modelist")
-s SIZELIST run for sizes in SIZELIST (default: "$sizelist")
if SIZELIST is empty, it uses ddsperf's OU topic instead
-l LOOPBACK enable/disable multicast loopback (true/false, default: $loopback)
-o DIR store results in dir (default: $resultdir)
Local host runs ThroughputSubscriber, first remote runs ThroughputPublisher,
further remotes also run ThroughputSubscriber. It assumes these are
available in DIR/bin. It also assumes that ssh user@remote works without
requiring a password.
Local host runs "ddsperf" in subscriber mode, first remote runs it publisher
mode, further remotes also run subcribers. It assumes these are available in
DIR/bin. It also assumes that ssh user@remote works without requiring a
password.
EOF
exit 1
}
export nwif=eth0
bandwidth=1000
remotedir="$PWD"
provision=false
asynclist="0 1"
modelist="-1 0 1"
sizelist="0 16 32 64 128 256"
timeout=20
loopback=true
resultdir="throughput-result"
while getopts "i:b:d:pa:m:s:t:o:l:" opt ; do
case $opt in
i) nwif="$OPTARG" ;;
b) bandwidth="$OPTARG" ;;
b) case "$OPTARG" in
100) bandwidth=1e8 ;;
1000) bandwidth=1e9 ;;
*) bandwidth="$OPTARG" ;;
esac ;;
d) remotedir="$OPTARG" ;;
p) provision=true ;;
a) asynclist="$OPTARG" ;;
@ -53,7 +61,6 @@ while getopts "i:b:d:pa:m:s:t:o:l:" opt ; do
done
shift $((OPTIND-1))
if [ $# -lt 1 ] ; then usage ; fi
ethload=`dirname $0`/ethload
pubremote=$1
shift
@ -63,24 +70,27 @@ cat >$cfg <<EOF
<Domain>
<Id>17</Id>
</Domain>
<DDSI2E>
<General>
<NetworkInterfaceAddress>$nwif</NetworkInterfaceAddress>
<EnableMulticastLoopback>$loopback</EnableMulticastLoopback>
<MaxMessageSize>65500B</MaxMessageSize>
<FragmentSize>4000B</FragmentSize>
</General>
<Internal>
<Watermarks>
<WhcHigh>500kB</WhcHigh>
</Watermarks>
<SynchronousDeliveryPriorityThreshold>${async:-0}</SynchronousDeliveryPriorityThreshold>
<SynchronousDeliveryPriorityThreshold>\${async:-0}</SynchronousDeliveryPriorityThreshold>
<LeaseDuration>3s</LeaseDuration>
</Internal>
</DDSI2E>
<Tracing>
<Verbosity>config</Verbosity>
</Tracing>
</CycloneDDS>
EOF
if [ ! -x bin/ThroughputPublisher -o ! -x bin/ThroughputSubscriber -o ! -x $ethload ] ; then
echo "some check for existence of a file failed on the local machine" >&2
if [ ! -x bin/ddsperf ] ; then
echo "bin/ddsperf not found on the local machine" >&2
exit 1
fi
@ -91,33 +101,35 @@ if $provision ; then
for r in $pubremote "$@" ; do
ssh $r mkdir -p $remotedir $remotedir/bin $remotedir/lib
scp lib/libddsc.so.0 $r:$remotedir/lib
scp bin/ThroughputPublisher bin/ThroughputSubscriber $r:$remotedir/bin
scp bin/ddsperf $r:$remotedir/bin
done
fi
topic=KS
[ -z "$sizelist" ] && topic=OU
export CYCLONEDDS_URI=file://$PWD/$cfg
for r in $pubremote "$@" ; do
scp $cfg $r:$remotedir || { echo "failed to copy $cfg to $remote:$PWD" >&2 ; exit 1 ; }
done
for async in $asynclist ; do
for async_mode in $asynclist ; do
case "$async_mode" in
sync) async=0 ;;
async) async=1 ;;
*) echo "$async_mode: invalid setting for ASYNC" >&2 ; continue ;;
esac
export async
for mode in $modelist ; do
echo "======== ASYNC $async MODE $mode ========="
for sub_mode in $modelist ; do
echo "======== ASYNC $async MODE $sub_mode ========="
cat > run-publisher.tmp <<EOF
export CYCLONEDDS_URI=file://$remotedir/$cfg
export async=$async
cd $remotedir
rm -f pub-top.log
for size in $sizelist ; do
for size in ${sizelist:-0} ; do
echo "size \$size"
bin/ThroughputPublisher \$size > pub.log & ppid=\$!
top -b -d1 -p \$ppid >> pub-top.log & tpid=\$!
sleep $timeout
kill \$tpid
kill -2 \$ppid
wait \$ppid
bin/ddsperf -D $timeout -T $topic pub size \$size > pub.log
sleep 5
done
wait
@ -129,7 +141,7 @@ EOF
export CYCLONEDDS_URI=file://$remotedir/$cfg
export async=$async
cd $remotedir
nohup bin/ThroughputSubscriber 0 $mode > /dev/null &
nohup bin/ddsperf -T $topic sub $sub_mode > /dev/null &
echo \$!
EOF
for r in "$@" ; do
@ -139,21 +151,17 @@ EOF
done
fi
outdir=$resultdir/data-async$async-mode$mode
outdir=$resultdir/$async_mode-$sub_mode
mkdir $outdir
rm -f sub-top.log
$ethload $nwif $bandwidth > $outdir/sub-ethload.log & lpid=$!
bin/ThroughputSubscriber 0 $mode > $outdir/sub.log & spid=$!
top -b -d1 -p $spid >> $outdir/sub-top.log & tpid=$!
bin/ddsperf -d $nwif:$bandwidth -c -T $topic sub $sub_mode > $outdir/sub.log & spid=$!
tail -f $outdir/sub.log & xpid=$!
ssh $pubremote ". $remotedir/run-publisher.tmp"
kill $tpid
kill -2 $spid
kill $spid
eval $killremotesubs
sleep 1
kill $lpid $xpid
kill $xpid
wait
scp $pubremote:$remotedir/{pub-top.log,pub.log} $outdir
scp $pubremote:$remotedir/pub.log $outdir
done
done

View file

@ -1,59 +1,76 @@
#!/usr/bin/perl -w
# Note: this is specialized for async delivery, listener mode because of the way it deals with
# thread names
use strict;
my @dirs = ("async0-mode-1", "async0-mode0", "async0-mode1",
"async1-mode-1", "async1-mode0", "async1-mode1");
my %res = ();
my %meas;
while (<>) {
next unless s/^\[\d+\] \d+\.\d+\s+//;
if (/^size (\d+) .* rate (\d+\.\d+)\s*kS\/s\s+(\d+\.\d+)\s*Mb\/s/) {
# size is always the first line of an output block
# ddsperf doesn't print CPU loads, RSS, bandwidth if it is zero
my %tmp = %meas;
push @{$res{$meas{size}}}, \%tmp if %meas;
%meas = (size => $1, rate => $2, cookedbw => $3,
rawxmitbw => 0, rawrecvbw => 0,
subrss => 0, pubrss => 0,
subcpu => 0, subrecv => 0,
pubcpu => 0, pubrecv => 0);
} elsif (s/^(\@[^:]+:\d+\s+)?rss:(\d+\.\d+)([kM])B//) {
my $side = defined $1 ? "pub" : "sub";
$meas{"${side}rss"} = $2 / ($3 eq "k" ? 1024.0 : 1);
$meas{"${side}cpu"} = cpuload (($side eq "pub") ? "pub" : "dq.user", $_);
$meas{"${side}recv"} = cpuload ("recvUC", $_);
} elsif (/xmit\s+(\d+)%\s+recv\s+(\d+)%/) {
$meas{rawxmitbw} = $1;
$meas{rawrecvbw} = $2;
}
}
push @{$res{$meas{size}}}, \%meas if %meas;
die "no data found" unless keys %res > 0;
my $dataset = 0;
my $basedir = "throughput-result";
$basedir = $ARGV[0] if @ARGV== 1;
my $load_threshold = 20;
for my $dir (@dirs) {
my @loads = ();
print "#size rate cookedbw rawxmitbw rawrecvbw pubrss subrss pubcpu pubrecv subcpu subrecv\n";
my @sizes = sort { $a <=> $b } keys %res;
for my $sz (@sizes) {
my $ms = $res{$sz};
my $rate = median ("rate", $ms);
my $cookedbw = median ("cookedbw", $ms);
my $rawxmitbw = median ("rawxmitbw", $ms);
my $rawrecvbw = median ("rawrecvbw", $ms);
my $pubrss = max ("pubrss", $ms);
my $subrss = max ("subrss", $ms);
my $pubcpu = median ("pubcpu", $ms);
my $pubrecv = median ("pubrecv", $ms);
my $subcpu = median ("subcpu", $ms);
my $subrecv = median ("subrecv", $ms);
print "$sz $rate $cookedbw $rawxmitbw $rawrecvbw $pubrss $subrss $pubcpu $pubrecv $subcpu $subrecv\n";
}
{
open LH, "< $basedir/data-$dir/sub-ethload.log" or next; # die "can't open $basedir/data-$dir/sub-ethload.log";
my @curload = ();
while (<LH>) {
next unless /^r +([0-9.]+).*\( *(\d+)/;
push @curload, $2 if $1 > $load_threshold;
if (@curload && $1 < $load_threshold) {
push @loads, median (@curload);
@curload = ();
}
}
push @loads, median (@curload) if @curload;
close LH;
sub cpuload {
my ($thread, $line) = @_;
$thread =~ s/\./\\./g;
if ($line =~ /$thread:(\d+)%\+(\d+)%/) {
return $1+$2;
} else {
return 0;
}
}
open FH, "< $basedir/data-$dir/sub.log" or next; # die "can't open $basedir/data-$dir/sub.log";
print "\n\n" if $dataset++;
print "# mode $dir\n";
print "# payloadsize rate[samples/s] appl.bandwidth[Mb/s] raw.bandwidth[Mb/s]\n";
my $psz;
my @rate = ();
while (<FH>) {
next unless /Payload size: ([0-9]+).*Transfer rate: ([0-9.]+)/;
my $psz_cur = $1; my $rate_cur = $2;
$psz = $psz_cur unless defined $psz;
if ($psz != $psz_cur) {
my $load = shift @loads;
my $rate = median (@rate);
printf "%d %f %f %f\n", $psz, $rate, $rate * (8 + $psz) / 125e3, $load / 125e3;
@rate = ();
}
$psz = $psz_cur;
push @rate, ($rate_cur + 0.0);
}
my $load = shift @loads;
my $rate = median (@rate);
printf "%d %f %f %f\n", $psz, $rate, $rate * (8 + $psz) / 125e3, $load / 125e3;
close FH;
sub max {
my $v;
for (extract (@_)) { $v = $_ unless defined $v; $v = $_ if $_ > $v; }
return $v;
}
sub median {
my @xs = sort { $a <=> $b } @_;
my @xs = sort { $a <=> $b } (extract (@_));
return (@xs % 2) ? $xs[(@xs - 1) / 2] : ($xs[@xs/2 - 1] + $xs[@xs/2]) / 2;
}
sub extract {
my ($key, $msref) = @_;
return map { $_->{$key} } @$msref;
}

View file

@ -1,14 +1,55 @@
#!/bin/bash
`dirname $0`/throughput-test-extract > data.txt
`dirname $0`/throughput-test-extract "$@" > data.txt
gnuplot <<\EOF
set term png size 1024,768
set output "throughput-polling.png"
set st d l
set title "Throughput (polling with 1ms sleeps)"
set ylabel "M sample/s"
set y2label "Mbps"
set y2tics
set term pngcairo size 1024,768
set output "throughput-async-listener-rate.png"
set st d lp
set st li 1 lw 2
set st li 2 lw 2
set st li 3 lw 2
set multiplot
set logscale xyy2
set title "Throughput"
set ylabel "[Mbps]"
set ytics (100,200,300,400,500,600,700,800,900,1000)
set grid xtics ytics mytics
set xlabel "payload size [bytes]"
p "data.txt" i 5 u 1:($2/1e6) ti "rate [M sample/s]", "" i 5 u 1:3 axes x1y2 ti "app bandwidth [Mbps]", "" i 5 u 1:4 axes x1y2 ti "GbE bandwidth [Mbps]"
# sample rate in data.txt is in kS/s
# GbE bandwidth in data.txt is in %, so 100% => 1000 Mbps
set key at graph 1, 0.9
p "data.txt" u 1:3 ti "payload", "" u 1:(10*$5) ti "GbE bandwidth"
set ytics auto
set key default
unset xlabel
unset title
set grid nomytics
set ylabel "[M sample/s]"
set origin .3, .1
set size .6, .6
clear
p "data.txt" u 1:($2/1e3) ti "rate"
unset multiplot
unset origin
unset size
unset logscale
set logscale x
set output "throughput-async-listener-memory.png"
set title "Throughput: memory"
set ylabel "RSS [MB]"
set xlabel "payload size [bytes]"
p "data.txt" u 1:6 ti "publisher", "" u 1:7 ti "subscriber"
unset logscale
set logscale x
set output "throughput-async-listener-cpu.png"
set title "Throughput: CPU"
set ylabel "CPU [%]"
set xlabel "payload size [bytes]"
p "data.txt" u 1:8 ti "publisher (pub thread)", "" u 1:9 ti "publisher (recvUC thread)", "" u 1:10 ti "subscriber (dq.user thread)", "" u 1:11 ti "subscriber (recvUC thread)"
EOF