diff --git a/README.md b/README.md
index d8ac3e3..6f30c41 100644
--- a/README.md
+++ b/README.md
@@ -1,9 +1,12 @@
# Eclipse Cyclone DDS
-Eclipse Cyclone DDS is by far the most performant and robust DDS implementation available on the
-market. Moreover, Cyclone DDS is developed completely in the open as an Eclipse IoT project
+Eclipse Cyclone DDS is a very performant and robust open-source DDS implementation. Cyclone DDS is developed completely in the open as an Eclipse IoT project
(see [eclipse-cyclone-dds](https://projects.eclipse.org/projects/iot.cyclonedds)).
+* [Getting Started](#getting-started)
+* [Performance](#performance)
+* [Configuration](#configuration)
+
# Getting Started
## Building Eclipse Cyclone DDS
@@ -106,7 +109,76 @@ also need to add switches to select the architecture and build type, e.g., ``con
arch=x86_64 -s build_type=Debug ..`` This will automatically download and/or build CUnit (and, at
the moment, OpenSSL).
-## Configuration
+## Documentation
+
+The documentation is still rather limited, and at the moment only available in the sources (in the
+form of restructured text files in ``docs`` and Doxygen comments in the header files), or as
+a
+[PDF](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/pdf/CycloneDDS-0.1.0.pdf). The
+intent is to automate the process of building the documentation and have it available in more
+convenient formats and in the usual locations.
+
+## Building and Running the Roundtrip Example
+
+We will show you how to build and run an example program that measures latency. The examples are
+built automatically when you build Cyclone DDS, so you don't need to follow these steps to run the
+program; they merely illustrate the process.
+
+ $ cd cyclonedds/examples/roundtrip
+ $ mkdir build
+ $ cd build
+ $ cmake ..
+ $ make
+
+In one terminal, start the application that will be responding to pings:
+
+ $ ./RoundtripPong
+
+On another terminal, start the application that will be sending the pings:
+
+ $ ./RoundtripPing 0 0 0
+ # payloadSize: 0 | numSamples: 0 | timeOut: 0
+ # Waiting for startup jitter to stabilise
+ # Warm up complete.
+ # Latency measurements (in us)
+ # Latency [us] Write-access time [us] Read-access time [us]
+ # Seconds Count median min 99% max Count median min Count median min
+ 1 28065 17 16 23 87 28065 8 6 28065 1 0
+ 2 28115 17 16 23 46 28115 8 6 28115 1 0
+ 3 28381 17 16 22 46 28381 8 6 28381 1 0
+ 4 27928 17 16 24 127 27928 8 6 27928 1 0
+ 5 28427 17 16 20 47 28427 8 6 28427 1 0
+ 6 27685 17 16 26 51 27685 8 6 27685 1 0
+ 7 28391 17 16 23 47 28391 8 6 28391 1 0
+ 8 27938 17 16 24 63 27938 8 6 27938 1 0
+ 9 28242 17 16 24 132 28242 8 6 28242 1 0
+ 10 28075 17 16 23 46 28075 8 6 28075 1 0
+
+The numbers above were measured on a Mac with a 4.2 GHz Intel Core i7 on December 12th 2018. From
+these numbers you can see that the roundtrip is very stable and the minimal latency is now down to 17
+microseconds (it used to be 25 microseconds) on this hardware.
+
+# Performance
+
+Reliable message throughput is over 1MS/s for very small samples and is roughly 90% of GbE with 100
+byte samples, and latency is about 30us when measured using [ddsperf](src/tools/ddsperf) between two
+Intel(R) Xeon(R) CPU E3-1270 V2 @ 3.50GHz (that's 2012 hardware ...) running Ubuntu 16.04, with the
+executables built on Ubuntu 18.04 using gcc 7.4.0 for a default (i.e., "RelWithDebInfo") build.
+
+
+
+This is with the subscriber in listener mode, using asynchronous delivery for the throughput
+test. The configuration is a marginally tweaked out-of-the-box configuration: an increased maximum
+message size and fragment size, and an increased high-water mark for the reliability window on the
+writer side. For details, see the [scripts](examples/perfscript) directory,
+the
+[environment details](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/config.txt) and
+the
+[throughput](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/sub.log) and
+[latency](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/ping.log) data
+underlying the graphs. These also include CPU usage ([throughput](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/throughput-async-listener-cpu.png) and [latency](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/latency-sync-listener-bwcpu.png)) and [memory usage](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/throughput-async-listener-memory.png).
+
+# Configuration
The out-of-the-box configuration should usually be fine, but there are a great many options that can
be tweaked by creating an XML file with the desired settings and defining the ``CYCLONEDDS_URI`` to
@@ -161,73 +233,6 @@ The configurator tool ``cycloneddsconf`` can help in discovering the settings, a
dump. Background information on configuring Cyclone DDS can be
found [here](https://docs/manual/config.rst).
-## Documentation
-
-The documentation is still rather limited, and at the moment only available in the sources (in the
-form of restructured text files in ``docs`` and Doxygen comments in the header files), or as
-a
-[PDF](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/pdf/CycloneDDS-0.1.0.pdf). The
-intent is to automate the process of building the documentation and have them available in more
-convenient formats and in the usual locations.
-
-## Performance
-
-Median small message throughput measured using the Throughput example between two Intel(R) Xeon(R)
-CPU E3-1270 V2 @ 3.50GHz (that's 2012 hardware ...) running Linux 3.8.13-rt14.20.el6rt.x86_64,
-connected via a quiet GbE and when using gcc-6.2.0 for a default (i.e., "RelWithDebInfo") build is:
-
-
-
-This is with the subscriber in polling mode. Listener mode is marginally slower; using a waitset the
-message rate for minimal size messages drops to 600k sample/s in synchronous delivery mode and about
-750k samples/s in asynchronous delivery mode. The configuration is an out-of-the-box configuration,
-tweaked only to increase the high-water mark for the reliability window on the writer side. For
-details, see the scripts in the ``performance`` directory and
-the
-[data](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/throughput.txt).
-
-There is some data on roundtrip latency below.
-
-## Building and Running the Roundtrip Example
-
-We will show you how to build and run an example program that measures latency. The examples are
-built automatically when you build Cyclone DDS, so you don't need to follow these steps to be able
-to run the program, it is merely to illustrate the process.
-
- $ cd cyclonedds/examples/roundtrip
- $ mkdir build
- $ cd build
- $ cmake ..
- $ make
-
-On one terminal start the application that will be responding to pings:
-
- $ ./RoundtripPong
-
-On another terminal, start the application that will be sending the pings:
-
- $ ./RoundtripPing 0 0 0
- # payloadSize: 0 | numSamples: 0 | timeOut: 0
- # Waiting for startup jitter to stabilise
- # Warm up complete.
- # Round trip measurements (in us)
- # Round trip time [us] Write-access time [us] Read-access time [us]
- # Seconds Count median min 99% max Count median min Count median min
- 1 28065 17 16 23 87 28065 8 6 28065 1 0
- 2 28115 17 16 23 46 28115 8 6 28115 1 0
- 3 28381 17 16 22 46 28381 8 6 28381 1 0
- 4 27928 17 16 24 127 27928 8 6 27928 1 0
- 5 28427 17 16 20 47 28427 8 6 28427 1 0
- 6 27685 17 16 26 51 27685 8 6 27685 1 0
- 7 28391 17 16 23 47 28391 8 6 28391 1 0
- 8 27938 17 16 24 63 27938 8 6 27938 1 0
- 9 28242 17 16 24 132 28242 8 6 28242 1 0
- 10 28075 17 16 23 46 28075 8 6 28075 1 0
-
-The numbers above were measured on Mac running a 4.2 GHz Intel Core i7 on December 12th 2018. From
-these numbers you can see how the roundtrip is very stable and the minimal latency is now down to 17
-micro-seconds (used to be 25 micro-seconds) on this HW.
-
# Trademarks
* "Eclipse Cyclone DDS" and "Cyclone DDS" are trademarks of the Eclipse Foundation.
diff --git a/examples/perfscript/latency-test b/examples/perfscript/latency-test
new file mode 100755
index 0000000..ead88ef
--- /dev/null
+++ b/examples/perfscript/latency-test
@@ -0,0 +1,153 @@
+#!/bin/bash
+
+export nwif=eth0
+bandwidth=1e9
+remotedir="$PWD"
+provision=false
+asynclist="sync async"
+modelist="listener waitset"
+sizelist="0 20 50 100 200 500 1000 2000 5000 10000 20000 50000 100000 200000 500000 1000000"
+timeout=30
+loopback=true
+resultdir="latency-result"
+
+usage () {
+ cat >&2 <$cfg <
+
+ 17
+
+
+ $nwif
+ $loopback
+ 65500B
+ 4000B
+
+
+
+ 500kB
+
+ \${async:-0}
+ 3s
+
+
+ config
+
+
+EOF
+
+if [ ! -x bin/ddsperf ] ; then
+ echo "bin/ddsperf not found on the local machine" >&2
+ exit 1
+fi
+
+[ -d $resultdir ] || { echo "output directory $resultdir doesn't exist" >&2 ; exit 1 ; }
+
+if $provision ; then
+ echo "provisioning ..."
+ for r in $pubremote "$@" ; do
+ ssh $r mkdir -p $remotedir $remotedir/bin $remotedir/lib
+ scp lib/libddsc.so.0 $r:$remotedir/lib
+ scp bin/ddsperf $r:$remotedir/bin
+ done
+fi
+
+topic=KS
+[ -z "$sizelist" ] && topic=OU
+
+export CYCLONEDDS_URI=file://$PWD/$cfg
+for r in "$@" ; do
+ scp $cfg $r:$remotedir || { echo "failed to copy $cfg to $r:$remotedir" >&2 ; exit 1 ; }
+done
+
+for async_mode in $asynclist ; do
+ case "$async_mode" in
+ sync) async=0 ;;
+ async) async=1 ;;
+ *) echo "$async_mode: invalid setting for ASYNC" >&2 ; continue ;;
+ esac
+ export async
+ for sub_mode in $modelist ; do
+ echo "======== ASYNC $async MODE $sub_mode ========="
+
+
+ cat > run-pong.tmp < /dev/null &
+echo \$!
+EOF
+ killpongs=""
+ for r in "$@" ; do
+ scp run-pong.tmp $r:$remotedir
+ rpongpid=`ssh $r ". $remotedir/run-pong.tmp"`
+ killpongs="$killpongs ssh $r kill -9 $rpongpid &"
+ done
+
+ outdir=$resultdir/$async_mode-$sub_mode
+ mkdir $outdir
+
+ touch $outdir/ping.log
+ tail -f $outdir/ping.log & xpid=$!
+ for size in ${sizelist:-0} ; do
+ echo "size $size"
+ bin/ddsperf -d $nwif:$bandwidth -c -D $timeout -T $topic ping size $size $sub_mode >> $outdir/ping.log
+ sleep 5
+ done
+ eval $killpongs
+ sleep 1
+ kill $xpid
+ wait
+ done
+done
diff --git a/examples/perfscript/latency-test-extract b/examples/perfscript/latency-test-extract
new file mode 100755
index 0000000..b665e88
--- /dev/null
+++ b/examples/perfscript/latency-test-extract
@@ -0,0 +1,95 @@
+#!/usr/bin/perl -w
+
+# Note: this is specialized for async delivery, listener mode because of the way it deals with
+# thread names
+
+use strict;
+
+my %res = ();
+my %meas;
+while (<>) {
+ next unless s/^\[\d+\] \d+\.\d+\s+//;
+ if (s/^[^\@:]+:\d+\s+size (\d+) //) {
+ # size is always the first line of an output block
+ # ddsperf doesn't print CPU loads, RSS, bandwidth if it is zero
+ my %tmp = %meas;
+ push @{$res{$meas{size}}}, \%tmp if %meas;
+ %meas = (size => $1,
+ rawxmitbw => 0, rawrecvbw => 0,
+ subrss => 0, pubrss => 0,
+ subcpu => 0, subrecv => 0,
+ pubcpu => 0, pubrecv => 0);
+ $meas{$1} = $2 while s/^(mean|min|max|\d+%)\s+(\d+\.\d+)us\s*//;
+ die unless /cnt \d+$/;
+ } elsif (s/^(\@[^:]+:\d+\s+)?rss:(\d+\.\d+)([kM])B//) {
+ my $side = defined $1 ? "pub" : "sub";
+ $meas{"${side}rss"} = $2 / ($3 eq "k" ? 1024.0 : 1);
+ $meas{"${side}cpu"} = cpuload (($side eq "pub") ? "pub" : "dq.user", $_);
+ $meas{"${side}recv"} = cpuload ("recvUC", $_);
+ } elsif (/xmit\s+(\d+)%\s+recv\s+(\d+)%/) {
+ $meas{rawxmitbw} = $1;
+ $meas{rawrecvbw} = $2;
+ }
+}
+push @{$res{$meas{size}}}, \%meas if %meas;
+die "no data found" unless keys %res > 0;
+
+print "#size mean min 50% 90% 99% max rawxmitbw rawrecvbw pubrss subrss pubcpu pubrecv subcpu subrecv\n";
+my @sizes = sort { $a <=> $b } keys %res;
+for my $sz (@sizes) {
+ my $ms = $res{$sz};
+ my $min = min ("min", $ms);
+ my $max = max ("max", $ms);
+ my $mean = mean ("mean", $ms); # roughly same number of roundtrips, so not too far off
+ my $median = max ("50%", $ms); # also not quite correct ...
+ my $p90 = max ("90%", $ms);
+ my $p99 = max ("99%", $ms);
+ my $rawxmitbw = median ("rawxmitbw", $ms);
+ my $rawrecvbw = median ("rawrecvbw", $ms);
+ my $pubrss = max ("pubrss", $ms);
+ my $subrss = max ("subrss", $ms);
+ my $pubcpu = median ("pubcpu", $ms);
+ my $pubrecv = median ("pubrecv", $ms);
+ my $subcpu = median ("subcpu", $ms);
+ my $subrecv = median ("subrecv", $ms);
+ print "$sz $mean $min $median $p90 $p99 $max $rawxmitbw $rawrecvbw $pubrss $subrss $pubcpu $pubrecv $subcpu $subrecv\n";
+}
+
+sub cpuload {
+ my ($thread, $line) = @_;
+ $thread =~ s/\./\\./g;
+ if ($line =~ /$thread:(\d+)%\+(\d+)%/) {
+ return $1+$2;
+ } else {
+ return 0;
+ }
+}
+
+sub max {
+ my $v;
+ for (extract (@_)) { $v = $_ unless defined $v; $v = $_ if $_ > $v; }
+ return $v;
+}
+
+sub min {
+ my $v;
+ for (extract (@_)) { $v = $_ unless defined $v; $v = $_ if $_ < $v; }
+ return $v;
+}
+
+sub mean {
+ my $v = 0;
+ my @xs = extract (@_);
+ $v += $_ for @xs;
+ return $v / @xs;
+}
+
+sub median {
+ my @xs = sort { $a <=> $b } (extract (@_));
+ return (@xs % 2) ? $xs[(@xs - 1) / 2] : ($xs[@xs/2 - 1] + $xs[@xs/2]) / 2;
+}
+
+sub extract {
+ my ($key, $msref) = @_;
+ return map { $_->{$key} } @$msref;
+}
diff --git a/examples/perfscript/latency-test-plot b/examples/perfscript/latency-test-plot
new file mode 100755
index 0000000..d7217a2
--- /dev/null
+++ b/examples/perfscript/latency-test-plot
@@ -0,0 +1,46 @@
+#!/bin/bash
+
+`dirname $0`/latency-test-extract "$@" > data.txt
+gnuplot <<\EOF
+set term pngcairo size 1024,768
+set output "latency-sync-listener.png"
+set st d lp
+set st li 1 lw 2
+set st li 2 lw 2
+set st li 3 lw 2
+set st li 4 lw 2
+set st li 5 lw 2
+
+set multiplot
+set logscale xy
+set title "Latency"
+set ylabel "[us]"
+set grid xtics ytics mytics
+set xlabel "payload size [bytes]"
+p "data.txt" u 1:3 ti "min", "" u 1:4 ti "median", "" u 1:5 ti "90%", "" u 1:6 ti "99%", "" u 1:7 ti "max"
+unset logscale y
+unset xlabel
+unset ylabel
+unset title
+set grid nomytics
+set origin .1, .43
+set size .55, .5
+clear
+p [10:1000] "data.txt" u 1:3 ti "min", "" u 1:4 ti "median", "" u 1:5 ti "90%", "" u 1:6 ti "99%", "" u 1:7 ti "max"
+unset multiplot
+
+unset origin
+unset size
+
+unset logscale
+set logscale x
+set output "latency-sync-listener-bwcpu.png"
+set title "Latency: network bandwidth and CPU usage"
+set y2tics
+set ylabel "[Mbps]"
+set y2label "CPU [%]"
+set xlabel "payload size [bytes]"
+set key at graph 1, 0.7
+p "data.txt" u 1:(10*$8) ti "GbE transmit bandwidth (left)", "" u 1:(10*$9) ti "GbE receive bandwidth (left)", "" u 1:13 axes x1y2 ti "ping CPU (right)", "" u 1:15 axes x1y2 ti "pong CPU (right)"
+
+EOF
diff --git a/examples/perfscript/throughput-test b/examples/perfscript/throughput-test
old mode 100644
new mode 100755
index 36edaa3..d654056
--- a/examples/perfscript/throughput-test
+++ b/examples/perfscript/throughput-test
@@ -1,45 +1,53 @@
#!/bin/bash
+export nwif=eth0
+bandwidth=1e9
+remotedir="$PWD"
+provision=false
+asynclist="sync async"
+modelist="listener polling waitset"
+sizelist="0 20 50 100 200 500 1000 2000 5000 10000 20000 50000 100000 200000 500000 1000000"
+timeout=30
+loopback=true
+resultdir="throughput-result"
+
usage () {
cat >&2 <$cfg <
17
-
-
- $nwif
- $loopback
-
-
-
- 500kB
-
- ${async:-0}
- 3s
-
-
+
+ $nwif
+ $loopback
+ 65500B
+ 4000B
+
+
+
+ 500kB
+
+ \${async:-0}
+ 3s
+
+
+ config
+
EOF
-if [ ! -x bin/ThroughputPublisher -o ! -x bin/ThroughputSubscriber -o ! -x $ethload ] ; then
- echo "some check for existence of a file failed on the local machine" >&2
+if [ ! -x bin/ddsperf ] ; then
+ echo "bin/ddsperf not found on the local machine" >&2
exit 1
fi
@@ -91,33 +101,35 @@ if $provision ; then
for r in $pubremote "$@" ; do
ssh $r mkdir -p $remotedir $remotedir/bin $remotedir/lib
scp lib/libddsc.so.0 $r:$remotedir/lib
- scp bin/ThroughputPublisher bin/ThroughputSubscriber $r:$remotedir/bin
+ scp bin/ddsperf $r:$remotedir/bin
done
fi
+topic=KS
+[ -z "$sizelist" ] && topic=OU
+
export CYCLONEDDS_URI=file://$PWD/$cfg
for r in $pubremote "$@" ; do
scp $cfg $r:$remotedir || { echo "failed to copy $cfg to $remote:$PWD" >&2 ; exit 1 ; }
done
-for async in $asynclist ; do
+for async_mode in $asynclist ; do
+ case "$async_mode" in
+ sync) async=0 ;;
+ async) async=1 ;;
+ *) echo "$async_mode: invalid setting for ASYNC" >&2 ; continue ;;
+ esac
export async
- for mode in $modelist ; do
- echo "======== ASYNC $async MODE $mode ========="
+ for sub_mode in $modelist ; do
+ echo "======== ASYNC $async MODE $sub_mode ========="
cat > run-publisher.tmp < pub.log & ppid=\$!
- top -b -d1 -p \$ppid >> pub-top.log & tpid=\$!
- sleep $timeout
- kill \$tpid
- kill -2 \$ppid
- wait \$ppid
+ bin/ddsperf -D $timeout -T $topic pub size \$size > pub.log
sleep 5
done
wait
@@ -129,7 +141,7 @@ EOF
export CYCLONEDDS_URI=file://$remotedir/$cfg
export async=$async
cd $remotedir
-nohup bin/ThroughputSubscriber 0 $mode > /dev/null &
+nohup bin/ddsperf -T $topic sub $sub_mode > /dev/null &
echo \$!
EOF
for r in "$@" ; do
@@ -138,22 +150,18 @@ EOF
killremotesubs="$killremotesubs ssh $r kill -9 $rsubpid &"
done
fi
-
- outdir=$resultdir/data-async$async-mode$mode
+
+ outdir=$resultdir/$async_mode-$sub_mode
mkdir $outdir
- rm -f sub-top.log
- $ethload $nwif $bandwidth > $outdir/sub-ethload.log & lpid=$!
- bin/ThroughputSubscriber 0 $mode > $outdir/sub.log & spid=$!
- top -b -d1 -p $spid >> $outdir/sub-top.log & tpid=$!
+ bin/ddsperf -d $nwif:$bandwidth -c -T $topic sub $sub_mode > $outdir/sub.log & spid=$!
tail -f $outdir/sub.log & xpid=$!
ssh $pubremote ". $remotedir/run-publisher.tmp"
- kill $tpid
- kill -2 $spid
+ kill $spid
eval $killremotesubs
sleep 1
- kill $lpid $xpid
+ kill $xpid
wait
- scp $pubremote:$remotedir/{pub-top.log,pub.log} $outdir
+ scp $pubremote:$remotedir/pub.log $outdir
done
done
diff --git a/examples/perfscript/throughput-test-extract b/examples/perfscript/throughput-test-extract
index 9f16ff3..973f397 100755
--- a/examples/perfscript/throughput-test-extract
+++ b/examples/perfscript/throughput-test-extract
@@ -1,59 +1,76 @@
#!/usr/bin/perl -w
+# Note: this is specialized for async delivery, listener mode because of the way it deals with
+# thread names
+
use strict;
-my @dirs = ("async0-mode-1", "async0-mode0", "async0-mode1",
- "async1-mode-1", "async1-mode0", "async1-mode1");
-
-my $dataset = 0;
-my $basedir = "throughput-result";
-$basedir = $ARGV[0] if @ARGV== 1;
-my $load_threshold = 20;
-for my $dir (@dirs) {
- my @loads = ();
-
- {
- open LH, "< $basedir/data-$dir/sub-ethload.log" or next; # die "can't open $basedir/data-$dir/sub-ethload.log";
- my @curload = ();
- while () {
- next unless /^r +([0-9.]+).*\( *(\d+)/;
- push @curload, $2 if $1 > $load_threshold;
- if (@curload && $1 < $load_threshold) {
- push @loads, median (@curload);
- @curload = ();
- }
- }
- push @loads, median (@curload) if @curload;
- close LH;
+my %res = ();
+my %meas;
+while (<>) {
+ next unless s/^\[\d+\] \d+\.\d+\s+//;
+ if (/^size (\d+) .* rate (\d+\.\d+)\s*kS\/s\s+(\d+\.\d+)\s*Mb\/s/) {
+ # size is always the first line of an output block
+ # ddsperf doesn't print CPU loads, RSS, bandwidth if it is zero
+ my %tmp = %meas;
+ push @{$res{$meas{size}}}, \%tmp if %meas;
+ %meas = (size => $1, rate => $2, cookedbw => $3,
+ rawxmitbw => 0, rawrecvbw => 0,
+ subrss => 0, pubrss => 0,
+ subcpu => 0, subrecv => 0,
+ pubcpu => 0, pubrecv => 0);
+ } elsif (s/^(\@[^:]+:\d+\s+)?rss:(\d+\.\d+)([kM])B//) {
+ my $side = defined $1 ? "pub" : "sub";
+ $meas{"${side}rss"} = $2 / ($3 eq "k" ? 1024.0 : 1);
+ $meas{"${side}cpu"} = cpuload (($side eq "pub") ? "pub" : "dq.user", $_);
+ $meas{"${side}recv"} = cpuload ("recvUC", $_);
+ } elsif (/xmit\s+(\d+)%\s+recv\s+(\d+)%/) {
+ $meas{rawxmitbw} = $1;
+ $meas{rawrecvbw} = $2;
}
+}
+push @{$res{$meas{size}}}, \%meas if %meas;
+die "no data found" unless keys %res > 0;
- open FH, "< $basedir/data-$dir/sub.log" or next; # die "can't open $basedir/data-$dir/sub.log";
- print "\n\n" if $dataset++;
- print "# mode $dir\n";
- print "# payloadsize rate[samples/s] appl.bandwidth[Mb/s] raw.bandwidth[Mb/s]\n";
- my $psz;
- my @rate = ();
- while () {
- next unless /Payload size: ([0-9]+).*Transfer rate: ([0-9.]+)/;
- my $psz_cur = $1; my $rate_cur = $2;
- $psz = $psz_cur unless defined $psz;
- if ($psz != $psz_cur) {
- my $load = shift @loads;
- my $rate = median (@rate);
- printf "%d %f %f %f\n", $psz, $rate, $rate * (8 + $psz) / 125e3, $load / 125e3;
- @rate = ();
- }
- $psz = $psz_cur;
- push @rate, ($rate_cur + 0.0);
+print "#size rate cookedbw rawxmitbw rawrecvbw pubrss subrss pubcpu pubrecv subcpu subrecv\n";
+my @sizes = sort { $a <=> $b } keys %res;
+for my $sz (@sizes) {
+ my $ms = $res{$sz};
+ my $rate = median ("rate", $ms);
+ my $cookedbw = median ("cookedbw", $ms);
+ my $rawxmitbw = median ("rawxmitbw", $ms);
+ my $rawrecvbw = median ("rawrecvbw", $ms);
+ my $pubrss = max ("pubrss", $ms);
+ my $subrss = max ("subrss", $ms);
+ my $pubcpu = median ("pubcpu", $ms);
+ my $pubrecv = median ("pubrecv", $ms);
+ my $subcpu = median ("subcpu", $ms);
+ my $subrecv = median ("subrecv", $ms);
+ print "$sz $rate $cookedbw $rawxmitbw $rawrecvbw $pubrss $subrss $pubcpu $pubrecv $subcpu $subrecv\n";
+}
+
+sub cpuload {
+ my ($thread, $line) = @_;
+ $thread =~ s/\./\\./g;
+ if ($line =~ /$thread:(\d+)%\+(\d+)%/) {
+ return $1+$2;
+ } else {
+ return 0;
}
- my $load = shift @loads;
- my $rate = median (@rate);
- printf "%d %f %f %f\n", $psz, $rate, $rate * (8 + $psz) / 125e3, $load / 125e3;
- close FH;
+}
+
+sub max {
+ my $v;
+ for (extract (@_)) { $v = $_ unless defined $v; $v = $_ if $_ > $v; }
+ return $v;
}
sub median {
- my @xs = sort { $a <=> $b } @_;
+ my @xs = sort { $a <=> $b } (extract (@_));
return (@xs % 2) ? $xs[(@xs - 1) / 2] : ($xs[@xs/2 - 1] + $xs[@xs/2]) / 2;
}
+sub extract {
+ my ($key, $msref) = @_;
+ return map { $_->{$key} } @$msref;
+}
diff --git a/examples/perfscript/throughput-test-plot b/examples/perfscript/throughput-test-plot
index 9840b07..0e0157f 100755
--- a/examples/perfscript/throughput-test-plot
+++ b/examples/perfscript/throughput-test-plot
@@ -1,14 +1,55 @@
#!/bin/bash
-`dirname $0`/throughput-test-extract > data.txt
+`dirname $0`/throughput-test-extract "$@" > data.txt
gnuplot <<\EOF
-set term png size 1024,768
-set output "throughput-polling.png"
-set st d l
-set title "Throughput (polling with 1ms sleeps)"
-set ylabel "M sample/s"
-set y2label "Mbps"
-set y2tics
+set term pngcairo size 1024,768
+set output "throughput-async-listener-rate.png"
+set st d lp
+set st li 1 lw 2
+set st li 2 lw 2
+set st li 3 lw 2
+
+set multiplot
+set logscale xyy2
+set title "Throughput"
+set ylabel "[Mbps]"
+set ytics (100,200,300,400,500,600,700,800,900,1000)
+set grid xtics ytics mytics
set xlabel "payload size [bytes]"
-p "data.txt" i 5 u 1:($2/1e6) ti "rate [M sample/s]", "" i 5 u 1:3 axes x1y2 ti "app bandwidth [Mbps]", "" i 5 u 1:4 axes x1y2 ti "GbE bandwidth [Mbps]"
+# sample rate in data.txt is in kS/s
+# GbE bandwidth in data.txt is in %, so 100% => 1000 Mbps
+set key at graph 1, 0.9
+p "data.txt" u 1:3 ti "payload", "" u 1:(10*$5) ti "GbE bandwidth"
+set ytics auto
+set key default
+
+unset xlabel
+unset title
+set grid nomytics
+set ylabel "[M sample/s]"
+set origin .3, .1
+set size .6, .6
+clear
+p "data.txt" u 1:($2/1e3) ti "rate"
+unset multiplot
+
+unset origin
+unset size
+
+unset logscale
+set logscale x
+set output "throughput-async-listener-memory.png"
+set title "Throughput: memory"
+set ylabel "RSS [MB]"
+set xlabel "payload size [bytes]"
+p "data.txt" u 1:6 ti "publisher", "" u 1:7 ti "subscriber"
+
+unset logscale
+set logscale x
+set output "throughput-async-listener-cpu.png"
+set title "Throughput: CPU"
+set ylabel "CPU [%]"
+set xlabel "payload size [bytes]"
+p "data.txt" u 1:8 ti "publisher (pub thread)", "" u 1:9 ti "publisher (recvUC thread)", "" u 1:10 ti "subscriber (dq.user thread)", "" u 1:11 ti "subscriber (recvUC thread)"
+
EOF