Use ddsperf in perf scripts and update README

* the scripted throughput test originally used for the throughput graph in the README now uses ddsperf; * a scripted latency test has been added; * updated the README with the results of these tests (and so now gives easy access not only to throughput, but also to latency and memory usage, as well as to latency over GbE. Signed-off-by: Erik Boasson <eb@ilities.com>
2019-08-02 09:06:39 +02:00 · 2019-08-02 09:06:39 +02:00 · af19c5681c
commit af19c5681c
parent 952029dba0
7 changed files with 554 additions and 189 deletions
--- a/README.md
+++ b/README.md
@ -1,9 +1,12 @@
 # Eclipse Cyclone DDS
-Eclipse Cyclone DDS is by far the most performant and robust DDS implementation available on the
+Eclipse Cyclone DDS is a very performant and robust open-source DDS implementation.  Cyclone DDS is developed completely in the open as an Eclipse IoT project
 market. Moreover, Cyclone DDS is developed completely in the open as an Eclipse IoT project
 (see [eclipse-cyclone-dds](https://projects.eclipse.org/projects/iot.cyclonedds)).
 * [Getting Started](#getting-started)
 * [Performance](#performance)
 * [Configuration](#configuration)
 # Getting Started
 ## Building Eclipse Cyclone DDS
@ -106,7 +109,76 @@ also need to add switches to select the architecture and build type, e.g., ``con
 arch=x86_64 -s build_type=Debug ..`` This will automatically download and/or build CUnit (and, at
 the moment, OpenSSL).
-## Configuration
+## Documentation
 The documentation is still rather limited, and at the moment only available in the sources (in the
 form of restructured text files in ``docs`` and Doxygen comments in the header files), or as
 a
 [PDF](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/pdf/CycloneDDS-0.1.0.pdf). The
 intent is to automate the process of building the documentation and have them available in more
 convenient formats and in the usual locations.
 ## Building and Running the Roundtrip Example
 We will show you how to build and run an example program that measures latency.  The examples are
 built automatically when you build Cyclone DDS, so you don't need to follow these steps to be able
 to run the program, it is merely to illustrate the process.
    $ cd cyclonedds/examples/roundtrip
    $ mkdir build
    $ cd build
    $ cmake ..
    $ make
 On one terminal start the application that will be responding to pings:
    $ ./RoundtripPong
 On another terminal, start the application that will be sending the pings:
    $ ./RoundtripPing 0 0 0 
    # payloadSize: 0 | numSamples: 0 | timeOut: 0
    # Waiting for startup jitter to stabilise
    # Warm up complete.
    # Latency measurements (in us)
    #             Latency [us]                                   Write-access time [us]       Read-access time [us]
    # Seconds     Count   median      min      99%      max      Count   median      min      Count   median      min
        1     28065       17       16       23       87      28065        8        6      28065        1        0
        2     28115       17       16       23       46      28115        8        6      28115        1        0
        3     28381       17       16       22       46      28381        8        6      28381        1        0
        4     27928       17       16       24      127      27928        8        6      27928        1        0
        5     28427       17       16       20       47      28427        8        6      28427        1        0
        6     27685       17       16       26       51      27685        8        6      27685        1        0
        7     28391       17       16       23       47      28391        8        6      28391        1        0
        8     27938       17       16       24       63      27938        8        6      27938        1        0
        9     28242       17       16       24      132      28242        8        6      28242        1        0
       10     28075       17       16       23       46      28075        8        6      28075        1        0
 The numbers above were measured on Mac running a 4.2 GHz Intel Core i7 on December 12th 2018.  From
 these numbers you can see how the roundtrip is very stable and the minimal latency is now down to 17
 micro-seconds (used to be 25 micro-seconds) on this HW.
 # Performance
 Reliable message throughput is over 1MS/s for very small samples and is roughly 90% of GbE with 100
 byte samples, and latency is about 30us when measured using [ddsperf](src/tools/ddsperf) between two
 Intel(R) Xeon(R) CPU E3-1270 V2 @ 3.50GHz (that's 2012 hardware ...) running Ubuntu 16.04, with the
 executables built on Ubuntu 18.04 using gcc 7.4.0 for a default (i.e., "RelWithDebInfo") build.
 <img src="https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/throughput-async-listener-rate.png" alt="Throughput" height="400"><img src="https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/latency-sync-listener.png" alt="Throughput" height="400">
 This is with the subscriber in listener mode, using asynchronous delivery for the throughput
 test. The configuration is a marginally tweaked out-of-the-box configuration: an increased maximum
 message size and fragment size, and an increased high-water mark for the reliability window on the
 writer side. For details, see the [scripts](examples/perfscript) directory,
 the
 [environment details](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/config.txt) and
 the
 [throughput](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/sub.log) and
 [latency](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/ping.log) data
 underlying the graphs.  These also include CPU usage ([thoughput](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/throughput-async-listener-cpu.png) and [latency](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/latency-sync-listener-bwcpu.png)) and [memory usage](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/throughput-async-listener-memory.png).
 # Configuration
 The out-of-the-box configuration should usually be fine, but there are a great many options that can
 be tweaked by creating an XML file with the desired settings and defining the ``CYCLONEDDS_URI`` to
@ -161,73 +233,6 @@ The configurator tool ``cycloneddsconf`` can help in discovering the settings, a
 dump.  Background information on configuring Cyclone DDS can be
 found [here](https://docs/manual/config.rst).
 ## Documentation
 The documentation is still rather limited, and at the moment only available in the sources (in the
 form of restructured text files in ``docs`` and Doxygen comments in the header files), or as
 a
 [PDF](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/pdf/CycloneDDS-0.1.0.pdf). The
 intent is to automate the process of building the documentation and have them available in more
 convenient formats and in the usual locations.
 ## Performance
 Median small message throughput measured using the Throughput example between two Intel(R) Xeon(R)
 CPU E3-1270 V2 @ 3.50GHz (that's 2012 hardware ...) running Linux 3.8.13-rt14.20.el6rt.x86_64,
 connected via a quiet GbE and when using gcc-6.2.0 for a default (i.e., "RelWithDebInfo") build is:
 <img src="https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/throughput-polling.png" alt="Throughput" height="400">
 This is with the subscriber in polling mode. Listener mode is marginally slower; using a waitset the
 message rate for minimal size messages drops to 600k sample/s in synchronous delivery mode and about
 750k samples/s in asynchronous delivery mode. The configuration is an out-of-the-box configuration,
 tweaked only to increase the high-water mark for the reliability window on the writer side. For
 details, see the scripts in the ``performance`` directory and
 the
 [data](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/throughput.txt).
 There is some data on roundtrip latency below.
 ## Building and Running the Roundtrip Example
 We will show you how to build and run an example program that measures latency.  The examples are
 built automatically when you build Cyclone DDS, so you don't need to follow these steps to be able
 to run the program, it is merely to illustrate the process.
    $ cd cyclonedds/examples/roundtrip
    $ mkdir build
    $ cd build
    $ cmake ..
    $ make
 On one terminal start the application that will be responding to pings:
    $ ./RoundtripPong
 On another terminal, start the application that will be sending the pings:
    $ ./RoundtripPing 0 0 0 
    # payloadSize: 0 | numSamples: 0 | timeOut: 0
    # Waiting for startup jitter to stabilise
    # Warm up complete.
    # Round trip measurements (in us)
    #             Round trip time [us]                           Write-access time [us]       Read-access time [us]
    # Seconds     Count   median      min      99%      max      Count   median      min      Count   median      min
        1     28065       17       16       23       87      28065        8        6      28065        1        0
        2     28115       17       16       23       46      28115        8        6      28115        1        0
        3     28381       17       16       22       46      28381        8        6      28381        1        0
        4     27928       17       16       24      127      27928        8        6      27928        1        0
        5     28427       17       16       20       47      28427        8        6      28427        1        0
        6     27685       17       16       26       51      27685        8        6      27685        1        0
        7     28391       17       16       23       47      28391        8        6      28391        1        0
        8     27938       17       16       24       63      27938        8        6      27938        1        0
        9     28242       17       16       24      132      28242        8        6      28242        1        0
       10     28075       17       16       23       46      28075        8        6      28075        1        0
 The numbers above were measured on Mac running a 4.2 GHz Intel Core i7 on December 12th 2018.  From
 these numbers you can see how the roundtrip is very stable and the minimal latency is now down to 17
 micro-seconds (used to be 25 micro-seconds) on this HW.
 # Trademarks
 * "Eclipse Cyclone DDS" and "Cyclone DDS" are trademarks of the Eclipse Foundation.
--- a/examples/perfscript/latency-test
+++ b/examples/perfscript/latency-test
@ -0,0 +1,153 @@
 #!/bin/bash
 export nwif=eth0
 bandwidth=1e9
 remotedir="$PWD"
 provision=false
 asynclist="sync async"
 modelist="listener waitset"
 sizelist="0 20 50 100 200 500 1000 2000 5000 10000 20000 50000 100000 200000 500000 1000000"
 timeout=30
 loopback=true
 resultdir="latency-result"
 usage () {
    cat >&2 <<EOF
 usage: $0 [OPTIONS] user@remote [user@remote...]
 OPTIONS
  -i IF        use network interface IF (default: $nwif)
  -b 100|1000|B  network bandwidth (100Mbps/1000Gbps or B bits/sec) for
               calculating load % given load in bytes/second (default: 1000)
  -d DIR       use DIR on remote (default: PWD)
  -p           provision required binaries in DIR (default: $provision)
               first ssh's in to try mkdir -p DIR, then follows up with scp
  -t DUR       run for DUR seconds per size (default $timeout)
  -a ASYNCLIST run for delivery async settings ASYNCLIST (default:
               "$asynclist")
  -m MODELIST  run with subscriber mode settings MODELIST (default: 
               "$modelist")
  -s SIZELIST  run for sizes in SIZELIST (default: "$sizelist")
               if SIZELIST is empty, it uses ddsperf's OU topic instead
  -l LOOPBACK  enable/disable multicast loopback (true/false, default: $loopback)
  -o DIR       store results in dir (default: $resultdir)
 Local host runs ddsperf in ping mode, remotes run pong.  It assumes these are
 available in DIR/bin.  It also assumes that ssh user@remote works without
 requiring a password.
 EOF
    exit 1
 }
 while getopts "i:d:pa:m:s:t:o:l:" opt ; do
    case $opt in
        i) nwif="$OPTARG" ;;
        b) case "$OPTARG" in
               100) bandwidth=1e8 ;;
               1000) bandwidth=1e9 ;;
               *) bandwidth="$OPTARG" ;;
           esac ;;
        d) remotedir="$OPTARG" ;;
        p) provision=true ;;
        a) asynclist="$OPTARG" ;;
        m) modelist="$OPTARG" ;;
        s) sizelist="$OPTARG" ;;
        l) loopback="OPTARG" ;;
        t) timeout="$OPTARG" ;;
        o) resultdir="$OPTARG" ;;
        h) usage ;;
    esac
 done
 shift $((OPTIND-1))
 if [ $# -lt 1 ] ; then usage ; fi
 cfg=cdds-simple.xml
 cat >$cfg <<EOF
 <CycloneDDS>
  <Domain>
    <Id>17</Id>
  </Domain>
  <General>
    <NetworkInterfaceAddress>$nwif</NetworkInterfaceAddress>
    <EnableMulticastLoopback>$loopback</EnableMulticastLoopback>
    <MaxMessageSize>65500B</MaxMessageSize>
    <FragmentSize>4000B</FragmentSize>
  </General>
  <Internal>
    <Watermarks>
      <WhcHigh>500kB</WhcHigh>
    </Watermarks>
    <SynchronousDeliveryPriorityThreshold>\${async:-0}</SynchronousDeliveryPriorityThreshold>
    <LeaseDuration>3s</LeaseDuration>
  </Internal>
  <Tracing>
    <Verbosity>config</Verbosity>
  </Tracing>
 </CycloneDDS>
 EOF
 if [ ! -x bin/ddsperf ] ; then
    echo "bin/ddsperf not found on the local machine" >&2
    exit 1
 fi
 [ -d $resultdir ] || { echo "output directory $resultdir doesn't exist" >&2 ; exit 1 ; }
 if $provision ; then
    echo "provisioning ..."
    for r in $pubremote "$@" ; do
        ssh $r mkdir -p $remotedir $remotedir/bin $remotedir/lib
        scp lib/libddsc.so.0 $r:$remotedir/lib
        scp bin/ddsperf $r:$remotedir/bin
    done
 fi
 topic=KS
 [ -z "$sizelist" ] && topic=OU
 export CYCLONEDDS_URI=file://$PWD/$cfg
 for r in "$@" ; do
    scp $cfg $r:$remotedir || { echo "failed to copy $cfg to $remote:$PWD" >&2 ; exit 1 ; }
 done
 for async_mode in $asynclist ; do
    case "$async_mode" in
        sync) async=0 ;;
        async) async=1 ;;
        *) echo "$async_mode: invalid setting for ASYNC" >&2 ; continue ;;
    esac
    export async
    for sub_mode in $modelist ; do
        echo "======== ASYNC $async MODE $sub_mode ========="
        cat > run-pong.tmp <<EOF
 export CYCLONEDDS_URI=file://$remotedir/$cfg
 export async=$async
 cd $remotedir
 nohup bin/ddsperf -T $topic pong $sub_mode > /dev/null &
 echo \$!
 EOF
        killpongs=""
        for r in "$@" ; do
            scp run-pong.tmp $r:$remotedir
            rpongpid=`ssh $r ". $remotedir/run-pong.tmp"`
            killpongs="$killpongs ssh $r kill -9 $rpongpid &"
        done
        outdir=$resultdir/$async_mode-$sub_mode
        mkdir $outdir
        touch $outdir/ping.log
        tail -f $outdir/ping.log & xpid=$!
        for size in ${sizelist:-0} ; do
            echo "size $size"
            bin/ddsperf -d $nwif:$bandwidth -c -D $timeout -T $topic ping size $size $sub_mode >> $outdir/ping.log
            sleep 5
        done
        eval $killpongs
        sleep 1
        kill $xpid
        wait
    done
 done
--- a/examples/perfscript/latency-test-extract
+++ b/examples/perfscript/latency-test-extract
@ -0,0 +1,95 @@
 #!/usr/bin/perl -w
 # Note: this is specialized for async delivery, listener mode because of the way it deals with
 # thread names
 use strict;
 my %res = ();
 my %meas;
 while (<>) {
  next unless s/^\[\d+\] \d+\.\d+\s+//;
  if (s/^[^\@:]+:\d+\s+size (\d+) //) {
    # size is always the first line of an output block
    # ddsperf doesn't print CPU loads, RSS, bandwidth if it is zero
    my %tmp = %meas;
    push @{$res{$meas{size}}}, \%tmp if %meas;
    %meas = (size => $1,
             rawxmitbw => 0, rawrecvbw => 0,
             subrss => 0, pubrss => 0,
             subcpu => 0, subrecv => 0,
             pubcpu => 0, pubrecv => 0);
    $meas{$1} = $2 while s/^(mean|min|max|\d+%)\s+(\d+\.\d+)us\s*//;
    die unless /cnt \d+$/;
  } elsif (s/^(\@[^:]+:\d+\s+)?rss:(\d+\.\d+)([kM])B//) {
    my $side = defined $1 ? "pub" : "sub";
    $meas{"${side}rss"}  = $2 / ($3 eq "k" ? 1024.0 : 1);
    $meas{"${side}cpu"}  = cpuload (($side eq "pub") ? "pub" : "dq.user", $_);
    $meas{"${side}recv"} = cpuload ("recvUC", $_);
  } elsif (/xmit\s+(\d+)%\s+recv\s+(\d+)%/) {
    $meas{rawxmitbw} = $1;
    $meas{rawrecvbw} = $2;
  }
 }
 push @{$res{$meas{size}}}, \%meas if %meas;
 die "no data found" unless keys %res > 0;
 print "#size mean min 50% 90% 99% max rawxmitbw rawrecvbw pubrss subrss pubcpu pubrecv subcpu subrecv\n";
 my @sizes = sort { $a <=> $b } keys %res;
 for my $sz (@sizes) {
  my $ms = $res{$sz};
  my $min = min ("min", $ms);
  my $max = max ("max", $ms);
  my $mean = mean ("mean", $ms); # roughly same number of roundtrips, so not too far off
  my $median = max ("50%", $ms); # also not quite correct ...
  my $p90 = max ("90%", $ms);
  my $p99 = max ("99%", $ms);
  my $rawxmitbw = median ("rawxmitbw", $ms);
  my $rawrecvbw = median ("rawrecvbw", $ms);
  my $pubrss = max ("pubrss", $ms);
  my $subrss = max ("subrss", $ms);
  my $pubcpu = median ("pubcpu", $ms);
  my $pubrecv = median ("pubrecv", $ms);
  my $subcpu = median ("subcpu", $ms);
  my $subrecv = median ("subrecv", $ms);
  print "$sz $mean $min $median $p90 $p99 $max $rawxmitbw $rawrecvbw $pubrss $subrss $pubcpu $pubrecv $subcpu $subrecv\n";
 }
 sub cpuload {
  my ($thread, $line) = @_;
  $thread =~ s/\./\\./g;
  if ($line =~ /$thread:(\d+)%\+(\d+)%/) {
    return $1+$2;
  } else {
    return 0;
  }
 }
 sub max {
  my $v;
  for (extract (@_)) { $v = $_ unless defined $v; $v = $_ if $_ > $v; }
  return $v;
 }
 sub min {
  my $v;
  for (extract (@_)) { $v = $_ unless defined $v; $v = $_ if $_ < $v; }
  return $v;
 }
 sub mean {
  my $v = 0;
  my @xs = extract (@_);
  $v += $_ for @xs;
  return $v / @xs;
 }
 sub median {
  my @xs = sort { $a <=> $b } (extract (@_));
  return (@xs % 2) ? $xs[(@xs - 1) / 2] : ($xs[@xs/2 - 1] + $xs[@xs/2]) / 2;
 }
 sub extract {
  my ($key, $msref) = @_;
  return map { $_->{$key} } @$msref;
 }
--- a/examples/perfscript/latency-test-plot
+++ b/examples/perfscript/latency-test-plot
@ -0,0 +1,46 @@
 #!/bin/bash
 `dirname $0`/latency-test-extract "$@" > data.txt
 gnuplot <<\EOF
 set term pngcairo size 1024,768
 set output "latency-sync-listener.png"
 set st d lp
 set st li 1 lw 2
 set st li 2 lw 2
 set st li 3 lw 2
 set st li 4 lw 2
 set st li 5 lw 2
 set multiplot
 set logscale xy
 set title "Latency"
 set ylabel "[us]"
 set grid xtics ytics mytics
 set xlabel "payload size [bytes]"
 p "data.txt" u 1:3 ti "min", "" u 1:4 ti "median", "" u 1:5 ti "90%", "" u 1:6 ti "99%", "" u 1:7 ti "max"
 unset logscale y
 unset xlabel
 unset ylabel
 unset title
 set grid nomytics
 set origin .1, .43
 set size .55, .5
 clear
 p [10:1000]  "data.txt" u 1:3 ti "min", "" u 1:4 ti "median", "" u 1:5 ti "90%", "" u 1:6 ti "99%", "" u 1:7 ti "max"
 unset multiplot
 unset origin
 unset size
 unset logscale
 set logscale x
 set output "latency-sync-listener-bwcpu.png"
 set title "Latency: network bandwidth and CPU usage"
 set y2tics
 set ylabel "[Mbps]"
 set y2label "CPU [%]"
 set xlabel "payload size [bytes]"
 set key at graph 1, 0.7
 p "data.txt" u 1:(10*$8) ti "GbE transmit bandwidth (left)", "" u 1:(10*$9) ti "GbE receive bandwidth (left)", "" u 1:13 axes x1y2 ti "ping CPU (right)", "" u 1:15 axes x1y2 ti "pong CPU (right)"
 EOF
--- a/examples/perfscript/throughput-test
+++ b/examples/perfscript/throughput-test
@ -1,45 +1,53 @@
 #!/bin/bash
 export nwif=eth0
 bandwidth=1e9
 remotedir="$PWD"
 provision=false
 asynclist="sync async"
 modelist="listener polling waitset"
 sizelist="0 20 50 100 200 500 1000 2000 5000 10000 20000 50000 100000 200000 500000 1000000"
 timeout=30
 loopback=true
 resultdir="throughput-result"
 usage () {
    cat >&2 <<EOF
 usage: $0 [OPTIONS] user@remote [user@remote...]
 OPTIONS
-  -i IF        use network interface IF (default: eth0)
+  -i IF        use network interface IF (default: $nwif)
-  -b 100|1000  network bandwidth (100Mbps/1000Gbps) for calculating load
+  -b 100|1000|B  network bandwidth (100Mbps/1000Gbps or B bits/sec) for
-               % given load in bytes/second (default: 1000)
+               calculating load % given load in bytes/second (default: 1000)
  -d DIR       use DIR on remote (default: PWD)
-  -p           provision required binaries in DIR (default: false)
+  -p           provision required binaries in DIR (default: $provision)
               first ssh's in to try mkdir -p DIR, then follows up with scp
-  -t DUR       run for DUR seconds per size (default 20)
+  -t DUR       run for DUR seconds per size (default $timeout)
-  -a ASYNCLIST run for delivery async settings ASYNCLIST (default: "0 1")
+  -a ASYNCLIST run for delivery async settings ASYNCLIST (default:
-  -m MODELIST  run with subscriber mode settings MODELIST (default: "-1 0 1")
+               "$asynclist")
-  -s SIZELIST  run for sizes in SIZELIST (default: "0 16 32 64 128 256")
+  -m MODELIST  run with subscriber mode settings MODELIST (default: 
-  -l LOOPBACK  enable/disable multicast loopback (true/false, default: true)
+               "$modelist")
-  -o DIR       store results in dir (default: throughput-result)
+  -s SIZELIST  run for sizes in SIZELIST (default: "$sizelist")
               if SIZELIST is empty, it uses ddsperf's OU topic instead
  -l LOOPBACK  enable/disable multicast loopback (true/false, default: $loopback)
  -o DIR       store results in dir (default: $resultdir)
-Local host runs ThroughputSubscriber, first remote runs ThroughputPublisher,
+Local host runs "ddsperf" in subscriber mode, first remote runs it publisher
-further remotes also run ThroughputSubscriber.  It assumes these are
+mode, further remotes also run subcribers.  It assumes these are available in
-available in DIR/bin.  It also assumes that ssh user@remote works without
+DIR/bin.  It also assumes that ssh user@remote works without requiring a
-requiring a password.
+password.
 EOF
    exit 1
 }
 export nwif=eth0
 bandwidth=1000
 remotedir="$PWD"
 provision=false
 asynclist="0 1"
 modelist="-1 0 1"
 sizelist="0 16 32 64 128 256"
 timeout=20
 loopback=true
 resultdir="throughput-result"
 while getopts "i:b:d:pa:m:s:t:o:l:" opt ; do
    case $opt in
        i) nwif="$OPTARG" ;;
-        b) bandwidth="$OPTARG" ;;
+        b) case "$OPTARG" in
               100) bandwidth=1e8 ;;
               1000) bandwidth=1e9 ;;
               *) bandwidth="$OPTARG" ;;
           esac ;;
        d) remotedir="$OPTARG" ;;
        p) provision=true ;;
        a) asynclist="$OPTARG" ;;
@ -53,7 +61,6 @@ while getopts "i:b:d:pa:m:s:t:o:l:" opt ; do
 done
 shift $((OPTIND-1))
 if [ $# -lt 1 ] ; then usage ; fi
 ethload=`dirname $0`/ethload
 pubremote=$1
 shift
@ -63,24 +70,27 @@ cat >$cfg <<EOF
  <Domain>
    <Id>17</Id>
  </Domain>
  <DDSI2E>
  <General>
    <NetworkInterfaceAddress>$nwif</NetworkInterfaceAddress>
    <EnableMulticastLoopback>$loopback</EnableMulticastLoopback>
    <MaxMessageSize>65500B</MaxMessageSize>
    <FragmentSize>4000B</FragmentSize>
  </General>
  <Internal>
    <Watermarks>
      <WhcHigh>500kB</WhcHigh>
    </Watermarks>
-      <SynchronousDeliveryPriorityThreshold>${async:-0}</SynchronousDeliveryPriorityThreshold>
+    <SynchronousDeliveryPriorityThreshold>\${async:-0}</SynchronousDeliveryPriorityThreshold>
    <LeaseDuration>3s</LeaseDuration>
  </Internal>
-  </DDSI2E>
+  <Tracing>
    <Verbosity>config</Verbosity>
  </Tracing>
 </CycloneDDS>
 EOF
-if [ ! -x bin/ThroughputPublisher -o ! -x bin/ThroughputSubscriber -o ! -x $ethload ] ; then
+if [ ! -x bin/ddsperf ] ; then
-    echo "some check for existence of a file failed on the local machine" >&2
+    echo "bin/ddsperf not found on the local machine" >&2
    exit 1
 fi
@ -91,33 +101,35 @@ if $provision ; then
    for r in $pubremote "$@" ; do
        ssh $r mkdir -p $remotedir $remotedir/bin $remotedir/lib
        scp lib/libddsc.so.0 $r:$remotedir/lib
-        scp bin/ThroughputPublisher bin/ThroughputSubscriber $r:$remotedir/bin
+        scp bin/ddsperf $r:$remotedir/bin
    done
 fi
 topic=KS
 [ -z "$sizelist" ] && topic=OU
 export CYCLONEDDS_URI=file://$PWD/$cfg
 for r in $pubremote "$@" ; do
    scp $cfg $r:$remotedir || { echo "failed to copy $cfg to $remote:$PWD" >&2 ; exit 1 ; }
 done
-for async in $asynclist ; do
+for async_mode in $asynclist ; do
    case "$async_mode" in
        sync) async=0 ;;
        async) async=1 ;;
        *) echo "$async_mode: invalid setting for ASYNC" >&2 ; continue ;;
    esac
    export async
-    for mode in $modelist ; do
+    for sub_mode in $modelist ; do
-        echo "======== ASYNC $async MODE $mode ========="
+        echo "======== ASYNC $async MODE $sub_mode ========="
        cat > run-publisher.tmp <<EOF
 export CYCLONEDDS_URI=file://$remotedir/$cfg
 export async=$async
 cd $remotedir
-rm -f pub-top.log
+for size in ${sizelist:-0} ; do
 for size in $sizelist ; do
  echo "size \$size"
-  bin/ThroughputPublisher \$size > pub.log & ppid=\$!
+  bin/ddsperf -D $timeout -T $topic pub size \$size > pub.log
  top -b -d1 -p \$ppid >> pub-top.log & tpid=\$!
  sleep $timeout
  kill \$tpid
  kill -2 \$ppid
  wait \$ppid
  sleep 5
 done
 wait
@ -129,7 +141,7 @@ EOF
 export CYCLONEDDS_URI=file://$remotedir/$cfg
 export async=$async
 cd $remotedir
-nohup bin/ThroughputSubscriber 0 $mode > /dev/null &
+nohup bin/ddsperf -T $topic sub $sub_mode > /dev/null &
 echo \$!
 EOF
            for r in "$@" ; do
@ -139,21 +151,17 @@ EOF
            done
        fi
-        outdir=$resultdir/data-async$async-mode$mode
+        outdir=$resultdir/$async_mode-$sub_mode
        mkdir $outdir
-        rm -f sub-top.log
+        bin/ddsperf -d $nwif:$bandwidth -c -T $topic sub $sub_mode > $outdir/sub.log & spid=$!
        $ethload $nwif $bandwidth > $outdir/sub-ethload.log & lpid=$!
        bin/ThroughputSubscriber 0 $mode > $outdir/sub.log & spid=$!
        top -b -d1 -p $spid >> $outdir/sub-top.log & tpid=$!
        tail -f $outdir/sub.log & xpid=$!
        ssh $pubremote ". $remotedir/run-publisher.tmp"
-        kill $tpid
+        kill $spid
        kill -2 $spid
        eval $killremotesubs
        sleep 1
-        kill $lpid $xpid
+        kill $xpid
        wait
-        scp $pubremote:$remotedir/{pub-top.log,pub.log} $outdir
+        scp $pubremote:$remotedir/pub.log $outdir
    done
 done
--- a/examples/perfscript/throughput-test-extract
+++ b/examples/perfscript/throughput-test-extract
@ -1,59 +1,76 @@
 #!/usr/bin/perl -w
 # Note: this is specialized for async delivery, listener mode because of the way it deals with
 # thread names
 use strict;
-my @dirs = ("async0-mode-1", "async0-mode0", "async0-mode1",
+my %res = ();
-            "async1-mode-1", "async1-mode0", "async1-mode1");
+my %meas;
-
+while (<>) {
-my $dataset = 0;
+  next unless s/^\[\d+\] \d+\.\d+\s+//;
-my $basedir = "throughput-result";
+  if (/^size (\d+) .* rate (\d+\.\d+)\s*kS\/s\s+(\d+\.\d+)\s*Mb\/s/) {
-$basedir = $ARGV[0] if @ARGV== 1;
+    # size is always the first line of an output block
-my $load_threshold = 20;
+    # ddsperf doesn't print CPU loads, RSS, bandwidth if it is zero
-for my $dir (@dirs) {
+    my %tmp = %meas;
-  my @loads = ();
+    push @{$res{$meas{size}}}, \%tmp if %meas;
-
+    %meas = (size => $1, rate => $2, cookedbw => $3,
-  {
+             rawxmitbw => 0, rawrecvbw => 0,
-    open LH, "< $basedir/data-$dir/sub-ethload.log" or next; # die "can't open $basedir/data-$dir/sub-ethload.log";
+             subrss => 0, pubrss => 0,
-    my @curload = ();
+             subcpu => 0, subrecv => 0,
-    while (<LH>) {
+             pubcpu => 0, pubrecv => 0);
-      next unless /^r +([0-9.]+).*\( *(\d+)/;
+  } elsif (s/^(\@[^:]+:\d+\s+)?rss:(\d+\.\d+)([kM])B//) {
-      push @curload, $2 if $1 > $load_threshold;
+    my $side = defined $1 ? "pub" : "sub";
-      if (@curload && $1 < $load_threshold) {
+    $meas{"${side}rss"}  = $2 / ($3 eq "k" ? 1024.0 : 1);
-        push @loads, median (@curload);
+    $meas{"${side}cpu"}  = cpuload (($side eq "pub") ? "pub" : "dq.user", $_);
-        @curload = ();
+    $meas{"${side}recv"} = cpuload ("recvUC", $_);
  } elsif (/xmit\s+(\d+)%\s+recv\s+(\d+)%/) {
    $meas{rawxmitbw} = $1;
    $meas{rawrecvbw} = $2;
  }
 }
-    push @loads, median (@curload) if @curload;
+push @{$res{$meas{size}}}, \%meas if %meas;
-    close LH;
+die "no data found" unless keys %res > 0;
 print "#size rate cookedbw rawxmitbw rawrecvbw pubrss subrss pubcpu pubrecv subcpu subrecv\n";
 my @sizes = sort { $a <=> $b } keys %res;
 for my $sz (@sizes) {
  my $ms = $res{$sz};
  my $rate = median ("rate", $ms);
  my $cookedbw = median ("cookedbw", $ms);
  my $rawxmitbw = median ("rawxmitbw", $ms);
  my $rawrecvbw = median ("rawrecvbw", $ms);
  my $pubrss = max ("pubrss", $ms);
  my $subrss = max ("subrss", $ms);
  my $pubcpu = median ("pubcpu", $ms);
  my $pubrecv = median ("pubrecv", $ms);
  my $subcpu = median ("subcpu", $ms);
  my $subrecv = median ("subrecv", $ms);
  print "$sz $rate $cookedbw $rawxmitbw $rawrecvbw $pubrss $subrss $pubcpu $pubrecv $subcpu $subrecv\n";
 }
-  open FH, "< $basedir/data-$dir/sub.log" or next; # die "can't open $basedir/data-$dir/sub.log";
+sub cpuload {
-  print "\n\n" if $dataset++;
+  my ($thread, $line) = @_;
-  print "# mode $dir\n";
+  $thread =~ s/\./\\./g;
-  print "# payloadsize rate[samples/s] appl.bandwidth[Mb/s] raw.bandwidth[Mb/s]\n";
+  if ($line =~ /$thread:(\d+)%\+(\d+)%/) {
-  my $psz;
+    return $1+$2;
-  my @rate = ();
+  } else {
-  while (<FH>) {
+    return 0;
    next unless /Payload size: ([0-9]+).*Transfer rate: ([0-9.]+)/;
    my $psz_cur = $1; my $rate_cur = $2;
    $psz = $psz_cur unless defined $psz;
    if ($psz != $psz_cur) {
      my $load = shift @loads;
      my $rate = median (@rate);
      printf "%d %f %f %f\n", $psz, $rate, $rate * (8 + $psz) / 125e3, $load / 125e3;
      @rate = ();
  }
    $psz = $psz_cur;
    push @rate, ($rate_cur + 0.0);
 }
-  my $load = shift @loads;
+
-  my $rate = median (@rate);
+sub max {
-  printf "%d %f %f %f\n", $psz, $rate, $rate * (8 + $psz) / 125e3, $load / 125e3;
+  my $v;
-  close FH;
+  for (extract (@_)) { $v = $_ unless defined $v; $v = $_ if $_ > $v; }
  return $v;
 }
 sub median {
-  my @xs = sort { $a <=> $b } @_;
+  my @xs = sort { $a <=> $b } (extract (@_));
  return (@xs % 2) ? $xs[(@xs - 1) / 2] : ($xs[@xs/2 - 1] + $xs[@xs/2]) / 2;
 }
 sub extract {
  my ($key, $msref) = @_;
  return map { $_->{$key} } @$msref;
 }
--- a/examples/perfscript/throughput-test-plot
+++ b/examples/perfscript/throughput-test-plot
@ -1,14 +1,55 @@
 #!/bin/bash
-`dirname $0`/throughput-test-extract > data.txt
+`dirname $0`/throughput-test-extract "$@" > data.txt
 gnuplot <<\EOF
-set term png size 1024,768
+set term pngcairo size 1024,768
-set output "throughput-polling.png"
+set output "throughput-async-listener-rate.png"
-set st d l
+set st d lp
-set title "Throughput (polling with 1ms sleeps)"
+set st li 1 lw 2
-set ylabel "M sample/s"
+set st li 2 lw 2
-set y2label "Mbps"
+set st li 3 lw 2
-set y2tics
+
 set multiplot
 set logscale xyy2
 set title "Throughput"
 set ylabel "[Mbps]"
 set ytics (100,200,300,400,500,600,700,800,900,1000)
 set grid xtics ytics mytics
 set xlabel "payload size [bytes]"
-p "data.txt" i 5 u 1:($2/1e6) ti "rate [M sample/s]", "" i 5 u 1:3 axes x1y2 ti "app bandwidth [Mbps]", "" i 5 u 1:4 axes x1y2 ti "GbE bandwidth [Mbps]"
+# sample rate in data.txt is in kS/s
 # GbE bandwidth in data.txt is in %, so 100% => 1000 Mbps
 set key at graph 1, 0.9
 p "data.txt" u 1:3 ti "payload", "" u 1:(10*$5) ti "GbE bandwidth"
 set ytics auto
 set key default
 unset xlabel
 unset title
 set grid nomytics
 set ylabel "[M sample/s]"
 set origin .3, .1
 set size .6, .6
 clear
 p "data.txt" u 1:($2/1e3) ti "rate"
 unset multiplot
 unset origin
 unset size
 unset logscale
 set logscale x
 set output "throughput-async-listener-memory.png"
 set title "Throughput: memory"
 set ylabel "RSS [MB]"
 set xlabel "payload size [bytes]"
 p "data.txt" u 1:6 ti "publisher", "" u 1:7 ti "subscriber"
 unset logscale
 set logscale x
 set output "throughput-async-listener-cpu.png"
 set title "Throughput: CPU"
 set ylabel "CPU [%]"
 set xlabel "payload size [bytes]"
 p "data.txt" u 1:8 ti "publisher (pub thread)", "" u 1:9 ti "publisher (recvUC thread)", "" u 1:10 ti "subscriber (dq.user thread)", "" u 1:11 ti "subscriber (recvUC thread)"
 EOF