diff --git a/README.md b/README.md
index d8ac3e3..6f30c41 100644
--- a/README.md
+++ b/README.md
@@ -1,9 +1,12 @@
 # Eclipse Cyclone DDS
 
-Eclipse Cyclone DDS is by far the most performant and robust DDS implementation available on the
-market. Moreover, Cyclone DDS is developed completely in the open as an Eclipse IoT project
+Eclipse Cyclone DDS is a very performant and robust open-source DDS implementation.  Cyclone DDS is developed completely in the open as an Eclipse IoT project
 (see [eclipse-cyclone-dds](https://projects.eclipse.org/projects/iot.cyclonedds)).
 
+* [Getting Started](#getting-started)
+* [Performance](#performance)
+* [Configuration](#configuration)
+
 # Getting Started
 
 ## Building Eclipse Cyclone DDS
@@ -106,7 +109,76 @@ also need to add switches to select the architecture and build type, e.g., ``con
 arch=x86_64 -s build_type=Debug ..`` This will automatically download and/or build CUnit (and, at
 the moment, OpenSSL).
 
-## Configuration
+## Documentation
+
+The documentation is still rather limited, and at the moment only available in the sources (in the
+form of restructured text files in ``docs`` and Doxygen comments in the header files), or as
+a
+[PDF](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/pdf/CycloneDDS-0.1.0.pdf). The
+intent is to automate the process of building the documentation and have them available in more
+convenient formats and in the usual locations.
+
+## Building and Running the Roundtrip Example
+
+We will show you how to build and run an example program that measures latency.  The examples are
+built automatically when you build Cyclone DDS, so you don't need to follow these steps to be able
+to run the program, it is merely to illustrate the process.
+
+    $ cd cyclonedds/examples/roundtrip
+    $ mkdir build
+    $ cd build
+    $ cmake ..
+    $ make
+    
+On one terminal start the application that will be responding to pings:
+
+    $ ./RoundtripPong
+
+On another terminal, start the application that will be sending the pings:
+    
+    $ ./RoundtripPing 0 0 0 
+    # payloadSize: 0 | numSamples: 0 | timeOut: 0
+    # Waiting for startup jitter to stabilise
+    # Warm up complete.
+    # Latency measurements (in us)
+    #             Latency [us]                                   Write-access time [us]       Read-access time [us]
+    # Seconds     Count   median      min      99%      max      Count   median      min      Count   median      min
+        1     28065       17       16       23       87      28065        8        6      28065        1        0
+        2     28115       17       16       23       46      28115        8        6      28115        1        0
+        3     28381       17       16       22       46      28381        8        6      28381        1        0
+        4     27928       17       16       24      127      27928        8        6      27928        1        0
+        5     28427       17       16       20       47      28427        8        6      28427        1        0
+        6     27685       17       16       26       51      27685        8        6      27685        1        0
+        7     28391       17       16       23       47      28391        8        6      28391        1        0
+        8     27938       17       16       24       63      27938        8        6      27938        1        0
+        9     28242       17       16       24      132      28242        8        6      28242        1        0
+       10     28075       17       16       23       46      28075        8        6      28075        1        0
+
+The numbers above were measured on Mac running a 4.2 GHz Intel Core i7 on December 12th 2018.  From
+these numbers you can see how the roundtrip is very stable and the minimal latency is now down to 17
+micro-seconds (used to be 25 micro-seconds) on this HW.
+
+# Performance
+
+Reliable message throughput is over 1MS/s for very small samples and is roughly 90% of GbE with 100
+byte samples, and latency is about 30us when measured using [ddsperf](src/tools/ddsperf) between two
+Intel(R) Xeon(R) CPU E3-1270 V2 @ 3.50GHz (that's 2012 hardware ...) running Ubuntu 16.04, with the
+executables built on Ubuntu 18.04 using gcc 7.4.0 for a default (i.e., "RelWithDebInfo") build.
+
+<img src="https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/throughput-async-listener-rate.png" alt="Throughput" height="400"><img src="https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/latency-sync-listener.png" alt="Throughput" height="400">
+
+This is with the subscriber in listener mode, using asynchronous delivery for the throughput
+test. The configuration is a marginally tweaked out-of-the-box configuration: an increased maximum
+message size and fragment size, and an increased high-water mark for the reliability window on the
+writer side. For details, see the [scripts](examples/perfscript) directory,
+the
+[environment details](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/config.txt) and
+the
+[throughput](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/sub.log) and
+[latency](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/ping.log) data
+underlying the graphs.  These also include CPU usage ([thoughput](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/throughput-async-listener-cpu.png) and [latency](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/latency-sync-listener-bwcpu.png)) and [memory usage](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/throughput-async-listener-memory.png).
+
+# Configuration
 
 The out-of-the-box configuration should usually be fine, but there are a great many options that can
 be tweaked by creating an XML file with the desired settings and defining the ``CYCLONEDDS_URI`` to
@@ -161,73 +233,6 @@ The configurator tool ``cycloneddsconf`` can help in discovering the settings, a
 dump.  Background information on configuring Cyclone DDS can be
 found [here](https://docs/manual/config.rst).
 
-## Documentation
-
-The documentation is still rather limited, and at the moment only available in the sources (in the
-form of restructured text files in ``docs`` and Doxygen comments in the header files), or as
-a
-[PDF](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/pdf/CycloneDDS-0.1.0.pdf). The
-intent is to automate the process of building the documentation and have them available in more
-convenient formats and in the usual locations.
-
-## Performance
-
-Median small message throughput measured using the Throughput example between two Intel(R) Xeon(R)
-CPU E3-1270 V2 @ 3.50GHz (that's 2012 hardware ...) running Linux 3.8.13-rt14.20.el6rt.x86_64,
-connected via a quiet GbE and when using gcc-6.2.0 for a default (i.e., "RelWithDebInfo") build is:
-
-<img src="https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/throughput-polling.png" alt="Throughput" height="400">
-
-This is with the subscriber in polling mode. Listener mode is marginally slower; using a waitset the
-message rate for minimal size messages drops to 600k sample/s in synchronous delivery mode and about
-750k samples/s in asynchronous delivery mode. The configuration is an out-of-the-box configuration,
-tweaked only to increase the high-water mark for the reliability window on the writer side. For
-details, see the scripts in the ``performance`` directory and
-the
-[data](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/throughput.txt).
-
-There is some data on roundtrip latency below.
-
-## Building and Running the Roundtrip Example
-
-We will show you how to build and run an example program that measures latency.  The examples are
-built automatically when you build Cyclone DDS, so you don't need to follow these steps to be able
-to run the program, it is merely to illustrate the process.
-
-    $ cd cyclonedds/examples/roundtrip
-    $ mkdir build
-    $ cd build
-    $ cmake ..
-    $ make
-    
-On one terminal start the application that will be responding to pings:
-
-    $ ./RoundtripPong
-
-On another terminal, start the application that will be sending the pings:
-    
-    $ ./RoundtripPing 0 0 0 
-    # payloadSize: 0 | numSamples: 0 | timeOut: 0
-    # Waiting for startup jitter to stabilise
-    # Warm up complete.
-    # Round trip measurements (in us)
-    #             Round trip time [us]                           Write-access time [us]       Read-access time [us]
-    # Seconds     Count   median      min      99%      max      Count   median      min      Count   median      min
-        1     28065       17       16       23       87      28065        8        6      28065        1        0
-        2     28115       17       16       23       46      28115        8        6      28115        1        0
-        3     28381       17       16       22       46      28381        8        6      28381        1        0
-        4     27928       17       16       24      127      27928        8        6      27928        1        0
-        5     28427       17       16       20       47      28427        8        6      28427        1        0
-        6     27685       17       16       26       51      27685        8        6      27685        1        0
-        7     28391       17       16       23       47      28391        8        6      28391        1        0
-        8     27938       17       16       24       63      27938        8        6      27938        1        0
-        9     28242       17       16       24      132      28242        8        6      28242        1        0
-       10     28075       17       16       23       46      28075        8        6      28075        1        0
-
-The numbers above were measured on Mac running a 4.2 GHz Intel Core i7 on December 12th 2018.  From
-these numbers you can see how the roundtrip is very stable and the minimal latency is now down to 17
-micro-seconds (used to be 25 micro-seconds) on this HW.
-
 # Trademarks
 
 * "Eclipse Cyclone DDS" and "Cyclone DDS" are trademarks of the Eclipse Foundation.
diff --git a/examples/perfscript/latency-test b/examples/perfscript/latency-test
new file mode 100755
index 0000000..ead88ef
--- /dev/null
+++ b/examples/perfscript/latency-test
@@ -0,0 +1,153 @@
+#!/bin/bash
+
+export nwif=eth0
+bandwidth=1e9
+remotedir="$PWD"
+provision=false
+asynclist="sync async"
+modelist="listener waitset"
+sizelist="0 20 50 100 200 500 1000 2000 5000 10000 20000 50000 100000 200000 500000 1000000"
+timeout=30
+loopback=true
+resultdir="latency-result"
+
+usage () {
+    cat >&2 <<EOF
+usage: $0 [OPTIONS] user@remote [user@remote...]
+
+OPTIONS
+  -i IF        use network interface IF (default: $nwif)
+  -b 100|1000|B  network bandwidth (100Mbps/1000Gbps or B bits/sec) for
+               calculating load % given load in bytes/second (default: 1000)
+  -d DIR       use DIR on remote (default: PWD)
+  -p           provision required binaries in DIR (default: $provision)
+               first ssh's in to try mkdir -p DIR, then follows up with scp
+  -t DUR       run for DUR seconds per size (default $timeout)
+  -a ASYNCLIST run for delivery async settings ASYNCLIST (default:
+               "$asynclist")
+  -m MODELIST  run with subscriber mode settings MODELIST (default: 
+               "$modelist")
+  -s SIZELIST  run for sizes in SIZELIST (default: "$sizelist")
+               if SIZELIST is empty, it uses ddsperf's OU topic instead
+  -l LOOPBACK  enable/disable multicast loopback (true/false, default: $loopback)
+  -o DIR       store results in dir (default: $resultdir)
+
+Local host runs ddsperf in ping mode, remotes run pong.  It assumes these are
+available in DIR/bin.  It also assumes that ssh user@remote works without
+requiring a password.
+EOF
+    exit 1
+}
+
+while getopts "i:d:pa:m:s:t:o:l:" opt ; do
+    case $opt in
+        i) nwif="$OPTARG" ;;
+        b) case "$OPTARG" in
+               100) bandwidth=1e8 ;;
+               1000) bandwidth=1e9 ;;
+               *) bandwidth="$OPTARG" ;;
+           esac ;;
+        d) remotedir="$OPTARG" ;;
+        p) provision=true ;;
+        a) asynclist="$OPTARG" ;;
+        m) modelist="$OPTARG" ;;
+        s) sizelist="$OPTARG" ;;
+        l) loopback="OPTARG" ;;
+        t) timeout="$OPTARG" ;;
+        o) resultdir="$OPTARG" ;;
+        h) usage ;;
+    esac
+done
+shift $((OPTIND-1))
+if [ $# -lt 1 ] ; then usage ; fi
+
+cfg=cdds-simple.xml
+cat >$cfg <<EOF
+<CycloneDDS>
+  <Domain>
+    <Id>17</Id>
+  </Domain>
+  <General>
+    <NetworkInterfaceAddress>$nwif</NetworkInterfaceAddress>
+    <EnableMulticastLoopback>$loopback</EnableMulticastLoopback>
+    <MaxMessageSize>65500B</MaxMessageSize>
+    <FragmentSize>4000B</FragmentSize>
+  </General>
+  <Internal>
+    <Watermarks>
+      <WhcHigh>500kB</WhcHigh>
+    </Watermarks>
+    <SynchronousDeliveryPriorityThreshold>\${async:-0}</SynchronousDeliveryPriorityThreshold>
+    <LeaseDuration>3s</LeaseDuration>
+  </Internal>
+  <Tracing>
+    <Verbosity>config</Verbosity>
+  </Tracing>
+</CycloneDDS>
+EOF
+
+if [ ! -x bin/ddsperf ] ; then
+    echo "bin/ddsperf not found on the local machine" >&2
+    exit 1
+fi
+
+[ -d $resultdir ] || { echo "output directory $resultdir doesn't exist" >&2 ; exit 1 ; }
+
+if $provision ; then
+    echo "provisioning ..."
+    for r in $pubremote "$@" ; do
+        ssh $r mkdir -p $remotedir $remotedir/bin $remotedir/lib
+        scp lib/libddsc.so.0 $r:$remotedir/lib
+        scp bin/ddsperf $r:$remotedir/bin
+    done
+fi
+
+topic=KS
+[ -z "$sizelist" ] && topic=OU
+
+export CYCLONEDDS_URI=file://$PWD/$cfg
+for r in "$@" ; do
+    scp $cfg $r:$remotedir || { echo "failed to copy $cfg to $remote:$PWD" >&2 ; exit 1 ; }
+done
+
+for async_mode in $asynclist ; do
+    case "$async_mode" in
+        sync) async=0 ;;
+        async) async=1 ;;
+        *) echo "$async_mode: invalid setting for ASYNC" >&2 ; continue ;;
+    esac
+    export async
+    for sub_mode in $modelist ; do
+        echo "======== ASYNC $async MODE $sub_mode ========="
+
+        
+        cat > run-pong.tmp <<EOF
+export CYCLONEDDS_URI=file://$remotedir/$cfg
+export async=$async
+cd $remotedir
+nohup bin/ddsperf -T $topic pong $sub_mode > /dev/null &
+echo \$!
+EOF
+        killpongs=""
+        for r in "$@" ; do
+            scp run-pong.tmp $r:$remotedir
+            rpongpid=`ssh $r ". $remotedir/run-pong.tmp"`
+            killpongs="$killpongs ssh $r kill -9 $rpongpid &"
+        done
+
+        outdir=$resultdir/$async_mode-$sub_mode
+        mkdir $outdir
+
+        touch $outdir/ping.log
+        tail -f $outdir/ping.log & xpid=$!
+        for size in ${sizelist:-0} ; do
+            echo "size $size"
+            bin/ddsperf -d $nwif:$bandwidth -c -D $timeout -T $topic ping size $size $sub_mode >> $outdir/ping.log
+            sleep 5
+        done
+        eval $killpongs
+        sleep 1
+        kill $xpid
+        wait
+    done
+done
diff --git a/examples/perfscript/latency-test-extract b/examples/perfscript/latency-test-extract
new file mode 100755
index 0000000..b665e88
--- /dev/null
+++ b/examples/perfscript/latency-test-extract
@@ -0,0 +1,95 @@
+#!/usr/bin/perl -w
+
+# Note: this is specialized for async delivery, listener mode because of the way it deals with
+# thread names
+
+use strict;
+
+my %res = ();
+my %meas;
+while (<>) {
+  next unless s/^\[\d+\] \d+\.\d+\s+//;
+  if (s/^[^\@:]+:\d+\s+size (\d+) //) {
+    # size is always the first line of an output block
+    # ddsperf doesn't print CPU loads, RSS, bandwidth if it is zero
+    my %tmp = %meas;
+    push @{$res{$meas{size}}}, \%tmp if %meas;
+    %meas = (size => $1,
+             rawxmitbw => 0, rawrecvbw => 0,
+             subrss => 0, pubrss => 0,
+             subcpu => 0, subrecv => 0,
+             pubcpu => 0, pubrecv => 0);
+    $meas{$1} = $2 while s/^(mean|min|max|\d+%)\s+(\d+\.\d+)us\s*//;
+    die unless /cnt \d+$/;
+  } elsif (s/^(\@[^:]+:\d+\s+)?rss:(\d+\.\d+)([kM])B//) {
+    my $side = defined $1 ? "pub" : "sub";
+    $meas{"${side}rss"}  = $2 / ($3 eq "k" ? 1024.0 : 1);
+    $meas{"${side}cpu"}  = cpuload (($side eq "pub") ? "pub" : "dq.user", $_);
+    $meas{"${side}recv"} = cpuload ("recvUC", $_);
+  } elsif (/xmit\s+(\d+)%\s+recv\s+(\d+)%/) {
+    $meas{rawxmitbw} = $1;
+    $meas{rawrecvbw} = $2;
+  }
+}
+push @{$res{$meas{size}}}, \%meas if %meas;
+die "no data found" unless keys %res > 0;
+
+print "#size mean min 50% 90% 99% max rawxmitbw rawrecvbw pubrss subrss pubcpu pubrecv subcpu subrecv\n";
+my @sizes = sort { $a <=> $b } keys %res;
+for my $sz (@sizes) {
+  my $ms = $res{$sz};
+  my $min = min ("min", $ms);
+  my $max = max ("max", $ms);
+  my $mean = mean ("mean", $ms); # roughly same number of roundtrips, so not too far off
+  my $median = max ("50%", $ms); # also not quite correct ...
+  my $p90 = max ("90%", $ms);
+  my $p99 = max ("99%", $ms);
+  my $rawxmitbw = median ("rawxmitbw", $ms);
+  my $rawrecvbw = median ("rawrecvbw", $ms);
+  my $pubrss = max ("pubrss", $ms);
+  my $subrss = max ("subrss", $ms);
+  my $pubcpu = median ("pubcpu", $ms);
+  my $pubrecv = median ("pubrecv", $ms);
+  my $subcpu = median ("subcpu", $ms);
+  my $subrecv = median ("subrecv", $ms);
+  print "$sz $mean $min $median $p90 $p99 $max $rawxmitbw $rawrecvbw $pubrss $subrss $pubcpu $pubrecv $subcpu $subrecv\n";
+}
+
+sub cpuload {
+  my ($thread, $line) = @_;
+  $thread =~ s/\./\\./g;
+  if ($line =~ /$thread:(\d+)%\+(\d+)%/) {
+    return $1+$2;
+  } else {
+    return 0;
+  }
+}
+
+sub max {
+  my $v;
+  for (extract (@_)) { $v = $_ unless defined $v; $v = $_ if $_ > $v; }
+  return $v;
+}
+
+sub min {
+  my $v;
+  for (extract (@_)) { $v = $_ unless defined $v; $v = $_ if $_ < $v; }
+  return $v;
+}
+
+sub mean {
+  my $v = 0;
+  my @xs = extract (@_);
+  $v += $_ for @xs;
+  return $v / @xs;
+}
+
+sub median {
+  my @xs = sort { $a <=> $b } (extract (@_));
+  return (@xs % 2) ? $xs[(@xs - 1) / 2] : ($xs[@xs/2 - 1] + $xs[@xs/2]) / 2;
+}
+
+sub extract {
+  my ($key, $msref) = @_;
+  return map { $_->{$key} } @$msref;
+}
diff --git a/examples/perfscript/latency-test-plot b/examples/perfscript/latency-test-plot
new file mode 100755
index 0000000..d7217a2
--- /dev/null
+++ b/examples/perfscript/latency-test-plot
@@ -0,0 +1,46 @@
+#!/bin/bash
+
+`dirname $0`/latency-test-extract "$@" > data.txt
+gnuplot <<\EOF
+set term pngcairo size 1024,768
+set output "latency-sync-listener.png"
+set st d lp
+set st li 1 lw 2
+set st li 2 lw 2
+set st li 3 lw 2
+set st li 4 lw 2
+set st li 5 lw 2
+
+set multiplot
+set logscale xy
+set title "Latency"
+set ylabel "[us]"
+set grid xtics ytics mytics
+set xlabel "payload size [bytes]"
+p "data.txt" u 1:3 ti "min", "" u 1:4 ti "median", "" u 1:5 ti "90%", "" u 1:6 ti "99%", "" u 1:7 ti "max"
+unset logscale y
+unset xlabel
+unset ylabel
+unset title
+set grid nomytics
+set origin .1, .43
+set size .55, .5
+clear
+p [10:1000]  "data.txt" u 1:3 ti "min", "" u 1:4 ti "median", "" u 1:5 ti "90%", "" u 1:6 ti "99%", "" u 1:7 ti "max"
+unset multiplot
+
+unset origin
+unset size
+
+unset logscale
+set logscale x
+set output "latency-sync-listener-bwcpu.png"
+set title "Latency: network bandwidth and CPU usage"
+set y2tics
+set ylabel "[Mbps]"
+set y2label "CPU [%]"
+set xlabel "payload size [bytes]"
+set key at graph 1, 0.7
+p "data.txt" u 1:(10*$8) ti "GbE transmit bandwidth (left)", "" u 1:(10*$9) ti "GbE receive bandwidth (left)", "" u 1:13 axes x1y2 ti "ping CPU (right)", "" u 1:15 axes x1y2 ti "pong CPU (right)"
+
+EOF
diff --git a/examples/perfscript/throughput-test b/examples/perfscript/throughput-test
old mode 100644
new mode 100755
index 36edaa3..d654056
--- a/examples/perfscript/throughput-test
+++ b/examples/perfscript/throughput-test
@@ -1,45 +1,53 @@
 #!/bin/bash
 
+export nwif=eth0
+bandwidth=1e9
+remotedir="$PWD"
+provision=false
+asynclist="sync async"
+modelist="listener polling waitset"
+sizelist="0 20 50 100 200 500 1000 2000 5000 10000 20000 50000 100000 200000 500000 1000000"
+timeout=30
+loopback=true
+resultdir="throughput-result"
+
 usage () {
     cat >&2 <<EOF
 usage: $0 [OPTIONS] user@remote [user@remote...]
 
 OPTIONS
-  -i IF        use network interface IF (default: eth0)
-  -b 100|1000  network bandwidth (100Mbps/1000Gbps) for calculating load
-               % given load in bytes/second (default: 1000)
+  -i IF        use network interface IF (default: $nwif)
+  -b 100|1000|B  network bandwidth (100Mbps/1000Gbps or B bits/sec) for
+               calculating load % given load in bytes/second (default: 1000)
   -d DIR       use DIR on remote (default: PWD)
-  -p           provision required binaries in DIR (default: false)
+  -p           provision required binaries in DIR (default: $provision)
                first ssh's in to try mkdir -p DIR, then follows up with scp
-  -t DUR       run for DUR seconds per size (default 20)
-  -a ASYNCLIST run for delivery async settings ASYNCLIST (default: "0 1")
-  -m MODELIST  run with subscriber mode settings MODELIST (default: "-1 0 1")
-  -s SIZELIST  run for sizes in SIZELIST (default: "0 16 32 64 128 256")
-  -l LOOPBACK  enable/disable multicast loopback (true/false, default: true)
-  -o DIR       store results in dir (default: throughput-result)
+  -t DUR       run for DUR seconds per size (default $timeout)
+  -a ASYNCLIST run for delivery async settings ASYNCLIST (default:
+               "$asynclist")
+  -m MODELIST  run with subscriber mode settings MODELIST (default: 
+               "$modelist")
+  -s SIZELIST  run for sizes in SIZELIST (default: "$sizelist")
+               if SIZELIST is empty, it uses ddsperf's OU topic instead
+  -l LOOPBACK  enable/disable multicast loopback (true/false, default: $loopback)
+  -o DIR       store results in dir (default: $resultdir)
 
-Local host runs ThroughputSubscriber, first remote runs ThroughputPublisher,
-further remotes also run ThroughputSubscriber.  It assumes these are
-available in DIR/bin.  It also assumes that ssh user@remote works without
-requiring a password.
+Local host runs "ddsperf" in subscriber mode, first remote runs it publisher
+mode, further remotes also run subcribers.  It assumes these are available in
+DIR/bin.  It also assumes that ssh user@remote works without requiring a
+password.
 EOF
     exit 1
 }
 
-export nwif=eth0
-bandwidth=1000
-remotedir="$PWD"
-provision=false
-asynclist="0 1"
-modelist="-1 0 1"
-sizelist="0 16 32 64 128 256"
-timeout=20
-loopback=true
-resultdir="throughput-result"
 while getopts "i:b:d:pa:m:s:t:o:l:" opt ; do
     case $opt in
         i) nwif="$OPTARG" ;;
-        b) bandwidth="$OPTARG" ;;
+        b) case "$OPTARG" in
+               100) bandwidth=1e8 ;;
+               1000) bandwidth=1e9 ;;
+               *) bandwidth="$OPTARG" ;;
+           esac ;;
         d) remotedir="$OPTARG" ;;
         p) provision=true ;;
         a) asynclist="$OPTARG" ;;
@@ -53,7 +61,6 @@ while getopts "i:b:d:pa:m:s:t:o:l:" opt ; do
 done
 shift $((OPTIND-1))
 if [ $# -lt 1 ] ; then usage ; fi
-ethload=`dirname $0`/ethload
 pubremote=$1
 shift
 
@@ -63,24 +70,27 @@ cat >$cfg <<EOF
   <Domain>
     <Id>17</Id>
   </Domain>
-  <DDSI2E>
-    <General>
-      <NetworkInterfaceAddress>$nwif</NetworkInterfaceAddress>
-      <EnableMulticastLoopback>$loopback</EnableMulticastLoopback>
-    </General>
-    <Internal>
-      <Watermarks>
-        <WhcHigh>500kB</WhcHigh>
-      </Watermarks>
-      <SynchronousDeliveryPriorityThreshold>${async:-0}</SynchronousDeliveryPriorityThreshold>
-      <LeaseDuration>3s</LeaseDuration>
-    </Internal>
-  </DDSI2E>
+  <General>
+    <NetworkInterfaceAddress>$nwif</NetworkInterfaceAddress>
+    <EnableMulticastLoopback>$loopback</EnableMulticastLoopback>
+    <MaxMessageSize>65500B</MaxMessageSize>
+    <FragmentSize>4000B</FragmentSize>
+  </General>
+  <Internal>
+    <Watermarks>
+      <WhcHigh>500kB</WhcHigh>
+    </Watermarks>
+    <SynchronousDeliveryPriorityThreshold>\${async:-0}</SynchronousDeliveryPriorityThreshold>
+    <LeaseDuration>3s</LeaseDuration>
+  </Internal>
+  <Tracing>
+    <Verbosity>config</Verbosity>
+  </Tracing>
 </CycloneDDS>
 EOF
 
-if [ ! -x bin/ThroughputPublisher -o ! -x bin/ThroughputSubscriber -o ! -x $ethload ] ; then
-    echo "some check for existence of a file failed on the local machine" >&2
+if [ ! -x bin/ddsperf ] ; then
+    echo "bin/ddsperf not found on the local machine" >&2
     exit 1
 fi
 
@@ -91,33 +101,35 @@ if $provision ; then
     for r in $pubremote "$@" ; do
         ssh $r mkdir -p $remotedir $remotedir/bin $remotedir/lib
         scp lib/libddsc.so.0 $r:$remotedir/lib
-        scp bin/ThroughputPublisher bin/ThroughputSubscriber $r:$remotedir/bin
+        scp bin/ddsperf $r:$remotedir/bin
     done
 fi
 
+topic=KS
+[ -z "$sizelist" ] && topic=OU
+
 export CYCLONEDDS_URI=file://$PWD/$cfg
 for r in $pubremote "$@" ; do
     scp $cfg $r:$remotedir || { echo "failed to copy $cfg to $remote:$PWD" >&2 ; exit 1 ; }
 done
 
-for async in $asynclist ; do
+for async_mode in $asynclist ; do
+    case "$async_mode" in
+        sync) async=0 ;;
+        async) async=1 ;;
+        *) echo "$async_mode: invalid setting for ASYNC" >&2 ; continue ;;
+    esac
     export async
-    for mode in $modelist ; do
-        echo "======== ASYNC $async MODE $mode ========="
+    for sub_mode in $modelist ; do
+        echo "======== ASYNC $async MODE $sub_mode ========="
         
         cat > run-publisher.tmp <<EOF
 export CYCLONEDDS_URI=file://$remotedir/$cfg
 export async=$async
 cd $remotedir
-rm -f pub-top.log
-for size in $sizelist ; do
+for size in ${sizelist:-0} ; do
   echo "size \$size"
-  bin/ThroughputPublisher \$size > pub.log & ppid=\$!
-  top -b -d1 -p \$ppid >> pub-top.log & tpid=\$!
-  sleep $timeout
-  kill \$tpid
-  kill -2 \$ppid
-  wait \$ppid
+  bin/ddsperf -D $timeout -T $topic pub size \$size > pub.log
   sleep 5
 done
 wait
@@ -129,7 +141,7 @@ EOF
 export CYCLONEDDS_URI=file://$remotedir/$cfg
 export async=$async
 cd $remotedir
-nohup bin/ThroughputSubscriber 0 $mode > /dev/null &
+nohup bin/ddsperf -T $topic sub $sub_mode > /dev/null &
 echo \$!
 EOF
             for r in "$@" ; do
@@ -138,22 +150,18 @@ EOF
                 killremotesubs="$killremotesubs ssh $r kill -9 $rsubpid &"
             done
         fi
-        
-        outdir=$resultdir/data-async$async-mode$mode
+
+        outdir=$resultdir/$async_mode-$sub_mode
         mkdir $outdir
 
-        rm -f sub-top.log
-        $ethload $nwif $bandwidth > $outdir/sub-ethload.log & lpid=$!
-        bin/ThroughputSubscriber 0 $mode > $outdir/sub.log & spid=$!
-        top -b -d1 -p $spid >> $outdir/sub-top.log & tpid=$!
+        bin/ddsperf -d $nwif:$bandwidth -c -T $topic sub $sub_mode > $outdir/sub.log & spid=$!
         tail -f $outdir/sub.log & xpid=$!
         ssh $pubremote ". $remotedir/run-publisher.tmp"
-        kill $tpid
-        kill -2 $spid
+        kill $spid
         eval $killremotesubs
         sleep 1
-        kill $lpid $xpid
+        kill $xpid
         wait
-        scp $pubremote:$remotedir/{pub-top.log,pub.log} $outdir
+        scp $pubremote:$remotedir/pub.log $outdir
     done
 done
diff --git a/examples/perfscript/throughput-test-extract b/examples/perfscript/throughput-test-extract
index 9f16ff3..973f397 100755
--- a/examples/perfscript/throughput-test-extract
+++ b/examples/perfscript/throughput-test-extract
@@ -1,59 +1,76 @@
 #!/usr/bin/perl -w
 
+# Note: this is specialized for async delivery, listener mode because of the way it deals with
+# thread names
+
 use strict;
 
-my @dirs = ("async0-mode-1", "async0-mode0", "async0-mode1",
-            "async1-mode-1", "async1-mode0", "async1-mode1");
-
-my $dataset = 0;
-my $basedir = "throughput-result";
-$basedir = $ARGV[0] if @ARGV== 1;
-my $load_threshold = 20;
-for my $dir (@dirs) {
-  my @loads = ();
-
-  {
-    open LH, "< $basedir/data-$dir/sub-ethload.log" or next; # die "can't open $basedir/data-$dir/sub-ethload.log";
-    my @curload = ();
-    while (<LH>) {
-      next unless /^r +([0-9.]+).*\( *(\d+)/;
-      push @curload, $2 if $1 > $load_threshold;
-      if (@curload && $1 < $load_threshold) {
-        push @loads, median (@curload);
-        @curload = ();
-      }
-    }
-    push @loads, median (@curload) if @curload;
-    close LH;
+my %res = ();
+my %meas;
+while (<>) {
+  next unless s/^\[\d+\] \d+\.\d+\s+//;
+  if (/^size (\d+) .* rate (\d+\.\d+)\s*kS\/s\s+(\d+\.\d+)\s*Mb\/s/) {
+    # size is always the first line of an output block
+    # ddsperf doesn't print CPU loads, RSS, bandwidth if it is zero
+    my %tmp = %meas;
+    push @{$res{$meas{size}}}, \%tmp if %meas;
+    %meas = (size => $1, rate => $2, cookedbw => $3,
+             rawxmitbw => 0, rawrecvbw => 0,
+             subrss => 0, pubrss => 0,
+             subcpu => 0, subrecv => 0,
+             pubcpu => 0, pubrecv => 0);
+  } elsif (s/^(\@[^:]+:\d+\s+)?rss:(\d+\.\d+)([kM])B//) {
+    my $side = defined $1 ? "pub" : "sub";
+    $meas{"${side}rss"}  = $2 / ($3 eq "k" ? 1024.0 : 1);
+    $meas{"${side}cpu"}  = cpuload (($side eq "pub") ? "pub" : "dq.user", $_);
+    $meas{"${side}recv"} = cpuload ("recvUC", $_);
+  } elsif (/xmit\s+(\d+)%\s+recv\s+(\d+)%/) {
+    $meas{rawxmitbw} = $1;
+    $meas{rawrecvbw} = $2;
   }
+}
+push @{$res{$meas{size}}}, \%meas if %meas;
+die "no data found" unless keys %res > 0;
 
-  open FH, "< $basedir/data-$dir/sub.log" or next; # die "can't open $basedir/data-$dir/sub.log";
-  print "\n\n" if $dataset++;
-  print "# mode $dir\n";
-  print "# payloadsize rate[samples/s] appl.bandwidth[Mb/s] raw.bandwidth[Mb/s]\n";
-  my $psz;
-  my @rate = ();
-  while (<FH>) {
-    next unless /Payload size: ([0-9]+).*Transfer rate: ([0-9.]+)/;
-    my $psz_cur = $1; my $rate_cur = $2;
-    $psz = $psz_cur unless defined $psz;
-    if ($psz != $psz_cur) {
-      my $load = shift @loads;
-      my $rate = median (@rate);
-      printf "%d %f %f %f\n", $psz, $rate, $rate * (8 + $psz) / 125e3, $load / 125e3;
-      @rate = ();
-    }
-    $psz = $psz_cur;
-    push @rate, ($rate_cur + 0.0);
+print "#size rate cookedbw rawxmitbw rawrecvbw pubrss subrss pubcpu pubrecv subcpu subrecv\n";
+my @sizes = sort { $a <=> $b } keys %res;
+for my $sz (@sizes) {
+  my $ms = $res{$sz};
+  my $rate = median ("rate", $ms);
+  my $cookedbw = median ("cookedbw", $ms);
+  my $rawxmitbw = median ("rawxmitbw", $ms);
+  my $rawrecvbw = median ("rawrecvbw", $ms);
+  my $pubrss = max ("pubrss", $ms);
+  my $subrss = max ("subrss", $ms);
+  my $pubcpu = median ("pubcpu", $ms);
+  my $pubrecv = median ("pubrecv", $ms);
+  my $subcpu = median ("subcpu", $ms);
+  my $subrecv = median ("subrecv", $ms);
+  print "$sz $rate $cookedbw $rawxmitbw $rawrecvbw $pubrss $subrss $pubcpu $pubrecv $subcpu $subrecv\n";
+}
+
+sub cpuload {
+  my ($thread, $line) = @_;
+  $thread =~ s/\./\\./g;
+  if ($line =~ /$thread:(\d+)%\+(\d+)%/) {
+    return $1+$2;
+  } else {
+    return 0;
   }
-  my $load = shift @loads;
-  my $rate = median (@rate);
-  printf "%d %f %f %f\n", $psz, $rate, $rate * (8 + $psz) / 125e3, $load / 125e3;
-  close FH;
+}
+
+sub max {
+  my $v;
+  for (extract (@_)) { $v = $_ unless defined $v; $v = $_ if $_ > $v; }
+  return $v;
 }
 
 sub median {
-  my @xs = sort { $a <=> $b } @_;
+  my @xs = sort { $a <=> $b } (extract (@_));
   return (@xs % 2) ? $xs[(@xs - 1) / 2] : ($xs[@xs/2 - 1] + $xs[@xs/2]) / 2;
 }
 
+sub extract {
+  my ($key, $msref) = @_;
+  return map { $_->{$key} } @$msref;
+}
diff --git a/examples/perfscript/throughput-test-plot b/examples/perfscript/throughput-test-plot
index 9840b07..0e0157f 100755
--- a/examples/perfscript/throughput-test-plot
+++ b/examples/perfscript/throughput-test-plot
@@ -1,14 +1,55 @@
 #!/bin/bash
 
-`dirname $0`/throughput-test-extract > data.txt
+`dirname $0`/throughput-test-extract "$@" > data.txt
 gnuplot <<\EOF
-set term png size 1024,768
-set output "throughput-polling.png"
-set st d l
-set title "Throughput (polling with 1ms sleeps)"
-set ylabel "M sample/s"
-set y2label "Mbps"
-set y2tics
+set term pngcairo size 1024,768
+set output "throughput-async-listener-rate.png"
+set st d lp
+set st li 1 lw 2
+set st li 2 lw 2
+set st li 3 lw 2
+
+set multiplot
+set logscale xyy2
+set title "Throughput"
+set ylabel "[Mbps]"
+set ytics (100,200,300,400,500,600,700,800,900,1000)
+set grid xtics ytics mytics
 set xlabel "payload size [bytes]"
-p "data.txt" i 5 u 1:($2/1e6) ti "rate [M sample/s]", "" i 5 u 1:3 axes x1y2 ti "app bandwidth [Mbps]", "" i 5 u 1:4 axes x1y2 ti "GbE bandwidth [Mbps]"
+# sample rate in data.txt is in kS/s
+# GbE bandwidth in data.txt is in %, so 100% => 1000 Mbps
+set key at graph 1, 0.9
+p "data.txt" u 1:3 ti "payload", "" u 1:(10*$5) ti "GbE bandwidth"
+set ytics auto
+set key default
+
+unset xlabel
+unset title
+set grid nomytics
+set ylabel "[M sample/s]"
+set origin .3, .1
+set size .6, .6
+clear
+p "data.txt" u 1:($2/1e3) ti "rate"
+unset multiplot
+
+unset origin
+unset size
+
+unset logscale
+set logscale x
+set output "throughput-async-listener-memory.png"
+set title "Throughput: memory"
+set ylabel "RSS [MB]"
+set xlabel "payload size [bytes]"
+p "data.txt" u 1:6 ti "publisher", "" u 1:7 ti "subscriber"
+
+unset logscale
+set logscale x
+set output "throughput-async-listener-cpu.png"
+set title "Throughput: CPU"
+set ylabel "CPU [%]"
+set xlabel "payload size [bytes]"
+p "data.txt" u 1:8 ti "publisher (pub thread)", "" u 1:9 ti "publisher (recvUC thread)", "" u 1:10 ti "subscriber (dq.user thread)", "" u 1:11 ti "subscriber (recvUC thread)"
+
 EOF