From af19c5681c5992d96191b1494ed7fff459b0b973 Mon Sep 17 00:00:00 2001 From: Erik Boasson Date: Fri, 2 Aug 2019 09:06:39 +0200 Subject: [PATCH] Use ddsperf in perf scripts and update README * the scripted throughput test originally used for the throughput graph in the README now uses ddsperf; * a scripted latency test has been added; * updated the README with the results of these tests (and so now gives easy access not only to throughput, but also to latency and memory usage, as well as to latency over GbE). Signed-off-by: Erik Boasson --- README.md | 145 ++++++++++--------- examples/perfscript/latency-test | 153 ++++++++++++++++++++ examples/perfscript/latency-test-extract | 95 ++++++++++++ examples/perfscript/latency-test-plot | 46 ++++++ examples/perfscript/throughput-test | 136 +++++++++-------- examples/perfscript/throughput-test-extract | 109 ++++++++------ examples/perfscript/throughput-test-plot | 59 ++++++-- 7 files changed, 554 insertions(+), 189 deletions(-) create mode 100755 examples/perfscript/latency-test create mode 100755 examples/perfscript/latency-test-extract create mode 100755 examples/perfscript/latency-test-plot mode change 100644 => 100755 examples/perfscript/throughput-test diff --git a/README.md b/README.md index d8ac3e3..6f30c41 100644 --- a/README.md +++ b/README.md @@ -1,9 +1,12 @@ # Eclipse Cyclone DDS -Eclipse Cyclone DDS is by far the most performant and robust DDS implementation available on the -market. Moreover, Cyclone DDS is developed completely in the open as an Eclipse IoT project +Eclipse Cyclone DDS is a very performant and robust open-source DDS implementation. Cyclone DDS is developed completely in the open as an Eclipse IoT project (see [eclipse-cyclone-dds](https://projects.eclipse.org/projects/iot.cyclonedds)). 
+* [Getting Started](#getting-started) +* [Performance](#performance) +* [Configuration](#configuration) + # Getting Started ## Building Eclipse Cyclone DDS @@ -106,7 +109,76 @@ also need to add switches to select the architecture and build type, e.g., ``con arch=x86_64 -s build_type=Debug ..`` This will automatically download and/or build CUnit (and, at the moment, OpenSSL). -## Configuration +## Documentation + +The documentation is still rather limited, and at the moment only available in the sources (in the +form of restructured text files in ``docs`` and Doxygen comments in the header files), or as +a +[PDF](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/pdf/CycloneDDS-0.1.0.pdf). The +intent is to automate the process of building the documentation and have them available in more +convenient formats and in the usual locations. + +## Building and Running the Roundtrip Example + +We will show you how to build and run an example program that measures latency. The examples are +built automatically when you build Cyclone DDS, so you don't need to follow these steps to be able +to run the program, it is merely to illustrate the process. + + $ cd cyclonedds/examples/roundtrip + $ mkdir build + $ cd build + $ cmake .. + $ make + +On one terminal start the application that will be responding to pings: + + $ ./RoundtripPong + +On another terminal, start the application that will be sending the pings: + + $ ./RoundtripPing 0 0 0 + # payloadSize: 0 | numSamples: 0 | timeOut: 0 + # Waiting for startup jitter to stabilise + # Warm up complete. 
+ # Latency measurements (in us) + # Latency [us] Write-access time [us] Read-access time [us] + # Seconds Count median min 99% max Count median min Count median min + 1 28065 17 16 23 87 28065 8 6 28065 1 0 + 2 28115 17 16 23 46 28115 8 6 28115 1 0 + 3 28381 17 16 22 46 28381 8 6 28381 1 0 + 4 27928 17 16 24 127 27928 8 6 27928 1 0 + 5 28427 17 16 20 47 28427 8 6 28427 1 0 + 6 27685 17 16 26 51 27685 8 6 27685 1 0 + 7 28391 17 16 23 47 28391 8 6 28391 1 0 + 8 27938 17 16 24 63 27938 8 6 27938 1 0 + 9 28242 17 16 24 132 28242 8 6 28242 1 0 + 10 28075 17 16 23 46 28075 8 6 28075 1 0 + +The numbers above were measured on Mac running a 4.2 GHz Intel Core i7 on December 12th 2018. From +these numbers you can see how the roundtrip is very stable and the minimal latency is now down to 17 +micro-seconds (used to be 25 micro-seconds) on this HW. + +# Performance + +Reliable message throughput is over 1MS/s for very small samples and is roughly 90% of GbE with 100 +byte samples, and latency is about 30us when measured using [ddsperf](src/tools/ddsperf) between two +Intel(R) Xeon(R) CPU E3-1270 V2 @ 3.50GHz (that's 2012 hardware ...) running Ubuntu 16.04, with the +executables built on Ubuntu 18.04 using gcc 7.4.0 for a default (i.e., "RelWithDebInfo") build. + +[Figure: Throughput] [Figure: Throughput] + +This is with the subscriber in listener mode, using asynchronous delivery for the throughput +test. The configuration is a marginally tweaked out-of-the-box configuration: an increased maximum +message size and fragment size, and an increased high-water mark for the reliability window on the +writer side. 
For details, see the [scripts](examples/perfscript) directory, +the +[environment details](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/config.txt) and +the +[throughput](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/sub.log) and +[latency](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/ping.log) data +underlying the graphs. These also include CPU usage ([throughput](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/throughput-async-listener-cpu.png) and [latency](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/latency-sync-listener-bwcpu.png)) and [memory usage](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/throughput-async-listener-memory.png). + +# Configuration The out-of-the-box configuration should usually be fine, but there are a great many options that can be tweaked by creating an XML file with the desired settings and defining the ``CYCLONEDDS_URI`` to @@ -161,73 +233,6 @@ The configurator tool ``cycloneddsconf`` can help in discovering the settings, a dump. Background information on configuring Cyclone DDS can be found [here](https://docs/manual/config.rst). -## Documentation - -The documentation is still rather limited, and at the moment only available in the sources (in the -form of restructured text files in ``docs`` and Doxygen comments in the header files), or as -a -[PDF](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/pdf/CycloneDDS-0.1.0.pdf). The -intent is to automate the process of building the documentation and have them available in more -convenient formats and in the usual locations. - -## Performance - -Median small message throughput measured using the Throughput example between two Intel(R) Xeon(R) -CPU E3-1270 V2 @ 3.50GHz (that's 2012 hardware ...) 
running Linux 3.8.13-rt14.20.el6rt.x86_64, -connected via a quiet GbE and when using gcc-6.2.0 for a default (i.e., "RelWithDebInfo") build is: - -[Figure: Throughput] - -This is with the subscriber in polling mode. Listener mode is marginally slower; using a waitset the -message rate for minimal size messages drops to 600k sample/s in synchronous delivery mode and about -750k samples/s in asynchronous delivery mode. The configuration is an out-of-the-box configuration, -tweaked only to increase the high-water mark for the reliability window on the writer side. For -details, see the scripts in the ``performance`` directory and -the -[data](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/throughput.txt). - -There is some data on roundtrip latency below. - -## Building and Running the Roundtrip Example - -We will show you how to build and run an example program that measures latency. The examples are -built automatically when you build Cyclone DDS, so you don't need to follow these steps to be able -to run the program, it is merely to illustrate the process. - - $ cd cyclonedds/examples/roundtrip - $ mkdir build - $ cd build - $ cmake .. - $ make - -On one terminal start the application that will be responding to pings: - - $ ./RoundtripPong - -On another terminal, start the application that will be sending the pings: - - $ ./RoundtripPing 0 0 0 - # payloadSize: 0 | numSamples: 0 | timeOut: 0 - # Waiting for startup jitter to stabilise - # Warm up complete. 
- # Round trip measurements (in us) - # Round trip time [us] Write-access time [us] Read-access time [us] - # Seconds Count median min 99% max Count median min Count median min - 1 28065 17 16 23 87 28065 8 6 28065 1 0 - 2 28115 17 16 23 46 28115 8 6 28115 1 0 - 3 28381 17 16 22 46 28381 8 6 28381 1 0 - 4 27928 17 16 24 127 27928 8 6 27928 1 0 - 5 28427 17 16 20 47 28427 8 6 28427 1 0 - 6 27685 17 16 26 51 27685 8 6 27685 1 0 - 7 28391 17 16 23 47 28391 8 6 28391 1 0 - 8 27938 17 16 24 63 27938 8 6 27938 1 0 - 9 28242 17 16 24 132 28242 8 6 28242 1 0 - 10 28075 17 16 23 46 28075 8 6 28075 1 0 - -The numbers above were measured on Mac running a 4.2 GHz Intel Core i7 on December 12th 2018. From -these numbers you can see how the roundtrip is very stable and the minimal latency is now down to 17 -micro-seconds (used to be 25 micro-seconds) on this HW. - # Trademarks * "Eclipse Cyclone DDS" and "Cyclone DDS" are trademarks of the Eclipse Foundation. diff --git a/examples/perfscript/latency-test b/examples/perfscript/latency-test new file mode 100755 index 0000000..ead88ef --- /dev/null +++ b/examples/perfscript/latency-test @@ -0,0 +1,153 @@ +#!/bin/bash + +export nwif=eth0 +bandwidth=1e9 +remotedir="$PWD" +provision=false +asynclist="sync async" +modelist="listener waitset" +sizelist="0 20 50 100 200 500 1000 2000 5000 10000 20000 50000 100000 200000 500000 1000000" +timeout=30 +loopback=true +resultdir="latency-result" + +usage () { + cat >&2 <$cfg < + + 17 + + + $nwif + $loopback + 65500B + 4000B + + + + 500kB + + \${async:-0} + 3s + + + config + + +EOF + +if [ ! -x bin/ddsperf ] ; then + echo "bin/ddsperf not found on the local machine" >&2 + exit 1 +fi + +[ -d $resultdir ] || { echo "output directory $resultdir doesn't exist" >&2 ; exit 1 ; } + +if $provision ; then + echo "provisioning ..." 
+ for r in $pubremote "$@" ; do + ssh $r mkdir -p $remotedir $remotedir/bin $remotedir/lib + scp lib/libddsc.so.0 $r:$remotedir/lib + scp bin/ddsperf $r:$remotedir/bin + done +fi + +topic=KS +[ -z "$sizelist" ] && topic=OU + +export CYCLONEDDS_URI=file://$PWD/$cfg +for r in "$@" ; do + scp $cfg $r:$remotedir || { echo "failed to copy $cfg to $remote:$PWD" >&2 ; exit 1 ; } +done + +for async_mode in $asynclist ; do + case "$async_mode" in + sync) async=0 ;; + async) async=1 ;; + *) echo "$async_mode: invalid setting for ASYNC" >&2 ; continue ;; + esac + export async + for sub_mode in $modelist ; do + echo "======== ASYNC $async MODE $sub_mode =========" + + + cat > run-pong.tmp < /dev/null & +echo \$! +EOF + killpongs="" + for r in "$@" ; do + scp run-pong.tmp $r:$remotedir + rpongpid=`ssh $r ". $remotedir/run-pong.tmp"` + killpongs="$killpongs ssh $r kill -9 $rpongpid &" + done + + outdir=$resultdir/$async_mode-$sub_mode + mkdir $outdir + + touch $outdir/ping.log + tail -f $outdir/ping.log & xpid=$! 
+ for size in ${sizelist:-0} ; do + echo "size $size" + bin/ddsperf -d $nwif:$bandwidth -c -D $timeout -T $topic ping size $size $sub_mode >> $outdir/ping.log + sleep 5 + done + eval $killpongs + sleep 1 + kill $xpid + wait + done +done diff --git a/examples/perfscript/latency-test-extract b/examples/perfscript/latency-test-extract new file mode 100755 index 0000000..b665e88 --- /dev/null +++ b/examples/perfscript/latency-test-extract @@ -0,0 +1,95 @@ +#!/usr/bin/perl -w + +# Note: this is specialized for async delivery, listener mode because of the way it deals with +# thread names + +use strict; + +my %res = (); +my %meas; +while (<>) { + next unless s/^\[\d+\] \d+\.\d+\s+//; + if (s/^[^\@:]+:\d+\s+size (\d+) //) { + # size is always the first line of an output block + # ddsperf doesn't print CPU loads, RSS, bandwidth if it is zero + my %tmp = %meas; + push @{$res{$meas{size}}}, \%tmp if %meas; + %meas = (size => $1, + rawxmitbw => 0, rawrecvbw => 0, + subrss => 0, pubrss => 0, + subcpu => 0, subrecv => 0, + pubcpu => 0, pubrecv => 0); + $meas{$1} = $2 while s/^(mean|min|max|\d+%)\s+(\d+\.\d+)us\s*//; + die unless /cnt \d+$/; + } elsif (s/^(\@[^:]+:\d+\s+)?rss:(\d+\.\d+)([kM])B//) { + my $side = defined $1 ? "pub" : "sub"; + $meas{"${side}rss"} = $2 / ($3 eq "k" ? 1024.0 : 1); + $meas{"${side}cpu"} = cpuload (($side eq "pub") ? 
"pub" : "dq.user", $_); + $meas{"${side}recv"} = cpuload ("recvUC", $_); + } elsif (/xmit\s+(\d+)%\s+recv\s+(\d+)%/) { + $meas{rawxmitbw} = $1; + $meas{rawrecvbw} = $2; + } +} +push @{$res{$meas{size}}}, \%meas if %meas; +die "no data found" unless keys %res > 0; + +print "#size mean min 50% 90% 99% max rawxmitbw rawrecvbw pubrss subrss pubcpu pubrecv subcpu subrecv\n"; +my @sizes = sort { $a <=> $b } keys %res; +for my $sz (@sizes) { + my $ms = $res{$sz}; + my $min = min ("min", $ms); + my $max = max ("max", $ms); + my $mean = mean ("mean", $ms); # roughly same number of roundtrips, so not too far off + my $median = max ("50%", $ms); # also not quite correct ... + my $p90 = max ("90%", $ms); + my $p99 = max ("99%", $ms); + my $rawxmitbw = median ("rawxmitbw", $ms); + my $rawrecvbw = median ("rawrecvbw", $ms); + my $pubrss = max ("pubrss", $ms); + my $subrss = max ("subrss", $ms); + my $pubcpu = median ("pubcpu", $ms); + my $pubrecv = median ("pubrecv", $ms); + my $subcpu = median ("subcpu", $ms); + my $subrecv = median ("subrecv", $ms); + print "$sz $mean $min $median $p90 $p99 $max $rawxmitbw $rawrecvbw $pubrss $subrss $pubcpu $pubrecv $subcpu $subrecv\n"; +} + +sub cpuload { + my ($thread, $line) = @_; + $thread =~ s/\./\\./g; + if ($line =~ /$thread:(\d+)%\+(\d+)%/) { + return $1+$2; + } else { + return 0; + } +} + +sub max { + my $v; + for (extract (@_)) { $v = $_ unless defined $v; $v = $_ if $_ > $v; } + return $v; +} + +sub min { + my $v; + for (extract (@_)) { $v = $_ unless defined $v; $v = $_ if $_ < $v; } + return $v; +} + +sub mean { + my $v = 0; + my @xs = extract (@_); + $v += $_ for @xs; + return $v / @xs; +} + +sub median { + my @xs = sort { $a <=> $b } (extract (@_)); + return (@xs % 2) ? 
$xs[(@xs - 1) / 2] : ($xs[@xs/2 - 1] + $xs[@xs/2]) / 2; +} + +sub extract { + my ($key, $msref) = @_; + return map { $_->{$key} } @$msref; +} diff --git a/examples/perfscript/latency-test-plot b/examples/perfscript/latency-test-plot new file mode 100755 index 0000000..d7217a2 --- /dev/null +++ b/examples/perfscript/latency-test-plot @@ -0,0 +1,46 @@ +#!/bin/bash + +`dirname $0`/latency-test-extract "$@" > data.txt +gnuplot <<\EOF +set term pngcairo size 1024,768 +set output "latency-sync-listener.png" +set st d lp +set st li 1 lw 2 +set st li 2 lw 2 +set st li 3 lw 2 +set st li 4 lw 2 +set st li 5 lw 2 + +set multiplot +set logscale xy +set title "Latency" +set ylabel "[us]" +set grid xtics ytics mytics +set xlabel "payload size [bytes]" +p "data.txt" u 1:3 ti "min", "" u 1:4 ti "median", "" u 1:5 ti "90%", "" u 1:6 ti "99%", "" u 1:7 ti "max" +unset logscale y +unset xlabel +unset ylabel +unset title +set grid nomytics +set origin .1, .43 +set size .55, .5 +clear +p [10:1000] "data.txt" u 1:3 ti "min", "" u 1:4 ti "median", "" u 1:5 ti "90%", "" u 1:6 ti "99%", "" u 1:7 ti "max" +unset multiplot + +unset origin +unset size + +unset logscale +set logscale x +set output "latency-sync-listener-bwcpu.png" +set title "Latency: network bandwidth and CPU usage" +set y2tics +set ylabel "[Mbps]" +set y2label "CPU [%]" +set xlabel "payload size [bytes]" +set key at graph 1, 0.7 +p "data.txt" u 1:(10*$8) ti "GbE transmit bandwidth (left)", "" u 1:(10*$9) ti "GbE receive bandwidth (left)", "" u 1:13 axes x1y2 ti "ping CPU (right)", "" u 1:15 axes x1y2 ti "pong CPU (right)" + +EOF diff --git a/examples/perfscript/throughput-test b/examples/perfscript/throughput-test old mode 100644 new mode 100755 index 36edaa3..d654056 --- a/examples/perfscript/throughput-test +++ b/examples/perfscript/throughput-test @@ -1,45 +1,53 @@ #!/bin/bash +export nwif=eth0 +bandwidth=1e9 +remotedir="$PWD" +provision=false +asynclist="sync async" +modelist="listener polling waitset" +sizelist="0 20 50 
100 200 500 1000 2000 5000 10000 20000 50000 100000 200000 500000 1000000" +timeout=30 +loopback=true +resultdir="throughput-result" + usage () { cat >&2 <$cfg < 17 - - - $nwif - $loopback - - - - 500kB - - ${async:-0} - 3s - - + + $nwif + $loopback + 65500B + 4000B + + + + 500kB + + \${async:-0} + 3s + + + config + EOF -if [ ! -x bin/ThroughputPublisher -o ! -x bin/ThroughputSubscriber -o ! -x $ethload ] ; then - echo "some check for existence of a file failed on the local machine" >&2 +if [ ! -x bin/ddsperf ] ; then + echo "bin/ddsperf not found on the local machine" >&2 exit 1 fi @@ -91,33 +101,35 @@ if $provision ; then for r in $pubremote "$@" ; do ssh $r mkdir -p $remotedir $remotedir/bin $remotedir/lib scp lib/libddsc.so.0 $r:$remotedir/lib - scp bin/ThroughputPublisher bin/ThroughputSubscriber $r:$remotedir/bin + scp bin/ddsperf $r:$remotedir/bin done fi +topic=KS +[ -z "$sizelist" ] && topic=OU + export CYCLONEDDS_URI=file://$PWD/$cfg for r in $pubremote "$@" ; do scp $cfg $r:$remotedir || { echo "failed to copy $cfg to $remote:$PWD" >&2 ; exit 1 ; } done -for async in $asynclist ; do +for async_mode in $asynclist ; do + case "$async_mode" in + sync) async=0 ;; + async) async=1 ;; + *) echo "$async_mode: invalid setting for ASYNC" >&2 ; continue ;; + esac export async - for mode in $modelist ; do - echo "======== ASYNC $async MODE $mode =========" + for sub_mode in $modelist ; do + echo "======== ASYNC $async MODE $sub_mode =========" cat > run-publisher.tmp < pub.log & ppid=\$! - top -b -d1 -p \$ppid >> pub-top.log & tpid=\$! - sleep $timeout - kill \$tpid - kill -2 \$ppid - wait \$ppid + bin/ddsperf -D $timeout -T $topic pub size \$size > pub.log sleep 5 done wait @@ -129,7 +141,7 @@ EOF export CYCLONEDDS_URI=file://$remotedir/$cfg export async=$async cd $remotedir -nohup bin/ThroughputSubscriber 0 $mode > /dev/null & +nohup bin/ddsperf -T $topic sub $sub_mode > /dev/null & echo \$! 
EOF for r in "$@" ; do @@ -138,22 +150,18 @@ EOF killremotesubs="$killremotesubs ssh $r kill -9 $rsubpid &" done fi - - outdir=$resultdir/data-async$async-mode$mode + + outdir=$resultdir/$async_mode-$sub_mode mkdir $outdir - rm -f sub-top.log - $ethload $nwif $bandwidth > $outdir/sub-ethload.log & lpid=$! - bin/ThroughputSubscriber 0 $mode > $outdir/sub.log & spid=$! - top -b -d1 -p $spid >> $outdir/sub-top.log & tpid=$! + bin/ddsperf -d $nwif:$bandwidth -c -T $topic sub $sub_mode > $outdir/sub.log & spid=$! tail -f $outdir/sub.log & xpid=$! ssh $pubremote ". $remotedir/run-publisher.tmp" - kill $tpid - kill -2 $spid + kill $spid eval $killremotesubs sleep 1 - kill $lpid $xpid + kill $xpid wait - scp $pubremote:$remotedir/{pub-top.log,pub.log} $outdir + scp $pubremote:$remotedir/pub.log $outdir done done diff --git a/examples/perfscript/throughput-test-extract b/examples/perfscript/throughput-test-extract index 9f16ff3..973f397 100755 --- a/examples/perfscript/throughput-test-extract +++ b/examples/perfscript/throughput-test-extract @@ -1,59 +1,76 @@ #!/usr/bin/perl -w +# Note: this is specialized for async delivery, listener mode because of the way it deals with +# thread names + use strict; -my @dirs = ("async0-mode-1", "async0-mode0", "async0-mode1", - "async1-mode-1", "async1-mode0", "async1-mode1"); - -my $dataset = 0; -my $basedir = "throughput-result"; -$basedir = $ARGV[0] if @ARGV== 1; -my $load_threshold = 20; -for my $dir (@dirs) { - my @loads = (); - - { - open LH, "< $basedir/data-$dir/sub-ethload.log" or next; # die "can't open $basedir/data-$dir/sub-ethload.log"; - my @curload = (); - while () { - next unless /^r +([0-9.]+).*\( *(\d+)/; - push @curload, $2 if $1 > $load_threshold; - if (@curload && $1 < $load_threshold) { - push @loads, median (@curload); - @curload = (); - } - } - push @loads, median (@curload) if @curload; - close LH; +my %res = (); +my %meas; +while (<>) { + next unless s/^\[\d+\] \d+\.\d+\s+//; + if (/^size (\d+) .* rate 
(\d+\.\d+)\s*kS\/s\s+(\d+\.\d+)\s*Mb\/s/) { + # size is always the first line of an output block + # ddsperf doesn't print CPU loads, RSS, bandwidth if it is zero + my %tmp = %meas; + push @{$res{$meas{size}}}, \%tmp if %meas; + %meas = (size => $1, rate => $2, cookedbw => $3, + rawxmitbw => 0, rawrecvbw => 0, + subrss => 0, pubrss => 0, + subcpu => 0, subrecv => 0, + pubcpu => 0, pubrecv => 0); + } elsif (s/^(\@[^:]+:\d+\s+)?rss:(\d+\.\d+)([kM])B//) { + my $side = defined $1 ? "pub" : "sub"; + $meas{"${side}rss"} = $2 / ($3 eq "k" ? 1024.0 : 1); + $meas{"${side}cpu"} = cpuload (($side eq "pub") ? "pub" : "dq.user", $_); + $meas{"${side}recv"} = cpuload ("recvUC", $_); + } elsif (/xmit\s+(\d+)%\s+recv\s+(\d+)%/) { + $meas{rawxmitbw} = $1; + $meas{rawrecvbw} = $2; } +} +push @{$res{$meas{size}}}, \%meas if %meas; +die "no data found" unless keys %res > 0; - open FH, "< $basedir/data-$dir/sub.log" or next; # die "can't open $basedir/data-$dir/sub.log"; - print "\n\n" if $dataset++; - print "# mode $dir\n"; - print "# payloadsize rate[samples/s] appl.bandwidth[Mb/s] raw.bandwidth[Mb/s]\n"; - my $psz; - my @rate = (); - while () { - next unless /Payload size: ([0-9]+).*Transfer rate: ([0-9.]+)/; - my $psz_cur = $1; my $rate_cur = $2; - $psz = $psz_cur unless defined $psz; - if ($psz != $psz_cur) { - my $load = shift @loads; - my $rate = median (@rate); - printf "%d %f %f %f\n", $psz, $rate, $rate * (8 + $psz) / 125e3, $load / 125e3; - @rate = (); - } - $psz = $psz_cur; - push @rate, ($rate_cur + 0.0); +print "#size rate cookedbw rawxmitbw rawrecvbw pubrss subrss pubcpu pubrecv subcpu subrecv\n"; +my @sizes = sort { $a <=> $b } keys %res; +for my $sz (@sizes) { + my $ms = $res{$sz}; + my $rate = median ("rate", $ms); + my $cookedbw = median ("cookedbw", $ms); + my $rawxmitbw = median ("rawxmitbw", $ms); + my $rawrecvbw = median ("rawrecvbw", $ms); + my $pubrss = max ("pubrss", $ms); + my $subrss = max ("subrss", $ms); + my $pubcpu = median ("pubcpu", $ms); + my $pubrecv 
= median ("pubrecv", $ms); + my $subcpu = median ("subcpu", $ms); + my $subrecv = median ("subrecv", $ms); + print "$sz $rate $cookedbw $rawxmitbw $rawrecvbw $pubrss $subrss $pubcpu $pubrecv $subcpu $subrecv\n"; +} + +sub cpuload { + my ($thread, $line) = @_; + $thread =~ s/\./\\./g; + if ($line =~ /$thread:(\d+)%\+(\d+)%/) { + return $1+$2; + } else { + return 0; } - my $load = shift @loads; - my $rate = median (@rate); - printf "%d %f %f %f\n", $psz, $rate, $rate * (8 + $psz) / 125e3, $load / 125e3; - close FH; +} + +sub max { + my $v; + for (extract (@_)) { $v = $_ unless defined $v; $v = $_ if $_ > $v; } + return $v; } sub median { - my @xs = sort { $a <=> $b } @_; + my @xs = sort { $a <=> $b } (extract (@_)); return (@xs % 2) ? $xs[(@xs - 1) / 2] : ($xs[@xs/2 - 1] + $xs[@xs/2]) / 2; } +sub extract { + my ($key, $msref) = @_; + return map { $_->{$key} } @$msref; +} diff --git a/examples/perfscript/throughput-test-plot b/examples/perfscript/throughput-test-plot index 9840b07..0e0157f 100755 --- a/examples/perfscript/throughput-test-plot +++ b/examples/perfscript/throughput-test-plot @@ -1,14 +1,55 @@ #!/bin/bash -`dirname $0`/throughput-test-extract > data.txt +`dirname $0`/throughput-test-extract "$@" > data.txt gnuplot <<\EOF -set term png size 1024,768 -set output "throughput-polling.png" -set st d l -set title "Throughput (polling with 1ms sleeps)" -set ylabel "M sample/s" -set y2label "Mbps" -set y2tics +set term pngcairo size 1024,768 +set output "throughput-async-listener-rate.png" +set st d lp +set st li 1 lw 2 +set st li 2 lw 2 +set st li 3 lw 2 + +set multiplot +set logscale xyy2 +set title "Throughput" +set ylabel "[Mbps]" +set ytics (100,200,300,400,500,600,700,800,900,1000) +set grid xtics ytics mytics set xlabel "payload size [bytes]" -p "data.txt" i 5 u 1:($2/1e6) ti "rate [M sample/s]", "" i 5 u 1:3 axes x1y2 ti "app bandwidth [Mbps]", "" i 5 u 1:4 axes x1y2 ti "GbE bandwidth [Mbps]" +# sample rate in data.txt is in kS/s +# GbE bandwidth in 
data.txt is in %, so 100% => 1000 Mbps +set key at graph 1, 0.9 +p "data.txt" u 1:3 ti "payload", "" u 1:(10*$5) ti "GbE bandwidth" +set ytics auto +set key default + +unset xlabel +unset title +set grid nomytics +set ylabel "[M sample/s]" +set origin .3, .1 +set size .6, .6 +clear +p "data.txt" u 1:($2/1e3) ti "rate" +unset multiplot + +unset origin +unset size + +unset logscale +set logscale x +set output "throughput-async-listener-memory.png" +set title "Throughput: memory" +set ylabel "RSS [MB]" +set xlabel "payload size [bytes]" +p "data.txt" u 1:6 ti "publisher", "" u 1:7 ti "subscriber" + +unset logscale +set logscale x +set output "throughput-async-listener-cpu.png" +set title "Throughput: CPU" +set ylabel "CPU [%]" +set xlabel "payload size [bytes]" +p "data.txt" u 1:8 ti "publisher (pub thread)", "" u 1:9 ti "publisher (recvUC thread)", "" u 1:10 ti "subscriber (dq.user thread)", "" u 1:11 ti "subscriber (recvUC thread)" + EOF
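
Review note: both extract scripts compute per-thread CPU load with the `cpuload` helper, which looks for `<thread>:<user>%+<sys>%` in a log line and returns the sum of the two percentages, or 0 when the thread is not mentioned. The following is a minimal bash sketch of that same parsing, intended only for checking a log line by hand. It is not part of the patch, and the sample line in the usage is made up for illustration rather than copied from real ddsperf output.

```shell
#!/bin/bash

# Sketch of the Perl cpuload helper: sum the user and system CPU percentages
# that appear as "<thread>:<user>%+<sys>%" in a log line, or print 0 if the
# thread does not appear in that form.
cpuload () {
  local thread="$1" line="$2"
  # Turn each "." into "[.]" so that a name like "dq.user" matches literally,
  # mirroring the s/\./\\./g step in the Perl helper.
  local pat="${thread//./[.]}:([0-9]+)%[+]([0-9]+)%"
  if [[ $line =~ $pat ]] ; then
    # Force base 10 so a zero-padded value is not read as octal.
    echo $(( 10#${BASH_REMATCH[1]} + 10#${BASH_REMATCH[2]} ))
  else
    echo 0
  fi
}

# Hypothetical log fragment, shaped only after the regex in the extract scripts.
sample="recvUC:20%+5% dq.user:30%+12%"
cpuload "dq.user" "$sample"   # prints 42
```

As in the Perl version, the match is not anchored, so a thread name that is a suffix of another thread's name would also match; with the thread names used by the scripts (`pub`, `dq.user`, `recvUC`) this does not appear to be a problem.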