diff --git a/README.md b/README.md index d8ac3e3..6f30c41 100644 --- a/README.md +++ b/README.md @@ -1,9 +1,12 @@ # Eclipse Cyclone DDS -Eclipse Cyclone DDS is by far the most performant and robust DDS implementation available on the -market. Moreover, Cyclone DDS is developed completely in the open as an Eclipse IoT project +Eclipse Cyclone DDS is a very performant and robust open-source DDS implementation. Cyclone DDS is developed completely in the open as an Eclipse IoT project (see [eclipse-cyclone-dds](https://projects.eclipse.org/projects/iot.cyclonedds)). +* [Getting Started](#getting-started) +* [Performance](#performance) +* [Configuration](#configuration) + # Getting Started ## Building Eclipse Cyclone DDS @@ -106,7 +109,76 @@ also need to add switches to select the architecture and build type, e.g., ``con arch=x86_64 -s build_type=Debug ..`` This will automatically download and/or build CUnit (and, at the moment, OpenSSL). -## Configuration +## Documentation + +The documentation is still rather limited, and at the moment only available in the sources (in the +form of restructured text files in ``docs`` and Doxygen comments in the header files), or as +a +[PDF](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/pdf/CycloneDDS-0.1.0.pdf). The +intent is to automate the process of building the documentation and have them available in more +convenient formats and in the usual locations. + +## Building and Running the Roundtrip Example + +We will show you how to build and run an example program that measures latency. The examples are +built automatically when you build Cyclone DDS, so you don't need to follow these steps to be able +to run the program, it is merely to illustrate the process. + + $ cd cyclonedds/examples/roundtrip + $ mkdir build + $ cd build + $ cmake .. + $ make + +On one terminal start the application that will be responding to pings: + + $ ./RoundtripPong + +On another terminal, start the application that will be sending the pings: + + $ ./RoundtripPing 0 0 0 + # payloadSize: 0 | numSamples: 0 | timeOut: 0 + # Waiting for startup jitter to stabilise + # Warm up complete. + # Latency measurements (in us) + # Latency [us] Write-access time [us] Read-access time [us] + # Seconds Count median min 99% max Count median min Count median min + 1 28065 17 16 23 87 28065 8 6 28065 1 0 + 2 28115 17 16 23 46 28115 8 6 28115 1 0 + 3 28381 17 16 22 46 28381 8 6 28381 1 0 + 4 27928 17 16 24 127 27928 8 6 27928 1 0 + 5 28427 17 16 20 47 28427 8 6 28427 1 0 + 6 27685 17 16 26 51 27685 8 6 27685 1 0 + 7 28391 17 16 23 47 28391 8 6 28391 1 0 + 8 27938 17 16 24 63 27938 8 6 27938 1 0 + 9 28242 17 16 24 132 28242 8 6 28242 1 0 + 10 28075 17 16 23 46 28075 8 6 28075 1 0 + +The numbers above were measured on Mac running a 4.2 GHz Intel Core i7 on December 12th 2018. From +these numbers you can see how the roundtrip is very stable and the minimal latency is now down to 17 +micro-seconds (used to be 25 micro-seconds) on this HW. + +# Performance + +Reliable message throughput is over 1MS/s for very small samples and is roughly 90% of GbE with 100 +byte samples, and latency is about 30us when measured using [ddsperf](src/tools/ddsperf) between two +Intel(R) Xeon(R) CPU E3-1270 V2 @ 3.50GHz (that's 2012 hardware ...) running Ubuntu 16.04, with the +executables built on Ubuntu 18.04 using gcc 7.4.0 for a default (i.e., "RelWithDebInfo") build. + +ThroughputThroughput + +This is with the subscriber in listener mode, using asynchronous delivery for the throughput +test. The configuration is a marginally tweaked out-of-the-box configuration: an increased maximum +message size and fragment size, and an increased high-water mark for the reliability window on the +writer side. For details, see the [scripts](examples/perfscript) directory, +the +[environment details](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/config.txt) and +the +[throughput](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/sub.log) and +[latency](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/ping.log) data +underlying the graphs. These also include CPU usage ([thoughput](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/throughput-async-listener-cpu.png) and [latency](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/latency-sync-listener-bwcpu.png)) and [memory usage](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/20190730/throughput-async-listener-memory.png). + +# Configuration The out-of-the-box configuration should usually be fine, but there are a great many options that can be tweaked by creating an XML file with the desired settings and defining the ``CYCLONEDDS_URI`` to @@ -161,73 +233,6 @@ The configurator tool ``cycloneddsconf`` can help in discovering the settings, a dump. Background information on configuring Cyclone DDS can be found [here](https://docs/manual/config.rst). -## Documentation - -The documentation is still rather limited, and at the moment only available in the sources (in the -form of restructured text files in ``docs`` and Doxygen comments in the header files), or as -a -[PDF](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/pdf/CycloneDDS-0.1.0.pdf). The -intent is to automate the process of building the documentation and have them available in more -convenient formats and in the usual locations. - -## Performance - -Median small message throughput measured using the Throughput example between two Intel(R) Xeon(R) -CPU E3-1270 V2 @ 3.50GHz (that's 2012 hardware ...) running Linux 3.8.13-rt14.20.el6rt.x86_64, -connected via a quiet GbE and when using gcc-6.2.0 for a default (i.e., "RelWithDebInfo") build is: - -Throughput - -This is with the subscriber in polling mode. Listener mode is marginally slower; using a waitset the -message rate for minimal size messages drops to 600k sample/s in synchronous delivery mode and about -750k samples/s in asynchronous delivery mode. The configuration is an out-of-the-box configuration, -tweaked only to increase the high-water mark for the reliability window on the writer side. For -details, see the scripts in the ``performance`` directory and -the -[data](https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/assets/performance/throughput.txt). - -There is some data on roundtrip latency below. - -## Building and Running the Roundtrip Example - -We will show you how to build and run an example program that measures latency. The examples are -built automatically when you build Cyclone DDS, so you don't need to follow these steps to be able -to run the program, it is merely to illustrate the process. - - $ cd cyclonedds/examples/roundtrip - $ mkdir build - $ cd build - $ cmake .. - $ make - -On one terminal start the application that will be responding to pings: - - $ ./RoundtripPong - -On another terminal, start the application that will be sending the pings: - - $ ./RoundtripPing 0 0 0 - # payloadSize: 0 | numSamples: 0 | timeOut: 0 - # Waiting for startup jitter to stabilise - # Warm up complete. - # Round trip measurements (in us) - # Round trip time [us] Write-access time [us] Read-access time [us] - # Seconds Count median min 99% max Count median min Count median min - 1 28065 17 16 23 87 28065 8 6 28065 1 0 - 2 28115 17 16 23 46 28115 8 6 28115 1 0 - 3 28381 17 16 22 46 28381 8 6 28381 1 0 - 4 27928 17 16 24 127 27928 8 6 27928 1 0 - 5 28427 17 16 20 47 28427 8 6 28427 1 0 - 6 27685 17 16 26 51 27685 8 6 27685 1 0 - 7 28391 17 16 23 47 28391 8 6 28391 1 0 - 8 27938 17 16 24 63 27938 8 6 27938 1 0 - 9 28242 17 16 24 132 28242 8 6 28242 1 0 - 10 28075 17 16 23 46 28075 8 6 28075 1 0 - -The numbers above were measured on Mac running a 4.2 GHz Intel Core i7 on December 12th 2018. From -these numbers you can see how the roundtrip is very stable and the minimal latency is now down to 17 -micro-seconds (used to be 25 micro-seconds) on this HW. - # Trademarks * "Eclipse Cyclone DDS" and "Cyclone DDS" are trademarks of the Eclipse Foundation. diff --git a/examples/perfscript/latency-test b/examples/perfscript/latency-test new file mode 100755 index 0000000..ead88ef --- /dev/null +++ b/examples/perfscript/latency-test @@ -0,0 +1,153 @@ +#!/bin/bash + +export nwif=eth0 +bandwidth=1e9 +remotedir="$PWD" +provision=false +asynclist="sync async" +modelist="listener waitset" +sizelist="0 20 50 100 200 500 1000 2000 5000 10000 20000 50000 100000 200000 500000 1000000" +timeout=30 +loopback=true +resultdir="latency-result" + +usage () { + cat >&2 <$cfg < + + 17 + + + $nwif + $loopback + 65500B + 4000B + + + + 500kB + + \${async:-0} + 3s + + + config + + +EOF + +if [ ! -x bin/ddsperf ] ; then + echo "bin/ddsperf not found on the local machine" >&2 + exit 1 +fi + +[ -d $resultdir ] || { echo "output directory $resultdir doesn't exist" >&2 ; exit 1 ; } + +if $provision ; then + echo "provisioning ..." + for r in $pubremote "$@" ; do + ssh $r mkdir -p $remotedir $remotedir/bin $remotedir/lib + scp lib/libddsc.so.0 $r:$remotedir/lib + scp bin/ddsperf $r:$remotedir/bin + done +fi + +topic=KS +[ -z "$sizelist" ] && topic=OU + +export CYCLONEDDS_URI=file://$PWD/$cfg +for r in "$@" ; do + scp $cfg $r:$remotedir || { echo "failed to copy $cfg to $remote:$PWD" >&2 ; exit 1 ; } +done + +for async_mode in $asynclist ; do + case "$async_mode" in + sync) async=0 ;; + async) async=1 ;; + *) echo "$async_mode: invalid setting for ASYNC" >&2 ; continue ;; + esac + export async + for sub_mode in $modelist ; do + echo "======== ASYNC $async MODE $sub_mode =========" + + + cat > run-pong.tmp < /dev/null & +echo \$! +EOF + killpongs="" + for r in "$@" ; do + scp run-pong.tmp $r:$remotedir + rpongpid=`ssh $r ". $remotedir/run-pong.tmp"` + killpongs="$killpongs ssh $r kill -9 $rpongpid &" + done + + outdir=$resultdir/$async_mode-$sub_mode + mkdir $outdir + + touch $outdir/ping.log + tail -f $outdir/ping.log & xpid=$! + for size in ${sizelist:-0} ; do + echo "size $size" + bin/ddsperf -d $nwif:$bandwidth -c -D $timeout -T $topic ping size $size $sub_mode >> $outdir/ping.log + sleep 5 + done + eval $killpongs + sleep 1 + kill $xpid + wait + done +done diff --git a/examples/perfscript/latency-test-extract b/examples/perfscript/latency-test-extract new file mode 100755 index 0000000..b665e88 --- /dev/null +++ b/examples/perfscript/latency-test-extract @@ -0,0 +1,95 @@ +#!/usr/bin/perl -w + +# Note: this is specialized for async delivery, listener mode because of the way it deals with +# thread names + +use strict; + +my %res = (); +my %meas; +while (<>) { + next unless s/^\[\d+\] \d+\.\d+\s+//; + if (s/^[^\@:]+:\d+\s+size (\d+) //) { + # size is always the first line of an output block + # ddsperf doesn't print CPU loads, RSS, bandwidth if it is zero + my %tmp = %meas; + push @{$res{$meas{size}}}, \%tmp if %meas; + %meas = (size => $1, + rawxmitbw => 0, rawrecvbw => 0, + subrss => 0, pubrss => 0, + subcpu => 0, subrecv => 0, + pubcpu => 0, pubrecv => 0); + $meas{$1} = $2 while s/^(mean|min|max|\d+%)\s+(\d+\.\d+)us\s*//; + die unless /cnt \d+$/; + } elsif (s/^(\@[^:]+:\d+\s+)?rss:(\d+\.\d+)([kM])B//) { + my $side = defined $1 ? "pub" : "sub"; + $meas{"${side}rss"} = $2 / ($3 eq "k" ? 1024.0 : 1); + $meas{"${side}cpu"} = cpuload (($side eq "pub") ? "pub" : "dq.user", $_); + $meas{"${side}recv"} = cpuload ("recvUC", $_); + } elsif (/xmit\s+(\d+)%\s+recv\s+(\d+)%/) { + $meas{rawxmitbw} = $1; + $meas{rawrecvbw} = $2; + } +} +push @{$res{$meas{size}}}, \%meas if %meas; +die "no data found" unless keys %res > 0; + +print "#size mean min 50% 90% 99% max rawxmitbw rawrecvbw pubrss subrss pubcpu pubrecv subcpu subrecv\n"; +my @sizes = sort { $a <=> $b } keys %res; +for my $sz (@sizes) { + my $ms = $res{$sz}; + my $min = min ("min", $ms); + my $max = max ("max", $ms); + my $mean = mean ("mean", $ms); # roughly same number of roundtrips, so not too far off + my $median = max ("50%", $ms); # also not quite correct ... + my $p90 = max ("90%", $ms); + my $p99 = max ("99%", $ms); + my $rawxmitbw = median ("rawxmitbw", $ms); + my $rawrecvbw = median ("rawrecvbw", $ms); + my $pubrss = max ("pubrss", $ms); + my $subrss = max ("subrss", $ms); + my $pubcpu = median ("pubcpu", $ms); + my $pubrecv = median ("pubrecv", $ms); + my $subcpu = median ("subcpu", $ms); + my $subrecv = median ("subrecv", $ms); + print "$sz $mean $min $median $p90 $p99 $max $rawxmitbw $rawrecvbw $pubrss $subrss $pubcpu $pubrecv $subcpu $subrecv\n"; +} + +sub cpuload { + my ($thread, $line) = @_; + $thread =~ s/\./\\./g; + if ($line =~ /$thread:(\d+)%\+(\d+)%/) { + return $1+$2; + } else { + return 0; + } +} + +sub max { + my $v; + for (extract (@_)) { $v = $_ unless defined $v; $v = $_ if $_ > $v; } + return $v; +} + +sub min { + my $v; + for (extract (@_)) { $v = $_ unless defined $v; $v = $_ if $_ < $v; } + return $v; +} + +sub mean { + my $v = 0; + my @xs = extract (@_); + $v += $_ for @xs; + return $v / @xs; +} + +sub median { + my @xs = sort { $a <=> $b } (extract (@_)); + return (@xs % 2) ? $xs[(@xs - 1) / 2] : ($xs[@xs/2 - 1] + $xs[@xs/2]) / 2; +} + +sub extract { + my ($key, $msref) = @_; + return map { $_->{$key} } @$msref; +} diff --git a/examples/perfscript/latency-test-plot b/examples/perfscript/latency-test-plot new file mode 100755 index 0000000..d7217a2 --- /dev/null +++ b/examples/perfscript/latency-test-plot @@ -0,0 +1,46 @@ +#!/bin/bash + +`dirname $0`/latency-test-extract "$@" > data.txt +gnuplot <<\EOF +set term pngcairo size 1024,768 +set output "latency-sync-listener.png" +set st d lp +set st li 1 lw 2 +set st li 2 lw 2 +set st li 3 lw 2 +set st li 4 lw 2 +set st li 5 lw 2 + +set multiplot +set logscale xy +set title "Latency" +set ylabel "[us]" +set grid xtics ytics mytics +set xlabel "payload size [bytes]" +p "data.txt" u 1:3 ti "min", "" u 1:4 ti "median", "" u 1:5 ti "90%", "" u 1:6 ti "99%", "" u 1:7 ti "max" +unset logscale y +unset xlabel +unset ylabel +unset title +set grid nomytics +set origin .1, .43 +set size .55, .5 +clear +p [10:1000] "data.txt" u 1:3 ti "min", "" u 1:4 ti "median", "" u 1:5 ti "90%", "" u 1:6 ti "99%", "" u 1:7 ti "max" +unset multiplot + +unset origin +unset size + +unset logscale +set logscale x +set output "latency-sync-listener-bwcpu.png" +set title "Latency: network bandwidth and CPU usage" +set y2tics +set ylabel "[Mbps]" +set y2label "CPU [%]" +set xlabel "payload size [bytes]" +set key at graph 1, 0.7 +p "data.txt" u 1:(10*$8) ti "GbE transmit bandwidth (left)", "" u 1:(10*$9) ti "GbE receive bandwidth (left)", "" u 1:13 axes x1y2 ti "ping CPU (right)", "" u 1:15 axes x1y2 ti "pong CPU (right)" + +EOF diff --git a/examples/perfscript/throughput-test b/examples/perfscript/throughput-test old mode 100644 new mode 100755 index 36edaa3..d654056 --- a/examples/perfscript/throughput-test +++ b/examples/perfscript/throughput-test @@ -1,45 +1,53 @@ #!/bin/bash +export nwif=eth0 +bandwidth=1e9 +remotedir="$PWD" +provision=false +asynclist="sync async" +modelist="listener polling waitset" +sizelist="0 20 50 100 200 500 1000 2000 5000 10000 20000 50000 100000 200000 500000 1000000" +timeout=30 +loopback=true +resultdir="throughput-result" + usage () { cat >&2 <$cfg < 17 - - - $nwif - $loopback - - - - 500kB - - ${async:-0} - 3s - - + + $nwif + $loopback + 65500B + 4000B + + + + 500kB + + \${async:-0} + 3s + + + config + EOF -if [ ! -x bin/ThroughputPublisher -o ! -x bin/ThroughputSubscriber -o ! -x $ethload ] ; then - echo "some check for existence of a file failed on the local machine" >&2 +if [ ! -x bin/ddsperf ] ; then + echo "bin/ddsperf not found on the local machine" >&2 exit 1 fi @@ -91,33 +101,35 @@ if $provision ; then for r in $pubremote "$@" ; do ssh $r mkdir -p $remotedir $remotedir/bin $remotedir/lib scp lib/libddsc.so.0 $r:$remotedir/lib - scp bin/ThroughputPublisher bin/ThroughputSubscriber $r:$remotedir/bin + scp bin/ddsperf $r:$remotedir/bin done fi +topic=KS +[ -z "$sizelist" ] && topic=OU + export CYCLONEDDS_URI=file://$PWD/$cfg for r in $pubremote "$@" ; do scp $cfg $r:$remotedir || { echo "failed to copy $cfg to $remote:$PWD" >&2 ; exit 1 ; } done -for async in $asynclist ; do +for async_mode in $asynclist ; do + case "$async_mode" in + sync) async=0 ;; + async) async=1 ;; + *) echo "$async_mode: invalid setting for ASYNC" >&2 ; continue ;; + esac export async - for mode in $modelist ; do - echo "======== ASYNC $async MODE $mode =========" + for sub_mode in $modelist ; do + echo "======== ASYNC $async MODE $sub_mode =========" cat > run-publisher.tmp < pub.log & ppid=\$! - top -b -d1 -p \$ppid >> pub-top.log & tpid=\$! - sleep $timeout - kill \$tpid - kill -2 \$ppid - wait \$ppid + bin/ddsperf -D $timeout -T $topic pub size \$size > pub.log sleep 5 done wait @@ -129,7 +141,7 @@ EOF export CYCLONEDDS_URI=file://$remotedir/$cfg export async=$async cd $remotedir -nohup bin/ThroughputSubscriber 0 $mode > /dev/null & +nohup bin/ddsperf -T $topic sub $sub_mode > /dev/null & echo \$! EOF for r in "$@" ; do @@ -138,22 +150,18 @@ EOF killremotesubs="$killremotesubs ssh $r kill -9 $rsubpid &" done fi - - outdir=$resultdir/data-async$async-mode$mode + + outdir=$resultdir/$async_mode-$sub_mode mkdir $outdir - rm -f sub-top.log - $ethload $nwif $bandwidth > $outdir/sub-ethload.log & lpid=$! - bin/ThroughputSubscriber 0 $mode > $outdir/sub.log & spid=$! - top -b -d1 -p $spid >> $outdir/sub-top.log & tpid=$! + bin/ddsperf -d $nwif:$bandwidth -c -T $topic sub $sub_mode > $outdir/sub.log & spid=$! tail -f $outdir/sub.log & xpid=$! ssh $pubremote ". $remotedir/run-publisher.tmp" - kill $tpid - kill -2 $spid + kill $spid eval $killremotesubs sleep 1 - kill $lpid $xpid + kill $xpid wait - scp $pubremote:$remotedir/{pub-top.log,pub.log} $outdir + scp $pubremote:$remotedir/pub.log $outdir done done diff --git a/examples/perfscript/throughput-test-extract b/examples/perfscript/throughput-test-extract index 9f16ff3..973f397 100755 --- a/examples/perfscript/throughput-test-extract +++ b/examples/perfscript/throughput-test-extract @@ -1,59 +1,76 @@ #!/usr/bin/perl -w +# Note: this is specialized for async delivery, listener mode because of the way it deals with +# thread names + use strict; -my @dirs = ("async0-mode-1", "async0-mode0", "async0-mode1", - "async1-mode-1", "async1-mode0", "async1-mode1"); - -my $dataset = 0; -my $basedir = "throughput-result"; -$basedir = $ARGV[0] if @ARGV== 1; -my $load_threshold = 20; -for my $dir (@dirs) { - my @loads = (); - - { - open LH, "< $basedir/data-$dir/sub-ethload.log" or next; # die "can't open $basedir/data-$dir/sub-ethload.log"; - my @curload = (); - while () { - next unless /^r +([0-9.]+).*\( *(\d+)/; - push @curload, $2 if $1 > $load_threshold; - if (@curload && $1 < $load_threshold) { - push @loads, median (@curload); - @curload = (); - } - } - push @loads, median (@curload) if @curload; - close LH; +my %res = (); +my %meas; +while (<>) { + next unless s/^\[\d+\] \d+\.\d+\s+//; + if (/^size (\d+) .* rate (\d+\.\d+)\s*kS\/s\s+(\d+\.\d+)\s*Mb\/s/) { + # size is always the first line of an output block + # ddsperf doesn't print CPU loads, RSS, bandwidth if it is zero + my %tmp = %meas; + push @{$res{$meas{size}}}, \%tmp if %meas; + %meas = (size => $1, rate => $2, cookedbw => $3, + rawxmitbw => 0, rawrecvbw => 0, + subrss => 0, pubrss => 0, + subcpu => 0, subrecv => 0, + pubcpu => 0, pubrecv => 0); + } elsif (s/^(\@[^:]+:\d+\s+)?rss:(\d+\.\d+)([kM])B//) { + my $side = defined $1 ? "pub" : "sub"; + $meas{"${side}rss"} = $2 / ($3 eq "k" ? 1024.0 : 1); + $meas{"${side}cpu"} = cpuload (($side eq "pub") ? "pub" : "dq.user", $_); + $meas{"${side}recv"} = cpuload ("recvUC", $_); + } elsif (/xmit\s+(\d+)%\s+recv\s+(\d+)%/) { + $meas{rawxmitbw} = $1; + $meas{rawrecvbw} = $2; } +} +push @{$res{$meas{size}}}, \%meas if %meas; +die "no data found" unless keys %res > 0; - open FH, "< $basedir/data-$dir/sub.log" or next; # die "can't open $basedir/data-$dir/sub.log"; - print "\n\n" if $dataset++; - print "# mode $dir\n"; - print "# payloadsize rate[samples/s] appl.bandwidth[Mb/s] raw.bandwidth[Mb/s]\n"; - my $psz; - my @rate = (); - while () { - next unless /Payload size: ([0-9]+).*Transfer rate: ([0-9.]+)/; - my $psz_cur = $1; my $rate_cur = $2; - $psz = $psz_cur unless defined $psz; - if ($psz != $psz_cur) { - my $load = shift @loads; - my $rate = median (@rate); - printf "%d %f %f %f\n", $psz, $rate, $rate * (8 + $psz) / 125e3, $load / 125e3; - @rate = (); - } - $psz = $psz_cur; - push @rate, ($rate_cur + 0.0); +print "#size rate cookedbw rawxmitbw rawrecvbw pubrss subrss pubcpu pubrecv subcpu subrecv\n"; +my @sizes = sort { $a <=> $b } keys %res; +for my $sz (@sizes) { + my $ms = $res{$sz}; + my $rate = median ("rate", $ms); + my $cookedbw = median ("cookedbw", $ms); + my $rawxmitbw = median ("rawxmitbw", $ms); + my $rawrecvbw = median ("rawrecvbw", $ms); + my $pubrss = max ("pubrss", $ms); + my $subrss = max ("subrss", $ms); + my $pubcpu = median ("pubcpu", $ms); + my $pubrecv = median ("pubrecv", $ms); + my $subcpu = median ("subcpu", $ms); + my $subrecv = median ("subrecv", $ms); + print "$sz $rate $cookedbw $rawxmitbw $rawrecvbw $pubrss $subrss $pubcpu $pubrecv $subcpu $subrecv\n"; +} + +sub cpuload { + my ($thread, $line) = @_; + $thread =~ s/\./\\./g; + if ($line =~ /$thread:(\d+)%\+(\d+)%/) { + return $1+$2; + } else { + return 0; } - my $load = shift @loads; - my $rate = median (@rate); - printf "%d %f %f %f\n", $psz, $rate, $rate * (8 + $psz) / 125e3, $load / 125e3; - close FH; +} + +sub max { + my $v; + for (extract (@_)) { $v = $_ unless defined $v; $v = $_ if $_ > $v; } + return $v; } sub median { - my @xs = sort { $a <=> $b } @_; + my @xs = sort { $a <=> $b } (extract (@_)); return (@xs % 2) ? $xs[(@xs - 1) / 2] : ($xs[@xs/2 - 1] + $xs[@xs/2]) / 2; } +sub extract { + my ($key, $msref) = @_; + return map { $_->{$key} } @$msref; +} diff --git a/examples/perfscript/throughput-test-plot b/examples/perfscript/throughput-test-plot index 9840b07..0e0157f 100755 --- a/examples/perfscript/throughput-test-plot +++ b/examples/perfscript/throughput-test-plot @@ -1,14 +1,55 @@ #!/bin/bash -`dirname $0`/throughput-test-extract > data.txt +`dirname $0`/throughput-test-extract "$@" > data.txt gnuplot <<\EOF -set term png size 1024,768 -set output "throughput-polling.png" -set st d l -set title "Throughput (polling with 1ms sleeps)" -set ylabel "M sample/s" -set y2label "Mbps" -set y2tics +set term pngcairo size 1024,768 +set output "throughput-async-listener-rate.png" +set st d lp +set st li 1 lw 2 +set st li 2 lw 2 +set st li 3 lw 2 + +set multiplot +set logscale xyy2 +set title "Throughput" +set ylabel "[Mbps]" +set ytics (100,200,300,400,500,600,700,800,900,1000) +set grid xtics ytics mytics set xlabel "payload size [bytes]" -p "data.txt" i 5 u 1:($2/1e6) ti "rate [M sample/s]", "" i 5 u 1:3 axes x1y2 ti "app bandwidth [Mbps]", "" i 5 u 1:4 axes x1y2 ti "GbE bandwidth [Mbps]" +# sample rate in data.txt is in kS/s +# GbE bandwidth in data.txt is in %, so 100% => 1000 Mbps +set key at graph 1, 0.9 +p "data.txt" u 1:3 ti "payload", "" u 1:(10*$5) ti "GbE bandwidth" +set ytics auto +set key default + +unset xlabel +unset title +set grid nomytics +set ylabel "[M sample/s]" +set origin .3, .1 +set size .6, .6 +clear +p "data.txt" u 1:($2/1e3) ti "rate" +unset multiplot + +unset origin +unset size + +unset logscale +set logscale x +set output "throughput-async-listener-memory.png" +set title "Throughput: memory" +set ylabel "RSS [MB]" +set xlabel "payload size [bytes]" +p "data.txt" u 1:6 ti "publisher", "" u 1:7 ti "subscriber" + +unset logscale +set logscale x +set output "throughput-async-listener-cpu.png" +set title "Throughput: CPU" +set ylabel "CPU [%]" +set xlabel "payload size [bytes]" +p "data.txt" u 1:8 ti "publisher (pub thread)", "" u 1:9 ti "publisher (recvUC thread)", "" u 1:10 ti "subscriber (dq.user thread)", "" u 1:11 ti "subscriber (recvUC thread)" + EOF