From 5da7b8eac856b61e47214b29491f1d547acac3f1 Mon Sep 17 00:00:00 2001 From: cathugger Date: Sun, 7 Jan 2018 22:20:38 +0000 Subject: [PATCH] optimisation tips --- OPTIMISATION.txt | 128 +++++++++++++++++++++++++++++++++++++++++++++++ README.txt | 1 + 2 files changed, 129 insertions(+) create mode 100644 OPTIMISATION.txt diff --git a/OPTIMISATION.txt b/OPTIMISATION.txt new file mode 100644 index 0000000..52516af --- /dev/null +++ b/OPTIMISATION.txt @@ -0,0 +1,128 @@ +This document describes configuration options which may help one to generate onions faster. +First of all, default configuration options are tuned for portability, not performance. +User is expected to pick optimal settings depending on hardware mkp224o will run on and ammount of filters. + + +ED25519 implementations: +mkp224o includes multiple implementations of ed25519 code, tuned for different processors. +Default is ref10 implementation from SUPERCOP, which is suboptimal in many cases. +Implementation is selected at configuration time, when running `./configure` script. +If one already configured/compiled code and wants to change options, just re-run +`./configure` and also run `make clean` to clear compiled files, if any. +At the time of writing, these implementations are present: ++----------------+-----------------------+-------------------------------------------------+ +| implementation | enable flag | notes | +|----------------+-----------------------+-------------------------------------------------+ +| ref10 | --enable-ref10 | SUPERCOP' ref10, pure C, very portable, default | +| amd64-51-30k | --enable-amd64-51-30k | SUPERCOP' amd64-51-30k, amd64 assembler, | +| | | only works in x86_64 architecture | +| amd64-64-24k | --enable-amd64-64-24k | SUPERCOP' amd64-64-24k, amd64 assembler, | +| | | only works in x86_64 architecture | +| ed25519-donna | --enable-donna | portable, based on amd64-51-30k, but C, not asm | +| ed25519-donna | --enable-donna-sse2 | uses SSE2, needs x86 architecture | ++----------------+-----------------------+-------------------------------------------------+ +When to use what: + - on 32-bit x86 architecture "--enable-donna" will probably be fastest, but one should try + using "--enable-donna-sse2" too + - on 64-bit x86 architecture, it really depends on your processor; "--enable-amd64-51-30k" + worked best for me, but you should really benchmark on your own machine + - on ARM "--enable-donna" will probably work best + - otherwise you should benchmark, but "--enable-donna" will probably win + +Please note, that these recomendations may become out of date if more implementations +are added in the future; use `./configure --help` to obtain all avaiable options. +When in doubth, benchmark. + + +Onion filtering settings: +mkp224o supports multiple algorithms and data types for filtering. +Depending on your use case, picking right settings may increase performance. +At the time of writing, mkp224o supports 2 algorithms for filter searching: +sequential and binary search. Sequential search is default, and will probably +be faster with small ammount of filters. If you have lots of filters (lets say >100), +then picking binary search algorithm is the right way. +mkp224o also supports multiple filter types: filters can be represented as integers +instead of being binary strings, and that can allow better compiler's optimizations +and faster code (dealing with fixed-size integers instead of variable-length strings is simpler). +On the other hand, fixed size integers limit length of filters, therefore +binary strings are used by default. + +Current options, at the time of writing: + --enable-binsearch enable binary search algoritm; MUCH faster if there + are a lot of filters. by default, if this isn't enabled, + sequential search is used + + --enable-intfilter[=(32|64|128|native)] + use integers of specific size (in bits) [default=64] + for filtering. faster but limits filter length to: + 6 for 32-bit, 12 for 64-bit, 24 for 128-bit. by default, + if this option is not enabled, binary strings are used, + which are slower, but not limited in length. + + --enable-binfilterlen=VAL + set binary string filter length (if you don't use intfilter). + default is 32 (bytes), which is maximum key length. + this may be useful for decreasing memory usage if you + have a lot of short filters, but then using intfilter + may be better idea. + + --enable-besort force intfilter binsearch case to use big endian + sorting and not omit masks from filters; useful if + your filters aren't of same length. + let me elaborate on this one. + by default, when binary search algorithm is used with integer + filters, we actually omit filter masks and use global mask variable, + because otherwise we couldn't reliably use integer comparision operations + combined with per-filter masks, as sorting order there is unclear. + this is because majority of processors we work with are little-endian. + therefore, to achieve proper filtering in case where filters + aren't of same length, we flatten them by inserting more filters. + binary searching should balance increased overhead here to some extent, + but this is definitelly not optimal and can bloat filtering table + very heavily in some cases (for example if there exists say 1-char filter + and 8-char filter, it will try to flatten 1-char filterto 8 chars + and add 32*32*32*32*32*32*32 filters to table which isn't really good). + this option makes us use big-endian way of integer comparision, which isn't + native for current little-endian processors but should still work much better + than binary strings. we also then are able to have proper per-filter masks, + and don't do stupid flattening tricks which may backfire. + + TL;DR: its quite good idea to use this if you do "--enable-binsearch --enable-intfilter" + and have some random filters which may have different length. + + +Benchmarking: +It's always good idea to see if your settings give you desired effect. +There currently isn't any automated way to benchmark different configuration options, but it's pretty simple to do by hand. +For example: +# prepare configuration script +./autogen.sh +# try default configuration +./configure +# compile +make +# benchmark implementation speed +./mkp224o -s -d res1 neko +# wait for a while, copy statistics to some text editor +^C # stop experiment when you've collected enough data +# try with different settings now +./configure --enable-amd64-64-24k --enable-intfilter +# clean old compiled files +make clean +# recompile +make +# benchmark again +./mkp224o -s -d res2 neko +# wait for a while, copy statistics to some text editor +^C # stop experiment when you've collected enough data +# configure again, make clean, make, run test again....... +# until you've got enough data to make decisions + +when benchmarking filtering settings, remember to actually use filter files you're going to work with. + + +What options I use: +For my lappy with old-ish i5 I do `./configure --enable-amd64-51-30k --enable-intfilter` incase I want single onion, +and `./configure --enable-amd64-51-30k --enable-intfilter --enable-binsearch --enable-besort` when playing with dictionaries. +For my raspberry pi 2, `./configure --enable-donna --enable-intfilter` +(and also +=" --enable-binsearch --enable-besort" for dictionaries). diff --git a/README.txt b/README.txt index 20d6303..25e1015 100644 --- a/README.txt +++ b/README.txt @@ -23,6 +23,7 @@ directory, but that can be overridden with -d switch. Use -s switch to enable printing of statistics, which may be useful when benchmarking different ed25519 implementations on your machine. Use -h switch to obtain all avaiable options. +I highly recommend reading OPTIMISATION.txt for performance-related tips. CONTACT: For bug reports/questions/whatever else, email cathugger at cock dot li.