mentalisttraceur / esceval
License: BSD Zero Clause License
Just jotting this down so that there's a record of it somewhere.
Years ago (pretty sure it was before COVID-19), I did some performance testing of different shell implementations of esceval.
This was the code I used (the way I grabbed the timing information is really bad practice if you need high precision, but with large-enough test inputs like in this case, the actual implementation performance dominated, so it was fine). This was rather manual - I was basically just sourcing these definitions and then calling the functions.
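# Read the file named by $1 into d1, then keep doubling:
# dN ends up holding N space-separated copies of the file's contents.
setup_test_data()
{
    d1=`cat "$1"`
    d2="$d1 $d1"
    d4="$d2 $d2"
    d8="$d4 $d4"
    d16="$d8 $d8"
    d32="$d16 $d16"
    d64="$d32 $d32"
    d128="$d64 $d64"
    d256="$d128 $d128"
    d512="$d256 $d256"
    d1024="$d512 $d512"
    d2048="$d1024 $d1024"
}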
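# Run "$command" with the remaining arguments twice: once writing to $file,
# once captured by command substitution. Prints the exit status and the
# elapsed whole seconds for each run.
timed_test_run()
{
    command=$1
    file=$2
    shift 2
    printf '%s :: ' "$command"
    start_time=`date +%s`
    "$command" "$@" >"$file"
    # Save the status right away; after the next line $? would be date's.
    exit_status=$?
    end_time=`date +%s`
    printf '%s :: ' "$exit_status $((end_time - start_time))"
    start_time=`date +%s`
    captured_output=`"$command" "$@"`
    exit_status=$?
    end_time=`date +%s`
    printf '%s\n' "$exit_status $((end_time - start_time))"
}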
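# sed-based: prefix each argument with a quote, then let one sed call per
# argument escape the embedded quotes and append the closing quote.
esceval0()
{
    case $# in 0) return 0; esac
    (
        b='\\'
        while :
        do
            escaped=`
                printf '%s\n' "'$1" \
                | sed "
                    s/'/'$b''/g
                    1 s/^'$b''/'/
                    $ s/$/'/
                "
            `
            shift
            case $# in 0) break; esac
            printf '%s ' "$escaped"
        done
        printf '%s\n' "$escaped"
    )
}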
# Filter: wrap stdin in single quotes, escaping each embedded quote as '\''.
escevalp()
{
    sed "
        s/'/'\\\\''/g
        1 s/^/'/
        $ s/$/'/
    "
}
# Like esceval0, but pipes each argument through escevalp.
esceval1()
{
    case $# in 0) return 0; esac
    (
        set -e
        while :
        do
            escaped=`printf '%s\n' "$1" | escevalp`
            shift
            case $# in 0) break; esac
            printf '%s ' "$escaped"
        done
        printf '%s\n' "$escaped"
    )
}
# Pure shell: split each argument at its single quotes and print the
# pieces as it goes, without building the result in a variable.
esceval2()
{
    case $# in 0) return 0; esac
    (
        set -e
        while :
        do
            printf \'
            unescaped=$1
            while :
            do
                case $unescaped in
                *\'*)
                    printf %s "${unescaped%%\'*}""'\''"
                    unescaped=${unescaped#*\'}
                    ;;
                *)
                    break
                esac
            done
            printf %s "$unescaped"
            shift
            case $# in 0) break; esac
            printf "' "
        done
        printf "'\n"
    )
}
# Pure shell: build each escaped argument in a variable, then print it.
esceval3()
{
    case $# in 0) return 0; esac
    (
        set -e
        while :
        do
            escaped=\'
            unescaped=$1
            while :
            do
                case $unescaped in
                *\'*)
                    escaped=$escaped${unescaped%%\'*}"'\''"
                    unescaped=${unescaped#*\'}
                    ;;
                *)
                    break
                esac
            done
            escaped=$escaped$unescaped\'
            shift
            case $# in 0) break; esac
            printf '%s ' "$escaped"
        done
        printf '%s\n' "$escaped"
    )
}
# Pure shell: accumulate the whole output in one variable and print once.
esceval4()
{
    case $# in 0) return 0; esac
    (
        set -e
        escaped=\'
        while :
        do
            unescaped=$1
            while :
            do
                case $unescaped in
                *\'*)
                    escaped=$escaped${unescaped%%\'*}"'\''"
                    unescaped=${unescaped#*\'}
                    ;;
                *)
                    break
                esac
            done
            escaped=$escaped$unescaped"' '"
            shift
            case $# in 0) break; esac
        done
        escaped=${escaped%" '"}
        printf '%s\n' "$escaped"
    )
}
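# Uses ${parameter//pattern/replacement}, which bash and zsh support
# but POSIX sh does not specify.
esceval5()
{
    case $# in 0) return 0; esac
    (
        set -e
        while :
        do
            escaped=\'${1//\'/"'\''"}\'
            shift
            case $# in 0) break; esac
            printf "%s " "$escaped"
        done
        printf "%s\n" "$escaped"
    )
}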
The random data files were created by reading from /dev/urandom. Note that if you just create a big data file from /dev/urandom, you can end up with a few very large shell words or very tiny ones, but on average you'll get words roughly 85 characters long. (With the default three IFS characters, roughly 3 out of every 256 bytes will be word splitters for the shell.) So we might do head -c 4096 </dev/urandom >test-input.dat, and then setup_test_data 'test-input.dat'. But if you have small words, you can then create large words by quoting one of the d{{number}} variables created by setup_test_data.
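The difference between many small words and one large word comes down to whether the variable is quoted at the call site. Here is a minimal sketch of that, assuming a POSIX-like shell with default IFS; count_args is a hypothetical helper that is not part of the original test code, and the counts it prints will differ on every run because the input is random.

```shell
# Hypothetical helper (not from the original post): print the argument count.
count_args()
{
    printf '%s\n' "$#"
}

head -c 4096 </dev/urandom >test-input.dat
d1=`cat test-input.dat`        # same first step as setup_test_data

set -f                         # no globbing, so only word splitting applies
small_words=`count_args $d1`   # unquoted: split on space, tab, newline
set +f
one_word=`count_args "$d1"`    # quoted: the whole blob is a single word

printf 'unquoted: %s words; quoted: %s word\n' "$small_words" "$one_word"
```

The same quoting choice applies when passing $d64 or "$d64" to one of the esceval functions. Two caveats: shell variables cannot hold NUL bytes, so some of the random input is lost when it is read into d1, and zsh does not word-split unquoted parameter expansions unless SH_WORD_SPLIT is set.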
The biggest finding was that this basically doesn't matter, since the performance difference doesn't become significant until you're working with huge inputs - megabytes of data, etc. Even on a 600 MHz single-core ARMv7 CPU with 256 MiB RAM.
The most interesting finding was that somehow on zsh the naive sed-based implementation out-performed the shell's native variable substitution expansion (esceval0 vs esceval5), despite the opposite being true in bash. Or something like that, it's been years - I definitely remember almost questioning if zsh somehow optimized invoking sed or provided its own built-in of it, it was that surprising.
The obvious expected finding was that there's an inflection point where for large-enough words, the sed-based implementation starts beating the in-shell implementation - process forking overhead dominates for small words, but once the word gets long enough, doing it in-shell spends a bunch of time mucking with entire copies of the word in memory, probably doing a bunch of allocations and deallocations and unoptimized string copies (all while looping at the level of the relatively unoptimized interpreter), while sed can stream the word and do it in constant space (all while looping in machine code).
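To poke at that trade-off, something like the following sketch could be used. quote_sed and quote_loop are hypothetical one-argument variants condensed from escevalp and esceval3, not the functions that were actually timed, and the seed string and number of doublings are arbitrary assumptions. It reuses the same coarse date +%s timing as above, so an input this small will likely report zero seconds for both; increase the doubling count to look for the crossover.

```shell
# Hypothetical one-argument variants, condensed from escevalp and esceval3.
quote_sed()
{
    printf '%s\n' "$1" | sed -e "s/'/'\\\\''/g" -e "1 s/^/'/" -e "\$ s/\$/'/"
}

quote_loop()
{
    unescaped=$1
    escaped=\'
    while :
    do
        case $unescaped in
        *\'*)
            escaped=$escaped${unescaped%%\'*}"'\''"
            unescaped=${unescaped#*\'}
            ;;
        *)
            break
        esac
    done
    printf '%s\n' "$escaped$unescaped'"
}

# Build one large word with many embedded quotes: 1024 copies of a seed.
word="it's a 'test'"
i=0
while [ "$i" -lt 10 ]
do
    word="$word $word"
    i=$((i + 1))
done

start_time=`date +%s`
from_sed=`quote_sed "$word"`
mid_time=`date +%s`
from_loop=`quote_loop "$word"`
end_time=`date +%s`

printf 'sed: %ss, in-shell: %ss\n' \
    "$((mid_time - start_time))" "$((end_time - mid_time))"
if [ "$from_sed" = "$from_loop" ]
then
    echo 'outputs match'
else
    echo 'outputs differ'
fi
```

Comparing the two outputs before looking at timings is a cheap guard against benchmarking a variant that escapes incorrectly.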
I don't remember what else I found, if anything. It was all micro-optimizations beyond that.