
esceval's Issues

Old performance test scraps

Just jotting this down so that there's a record of it somewhere.

Years ago (pretty sure it was before COVID-19), I did some performance testing of different shell implementations of esceval.

This was the code I used. (Grabbing timing information with date +%s is really bad practice if you need high precision, but with large-enough test inputs, like in this case, the actual implementation's runtime dominated, so it was fine.) The process was rather manual: I was basically just sourcing these definitions and then calling the functions by hand.

# Each dN holds the contents of file $1 repeated N times, space-joined.
setup_test_data()
{
 d1=`cat "$1"`
 d2="$d1 $d1"
 d4="$d2 $d2"
 d8="$d4 $d4"
 d16="$d8 $d8"
 d32="$d16 $d16"
 d64="$d32 $d32"
 d128="$d64 $d64"
 d256="$d128 $d128"
 d512="$d256 $d256"
 d1024="$d512 $d512"
 d2048="$d1024 $d1024"
}

# Times one run redirecting to a file and one run capturing into a variable;
# reports exit status and elapsed seconds for each. (The status has to be
# grabbed before the end_time assignment, or the date command clobbers $?.)
timed_test_run()
{
 command=$1
 file=$2
 shift 2
 printf '%s :: ' "$command"

 start_time=`date +%s`
 "$command" "$@" >"$file"
 status=$?
 end_time=`date +%s`
 printf '%s :: ' "$status $((end_time - start_time))"

 start_time=`date +%s`
 captured_output=`"$command" "$@"`
 status=$?
 end_time=`date +%s`
 printf '%s\n' "$status $((end_time - start_time))"
}

# Naive sed-based implementation: one sed invocation per argument,
# with the sed script built inline.
esceval0()
{
 case $# in 0) return 0; esac
 (
  b='\\'
  while :
  do
   escaped=`
    printf '%s\n' "'$1" \
    | sed "
     s/'/'$b''/g
     1 s/^'$b''/'/
     $ s/$/'/
    "
   `
   shift
   case $# in 0) break; esac
   printf '%s ' "$escaped"
  done
  printf '%s\n' "$escaped"
 )
}

# Reusable sed filter: single-quote-escape stdin for eval.
escevalp()
{
 sed "
  s/'/'\\\\''/g
  1 s/^/'/
  $ s/$/'/
 "
}

# Like esceval0, but using the shared escevalp filter.
esceval1()
{
 case $# in 0) return 0; esac
 (
  set -e
  while :
  do
   escaped=`printf '%s\n' "$1" | escevalp`
   shift
   case $# in 0) break; esac
   printf '%s ' "$escaped"
  done
  printf '%s\n' "$escaped"
 )
}

# Pure-shell: prints each escaped piece as soon as it is found, no accumulation.
esceval2()
{
 case $# in 0) return 0; esac
 (
  set -e
  while :
  do
   printf \'
   unescaped=$1
   while :
   do
    case $unescaped in
    *\'*)
     printf %s "${unescaped%%\'*}""'\''"
     unescaped=${unescaped#*\'}
    ;;
    *)
     break
    esac
   done
   printf %s "$unescaped"
   shift
   case $# in 0) break; esac
   printf "' "
  done
  printf "'\n"
 )
}

# Pure-shell: accumulates each word's escaped form in a variable, then prints it.
esceval3()
{
 case $# in 0) return 0; esac
 (
  set -e
  while :
  do
   escaped=\'
   unescaped=$1
   while :
   do
    case $unescaped in
    *\'*)
     escaped=$escaped${unescaped%%\'*}"'\''"
     unescaped=${unescaped#*\'}
     ;;
    *)
     break
    esac
   done
   escaped=$escaped$unescaped\'
   shift
   case $# in 0) break; esac
   printf '%s ' "$escaped"
  done
  printf '%s\n' "$escaped"
 )
}

# Pure-shell: accumulates the entire escaped result in one variable.
esceval4()
{
 case $# in 0) return 0; esac
 (
  set -e
  escaped=\'
  while :
  do
   unescaped=$1
   while :
   do
    case $unescaped in
    *\'*)
     escaped=$escaped${unescaped%%\'*}"'\''"
     unescaped=${unescaped#*\'}
     ;;
    *)
     break
    esac
   done
   escaped=$escaped$unescaped"' '"
   shift
   case $# in 0) break; esac
  done
  escaped=${escaped%" '"}
  printf '%s\n' "$escaped"
 )
}

# Uses ${var//pattern/replacement} substitution (bash/zsh/ksh, not POSIX sh).
esceval5()
{
 case $# in 0) return 0; esac
 (
  set -e
  while :
  do
   escaped=\'${1//\'/"'\''"}\'
   shift
   case $# in 0) break; esac
   printf "%s " "$escaped"
  done
  printf "%s\n" "$escaped"
 )
}

The random data files were created by reading from /dev/urandom. Note that if you just create a big data file from /dev/urandom, you can end up with a few very large shell words or with very tiny ones, but on average you'll get words roughly ~85 characters long. (With the default three IFS characters - space, tab, and newline - roughly 3 out of every 256 bytes will be word-splitters for the shell.) So we might do head -c 4096 </dev/urandom >test-input.dat, and then setup_test_data 'test-input.dat'. And if you have small words, you can then create large words by quoting one of the d{number} variables created by setup_test_data.
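That ~85-character figure is easy to sanity-check: count how many bytes of the random data are the three default IFS characters and divide. A rough sketch (the 65536-byte size is an arbitrary choice of mine, just large enough to smooth out the randomness):

```shell
#!/bin/sh
# Generate random data, then estimate the average gap between word-splitters.
head -c 65536 /dev/urandom > test-input.dat
bytes=`wc -c < test-input.dat`
# Keep only space, tab, and newline bytes, and count them.
splitters=`tr -dc ' \t\n' < test-input.dat | wc -c`
avg=$((bytes / splitters))
echo "$avg"   # typically in the neighborhood of 85 (~256/3)
rm -f test-input.dat
```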

The biggest finding was that this basically doesn't matter, since the performance difference doesn't become significant until you're working with huge inputs (a megabyte of data or more), even on a 600 MHz single-core ARMv7 CPU with 256 MiB of RAM.

The most interesting finding was that on zsh the naive sed-based implementation somehow outperformed the shell's native variable substitution expansion (esceval0 vs esceval5), despite the opposite being true in bash. Or something like that; it's been years. I definitely remember almost questioning whether zsh somehow optimized invoking sed, or provided its own built-in version of it, it was that surprising.

The obvious, expected finding was that there's an inflection point where, for large-enough words, the sed-based implementation starts beating the in-shell implementation. Process-forking overhead dominates for small words; but once the word gets long enough, doing it in-shell spends a bunch of time mucking with entire copies of the word in memory (probably doing a bunch of allocations, deallocations, and unoptimized string copies, all while looping at the level of the relatively unoptimized interpreter), while sed can stream the word and do it in constant space, looping in machine code.
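The two strategies can be contrasted in isolation. The sketch below is my own simplification, not the original harness: the escape_sed and escape_shell names and the 2000-byte word size are arbitrary choices, the word is restricted to a single line, and it only checks that both strategies produce identical output, which is the precondition for comparing their timings at all:

```shell
#!/bin/sh
# sed-based strategy: streams the word, constant memory.
escape_sed()
{
 printf '%s\n' "$1" | sed "
  s/'/'\\\\''/g
  1 s/^/'/
  $ s/$/'/
 "
}

# in-shell strategy: loops with parameter expansion (esceval3 style),
# copying ever-growing partial results around in shell variables.
escape_shell()
{
 escaped=\'
 unescaped=$1
 while :
 do
  case $unescaped in
  *\'*)
   escaped=$escaped${unescaped%%\'*}"'\''"
   unescaped=${unescaped#*\'}
   ;;
  *)
   break
  esac
 done
 printf '%s\n' "$escaped$unescaped'"
}

# One large single-line word containing quotes.
word=`head -c 2000 /dev/urandom | tr -cd "a-zA-Z'"`
a=`escape_sed "$word"`
b=`escape_shell "$word"`
[ "$a" = "$b" ] && echo 'both strategies agree'
```

From there, wrapping each call in the coarse date +%s timing from timed_test_run above, and growing the word, is what exposes the inflection point.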

I don't remember what else I found, if anything. It was all micro-optimizations beyond that.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.