Giter Club home page Giter Club logo

barsort's Introduction

Barsort

A speed optimised general purpose and stable numeric sort, specialised to work on numeric input only (integers and reals) and return a clone or sort index.

Barsort utilises a specialised algorithm similar to 'counting sort' originaly made to place array elements into groups of equal size with similar magnitudes. It is combined here with insert and merge sorts, and with edge case processing to create a fast numeric sort.

Testing across a good range of possible input distributions and sizes shows barsort is many times faster than node 2016's native sort and competitive with a proficient javascript implementation of Pythons optimised 'Timsort'.

Alas, the source code is a beastly private exertion beyond redemption.

Usage

  
//return a sorted clone of array
sorted_arr = Barsort.sort( array [,"descend"] )      

//return a sorted index to array ([optional params])
index_arr = Barsort.sortorder( array [,index_arr][,"descend"] )  
  

Summary of speedtests:

The following tables are for generating a sort index of arrays. Timsort is considerably faster doing light sorting in place.

Pre-sorted input, Lengths: 100 10,000 1,000,000
Barsort sort 100 % 100 % 100 %
Native sort 5 % 2 % 2 %
Timsort sort 100 % 100 % 100 %
Gaussian distribution 100 10,000 1,000,000
Barsort sort 100 % 100 % 100 %
Native sort 2 % 2 % 2 %
Timsort sort 100 % 60 % 60 %
Tough distribution 100 10,000 1,000,000
Barsort sort 100 % 100 % 100 %
Native sort 10 % 10 % 10 %
Timsort sort 100 % 65 % 65 %

See also

Timsort is a popular multipurpose in-place sort.

LSD Radix Sort is 3x Barsort speed but is limited to unsigned integers and can not arrange an index.

Barsort algorithm basics - a "counting sort"

The input numbers are first tallied into bins as though calculating a histogram (by dividing by a suitable factor and casting to integer to get a bin number). Like this:

  for(var i=0; i<e; i++) 
  { kysbin[i]=(binperval*(kysval[i]-minv))>>0  } 

These "counting bins" are subsequently indexed by a fewer number of "placement bins". The core algorithm was developed to sort data roughly into histogram bars ( without sorting within the bars). The "counting bins" were subdivisions of the bars to reduce spillage between the bars. So, the cumulative sum of the populations of the placement bins is calculated so that for each placement bin an anchor position in the sorting index (output) is known (for values of bins range).

Like this:

  for(var bin=0; bin<nbin; bin++){
    
    barofbin[bin]=fillbar            //fillbar is the bar to fill currently
    barsfill[fillbar]+=cntofbin[bin] //here it is being allocated a bins tally 

    while(barsfill[fillbar]>=fcap){   //when bar is full... 
      barsfill[fillbar+1]+=barsfill[fillbar]-fcap
      barsfill[fillbar]=fcap
      fillbar++                //...fill next bar
      nxtcap+=kysperbar-fcap   //nxtcap and kysperbar are floats
      fcap=nxtcap >>>0         //fcap is integer (it differs for each bar)
    }
  } 

Some multi-indirected lookup and updating is done for each input to use the base placement info to assign inputs their position in the sorting index. Here is that final 'curious' code:

  var bapos=new Array(nbar); bapos[0]=0 //( before_anchor_pos )
  for(var i=0;i<nbar-1;i++){ bapos[i+1]=bapos[i]+barsfill[i] }

  for(var i=st; i<ov; i++){
    var binofel=kysbin[i] 
    
    //(change barofbin if barsfill is empty)
    while( barsfill[barofbin[binofel]]===0 ){ 
      barofbin[binofel]++ 
    }
    barsfill[barofbin[binofel]]--          
    sortix[ bapos[barofbin[binofel]]++ ]=i //sort index gets ordered by bar
  }
  // (this is not the unpresentable part...)

The counting sort is used to get elements quite close to where thay should be but they need to be fine-sorted afterward. The classic "insertion sort" is perfect for fine sorting as long it never has to move any elements too far. It can fall back on mergesort to cope with rare problem cases.

Version History

  • 0.5.0 - pre release, in use and testing ...
  • 0.6.0 - repo fixed, pre release in use and testing ...
  • 0.9.0 - much developed and testing ...
  • 0.13.0 - ...much more developed and tested.

barsort's People

Contributors

strainer avatar

Stargazers

 avatar  avatar

Watchers

James Cloos avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.