Giter Club home page Giter Club logo

bucket-sort's Introduction

Bucket sort

In this exercise we will learn how to implement a more efficient bucket sort. It has the same amortised complexity as the one you have already seen, O(n + m) where n is the number of elements to sort and m is the number of keys (values) you have, but it avoids a lot of data structure overhead and uses less memory. And the good news is that it is only slightly harder to implement than what you have already seen.

To fully understand the distinction between "keys" and "values" when we are sorting--something that we won't have much use for in this class, but you will have to trust me is very important--we consider two cases:

  • Sorting a list of integers x: list[int] where each is in the range 0 <= x[i] < m. This is the case where we have only "keys"; and you can probably guess that the values in x must then be the keys.
  • Sorting a list of pairs x: list[tuple[int,Any]] where the first element in the pairs are the keys, in the same range [0,m) as the first case, and the second can be any object we associate with that key.

Count sort

The first case looks simpler than the second, and it is. Here, we can use count sort, as you have already seen it. First, you count how many of each key you see. If you go to src/bsort.py you will find a template for count_keys(x) function. Go and implement it.

With that in hand, you can run through the keys in increasing order, and for each k you output counts[k] ks. That is how many times x had a k, so you are outputting the right number of them, and if you output them in order, you output the keys in sorted order.

To do this, you can create a block of counts[k] ks with [k] * counts[k] and add them to a list with out.extend([k] * counts[k]). So count sort can be as simple as this:

def count_sort(x: list[int]) -> list[int]:
    counts = count_keys(x)
    out = []
    for k, count in enumerate(counts):
        out.extend([k] * count)
    return out

Growing a list can be more expensive than pre-allocating one of the right length, though, and in some languages you don't have lists that you can append, so we will make it just a little more difficult. Implement a count_sort() function (see src/bsort.py) where you allocate the output list in one go, and then insert the keys in it.

def count_sort(x: list[int]) -> list[int]:
    counts = count_keys(x)
    out = [0] * len(x) # the output has some length as x
    ... # fill out out
    return out

This isn't actually going to be more efficient in Python, (it will be a little slower), but if you can do that, you can do a count sort in any language and not just Python.

Now you can do a count sort, and you have most of what you need to implement a bucket sort as well.

Bucket sort

The kind of bucket sort you will implement resembles the count_sort() above in that we will allocate an out list for the output and that and the key counts is the only extra list we will use. No list of lists to append to or any such thing, just a simple list of integers.

Our input is a list of pairs x = [(k0,x0), (k1,x1), ..., ] and we want to rearrange it such that the pairs are sorted with respect to their keys, but in a stable manner, such that pairs with the same key will come out in the same order as they came in.

but how do we move the pairs to their right position?

We will count the keys

counts = count_keys([k for k, _ in x])

as before, but we cannot simply output a count[k] number of ks. We need to copy the right pair to each position.

The key observation is now that the counts tell us how large each bucket in the output is. If our keys are [1, 2, 1, 2, 4] the counts will be

0 => 0  # we don't have any zeros
1 => 2  # we have two 1s
2 => 2  # we have two 2s
3 => 0  # we have zero 3s
4 => 1  # we have one 4

which tells us that the output will have a bucket with size 0 for the 0s, then a bucket of size 2 for the 1s, then a bucket of size 2 for the 2s, a bucket of size zero for the 3s, and finally a bucket of size 1 for the 4s.

      0s,1s start here
         |     2s start here
         |     |      3,4s start here
         |     |      |
         v     v      v
out = [  1 1   2 2    4  ]
        |-2-| |-2-| |-1-|

If you do a cumulative sum of the key counts

[0, 2, 2, 0, 1] => [0, 0, 2, 4, 4]

(write the function cumsum() to do that) then you can get

buckets = cumsum(count_keys([k for k, _ in x]))

and bucket[k] will tell you at which index in the output the k keys will start.

If you now run through x and for each pair (k,v) you put that pair at out[bucket[k]] and increment bucket[k] by one, you will have placed all the pairs in their right position.

That is how bucket sort is really done.

Try to implement it in the function bucket_sort().

bucket-sort's People

Contributors

mailund avatar jojovin avatar

Watchers

 avatar

bucket-sort's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.