1brc-go

The One Billion Row Challenge (1BRC) is a fun exploration of how far modern languages can be pushed when aggregating one billion rows from a text file. The dataset to be aggregated is around 13GB.

The challenge was originally designed for Java, but here I have implemented it in Go (Golang).

The original repo can be found at https://github.com/gunnarmorling/1brc

The challenge is implemented using several different methods, each run on a machine with the following specifications:

Specifications:

Processor: Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz, 1800 Mhz, 4 Core(s), 8 Logical Processor(s)
Physical Memory(RAM): 8GB

Method 1: Custom Hashmap, No Parallelization

In this method a custom hashmap is used, in which each key is hashed to an index in an array. No goroutines are used, so the program executes serially. It takes approximately 7m 35s to execute.

Method 2: Go's Maps with Pointers, No Parallelization

This method is similar to the first, but instead of the custom hashmap it uses Go's built-in map, which maps each key to a pointer to its values. The execution time is reduced to around 5m.

Method 3: Parallelization (Batch Processing) with Custom Hashmaps

Here execution is parallelized through batch processing, combined with the custom hashmap. Lines are grouped into chunks of different sizes, each a multiple of 1024 bytes. The execution time is reduced to 4m 45s.

Method 4: Parallelization (Batch Processing) with Go Maps

This method is the same as Method 3, except that Go maps are used instead of the custom hashmap. Execution time was measured for different chunk lengths and different numbers of cores in use:
| Cores in use | Chunk len 64x1024 | Chunk len 128x1024 | Chunk len 256x1024 |
|---|---|---|---|
| 8 Cores | 3m54s | 3m22s | 3m23s |
| 6 Cores | 3m54s | 3m21s | 3m49s |
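One way to structure Method 4 is sketched below, assuming each worker goroutine aggregates its chunks into a private Go map and the per-worker maps are merged once all chunks are done; the README does not describe how results are combined. The chunks are assumed to be newline-aligned already, and `processChunk`, `aggregateParallel` and `workers` (standing in for the core count in the table) are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"strconv"
	"sync"
)

// stats is the running aggregate for one station.
type stats struct {
	min, max, sum float64
	count         int64
}

// combine folds o into s; min, max, sum and count all merge associatively.
func (s *stats) combine(o stats) {
	if o.min < s.min {
		s.min = o.min
	}
	if o.max > s.max {
		s.max = o.max
	}
	s.sum += o.sum
	s.count += o.count
}

// processChunk aggregates one newline-aligned chunk into a worker's own map.
func processChunk(chunk []byte, m map[string]*stats) error {
	for len(chunk) > 0 {
		var line []byte
		line, chunk, _ = bytes.Cut(chunk, []byte{'\n'})
		name, temp, ok := bytes.Cut(line, []byte{';'})
		if !ok {
			continue // skip malformed lines
		}
		v, err := strconv.ParseFloat(string(temp), 64)
		if err != nil {
			return err
		}
		if s, found := m[string(name)]; found {
			s.combine(stats{min: v, max: v, sum: v, count: 1})
		} else {
			m[string(name)] = &stats{min: v, max: v, sum: v, count: 1}
		}
	}
	return nil
}

// aggregateParallel fans chunks out to workers (>= 1) goroutines, each owning
// a private map so no locking is needed, then merges the maps at the end.
func aggregateParallel(chunks [][]byte, workers int) (map[string]*stats, error) {
	in := make(chan []byte)
	maps := make([]map[string]*stats, workers)
	errs := make([]error, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		maps[w] = make(map[string]*stats)
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for c := range in { // keep draining after an error so the sender never blocks
				if err := processChunk(c, maps[w]); err != nil && errs[w] == nil {
					errs[w] = err
				}
			}
		}(w)
	}
	for _, c := range chunks {
		in <- c
	}
	close(in)
	wg.Wait()

	total := make(map[string]*stats)
	for w, m := range maps {
		if errs[w] != nil {
			return nil, errs[w]
		}
		for k, s := range m {
			if t, ok := total[k]; ok {
				t.combine(*s)
			} else {
				total[k] = s
			}
		}
	}
	return total, nil
}

func main() {
	chunks := [][]byte{[]byte("Hamburg;12.0\nBulawayo;8.9\n"), []byte("Hamburg;34.2\n")}
	m, err := aggregateParallel(chunks, runtime.NumCPU())
	if err != nil {
		panic(err)
	}
	for k, s := range m {
		fmt.Printf("%s=%.1f/%.1f/%.1f\n", k, s.min, s.sum/float64(s.count), s.max)
	}
}
```

Merging is cheap because min, max, sum and count combine associatively, so the order in which workers finish does not change the result.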

Further Optimizations

  1. Reading the file into large buffers directly instead of using scanner.Scan()
  2. Faster methods of converting temperatures to integers
  3. Profiling the program
  4. Improving the custom hashing algorithm so it can be used instead of Go maps
  5. Running on more CPU cores to increase parallelization and further reduce execution time
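For item 2, one common approach is to parse each temperature directly into an integer number of tenths of a degree, skipping `strconv.ParseFloat`. The sketch below relies on the 1BRC input rules (values between -99.9 and 99.9 with exactly one fractional digit) and performs no validation; `parseTenths` is a hypothetical name.

```go
package main

import "fmt"

// parseTenths converts a 1BRC temperature such as "-12.3" or "4.5" into an
// integer number of tenths of a degree (-123, 45) without strconv. It assumes
// an optional '-', one or two integer digits, '.', and one fractional digit.
func parseTenths(b []byte) int {
	neg := false
	if b[0] == '-' {
		neg = true
		b = b[1:]
	}
	var v int
	if b[1] == '.' { // form d.d
		v = int(b[0]-'0')*10 + int(b[2]-'0')
	} else { // form dd.d
		v = int(b[0]-'0')*100 + int(b[1]-'0')*10 + int(b[3]-'0')
	}
	if neg {
		return -v
	}
	return v
}

func main() {
	for _, s := range []string{"0.0", "4.5", "-12.3", "99.9", "-99.9"} {
		t := parseTenths([]byte(s))
		fmt.Printf("%s -> %d (%.1f)\n", s, t, float64(t)/10)
	}
}
```

Keeping min, max and sum as integers also avoids floating-point accumulation error over a billion additions; the mean is converted back by dividing by 10 only when printing.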

Contributors

ayushmd
