go-directio-prototype's Introduction

Using DirectIO with Go

The purpose of this playground is to evaluate the options offered by Nick Craig-Wood's directio library.

Features

This section describes the features implemented by both parties: Producer and Consumer.

Configuration

The configuration items used by both parties are stored in the .env file, which also documents the purpose of each item.

Producer

According to the max file size (defined in the IO_FILE_MAX_SIZE config item), the Producer is either:

  • appending to the latest written file, if that file's size has not reached the max size
  • writing to a new file, if there is no written file yet or if the size of the latest one has exceeded the max size
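
The rotation rule above can be sketched as follows. This is a minimal illustration, not the prototype's actual code: it uses plain os.OpenFile rather than directio, and the pickTarget and openTarget helpers, the zero-padded sequence file names, and the choice to treat "size equals the limit" as "limit reached" are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// pickTarget decides which file the producer writes to next.
// latest is the most recently written file ("" if none), latestSize its
// current size, maxSize the IO_FILE_MAX_SIZE limit, and newName the
// (hypothetical) name to use when a new file is needed.
// Assumption: a file whose size equals maxSize counts as full.
func pickTarget(latest string, latestSize, maxSize int64, newName string) (name string, appendMode bool) {
	if latest == "" || latestSize >= maxSize {
		return newName, false
	}
	return latest, true
}

// openTarget finds the latest *.dat file in dir (by name) and opens the
// file chosen by pickTarget. Plain os.OpenFile is used here, not directio.
func openTarget(dir string, maxSize int64) (*os.File, error) {
	names, err := filepath.Glob(filepath.Join(dir, "*.dat"))
	if err != nil {
		return nil, err
	}
	sort.Strings(names)

	var latest string
	var size int64
	if len(names) > 0 {
		latest = names[len(names)-1]
		info, err := os.Stat(latest)
		if err != nil {
			return nil, err
		}
		size = info.Size()
	}

	// Hypothetical naming scheme: zero-padded sequence numbers sort by name.
	newName := filepath.Join(dir, fmt.Sprintf("%010d.dat", len(names)+1))
	name, appendMode := pickTarget(latest, size, maxSize, newName)

	flags := os.O_WRONLY | os.O_CREATE
	if appendMode {
		flags |= os.O_APPEND
	}
	return os.OpenFile(name, flags, 0o644)
}

func main() {
	dir, err := os.MkdirTemp("", "producer")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	for i := 0; i < 3; i++ {
		f, err := openTarget(dir, 8) // tiny limit so rotation is visible
		if err != nil {
			panic(err)
		}
		if _, err := f.Write([]byte("12345")); err != nil {
			panic(err)
		}
		fmt.Println("wrote to", filepath.Base(f.Name()))
		f.Close()
	}
}
```

Keeping the decision in a small pure function makes the rule easy to test without touching the file system.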

Consumer

Consumer:

  • reads (consumes) files - one by one - from the same path (defined in the IO_PATH config item)
  • saves the state (aka ConsumerState in the consumer's code), so that it can resume the work any time
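
One way such resumable state could be persisted is sketched below. The fields of ConsumerState, the JSON encoding, and the saveState and loadState helpers are hypothetical; the actual struct in the consumer's code may look different.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// ConsumerState is a hypothetical shape for this sketch only.
type ConsumerState struct {
	File   string `json:"file"`   // file currently being consumed
	Offset int64  `json:"offset"` // bytes already consumed from that file
}

// saveState writes the state to a temporary file and renames it into
// place, so a crash mid-write does not leave a truncated state file.
func saveState(path string, s ConsumerState) error {
	data, err := json.Marshal(s)
	if err != nil {
		return err
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, data, 0o644); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}

// loadState returns the zero state when no state file exists yet,
// which means "start from the beginning".
func loadState(path string) (ConsumerState, error) {
	var s ConsumerState
	data, err := os.ReadFile(path)
	if os.IsNotExist(err) {
		return s, nil
	}
	if err != nil {
		return s, err
	}
	err = json.Unmarshal(data, &s)
	return s, err
}

func main() {
	dir, err := os.MkdirTemp("", "consumer")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "state.json")
	if err := saveState(path, ConsumerState{File: "0000000001.dat", Offset: 4096}); err != nil {
		panic(err)
	}
	s, err := loadState(path)
	if err != nil {
		panic(err)
	}
	fmt.Printf("resuming %s at offset %d\n", s.File, s.Offset)
}
```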

Todos

  • If the writer fails to initialize properly, main should be notified, stop the producer, and exit.

  • Replace the remaining usages of ioutil.ReadDir with os.File.Readdirnames
    (basically, use fnames, err = f.Readdirnames(0), then do sort.Strings(fnames))

Tests

Reading Directory

After discovering that ioutil.ReadDir takes approx. 1.2 sec when there are 100K+ files in the directory, I switched to os.File.Readdirnames. Initially, I raised this as an issue on Go's GitHub issue tracker.

However, that call returns the names in directory order, neither in the order the files were created nor by file name, so the result must be sorted. Even with the sort, the overall execution time is considerably better than that of ioutil.ReadDir.
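
A minimal sketch of this approach, assuming a hypothetical helper named listSorted: ioutil.ReadDir returns a []os.FileInfo sorted by file name, which on Unix systems typically involves an lstat call per entry, while Readdirnames returns only the names, so the sort has to be done explicitly.

```go
package main

import (
	"fmt"
	"os"
	"sort"
)

// listSorted returns the names of all entries in dir, sorted by name.
func listSorted(dir string) ([]string, error) {
	f, err := os.Open(dir)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	// n <= 0 asks for all entries in a single slice, in directory order.
	names, err := f.Readdirnames(0)
	if err != nil {
		return nil, err
	}
	sort.Strings(names)
	return names, nil
}

func main() {
	names, err := listSorted(".")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%d entries\n", len(names))
}
```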

read_dir_eval/readdir_eval.go is a relevant example. Run in a directory containing 499099 files, it produced these figures (output):

>>> ioutil.ReadDir exec time: 1.311758808s
>>> os.File.Readdirnames exec time: 144.131977ms
>>> os.File.Readdirnames result has 499099 entries.
>>> sort exec time: 163.322515ms

For such a huge number of files, the standard rm -f *.dat does not work, because the shell expands the glob into an argument list that exceeds the system limit. The option is to use find . -name "*.dat" -print0 | xargs -0 rm.

go-directio-prototype's People

Contributors

dxps

go-directio-prototype's Issues

Use data size driven blocksize

Currently, both producer and consumer are aware of the blocksize. This is very restrictive.

A better design should include:

  • Producer:
    • first writes the length of the data as a uint64(len(data))
    • then it writes the data itself.
  • Consumer:
    • first reads this data length (size as number of bytes)
    • then it reads that number of bytes to get the entire data entry
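
The proposed design amounts to length-prefixed framing, which could look roughly like the sketch below. The writeEntry, readEntry, and roundTrip helpers and the little-endian byte order are assumptions for illustration. The sketch also works on plain io.Writer and io.Reader values and ignores the block alignment that direct I/O requires.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// writeEntry writes uint64(len(data)) followed by data itself.
// Little-endian byte order is an assumption of this sketch.
func writeEntry(w io.Writer, data []byte) error {
	var hdr [8]byte
	binary.LittleEndian.PutUint64(hdr[:], uint64(len(data)))
	if _, err := w.Write(hdr[:]); err != nil {
		return err
	}
	_, err := w.Write(data)
	return err
}

// readEntry reads the 8-byte length, then exactly that many bytes.
// It returns io.EOF only when the stream ends before a new entry starts.
// A real consumer should also cap the length before allocating.
func readEntry(r io.Reader) ([]byte, error) {
	var hdr [8]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return nil, err
	}
	data := make([]byte, binary.LittleEndian.Uint64(hdr[:]))
	if _, err := io.ReadFull(r, data); err != nil {
		if err == io.EOF {
			err = io.ErrUnexpectedEOF // header present but body missing
		}
		return nil, err
	}
	return data, nil
}

// roundTrip writes all entries to an in-memory buffer and reads them back.
func roundTrip(entries ...[]byte) ([][]byte, error) {
	var buf bytes.Buffer
	for _, e := range entries {
		if err := writeEntry(&buf, e); err != nil {
			return nil, err
		}
	}
	var out [][]byte
	for {
		e, err := readEntry(&buf)
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		out = append(out, e)
	}
}

func main() {
	out, err := roundTrip([]byte("short"), []byte("a somewhat longer entry"))
	if err != nil {
		panic(err)
	}
	for _, e := range out {
		fmt.Printf("%d bytes: %s\n", len(e), e)
	}
}
```

With this framing, neither side needs to agree on a fixed block size in advance; the consumer learns each entry's size from the stream itself.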
