algo_data.table's Issues

Change repo name please

How about algo_data.table instead of parallel_data.table?
e.g. the truelength-clobber isn't related to parallelism.

groupby optimization

This was explained once by @mattdowle, so I'm recording it here.

Q: Does GForce allocate memory for the biggest group and then copy each group's values there to aggregate them, so that it benefits from the values being contiguous in memory and is more cache efficient? If so, do we check whether the groups are already sorted, so we can avoid the allocation and the copy?

gforce (gsum) assigns to many group results at once; it doesn't gather the groups together. You're describing the non-gforce path (dogroups.c), which copies each group into a buffer sized for the largest group. See the branch in dogroups.c which knows whether the rows are already grouped: it switches to a memcpy. The memcpy is very fast (contiguous, prefetch-friendly), so it's pretty good already. We must copy because R's DATAPTR is not a pointer we can repoint; it's an offset from the SEXP.
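To make the difference between the two strategies concrete, here is a minimal sketch in plain C. It is not data.table's actual code: the function names (`sum_scatter`, `sum_gather`) and the `order` / `start` / `len` / `grouped` parameters are made up for illustration, and only a sum over doubles is handled. The first function adds every value straight into its group's result slot in one pass over the rows; the second gathers one group at a time into a scratch buffer sized for the largest group, using a single memcpy when the rows are already grouped.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Strategy 1 (gforce-like, hypothetical name): a single sweep over the rows.
   Each value is added straight into its group's result slot; nothing is
   gathered. grp[i] is the 0-based group id of row i. */
void sum_scatter(const double *x, const int *grp, int n, int ngrp, double *out)
{
    for (int g = 0; g < ngrp; g++) out[g] = 0.0;
    for (int i = 0; i < n; i++) out[grp[i]] += x[i];
}

/* Strategy 2 (dogroups-like, hypothetical name): gather one group at a time
   into a scratch buffer sized for the largest group, then aggregate it.
     order[]  : row indices arranged so that each group's rows are adjacent
     start[g] : position in order[] where group g begins
     len[g]   : number of rows in group g
     grouped  : nonzero if the rows of x are already grouped, so that
                group g occupies x[start[g]] .. x[start[g] + len[g] - 1]
   Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int sum_gather(const double *x, const int *order, const int *start,
               const int *len, int ngrp, int grouped, double *out)
{
    int maxlen = 1;                      /* at least one slot, so malloc(0) is avoided */
    for (int g = 0; g < ngrp; g++) {
        assert(len[g] >= 0);
        if (len[g] > maxlen) maxlen = len[g];
    }

    double *buf = malloc((size_t)maxlen * sizeof *buf);
    if (buf == NULL) return -1;

    for (int g = 0; g < ngrp; g++) {
        if (grouped) {
            /* Rows are already adjacent: one contiguous block copy. */
            memcpy(buf, x + start[g], (size_t)len[g] * sizeof *buf);
        } else {
            /* Rows are scattered: fetch them one by one through order[]. */
            for (int k = 0; k < len[g]; k++) buf[k] = x[order[start[g] + k]];
        }
        double s = 0.0;                  /* aggregate the contiguous buffer */
        for (int k = 0; k < len[g]; k++) s += buf[k];
        out[g] = s;
    }

    free(buf);
    return 0;
}
```

For a plain sum the gather step buys nothing, which is why the scatter version is possible at all. The buffer matters in the general case, where an arbitrary expression is evaluated per group and needs that group's values as one contiguous vector.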

Structure of the document

I think it's too early to start the fork / PR workflow, so I'll write down a couple of notes and suggestions here. Hopefully that helps the discussion.

I looked at the original SO question, and in particular at the discussion of why it was closed. Based on what I could find about the OP (Marianna) and her background (she calls herself a carpenter, not an engineer), I guess this whole thing is a chance to make data.table even more inclusive and accessible to a broader group of people.

So...

  • how about a bookdown project w/ the working title: Why is data.table so !#$@!! fast?
    (that doesn't mean the hard-core stuff can't go to an article)
  • which are the things data.table is fast at?
    I remember a discussion on Twitter about how data.table got overlooked in articles on fast I/O despite the fact that fwrite / fread is the best out there. On the other hand, it's obviously good at the typical aggregation / group-by things. So maybe one approach to structuring this would be to break the algos down by their usage. Referencing across sections is always possible, of course.
