asantucci / algo_data.table
A place to work on documenting recent algorithmic improvements to data.table.
Will use this for now:
amstat tex template
How about algo_data.table instead of parallel_data.table?
e.g. the truelength-clobber isn't related to parallelism.
And benchmark it!
sort and sort.list
.rprof for profiling.
Explained once by @mattdowle, so providing here.
Q: Does GForce allocate memory for the biggest group and then copy each group's values there before aggregating, so that the values are contiguous in memory and the aggregation is more cache-efficient? If so, do we check whether the groups are already sorted, so that we can avoid the allocation and the copy?
gforce (gsum) assigns to many group results at once; it doesn't gather the groups together. You're describing non-gforce (dogroups.c), which copies each group into a buffer sized for the largest group. See the branch in dogroups.c which knows whether the groups are already grouped: it switches to a memcpy. The memcpy is very fast (contiguous, pre-fetch), so it's pretty good already. We must copy because R's DATAPTR is not a pointer we can repoint; it's an offset from the SEXP.
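To make the difference between the two strategies concrete, here is a minimal C sketch. It is not data.table's actual code: the function names (gsum_sketch, gather_sum_sketch), the parameters, and the group layout (order, start, len) are all hypothetical and chosen only for illustration. The first function scatters each value into its group's accumulator in a single pass; the second copies each group into a scratch buffer sized for the largest group, using memcpy when the rows are already grouped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical GForce-style sum: one pass over x, adding each value to the
   accumulator of its group. Groups are never gathered together. */
void gsum_sketch(const double *x, const int *grp, size_t n, int ngrp,
                 double *ans)
{
    for (int g = 0; g < ngrp; g++) ans[g] = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(grp[i] >= 0 && grp[i] < ngrp);  /* group ids are 0-based */
        ans[grp[i]] += x[i];
    }
}

/* Hypothetical non-GForce-style sum: copy each group into buf (which must
   hold at least the largest group), then aggregate the contiguous copy.
   start[g] and len[g] locate group g. If already_grouped is nonzero the
   rows of each group are contiguous in x, so start[g] is an offset into x
   and a plain memcpy suffices (order may be NULL). Otherwise start[g] is
   an offset into order, which holds row indices sorted by group. */
void gather_sum_sketch(const double *x, const int *order, const int *start,
                       const int *len, int ngrp, int already_grouped,
                       double *buf, double *ans)
{
    for (int g = 0; g < ngrp; g++) {
        if (already_grouped) {
            memcpy(buf, x + start[g], (size_t)len[g] * sizeof(double));
        } else {
            assert(order != NULL);
            for (int j = 0; j < len[g]; j++)
                buf[j] = x[order[start[g] + j]];
        }
        double s = 0.0;
        for (int j = 0; j < len[g]; j++) s += buf[j];
        ans[g] = s;
    }
}
```

In this sketch the copy cannot be skipped even when the rows are already grouped, which mirrors the point above: the aggregation reads from a buffer it owns rather than from a repointed view of x.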
I think it's too early to start the fork / PR workflow, so I'll write down and share a couple of notes and suggestions here. Hopefully it helps the discussion.
I looked at the original SO question and, in particular, the discussion of why it was closed. Based on what I could find about the OP (Marianna) and her background (she calls herself a carpenter, not an engineer), I guess this whole thing is a chance to make data.table even more inclusive and accessible to a broader group of people.
So...
Why is data.table so !#$@!! fast?