
Comments (16)

OvermindDL1 avatar OvermindDL1 commented on May 19, 2024 1

spawn_opt

You may not be able to disable the GC, but you can prealloc memory and tune various things with spawn options.
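An illustrative sketch of that idea (not benchee's actual implementation): run the benchmarked function in a fresh process spawned with `:erlang.spawn_opt/2`, pre-allocating a large heap so GC is less likely to fire mid-run. The option values here are arbitrary; heap sizes are given in words.

```elixir
# Run bench_fun in a process with a pre-allocated heap to make a
# GC sweep during the measured call less likely.
run_with_big_heap = fn bench_fun ->
  parent = self()

  :erlang.spawn_opt(
    fn -> send(parent, {:result, bench_fun.()}) end,
    min_heap_size: 1_000_000,  # start with a big heap (in words)
    fullsweep_after: 0         # tunable GC knob, like the others
  )

  receive do
    {:result, value} -> value
  end
end

run_with_big_heap.(fn -> Enum.sum(1..1_000) end)
# => 500500
```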

from benchee.

PragTob avatar PragTob commented on May 19, 2024 1

Yes, as mentioned in good ol' issue #1 - thanks nonetheless :)


PragTob avatar PragTob commented on May 19, 2024 1

Yeah, I prefer to leave tracing out of it if I can - although I recall a presentation claiming that tracing overhead on the BEAM is extremely low. I might be misremembering, and it might be worth benchmarking.

I guess if we're taking memory consumption of the whole BEAM, just around the function invocations, it should be good enough for starters (as long as we're not running in parallel) - while per-process measurement might be more accurate. So there are definitely options and levels to this, but it might take some time to implement them all. A first version would be simple, and then we go from there :)


PragTob avatar PragTob commented on May 19, 2024 1

Just landed in #180 thanks to @devonestes and @michalmuskala 🎉


devonestes avatar devonestes commented on May 19, 2024

So I've done some research on this, and I think we could do this! Here are my two ideas:

  1. Take a snapshot of allocated memory before and after each run of the benchmarked function, and compare averages of those snapshots to do your comparisons, and

  2. Take a snapshot of allocated memory before and after the complete benchmarking for a given function, and compare those between the different functions that we're testing.

Idea 1 gives us more data, so we could calculate more statistics about memory use (like average, median, average deviation, etc.). The downside is it'll be slower, and probably a more complex implementation.

Idea 2 will be simpler to implement and faster in execution, but it would limit what we could report. It would still be a viable comparison, though, and if you notice a significant difference it would be trustworthy.
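A minimal sketch of idea 1, assuming per-process measurement via `Process.info/2` (the names are illustrative, not benchee's API): snapshot the process's memory before and after each invocation and keep the per-run deltas for later statistics.

```elixir
# Snapshot process memory (in bytes) around each run of the function.
# A negative or tiny delta suggests GC ran during the call.
measure_run = fn fun ->
  {:memory, before_bytes} = Process.info(self(), :memory)
  fun.()
  {:memory, after_bytes} = Process.info(self(), :memory)
  after_bytes - before_bytes
end

# Collect one delta per run, as idea 1 proposes.
deltas = for _ <- 1..10, do: measure_run.(fn -> Enum.to_list(1..1_000) end)
```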

What are your thoughts on this? I know what I'm leaning towards, but I'd like to get your honest opinion first. Or maybe you have some other ideas that I haven't covered here?


devonestes avatar devonestes commented on May 19, 2024

Oh, yeah, and while walking my dog I also realized that this might only be really useful if we can figure out a way to disable GC. Otherwise there's no way we can be sure a GC sweep won't clean up allocated memory during the function execution... Fun!


PragTob avatar PragTob commented on May 19, 2024

@devonestes thanks for the input and yay for dogs! It seems impossible to deactivate GC, but I think one can start a new process with more memory to make GC less likely (see the deactivate GC issue). Even if it does GC, that's just one bad value, and it happens. We could also just measure the maximum memory usage; that wouldn't be too bad either (or so Charlie Nutter told me) :)


devonestes avatar devonestes commented on May 19, 2024

So, with that in mind, it sounds like implementation number 1 might be the best way to go. It raises the likelihood of getting more runs without GC, giving us a better idea of the actual maximum memory usage (I like that as the "go to" metric). We could still calculate things like mean and median, but unless folks are benchmarking non-deterministic code (which is a bad idea on their part anyway), the memory usage should be almost identical between runs of a given function unless GC gets in the way - so taking the max makes sense to me.
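To make that concrete, here is a small sketch over hypothetical per-run byte deltas: with deterministic code they should be near-identical, a GC sweep mid-run shows up as an outlier, and taking the max gives the peak-style metric discussed above.

```elixir
# Hypothetical per-run deltas; the 16 models a run where GC swept
# allocated memory before the second snapshot.
samples = [1_040, 1_040, 1_048, 1_040, 16]

max_used = Enum.max(samples)
mean     = Enum.sum(samples) / length(samples)
median   = samples |> Enum.sort() |> Enum.at(div(length(samples), 2))
# max_used => 1048, and the max is robust to the GC outlier,
# while the mean is dragged down by it.
```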

Ok, that all said, I think I have an idea on how to tackle this now... 🎉

Also, being able to tune or disable GC seems like a pretty important thing to have. Have you considered opening an issue in Erlang to try and get that added in the future? Are they just against giving users APIs for that level of access to the GC? Seems kind of weird to me...


PragTob avatar PragTob commented on May 19, 2024

I asked on elixir-lang-talk once about deactivating GC for benchmarks and it was basically a rush of "why would you do this" (I explained that) "you shouldn't do that" and "doing that is stupid" until I got an actual useful answer.

From that I took the impression that it's not something the Erlang/Elixir community likes to look at.

Also we should probably warn if running in parallel because that'd mess up measurements.

Plus, I guess that it's not always the same. Some people might gather some state in processes so it might increase in total. Statistics on memory seem fine.

We should also see how much of an impact memory measurement has on how many benchmarks can be run (i.e. benchee overhead) to decide whether it's enabled or disabled by default.

Thanks for looking into this! As always, best small chunks first :)


devonestes avatar devonestes commented on May 19, 2024

Oh, yeah, there are going to be some really complicated edge cases around this. Can't turn off GC + unlimited concurrency = fun! I'll start small for sure 😄


PragTob avatar PragTob commented on May 19, 2024

Imo it's fine to just spit out a loud and annoying warning when parallelism is activated while memory measurement is enabled. If we can't reasonably fix something, then don't :D


michalmuskala avatar michalmuskala commented on May 19, 2024

Another thing to consider is that, if we're measuring something like Ecto (and I have some simple benches), there are multiple processes involved. How do we measure memory use in such a situation?


PragTob avatar PragTob commented on May 19, 2024

@michalmuskala great point :| I don't know. Can you get the grand sum of a process and all its child processes or whatever? Or maybe, in those cases, really just measure the whole Erlang VM, and make it configurable per benchmark whether you want to use the process itself or the whole VM.
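The whole-VM fallback could be sketched with `:erlang.memory/1`: compare total allocated bytes before and after. It's coarse, and noisy if anything else allocates concurrently, but it does capture memory used by child processes.

```elixir
# Whole-VM measurement: total bytes allocated by the runtime system,
# before and after the work under test.
before_total = :erlang.memory(:total)
data = Enum.to_list(1..100_000)
after_total = :erlang.memory(:total)

delta_bytes = after_total - before_total
```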


michalmuskala avatar michalmuskala commented on May 19, 2024

What if there was a setting like track_processes: [pid, pid, pid] or something? Then we could use Process.info & stuff to also gather data about those processes (memory, reductions, etc).
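A sketch of that hypothetical `track_processes` option (the option name and `track` helper are made up for illustration): for each tracked pid, gather memory and reductions via `Process.info/2`, which returns `nil` for dead processes and so filters them out here.

```elixir
# Gather per-process stats for an explicit list of pids, as a
# track_processes: [pid, pid, pid] setting might do.
track = fn pids ->
  for pid <- pids, info = Process.info(pid, [:memory, :reductions]), into: %{} do
    {pid, Map.new(info)}
  end
end

pid = spawn(fn -> Process.sleep(:infinity) end)
track.([pid])
# => %{#PID<...> => %{memory: ..., reductions: ...}}
```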


PragTob avatar PragTob commented on May 19, 2024

Only helps if we know the processes upfront - which probably isn't always the case. It's well worth considering I think :)


michalmuskala avatar michalmuskala commented on May 19, 2024

In case of ecto, we do, so it would solve it for me 😆 But yes, it's not 100% foolproof. Maybe we could leverage tracing to gather intermediary processes started by the runner process (and those processes that are additionally tracked)? But tracing can affect performance, so it can be weird.
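A sketch of the tracing idea, assuming we enable `:procs` tracing on the runner process so the VM reports processes it spawns (the trace message has the shape `{:trace, pid, :spawn, child, mfa}`):

```elixir
# Turn on process-event tracing for the current process, spawn
# something, and observe the spawn event; turn tracing off afterwards
# since it is not free.
:erlang.trace(self(), true, [:procs])
spawn(fn -> :ok end)

child =
  receive do
    {:trace, _pid, :spawn, child_pid, _mfa} -> child_pid
  after
    100 -> nil
  end

:erlang.trace(self(), false, [:procs])
```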

