
Comments (7)

dradetsky avatar dradetsky commented on May 30, 2024

Fun fact: I tried adding a method dpath.util.set_one which is like set but breaks immediately after making the first change. For some reason that's not immediately obvious to me, this makes the test job take about 20% longer. Modern processors are weird.

from prance.

jfinkhaeuser avatar jfinkhaeuser commented on May 30, 2024

First off, great analysis! I obviously did not use it on specs this size!
Second, yes, I will accept PRs - for some reason the Windows build seems to fail, but offhand that does not look like it's this PR's fault.
Third, now that I'm aware, I will look into it, yes. But I don't have a ton of time in the near future; I will have more time come April. Let's see what I can do?


jfinkhaeuser avatar jfinkhaeuser commented on May 30, 2024

So I was doing some profiling now on the current master, and the bad news is that there are no obvious bottleneck spots. That is, any performance improvement will be a series of smaller changes as opposed to one or two big ones.
The other bad news is that a lot of the performance seems to be eaten up in regular expression matching in the parsers I'm using - that's not something I'm going to change easily. I've no real intention of re-implementing a swagger parser from scratch.
However, I do see a bunch of weirdness coming from the dpath usage. That is, the paths function you identified does seem to cause the most calls back into other code.
I'll look into reducing this usage. It may also be possible to ditch dpath entirely. I used it primarily because it does what I need, but that was an early decision. Now that prance is seeing some stability, it's possible that we can optimize the use-case with some custom code.
Anyway... that's the avenue I'm pursuing at the moment, as time permits.


jfinkhaeuser avatar jfinkhaeuser commented on May 30, 2024

It would be interesting to know how you're doing with the current master.

Turns out in order to look something up in a nested structure, dpath would use a generator to generate all possible paths into the structure, and then compare the search path to the generated path. For every lookup. Of course that can grow to be quite expensive for large nested structures.

On master, I've replaced this with a simple pathed getter and setter implementation, both of which work recursively and consequently have their own potential pitfalls. In my tests, I managed to reduce the run time to somewhere around 80% of the previous implementation. And I was resolving externals hosted on the 'net, so around 30% or so of overall time was spent on I/O anyway.
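To make the difference concrete, here is a minimal sketch of the two lookup strategies described above. It is not the dpath or prance code; `iter_paths`, `get_by_enumeration`, `path_get`, and `path_set` are hypothetical names chosen for illustration, and paths are assumed to be plain sequences of keys and indices.

```python
from collections.abc import Mapping, Sequence


def iter_paths(obj, prefix=()):
    """Yield (path, value) for every node in a nested structure."""
    yield prefix, obj
    if isinstance(obj, Mapping):
        for key, value in obj.items():
            yield from iter_paths(value, prefix + (key,))
    elif isinstance(obj, Sequence) and not isinstance(obj, (str, bytes)):
        for index, value in enumerate(obj):
            yield from iter_paths(value, prefix + (index,))


def get_by_enumeration(obj, path):
    """Lookup by generating candidate paths and comparing each one.

    The cost grows with the size of the whole structure, for every lookup.
    """
    wanted = tuple(path)
    for candidate, value in iter_paths(obj):
        if candidate == wanted:
            return value
    raise KeyError(wanted)


def path_get(obj, path):
    """Recursive getter: follows the keys directly, one level per call."""
    if not path:
        return obj
    head, *rest = path
    return path_get(obj[head], rest)


def path_set(obj, path, value):
    """Recursive setter: creates intermediate dicts for missing keys."""
    if not path:
        raise KeyError("empty path")
    head, *rest = path
    if not rest:
        obj[head] = value
        return
    if isinstance(obj, dict) and head not in obj:
        obj[head] = {}
    path_set(obj[head], rest, value)
```

The direct getter touches one node per path segment, while the enumerating version may visit every node before it finds a match. The recursive form uses one stack frame per path segment, which is the kind of pitfall mentioned above for very deep structures.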

After this, it seems as if the next big performance eater is actually regular expression matching in the swagger parsers prance is using, so not something I can fix particularly easily.

It'd be great to know how you're doing with this - it may be that you're hitting recursion limits and finding new issues.


dradetsky avatar dradetsky commented on May 30, 2024

@jfinkhaeuser I actually haven't been using it for the moment. I took some time away from microservice development, for which this was relevant, because of a team-wide push on some frontend changes. Hopefully, I'll be back to usvc next week. Will report results when I do.

I generally like the sound of this; I imagine that dpath is not great for anything performance-critical because of the high degree of generality it needs (I still use it for merging cfgs, but those are smaller, so should be ok).

I would advise you against any kind of benchmarking involving externals, since it doesn't really tell you what you want to know.

If regexp perf is a serious issue (seems surprising; pyyaml doesn't seem to use it much), I have an idea for this; stay tuned!


jfinkhaeuser avatar jfinkhaeuser commented on May 30, 2024

Well, I don't really have data sets for benchmarking. I was just profiling a simple configuration with all kinds of features to see where time was most spent. After network I/O (which is obvious and not necessarily useful for you), it turned out to be dpath's path matching.
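For anyone who wants to repeat this kind of measurement, a minimal sketch using the standard library's `cProfile` and `pstats` follows. The helper name `profile_call` is hypothetical, and the callable you pass in (for example, a function that parses a local spec with no external references, so network I/O stays out of the numbers) is your own.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, text report).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()
```

Sorting by cumulative time tends to surface the call that fans out into the most work, which is how a helper like a path-matching function would stand out.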


jfinkhaeuser avatar jfinkhaeuser commented on May 30, 2024

I think I have to close this for the moment.

After some more development, I'm still at the stage where prance's own code is not taking a lot of time, and is in fact pretty conservative about what it validates with one of the validation backends.
The further development has made a tradeoff for a number of reasons:

  • On the one hand, referenced objects are deep copied now, which is relatively expensive, but avoids recursive structures (also good for JSON dumps; swagger understands recursion).
  • On the other hand, when loading an external reference, only the referenced subtree is dereferenced. However, thanks to the deep copy, this now happens each time the reference is resolved in order to apply recursion limits properly.
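As a rough illustration of that tradeoff, the sketch below deep copies the target on every resolution and counts how often each reference has been followed on the current path. It is not prance's resolver: `resolve_refs` and `RecursionLimitError` are hypothetical names, only local `#/...` references are handled, and JSON Pointer escaping is ignored.

```python
import copy


class RecursionLimitError(Exception):
    """Raised when a reference is followed more often than the limit allows."""


def resolve_refs(node, root, limit=1, _seen=()):
    """Return a copy of node with local "$ref" entries replaced by deep copies.

    _seen holds the references followed on the way to this node, so the limit
    applies per resolution path rather than globally.
    """
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/"):
            if _seen.count(ref) >= limit:
                raise RecursionLimitError(ref)
            target = root
            for part in ref[2:].split("/"):
                target = target[part]
            # Copy each time: no shared or self-referencing objects in the result.
            return resolve_refs(copy.deepcopy(target), root, limit, _seen + (ref,))
        return {key: resolve_refs(value, root, limit, _seen) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_refs(value, root, limit, _seen) for value in node]
    return node
```

Because every use of a reference gets its own copy, the output is a plain tree that serializes cleanly, at the price of repeating the copy and the resolution work for each occurrence.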

The TL;DR is that for halfway normal use cases, the replacement of dpath that's already been released for a while should solve most of your issues.

I can look into this again if you'd like to provide me with a test data set, and then use that to further profile what's happening. I can also keep that confidential and not make it part of the repository; feel free to send me an email!

