Hi Andrew, currently I'm in the process of performance tweaking quan

Tweak performance of quantify lib about quantify HOT 6 OPEN

spatchcock commented on August 25, 2024

Tweak performance of quantify lib

from quantify.

Comments (6)

cblock commented on August 25, 2024

Just a thought: Why not defining all units an app may use in a yml file, which gets parsed exactly once during startup. The parser then creates Unit singletons for all defined units...

from quantify.

cblock commented on August 25, 2024

Just finished refactoring Dimensions class with its notoriously slow #to_hash method: https://github.com/cblock/quantify/

This brings down spec runtime from ~16 secs to ~5 secs already. The benchmark is even better (Quantity#to_si uses #to_hash internally):

       user     system      total        real
[ Quantity#to_si ]
  11.860000   0.100000  11.960000 ( 11.969939)

         user     system      total        real
[ Quantity#to_si ]
   0.970000   0.000000   0.970000 (  0.972983)

Let me know what you think. Should I send you a pull request for the main branch?

from quantify.

spatchcock commented on August 25, 2024

Hey Carsten. Sounds great, certainly send a pull request.

I like the idea of specifying units once on initialization. That was my intention really when using the config files, although given the 'default' configuration (currently, all units), a custom configuration might involve dumping a load of previously defined units which I don't think is very nice. Trouble is, there has to be some default units (and dimensions), for example, the SI units against which all other units are referenced.

I think I would prefer a system where a smaller set of required units is always instantiated and others can be pulled in if desired, perhaps in a modular fashion (require 'cgs', require 'imperial', etc.), or individually defined. YAML format sounds fine. Currently, you can override the default config by running additional config in an initializer (or elsewhere). I've basically never been able to decide on the best solution or anticipate what would suit others best (convention or configuration?).

The Dimension#to_hash method is mostly used in operations (multiply, divide, etc) between two dimensions. I think in most cases it is passed in to the private #enumerate_base_quantities method which alters the base quantities on the receiver. The memoized hash would probably need refreshing at the end of that method.

I have thought about using singletons to represent units but never got to a satisfactory point. At present, an instance of each method is held in the Unit.units 'module' variable and then cloned anytime it is requested. So we get many instances of each unit existing. Are you suggesting that we have only one (singleton) instance of, say, a kg unit, which is used in all operations involving a kg? That would be nice, but I think there would be complications where compound units and unit operations come in to play, so it might take a bit of work.

There is a lot of fairly messy code in Unit.rb which provides the versatile functionality for accessing units via names, pluralised names and symbols for single and compound units and units within strings. Unit::escape_multi_word_units is part of this functionality. It basically allow multi word units to be recognised within larger strings. I have never been very happy with most of this parsing and regex code. Although it works fairly well, it just seems way to messy. One implication of removing the ability to configure units at runtime would be that a number of these methods (which basically build lists and regexen) would be static and could therefore be memoized.

Cheers

Andrew

from quantify.

cblock commented on August 25, 2024

Hi Andrew,

refactoring dimensions.rb was quite a challenge for me because there are so many back and forth references between Unit, BaseUnit, Dimensions etc. With that experience in mind and after having read your comments above I would suggest the following:

1.) I like your idea of always loading a minimal set of default units (convention) and optionally adding more units through configuration. I guess the minimal set should be unity + si_units, right? Then lets make these ones (immutable) singletons.

2.) Make all units immutable. A unit instance should never change its dimensions during its life.
If we multiply two units the result should be a new (immutable) unit and not an updated original unit. That shouldn't be too hard to realize as you already have this nice concept of compound units and you mostly clone the units right now already. Note: Consolidating units (e.g. m^3/m^2) may then result in a reference to the original base unit instance again. I don't see any reason why this could be a problem.

3.) Rethink some of the class concepts: For example, Dimensions.rb mixes the concept of a single dimension with that of a set of dimensions. Breaking this down into a Dimension.rb and a Dimensions.rb would make both comprehension and testing easier for outsiders like me. Another thing could be the extraction of unit name parsing into a parser class on its own. This would concentrate all parsing code into one class and make Unit.rb more lightweight as it then only has to represent 'a unit'.

For my current web project (where quantify_mongoid is used) I suffer heavily from Unit.for() being really, really slow. The scenario is a batch import of thousands of records, each holding several quantities. To ensure that all imported quantities come with valid units, I have a Unit.for() call for each entry. My idea right now is to speed things up using (cached) Unit#unit_label_regex, Unit#unit_symbol_regex etc. If that's not enough I will digg into Unit.rb next to speed it up too.

Cheers,
Carsten

from quantify.

cblock commented on August 25, 2024

One more thing that comes to my mind: If there's a UnitParser class this class could also maintain a cache of recently parsed units. Like this, complex unit parsing (think 'm² kg/s³ A') is costly only once.

from quantify.

cblock commented on August 25, 2024

And another thing:
Why not bootstrap quantify lib through a configurator class, This is responsible for reading (yml) config and to instantiate the system of known units. After having initialized all units it then initializes (resets) the UnitParser, which triggers (re-)building of the complete set of regexes required to validate and parse units. Like this building regexes (currently one of the most expensive operations - see https://github.com/cblock/quantify/blob/f99b93128ec38ab99c87b25cd432c3fb6b80c1e3/perf/quantity_to_si.profile.pdf) is done only once during (re-) initialization.

from quantify.

Tweak performance of quantify lib about quantify HOT 6 OPEN

Comments (6)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent