I've recently started thinking about how I could abstract a technique I've started usi

Self-initializing workflow data,about deanishe/alfred-workflow

Comments (38)

deanishe commented on July 28, 2024

IMO, all those data are actually volatile, cache data. They're a copy of the canonical data, which are held by Zotero/some other app, and are easily regenerated from the canonical data store(s).

If I were you, I'd be using the existing cached_data() API for that.

Is there a property generating decorator?

You probably want to look into type and metaclasses. It's how SQLAlchemy and other ORMs work (dynamically generating classes based on their configuration).

IMO, that's way too complex for the potential benefits. It works well enough for ORMs because SQL databases have limited and clearly defined data types and relationships. The same is not true of Python's native data types.

I don't think you'd be able to build anything that would make implementing your data model noticeably simpler than just implementing your data model directly.

You could perhaps build a metaclass+class that creates objects with properties that are all connected to calls to cached_data(), but in my experience the right way to use cached_data() is to do updates via run_in_background(). That doesn't fit very well with using properties.

from alfred-workflow.

deanishe commented on July 28, 2024

I'm closing this issue seeing as it appears dormant. Feel free to reopen.

from alfred-workflow.

$fractaledmind avatar$ fractaledmind commented on July 28, 2024

Well when I push the beta version of ZotQuery, you can see the solution I made. I use a base class and a decorator. It is completely abstracted, so it could conceivably be added to Alfred Workflow, though it is more complex than any of the API features currently. Anyways, you can comment on the code when I push it to Github.

EDIT: I've just pushed a sample Gist of my setup here. Please do let me know what you think. Regardless of whether this is something that could aid Alfred-Workflow, I'd like to get the code as rock solid as possible.

from alfred-workflow.

deanishe commented on July 28, 2024

I don't really understand the purpose of the code.

Why bother serialising the object to disk/Keychain if you can't update the value of the properties? The values loaded from disk/Keychain will always be the same as the ones hard-coded in the class definition.

What am I missing?

from alfred-workflow.

$fractaledmind avatar$ fractaledmind commented on July 28, 2024

The MyClass class is a sample. In my real code, the properties are searching for data/preferences. That is, the possible results are dynamic. The point of the example code in the gist is to show the primary benefits: syntactical simplicity and speed. The time delay in the second property is meant to mimic the processing time of some code. On run one, it takes 2.2 s to print the properties. On run two, it takes 0.1 s. For any data that you generate and access more than twice, this speed increase is worth considering.

from alfred-workflow.

deanishe commented on July 28, 2024

How does it handle updates to the underlying data?

from alfred-workflow.

$fractaledmind avatar$ fractaledmind commented on July 28, 2024

If by "updates to the underlying data" you mean setting the property to another value, then I have just finished a (somewhat hacky) solution. The basic idea was to use this for data that wouldn't change. That is, it would only have 3 states: [1] not processed (the code to find the data hasn't been run), [2] processed but no value found (e.g. looking up a preference in Zotero that hasn't been set yet), and [3] processed, found, stored. The code was originally written only to handle these three states. However, I have expanded it to allow for some mutable data. This, however, is hacky and particular to ZotQuery, so I think we leave this closed and let it pass. I have code that works really well for exactly what I need. There is no need to push it any further (tho I will update the Gist here momentarily so that you can at least read the final expanded version).

from alfred-workflow.

deanishe commented on July 28, 2024

In principle, I like the idea of a smart data model that automatically persists itself. The trick is finding a model that is both powerful and simple.

I think the solution is almost certainly using metaclasses (it's how ORMs do all their magic).

from alfred-workflow.

$fractaledmind avatar$ fractaledmind commented on July 28, 2024

Well, hopefully I can get back to this at some point. For now, my code works, so I need to ship.

from alfred-workflow.

deanishe commented on July 28, 2024

That's the spirit!

from alfred-workflow.

$fractaledmind avatar$ fractaledmind commented on July 28, 2024

In working on another project, I came back upon the need for this functionality, but didn't like my old implementation. Having a base class and a decorator is too much work and too finicky to implement. I also think the API should be more narrow in functionality.

As I see it, the primary use case is to persist the data returned by a function/method that the workflow author deems to be generally non-volatile (by this I mean, the data is unlikely to change if the function/method were run again) and/or time-intensive to generate (by this I mean, the function/method takes a non-trivial amount of time to run). Obviously, any data that is both time-intensive to generate and generally non-volatile would be helped by persistence.

I propose, at least at first, a very simple API to handle only this scenario (that is, not a full-fledged smart data model). It is comprised of two functions: @persist and refresh() (the naming might not be perfect, I've considered @persist, @persistent, @cache, and @cached for the decorator name; refresh(), reset(), and reload() for the function name).

You simply decorate a method or a function with the @persist decorator and it uses Workflow.cached_data() under the covers to cache the data if it doesn't exist, or read read the data if it does:

def persist(func):
    @functools.wraps(func)
    def func_wrapper(*args, **kwargs):
        # Get dictionary of args and kwargs
        args_map = inspect.getcallargs(func, *args, **kwargs) if args or kwargs else {}
        # If `self` one of the args, `func` is a bound method
        if 'self' in args_map:
            class_name = args_map['self'].__class__.__name__
        # Else, `func` is simply an unbound function
        else:
            class_name = ''
        func_name = func.__name__
        cache_name = ':'.join([class_name, func_name])
        return WF.cached_data(cache_name,
                              lambda: func(*args, **kwargs),
                              max_age=0)
    return func_wrapper

So, on first run, the function/method is run and the return value is saved to a cache file. On any subsequent run, the value is simply read from the cache value and returned.

If the workflow author wants, at any point to force the function/method to re-run (that is, regenerate the data and not read the old data from the cache), he can use the refresh() function.

def refresh(func, *args, **kwargs):
    try:
        class_name = func.__self__.__class__.__name__
    except AttributeError:
        class_name = ''
    meth_name = func.__name__
    cache_name = ':'.join([class_name, meth_name])
    WF.clear_cache(lambda f: cache_name in f)
    return func(*args, **kwargs)

This simply deletes the old cache file and reruns the function (thus creating a new cache file with the returned value).

Thoughts? To specific to be helpful in the library? Don't like the implementation? Don't see it as generally helpful?

from alfred-workflow.

deanishe commented on July 28, 2024

I don't really understand what advantage this has over wf.cached_data() and wf.cache_data(data=None).

I see a couple of minor issues with the implementation:

It's better not to use : in the filenames, as colon is the classic Mac path separator. A colon will be shown as : in a shell, but as / in Finder. It shouldn't cause any issues, but can be confusing.
Your implementation relies on they're being a global Workflow instance named WF. If it needs a Workflow instance, it'd probably be safer implemented as methods on the Workflow class. That would seriously restrict its usefulness, however.

It might be worth considering for v2 (where the caching API won't be tied to a Workflow instance) as an alternative for wf.cached_data() etc., but the decorator would have to support a max_age argument.

FWIW, this technique is often called "memoization".

from alfred-workflow.

$fractaledmind avatar$ fractaledmind commented on July 28, 2024

The primary advantage, to my mind, is code readability. I have so many properties in classes that serve as the information backend for a workflow that rarely change, but require a good bit of computing (whether that be measured by computing time or lines of code). If the logic requires numerous lines of code, in order to use the simply wf.cached_data() design, I would have something like this:

@property
def first_property()
    def wrapper():
        if some_conditional:
            do some logic
        elif other_conditional:
            do other logic
        else
            do default logic

    return wf.cached_data('first_property', wrapper, max_age=0)

My thinking was to make the property method itself the "wrapper" and use the decorator to reduce complexity and actually increase semantic clarity (e.g. saying this property or function should be "persisted" is shorter and clearer than writing an internal wrapper function and passing that to wf.cached_data()).

Other points:

The delimiter can be anything.
It would certainly be more useful in V2 when cached_data() is no longer tied to a Workflow instance, I agree.

from alfred-workflow.

deanishe commented on July 28, 2024

In that case, I guess it would save a few lines of code per property.

Obviously, the delimiter can be anything. I was just pointing out that colon is a bad choice.

As noted, I don't think there's a place for it in v1 of Alfred-Workflow due its needing a global Workflow instance with a specific name. Too brittle, imo.

Having a @cached decorator that takes optional max_age and name arguments might be interesting for v2, however.

from alfred-workflow.

$fractaledmind avatar$ fractaledmind commented on July 28, 2024

Will keep in mind for v2 then.

from alfred-workflow.

deanishe commented on July 28, 2024

The more I think about it, the more I like the idea of a @cached decorator.

The default cache name would have to be wisely chosen to prevent clashes if a user calls every data retrieval function get_data(). Probably based on filepaths, as I think module.func_name won't work, as the current script is always called __main__.

The question is how to explicitly delete a cache. Perhaps that would only be possible for caches with explicit names, e.g. @cached(name='stuff').

from alfred-workflow.

$fractaledmind avatar$ fractaledmind commented on July 28, 2024

We could simply use the function name, filename, and line number to create the cache name. Inside of a decorator, you can get this info via the inspect module with attributes like this:

func.func_code.co_firstlineno
func.func_code.co_filename
# could get just filename via os.path.basename(func.func_code.co_filename)
func.func_code.co_name

So, we could do something like <file>.<function>.<line #> as the cache name. This would ensure unique names for all functions.

I would then re-recommend having some function like refresh() that takes the function object and deletes the old cache before re-running the decorated function.

One trick I've discovered in playing around with this though, is that any refresh() like function that accepts a decorated function has to do a little bit of work to get the correct line number. If you were to print(func.func_code.co_firstlineno) from inside refresh(), you would get the line number of the wrapper inside the decorator. In order to get the line-number of the function, you need to use next(c.cell_contents.func_code.co_firstlineno for c in func.func_closure). You need to use an analogous call to get function name and file name. However, you can get all of that information and it will match.

from alfred-workflow.

$fractaledmind avatar$ fractaledmind commented on July 28, 2024

Here's my old sample code rewritten using this new methodology:

#!/usr/bin/python
# encoding: utf-8
from os.path import basename
import time
import functools
from workflow import Workflow


def refresh(func, *args, **kwargs):
    decorated_func = next(c.cell_contents.func_code
                          for c in func.func_closure)
    file_name = basename(decorated_func.co_filename).replace('.py', '')
    func_name = decorated_func.co_name
    line_num = str(decorated_func.co_firstlineno)
    cache_name = '.'.join([file_name, func_name, line_num])
    WF.clear_cache(lambda f: cache_name in f)
    return func(*args, **kwargs)


def persist(func):
    @functools.wraps(func)
    def func_wrapper(*args, **kwargs):
        file_name = basename(func.func_code.co_filename).replace('.py', '')
        func_name = func.func_code.co_name
        line_num = str(func.func_code.co_firstlineno)
        cache_name = '.'.join([file_name, func_name, line_num])
        return WF.cached_data(cache_name,
                              lambda: func(*args, **kwargs),
                              max_age=0)
    return func_wrapper


class MyClass(object):
    def __init__(self):
        pass

    @persist
    def first_property(self):
        # some code to get data
        return 'this is my first property'

    @persist
    def second_property(self):
        time.sleep(2)
        return 'this is my second property'


@persist
def f1():
    time.sleep(2)
    return 'this is a simple function'


@persist
def f2(x):
    time.sleep(2)
    return 'this is another function. It takes a param: {}'.format(x)


if __name__ == '__main__':
    WF = Workflow()
    m = MyClass()
    print(m.second_property())
    print(refresh(m.second_property))
    print(f2('hello'))
    print(refresh(f2, 'world'))

One thing I've realized is that using the line number will break things, obviously, if the line number changes. This is a problem. We probably shouldn't use the line number. So, the question is, how do we distinguish between functions of the same name?

from alfred-workflow.

deanishe commented on July 28, 2024

Came here to say "no line numbers", so yeah.

If you use filepath.classname.methodname or filepath.funcname, they must be unique.

The refresh() function is pushing quite hard on my "too much magic" limit, truth be told. I think I'd prefer to require named caches for that.

Perhaps if you used a more straightforward option and just added a _cache_name attribute to the decorated function, the code would be much less hairy.

from alfred-workflow.

deanishe commented on July 28, 2024

Also, looking at the code above, it doesn't really fit. I mean, you've decorated a property, which appears to mean it now needs to be called. Is there no way for it to continue to work like a property?

from alfred-workflow.

$fractaledmind avatar$ fractaledmind commented on July 28, 2024

Yeah, that sample code was written back when I was only wanting to decorate properties. I have since broadened the scope, so I should go and rename those.

I confess to not fully following your logic here:

Perhaps if you used a more straightforward option and just added a _cache_name attribute to the decorated function, the code would be much less hairy.

But I see your point about refresh() being too magical.

from alfred-workflow.

deanishe commented on July 28, 2024

The first decorator (@persist or @cached or whatever) can attach the cache name to the function when it wraps it. There's then no need for any subsequent functions to try to figure out the cache name, as it's stored in an attribute of the function passed to it.

from alfred-workflow.

$fractaledmind avatar$ fractaledmind commented on July 28, 2024

Wow. That actually kinda blew my mind. Once I thought about it for a second, it made total sense, but I've never thought of functions having attributes. That's kinda awesome.

The problem, in this scenario, is that I can't quite see how to set the _cache_name attribute when the function is a bound method. Let me explain, as far as I've been able to gather, getting the name of the class for a method within a decorator on that method is quite hard; primarily because that method has not been bound to the class yet, at the moment the decorator does its thing. The one solution I have found is to use inspect.getcallargs(func, *args, **kwargs) (I found this methodology on this blog post). This will grant you access to the self parameter of a method, from which you can get the class name. So, to focus on this issue directly, you would need a decorator like this:

def cached(name=None):
    def decorator(func):
        file_name = basename(func.func_code.co_filename).replace('.py', '')
        func_name = func.__name__
        def wrapper(*args, **kwargs):
            args_map = inspect.getcallargs(func, *args, **kwargs) or {}
            class_name = args_map['self'].__class__.__name__
            cache_name = '.'.join([file_name, class_name, func_name])
            # if user passed in a cache name, use that no matter what
            if name:
                cache_name = name
            return WF.cached_data(cache_name,
                                  lambda: func(*args, **kwargs),
                                  max_age=0)
        return wrapper
    return decorator

You have to be inside the wrapper to have access to the *args and **kwargs needed for inspect.getcallargs(). Now, the problem with setting an attribute on the function is that you need to set the attribute on the wrapper() function, since this is replacing the wrapped function (Martijn Pieters' SO answer was quite clarifying to me on this point). But, how does the decorator() get access to the wrapper()s cache_name var? In order to set the attribute, you would need to have wrapper._cache_name = cache_name right before return wrapper, but at that point, cache_name is undefined.

So, either you know more and can solve this, or we need to drop the class name from the _cache_name generating logic (would this allow for name collisions?)

from alfred-workflow.

deanishe commented on July 28, 2024

What about a class-based decorator? You can use instance variables to store cache_name.

Alternatively, use a non-local variable via the dict trick (you can alter non-local vars, just not assign them):

def decorator(func):
    d = {}
    def wrapper(*args, **kwargs):
        d['cache_name'] = …
        …
    wrapper.cache_name = d['cache_name']
    return wrapper

To be perfectly honest, I think I'd still prefer a decorator that requires an explicit name. Much less magical, and if anything goes wrong, it's the user's fault 😉

from alfred-workflow.

deanishe commented on July 28, 2024

Also FWIW, if you're after caching a whole buttload of data, it might be smarter to have specific data classes and use a metaclass to take care of the caching. That's how most ORMs do their thing, and I sense you like magic.

from alfred-workflow.

$fractaledmind avatar$ fractaledmind commented on July 28, 2024

Well, I'm all for stronger param enforcement. In fact, it might be a little prone to user-breaking if we only have one param (name) and it's optional, because using the decorator like this:

@cached
def function():
    pass

will result in some AttributeError somewhere along the stack, since you would need to call the decorator, even without the optional name param, as @cached().

Plus, that requiring the name param more closely mirrors the cached_data() functionality that this is essentially wrapping.

Also, I strive for a nice balance of clarity and magic. I've just read too many fantasy books, so I tend toward the magic end of that balancing act :)

from alfred-workflow.

$fractaledmind avatar$ fractaledmind commented on July 28, 2024

Would this work as the decorator then?

def cached(name):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return WF.cached_data(name,
                                  lambda: func(*args, **kwargs),
                                  max_age=0)
        wrapper._cache_name = name
        return wrapper
    return decorator

And this would be the refresh() method:

def refresh(func, *args, **kwargs):
    WF.clear_cache(lambda f: func._cache_name in f)
    return func(*args, **kwargs)

Much simpler, much cleaner code.

I guess we could also allow for max_age to be set via the decorator, using an optional param.

from alfred-workflow.

deanishe commented on July 28, 2024

max_age has to be an option, or it's a functional regression vs v1.

To be honest, I'd prefer a clear(cache_name) to refresh(). It's more flexible (you can clear a cache without also updating it). This already exists with wf.cache_data(cache_name, None), however.

Calling refresh(func) just looks weird to me. Maybe it's the name.

from alfred-workflow.

$fractaledmind avatar$ fractaledmind commented on July 28, 2024

Ok. So if you want to re-generate any data via function/method, you simply do

clear(cache_name)
function()

with the explicit two-liner?

I can get down with that.

So that would be:

def clear(cache_name):
    WF.clear_cache(lambda f: cache_name in f)

from alfred-workflow.

$fractaledmind avatar$ fractaledmind commented on July 28, 2024

Should we change the decorator name to @cached_as(cache_name) to make it more explicit?

from alfred-workflow.

deanishe commented on July 28, 2024

Theoretically, it would be wf.cache_data(name, None) to clear one specific cache, but we're talking about v2 and the caching API isn't done yet (I'm stuck on the implementation of the serialisers, which has to come first).

I think @cached('blah') is clear enough. I would associate *_as() with conversion, personally.

BTW, have you tried out the Dash docset for AW? The docs are going to need a new theme to work properly with Dash, but I think a Dash docset would be a cool thing to have.

from alfred-workflow.

$fractaledmind avatar$ fractaledmind commented on July 28, 2024

I actually just redownloaded Dash on my new work computer two days ago, and got AW docs in there yesterday. You are right tho, having a Dash docset is great. I'm starting to get more consistent with heading there first, tho my old SO habits do die hard. I like the new logo BTW. Didn't notice when it was added, but I like it. Tho in Dash, given its smaller size, it looks more like a WWI helmet than a butler's hat.

from alfred-workflow.

deanishe commented on July 28, 2024

Also, would a corresponding @stored(name, format=?) be necessary? I suppose it should, all things considered.

Perhaps the two could be combined by having a volatile=False option. Though "volatile" has always struck me as not particularly descriptive.

from alfred-workflow.

$fractaledmind avatar$ fractaledmind commented on July 28, 2024

Since it would be trivial to implement, and some asshole like me is bound to come demanding it 😉, it's probably worth putting @stored(name, format) in v2. I think keeping them separate is probably best:

It mirrors the current API, which most workflow authors will probably already be familiar with
@cached() doesn't allow for changing the serializer via param, like @stored() does.

from alfred-workflow.

deanishe commented on July 28, 2024

And after 30 seconds' more thought, sod it.

@cached() should just be a convenience for the simplest cases. It'd be no use, for example, for caching data from the same function under different names.

from alfred-workflow.

deanishe commented on July 28, 2024

Demanding people like you will just have to write their own wrapper functions 😉

from alfred-workflow.

$fractaledmind avatar$ fractaledmind commented on July 28, 2024

👍

from alfred-workflow.

deanishe commented on July 28, 2024

I like the new logo BTW. Didn't notice when it was added, but I like it. Tho in Dash, given its smaller size, it looks more like a WWI helmet than a butler's hat.

I added the logo while trying to build a Dash docset. It definitely doesn't look so good in Dash, but it's the best I can do. I tried a bunch of things, and this looked the least bad in Dash.

My normal strategy with icons is to hope that Jono takes a shine to the workflow and whips up his own icon for it, and then sends it to me, the talented bastard.

from alfred-workflow.

Self-initializing workflow data about alfred-workflow HOT 38 CLOSED

Comments (38)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent