
cachew's Introduction

What is Cachew?

TLDR: cachew lets you cache function calls into an sqlite database on your disk with a single decorator (similar to functools.lru_cache). The difference from functools.lru_cache is that the cached data is persisted between program runs, so the next time you call your function, it will only be a matter of reading from the cache. The cache is invalidated automatically if your function's arguments change, so you don't have to think about maintaining it.

In order to be cacheable, your function needs to return a simple data type, or an Iterator over such types.

A simple type is defined as:

  • a primitive: str, int, float, bool, as well as datetime, date, and Exception
  • a NamedTuple or dataclass composed of simple types
  • an Optional or Union of simple types
  • a List, Tuple, Sequence, or Dict of simple types

This allows cachew to automatically infer the schema from type hints (PEP 526), so you don't have to think about serializing/deserializing. Thanks to type hints, you don't need to annotate your classes with any special decorators, inherit from special base classes, etc., as is often the case with serialization libraries.
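For instance, a plain dataclass with type hints is enough (a minimal sketch; the Article type here is made up for illustration):

from dataclasses import dataclass
from datetime import datetime
from typing import Iterator

from cachew import cachew

@dataclass
class Article:
    published: datetime
    title: str

@cachew  # schema is inferred from the type hints; no special base classes needed
def articles() -> Iterator[Article]:
    yield Article(published=datetime(2019, 8, 30), title='example')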

Motivation

I often find myself processing big chunks of data, merging data together, computing aggregates over it, or extracting the few bits I'm interested in. While I try to utilize the REPL as much as I can, some things are still fragile, and often you just have to rerun the whole thing in the process of development. This can be frustrating if data parsing and processing takes seconds, let alone minutes in some cases.

The conventional way of dealing with this is serializing the results along with some sort of hash (e.g. md5) of the input files, comparing it on the next run, and returning the cached data if nothing has changed.

Simple as it sounds, it is pretty tedious to do every time you need to memoize some data; it contaminates your code with routine and distracts you from your main task.

Examples

Processing Wikipedia

Imagine you're working on a data analysis pipeline for some huge dataset, say, extracting urls and their titles from a Wikipedia archive. Parsing it (the extract_links function) takes hours; however, as long as the archive is the same, you will always get the same results. So it would be nice to be able to cache the results somehow.

With this library, you can achieve that with a single @cachew decorator.

>>> from cachew import cachew
>>> from typing import NamedTuple, Iterator
>>> class Link(NamedTuple):
...     url: str
...     text: str
...
>>> @cachew
... def extract_links(archive_path: str) -> Iterator[Link]:
...     for i in range(5):
...         # simulate slow IO
...         # this function runs for five seconds for the purpose of demonstration, but realistically it might take hours
...         import time; time.sleep(1)
...         yield Link(url=f'http://link{i}.org', text=f'text {i}')
...
>>> list(extract_links(archive_path='wikipedia_20190830.zip')) # that would take about 5 seconds on first run
[Link(url='http://link0.org', text='text 0'), Link(url='http://link1.org', text='text 1'), Link(url='http://link2.org', text='text 2'), Link(url='http://link3.org', text='text 3'), Link(url='http://link4.org', text='text 4')]

>>> from timeit import Timer
>>> res = Timer(lambda: list(extract_links(archive_path='wikipedia_20190830.zip'))).timeit(number=1)
... # second run is cached, so should take less time
>>> print(f"call took {int(res)} seconds")
call took 0 seconds

>>> res = Timer(lambda: list(extract_links(archive_path='wikipedia_20200101.zip'))).timeit(number=1)
... # now file has changed, so the cache will be discarded
>>> print(f"call took {int(res)} seconds")
call took 5 seconds

When you call extract_links with the same archive, you start getting results in a matter of milliseconds, as fast as sqlite can read them.

When you use a newer archive, archive_path changes, which makes cachew invalidate the old cache and recompute the data, so you don't need to think about maintaining it separately.

Incremental data exports

This is my most common use case for cachew, which I'll illustrate with an example.

I'm using an environment sensor to log stats about temperature and humidity. Data is synchronized via bluetooth into an sqlite database, which is easy to access. However, the sensor has limited memory (e.g. the 1000 latest measurements). That means I end up with a new database every few days, each containing only a slice of the data I need, e.g.:

...
20190715100026.db
20190716100138.db
20190717101651.db
20190718100118.db
20190719100701.db
...

To access all of historic temperature data, I have two options:

  • Go through all the data chunks every time I want to access them and 'merge' them into a unified stream of measurements, e.g. something like:

    def measurements(chunks: List[Path]) -> Iterator[Measurement]:
        for chunk in chunks:
            # read measurements from 'chunk' and yield unseen ones
            ...
    

    This is very easy, but slow, and you waste CPU for no reason every time you need the data.

  • Keep a 'master' database and write code to merge chunks into it.

    This is very efficient, but tedious:

    • requires serializing/deserializing data -- boilerplate
    • requires manually managing sqlite database -- error prone, hard to get right every time
    • requires careful scheduling, ideally you want to access new data without having to refresh cache

Cachew gives you the best of both worlds and makes it both easy and efficient. The only thing you have to do is decorate your function:

@cachew
def measurements(chunks: List[Path]) -> Iterator[Measurement]:
    # ...
  • as long as chunks stay the same, the data stays the same, so you always read from the sqlite cache, which is very fast

  • you don't need to maintain the database; the cache is automatically refreshed when chunks change (i.e. you've got new data)

    All the complexity of handling the database is hidden in the cachew implementation.
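For reference, here is a sketch of what the decorated function could look like (the Measurement fields and the readings table/column names are made up for this example; a real implementation would deduplicate more carefully):

import sqlite3
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path
from typing import Iterator, List

from cachew import cachew

@dataclass
class Measurement:
    dt: datetime
    temperature: float
    humidity: float

@cachew
def measurements(chunks: List[Path]) -> Iterator[Measurement]:
    seen = set()  # deduplicate measurements that appear in overlapping chunks
    for chunk in chunks:
        conn = sqlite3.connect(chunk)
        try:
            for ts, temp, hum in conn.execute('SELECT ts, temperature, humidity FROM readings'):
                if ts in seen:
                    continue
                seen.add(ts)
                yield Measurement(dt=datetime.fromtimestamp(ts), temperature=temp, humidity=hum)
        finally:
            conn.close()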

How it works

  • first, your objects get converted into a simpler JSON-like representation
  • after that, they are mapped into byte blobs via orjson

When the function is called, cachew computes the hash of your function's arguments and compares it against the previously stored hash value.

  • If they match, cachew deserializes and yields whatever is stored in the cache database
  • If they don't match, the original function is called, and the new data is stored along with the new hash
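To make this concrete, here is a toy, file-based sketch of the same control flow (not cachew's actual implementation: it stores JSON lines instead of sqlite blobs and assumes func yields JSON-serializable objects):

import json
from pathlib import Path
from typing import Any, Callable, Iterator

def toy_cache(cache_path: Path, func: Callable[..., Iterator[Any]], *args: Any) -> Iterator[Any]:
    hash_file = cache_path.with_suffix('.hash')
    data_file = cache_path.with_suffix('.jsonl')
    new_hash = repr(args)  # default behaviour: string representation of the arguments
    old_hash = hash_file.read_text() if hash_file.exists() else None
    if new_hash == old_hash:
        # hash matched: deserialize and yield whatever is stored in the cache
        with data_file.open() as f:
            for line in f:
                yield json.loads(line)
    else:
        # hash mismatched: call the original function, store the new data along with the new hash
        with data_file.open('w') as f:
            for obj in func(*args):
                f.write(json.dumps(obj) + '\n')
                yield obj
        hash_file.write_text(new_hash)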

Features

Performance

Updating the cache incurs some overhead, but how much depends on how complicated your datatype is in the first place, so I'd suggest measuring if you're not sure.

When reading from the cache, all that happens is reading blobs from sqlite, decoding them as JSON, and mapping them onto your target datatype, so the overhead depends on each of these steps.

It would almost certainly make your program faster if your computations take more than several seconds.

You can find some of my performance tests in the benchmarks/ dir, and the tests themselves in src/cachew/tests/marshall.py.

Using

See the docstring for up-to-date documentation on parameters and return types. You can also use the extensive unit tests as a reference.

Some useful (but optional) arguments of @cachew decorator:

  • cache_path can be a directory, or a callable that returns a path and depends on the function's arguments.

    By default, settings.DEFAULT_CACHEW_DIR is used.

  • depends_on is a function which determines whether your inputs have changed and hence whether the cache needs to be invalidated.

    By default, it just uses the string representation of the arguments; you can also specify a custom callable.

    For instance, it can be used to discard the cache if the input file was modified.

  • cls is the type that will be serialized.

    By default, it is inferred from the return type annotation, but it can be specified explicitly if you don't control the code you want to cache.
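Putting these together (a hedged sketch: the Link type mirrors the Wikipedia example above, and the specific cache_path and depends_on values are just for illustration):

from pathlib import Path
from typing import Iterator, NamedTuple

from cachew import cachew

class Link(NamedTuple):
    url: str
    text: str

@cachew(
    cache_path='/tmp/cachew-demo',                 # could also be a callable of the arguments
    depends_on=lambda path: path.stat().st_mtime,  # discard the cache if the input file was modified
    cls=Link,                                      # explicit, in case the return type can't be inferred
)
def extract_links(path: Path) -> Iterator[Link]:
    yield Link(url='http://example.org', text='example')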

Installing

Package is available on pypi.

pip3 install --user cachew

Developing

I'm using tox to run tests, and GitHub Actions for CI.

Implementation

  • why NamedTuples and dataclasses?

    NamedTuple and dataclass provide a very straightforward and self-documenting way to represent data in Python. The very compact syntax makes them extremely convenient even as a one-off means of communicating between a couple of functions.

    If you want to find out more about why you should use more dataclasses in your code, I suggest these links:

  • why not pandas.DataFrame?

    DataFrames are great and can be serialized to csv or pickled. They are good to have as one of the ways you can interface with your data, but they're hardly convenient to reason about abstractly due to their dynamic nature. They also can't be nested.

  • why not ORM?

    ORMs tend to be pretty invasive, which might complicate your scripts or even ruin performance. They're also somewhat of an overkill for such a specific purpose.

    • E.g. SQLAlchemy requires using custom SQLAlchemy-specific types and inheriting from a base class. It also doesn't support nested types.
  • why not pickle or marshmallow or pydantic?

    Pickling is kinda heavyweight for plain data classes; it's slower than just using JSON. Lastly, pickle can only be loaded via Python, whereas JSON + sqlite has numerous bindings and tools to explore and interface with it.

    Marshmallow is a common way to map data into a db-friendly format, but it requires an explicit schema, which is overhead when you already have it in the form of type annotations. I've looked at existing projects that utilize type annotations, but didn't find them covering all I wanted:

    I wrote up an extensive review of the alternatives I considered: see doc/serialization.org. So far it looks like only cattrs comes somewhere close to the feature set I need, but still not quite.

  • why sqlite database for storage?

    It's pretty efficient, and iterables (i.e. sequences) map onto database rows in a very straightforward manner, plus we get some concurrency guarantees.

    There is also a somewhat experimental backend which uses a simple (jsonl-like) file for storage; you can use it via @cachew(backend='file'), or via settings.DEFAULT_BACKEND. It's slightly faster than sqlite judging by the benchmarks, but unless you're caching millions of items this shouldn't really be noticeable.

    It would also be interesting to experiment with in-RAM storage.

    I had a go at Redis as well, but performance for writing to the cache was pretty bad. That said, it could still be interesting for distributed caching if you don't care too much about performance.

Tips and tricks

Optional dependency

You can benefit from cachew even if you don't want to bloat your app's dependencies. Just use the following snippet:

def mcachew(*args, **kwargs):
    """
    Stands for 'Maybe cachew'.
    Defensive wrapper around @cachew to make it an optional dependency.
    """
    try:
        import cachew
    except ModuleNotFoundError:
        import warnings

        warnings.warn('cachew library not found. You might want to install it to speed things up. See https://github.com/karlicoss/cachew')
        return lambda orig_func: orig_func
    else:
        return cachew.cachew(*args, **kwargs)

Now you can use @mcachew in place of @cachew, and be certain things don't break if cachew is missing.
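For instance (a brief sketch, assuming the mcachew definition above is in scope; parse and its depends_on are made up for illustration):

from pathlib import Path
from typing import Iterator

@mcachew(depends_on=lambda path: path.stat().st_mtime)
def parse(path: Path) -> Iterator[str]:
    yield from path.read_text().splitlines()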

Settings

cachew.settings exposes some parameters that allow you to control cachew behaviour:

  • ENABLE: set to False if you want to disable caching without removing the decorators (useful for testing and debugging). You can also use the cachew.extra.disabled_cachew context manager to do it temporarily.
  • DEFAULT_CACHEW_DIR: override to set a different base directory. The default is the "user cache directory" (see the appdirs docs).
  • THROW_ON_ERROR: by default, cachew is defensive and simply falls back to calling the original function if it runs into caching issues. Set this to True to raise instead, so you catch errors earlier.
  • DEFAULT_BACKEND: currently supported are sqlite and file (file is somewhat experimental, although it should work too).
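For example (a minimal sketch; the specific values are illustrative, and disabled_cachew is assumed to be used as a regular context manager):

from pathlib import Path

import cachew
from cachew.extra import disabled_cachew

cachew.settings.ENABLE = False                               # disable caching globally
cachew.settings.DEFAULT_CACHEW_DIR = Path('/tmp/my-cachew')  # use a different base directory
cachew.settings.THROW_ON_ERROR = True                        # raise on caching issues instead of falling back
cachew.settings.DEFAULT_BACKEND = 'file'                     # or 'sqlite' (the default)

with disabled_cachew():
    pass  # cachew is turned off within this block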

Updating this readme

This is a literate readme, implemented as a Jupyter notebook: README.ipynb. To update the (autogenerated) README.md, use the generate-readme script.


cachew's Issues

Doesn't serialize custom types in list/dict

Only recently I realized that I could put Dict/List items as values on a NamedTuple cached by cachew; previously I was doing some weird stuff

So, in the process of switching more things to support cachew, I ran into an issue here, when one of the items in the List being serialized by cachew couldn't be converted to JSON

from datetime import datetime
from typing import List, NamedTuple, Iterator
from cachew import cachew

class action(NamedTuple):
    dt: datetime
    val: int


class wrapped(NamedTuple):
    actions: List[action]  # list causes this to be serialized with json.dumps
    val: str


@cachew
def values() -> Iterator[wrapped]:
    yield wrapped(actions=[action(datetime.now(), 5)], val="something")

list(values())
list(values())

generates quite the error:

cachew: error while setting up cache, falling back to non-cached version
(builtins.TypeError) Object of type datetime is not JSON serializable
[SQL: INSERT INTO "table" (actions, val) VALUES (?, ?)]
Traceback (most recent call last):
  File "/home/sean/.local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1204, in _execute_context
    context = constructor(dialect, self, conn, *args)
  File "/home/sean/.local/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 839, in _init_compiled
    param.append(processors[key](compiled_params[key]))
  File "/home/sean/.local/lib/python3.9/site-packages/sqlalchemy/sql/type_api.py", line 1232, in process
    return process_param(value, dialect)
  File "/home/sean/.local/lib/python3.9/site-packages/cachew/__init__.py", line 163, in process_bind_param
    return json.dumps(value)
  File "/usr/lib/python3.9/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.9/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.9/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.9/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type datetime is not JSON serializable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/sean/.local/lib/python3.9/site-packages/cachew/__init__.py", line 1006, in cachew_wrapper
    flush()
  File "/home/sean/.local/lib/python3.9/site-packages/cachew/__init__.py", line 993, in flush
    conn.execute(values_table.insert().values(chunk))
  File "/home/sean/.local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
    return meth(self, multiparams, params)
  File "/home/sean/.local/lib/python3.9/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/home/sean/.local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
    ret = self._execute_context(
  File "/home/sean/.local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1206, in _execute_context
    self._handle_dbapi_exception(
  File "/home/sean/.local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
    util.raise_(
  File "/home/sean/.local/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/home/sean/.local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1204, in _execute_context
    context = constructor(dialect, self, conn, *args)
  File "/home/sean/.local/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 839, in _init_compiled
    param.append(processors[key](compiled_params[key]))
  File "/home/sean/.local/lib/python3.9/site-packages/sqlalchemy/sql/type_api.py", line 1232, in process
    return process_param(value, dialect)
  File "/home/sean/.local/lib/python3.9/site-packages/cachew/__init__.py", line 163, in process_bind_param
    return json.dumps(value)
  File "/usr/lib/python3.9/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.9/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.9/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.9/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
sqlalchemy.exc.StatementError: (builtins.TypeError) Object of type datetime is not JSON serializable
[SQL: INSERT INTO "table" (actions, val) VALUES (?, ?)]
(the same error and traceback are then printed a second time, for the second list(values()) call)

I'm not expecting this to be fixed, as I understand the problems with serializing/deserializing from JSON; I was just stuck here for a while since the readme says it should work for lists and datetime, so I wasn't sure why there was an error (it's being thrown here)

Makes sense that it's using JSON, as otherwise you would have to maintain an intersection table and map ids from another table onto list/dict items, which sounds like a pain

Anyways, as far as any changes: either a better warning message could be raised, and/or the documentation could be updated to better reflect this pitfall, so that no one else runs into the cryptic error in the future?

infer_type does not work for return type Optional[]

It seems that an optional return type does not work as expected. Extending the test case def test_optional(tdir) results in an error:

    @cachew(tdir)
    def data() -> Optional[Job]:
        return None

Error:

...
        @cachew(tdir)
>       def data() -> Optional[Job]:
test_cachew.py:497:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../__init__.py:694: in <lambda>
    return lambda realf: f(realf, *args, **kwargs)
../__init__.py:789: in cachew
    inferred = infer_type(func)
../__init__.py:667: in infer_type
    if not issubclass(rtype.__origin__, Iterable):
/usr/local/Cellar/[email protected]/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/typing.py:835: in __subclasscheck__
    return issubclass(cls, self.__origin__)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

cls = <class 'collections.abc.Iterable'>, subclass = typing.Union

    def __subclasscheck__(cls, subclass):
        """Override for issubclass(subclass, cls)."""
>       return _abc_subclasscheck(cls, subclass)
E       TypeError: issubclass() arg 1 must be a class

disable cachew for particular modules

not sure if this is possible without a slight API change (or maybe some sys._getframe hackery like:

diff --git a/my/core/cachew.py b/my/core/cachew.py
index 7dd62d2..08b7d41 100644
--- a/my/core/cachew.py
+++ b/my/core/cachew.py
@@ -108,6 +108,10 @@ def _mcachew_impl(cache_path=_cache_path_dflt, **kwargs):
         warnings.warn('cachew library not found. You might want to install it to speed things up. See https://github.com/karlicoss/cachew')
         return lambda orig_func: orig_func
     else:
+        # get the actual module name
+        mod = sys._getframe(1).f_locals['__name__']
+        print(f'module: {mod}')
+        # check if its disabled...
         kwargs['cache_path'] = cache_path
         return cachew.cachew(**kwargs)
 

but it would be cool to be able to quickly enable/disable cachew for a particular module, like what's done with the LOGGING_LEVEL_ prefixes

I can work on this if you're cool with adding it (let me know if you have a better idea...)

Add Redis support

Add Redis support as an alternative to sqlite.

This would be a great feature, as it would make this solution easier to use in an enterprise production environment: getting a redis instance shared amongst multiple instances of your app is very easy and cost-effective.

support pathlib.Path

Path is a trivial wrapper around str. I guess, more generally, we should think of a good way to allow ad-hoc mapping of simple types.
Perhaps the current Exception makes sense.

keep full schema in cachew instead of just outer type

After recent google_takeout_parser update getting this:

...
  File "/home/hpi/.local/lib/python3.12/site-packages/cachew/marshall/cachew.py", line 143, in load
    tidx, val = dct
                └ 62.81359

TypeError: cannot unpack non-iterable float object

Seems that this is because we switched some PlaceVisit fields to Optional -- so now it tries to unpack a float as a Union (which Optional is a special case of).
Normally cachew would invalidate the cache in this case (since we keep the previous schema in the database)

schema = str(self.cls_)

However, it seems that it is

 "schema": "typing.Union[google_takeout_parser.models.PlaceVisit, Exception]"

so if any of the fields of PlaceVisit changed, this wouldn't have any impact.

I think we need to dump the datatype recursively into the schema (with all field names etc. -- doesn't hurt?) instead.

In the meantime, for google_takeout_parser this should be solvable by bumping google_takeout_version, since it's also included in the cache key. cc @seanbreckenridge just in case you encounter this

my.location.google_takeout fails to load from cache

Not sure if you can reproduce it and I haven't tried to make a minimal example yet, but I thought I would at least report the error:

Calling hpi doctor -S my.location.google_takeout fails to load from cache.

I tried clearing the entire cachew dir and restarting from nothing, but I get the same issue:

$ hpi --debug doctor -S my.location.google_takeout
[DEBUG   2023-09-19 19:16:21,995 my.google.takeout.parser __init__.py:411 ] [my.google.takeout.parser:events] using inferred type multiple typing.Union[google_takeout_parser.models.Activity, google_takeout_parser.models.LikedYoutubeVideo, google_takeout_parser.models.PlayStoreAppInstall, google_takeout_parser.models.Location, google_takeout_parser.models.ChromeHistory, google_takeout_parser.models.YoutubeComment, google_takeout_parser.models.PlaceVisit, Exception]
[DEBUG   2023-09-19 19:16:21,998 my.location.google_takeout __init__.py:411 ] [my.location.google_takeout:locations] using inferred type multiple <class 'my.location.common.Location'>
✅ OK  : my.location.google_takeout                        
[DEBUG   2023-09-19 19:16:22,000 my.location.google_takeout __init__.py:602 ] [my.location.google_takeout:locations] using sqlite:/home/sean/.cache/cachew/my.location.google_takeout:locations for cache
[DEBUG   2023-09-19 19:16:22,001 my.location.google_takeout __init__.py:727 ] [my.location.google_takeout:locations] new hash: {"cachew": "0.14.20230920", "schema": "<class 'my.location.common.Location'>", "dependencies": "['google_takeout_version: 0.1.3', '/home/sean/data/google_takeout/Takeout-1599315526.zip', '/home/sean/data/google_takeout/Takeout-1599728222.zip', '/home/sean/data/google_takeout/Takeout-1616796262.zip', '/home/sean/data/google_takeout/Takeout-1634828138.zip', '/home/sean/data/google_takeout/Takeout-1644744478.zip', '/home/sean/data/google_takeout/Takeout-1659510101.zip', '/home/sean/data/google_takeout/Takeout-1667366543.zip', '/home/sean/data/google_takeout/Takeout-1674845469.zip', '/home/sean/data/google_takeout/Takeout-1680326990.zip', '/home/sean/data/google_takeout/Takeout-1688611240.zip']"}
[DEBUG   2023-09-19 19:16:22,006 my.location.google_takeout __init__.py:733 ] [my.location.google_takeout:locations] old hash: None
[DEBUG   2023-09-19 19:16:22,006 my.location.google_takeout __init__.py:740 ] [my.location.google_takeout:locations] hash mismatch: computing data and writing to db
[DEBUG   2023-09-19 19:16:22,007 my.google.takeout.parser __init__.py:602 ] [my.google.takeout.parser:events] using sqlite:/home/sean/.cache/cachew/my.google.takeout.parser:events for cache
[DEBUG   2023-09-19 19:16:22,008 my.google.takeout.parser __init__.py:727 ] [my.google.takeout.parser:events] new hash: {"cachew": "0.14.20230920", "schema": "typing.Union[google_takeout_parser.models.Activity, google_takeout_parser.models.LikedYoutubeVideo, google_takeout_parser.models.PlayStoreAppInstall, google_takeout_parser.models.Location, google_takeout_parser.models.ChromeHistory, google_takeout_parser.models.YoutubeComment, google_takeout_parser.models.PlaceVisit, Exception]", "dependencies": "['google_takeout_version: 0.1.3', '/home/sean/data/google_takeout/Takeout-1599315526.zip', '/home/sean/data/google_takeout/Takeout-1599728222.zip', '/home/sean/data/google_takeout/Takeout-1616796262.zip', '/home/sean/data/google_takeout/Takeout-1634828138.zip', '/home/sean/data/google_takeout/Takeout-1644744478.zip', '/home/sean/data/google_takeout/Takeout-1659510101.zip', '/home/sean/data/google_takeout/Takeout-1667366543.zip', '/home/sean/data/google_takeout/Takeout-1674845469.zip', '/home/sean/data/google_takeout/Takeout-1680326990.zip', '/home/sean/data/google_takeout/Takeout-1688611240.zip']"}
[DEBUG   2023-09-19 19:16:22,011 my.google.takeout.parser __init__.py:733 ] [my.google.takeout.parser:events] old hash: {"cachew": "0.14.20230920", "schema": "typing.Union[google_takeout_parser.models.Activity, google_takeout_parser.models.LikedYoutubeVideo, google_takeout_parser.models.PlayStoreAppInstall, google_takeout_parser.models.Location, google_takeout_parser.models.ChromeHistory, google_takeout_parser.models.YoutubeComment, google_takeout_parser.models.PlaceVisit, Exception]", "dependencies": "['google_takeout_version: 0.1.3', '/home/sean/data/google_takeout/Takeout-1599315526.zip', '/home/sean/data/google_takeout/Takeout-1599728222.zip', '/home/sean/data/google_takeout/Takeout-1616796262.zip', '/home/sean/data/google_takeout/Takeout-1634828138.zip', '/home/sean/data/google_takeout/Takeout-1644744478.zip', '/home/sean/data/google_takeout/Takeout-1659510101.zip', '/home/sean/data/google_takeout/Takeout-1667366543.zip', '/home/sean/data/google_takeout/Takeout-1674845469.zip', '/home/sean/data/google_takeout/Takeout-1680326990.zip', '/home/sean/data/google_takeout/Takeout-1688611240.zip']"}
[DEBUG   2023-09-19 19:16:22,011 my.google.takeout.parser __init__.py:736 ] [my.google.takeout.parser:events] hash matched: loading from cache
[INFO    2023-09-19 19:16:22,129 my.google.takeout.parser __init__.py:707 ] [my.google.takeout.parser:events] loading 1040860 objects from cachew (sqlite:/home/sean/.cache/cachew/my.google.takeout.parser:events)
[ERROR   2023-09-19 19:16:24,122 my.location.google_takeout __init__.py:282 ] [my.location.google_takeout:locations] error while setting up cache, falling back to non-cached version
[ERROR   2023-09-19 19:16:24,122 my.location.google_takeout __init__.py:283 ] [my.location.google_takeout:locations] shouldn't happen!
Traceback (most recent call last):
  File "/home/sean/.local/lib/python3.11/site-packages/cachew/__init__.py", line 753, in cachew_wrapper
    yield from written_to_cache()
  File "/home/sean/.local/lib/python3.11/site-packages/cachew/__init__.py", line 694, in written_to_cache
    dct = marshall.dump(obj)
          ^^^^^^^^^^^^^^^^^^
  File "/home/sean/.local/lib/python3.11/site-packages/cachew/marshall/cachew.py", line 39, in dump
    return self.schema.dump(obj)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/sean/.local/lib/python3.11/site-packages/cachew/marshall/cachew.py", line 98, in dump
    return {
           ^
  File "/home/sean/.local/lib/python3.11/site-packages/cachew/marshall/cachew.py", line 102, in <dictcomp>
    k: ks.dump(getattr(obj, k))
       ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sean/.local/lib/python3.11/site-packages/cachew/marshall/cachew.py", line 137, in dump
    assert False, "shouldn't happen!"
AssertionError: shouldn't happen!
[DEBUG   2023-09-19 19:16:24,123 my.google.takeout.parser __init__.py:602 ] [my.google.takeout.parser:events] using sqlite:/home/sean/.cache/cachew/my.google.takeout.parser:events for cache
[DEBUG   2023-09-19 19:16:24,124 my.google.takeout.parser __init__.py:727 ] [my.google.takeout.parser:events] new hash: {"cachew": "0.14.20230920", "schema": "typing.Union[google_takeout_parser.models.Activity, google_takeout_parser.models.LikedYoutubeVideo, google_takeout_parser.models.PlayStoreAppInstall, google_takeout_parser.models.Location, google_takeout_parser.models.ChromeHistory, google_takeout_parser.models.YoutubeComment, google_takeout_parser.models.PlaceVisit, Exception]", "dependencies": "['google_takeout_version: 0.1.3', '/home/sean/data/google_takeout/Takeout-1599315526.zip', '/home/sean/data/google_takeout/Takeout-1599728222.zip', '/home/sean/data/google_takeout/Takeout-1616796262.zip', '/home/sean/data/google_takeout/Takeout-1634828138.zip', '/home/sean/data/google_takeout/Takeout-1644744478.zip', '/home/sean/data/google_takeout/Takeout-1659510101.zip', '/home/sean/data/google_takeout/Takeout-1667366543.zip', '/home/sean/data/google_takeout/Takeout-1674845469.zip', '/home/sean/data/google_takeout/Takeout-1680326990.zip', '/home/sean/data/google_takeout/Takeout-1688611240.zip']"}
[DEBUG   2023-09-19 19:16:24,126 my.google.takeout.parser __init__.py:733 ] [my.google.takeout.parser:events] old hash: {"cachew": "0.14.20230920", "schema": "typing.Union[google_takeout_parser.models.Activity, google_takeout_parser.models.LikedYoutubeVideo, google_takeout_parser.models.PlayStoreAppInstall, google_takeout_parser.models.Location, google_takeout_parser.models.ChromeHistory, google_takeout_parser.models.YoutubeComment, google_takeout_parser.models.PlaceVisit, Exception]", "dependencies": "['google_takeout_version: 0.1.3', '/home/sean/data/google_takeout/Takeout-1599315526.zip', '/home/sean/data/google_takeout/Takeout-1599728222.zip', '/home/sean/data/google_takeout/Takeout-1616796262.zip', '/home/sean/data/google_takeout/Takeout-1634828138.zip', '/home/sean/data/google_takeout/Takeout-1644744478.zip', '/home/sean/data/google_takeout/Takeout-1659510101.zip', '/home/sean/data/google_takeout/Takeout-1667366543.zip', '/home/sean/data/google_takeout/Takeout-1674845469.zip', '/home/sean/data/google_takeout/Takeout-1680326990.zip', '/home/sean/data/google_takeout/Takeout-1688611240.zip']"}
[DEBUG   2023-09-19 19:16:24,127 my.google.takeout.parser __init__.py:736 ] [my.google.takeout.parser:events] hash matched: loading from cache
[INFO    2023-09-19 19:16:24,239 my.google.takeout.parser __init__.py:707 ] [my.google.takeout.parser:events] loading 1040860 objects from cachew (sqlite:/home/sean/.cache/cachew/my.google.takeout.parser:events)
✅     - stats: {'locations': {'count': 555271, 'last': datetime.datetime(2023, 7, 6, 1, 37, 18, 420000, tzinfo=datetime.timezone.utc)}}
[DEBUG   2023-09-19 19:16:30,617 my.core.structure structure.py:167 ] at exit warning: Found leftover files in temporary directory '[PosixPath('/tmp/HPI-tempdir/tmpam1foziw')]'. this may be because you have multiple hpi processes running -- if so this can be ignored

This is fine though:

$ hpi --debug doctor -S my.google.takeout.parser      
[DEBUG   2023-09-19 19:21:09,560 my.google.takeout.parser __init__.py:411 ] [my.google.takeout.parser:events] using inferred type multiple typing.Union[google_takeout_parser.models.Activity, google_takeout_parser.models.LikedYoutubeVideo, google_takeout_parser.models.PlayStoreAppInstall, google_takeout_parser.models.Location, google_takeout_parser.models.ChromeHistory, google_takeout_parser.models.YoutubeComment, google_takeout_parser.models.PlaceVisit, Exception]
✅ OK  : my.google.takeout.parser                          
[DEBUG   2023-09-19 19:21:09,562 my.google.takeout.parser __init__.py:602 ] [my.google.takeout.parser:events] using sqlite:/home/sean/.cache/cachew/my.google.takeout.parser:events for cache
[DEBUG   2023-09-19 19:21:09,563 my.google.takeout.parser __init__.py:727 ] [my.google.takeout.parser:events] new hash: {"cachew": "0.14.20230920", "schema": "typing.Union[google_takeout_parser.models.Activity, google_takeout_parser.models.LikedYoutubeVideo, google_takeout_parser.models.PlayStoreAppInstall, google_takeout_parser.models.Location, google_takeout_parser.models.ChromeHistory, google_takeout_parser.models.YoutubeComment, google_takeout_parser.models.PlaceVisit, Exception]", "dependencies": "['google_takeout_version: 0.1.3', '/home/sean/data/google_takeout/Takeout-1599315526.zip', '/home/sean/data/google_takeout/Takeout-1599728222.zip', '/home/sean/data/google_takeout/Takeout-1616796262.zip', '/home/sean/data/google_takeout/Takeout-1634828138.zip', '/home/sean/data/google_takeout/Takeout-1644744478.zip', '/home/sean/data/google_takeout/Takeout-1659510101.zip', '/home/sean/data/google_takeout/Takeout-1667366543.zip', '/home/sean/data/google_takeout/Takeout-1674845469.zip', '/home/sean/data/google_takeout/Takeout-1680326990.zip', '/home/sean/data/google_takeout/Takeout-1688611240.zip']"}
[DEBUG   2023-09-19 19:21:09,567 my.google.takeout.parser __init__.py:733 ] [my.google.takeout.parser:events] old hash: {"cachew": "0.14.20230920", "schema": "typing.Union[google_takeout_parser.models.Activity, google_takeout_parser.models.LikedYoutubeVideo, google_takeout_parser.models.PlayStoreAppInstall, google_takeout_parser.models.Location, google_takeout_parser.models.ChromeHistory, google_takeout_parser.models.YoutubeComment, google_takeout_parser.models.PlaceVisit, Exception]", "dependencies": "['google_takeout_version: 0.1.3', '/home/sean/data/google_takeout/Takeout-1599315526.zip', '/home/sean/data/google_takeout/Takeout-1599728222.zip', '/home/sean/data/google_takeout/Takeout-1616796262.zip', '/home/sean/data/google_takeout/Takeout-1634828138.zip', '/home/sean/data/google_takeout/Takeout-1644744478.zip', '/home/sean/data/google_takeout/Takeout-1659510101.zip', '/home/sean/data/google_takeout/Takeout-1667366543.zip', '/home/sean/data/google_takeout/Takeout-1674845469.zip', '/home/sean/data/google_takeout/Takeout-1680326990.zip', '/home/sean/data/google_takeout/Takeout-1688611240.zip']"}
[DEBUG   2023-09-19 19:21:09,567 my.google.takeout.parser __init__.py:736 ] [my.google.takeout.parser:events] hash matched: loading from cache
[INFO    2023-09-19 19:21:09,660 my.google.takeout.parser __init__.py:707 ] [my.google.takeout.parser:events] loading 1040860 objects from cachew (sqlite:/home/sean/.cache/cachew/my.google.takeout.parser:events)
✅     - stats: {'events': {'count': 1040860, 'last': datetime.datetime(2020, 1, 4, 7, 24, 19, tzinfo=<DstTzInfo 'US/Pacific' PST-1 day, 16:00:00 STD>)}}
