
gokart's Introduction

gokart


Gokart addresses reproducibility, task dependencies, the constraints of good code, and ease of use for machine learning pipelines.

Documentation for the latest release is hosted on readthedocs.

About gokart

Here are some of gokart's strengths.

  • The following metadata for each Task is stored separately in a pkl file with a hash value:
    • task output data
    • versions of all imported modules
    • task processing time
    • random seed used in the task
    • displayed log
    • all parameters set as class variables in the task
  • The pipeline is rerun automatically if the parameters of Tasks change.
  • GCS and S3 are supported as data stores for intermediate results of Tasks in the pipeline.
  • The above output is exchanged between tasks as intermediate files, which is memory-friendly.
  • pandas.DataFrame types and columns are checked during I/O.
  • The directory structure of saved files is determined automatically from the structure of the script.
  • Seeds for numpy and random are fixed automatically.
  • Code can adhere to the SOLID principles as much as possible.
  • Tasks are locked via redis even if they run in parallel.

All of the features above were created for building machine learning batches. Gokart provides an excellent environment for reproducibility and team development.

Here are some non-goals and downsides of gokart.

  • Batch execution in parallel is supported, but parallel and concurrent execution of tasks in memory is not.
  • Gokart is focused on reproducibility, so I/O and data storage capacity can become a bottleneck.
  • Task visualization is not supported.
  • Gokart is not an experiment management tool. Management of execution results is split out into Thunderbolt.
  • Gokart does not recommend writing pipelines in TOML, YAML, JSON, and similar formats. It prefers that they be written in Python.

Getting Started

Within the activated Python environment, use the following command to install gokart.

pip install gokart

Quickstart

A minimal gokart task looks something like this:

import gokart

class Example(gokart.TaskOnKart):
    def run(self):
        self.dump('Hello, world!')

task = Example()
output = gokart.build(task)
print(output)

gokart.build returns the result dumped by gokart.TaskOnKart. The example outputs the following.

Hello, world!

This is an introduction to some of gokart's features. There are many more useful ones.

Please see the documentation.

Have a good gokart life.

Achievements

Gokart is a proven product.

Thanks

gokart is a wrapper for luigi. Thanks to luigi and dependent projects!

gokart's People

Contributors

5n7, argonism, dasoran, dependabot[bot], dn070017, e-mon, enokid, hi-king, hirosassa, hirotosuzuki, kitagry, kuri8ive, ma2gedev, mamo3gr, maronuu, mski-iksm, nishiba, pn11, ryusuketa, saya-kawakami, snowhork, swen128, tayleruva, tkda-h3, ujiuji1259, vaaaaanquish, yamasakih, yokomotod, yukinagae, yuta100101


gokart's Issues

Slack response doesn't have response_metadata

I found the following error.

slack/slack_api.py", line 37, in _get_channels
    if response['response_metadata']['next_cursor']:
TypeError: 'NoneType' object is not subscriptable
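
One defensive way to read the cursor is sketched below. It assumes the Slack response behaves like a dict whose `response_metadata` key may be missing or `None`, and that an empty cursor means there are no more pages. `get_next_cursor` is a hypothetical helper name, not part of gokart or the Slack client.

```python
from typing import Any, Mapping, Optional


def get_next_cursor(response: Mapping[str, Any]) -> Optional[str]:
    """Return the pagination cursor, or None when there is nothing more to fetch."""
    # `response_metadata` may be absent or explicitly None, so fall back to {}.
    metadata = response.get('response_metadata') or {}
    # Treat a missing or empty cursor as "no more pages" (assumption of this sketch).
    return metadata.get('next_cursor') or None
```

With this shape, the caller can loop while the helper returns a non-None value instead of indexing into the response directly.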

add file load generator

Hi.

I want the following load_generator features.

class Hoge(gokart.TaskOnKart):
    
    ...

    def run(self):
        for x in self.load_generator('piyo'):
            process_file(x)

gokart is a pipeline for data processing. I think load_generator would be effective for sequential processing and for large files that do not fit in memory. What do you think?
Thanks.
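
To make the request concrete, here is a minimal sketch of the idea using only the standard library. It is not gokart's implementation: the function takes a list of pickle file paths as an assumption, whereas the proposed method would resolve them from the task's inputs.

```python
import pickle
from typing import Any, Iterable, Iterator


def load_generator(paths: Iterable[str]) -> Iterator[Any]:
    """Yield one unpickled object per file instead of loading every file at once."""
    for path in paths:
        with open(path, 'rb') as f:
            # Only the current file's contents are held in memory.
            yield pickle.load(f)
```

Because it is a generator, nothing is read until the caller starts iterating, and each object can be garbage-collected before the next file is opened.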

serialized_task_definition_check generates different hashes for same class

I found that the serialized_task_definition_check option (introduced by #205) creates different hashes if a task depends on an imported object.

To reproduce the bug, run this script several times.

import gokart

from pkg import func


class Example(gokart.TaskOnKart):
    def run(self):
        print(func())
        self.dump('Hello, world!')


gokart.build(Example(serialized_task_definition_check=True)) 

I really apologize for the PR (#205), and if you do not mind, please revert the PR or drop the tag.

Hashing tasks (or any classes) is a harder problem than I thought, because classes depend on packages, and those packages depend on others in turn.
So your workaround with version tags is very reasonable for the issue I mentioned in the PR.

I think inspect.getsource could be a solution, since we do not need the serialized task, only hash values.
However, I do not have a good idea for retrieving code recursively so that all the dependencies are reflected.

Anyway, the current implementation is invalid, and I am sorry again for the inconvenience.

AttributeError in _get_module_versions

ERROR: Error in event callback for 'event.core.start'
Traceback (most recent call last):
  File "python3.6/site-packages/luigi-2.8.9-py3.6.egg/luigi/task.py", line 276, in trigger_event
    callback(*args, **kwargs)
  File "python3.6/site-packages/gokart/task.py", line 281, in _dump_module_versions
    self.dump(self._get_module_versions(), self._get_module_versions_target())
  File "python3.6/site-packages/gokart/task.py", line 291, in _get_module_versions
    module_versions.append(f'{x}=={module.__version__.split(" ")[0]}')
AttributeError: 'tuple' object has no attribute 'split'
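
The traceback shows a module whose `__version__` is a tuple rather than a string. A small normalising helper, sketched below, would tolerate both forms. `format_version` is a hypothetical name, and joining tuple parts with dots is an assumption about the desired output.

```python
from typing import Any


def format_version(version: Any) -> str:
    """Normalise a module's __version__ to a plain string."""
    if isinstance(version, (tuple, list)):
        # e.g. (1, 2, 3) -> "1.2.3"
        return '.'.join(str(part) for part in version)
    # e.g. "1.2.3 (build 7)" -> "1.2.3", mirroring the split(" ")[0] in the traceback.
    return str(version).split(' ')[0]
```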

can't dump json file

self.output().dump({pandas DataFrame object})

The above code raises this error:

ValueError: 'index=False' is only valid when 'orient' is 'split' or 'table'

Redundant warning messages about redis_host, redis_port.

Hi, thank you for this project.

I started seeing warning messages when running tasks after recently updating gokart.

  • gokart==0.3.23
  • luigi==3.0.2

tasks.py (almost do nothing task)

import gokart
import pandas as pd


class MyTask(gokart.TaskOnKart):
    def output(self):
        return self.make_target('output.csv')

    def run(self):
        df = pd.DataFrame()
        self.dump(df)

$ python3 -m luigi --local-scheduler --module tasks MyTask --log-level=INFO
.../site-packages/luigi/parameter.py:279: UserWarning: Parameter "redis_host" with value "None" is not of type string.
  warnings.warn('Parameter "{}" with value "{}" is not of type string.'.format(param_name, param_value))
.../site-packages/luigi/parameter.py:279: UserWarning: Parameter "redis_port" with value "None" is not of type string.
  warnings.warn('Parameter "{}" with value "{}" is not of type string.'.format(param_name, param_value))
INFO: Informed scheduler that task   MyTask__99914b932b   has status   PENDING
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
...
This progress looks :) because there were no failed tasks or missing dependencies

===== Luigi Execution Summary =====

This is caused by using the luigi.Parameter class for redis_host and redis_port with a default value of None.

I think it is better to use luigi.OptionalParameter instead of luigi.Parameter when the default value is None.

In the code that emits the warning, luigi.OptionalParameter checks whether the value is None, but luigi.Parameter does not.

  • luigi.OptionalParameter._warn_on_wrong_param_type

https://github.com/spotify/luigi/blob/7b7ff901480d9a02402037e0cb54e4508b5a94a2/luigi/parameter.py#L338-L343

  • luigi.Parameter._warn_on_wrong_param_type

https://github.com/spotify/luigi/blob/7b7ff901480d9a02402037e0cb54e4508b5a94a2/luigi/parameter.py#L275-L279

dump module version

We need to save the module version for reproducibility.

Some say the versions can be kept in pipenv, requirements.txt, and so on.
But it's hard to require all gokart modules.

Possible solutions:

  • require a Pipfile.lock from all gokart users
  • pip freeze > lib_ver.log
  • look up hoge.__version__

dealing columns in task.load_data_frame

If param required_columns is None and input is an empty df, task.load_data_frame() returns an empty df with no columns. In some cases, a KeyError may occur in the next task.
So in this case, it should set the original columns as they are.

Feature Request: load_dumps_of_requirements

When I debug a TaskOnKart, I often download the dumps of its required tasks.
But it is somewhat time-consuming to work out which dump is the actual one.

Assuming the class structure below, we need to find the actual dumps of TaskFoo and TaskBar to reproduce bugs locally.
But it's hard to find the exact dump of TaskFoo among TaskFoo_dkajflksajflasjd, TaskFoo_kfkajfklsajljfalkfa, TaskFoo_gjioaquijoree ...

class TaskA:
    a = Parameter()

    def requires(self):
        return dict(
            foo=TaskFoo(),
            bar=TaskBar(a=self.a),
        )

An API like the one below might help us debug TaskOnKart quickly.

def load_dumps_of_requirements(task_class, task_params_path) -> dict[str, Object]

$ load_dumps_of_requirements(TaskA, 'gs://foo/logs/task_params/TaskA_jfdafj;sajfsa.pkl')
-> {
  'foo': DataFrame(...),
  'bar': DataFrame(...)
}

gokart.build() cannot process task parameter

Task parameters can be set at gokart.run() as in the following example.

gokart.run(['SampleTask', '--local-scheduler', '--param=hello'])

When using gokart.build() this is not possible, which makes changing parameters bothersome.

Change unique_id when code contents are changed

Currently, when dependent tasks (tasks defined in requires()) and parameters are unchanged, the unique_id of the task output will be the same.
This is confusing, since the task will not re-run when only the implementation of run() is changed.

#205 tried to solve this problem, but it had a problem of its own and was therefore reverted (#207).

Default path for make_large_data_frame_target

In the same manner as output(), it would be great if make_large_data_frame_target() could produce a default path.

        file_path = self.__module__.replace('.', '/')
        return self.make_large_data_frame_target(os.path.join(file_path, f'{type(self).__name__}.zip'))

RuntimeError: Unfulfilled dependency at run time

Luigi (and therefore gokart) implicitly assumes that a Task's output file exists once its run has been executed. If it does not, the error Unfulfilled dependency at run time is raised, as in the following sample code.

This behavior is a bit confusing, so how about making sure that the file is output when the run completes?

import gokart
import luigi

class TaskA(gokart.TaskOnKart):
    def run(self):
        pass

class TaskB(gokart.TaskOnKart):
    def requires(self):
        return TaskA()

    def run(self):
        pass

if __name__ == '__main__':
    gokart.run(['TaskB', '--local-scheduler'], set_retcode=True)

error:

Traceback (most recent call last):
  File "/Users/e-mon/.pyenv/versions/3.7.5/lib/python3.7/site-packages/luigi/worker.py", line 176, in run
    raise RuntimeError('Unfulfilled %s at run time: %s' % (deps, ', '.join(missing)))
RuntimeError: Unfulfilled dependency at run time: TaskA__99914b932b

support requires function

These two definitions behave differently.

# This works normally.
class Hoge(gokart.TaskOnKart):
    def requires(self):
        return Piyo()

# Parameters cannot be passed.
@require(Piyo)
class Hoge(gokart.TaskOnKart):
    ...

When this issue occurred, luigi.configuration.LuigiConfigParser.add_config_path('model.ini') was called before gokart.run().
Task Piyo could not use the parameters from model.ini.

`FileNotFoundError` is raised when task running with `gokart.build()` failed

The following code, which raises an error in run(), exits with FileNotFoundError instead of NotImplementedError.

class Example(gokart.TaskOnKart):        
    def run(self):
        raise NotImplementedError("not implemented")

task = Example(rerun=True)
gokart.build(task)

>>> FileNotFoundError: [Errno 2] No such file or directory: './resources/__main__/Example_8441c59b5ce0113396d53509f19371fb.pkl'

Jupyter Notebooks causes unique_id caching

When using Jupyter Notebooks in combination with gokart.build, I found that self.task_unique_id gets cached unless I restart the kernel or drop the task class from memory.

This is an issue because if a task's dependencies' params are updated without clearing memory, the task will not generate a new task_unique_id (hash_id), meaning it won't run with the changed params.

gokart/gokart/task.py

Lines 261 to 263 in 575207f

def make_unique_id(self):
    self.task_unique_id = self.task_unique_id or self._make_hash_id()
    return self.task_unique_id

I got around this by forcing the hash_id to be generated each time, but I do not believe this to be ideal.

def make_unique_id(self):    
    return self._make_hash_id() 
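
The effect of recomputing the ID can be illustrated with a self-contained sketch. `SketchTask` is a hypothetical stand-in, not gokart.TaskOnKart, and hashing a JSON dump of the parameters with SHA-256 is an assumption made only for this example.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional


class SketchTask:
    """Hypothetical stand-in for a task; not gokart.TaskOnKart."""

    def __init__(self, params: Dict[str, Any], requires: Optional[List['SketchTask']] = None):
        self.params = params
        self.requires = requires or []

    def make_unique_id(self) -> str:
        # Recompute on every call so that changes in upstream parameters propagate.
        upstream_ids = [task.make_unique_id() for task in self.requires]
        payload = json.dumps({'params': self.params, 'upstream': upstream_ids}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()
```

The trade-off is cost: every call walks the whole dependency graph again, which is presumably why the original code caches the value.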

Support structured logging

For daily operations in public cloud environments, gokart should support structured logging.
It enables efficient searching, easy monitoring, and alert configuration on logs.
On the other hand, structured logs are not meant to be read by humans, so the feature should be switchable via an environment variable or a configuration file.

example: https://github.com/hirosassa/gokart_structured_logging
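
As a rough illustration of the switch described above, the sketch below uses only the standard logging module. It is not taken from the linked repository, and the environment variable name `GOKART_STRUCTURED_LOG` is made up for this example.

```python
import json
import logging
import os


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            'time': self.formatTime(record),
            'level': record.levelname,
            'logger': record.name,
            'message': record.getMessage(),
        })


def make_handler() -> logging.Handler:
    """Build a stream handler whose format depends on an environment variable."""
    handler = logging.StreamHandler()
    # GOKART_STRUCTURED_LOG is a hypothetical variable name used only in this sketch.
    if os.environ.get('GOKART_STRUCTURED_LOG') == '1':
        handler.setFormatter(JsonFormatter())
    else:
        handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(name)s: %(message)s'))
    return handler
```

Attaching the handler with `logging.getLogger().addHandler(make_handler())` keeps human-readable output as the default and emits JSON only when the variable is set.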

Support local file and path.

It would be nice to have a fixed local data directory with gokart.
For example,

import os

def get_auxiliary_data_path(file_name: str):
    d = os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir, 'auxiliary'))
    if not os.path.exists(d):
        os.mkdir(d)
    return os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir, 'auxiliary', file_name))

.
├── main.py
├── hoge
│   ├── auxiliary
│   │   ├── train.txt

get_auxiliary_data_path('train.txt')

Splitting CSV into chunked DataFrame

I cannot find out how to split a large CSV into chunked DataFrames.
I want to build a pipeline of tasks that loads data (into chunked DataFrames), runs inference sequentially, and outputs the data.

I found TaskOnKart.load_generator() in the documentation, but I could not understand how to use this method.
My code below did not work well.

  • Python: 3.7.7
  • gokart==0.3.15

import gokart
import luigi
import pandas as pd


class LoadData(gokart.TaskOnKart):
    data_path = 'iris.csv'
    chunksize = 2

    def run(self):
        reader = pd.read_csv(self.data_path, chunksize=self.chunksize)
        self.dump(reader)


class Predict(gokart.TaskOnKart):
    def requires(self):
        return LoadData()
    
    def run(self):
        for df in self.load_generator():
            print(df)
            # Loading some models, predict, output result


if __name__ == "__main__":
    luigi.build([Predict()], local_scheduler=True)

Traceback:

Traceback (most recent call last):
  File "/home/ec2-user/.local/share/virtualenvs/sandbox-pUjtOCjt/lib/python3.7/site-packages/luigi/worker.py", line 191, in run
    new_deps = self._run_get_new_deps()
  File "/home/ec2-user/.local/share/virtualenvs/sandbox-pUjtOCjt/lib/python3.7/site-packages/luigi/worker.py", line 133, in _run_get_new_deps
    task_gen = self.task.run()
  File "sample.py", line 12, in run
    self.dump(reader)
  File "/home/ec2-user/.local/share/virtualenvs/sandbox-pUjtOCjt/lib/python3.7/site-packages/gokart/task.py", line 211, in dump
    self._get_output_target(target).dump(obj)
  File "/home/ec2-user/.local/share/virtualenvs/sandbox-pUjtOCjt/lib/python3.7/site-packages/gokart/target.py", line 30, in dump
    self._dump(obj)
  File "/home/ec2-user/.local/share/virtualenvs/sandbox-pUjtOCjt/lib/python3.7/site-packages/gokart/target.py", line 80, in _dump
    self._processor.dump(obj, f)
  File "/home/ec2-user/.local/share/virtualenvs/sandbox-pUjtOCjt/lib/python3.7/site-packages/gokart/file_processor.py", line 66, in dump
    self._write(pickle.dumps(obj, protocol=4), file)
AttributeError: Can't pickle local object '_make_date_converter.<locals>.converter'

Note: in practice, chunksize will be 10000, 100000, or larger.

Is there a way to do the above?
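
The traceback shows the failure happening when the reader object itself is pickled by self.dump. One way around it, sketched here with only the standard library rather than gokart's or pandas' API, is to materialise each chunk as plain data before passing it on. `read_csv_in_chunks` is a hypothetical helper name.

```python
import csv
from itertools import islice
from typing import Dict, Iterator, List


def read_csv_in_chunks(path: str, chunksize: int) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most `chunksize` rows, reading the file lazily."""
    with open(path, newline='') as f:
        reader = csv.DictReader(f)
        while True:
            # Pull the next `chunksize` rows; an empty list means end of file.
            chunk = list(islice(reader, chunksize))
            if not chunk:
                return
            yield chunk
```

Each yielded chunk is an ordinary list of dicts, so it can be pickled or converted to a DataFrame, unlike the open reader.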

More parameters are required to reproduce the experiment

Hi. @nishiba @hirosassa

Currently, we write the parameter log to ~/task_param/* as follows.

 {"from_date": "2019-07-04", "to_date": "2019-07-05", "type": "hoge"}

But this contains only the task's own parameters, so other parameters are needed to reproduce the experiment.
We also need the parameters of the tasks in luigi.task.flatten(self.requires()).

Ideally, for example:

 {"from_date": "2019-07-04", "to_date": "2019-07-05", "type": "hoge", "interval": 1, "item_type": "piyo", ...}

We should edit here!
https://github.com/m3dev/gokart/blob/master/gokart/task.py#L247
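
The recursive collection could look roughly like the sketch below. `ParamTask` and `collect_params` are hypothetical names, not gokart classes, and the result is keyed by task name (rather than flattened into one dict as in the example above) so that two tasks sharing a parameter name do not overwrite each other.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ParamTask:
    """Hypothetical stand-in for a task with parameters and upstream tasks."""
    name: str
    params: Dict[str, Any]
    requires: List['ParamTask'] = field(default_factory=list)


def collect_params(task: ParamTask) -> Dict[str, Dict[str, Any]]:
    """Gather the parameters of `task` and of every upstream task, keyed by task name."""
    collected = {task.name: dict(task.params)}
    for upstream in task.requires:
        collected.update(collect_params(upstream))
    return collected
```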

`python setup.py test` is deprecated

Since the CI message says the following, we should move to tox (though it is not strictly necessary).

WARNING: Testing via this command is deprecated and will be removed in a future version. Users looking for a generic test entry point independent of test runner are encouraged to use tox.

Slack notification is broken

Problem

When I enable the Slack notification feature, it raises a TypeError after the tasks complete.

Steps to reproduce the problem

  1. Get the OAuth token for a Slack App.
  2. Create main.py:

import gokart

class DummyTask(gokart.TaskOnKart):
    pass

if __name__ == "__main__":
    gokart.run()

  3. Create luigi.cfg:

[SlackConfig]
channel = general
to_user = @username

  4. Execute the following commands:

$ export SLACK_TOKEN={your token}
$ python main.py DummyTask --local-scheduler

Traceback

Traceback (most recent call last):
  File "main.py", line 7, in <module>
    gokart.run()
  File "/path/to/gokart/gokart/run.py", line 125, in run
    _try_to_send_event_summary_to_slack(slack_api, event_aggregator, cmdline_args)
  File "/path/to/gokart/gokart/run.py", line 101, in _try_to_send_event_summary_to_slack
    slack_api.send_snippet(comment=comment, title='event.txt', content=content)
  File "/path/to/gokart/gokart/slack/slack_api.py", line 48, in send_snippet
    title=title)
TypeError: api_call() got an unexpected keyword argument 'channels'

Environment

  • Python 3.7.3
  • gokart==0.2.3
  • slackclient==2.1.0

compress log data

Current log:

-TaskA(12)
  -TaskB(34)
    -TaskC(56)
  -TaskB(34)
    -TaskC(56)

Ideal log:

-TaskA(12)
  -TaskB(34)
    -TaskC(56)
  -TaskB(34)
    - ...
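
The ideal log above can be produced by remembering which tasks have already been expanded. The sketch below assumes the dependency graph is available as a plain dict from task label to required task labels; `render_tree` is a hypothetical function, not gokart's tree printer.

```python
from typing import Dict, List, Optional, Set

# Hypothetical dependency graph: task label -> labels of required tasks.
Graph = Dict[str, List[str]]


def render_tree(graph: Graph, root: str, indent: int = 0, seen: Optional[Set[str]] = None) -> List[str]:
    """Render the dependency tree, eliding subtrees that were already printed."""
    seen = set() if seen is None else seen
    lines = [' ' * indent + '-' + root]
    children = graph.get(root, [])
    if root in seen:
        # Already expanded once; replace its children with a placeholder.
        if children:
            lines.append(' ' * (indent + 2) + '- ...')
        return lines
    seen.add(root)
    for child in children:
        lines.extend(render_tree(graph, child, indent + 2, seen))
    return lines
```

Calling it on a graph where TaskA(12) requires TaskB(34) twice and TaskB(34) requires TaskC(56) yields the five lines of the ideal log.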

[bug] ListTaskInstanceParameter does not work when the `requires` method is not specified

import gokart

class Piyo(gokart.TaskOnKart):
    def run(self):
        self.dump('piyo')

class Hoge(gokart.TaskOnKart):
    hoge = gokart.ListTaskInstanceParameter()

    def run(self):
        hoge = self.load_data_frame('hoge')


gokart.build(Hoge(hoge=[Piyo(), Piyo()]))

Traceback (most recent call last):
  File "python3.7/site-packages/luigi/worker.py", line 191, in run
    new_deps = self._run_get_new_deps()
  File "python3.7/site-packages/luigi/worker.py", line 133, in _run_get_new_deps
    task_gen = self.task.run()
  File "hoge.py", line 17, in run
    df = self.load_data_frame('hoge')
  File "python3.7/site-packages/gokart/task.py", line 236, in load_data_frame
    dfs = self.load(target=target)
  File "python3.7/site-packages/gokart/task.py", line 211, in load
    return _load(self._get_input_targets(target))
  File "python3.7/site-packages/gokart/task.py", line 276, in _get_input_targets
    return self.input()[target]
TypeError: list indices must be integers or slices, not str

gokart.build hides Exceptions that occurred in run()

import gokart
class A(gokart.TaskOnKart):
    def run(self):
        raise Exception()
        self.dump("done")
gokart.build(A())

FileNotFoundError: [Errno 2] No such file or directory: './resources/main/A_c72fa9f02fa5c6f019adbf9fe425c9eb.pkl'

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-34-6bff8168acc9> in <module>()
      5         raise Exception()
      6         self.dump("done")
----> 7 gokart.build(A())

~/.pyenv/versions/anaconda3-5.3.1/lib/python3.7/site-packages/gokart/build.py in build(task, verbose, return_value, reset_register)
     49     with HideLogger(verbose):
     50         luigi.build([task], local_scheduler=True)
---> 51     return _get_output(task) if return_value else None

~/.pyenv/versions/anaconda3-5.3.1/lib/python3.7/site-packages/gokart/build.py in _get_output(task)
     31     if type(output) == list:
     32         return [x.load() for x in output]
---> 33     return output.load()
     34 
     35 

~/.pyenv/versions/anaconda3-5.3.1/lib/python3.7/site-packages/gokart/target.py in load(self)
     26 
     27     def load(self) -> Any:
---> 28         return self.wrap_with_lock(self._load)()
     29 
     30     def dump(self, obj, lock_at_dump: bool = True) -> None:

~/.pyenv/versions/anaconda3-5.3.1/lib/python3.7/site-packages/gokart/target.py in _load(self)
     93 
     94     def _load(self) -> Any:
---> 95         with self._target.open('r') as f:
     96             return self._processor.load(f)
     97 

~/.pyenv/versions/anaconda3-5.3.1/lib/python3.7/site-packages/luigi/local_target.py in open(self, mode)
    163 
    164         elif rwmode == 'r':
--> 165             fileobj = FileWrapper(io.BufferedReader(io.FileIO(self.path, mode)))
    166             return self.format.pipe_reader(fileobj)
    167 

FileNotFoundError: [Errno 2] No such file or directory: './resources/__main__/A_c72fa9f02fa5c6f019adbf9fe425c9eb.pkl'

Don't know why it fails

Traceback (most recent call last):
  File "main.py", line 11, in <module>
    gokart.run()
  File "/usr/local/lib/python3.6/site-packages/gokart/run.py", line 118, in run
    _try_to_delete_unnecessary_output_file(cmdline_args)
  File "/usr/local/lib/python3.6/site-packages/gokart/run.py", line 62, in _try_to_delete_unnecessary_output_file
    task = cp.get_task_obj()  # type: gokart.TaskOnKart
  File "/usr/local/lib/python3.6/site-packages/luigi/cmdline_parser.py", line 116, in get_task_obj
    return self._get_task_cls()(**self._get_task_kwargs())
  File "/usr/local/lib/python3.6/site-packages/luigi/cmdline_parser.py", line 133, in _get_task_kwargs
    res.update(((param_name, param_obj.parse(attr)),))
  File "/usr/local/lib/python3.6/site-packages/luigi/parameter.py", line 592, in parse
    return int(s)
ValueError: invalid literal for int() with base 10: '2019-10-01'

[feature] Load without specifying the required target

class foo(gokart.TaskOnKart):
    target = gokart.TaskInstanceParameter()

    def run(self):
        self.load_data_frame(required_columns={'foo'})    # AttributeError: 'dict' object has no attribute 'empty'

When the dict of required tasks has length 1, it is clear which required task to load.

Compression random seed number

Machine learning algorithms use many random numbers.
For example, np.random.seed, random.seed, torch.random, ... , and more.
Preserving that number is very useful in Kaggle and other competitions where random-seed ensembles are allowed.

I think I can support this on gokart :)

Return of TravisCI

Because GitHub's pricing structure has changed, we need to go back to TravisCI from GitHub Actions.

_人人人人人人人人人人人_
> Deadline is 11/13. <
 ̄Y^Y^Y^Y^Y^Y^Y^Y^YY^ ̄

Easy load specific data

class A(gokart.TaskOnKart):
    task = gokart.TaskInstanceParameter()

    def run(self):
        a = self.load()

The target to load is uniquely determined here, but this is not implemented.

Cannot save model using `make_model_target`.

When I tried to save models via make_model_target , I got the following error message.

FileNotFoundError: [Errno 2] No such file or directory: './resources/tmp/9f6d90b30586cf05d74e9ad34ef98cd5/model.pkl.wv.vectors_ngrams.npy'

self.clone(cls, rerun=True) doesn't work

class A(gokart.TaskOnKart):
    a = luigi.Parameter(default="a")
    def run(self):
        print(self.__class__)
        print(f"rerun: {self.rerun}")
        print(f"a: {self.a}")
        print(f"fail_on_empty_dump: {self.fail_on_empty_dump}")
        self.dump(pd.DataFrame([1]))

class B(A):
    pass

class C(A):
    pass
    
class D(A):
    def requires(self):
        return [
            A(rerun=False),
            B(),
            self.clone(C),
        ]

gokart.build(D(rerun=True, a="b", fail_on_empty_dump=True), verbose=True)

<class '__main__.C'>
rerun: False
a: b
fail_on_empty_dump: True
<class '__main__.A'>
rerun: False
a: a
fail_on_empty_dump: False
<class '__main__.B'>
rerun: False
a: a
fail_on_empty_dump: False
<class '__main__.D'>
rerun: True
a: b
fail_on_empty_dump: True

Can't load parameter with correct type from config file

Hi.
I want to load parameters from a config file.
But parameters loaded from the config file do not have the correct type.

Example script:

# hoge.py
import luigi
import gokart

class Hoge(gokart.TaskOnKart):
    sample_param = luigi.FloatParameter()
    
    ...

    def run(self):
        print(type(self.sample_param))

if __name__ == '__main__':
    luigi.configuration.LuigiConfigParser.add_config_path('sample.ini')
    luigi.run(main_task_cls=Hoge)

# sample.ini
[Hoge]
sample_param=0.1

param from cli (this is the correct behavior):

$ python hoge.py --sample-param=0.1 --local-scheduler
<class 'float'>

param from sample.ini:

$ python hoge.py --local-scheduler
<class 'str'>

In luigi, the value is parsed even when it comes from config.
This issue also affects parameter types other than FloatParameter.
Parameter modules: https://luigi.readthedocs.io/en/stable/api/luigi.parameter.html

Feature request: Set logging level for gokart.build

Allow setting the logging level in gokart.build. I would like to be able to see the info logs I've created in my tasks without having to see gokart debug messages. This is useful for Jupyter Notebooks.

gokart/gokart/build.py

Lines 41 to 51 in 575207f

def build(task: TaskOnKart, verbose: bool = False, return_value: bool = True, reset_register: bool = True) -> Optional[Any]:
    """
    Run gokart task for local interpreter.
    """
    if reset_register:
        _reset_register()
    read_environ()
    check_config()
    with HideLogger(verbose):
        luigi.build([task], local_scheduler=True)
    return _get_output(task) if return_value else None

Feature Request: Tasks hash ids update when run function definition changes

When the code of a task's run function changes, the hash value should be updated, since the logic differs from that of the last run.

class TaskA(gokart.TaskOnKart):

    param = luigi.IntParameter()

    def run(self):
        total = self.param + 3
        self.dump(total)

class TaskA(gokart.TaskOnKart):

    param = luigi.IntParameter()

    def run(self):
        total = self.param + 2
        self.dump(total)

These should have different hash ids because the internal logic has changed.
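
One possible way to derive such a hash is sketched below. It is not how gokart computes IDs: it assumes the source text of the definition is available as a string and hashes its syntax tree, so comments and formatting are ignored while changes in logic are detected. `definition_hash` is a hypothetical name.

```python
import ast
import hashlib


def definition_hash(source: str) -> str:
    """Hash a code definition by its syntax tree, ignoring comments and formatting."""
    tree = ast.parse(source)
    # ast.dump omits line and column numbers by default, so layout does not matter.
    return hashlib.sha256(ast.dump(tree).encode()).hexdigest()
```

A limitation of this sketch is that it only sees the source passed in; code imported from other modules is not reflected in the hash.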

[Bug] Failure when def output returns dict and using build

An AttributeError is raised when a task defines an output() that returns a dictionary and is run with gokart.build.

gokart/gokart/build.py", line 35, in _get_output
    return output.load()
AttributeError: 'dict' object has no attribute 'load'

def _get_output(task: TaskOnKart) -> Any:
    output = task.output()
    if type(output) == list:
        return [x.load() for x in output]
    return output.load()

_get_output only checks for a list, but output should also support tuple and dict.
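
A recursive variant that handles all three container types might look like the sketch below. `load_output` is a hypothetical name, and the only assumption about targets is that they expose a `.load()` method, as in the traceback above.

```python
from typing import Any


def load_output(output: Any) -> Any:
    """Load targets nested in lists, tuples, or dicts; `output` mirrors task.output()."""
    if isinstance(output, dict):
        return {key: load_output(value) for key, value in output.items()}
    if isinstance(output, list):
        return [load_output(item) for item in output]
    if isinstance(output, tuple):
        return tuple(load_output(item) for item in output)
    # Anything else is assumed to be a target object exposing .load().
    return output.load()
```

Using isinstance rather than `type(output) == list` also covers subclasses of the container types.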
