Giter Club home page Giter Club logo

Comments (9)

theofidry avatar theofidry commented on May 26, 2024 1

so obviously one can use static variables to exchange payload between the master and the child processes, which kind of surprised me. Was that supposed to work or is it just a coincidence that the result is correctly? May there be any issues with using static variables?

That sounds weird indeed, are you sure you tested with parallelized process (i.e. at least 2)? Might be that it's shared in-memory otherwise. I'm not super familiar with it but I recall some possible optimization done with "simple" arrays (as just arrays of scalar values or nested arrays of those).


Looking at the code again I would like to change a couple of things, but that's likely going to be in 2.x. So if that's what you need let's open it up then

from console-parallelization.

theofidry avatar theofidry commented on May 26, 2024

The parent handles exceptions: it logs them & reset the container. The rationale is that the errors that are thrown by runSingleCommand should make the process fail.

The reason for that behaviour is that say you are processing 50 items in the segment by batches of 10, and the exception is thrown on the 17th item: if the parent carries on it would skip 33 items.

A solution would be to yield the item number somehow which could be given back to the master process in order to carry on at the 18th item, but it's more complex and so far we did not really need it either.

from console-parallelization.

andreas-gruenwald avatar andreas-gruenwald commented on May 26, 2024

Use-Case

  • 50 items
  • 5 children processes (batches of 10 items)
  • Before the command is terminated, a report should be created, which contains the number of correctly processed and failed IDs (for simplicity, let's assume a CSV export).
  • If one of the child processes terminates for any reason, in my imagination an exception could be thrown, including additional information about the processed / and open items. The master process would then be able to write a report before finally terminating.
  • It's not necessary to let another child process the failed items of the other process.

So the goal is not to proceed in the master and ignore the error, but to allow some additional error handling before finally terminating.

from console-parallelization.

theofidry avatar theofidry commented on May 26, 2024

Can't that be done in the regular runSingleCommand though? runTolerantSingleCommand is within the child process as well, not the parent one

from console-parallelization.

andreas-gruenwald avatar andreas-gruenwald commented on May 26, 2024

I would say that depends on the architecture and on the use case.

In case of managing a CSV summary report, you have to ensure that

  • no simultanous write access to the file on the filesystem takes place,
  • the items are written in their original order. When multiple children processes are running, one worker may be faster than the other, so without additional handling, the sort order may not be the same in the resulting CSV file.

This is a very simplification of our use-cases, but I hope you get the idea.
We want the children processes to perform the "stupid" work, but let the "master process" do the administrative things.

Basically it is not about the error handling, but it is about exchanging some payload between the child processes and the master. The master can then react in runAfterBatch().


I performed another test today and found out that you can use static variables to exchange data between the child processes and the parent processes in order to to so.

...
protected static $FAULTS = [];
...
protected function fetchItems(InputInterface $input): array {
        $list = [];
        for ($i=1; $i <= 10; $i++) {
            $list[] = serialize(['number' => $i, 'error' => rand(1,10) <= 3]);
        }
        return $list;
 }

protected function runSingleCommand(string $item, InputInterface $input, OutputInterface $output): void
    {
        $item = unserialize($item);
        $number = $item['number'];
        $hasError = $item['error'];
        echo "Item: ".$item['number'].' - '.($hasError ? 'ERROR ' : ' SUCESS').PHP_EOL;
        if ($hasError) {
            self::$FAULTS[] = $item;
        }
        sleep(1);
    }

protected function runAfterBatch(InputInterface $input, OutputInterface $output, array $items): void
    {
        // flush the database and clear the entity manager
        p_r(self::$FAULTS);
    }

runAfterBatch returned

array:2 [           
  0 => array:2 [    
    "number" => 9   
    "error" => true 
  ]                 
  1 => array:2 [    
    "number" => 10  
    "error" => true 
  ]                 
]                   

so obviously one can use static variables to exchange payload between the master and the child processes, which kind of surprised me. Was that supposed to work or is it just a coincidence that the result is correctly? May there be any issues with using static variables?

from console-parallelization.

andreas-gruenwald avatar andreas-gruenwald commented on May 26, 2024

That sounds weird indeed, are you sure you tested with parallelized process (i.e. at least 2)?

My mistake, wasn't aware that the default number of processes equals 1.

bin/console app:parallelization-advanced-example -p 3

Now the static variables do not work reliably anymore.


To sum up:
It would be great to have the possibility to exchange payload between children and master process in 2.x!

from console-parallelization.

theofidry avatar theofidry commented on May 26, 2024

It would be great to have the possibility to exchange payload between children and master process in 2.x!

It's more of a PHP problem really :/ Unfortunately there is no easy way to do this. Maybe krakjoe/parallel can help there but last time I tried it was failing in my case (see krakjoe/parallel#64) and requires PHP ZTS which is in itself a challenge as well.

from console-parallelization.

andreas-gruenwald avatar andreas-gruenwald commented on May 26, 2024

Ok, I got it.
PHP ZTS ist not an option for us...

Just thinking out loud:
If there is no solution implementation for the message transfer, the bundle could provide a hook where one can

  • save / return payload based on a child process process (e.g. return non-empty array in runSingleCommand() and if the array is not empty, savePayload() is called, which by default is not implemented.

  • getPayload(), which can be called in runAfterBatch() and runAfterLastCommand() and contains one payload per child process.

Pimcore (or any other framework) could then implement a default solution for exchanging payload (e.g. RabbitMQ / Symfony Messenger, or any other simple data store) which can then be implemented in all Pimcore commands.

from console-parallelization.

theofidry avatar theofidry commented on May 26, 2024

I think the best solution if you want to exchange payloads between processes is to rely on an external mechanism: be it via a DB, a dedicated file or something. Technically it could be possible to do without, but it's wacky and very limited so I do not think it is wise to try to encourage that.

from console-parallelization.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.