Comments (9)
so obviously one can use static variables to exchange payload between the master and the child processes, which kind of surprised me. Was that supposed to work or is it just a coincidence that the result is correctly? May there be any issues with using static variables?
That sounds weird indeed, are you sure you tested with parallelized process (i.e. at least 2)? Might be that it's shared in-memory otherwise. I'm not super familiar with it but I recall some possible optimization done with "simple" arrays (as just arrays of scalar values or nested arrays of those).
Looking at the code again I would like to change a couple of things, but that's likely going to be in 2.x. So if that's what you need let's open it up then
from console-parallelization.
The parent handles exceptions: it logs them & reset the container. The rationale is that the errors that are thrown by runSingleCommand
should make the process fail.
The reason for that behaviour is that say you are processing 50 items in the segment by batches of 10, and the exception is thrown on the 17th item: if the parent carries on it would skip 33 items.
A solution would be to yield the item number somehow which could be given back to the master process in order to carry on at the 18th item, but it's more complex and so far we did not really need it either.
from console-parallelization.
Use-Case
- 50 items
- 5 children processes (batches of 10 items)
- Before the command is terminated, a report should be created, which contains the number of correctly processed and failed IDs (for simplicity, let's assume a CSV export).
- If one of the child processes terminates for any reason, in my imagination an exception could be thrown, including additional information about the processed / and open items. The master process would then be able to write a report before finally terminating.
- It's not necessary to let another child process the failed items of the other process.
So the goal is not to proceed in the master and ignore the error, but to allow some additional error handling before finally terminating.
from console-parallelization.
Can't that be done in the regular runSingleCommand
though? runTolerantSingleCommand
is within the child process as well, not the parent one
from console-parallelization.
I would say that depends on the architecture and on the use case.
In case of managing a CSV summary report, you have to ensure that
- no simultanous write access to the file on the filesystem takes place,
- the items are written in their original order. When multiple children processes are running, one worker may be faster than the other, so without additional handling, the sort order may not be the same in the resulting CSV file.
This is a very simplification of our use-cases, but I hope you get the idea.
We want the children processes to perform the "stupid" work, but let the "master process" do the administrative things.
Basically it is not about the error handling, but it is about exchanging some payload between the child processes and the master. The master can then react in runAfterBatch()
.
I performed another test today and found out that you can use static variables to exchange data between the child processes and the parent processes in order to to so.
...
protected static $FAULTS = [];
...
protected function fetchItems(InputInterface $input): array {
$list = [];
for ($i=1; $i <= 10; $i++) {
$list[] = serialize(['number' => $i, 'error' => rand(1,10) <= 3]);
}
return $list;
}
protected function runSingleCommand(string $item, InputInterface $input, OutputInterface $output): void
{
$item = unserialize($item);
$number = $item['number'];
$hasError = $item['error'];
echo "Item: ".$item['number'].' - '.($hasError ? 'ERROR ' : ' SUCESS').PHP_EOL;
if ($hasError) {
self::$FAULTS[] = $item;
}
sleep(1);
}
protected function runAfterBatch(InputInterface $input, OutputInterface $output, array $items): void
{
// flush the database and clear the entity manager
p_r(self::$FAULTS);
}
runAfterBatch
returned
array:2 [
0 => array:2 [
"number" => 9
"error" => true
]
1 => array:2 [
"number" => 10
"error" => true
]
]
so obviously one can use static variables to exchange payload between the master and the child processes, which kind of surprised me. Was that supposed to work or is it just a coincidence that the result is correctly? May there be any issues with using static variables?
from console-parallelization.
That sounds weird indeed, are you sure you tested with parallelized process (i.e. at least 2)?
My mistake, wasn't aware that the default number of processes equals 1.
bin/console app:parallelization-advanced-example -p 3
Now the static variables do not work reliably anymore.
To sum up:
It would be great to have the possibility to exchange payload between children and master process in 2.x!
from console-parallelization.
It would be great to have the possibility to exchange payload between children and master process in 2.x!
It's more of a PHP problem really :/ Unfortunately there is no easy way to do this. Maybe krakjoe/parallel can help there but last time I tried it was failing in my case (see krakjoe/parallel#64) and requires PHP ZTS which is in itself a challenge as well.
from console-parallelization.
Ok, I got it.
PHP ZTS ist not an option for us...
Just thinking out loud:
If there is no solution implementation for the message transfer, the bundle could provide a hook where one can
-
save / return payload based on a child process process (e.g. return non-empty array in
runSingleCommand()
and if the array is not empty,savePayload()
is called, which by default is not implemented. -
getPayload()
, which can be called inrunAfterBatch()
andrunAfterLastCommand()
and contains one payload per child process.
Pimcore (or any other framework) could then implement a default solution for exchanging payload (e.g. RabbitMQ / Symfony Messenger, or any other simple data store) which can then be implemented in all Pimcore commands.
from console-parallelization.
I think the best solution if you want to exchange payloads between processes is to rely on an external mechanism: be it via a DB, a dedicated file or something. Technically it could be possible to do without, but it's wacky and very limited so I do not think it is wise to try to encourage that.
from console-parallelization.
Related Issues (20)
- Roadmap? HOT 1
- Better solution for the "item" - argument? HOT 2
- Pipe breaks when quotes are used in input options
- Plans for v2 HOT 1
- Auto-detect the number of processes
- Always use a sub-process; add no-parallel option HOT 1
- Add options to change the segment size or batch size on the fly
- Add stop-on-failure option
- Configuration calculation may result in an exception if there is no items
- [2.x] Some resources are missing HOT 4
- Limit the exit code to 255
- Incorrect error message HOT 1
- Make it easier to override getParallelExecutableFactory()
- Add a utility decorator logger
- Add a logger to log the item handling to an arbitrary logger HOT 1
- Error when $_SERVER['PWD'] is not set HOT 7
- Documentation for ::getParallelExecutableFactory HOT 3
- Error when the server script name is an absolute path HOT 4
- Consider run before/last segment API
- Make the item argument name configurable
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from console-parallelization.