
Comments (7)

brianlmoon commented on August 15, 2024

I have the same problem with image data. Mine is in the other direction.

net_gearman does have the issue that it always returns an array. I plan to fix that soon.

As for your issue, binary data is likely not UTF-8 safe. The MySQL table should really use a blob column, not a text column, for the data. I don't use it personally.

We use an NFS mount of a GlusterFS cluster to store large files that have to be passed around.

from net_gearman.

bartclarkson commented on August 15, 2024

Cool. I've opted to avoid an NFS situation as this code deploys to an autoscaling AWS group of EC2s, and the robustness/redundancy we're after seems best served with RDS acting as the MySQL backend to gearman in the context of how AWS is meant to glue together.

S3 is in the mix for storing the original file and the final files, but the PDF is burst into multiple pages, each of which is then immediately farmed out for asynchronous job processing, and that makes the relative lag of S3 unappealing for the intermediary steps.

Appreciate your thoughts, and I'll keep an eye out on your refactor to the array handling.


brianlmoon commented on August 15, 2024

I will look in net_gearman a bit and see if there is something going on there. The table I saw an example of shows a long blob for the data. That should not have any issues with binary data. We store images in them all the time.

It's possible I suppose that something in gearmand is messing it up. Not sure.

Are you passing the data in as a blob or array?

Brian.

(In reply to Bart Clarkson's message of Mar 18, 2015, 20:31.)


bartclarkson commented on August 15, 2024

I'll share the salient bits, since a coder wants to see code, and for good reason.

First is the manager class that I use to wrap GearmanManager. It's a simple convenience for doing Dependency Injection relative to the deployment environment, à la Symfony2.

namespace GearmanManager\GearmanManagerBundle\Services;

use Symfony\Component\HttpKernel\Exception\HttpException;

require_once '/usr/share/php/Net/Gearman/Client.php';

class GearmanManagerManager
{
    protected $job_server;

    public function __construct($GearmanManagerBundleConfig)
    {
        $this->job_server = $GearmanManagerBundleConfig['job_server'];
    }

    // Submits $function to the job server with $data as its argument.
    public function addBackgroundJob($function, $data)
    {
        $gmclient = new \Net_Gearman_Client($this->job_server);
        try {
            $result = $gmclient->$function($data);
        } catch (\Exception $e) {
            // HttpException takes the status code first, then the message.
            throw new HttpException(503, $e->getMessage(), $e);
        }
    }
}

And then here's the code in question that uses it. "local_volume" is the tmp directory where worker machines write the PDF page before calling a CLI command to hammer the thing.

// ...
$this->gearman_manager_manager->addBackgroundJob(
    'ProcessDocumentPage',
    array(
        'document_id' => $this->document_id,
        'page_number' => $page,
        'pdf_page_string' => base64_encode($pdf_page_string),
        'local_volume' => $this->local_volume
    )
);
// ...

And then the first bit of the worker.

class Net_Gearman_Job_ProcessDocumentPage extends \Net_Gearman_Job_Common
{
    public function run($args)
    {
        $document_id = $args['document_id'];
        $page_number = $args['page_number'];
        $pdf_page_string = base64_decode($args['pdf_page_string']);
        $local_volume = $args['local_volume'];
        // ...
    }
}


bartclarkson commented on August 15, 2024

When I inspect the content of the "data" field in mysql, which is definitely typed as "long blob", I see this kind of thing:

{"document_id":"29","page_number":"2", "pdf_page_string": "a-lot-of-characters", "local_volume": "\/tmp"}

Which just looks for all the world like the kinda thing json_encode($some_keyed_array) produces in PHP.

I've also only ever really used blob fields where the blob was truly just One Thing. My favorite database application, Sequel Pro, even has built in functionality for viewing a blob as the file it actually is. Which doesn't have a prayer of succeeding if the blob isn't One Thing.

So I don't know that I can really do any better in my use case. Seems like the nature of the MySQL implementation.

If I set $args equal to $pdf_page_string, it might work. I guess that's what you're getting at when you're talking about passing an array vs the variable value? That would have never occurred to me. But I'd be hosed inasmuch as I need the other data points to do the job.


brianlmoon commented on August 15, 2024

Well, I would guess that json_encode is messing up the data. JSON requires valid UTF-8 data. You could try using serialize instead. Your worker would then need to unserialize.

Brian.
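For anyone following along, here is a minimal sketch of the behaviour Brian is describing, assuming PHP 5.5 or later. The sample bytes are made up and stand in for a real PDF page. It only shows how json_encode, base64_encode and serialize treat raw binary inside PHP itself; it says nothing about what gearmand or the MySQL table does to the bytes afterwards.

```php
<?php
// Made-up bytes that are not valid UTF-8, standing in for a PDF page.
$binary = "%PDF-1.4\n\xFF\xFE\x00\x80";

// json_encode() rejects invalid UTF-8: it returns false and sets JSON_ERROR_UTF8.
$raw_json = json_encode(array('pdf_page_string' => $binary));
$raw_error = json_last_error();

// base64 output is plain ASCII, so the same payload survives a JSON round trip.
$safe_json = json_encode(array('pdf_page_string' => base64_encode($binary)));
$decoded = json_decode($safe_json, true);
$from_json = base64_decode($decoded['pdf_page_string']);

// serialize() is length-prefixed and keeps the bytes intact, provided nothing
// between client and worker alters them.
$from_serialize = unserialize(serialize(array('pdf_page_string' => $binary)));

var_dump($raw_json === false, $raw_error === JSON_ERROR_UTF8);
var_dump($from_json === $binary, $from_serialize['pdf_page_string'] === $binary);
```

Note that a payload which is already base64 encoded before it reaches json_encode, as in the client code above, should not hit the UTF-8 error at this step.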

(In reply to Bart Clarkson's message of Mar 18, 2015, 21:02.)


bartclarkson commented on August 15, 2024

Read my mind. I was just looking at that, and at line 209 of master/Net/Gearman/Client.php.

But it's the same deal for serialize. There's a four-year-old comment for function.serialize.php at php.net about base64 encoding the binary portion of the object to be serialized.

It's quite a stretch to assign responsibility to Client.php to iterate $task->arg, detect whether a given nested value is binary with something like ctype_print(), base64_encode it, and then somehow append data to trigger a later decode prior to passing it to the worker. I suppose it's possible, though.
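To make that idea concrete, here is a rough sketch of what such a pass could look like. It is not anything Client.php does today. The helper names and the `b64:` marker are hypothetical, and it swaps ctype_print() for a UTF-8 validity check via preg_match('//u', ...), since ctype_print() would also flag ordinary text containing newlines.

```php
<?php
// Hypothetical marker that tells the decoding side a value was base64 encoded.
const B64_MARKER = 'b64:';

// Hypothetical helper: walk the arguments and base64-encode any string that is
// not valid UTF-8. Strings that already start with the marker are encoded too,
// so decoding stays unambiguous.
function encode_binary_args(array $args)
{
    foreach ($args as $key => $value) {
        if (is_array($value)) {
            $args[$key] = encode_binary_args($value);
        } elseif (is_string($value)
            && (preg_match('//u', $value) !== 1 || strpos($value, B64_MARKER) === 0)) {
            $args[$key] = B64_MARKER . base64_encode($value);
        }
    }
    return $args;
}

// Hypothetical counterpart that would run before the worker sees its arguments.
function decode_binary_args(array $args)
{
    foreach ($args as $key => $value) {
        if (is_array($value)) {
            $args[$key] = decode_binary_args($value);
        } elseif (is_string($value) && strpos($value, B64_MARKER) === 0) {
            $args[$key] = base64_decode(substr($value, strlen(B64_MARKER)));
        }
    }
    return $args;
}

$args = array(
    'page_number' => 2,
    'pdf_page_string' => "\xFF\xD8\xFF\xE0 raw bytes",
    'meta' => array('label' => 'b64:literal'),
);
$wire = json_encode(encode_binary_args($args));
$back = decode_binary_args(json_decode($wire, true));
var_dump($back === $args);
```

The catch is the one already mentioned: both sides have to agree on the marker, which is a lot of hidden convention to bake into a client library.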

I can really find no reasonable critique for the Gearmand MySQL approach, either. It invites too many questions to imagine Gearmand modifying the mysql integration such that multiple argument values are each written to a unique row as blobs.

There's nothing stopping a given developer from getting everything he or she needs out of the present approach. Either nest a scalar that indicates to the worker the location of the binary file/string, or base64_encode that sucker to a big scalar string.
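Roughly, the two options look like this. This is only a sketch: the `pdf_page_path` key and the `load_page` helper are made up for illustration, and the temp file stands in for whatever storage both the client and the workers can actually reach.

```php
<?php
// Made-up bytes standing in for one burst PDF page.
$pdf_page_string = "%PDF-1.4\n\xFF\xFE\x00\x80";

// Hypothetical worker-side helper that accepts either payload shape.
function load_page(array $args)
{
    if (isset($args['pdf_page_path'])) {
        return file_get_contents($args['pdf_page_path']);
    }
    return base64_decode($args['pdf_page_string']);
}

// Option 1: store the bytes elsewhere and pass only a scalar pointing at them.
$path = tempnam(sys_get_temp_dir(), 'page_');
file_put_contents($path, $pdf_page_string);
$by_reference = array('document_id' => 29, 'page_number' => 2, 'pdf_page_path' => $path);

// Option 2: pass the bytes themselves as one big base64 scalar.
$by_value = array(
    'document_id' => 29,
    'page_number' => 2,
    'pdf_page_string' => base64_encode($pdf_page_string),
);

// Simulate the JSON encoding of the arguments, then read the page back on the "worker".
$a = load_page(json_decode(json_encode($by_reference), true));
$b = load_page(json_decode(json_encode($by_value), true));
var_dump($a === $pdf_page_string, $b === $pdf_page_string);

unlink($path);
```

Option 1 keeps the job payload small but needs storage every worker can read. Option 2 is self-contained but grows the payload by about a third.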

Thanks so much for your time, Brian. It's a great project, and has helped me immensely.

