Comments (20)

mickeyn commented on July 2, 2024

I can add support for passing down the time limit, but I'm afraid of that causing performance issues.
5m should be enough for the amount of data we have (1m would even make a better default); I've run scrolled searches over the entire big types and they never come near 5m.

Is it possible to see the code that causes this issue, for debugging?

Creating longer-lived scrollers on the server for the sake of slower consumption isn't very polite to the server (scrolled searches take up resources). I'd say: consume all the data as soon as you have the scroller object, let it go out of scope (so the DELETE request is sent), and then process your data at whatever pace you wish. That way both client and server are happy.
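Here is a minimal sketch of that drain-then-process pattern. It assumes only that the result set exposes a `next` method that returns a false value once it is exhausted, as in the examples in this thread. The helper name `drain` is hypothetical and not part of MetaCPAN::Client.

```perl
use strict;
use warnings;

# Hypothetical helper: pull every item out of an iterator object whose
# ->next method returns a false value once it is exhausted. Result
# objects are always true, so the loop only stops at the end.
sub drain {
    my ($iter) = @_;
    my @items;
    while ( my $item = $iter->next ) {
        push @items, $item;
    }
    return @items;
}

# Usage (needs MetaCPAN::Client and network access; %params is assumed
# to hold your es_filter/fields):
# my @releases = do {
#     my $rs = $mcpan->all( 'releases', \%params );
#     drain($rs);    # consume the whole scroll in one go
# };                 # $rs goes out of scope here
# ...then process @releases at whatever pace you like.
```

Wrapping the call in a `do` block drops the only reference to the result set as soon as the block ends. Per the comment above, that is what triggers the DELETE request.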

from metacpan-client.

pghmcfc commented on July 2, 2024

Sure, the script I'm running is here:
http://subversion.city-fan.org/repos/cfo-repo/buildsys-scripts/pmv

mickeyn commented on July 2, 2024

wow, that's a lot of scrollers you're creating there :)

you can mitigate this quickly by grouping the distributions into fewer queries and doing the data differentiation later in your Perl code: use 'es_filter' with a 'terms' query (which matches any of multiple values) instead of creating a separate scroller for each one (you may still need several queries, as ES limits how many values you can pass in a single query).
e.g.

my $releases = $mcpan->all(
    'releases',
    {
        es_filter => {
            and => [
                { term  => { maturity => 'released' } },
                { terms => {
                    distribution => [qw<
                        ...
                        Algorithm-C3
                        Algorithm-Diff
                        Algorithm-Diff-XS
                        Any-Moose
                        AnyEvent
                        ...
                    >] } }
            ]
        }
    }
);
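ES caps how many values a single query can carry, so a long distribution list still has to be split into several queries. Below is a minimal sketch of that batching that reuses the filter shape from the example above. The helper names and the batch size of 100 are assumptions for illustration. They are not MetaCPAN::Client API, and 100 is not a documented ES limit.

```perl
use strict;
use warnings;

# Assumed batch size; pick a value your server accepts for 'terms'.
my $BATCH_SIZE = 100;

# Hypothetical helper: split a list into array refs of at most $size items.
sub batches {
    my ( $size, @items ) = @_;
    my @out;
    push @out, [ splice @items, 0, $size ] while @items;
    return @out;
}

# Hypothetical helper: build the same es_filter as above for one batch.
sub release_filter {
    my (@dists) = @_;
    return {
        and => [
            { term  => { maturity     => 'released' } },
            { terms => { distribution => [@dists] } },
        ],
    };
}

# Usage (needs MetaCPAN::Client and network access):
# for my $batch ( batches( $BATCH_SIZE, @dists ) ) {
#     my $rs = $mcpan->all( 'releases',
#         { es_filter => release_filter(@$batch) } );
#     while ( my $release = $rs->next ) { ... }
# }
```

This issues one scroller per batch instead of one per distribution. The per-distribution split then happens locally, by reading `distribution` from each result.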

There are probably more ways of optimizing it on your end; please let me know how it goes once you've tried it.

pghmcfc commented on July 2, 2024

This approach looks promising, but I'd like to be able to specify fields as well as the filter, and I don't seem to be able to: the filter seems to be all that is passed on. What I probably really need is the ability to specify a source filter so I only get back the fields in _source that I'm actually interested in, which would save a lot of network traffic. Is there any way to combine that with es_filter?

mickeyn commented on July 2, 2024

you can specify fields (or _source entries) through the 'fields' keyword in the same hash ref:

my $releases = $mcpan->all(
    'releases',
    {
        es_filter => {
            ...
        },
        fields => [qw< ... >],
    }
);

pghmcfc commented on July 2, 2024

I tried that but it seemed to ignore the fields parameter when I specified it. I still got back all of the fields as part of _source.

mickeyn commented on July 2, 2024

I'm not sure I follow you, here's an example:

use MetaCPAN::Client;
use DDP;
p MetaCPAN::Client->new->all(
    'releases',
    {
        es_filter => {
            and => [
                { term  => { maturity => 'released' } },
                { terms => {
                    distribution => [qw<
                        Algorithm-C3
                        Any-Moose
                        AnyEvent
                    >] } }
            ]
        },
        fields => ['abstract'],
    }
)->next;

output:

MetaCPAN::Client::Release  {
    Parents       Moo::Object
    public methods (23) : abstract, archive, author, authorized, date, dependency, distribution, DOES, download_url, first, id, license, maturity, metadata, name, new, provides, resources, stat, status, tests, version, version_numified
    private methods (1) : _known_fields
    internals: {
        data   {
            abstract   "provide framework for multiple event loops"
        }
    }
}

what am I missing?

pghmcfc commented on July 2, 2024

Doing almost exactly the same thing:

use MetaCPAN::Client;
use DDP output => 'stdout';
my $mcpan = MetaCPAN::Client->new();
my $releases = $mcpan->all(
        'releases',
        {
                es_filter => {
                        and => [
                                { term    => { maturity     => 'released'   } },
                                { terms   => { distribution => [qw< Algorithm-C3 Any-Moose AnyEvent >] } }
                        ]
                },
                fields => ['abstract'],
        }
);
p $releases->next;

I get:

MetaCPAN::Client::Release  {
    Parents       Moo::Object
    public methods (23) : abstract, archive, author, authorized, date, dependency, distribution, DOES, download_url, first, id, license, maturity, metadata, name, new, provides, resources, stat, status, tests, version, version_numified
    private methods (1) : _known_fields
    internals: {
        data   {
            abstract           "provide framework for multiple event loops",
            archive            "AnyEvent-0.4.tar.gz",
            author             "MLEHMANN",
            authorized         1,
            date               "2005-12-30T01:29:43",
            dependency         [
                [0] {
                    module         "Event",
                    phase          "runtime",
                    relationship   "requires",
                    version        0.86
                }
            ],
            distribution       "AnyEvent",
            download_url       "https://cpan.metacpan.org/authors/id/M/ML/MLEHMANN/AnyEvent-0.4.tar.gz",
            first              0,
            id                 "oSq7IsV6q5iIPepldwlzULKeook",
            license            [
                [0] "unknown"
            ],
            maturity           "released",
            metadata           {
                abstract         "unknown",
                author           [
                    [0] "unknown"
                ],
                dynamic_config   1,
                generated_by     "ExtUtils::MakeMaker version 6.23, CPAN::Meta::Converter version 2.150005",
                license          [
                    [0] "unknown"
                ],
                meta-spec        {
                    url       "http://search.cpan.org/perldoc?CPAN::Meta::Spec",
                    version   2
                },
                name             "AnyEvent",
                no_index         {
                    directory   [
                        [0] "t",
                        [1] "xt",
                        [2] "inc",
                        [3] "local",
                        [4] "perl5",
                        [5] "fatlib",
                        [6] "example",
                        [7] "blib",
                        [8] "examples",
                        [9] "eg"
                    ]
                },
                prereqs          {
                    runtime   {
                        requires   {
                            Event   0.86
                        }
                    }
                },
                release_status   "stable",
                version          0.4,
                x_installdirs    "site",
                x_version_from   "lib/AnyEvent.pm"
            },
            name               "AnyEvent-0.4",
            provides           [
                [0] "AnyEvent",
                [1] "AnyEvent::Impl::Coro",
                [2] "AnyEvent::Impl::Event",
                [3] "AnyEvent::Impl::Glib",
                [4] "AnyEvent::Impl::Tk"
            ],
            resources          {},
            stat               {
                gid     1009,
                mode    33188,
                mtime   1135906183,
                size    6920,
                uid     1009
            },
            status             "backpan",
            tests              {
                fail      0,
                na        0,
                pass      2,
                unknown   0
            },
            version            0.4,
            version_numified   0.4
        }
    }
}

mickeyn commented on July 2, 2024

odd, running your code I still get the same response as I did.

which version of the client are you running?

pghmcfc commented on July 2, 2024

2.005000. That was the first thing I thought of too...

mickeyn commented on July 2, 2024

hmm... I think I wasn't running the latest... let me dig into it.

mickeyn commented on July 2, 2024

I found that we don't pass the fields in this specific case (not even sure why, as it was documented).
I have a fix ready and will do a release with it soon.

mickeyn commented on July 2, 2024

I released 2.006000 (it will be available soon) with the missing support for 'fields' and new support for '_source'. You will probably need '_source' if you're fetching non-indexed fields. It requires some understanding of the ES side, but I don't think there's a reasonable way around the ES limitation here.

Note that pulling _source entries means the server fetches the entire document instead of using indexed fields, so performance-wise you're better off using 'fields' if they cover what you need.
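One way to encode that advice is a small helper that asks for 'fields' when every wanted field is indexed and falls back to '_source' otherwise. This is only a sketch. The helper name is hypothetical, and the caller must supply the set of indexed fields, because the thread only shows 'abstract' working with 'fields'.

```perl
use strict;
use warnings;

# Hypothetical helper: return the extra search parameters for @wanted.
# $indexed is a hash ref of field names known to be indexed on the
# server (check the MetaCPAN mapping; not assumed here).
# Prefers 'fields' (served from indexed fields) and falls back to
# '_source' (whole document fetched server-side) when any field is
# not indexed.
sub field_params {
    my ( $indexed, @wanted ) = @_;
    my @not_indexed = grep { !$indexed->{$_} } @wanted;
    return @not_indexed
        ? ( _source => [@wanted] )
        : ( fields  => [@wanted] );
}

# Usage (needs MetaCPAN::Client >= 2.006000 and network access):
# my $rs = $mcpan->all( 'releases', {
#     es_filter => $filter,
#     field_params( \%indexed, qw( abstract ) ),
# } );
```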

Please install the new version and let me know how it goes.

pghmcfc commented on July 2, 2024

Now I get the same output as you do :-)

mickeyn commented on July 2, 2024

cool, so moving forward - are these changes and the advice above helping with your original problem?

pghmcfc commented on July 2, 2024

Yes, now I've updated the script to fetch data for multiple dists in the same query and it seems quite a lot faster and less prone to errors. Thanks for the suggestion!

mickeyn commented on July 2, 2024

awesome.

can we close this issue then? I'd rather not allow long scrollers at the moment; as you've seen, they're not really required.

pghmcfc commented on July 2, 2024

Indeed.

reyjrar commented on July 2, 2024

Too little, too late, but the size attribute may also come into play. Scroll timeouts only occur if the scroll isn't rerun within the timeout, i.e., each request resets the scroll timeout. The current limit is 100, but making that user-configurable to allow fewer results may be helpful?

mickeyn commented on July 2, 2024

@reyjrar thanks for the tip - it's not too late :)

The default in our scroller is 100, but the Client sends 1000 as the size parameter, so that's the actual number.

We can add a size parameter, but the user gets an iterator rather than a bulk of results (the size is only used internally for buffer management), so it might cause some confusion on the user end.

I'll prepare a change to allow size/time settings for pro-users and document it as such with the appropriate warnings.
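Each scroll request resets the keep-alive timer, so the timeout bounds the time spent consuming one batch, not the whole scroll. The client uses a batch size of 1000, and the thread mentions a 5m keep-alive. That leaves 300 s / 1000 = 0.3 s of processing per document between `next` calls before the scroll context can expire mid-iteration. A small sketch of that arithmetic follows; the helper names are hypothetical.

```perl
use strict;
use warnings;

# Seconds of processing allowed per document if a batch of $batch_size
# documents must be consumed before the $timeout_secs keep-alive lapses.
sub per_doc_budget {
    my ( $timeout_secs, $batch_size ) = @_;
    return $timeout_secs / $batch_size;
}

# Largest batch size that can be consumed within the keep-alive when
# each document takes $secs_per_doc seconds to process (rounded down).
sub max_batch_size {
    my ( $timeout_secs, $secs_per_doc ) = @_;
    return int( $timeout_secs / $secs_per_doc );
}
```

For example, `per_doc_budget(300, 1000)` gives 0.3. If each document takes about half a second, `max_batch_size(300, 0.5)` suggests batches of at most 600. A smaller user-configurable size, as proposed above, would therefore help slow consumers.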
