Comments (20)
I can add support for passing down the time limit, but I'm afraid of that causing performance issues.
5m should be enough for the amount of data we have (1m would even make a better default) - I've run scrolled searches over the entire big types and they never come near 5m.
Is it possible to see code that causes this issue for debugging?
Creating longer-lived scrollers on the server to allow slower consumption isn't very polite to the server, since scrolled searches take up resources. I'd suggest consuming all the data as soon as you have the scroller object, letting it go out of scope (so the DELETE request is sent), and then processing your data at whatever pace you wish. That way both client and server are happy.
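A minimal sketch of the "drain first, process later" pattern described above. The helper name `drain_scroller` is hypothetical and not part of MetaCPAN::Client; it assumes only that the object returned by `all()` has a `next` method that returns undef once the results are exhausted, as the examples in this thread suggest.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MetaCPAN::Client): pull every result out
# of an iterator as quickly as possible so the server-side scroll stays short-lived.
# Assumes only that $iter->next returns undef once the results are exhausted.
sub drain_scroller {
    my ($iter) = @_;
    my @items;
    while ( defined( my $item = $iter->next ) ) {
        push @items, $item;
    }
    return \@items;
}

# Usage sketch (needs network access, so it is left commented out):
# my $items = do {
#     my $scroller = $mcpan->all( 'releases', { es_filter => { ... } } );
#     drain_scroller($scroller);
# };    # $scroller goes out of scope here, so the scroll can be cleaned up
# slow_processing($_) for @$items;    # slow_processing() is a placeholder name
```

Keeping the scroller inside a `do` block is one way to make sure it goes out of scope before the slow processing starts.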
from metacpan-client.
Sure, the script I'm running is here:
http://subversion.city-fan.org/repos/cfo-repo/buildsys-scripts/pmv
from metacpan-client.
wow, that's a lot of scrollers you're creating there :)
you can mitigate this quickly by grouping the distributions into fewer queries and differentiating the data later in your Perl code: use 'es_filter' with a 'terms' query (which matches any of several values) instead of creating a separate scroller for each distribution. You may still need several queries, as ES limits how many values you can pass in a single query.
e.g.
my $releases = $mcpan->all(
    'releases',
    {
        es_filter => {
            and => [
                { term => { maturity => 'released' } },
                { terms => {
                    distribution => [qw<
                        ...
                        Algorithm-C3
                        Algorithm-Diff
                        Algorithm-Diff-XS
                        Any-Moose
                        AnyEvent
                        ...
                    >] } }
            ]
        }
    }
);
There are probably more ways of optimizing it on your end - please let me know how it goes once you've tried it.
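As a rough sketch of the batching idea, the helper below splits a list of distribution names into groups and builds one filter per group, in the same shape as the 'es_filter' shown above. The name `build_batched_filters` and the batch size are assumptions for illustration; the actual ES limit on values per query is not stated in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: split a list of distribution names into batches and
# build one es_filter structure per batch, mirroring the filter shown above.
# $batch_size is an assumed value; the real ES limit is not given here.
sub build_batched_filters {
    my ( $dists, $batch_size ) = @_;
    die "batch size must be positive\n" unless $batch_size && $batch_size > 0;
    my @queue = @$dists;    # copy so the caller's list is left untouched
    my @filters;
    while ( my @batch = splice @queue, 0, $batch_size ) {
        push @filters, {
            and => [
                { term  => { maturity     => 'released' } },
                { terms => { distribution => [@batch] } },
            ],
        };
    }
    return @filters;
}

my @filters = build_batched_filters(
    [qw< Algorithm-C3 Algorithm-Diff Algorithm-Diff-XS Any-Moose AnyEvent >], 2 );
printf "%d filters\n", scalar @filters;    # 5 names in batches of 2 -> prints "3 filters"
```

Each resulting structure could then be passed as the 'es_filter' value of a separate `all('releases', ...)` call, with the results separated per distribution afterwards in Perl.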
from metacpan-client.
This approach looks promising but I'd like to be able to specify fields as well as the filter and I don't seem to be able to do it - the filter seems to be all that is passed on. What I probably really need is the ability to specify a source filter so I only get back the fields in _source that I'm actually interested in, which would save a lot of network traffic. Is there any way to combine that with es_filter?
from metacpan-client.
you can specify fields (or _source entries) through the 'fields' keyword in the same hash ref:
my $releases = $mcpan->all(
    'releases',
    {
        es_filter => {
            ...
        },
        fields => [qw< ... >],
    }
);
from metacpan-client.
I tried that but it seemed to ignore the fields parameter when I specified it. I still got back all of the fields as part of _source.
from metacpan-client.
I'm not sure I follow you, here's an example:
use MetaCPAN::Client;
use DDP;
p MetaCPAN::Client->new->all(
    'releases',
    {
        es_filter => {
            and => [
                { term => { maturity => 'released' } },
                { terms => {
                    distribution => [qw<
                        Algorithm-C3
                        Any-Moose
                        AnyEvent
                    >] } }
            ]
        },
        fields => ['abstract'],
    }
)->next;
output:
MetaCPAN::Client::Release {
Parents Moo::Object
public methods (23) : abstract, archive, author, authorized, date, dependency, distribution, DOES, download_url, first, id, license, maturity, metadata, name, new, provides, resources, stat, status, tests, version, version_numified
private methods (1) : _known_fields
internals: {
data {
abstract "provide framework for multiple event loops"
}
}
}
what am I missing?
from metacpan-client.
Doing almost exactly the same thing:
use MetaCPAN::Client;
use DDP output => 'stdout';
my $mcpan = MetaCPAN::Client->new();
my $releases = $mcpan->all(
    'releases',
    {
        es_filter => {
            and => [
                { term => { maturity => 'released' } },
                { terms => { distribution => [qw< Algorithm-C3 Any-Moose AnyEvent >] } }
            ]
        },
        fields => ['abstract'],
    }
);
p $releases->next;
I get:
MetaCPAN::Client::Release {
Parents Moo::Object
public methods (23) : abstract, archive, author, authorized, date, dependency, distribution, DOES, download_url, first, id, license, maturity, metadata, name, new, provides, resources, stat, status, tests, version, version_numified
private methods (1) : _known_fields
internals: {
data {
abstract "provide framework for multiple event loops",
archive "AnyEvent-0.4.tar.gz",
author "MLEHMANN",
authorized 1,
date "2005-12-30T01:29:43",
dependency [
[0] {
module "Event",
phase "runtime",
relationship "requires",
version 0.86
}
],
distribution "AnyEvent",
download_url "https://cpan.metacpan.org/authors/id/M/ML/MLEHMANN/AnyEvent-0.4.tar.gz",
first 0,
id "oSq7IsV6q5iIPepldwlzULKeook",
license [
[0] "unknown"
],
maturity "released",
metadata {
abstract "unknown",
author [
[0] "unknown"
],
dynamic_config 1,
generated_by "ExtUtils::MakeMaker version 6.23, CPAN::Meta::Converter version 2.150005",
license [
[0] "unknown"
],
meta-spec {
url "http://search.cpan.org/perldoc?CPAN::Meta::Spec",
version 2
},
name "AnyEvent",
no_index {
directory [
[0] "t",
[1] "xt",
[2] "inc",
[3] "local",
[4] "perl5",
[5] "fatlib",
[6] "example",
[7] "blib",
[8] "examples",
[9] "eg"
]
},
prereqs {
runtime {
requires {
Event 0.86
}
}
},
release_status "stable",
version 0.4,
x_installdirs "site",
x_version_from "lib/AnyEvent.pm"
},
name "AnyEvent-0.4",
provides [
[0] "AnyEvent",
[1] "AnyEvent::Impl::Coro",
[2] "AnyEvent::Impl::Event",
[3] "AnyEvent::Impl::Glib",
[4] "AnyEvent::Impl::Tk"
],
resources {},
stat {
gid 1009,
mode 33188,
mtime 1135906183,
size 6920,
uid 1009
},
status "backpan",
tests {
fail 0,
na 0,
pass 2,
unknown 0
},
version 0.4,
version_numified 0.4
}
}
}
from metacpan-client.
odd, running your code I still get the same response as I did.
which version of the client are you running?
from metacpan-client.
2.005000. That was the first thing I thought of too...
from metacpan-client.
hmm... I think I wasn't running the latest... let me dig into it.
from metacpan-client.
I found we don't pass the fields in this specific case (not even sure why as it was documented)
I have a fix ready and will do a release with it soon.
from metacpan-client.
I released 2.006000 (will be available soon) with the missing support for 'fields' and additional support for '_source' - which you will probably need in case you're fetching non-indexed fields (requires some understanding of the ES side of it, but I don't think there's a reasonable way around the ES limitation here)
Note that pulling _source entries means the server fetches the entire document instead of using indexed fields... so performance-wise you're better off using 'fields' if they satisfy what you need.
Please install the new version and let me know how it goes.
from metacpan-client.
Now I get the same output as you do :-)
from metacpan-client.
cool, so moving forward - are these changes and the advice above helping with your original problem?
from metacpan-client.
Yes, now I've updated the script to fetch data for multiple dists in the same query and it seems quite a lot faster and less prone to errors. Thanks for the suggestion!
from metacpan-client.
awesome.
can we close this issue then? I'd rather not allow long scrollers at the moment; as you can see, they are not really required.
from metacpan-client.
Indeed.
from metacpan-client.
Too little, too late, but the size attribute may also come into play. Scroll timeouts only occur if the scroll isn't rerun within the timeout, i.e., each request resets the scroll timeout. The current limit is 100, but making that user-configurable to allow fewer results may be helpful?
from metacpan-client.
@reyjrar thanks for the tip - it's not too late :)
The default in our scroller is 100, but the Client sends 1000 as the size parameter, so that's the actual number.
We can add a size parameter, but since the user gets an iterator rather than a batch of results (the batch is only used internally for buffer management), it might cause some confusion on the user's end.
I'll prepare a change to allow size/time settings for pro users and document it as such, with the appropriate warnings.
from metacpan-client.