
solrium's Introduction

Project Status: Abandoned

This package has been archived. The former README is now in README-not.

solrium's People

Contributors

1havran, kbroman, maelle, sckott, seandavi, stevenmmortimer


solrium's Issues

Error in a$response : $ operator is invalid for atomic vector

Hi,

Thanks for this great work.
I am new to R and tried to use this package today.

I followed the test case in test-solr_search.r and ran:

a <- solr_search(q='*:*', rows=2, fl='size', url='mysolrurl', key='size')

Calling a$response then failed with the following error:

Error in a$response : $ operator is invalid for atomic vector

a['response']
[1] NA
class(a)
[1] "character"

Any idea why this happens and how to fix it?
Thanks again :)
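
The error suggests that `a` came back as a plain character string rather than a list. Below is a minimal base R sketch of why `$` fails in that case, plus a defensive accessor. The string value and the helper `get_component` are hypothetical, not part of the package, and the actual cause in the report is not known:

```r
# Stand-in for a result that came back as a plain character string
# (hypothetical value; the real payload is not shown in the report).
a <- "{\"response\":{\"numFound\":2}}"

# `$` only works on recursive objects such as lists and data frames.
is.list(a)      # FALSE, which is why a$response raises the error
a["response"]   # NA: name-based indexing on an unnamed character vector

# Hypothetical helper: return a component only when the object can hold one.
get_component <- function(x, name) {
  if (is.list(x) && name %in% names(x)) x[[name]] else NULL
}

from_string <- get_component(a, "response")
from_list <- get_component(list(response = list(numFound = 2)), "response")
```

Checking `class(a)` before indexing, as the report already does, is the quickest way to tell which shape came back.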

Before 1st version to CRAN

  • Test against some other endpoints, BISON, Dataone, etc.
  • More examples exploring all/most parameters for users not familiar with solr
  • Figure out instructions for local setup and interaction with solr via this pkg
  • Do we need formal S3 classes for return objects? made some simple stuff
  • Parse arguments consistently within each function; parsing is not consistent right now. NO NEED: ONLY FACETING ALLOWS MULTIPLE ARGS, SO LEAVING ALONE

Package name

@ropensci/owners Does this package name need to change from solr to something else? Or is it okay?

Support wt=csv

E.g. http://api.plos.org/search?q=*:*&wt=csv&fl=id,score

id,score
10.1371/journal.pone.0107569/introduction,1.0
10.1371/journal.pone.0107569/results_and_discussion,1.0
10.1371/journal.pone.0107569/materials_and_methods,1.0
10.1371/journal.pone.0107569/supporting_information,1.0
10.1371/journal.pone.0062138,1.0
10.1371/journal.pone.0044030/title,1.0
10.1371/journal.pone.0062138/title,1.0
10.1371/journal.pone.0044030/abstract,1.0
10.1371/journal.pone.0062138/abstract,1.0
10.1371/journal.pone.0044030/references,1.0
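
A CSV body like the one above can be read into a data frame with base R alone. This is a minimal sketch, assuming the response text is already held in a string (the HTTP request is left out, and only two of the rows above are reused):

```r
# Two rows copied from the example response, joined into one string.
csv_body <- paste(
  "id,score",
  "10.1371/journal.pone.0107569/introduction,1.0",
  "10.1371/journal.pone.0062138,1.0",
  sep = "\n"
)

# read.csv() accepts the text directly, so no temporary file is needed.
docs <- read.csv(text = csv_body, stringsAsFactors = FALSE)
```

The header row becomes the column names, and the score column is read as numeric.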

XML, dataone, etc

@sckott I think this package is a brilliant idea. Standardizing how we handle solr queries across packages would be a big boost. dataone supports a full set of queries as well, and the dataone package provides a basic interface for this. CC'ing @mbjones in case he wants to take a look at how you're going about this or has any suggestions for you.

I see your query about XML in the README. Though I haven't done much with solr at this time, I'd nonetheless recommend we consider supporting XML as well as JSON. I don't think it makes sense for the package to make this decision for the user: a user who wants solr queries that return XML should be able to have them, yes?

JSON certainly has its advantages, but we have a lot of tools for working with XML that don't really have analogs in JSON: XPath, XPointer, schema, XSLT, etc., which can all be pretty handy.

Some queries return errors I don't understand

This is probably not an issue in the code; some data may be missing.

solr_group(q='*:*', group.field='journal', rows=5, group.limit=1,
group.sort='publication_date desc', fl='journal, publication_date', url=url, key=key)

Error in rbindlist(lapply(datout, function(x) { :
Item 4 has 4 columns, inconsistent with item 1 which has 5 columns

The problem is that I get a similar error with many queries to http://api.plos.org/search

Can you deal with this missing data?
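
One way to tolerate docs with missing fields is to pad each doc to the union of field names before binding rows. This is a base R sketch under assumed input: the docs below and the helper `bind_docs` are made up for illustration. If the package keeps using data.table, `rbindlist()` also has a `fill = TRUE` argument that serves the same purpose.

```r
# Hypothetical docs as they might come back from Solr: the second one
# lacks the 'journal' field.
docs <- list(
  list(id = "a", journal = "plos one", publication_date = "2014-01-17"),
  list(id = "b", publication_date = "2014-01-16")
)

# Pad every doc to the union of field names, filling gaps with NA,
# then bind the one-row data frames together.
bind_docs <- function(docs) {
  fields <- unique(unlist(lapply(docs, names)))
  rows <- lapply(docs, function(d) {
    missing <- setdiff(fields, names(d))
    d[missing] <- NA
    as.data.frame(d[fields], stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

out <- bind_docs(docs)
```

Every row then has the same columns, with NA where Solr returned no value.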

Multiple fl parameters not accepted with some versions of Solr

Right now, we have solr set up so that users pass options to the fl parameter as a vector like c("one", "two"), which gets parsed to fl=one&fl=two in the URL string, but this doesn't always work. With Dryad's Solr endpoint it fails:

http://datadryad.org/solr/search/select?q=Galliard&wt=json&fl=handle&fl=dc.title_sort

the second fl parameter is ignored.

But this works in PLOS's search API

http://api.plos.org/search?q=*:*&wt=json&fl=id&fl=journal
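
Solr's fl parameter also accepts a comma-separated list in a single value, so one option is to collapse the vector before building the URL. This is a minimal sketch; `build_query` is a hypothetical helper, it does no URL-encoding, and it has not been tried against the Dryad endpoint:

```r
# Collapse multi-valued entries into one comma-separated value each,
# then join everything into a query string.
build_query <- function(params) {
  flat <- vapply(params, function(v) paste(v, collapse = ","), character(1))
  paste(names(flat), flat, sep = "=", collapse = "&")
}

qs <- build_query(list(
  q = "Galliard",
  wt = "json",
  fl = c("handle", "dc.title_sort")
))
```

Users would still pass c("handle", "dc.title_sort"); only the URL string changes.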

Rare behaviour with param qt for Solr_group

I can not reproduce it in PLOS.

A normal query with three attributes in the field list gives a warning:

> solr_group(q='*:*', group.field='accoid', group.limit=1, group.sort='price asc', sort='price asc', fl="touroperator, rating, price", fq="transport:VL", url = url)
                             groupValue numFound start rating touroperator
1  137a9c30-ee49-11df-a13b-0050569335f3    12545     0      3           JI
2  103fe3f0-8f5c-11df-a2df-001c42000009     6702     0      3           JI
3  79760f30-5b14-11e2-bb05-000c297659d3    19983     0      1           CH
4  50e1b6f0-5fe7-11e2-bb05-000c297659d3    39773     0      2           JI
5  fda70780-9b3c-11e0-9153-005056930057     1659     0      4           JI
6  8fdcaba0-bc9c-11e2-a109-000c297659d3    69484     0      2           JI
7  10d5bb50-8f5c-11df-a2df-001c42000009     5235     0      4           JI
8  0e3769c0-8f5c-11df-a2df-001c42000009    51906     0      4           JI
9  108fd8b0-8f5c-11df-a2df-001c42000009     2584     0      3           JI
10 0c880c10-8f5c-11df-a2df-001c42000009    57270     0      2           JI
   price
1  12700
2  13500
3  13700
4  14017
5  14700
6  14833
7  15166
8  15208
9  15225
10 15233
Warning message:
In if (names(input) == "response") { :
  the condition has length > 1 and only the first element will be used

Same query with raw true

> solr_group(q='*:*', group.field='accoid', group.limit=1, group.sort='price asc', sort='price asc', fl="touroperator, rating, price", fq="transport:VL", url = url, raw=TRUE)
[1] "{\"responseHeader\":{\"status\":0,\"QTime\":1472},\"grouped\":{\"accoid\":{\"matches\":34553291,\"groups\":[{\"groupValue\":\"137a9c30-ee49-11df-a13b-0050569335f3\",\"doclist\":{\"numFound\":12545,\"start\":0,\"docs\":[{\"rating\":3,\"touroperator\":\"JI\",\"price\":12700}]}},{\"groupValue\":\"103fe3f0-8f5c-11df-a2df-001c42000009\",\"doclist\":{\"numFound\":6702,\"start\":0,\"docs\":[{\"rating\":3,\"touroperator\":\"JI\",\"price\":13500}]}},{\"groupValue\":\"79760f30-5b14-11e2-bb05-000c297659d3\",\"doclist\":{\"numFound\":19983,\"start\":0,\"docs\":[{\"rating\":1,\"touroperator\":\"CH\",\"price\":13700}]}},{\"groupValue\":\"50e1b6f0-5fe7-11e2-bb05-000c297659d3\",\"doclist\":{\"numFound\":39773,\"start\":0,\"docs\":[{\"rating\":2,\"touroperator\":\"JI\",\"price\":14017}]}},{\"groupValue\":\"fda70780-9b3c-11e0-9153-005056930057\",\"doclist\":{\"numFound\":1659,\"start\":0,\"docs\":[{\"rating\":4,\"touroperator\":\"JI\",\"price\":14700}]}},{\"groupValue\":\"8fdcaba0-bc9c-11e2-a109-000c297659d3\",\"doclist\":{\"numFound\":69484,\"start\":0,\"docs\":[{\"rating\":2,\"touroperator\":\"JI\",\"price\":14833}]}},{\"groupValue\":\"10d5bb50-8f5c-11df-a2df-001c42000009\",\"doclist\":{\"numFound\":5235,\"start\":0,\"docs\":[{\"rating\":4,\"touroperator\":\"JI\",\"price\":15166}]}},{\"groupValue\":\"0e3769c0-8f5c-11df-a2df-001c42000009\",\"doclist\":{\"numFound\":51906,\"start\":0,\"docs\":[{\"rating\":4,\"touroperator\":\"JI\",\"price\":15208}]}},{\"groupValue\":\"108fd8b0-8f5c-11df-a2df-001c42000009\",\"doclist\":{\"numFound\":2584,\"start\":0,\"docs\":[{\"rating\":3,\"touroperator\":\"JI\",\"price\":15225}]}},{\"groupValue\":\"0c880c10-8f5c-11df-a2df-001c42000009\",\"doclist\":{\"numFound\":57270,\"start\":0,\"docs\":[{\"rating\":2,\"touroperator\":\"JI\",\"price\":15233}]}}]}}}\n"
attr(,"class")
[1] "sr_group"
attr(,"wt")
[1] "json"

When I add qt='distributedSearch', the last two attributes are missing from the response:

> solr_group(q='*:*', group.field='accoid', group.limit=1, group.sort='price asc', sort='price asc', fl="touroperator, rating, price", fq="transport:VL", url = url, qt='distributedSearch', raw=FALSE)
                             groupValue numFound start touroperator
1  accaa2a0-fb51-11e2-a109-000c297659d3    17750     0           CH
2  77f8e0f0-9c42-11e2-a109-000c297659d3     4084     0           JI
3  53432a60-c7df-11e0-aa1b-005056930057     6636     0           JI
4  edefdd00-8f5b-11df-a2df-001c42000009    23974     0           JI
5  137a9c30-ee49-11df-a13b-0050569335f3    12545     0           JI
6  10438d70-8f5c-11df-a2df-001c42000009    13220     0           CH
7  110c34a0-8f5c-11df-a2df-001c42000009    13384     0           CH
8  10427c00-8f5c-11df-a2df-001c42000009     8898     0           JI
9  c69d6fb0-9c41-11e2-a109-000c297659d3     4104     0           JI
10 6f885e80-9336-11e0-9153-005056930057    13065     0           CH
Warning message:
In if (names(input) == "response") { :
  the condition has length > 1 and only the first element will be used

In the raw response they are also missing

> solr_group(q='*:*', group.field='accoid', group.limit=1, group.sort='price asc', sort='price asc', fl="touroperator, rating, price", fq="transport:VL", url = url, qt='distributedSearch', raw=TRUE) 
[1] "{\"responseHeader\":{\"status\":0,\"QTime\":1774},\"grouped\":{\"accoid\":{\"matches\":141800873,\"groups\":[{\"groupValue\":\"accaa2a0-fb51-11e2-a109-000c297659d3\",\"doclist\":{\"numFound\":17750,\"start\":0,\"docs\":[{\"touroperator\":\"CH\"}]}},{\"groupValue\":\"77f8e0f0-9c42-11e2-a109-000c297659d3\",\"doclist\":{\"numFound\":4084,\"start\":0,\"docs\":[{\"touroperator\":\"JI\"}]}},{\"groupValue\":\"53432a60-c7df-11e0-aa1b-005056930057\",\"doclist\":{\"numFound\":6636,\"start\":0,\"docs\":[{\"touroperator\":\"JI\"}]}},{\"groupValue\":\"edefdd00-8f5b-11df-a2df-001c42000009\",\"doclist\":{\"numFound\":23974,\"start\":0,\"docs\":[{\"touroperator\":\"JI\"}]}},{\"groupValue\":\"137a9c30-ee49-11df-a13b-0050569335f3\",\"doclist\":{\"numFound\":12545,\"start\":0,\"docs\":[{\"touroperator\":\"JI\"}]}},{\"groupValue\":\"10438d70-8f5c-11df-a2df-001c42000009\",\"doclist\":{\"numFound\":13220,\"start\":0,\"docs\":[{\"touroperator\":\"CH\"}]}},{\"groupValue\":\"110c34a0-8f5c-11df-a2df-001c42000009\",\"doclist\":{\"numFound\":13384,\"start\":0,\"docs\":[{\"touroperator\":\"CH\"}]}},{\"groupValue\":\"c69d6fb0-9c41-11e2-a109-000c297659d3\",\"doclist\":{\"numFound\":4104,\"start\":0,\"docs\":[{\"touroperator\":\"JI\"}]}},{\"groupValue\":\"10427c00-8f5c-11df-a2df-001c42000009\",\"doclist\":{\"numFound\":8898,\"start\":0,\"docs\":[{\"touroperator\":\"JI\"}]}},{\"groupValue\":\"6f885e80-9336-11e0-9153-005056930057\",\"doclist\":{\"numFound\":13065,\"start\":0,\"docs\":[{\"touroperator\":\"CH\"}]}}]}}}\n"
attr(,"class")
[1] "sr_group"
attr(,"wt")
[1] "json"

I don't know how to get the URL that was sent to Solr, so I can't check whether it was built correctly.
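
The warning in both transcripts comes from comparing a whole vector of names inside if(). Below is a small sketch of a length-safe check, assuming `input` is the parsed response as a named list. The package internals are not shown in this report, so the surrounding code is a guess:

```r
# Parsed response with two top-level keys, as in the raw output above.
input <- list(responseHeader = list(status = 0), grouped = list())

# names(input) == "response" yields one logical per name, so if() only
# looks at the first one (and errors outright from R 4.2.0 on).
length(names(input) == "response")   # 2

# %in% with a single value on the left always gives one TRUE/FALSE.
has_response <- "response" %in% names(input)
has_grouped <- "grouped" %in% names(input)

kind <- if (has_response) "search" else if (has_grouped) "group" else "unknown"
```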

group.truncate and group.facet does not make sense for solr_group

These two properties only make sense when you do a search with both groups and facets.
The facet results are then related to the grouping.

I can remove them from the documentation, but I wanted to ask first.

 \item{group.truncate}{(logical) If true, facet counts are
  based on the most relevant document of each group
  matching the query. Same applies for StatsComponent.
  Default is false. <!> Solr3.4 Supported from Solr 3.4 and
  up.}

  \item{group.facet}{(logical) Whether to compute grouped
  facets for the field facets specified in facet.field
  parameters. Grouped facets are computed based on the
  first specified group. Just like normal field faceting,
  fields shouldn't be tokenized (otherwise counts are
  computed for each token). Grouped faceting supports
  single and multivalued fields. Default is false. <!>
  Solr4.0 WARNING: If this parameter is set to true on a
  sharded environment, all the documents that belong to the
  same group have to be located in the same shard,
  otherwise the count will be incorrect. If you are using
  SolrCloud, consider using "custom hashing"}

Add result grouping/field collapsing

Docs: http://wiki.apache.org/solr/FieldCollapsing

An example query:

http://api.plos.org/search/?q=ecology&group=true&group.field=journal&group.limit=3&fl=id,score

{
grouped: {
journal: {
matches: 18120,
groups: [
{
groupValue: "plos one",
doclist: {
numFound: 13939,
start: 0,
docs: [
{
id: "10.1371/journal.pone.0059813"
}
]
}
},
{
groupValue: "plos biology",
doclist: {
numFound: 746,
start: 0,
docs: [
{
id: "10.1371/journal.pbio.0020072"
}
]
}
},
{

...cutoff
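
The nested structure above maps naturally onto one row per returned doc. This is a base R sketch of the flattening step, using a hand-built list that mirrors the part of the response shown; `flatten_groups` is a hypothetical helper and assumes every group has at least one doc with the same fields:

```r
# Hand-built list mirroring the JSON above (values copied from the example).
res <- list(grouped = list(journal = list(
  matches = 18120,
  groups = list(
    list(groupValue = "plos one",
         doclist = list(numFound = 13939, start = 0,
                        docs = list(list(id = "10.1371/journal.pone.0059813")))),
    list(groupValue = "plos biology",
         doclist = list(numFound = 746, start = 0,
                        docs = list(list(id = "10.1371/journal.pbio.0020072"))))
  )
)))

# One row per doc, with the group-level values repeated alongside.
flatten_groups <- function(res, field) {
  groups <- res$grouped[[field]]$groups
  rows <- lapply(groups, function(g) {
    docs <- do.call(rbind, lapply(g$doclist$docs, as.data.frame,
                                  stringsAsFactors = FALSE))
    data.frame(groupValue = g$groupValue,
               numFound = g$doclist$numFound,
               start = g$doclist$start,
               docs,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

out <- flatten_groups(res, "journal")
```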

solr_group group.main

According to the docs, group.main should be a boolean:

  \item{group.main}{(logical) If true, the result of the
  last field grouping command is used as the main result
  list in the response, using group.format=simple}

It should also return the group field in the main result:

solr_group(q='*:*', group.field='journal', rows=5, group.limit=3, group.sort='publication_date desc', group.format='simple', group.main='true', fl='publication_date', url=url, key=key)
numFound start publication_date
1 889099 0 2014-01-17T00:00:00Z
2 889099 0 2014-01-17T00:00:00Z
3 889099 0 2014-01-17T00:00:00Z
4 889099 0 2014-01-16T00:00:00Z
5 889099 0 2014-01-16T00:00:00Z
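
If group.main is to accept an R logical, it has to be turned into the lowercase string Solr expects before it goes into the URL. This is a minimal sketch; `solr_bool` is a hypothetical helper name:

```r
# Convert a single R logical into Solr's "true"/"false" string,
# rejecting anything that is not exactly one non-missing logical.
solr_bool <- function(x) {
  stopifnot(is.logical(x), length(x) == 1, !is.na(x))
  if (x) "true" else "false"
}

param <- paste0("group.main=", solr_bool(TRUE))
```

A call could then use group.main=TRUE instead of group.main='true'.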

Test Dataone from solr

Carl says Dataone has a solr interface. Test from here and make sure it works, give examples, etc.

Try to unify group, facet, any other functionality into solr_search, or a new fxn

grouping, faceting, etc. are all just param options in search, so all could be done from one function. Returning raw data would be easy. However,

  • Parsing the complex result might be tricky, though perhaps we could simply use the mlt, group, facet, etc. parsers for each component returned.
  • As far as I know, you can't get regular search results (i.e., the docs element) back when group=true, but perhaps there is a way to return docs
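
The per-component idea in the first bullet could look roughly like this: keep one parser per top-level key and apply whichever ones match the response. The parser bodies below are placeholders and `parse_all` is a hypothetical name. Only the keys (response, grouped, facet_counts) are the ones Solr uses in its JSON output:

```r
# One placeholder parser per top-level component of a Solr response.
parsers <- list(
  response = function(x) x$docs,
  grouped = function(x) names(x),
  facet_counts = function(x) x$facet_fields
)

# Apply only the parsers whose component is present in the response.
parse_all <- function(res) {
  present <- intersect(names(res), names(parsers))
  setNames(lapply(present, function(n) parsers[[n]](res[[n]])), present)
}

# Hypothetical parsed response containing two components.
res <- list(
  responseHeader = list(status = 0),
  response = list(numFound = 1, docs = list(list(id = "x"))),
  facet_counts = list(facet_fields = list(journal = list("plos one", 10)))
)
parsed <- parse_all(res)
```

Components without a registered parser, such as responseHeader here, are simply skipped.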

Regression in solr_search

response <- solr_search(q='*:*', fl='id', rows=2, url=url, key=key)
response$numFound
NULL

Looks like while adding the solr_group function the solr_search response has lost the attributes

  • numFound
  • start

and only returns the docs

solr_search(q='*:*', fl='id', rows=2, url=url, key=key)
id
1 10.1371/journal.pone.0071557
2 10.1371/journal.pone.0064577/title
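
One way to return only the docs while keeping numFound and start is to attach them as attributes of the data frame. This is a sketch, not the package's actual fix; the count below is made up and `as_search_result` is a hypothetical helper:

```r
# Hypothetical parsed 'response' element of a Solr search result.
response <- list(
  numFound = 100L,   # made-up count for illustration
  start = 0L,
  docs = data.frame(
    id = c("10.1371/journal.pone.0071557",
           "10.1371/journal.pone.0064577/title"),
    stringsAsFactors = FALSE
  )
)

# Return the docs, carrying the counts along as attributes.
as_search_result <- function(response) {
  out <- response$docs
  attr(out, "numFound") <- response$numFound
  attr(out, "start") <- response$start
  out
}

result <- as_search_result(response)
found <- attr(result, "numFound")
```

With this layout the counts are read with attr(result, "numFound") rather than result$numFound, since `$` on a data frame looks up columns.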

set wt=csv as default

This should provide a significant speed advantage over xml and json. wt=csv appears to have been in Solr for many versions now, so it should hopefully work for most Solr installations.

We should write a larger test suite for wt=csv specifically, to make sure it's not failing anywhere and that the data output is identical to wt=json and wt=xml.

Also, experiment with replacing read.table() with something else, like data.table::fread() or readr::read_csv() from https://github.com/hadley/readr
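
A test along those lines could compare the data frame parsed from CSV with the one built from the JSON docs, column by column. This is a base R sketch with made-up data; a real test would fetch both formats from the same query:

```r
# The same two hypothetical docs in both output formats.
csv_body <- "id,score\nA,1.0\nB,1.0\n"
json_docs <- list(list(id = "A", score = 1), list(id = "B", score = 1))

from_csv <- read.csv(text = csv_body, stringsAsFactors = FALSE)
from_json <- do.call(rbind, lapply(json_docs, as.data.frame,
                                   stringsAsFactors = FALSE))

# Compare strings exactly and numbers with a tolerance.
same_ids <- identical(from_csv$id, from_json$id)
same_scores <- isTRUE(all.equal(from_csv$score, from_json$score))
```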

readme xhtml

From Kurt Hornik

These have README.md files which when converted to (X)HTML using a
current version of pandoc show problems when validated using W3C Markup
Validator, see below.

Most of these problems are caused by using images without giving a name
(so the required alt attribute for <img> is not provided), or using <br>
instead of <br/>.

Pls fix these problems in your README.md files for your next release: in
all cases I inspected, the fixes were obvious and confirmation using
pandoc and W3C markup validator seemed unnecessary.

Please also visit your package check web page at http://cran.r-project.org/web/checks/check_results_PACKAGENAME.html to see if other problems need to be addressed as well.

readme fixes

These packages contain README.md files with invalid HTML output created
by pandoc 1.12.4.2 according to W3C-validator.

I attach the HTML errors and warnings found below, and will put copies
of the corresponding HTML files up at
http://www.r-project.org/nosvn/pandoc.

Please investigate the problems and fix as needed.

Afaics, many of the problems are caused by adding "raw" HTML elements in
the README.md files and not realizing that the default output format
"html" is XHTML 1 (and not HTML 5). E.g., a raw <br>
results in an

end tag for "br" omitted, but OMITTAG NO was specified

error.

Best
-k

solr.html:
  Valid: FALSE (errors: 1, warnings: 0)
  Errors:
    line  col  message
     339   98  required attribute "alt" not specified

giving multiple facets as character vectors instead of comma-separated strings?

Nice work on this package, looks awesome and really useful.

Minor quibble: in an example like the one you give below:

solr_facet(q = "*:*", facet.field = "journal", facet.query = "cell,bird", url = url)

It feels a bit un-R-like to me that facet.query is "cell,bird" instead of c("cell", "bird"). As an R user, I expect a query on two facets to be a length-2 character vector, not a single string separated by some particular syntax. (Yeah, the c notation is more verbose, but if I'm programmatically assembling my query from an R object I've created some other way, then paste0(facets, collapse=",") is even more verbose...)

Anyway, just a minor thought, probably fine either way.
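
Both styles could be supported by normalizing the argument to a character vector on the way in. This is a minimal sketch; `as_values` is a hypothetical helper, and it would wrongly split a query that itself contains a comma:

```r
# Accept either "cell,bird" or c("cell", "bird") and return a vector
# with one entry per facet query.
as_values <- function(x) {
  trimws(unlist(strsplit(x, ",", fixed = TRUE)))
}

from_string <- as_values("cell, bird")
from_vector <- as_values(c("cell", "bird"))
```

Existing calls with comma-separated strings would keep working, while new code could pass vectors.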

solr_group sort parameter not working

The sorting of groups is done using the default sorting

solr_group(q='*:*', group.field='journal', rows=5, group.limit=1, group.sort='publication_date desc', sort='publication_date desc', fl='publication_date', url=url, key=key)
                        groupValue numFound start     publication_date
  1                         plos one   677297     0 2014-01-17T00:00:00Z
  2 plos neglected tropical diseases    19106     0 2014-01-16T00:00:00Z
  3                    plos genetics    33698     0 2014-01-16T00:00:00Z
  4                             none    62518     0 2012-10-23T00:00:00Z
  5                     plos biology    24111     0 2014-01-14T00:00:00Z

It looks like the sort parameter is not sent to the Solr server: even an invalid value gives the same result.

solr_group(q='*:*', group.field='journal', rows=5, group.limit=1, 
group.sort='publication_date desc', sort='error', fl='publication_date', url=url, key=key)
                        groupValue numFound start     publication_date
  1                         plos one   677297     0 2014-01-17T00:00:00Z
  2 plos neglected tropical diseases    19106     0 2014-01-16T00:00:00Z
  3                    plos genetics    33698     0 2014-01-16T00:00:00Z
  4                             none    62518     0 2012-10-23T00:00:00Z
  5                     plos biology    24111     0 2014-01-14T00:00:00Z
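
A quick way to confirm whether sort reaches the request is to build the query string separately and inspect it. This is a sketch with hypothetical helpers (`compact`, `debug_url`); it does not show how the package itself assembles the URL:

```r
# Drop arguments the caller left as NULL.
compact <- function(x) x[!vapply(x, is.null, logical(1))]

# Build a URL from a base and a named list of single-valued arguments,
# percent-encoding each value.
debug_url <- function(base, args) {
  args <- compact(args)
  vals <- vapply(args, function(v) {
    utils::URLencode(as.character(v), reserved = TRUE)
  }, character(1))
  paste0(base, "?", paste(names(args), vals, sep = "=", collapse = "&"))
}

u <- debug_url("http://api.plos.org/search", list(
  q = "*:*",
  group = "true",
  group.field = "journal",
  group.sort = "publication_date desc",
  sort = "publication_date desc",
  rows = 5,
  qt = NULL
))

sort_sent <- grepl("&sort=publication_date%20desc", u, fixed = TRUE)
```

Printing `u` shows exactly which parameters would be sent, which makes a silently dropped argument easy to spot.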
