solrdump

Export documents from a SOLR index as JSON, quickly and simply, from the command line.

Requesting a large number of documents from SOLR can lead to deep paging problems:

When you wish to fetch a very large number of sorted results from Solr to feed into an external system, using very large values for the start or rows parameters can be very inefficient.
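
To get a feel for the cost, here is a rough back-of-the-envelope sketch in Python. It assumes, as the Solr reference guide describes, that a request with a given start and rows makes Solr rank the top start + rows documents. The function name is made up for this illustration, and the numbers are arithmetic, not measurements.

```python
def docs_ranked_with_start_paging(total_docs: int, rows: int) -> int:
    """Sum of (start + rows) over all pages needed to export total_docs.

    Assumption: each request with offset `start` forces the server to rank
    the top `start + rows` documents (capped at the size of the result set).
    """
    ranked = 0
    for start in range(0, total_docs, rows):
        ranked += min(start + rows, total_docs)
    return ranked


# Exporting one million documents in pages of 1000:
print(docs_ranked_with_start_paging(1_000_000, 1000))  # 500500000
```

Under that assumption the total work grows roughly quadratically with the number of documents, which is why start-based paging becomes impractical for full exports.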

See also: Fetching A Large Number of Sorted Results: Cursors

As an alternative to increasing the "start" parameter to request subsequent pages of sorted results, Solr supports using a "Cursor" to scan through results. Cursors in Solr are a logical concept, that doesn't involve caching any state information on the server. Instead the sort values of the last document returned to the client are used to compute a "mark" representing a logical point in the ordered space of sort values.
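
The cursor loop described above can be sketched in a few lines of Python. This is a minimal illustration, not necessarily how solrdump implements it. The names `iter_docs`, `http_fetch` and `fake_fetch` are hypothetical, and `http_fetch` assumes the collection exposes the usual `/select` handler. `fake_fetch` is an in-memory stand-in that uses the last id as the mark, whereas real Solr cursor marks are opaque strings. The request parameters `cursorMark`, `sort` and `rows` and the response field `nextCursorMark` are Solr's own.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator, List

Fetch = Callable[[Dict[str, str]], dict]


def http_fetch(server: str) -> Fetch:
    """Return a function sending one request to `server` (assumes /select)."""
    def fetch(params: Dict[str, str]) -> dict:
        url = server.rstrip("/") + "/select?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)
    return fetch


def fake_fetch(ids: List[str]) -> Fetch:
    """In-memory stand-in for a server, sorted by id (illustration only)."""
    ordered = sorted(ids)

    def fetch(params: Dict[str, str]) -> dict:
        cursor, rows = params["cursorMark"], int(params["rows"])
        remaining = ordered if cursor == "*" else [i for i in ordered if i > cursor]
        page = remaining[:rows]
        return {
            "response": {"docs": [{"id": i} for i in page]},
            # An unchanged mark signals that there are no more results.
            "nextCursorMark": page[-1] if page else cursor,
        }
    return fetch


def iter_docs(fetch: Fetch, q: str = "*:*", sort: str = "id asc",
              rows: int = 1000) -> Iterator[dict]:
    """Yield all matching documents, following the cursor until it stops moving."""
    cursor = "*"  # "*" starts a new cursor
    while True:
        page = fetch({"q": q, "sort": sort, "rows": str(rows),
                      "wt": "json", "cursorMark": cursor})
        yield from page["response"]["docs"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:
            return
        cursor = next_cursor


print([doc["id"] for doc in iter_docs(fake_fetch(["b", "a", "c"]), rows=2)])
# ['a', 'b', 'c']
```

Against a real server, `iter_docs(http_fetch("http://localhost:8983/solr/example"))` would issue the same sequence of requests. The sort must include a unique field such as `id`, which is why the mark always identifies an unambiguous position.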

Requirements

SOLR 4.7 or higher, since the cursor mechanism was introduced with SOLR 4.7 (2014-02-25). See also: efficient deep paging with cursors.

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

Go Report Card: https://goreportcard.com/report/github.com/ubleipzig/solrdump

This project has been developed for Project finc at Leipzig University Library.

Installation

Via Debian or RPM package.

Or via the go tool:

$ go get github.com/ubleipzig/solrdump/...

Usage

$ solrdump -h
Usage of solrdump:
  -fl string
        field or fields to export, separate multiple values by comma
  -q string
        SOLR query (default "*:*")
  -rows int
        number of rows returned per request (default 1000)
  -server string
        SOLR server, host post and collection (default "http://localhost:8983/solr/example")
  -sort string
        sort order (only unique fields allowed) (default "id asc")
  -verbose
        show progress
  -version
        show version and exit

Export the id and title fields for all documents:

$ solrdump -server https://localhost:8983/solr/biblio -q '*:*' -fl id,title
{"id":"0000001864","title":"Veröffentlichungen des Museums für Völkerkunde zu Leipzig"}
{"id":"0000002001","title":"Festschrift zur Feier des 500jährigen Bestehens der ... /"}
...
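
Judging from the example above, the output is one JSON document per line, so it can be consumed line by line. A minimal Python sketch follows. `parse_docs` is a hypothetical helper, and the sample lines are copied from the example output.

```python
import json
from typing import Iterable, Iterator


def parse_docs(lines: Iterable[str]) -> Iterator[dict]:
    """Parse line-delimited JSON, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


sample = [
    '{"id":"0000001864","title":"Veröffentlichungen des Museums für Völkerkunde zu Leipzig"}',
    '{"id":"0000002001","title":"Festschrift zur Feier des 500jährigen Bestehens der ... /"}',
]

# In a pipeline, pass sys.stdin instead of the sample list.
for doc in parse_docs(sample):
    print(doc["id"], doc["title"])
```

Because each document is parsed on its own, memory use stays flat no matter how many documents are exported.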

Export documents matching a query and post-process with jq:

$ solrdump -server https://localhost:8983/solr/biblio -q 'title:"topic model"' -fl id,title | \
  jq -r .title | \
  head -10

A generic approach to topic models and its application to virtual communities /
Topic models for image retrieval on large scale databases
On the use of language models and topic models in the web new algorithms for filtering, ...
Integration von Topic Models und Netzwerkanalyse bei der Bestimmung des Kundenwertes
Time dynamic topic models /
...

Instant search as a one-liner

Using solrdump + jq + fzf (or peco).

$ solrdump -server http://solr.io/solr/biblio -q 'title:"leipzig"' -fl 'id,source_id,title' | \
    jq -rc '[.source_id, .title[:80]] | @tsv' | fzf -e
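
For readers who would rather not use jq, the filter above can be approximated in Python. This is only a sketch: `tsv_row` is a hypothetical helper, it assumes that `title` is a single string, and the sample document is made up for illustration.

```python
def tsv_row(doc: dict, width: int = 80) -> str:
    """Format source_id and a truncated title as one tab-separated row."""
    def esc(value) -> str:
        # Escape characters that would break a TSV row, similar to jq's @tsv.
        return (str(value)
                .replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace("\r", "\\r"))

    title = doc.get("title", "")[:width]
    return "\t".join([esc(doc.get("source_id", "")), esc(title)])


# Made-up document, for illustration only.
doc = {"id": "1", "source_id": "0", "title": "Leipzig " + "x" * 100}
print(tsv_row(doc))
```

Each row can then be piped into fzf or peco just like the jq output.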

...
