RequestSanitizerComponent

A search component for Solr to sanitize request parameter input

Install

The component is installed using Solr's package manager:

  1. Start Solr with package manager enabled

    # You need to set Java property 'enable.packages=true', e.g.
    bin/solr -c -Denable.packages=true
    
  2. Install this plugin repository into your Solr cluster

    bin/solr package add-repo cominvent https://raw.githubusercontent.com/cominvent/solr-plugins/master
    
  3. Install the package

    # First confirm that the package is in the list
    bin/solr package list-available
    # Install the package
    bin/solr package install request-sanitizer
    # Deploy the package to your collection(s)
    bin/solr package deploy request-sanitizer -y -collections mycoll  
    

Configuration

After the package is installed and deployed, the component is ready to use. What remains is to configure your request handler(s) in solrconfig.xml:

  1. Add the component as first-component to your /select handler:

    <arr name="first-components">
      <str>request-sanitizer</str>
    </arr>
    
  2. Define sanitizing rules in the defaults section of the /select handler. These are examples of rules you can apply:

    <str name="sanitize">rows=>100:100</str>
    <str name="sanitize">offset=>10000:10000</str>
    

Available rules

Always override the parameter, just like an invariant:

sanitize=rows=25 or sanitize=rows=invariant:25

Map values to other values (if no match found, will use input value):

sanitize=echoParams=alle:all eksplisitt:explicit

Set a default value if the param is not set:

sanitize=debugQuery=default:true

Restrict a numeric value to a max limit (if >100, cap at 100):

sanitize=rows=>100:100

Apply multiple rules by repeating the sanitize HTTP param:

sanitize=rows=>100:100&sanitize=offset=>10000:10000
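To make the rule semantics above concrete, here is a minimal Java sketch of how such rules could be interpreted against a map of request parameters. It is an illustration only and not the component's actual code: the class name `SanitizeRuleSketch`, the method `apply`, and the order in which rule forms are checked are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the rule semantics described above; not the component's real implementation. */
final class SanitizeRuleSketch {

    /**
     * Applies one rule of the form "param=spec" to a mutable parameter map.
     * The order in which the rule forms are tested is an assumption made for illustration.
     */
    static void apply(String rule, Map<String, String> params) {
        int eq = rule.indexOf('=');
        if (eq < 1) {
            throw new IllegalArgumentException("Expected param=spec but got: " + rule);
        }
        String name = rule.substring(0, eq);
        String spec = rule.substring(eq + 1);
        String current = params.get(name);

        if (!spec.contains(":")) {
            // "rows=25": a plain value always overrides, like an invariant
            params.put(name, spec);
        } else if (spec.startsWith("invariant:")) {
            params.put(name, spec.substring("invariant:".length()));
        } else if (spec.startsWith("default:")) {
            // Only applied when the client did not send the parameter
            params.putIfAbsent(name, spec.substring("default:".length()));
        } else if (spec.startsWith(">")) {
            // ">100:100": if the numeric value exceeds the limit, replace it
            int colon = spec.indexOf(':');
            long limit = Long.parseLong(spec.substring(1, colon));
            if (current != null && isAbove(current, limit)) {
                params.put(name, spec.substring(colon + 1));
            }
        } else if (current != null) {
            // "alle:all eksplisitt:explicit": space-separated from:to pairs
            for (String pair : spec.split("\\s+")) {
                int colon = pair.indexOf(':');
                if (colon > 0 && pair.substring(0, colon).equals(current)) {
                    params.put(name, pair.substring(colon + 1));
                    break;
                }
            }
        }
    }

    private static boolean isAbove(String value, long limit) {
        try {
            return Long.parseLong(value.trim()) > limit;
        } catch (NumberFormatException e) {
            return false; // non-numeric input is left untouched in this sketch
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("rows", "5000");
        params.put("echoParams", "alle");
        apply("rows=>100:100", params);
        apply("echoParams=alle:all eksplisitt:explicit", params);
        apply("debugQuery=default:true", params);
        System.out.println(params); // prints {rows=100, echoParams=all, debugQuery=true}
    }
}
```

The sketch treats a spec without a colon as an invariant and leaves unmatched or non-numeric input unchanged, which mirrors the "if no match found, will use input value" note above.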

Build

Build with Maven:

mvn package

Copy the jar to a place where Solr can find it:

SOLR_HOME=/path/to/solr/home
mkdir -p "$SOLR_HOME/lib"
cp target/request-sanitizer-*.jar "$SOLR_HOME/lib/"

Contributions

The component is licensed under the Apache License, so you can use it freely for anything :)

I hope to extend the component with other useful sanitizing features; see the issue tracker.

Pull Requests welcome!

Manual install

This is an alternative way to install, if you don't want to use the package manager:

Download a pre-built jar from the releases section and drop it in your $SOLR_HOME/lib/

Then define the component in solrconfig.xml:

<searchComponent name="request-sanitizer" class="com.cominvent.solr.RequestSanitizerComponent"/>

Now you can configure the component as described in the Configuration section above.

request-sanitizer-component's Issues

Sanitizing the 'rows' request parameter results in no documents

I have a solr cloud setup with 16 shards.

I've set up the request sanitizer to limit rows to 1000 with the following in solrconfig.xml:

<str name="sanitize">rows=>1000:1000</str>

This works as expected and limits rows to 1000. However, the rows sanitization seems to affect the start request parameter as well.

When I query this URL I see a valid response containing documents:
http://solr-901:8983/solr/journals_dev/select?fl=id&fq=doc_type:full&q=*:*&rows=1000&start=15000&wt=json
However, when I query this URL I see a response containing no documents:
http://solr-901:8983/solr/journals_dev/select?fl=id&fq=doc_type:full&q=*:*&rows=1000&start=16000&wt=json

Notice that the only difference is the start value.

I have determined that this behavior is dictated by the number of shards multiplied by the sanitized rows limit. So in my case, 16 shards x 1000 row limit means I get no results when I query with start >= 16,000.

Is this expected behavior, and is there any way I can work around it? We use paging on our website and this will affect any searches that go beyond result 16,000. We still need to limit rows, though.

Thanks!
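A possible explanation, offered as a sketch rather than a confirmed diagnosis: in distributed search, the coordinating node asks each shard for start + rows documents (with start=0) and then merges the results. Assuming the sanitizer also runs on those per-shard requests, each shard can return at most 1000 documents, so the merged list never holds more than 16 x 1000 entries. The hypothetical `ShardPagingSketch` below only models document counts under these assumptions; it says nothing about whether the returned documents are the correctly ranked ones.

```java
/** Hypothetical model of the paging arithmetic in the report above; counts only, no ranking. */
final class ShardPagingSketch {

    /** Number of documents the coordinator can return for one page, under the stated assumptions. */
    static long docsReturned(int shards, long rowsCap, long start, long rows, long docsPerShard) {
        long requestedPerShard = start + rows;                        // coordinator asks each shard for start + rows
        long cappedPerShard = Math.min(requestedPerShard, rowsCap);   // assumed: sanitizer caps the shard request too
        long availablePerShard = Math.min(cappedPerShard, docsPerShard);
        long merged = shards * availablePerShard;                     // upper bound on the merged result list
        long afterOffset = Math.max(0, merged - start);               // what is left after skipping 'start' entries
        return Math.min(rows, afterOffset);
    }

    public static void main(String[] args) {
        // 16 shards, rows capped at 1000, plenty of documents per shard
        System.out.println(docsReturned(16, 1000, 15000, 1000, 1_000_000)); // prints 1000
        System.out.println(docsReturned(16, 1000, 16000, 1000, 1_000_000)); // prints 0
    }
}
```

If this model matches what is happening, the cut-off at shards x limit follows directly from the cap being applied to the internal shard requests as well as to the client request.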

Simple auto-sanitize mode

Would be nice to add an auto-sanitize mode where it would be enough to say sanitize=auto in order to avoid the most harmful clients out there :) Example:

  • Ignore optimize=true (for update handlers)
  • Ignore stream.url, stream.body and stream.file params
  • Limit rows to 200 by default
  • Limit offset to 2000 by default
  • Disable the qt parameter, as it can be misused to switch handlers
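As a rough illustration of what such a mode might do, the following Java sketch applies the proposed defaults to a parameter map. Nothing here exists in the component: `AutoSanitizeSketch` and `sanitize` are hypothetical names, "ignore" is interpreted as dropping the parameter, and "limit" is interpreted as capping a numeric value.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the proposed sanitize=auto defaults; not an existing feature. */
final class AutoSanitizeSketch {

    // Parameters the proposal suggests ignoring or disabling (dropped entirely in this sketch)
    private static final List<String> DROPPED =
            List.of("optimize", "stream.url", "stream.body", "stream.file", "qt");

    /** Returns a sanitized copy of the incoming parameters; the input map is not modified. */
    static Map<String, String> sanitize(Map<String, String> incoming) {
        Map<String, String> out = new LinkedHashMap<>(incoming);
        out.keySet().removeAll(DROPPED);
        cap(out, "rows", 200);
        cap(out, "offset", 2000);
        return out;
    }

    private static void cap(Map<String, String> params, String name, long max) {
        String value = params.get(name);
        if (value == null) {
            return;
        }
        try {
            if (Long.parseLong(value.trim()) > max) {
                params.put(name, Long.toString(max));
            }
        } catch (NumberFormatException e) {
            // non-numeric values are left as-is in this sketch
        }
    }

    public static void main(String[] args) {
        Map<String, String> in = new LinkedHashMap<>();
        in.put("q", "*:*");
        in.put("rows", "100000");
        in.put("qt", "/update");
        System.out.println(sanitize(in)); // prints {q=*:*, rows=200}
    }
}
```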
