
hoover-search's Introduction

Welcome to Liquid Investigations!

The Liquid Investigations project is funded by Google DNI and driven by CRJI.org; it brings coders and journalists together to make investigative collaborations less burdensome and more secure.

We are creating an open source digital toolkit based on existing hardware and software. When fully developed, the kit will allow for distributed data search and sharing, annotations, a wiki and chat. While the software can run on any server, the focus is on small and portable devices.

Please take a look at our website at https://liquidinvestigations.org

You can find our project wiki here.

hoover-search's People

Contributors

dependabot-preview[bot], dependabot[bot], gabriel-v, ioanpocol, jarib, k-jell, mgax, mugurrus, raduklb, radunichita, salevajo, spiderpig86


hoover-search's Issues

Deduplication

Right now, two documents with identical content are indexed separately. The task is to treat them as a single entity and list all paths where that content shows up in a collection.
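One way this could work, sketched below under assumptions: hash each document's content and use the hash as the Elasticsearch document id, so identical content is stored only once, with every path it appears at collected in a "paths" list. The index name, field names and helper are placeholders, not the actual snoop/search schema.

import hashlib
from elasticsearch import Elasticsearch

es = Elasticsearch('http://search-es:9200')

def index_document(path, content, index='hoover-dedup'):
    # Identical content hashes to the same id, so it is stored only once.
    doc_id = hashlib.sha256(content.encode('utf-8')).hexdigest()
    es.update(index=index, id=doc_id, body={
        # If the document already exists, append the new path (unless already recorded).
        'script': {
            'source': 'if (!ctx._source.paths.contains(params.path)) '
                      '{ ctx._source.paths.add(params.path) }',
            'params': {'path': path},
        },
        # Otherwise create the document with a single path.
        'upsert': {'content': content, 'paths': [path]},
    })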

YubiKey authentication

The hoover.contrib.twofactor plugin supports TOTP authentication (e.g. with a smartphone app). It should also support YubiKey authentication.
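For the server side, a rough sketch of how a YubiKey OTP could be verified against the public YubiCloud validation service follows; the client id, how the OTP reaches the code, and the integration point with hoover.contrib.twofactor are all assumptions, and a real implementation should also check the HMAC signature in the response.

import secrets
import requests

VERIFY_URL = 'https://api.yubico.com/wsapi/2.0/verify'

def yubikey_otp_is_valid(otp, client_id):
    # The validation service answers with plain-text "key=value" lines.
    nonce = secrets.token_hex(16)
    resp = requests.get(VERIFY_URL,
                        params={'id': client_id, 'otp': otp, 'nonce': nonce},
                        timeout=10)
    fields = dict(line.split('=', 1) for line in resp.text.strip().splitlines() if '=' in line)
    # Reject replayed or mismatched responses.
    return (fields.get('status') == 'OK'
            and fields.get('nonce') == nonce
            and fields.get('otp') == otp)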

wrong display name in collection column

The collections in the column are named after the collection slug (with some capitalization). We have a number of fields in snoop and/or search that say "Name" and "Description"; let's use those instead.
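A minimal sketch of the intended change, assuming the collection model (or the snoop metadata) exposes a human-readable name field; the attribute names below are placeholders:

def collection_display_name(collection):
    # Prefer the human-readable name; fall back to the slug when it is not set.
    return getattr(collection, 'name', None) or collection.slug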

elasticsearch throws error: Connection reset by peer

When indexing a collection, after about 74,000 documents the process started by

docker-compose run --rm search ./manage.py update -v2 mycollection

suddenly stops with this error message:

2018-02-08 12:33:35 1 INFO hoover.search.index updating <Collection: mycollection>                                                                                                                                       
2018-02-08 12:33:35 1 INFO hoover.search.index resuming load: {'feed_state': 'http://snoop/htmidi/feed?lt=2018-02-07T04:20:59.559099Z', 'report': {'indexed': 74000}}                                              
2018-02-08 12:34:00 1 WARNING elasticsearch POST http://search-es:9200/_bulk [status:N/A request:1.140s]                                                                                                           
Traceback (most recent call last):                                                                                                                                                                                 
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen                                                                                                                    
    chunked=chunked)                                                                                                                                                                                               
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 356, in _make_request                                                                                                              
    conn.request(method, url, **httplib_request_kw)                                                                                                                                                                
  File "/usr/local/lib/python3.6/http/client.py", line 1239, in request                                                                                                                                            
    self._send_request(method, url, body, headers, encode_chunked)                                                                                                                                                 
  File "/usr/local/lib/python3.6/http/client.py", line 1285, in _send_request                                                                                                                                      
    self.endheaders(body, encode_chunked=encode_chunked)                                                                                                                                                           
  File "/usr/local/lib/python3.6/http/client.py", line 1234, in endheaders                                                                                                                                         
    self._send_output(message_body, encode_chunked=encode_chunked)                                                                                                                                                 
  File "/usr/local/lib/python3.6/http/client.py", line 1065, in _send_output                                                                                                                                       
    self.send(chunk)                                                                                                                                                                                               
  File "/usr/local/lib/python3.6/http/client.py", line 986, in send                                                                                                                                                
    self.sock.sendall(data)                                                                                                                                                                                        
ConnectionResetError: [Errno 104] Connection reset by peer

I assume there is a very large file in mycollection causing this error.

As a workaround, a maximum document size could be introduced, or large files could be split into several pieces.
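A hedged sketch of that workaround, meant to sit in front of the bulk indexing step: documents whose extracted text exceeds a configurable limit are split into fixed-size chunks (or could simply be truncated) before being sent to Elasticsearch. The limit, the shape of the doc dict and the "part" field are assumptions, not the actual indexing code.

MAX_DOC_BYTES = 10 * 1024 * 1024  # assumed limit on extracted text per document

def split_oversized(doc):
    # Yield the document unchanged, or split its text into several pieces.
    text = doc.get('text', '')
    if len(text.encode('utf-8')) <= MAX_DOC_BYTES:
        yield doc
        return
    step = MAX_DOC_BYTES // 4  # rough character-based chunk size
    for part, start in enumerate(range(0, len(text), step)):
        chunk = dict(doc)
        chunk['text'] = text[start:start + step]
        chunk['part'] = part  # record which piece of the original document this is
        yield chunk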

Run tests on Travis

Since we have a test suite in this repo, we should set up a Travis build.
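A minimal .travis.yml sketch of what that could look like; the Python version and the test command are guesses about this repo, not the actual setup:

language: python
python:
  - "3.6"
install:
  - pip install -r requirements.txt
script:
  - pytest  # assumed entry point; adjust to however the suite is actually run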

not displaying search results beyond 10,000

When searching for a term that has more than 10,000 hits (e.g. "done"), Hoover will display only up to 10,000 results (no matter whether I choose 10 or 1,000 results per page).

When trying to go to the page that would display the messages from 10,001 onwards, an error message is displayed: "Unknown server error while searching". Is this a limitation of Hoover search or of the Hoover UI?

I encounter the exact same problem with your UI application, @jarib, on https://hoover-ui.herokuapp.com/. The message there reads "Error: unable to fetch https://hoover-ui.herokuapp.com/search: 500 Internal Server Error".
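This is most likely Elasticsearch's index.max_result_window limit (10,000 by default) rather than a Hoover bug: from/size pagination past that window is rejected by the server. Two common remedies are sketched below, assuming direct access to the cluster and an index named "hoover"; the sort field used for search_after is also an assumption.

from elasticsearch import Elasticsearch

es = Elasticsearch('http://search-es:9200')

# Option 1: raise the window (simple, but deep pages cost memory on the cluster).
es.indices.put_settings(index='hoover', body={'index': {'max_result_window': 50000}})

# Option 2: paginate with search_after instead of from/size.
body = {
    'size': 1000,
    'query': {'match': {'text': 'done'}},
    'sort': [{'path': 'asc'}],  # search_after needs a deterministic sort on a unique field
}
page = es.search(index='hoover', body=body)
while page['hits']['hits']:
    # ... process page['hits']['hits'] here ...
    body['search_after'] = page['hits']['hits'][-1]['sort']
    page = es.search(index='hoover', body=body)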
