
query-server's Introduction

Query-Server

Join the chat at https://gitter.im/fossasia/query-server

The query server can be used to search for a keyword or phrase on a search engine (Google, Yahoo, Bing, Ask, DuckDuckGo, Baidu, Exalead, Quora, Parsijoo, Dailymotion, Mojeek and YouTube) and get the results as JSON, XML or CSV. The tool also stores the searched query string in a MongoDB database for analytical purposes.


Test Deployment

A test deployment of the project is available here: https://query-server.herokuapp.com

API

The API(s) provided by query-server are as follows:

GET /api/v1/search/<search-engine>?query=query&format=format

search-engine : [google, ask, bing, duckduckgo, yahoo, baidu, exalead, quora, youtube, parsijoo, mojeek, dailymotion]

query : query can be any string

format : [json, xml, csv]

A sample query : /api/v1/search/bing?query=fossasia&format=xml&num=10
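As a rough illustration, a request URL of the shape shown above can be assembled with the standard library. The helper name `build_search_url` is hypothetical, the base URL is the test deployment mentioned earlier, and `num` is treated as optional only because it appears in the sample query but not in the endpoint signature.

```python
from urllib.parse import urlencode

# Test deployment named in this README; swap in your own host if needed.
BASE_URL = "https://query-server.herokuapp.com"

ENGINES = {
    "google", "ask", "bing", "duckduckgo", "yahoo", "baidu",
    "exalead", "quora", "youtube", "parsijoo", "mojeek", "dailymotion",
}
FORMATS = {"json", "xml", "csv"}


def build_search_url(engine, query, fmt="json", num=None, base_url=BASE_URL):
    """Build a /api/v1/search URL (hypothetical helper, not part of the project)."""
    if engine not in ENGINES:
        raise ValueError("unknown search engine: {}".format(engine))
    if fmt not in FORMATS:
        raise ValueError("unknown format: {}".format(fmt))
    params = {"query": query, "format": fmt}
    if num is not None:
        # `num` appears only in the sample query, so it is optional here.
        params["num"] = num
    return "{}/api/v1/search/{}?{}".format(base_url, engine, urlencode(params))


print(build_search_url("bing", "fossasia", "xml", 10))
```

The resulting string can be passed to any HTTP client; `urlencode` takes care of escaping spaces and special characters in the query.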

Error Codes

404 Not Found : Incorrect Search Engine, Zero Response
400 Bad Request : query and/or format is not in the correct format
500 Internal Server Error : Server Error from Search Engine

Dependencies

Installation

  1. Local Installation

  2. Deployment on Heroku

  3. Deployment with Docker

Contribute

Found an issue? Post it in the issue tracker. For pull requests, please read Open Source Developer Guide and Best Practices at FOSSASIA.

License

This project is currently licensed under the Apache License version 2.0. A copy of LICENSE should be present along with the source code. To obtain the software under a different license, please contact FOSSASIA.

query-server's People

Contributors

abhishek1995s, afrozas, anshulmalik, bhaveshan, dgarvit, dilraj45, djmgit, dragneel7, gabru-md, gitter-badger, imujjwal96, jajodiaraghav, krtkvrm, mariobehling, nikhilkumarsingh, nikhilrayaprolu, niranjan94, parths007, prabhakarpd7284, raju249, remorax, rupav, s2606, shashank-sharma, starlord1311, treejoker, umangahuja1, vaibhavsingh97, warusadura, yashladha


query-server's Issues

Combine server and scraper onto a single platform

Right now we have the following:

  • Server in Node.js
  • Scraper in Python

The server runs the Python scraper as a separate process, which is inefficient. Once this is integrated into the Open Event project, many API calls may reach the query-server, and each one will spawn a new Python process.

Instead, we are looking to implement the following:

  • Server also in Python (a micro-framework or a basic server would do; Flask or Werkzeug is recommended). The scraper will be a Python module in the server, so API calls no longer spawn processes and responses should be quicker.
  • The Dockerfile needs to be adjusted accordingly.

Having everything on a single platform will also make adding unit tests easier.
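A minimal sketch of the in-process idea follows. It uses a bare WSGI callable from the standard library rather than Flask or Werkzeug, purely to stay dependency-free; the `scrape` function is a hypothetical stand-in for the real scraper module and returns canned data.

```python
import json
from urllib.parse import parse_qs


def scrape(engine, query):
    """Hypothetical stand-in for the scraper module; returns canned results."""
    return [{"title": "{} via {}".format(query, engine),
             "link": "https://example.org"}]


def app(environ, start_response):
    """Minimal WSGI app serving /api/v1/search/<engine>?query=..."""
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    params = parse_qs(environ.get("QUERY_STRING", ""))
    query = params.get("query", [""])[0]
    headers = [("Content-Type", "application/json")]

    if len(parts) != 4 or parts[:3] != ["api", "v1", "search"]:
        start_response("404 Not Found", headers)
        return [b'{"error": "not found"}']
    if not query:
        start_response("400 Bad Request", headers)
        return [b'{"error": "query is missing"}']

    # Plain function call in the same process: no subprocess per request.
    results = scrape(parts[3], query)
    start_response("200 OK", headers)
    return [json.dumps(results).encode("utf-8")]
```

To try it locally, `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` from the standard library is enough. Because the handler is an ordinary function, it can also be unit-tested without starting a server.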

Add a field for number of results

Referring to issue #59:

I suggest adding a drop-down for the number of results, since how many results to fetch should be the user's choice.

Thanks!

Improve scraper to provide teaser text

Currently the teaser text is shown only for Bing on query-server, not for other search engines. Please improve the scraper to provide teaser text for Google and other search engines as well.

Repetition of query strings in query_list.txt

In query_list.txt the same query strings are stored multiple times. For example, searching for the following one after the other:
"fossasia", "opensource", "opensource is fun" stores the following in query_list.txt:


fossasia

fossasia
opensource

opensource
fossasia
opensource is fun

(I deleted the existing contents of query_list.txt before executing the above)

I think this is happening because the entire query list is appended to the query_list.txt file each time we make a new search.

I am not sure whether it is a bug or has been done on purpose.
If it's a bug, I would love to send a PR :)
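If the duplication turns out to be unintended, one possible fix is to append only the new query and skip it when it is already stored. The sketch below assumes one query per line; the function name `record_query` is hypothetical and not taken from the project.

```python
from pathlib import Path


def record_query(query, path="query_list.txt"):
    """Append `query` as one line unless it is already stored.

    Returns True when the query was written, False when it was a duplicate.
    """
    log = Path(path)
    seen = set()
    if log.exists():
        seen = set(log.read_text(encoding="utf-8").splitlines())
    if query in seen:
        return False
    # Write only the new query, never the whole list again.
    with log.open("a", encoding="utf-8") as handle:
        handle.write(query + "\n")
    return True
```

Reading the whole file on every search is fine for a small log; a larger one would be better served by the database.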

Re-organize the repository

@enigmaeth

Let's move all the app code into an src directory.
Let the root hold all the meta files such as .travis.yml, package.json, requirements.txt, etc.

Return appropriate error responses.

  • Incorrect search engine - return 404 - Not Found
  • Zero responses - return 404 - Not Found
  • if format is something other than xml or json - return 400 - Bad Request
  • if query is empty or missing - return 400 - Bad Request

Along with the proper HTTP Status code header, the body should also contain the error message in either json or xml (depending on what was asked by the user)
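The rules above could be sketched as a pair of small checks that return a status code and a body in the requested format. The engine set, the message wording and the function names are assumptions made for illustration; the issue only fixes the status codes.

```python
import json
from xml.sax.saxutils import escape

# Assumed values for illustration; the issue names only xml and json.
KNOWN_ENGINES = {"google", "bing", "yahoo", "duckduckgo"}
KNOWN_FORMATS = {"json", "xml"}


def error_body(message, fmt):
    """Render the error message in the format the caller asked for."""
    if fmt == "xml":
        return "<error>{}</error>".format(escape(message))
    return json.dumps({"error": message})


def check_request(engine, query, fmt):
    """Return (status, body) for an invalid request, or None if it is valid."""
    # Fall back to JSON when the requested format itself is the problem.
    body_fmt = fmt if fmt in KNOWN_FORMATS else "json"
    if engine not in KNOWN_ENGINES:
        return 404, error_body("Incorrect search engine", body_fmt)
    if fmt not in KNOWN_FORMATS:
        return 400, error_body("Format must be xml or json", body_fmt)
    if not query:
        return 400, error_body("Query is empty or missing", body_fmt)
    return None


def check_results(results, fmt):
    """Return a 404 (status, body) when the scraper found nothing, else None."""
    if not results:
        return 404, error_body("Zero responses", fmt)
    return None
```

Keeping the checks separate from the web framework means the same functions work whichever server ends up hosting them.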

Implement Yaydoc docs generation in query-server

Yaydoc is FOSSASIA's own automatic documentation generation and deployment project. At the crux of it, Yaydoc generates a documentation website using the markdown present in a project's repository and keeps the website in sync with the changes made in project's documentation.

The project has been under development since the start of GSoC 2017 and we believe that it can be brought into production. For that, we'd like to start with the Query-server project. The project requires a .yaydoc.yml configuration file at the project's root directory. Adding necessary information in the configuration file and registering the project at https://yaydoc.herokuapp.com/ will be enough to generate the documentation website.

Are there unused dependencies in requirements.txt?

README.md currently lists the project dependencies as:
Dependencies

  • Python 2.x or Python 3.x
  • Node.js
  • Pip
  • Flask
  • BeautifulSoup4

But requirements.txt currently lists many more dependencies.

This repo only contains two Python scripts and a quick read of them says that they use:

  • server.py: dicttoxml, flask, and pymongo
  • scraper.py: requests, BeautifulSoup4

Could the following dependencies be safely removed from requirements.txt?

coverage>=4.3.4  # could be pip installed in .travis.yml but not shipped in production
coveralls>=1.1  # could be pip installed in .travis.yml but not shipped in production
feedgen>=0.5.1
futures>=3.0.5
html5lib>=0.9999999
Jinja2>=2.9.5
librabbitmq>=1.5.1
mechanize>=0.2.5
pytest>=3.0.6  # there are no tests in the repo
pytest-cov>=2.4.0  # no pytest means no pytest-cov?
webencodings>=0.5
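One way to double-check a "quick read" like the one above is to list the top-level modules a script actually imports. This is only a sketch: it assumes the source parses under the Python version running it, it misses dynamic imports, and import names do not always match package names (for example, BeautifulSoup4 is imported as `bs4`).

```python
import ast


def imported_modules(source):
    """Return the set of top-level module names imported by `source`."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b" counts as a dependency on "a".
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) point inside the project itself.
            names.add(node.module.split(".")[0])
    return names


sample = "import requests\nfrom bs4 import BeautifulSoup\nimport os.path\n"
print(sorted(imported_modules(sample)))
```

Comparing the result for server.py and scraper.py against requirements.txt gives a first list of candidates for removal; transitive dependencies still need a manual look.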

Responsiveness of website

Current Behaviour:
The website is barely responsive: in mobile mode the search box disappears and other buttons behave abnormally.

Expected Behaviour:
Website should adapt to various screen sizes

I am interested in working on this issue.

Use DuckDuckGo instead of Google

There are several reasons to support DuckDuckGo instead of Google for your query server:

  • As of now, there is no Python 3 version of the Mechanize library.
  • Scraping DuckDuckGo search results removes the dependency on the Mechanize library.
  • Your IP is more likely to get blocked by Google than by DuckDuckGo if you exceed the permitted number of requests.

So, should I make a pull request with a modified version of the query search using DuckDuckGo?

Migration of Bootstrap v3 to Bootstrap v4

Need of migration:

  • New 5 tier grid system for smaller screens
    Bootstrap has a sophisticated responsive grid system that lets developers target devices with different viewports. Bootstrap 3 has four grid classes for columns: .col-xs-XX for mobile phones, .col-sm-XX for tablets, .col-md-XX for desktops, and .col-lg-XX for larger desktops. Bootstrap 4 adds a fifth tier that lets developers target smaller devices under 480px viewport width.

  • Relative CSS units
    Instead of pixels, the new major release will use REMs and EMs that make it possible to implement responsive typography on Bootstrap sites. This will also increase readability, and make sites more accessible for disabled users.

  • Bootstrap cards
    Bootstrap 4 introduces new UI components called cards. Cards replace the former wells, thumbnails and panels, and provide users with a more streamlined workflow.

  • Flexbox support
    Bootstrap 4 makes it possible to take advantage of CSS3's Flexbox Layout and makes use of the float and display CSS properties to implement a fluid layout.

  • All the JavaScript plugins have been rewritten to support ES6. This means you have better plugin support in Bootstrap 4 and can take advantage of the new ES6 features.

Benefit: Overall UI improvements + Responsiveness

I want to work on this issue

API for query server

@niranjan94 I am currently working on creating an API for the query server. Can you please specify the parameters an external user should pass to the query server to get a response?

Update README

The readme contains html5lib and mechanize as dependencies. Now that we're using requests, we don't require html5lib and mechanize, hence the readme should be revamped.

@mariobehling May I change the readme and also add workflow GIFs, etc., just like in searss?

Queries are not being added to the database

The queries being searched are not added to the database.

screen shot 2017-09-11 at 8 14 42 pm

The condition inside this if is wrong.

screen shot 2017-09-11 at 8 17 56 pm

In the above screenshot the databases have not actually been created, since MongoDB doesn't create a database until something is inserted into it.

I would like to update the code so that it starts adding the searched queries to the database.

Look and feel of website

Current Behaviour:
All buttons look the same.

Expected Behaviour:
Each button should show the logo of its search engine.

I am interested in working on this issue

The search results sometimes provide a link with a cryptic ending

Some search results have a cryptic addition at the end of the link, for example:
<item> <title>FOSSASIA Summit 2017 | Asia's Open Technology Event about ...</title> <link>http://2017.fossasia.org//RK=1/RS=KKHGnF8gw2hmJFCYS8.5sUEeQ6o-</link> </item> <item>
Please find the reason for this and take it out.
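Whatever the cause turns out to be, the suffix could be trimmed after scraping. The sketch below assumes the addition always has the `/RK=.../RS=...` shape seen in the example above; `clean_link` is a hypothetical helper name.

```python
import re

# Matches a trailing "/RK=<value>/RS=<value>" segment pair, as in the example.
_TRACKING_SUFFIX = re.compile(r"/RK=[^/]*/RS=[^/]*$")


def clean_link(link):
    """Strip the trailing /RK=.../RS=... addition, if present."""
    return _TRACKING_SUFFIX.sub("", link)


print(clean_link("http://2017.fossasia.org//RK=1/RS=KKHGnF8gw2hmJFCYS8.5sUEeQ6o-"))
```

Links without the suffix pass through unchanged, so the function can be applied to every result regardless of search engine.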

App not executing on Python 3.x

README.md shows that:

Dependencies

  • Python 2.x or Python 3.x
  • Node.js
  • Pip
  • Flask
  • BeautifulSoup4

But the dependencies could not be installed with pip for Python 3.x.

Quoting the error message:

File "/tmp/pip-build-s4ac1_v3/librabbitmq/setup.py", line 191
except Exception, exc:

This is because librabbitmq does not support Python 3.x.
Please change the README.md.
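For context, the `except Exception, exc:` line in the traceback is Python 2 syntax. The `as` form is accepted by both Python 2.6+ and Python 3, as in this small illustration (the function `parse_count` is made up for the example and is unrelated to librabbitmq):

```python
def parse_count(value):
    """Convert `value` to an int, returning None when that fails."""
    try:
        return int(value)
    except Exception as exc:  # portable across Python 2.6+ and Python 3
        print("could not parse {!r}: {}".format(value, exc))
        return None
```

Since the offending line lives in a third-party package's setup.py, the practical options are to drop that dependency or to document the Python version restriction.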

Common Results

Add a tab to show 10 results that are common to all the search engines for the stated query.
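One simple reading of "common" is "the same link was returned by every engine". The sketch below follows that assumption; it also assumes each result is a dict with a `link` key, which is an illustrative shape rather than the project's confirmed data model.

```python
def common_results(results_by_engine, limit=10):
    """Return up to `limit` results whose link appears for every engine.

    `results_by_engine` maps an engine name to its list of result dicts.
    The order follows the first engine in the mapping.
    """
    result_lists = list(results_by_engine.values())
    if not result_lists:
        return []
    shared = set.intersection(
        *({item["link"] for item in results} for results in result_lists)
    )
    common, seen = [], set()
    for item in result_lists[0]:
        if item["link"] in shared and item["link"] not in seen:
            seen.add(item["link"])
            common.append(item)
    return common[:limit]
```

Exact link matching is strict; in practice links may need normalising (trailing slashes, http versus https) before they compare equal across engines.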

Update README file

The README file contains instructions pertaining to the old command line based usage. Update it according to the current state.

Update README and handle TypeError

  • Update readme.md with the latest parameters, including the number of results to be searched for.
  • Handle the TypeError for the same, whenever a 500 response is sent from the server.

I would like to work on this issue.
Thanks :D
