fossasia / query-server

Query Server Search Engines

License: Apache License 2.0

Python 73.60% HTML 21.28% CSS 4.17% Dockerfile 0.95%

query-server's Introduction

Query-Server

The query server searches for a keyword or phrase on a search engine (Google, Yahoo, Bing, Ask, DuckDuckGo, Baidu, Exalead, Quora, Parsijoo, Dailymotion, Mojeek and Youtube) and returns the results as JSON, XML or CSV. It also stores each searched query string in a MongoDB database for analytical purposes.

Test Deployment
API
Error Codes
Dependencies
Installation
Contribute

Test Deployment

A test deployment of the project is available here: https://query-server.herokuapp.com

API

The API(s) provided by query-server are as follows:

GET /api/v1/search/<search-engine>?query=query&format=format

search-engine : [google, ask, bing, duckduckgo, yahoo, baidu, exalead, quora, youtube, parsijoo, mojeek, dailymotion]

query : query can be any string

format : [json, xml, csv]

A sample query : /api/v1/search/bing?query=fossasia&format=xml&num=10
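A minimal sketch of assembling such a request URL in Python 3, using only the standard library. The helper name `build_search_url` is hypothetical, the default base URL is the test deployment mentioned above, and `num` is taken from the sample query rather than from the parameter list.

```python
from urllib.parse import urlencode

# Engines and formats as listed in the API description above.
ENGINES = {"google", "ask", "bing", "duckduckgo", "yahoo", "baidu", "exalead",
           "quora", "youtube", "parsijoo", "mojeek", "dailymotion"}
FORMATS = {"json", "xml", "csv"}


def build_search_url(engine, query, fmt="json", num=10,
                     base="https://query-server.herokuapp.com"):
    """Hypothetical helper: assemble a /api/v1/search URL."""
    if engine not in ENGINES:
        raise ValueError("unknown search engine: %s" % engine)
    if fmt not in FORMATS:
        raise ValueError("unknown format: %s" % fmt)
    # urlencode takes care of escaping spaces and special characters.
    params = urlencode({"query": query, "format": fmt, "num": num})
    return "%s/api/v1/search/%s?%s" % (base, engine, params)


print(build_search_url("bing", "fossasia", fmt="xml", num=10))
```

The resulting string can be passed to any HTTP client, for example `requests.get`.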

Error Codes

404 Not Found : Incorrect Search Engine, Zero Response
400 Bad Request : query and/or format is not in the correct format
500 Internal Server Error : Server Error from Search Engine

Dependencies

MongoDB
Python 2.7
- BeautifulSoup4
- dicttoxml
- Flask
- pymongo
- requests
Node.js
- bower.io

Installation

Contribute

Found an issue? Post it in the issue tracker. For pull requests, please read the Open Source Developer Guide and Best Practices at FOSSASIA.

License

This project is currently licensed under the Apache License version 2.0. A copy of LICENSE should be present along with the source code. To obtain the software under a different license, please contact FOSSASIA.

query-server's People

Contributors

duongnam nikhilkumarsingh djmgit kshitijsingla rhemon kalbhor alejoheredia nglexuen winson8 dhruvkumarverma ngyifei truongdiv afrozas gitter-badger abhishek1995s ramolaweb niccokunzmann fazeem84 nikhilrayaprolu mridulnagpal alirizwi guptanitesh karandeepsj bruce-wayne99 samyak0210 surana-mudit pranaygupta36 kalpit4088 saniiit obliviateandsurrender him98 ishucr7 djinn-anthrope yudhik11 pratik0509 samyakag surendratelidevara palash16 anveshc05 shucon saivenkat09 natsuset apoorav2923 sravyapulle aditya3498 sakethkhandavalli shovan312 rohan750 7vikpeculiar sandeepkallepalli nirvansinghania agentcap openngo anirudh458 sahil227 bournvita1998 ashwinir12345 newbass udaysd7897 vaibhavb26 koushikayila nikhil3456 aniketshrimal prathyakshun srikeshav koushik999999 gupudiharsha pratikmandlecha apoorav29 swatiiiit abhisagar srinadhupreetham sparsh789 anishgg anji48 mvtej gulshan-mittal dakshlalwani devanshg27 aamirfarhan vikrant1697 troublemagnet social-being pradeepppc newbazz sachin-chandani awesome-archive ksubbu199 animireddy firesans rv1996 shreyanshdwivedi vaibhavsingh97 gauravkulkarni96 vibhor98 sounak98 man-jain tameeshb lokesh97 mohdomama

query-server's Issues

add a youtube support

this would be an enhancement to the existing project

Write Tests

We don't have any tests at the moment and would love to test our scrapers.

Let's discuss the choice of tester.

Unittest https://docs.python.org/2/library/unittest.html

Store the results in a database rather than in files

Currently, a file contains all the queried strings and the results of each query are stored in separate files.

Add a database to store the queries and strings.
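One way this could look, sketched under assumptions: the document layout (`query`, `results`, `searched_at`) and the helper name `store_query` are hypothetical, and `collection` is anything with an `insert_one` method, such as a pymongo collection.

```python
from datetime import datetime, timezone


def store_query(collection, query, results):
    """Insert one document per search into a MongoDB-style collection."""
    document = {
        "query": query,
        "results": results,
        "searched_at": datetime.now(timezone.utc),
    }
    # With pymongo this is Collection.insert_one.
    collection.insert_one(document)
    return document
```

With pymongo installed, `collection` could be obtained as `MongoClient()["query-server"]["queries"]`; both names are placeholders.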

Improve scraper to provide results for image, videos and file-types

Image and video results don't have any endpoint in API of query server currently, which specifically takes file-type as input. Please implement it to allow us to include a query service for images and videos as well.

Combine server and scraper onto a single platform

Right now we have the following:

Server in Node.js
Scraper in Python

and the server executes the Python scraper as a separate process, which is not efficient at all. Imagine this integrated into the Open Event project: there might be many API calls coming to the query-server, and each of them would spawn a new Python process.

This will also make adding unit tests etc easier since all is in a single platform.

Instead, we are looking to implement the following

Server also in Python (a micro-framework or a basic server would do; Flask or Werkzeug is recommended). The scraper will be a Python module in the server, so there are no more process spawns for API calls and responses should be quicker too.
The Dockerfile needs to be adjusted accordingly.
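A rough sketch of the proposed layout, assuming a plain WSGI callable as a stand-in for Flask or Werkzeug so that it runs with the standard library alone. The `scrape` function is a placeholder, not the project's scraper, and the route mirrors the `/api/v1/search/` path from the README.

```python
import json
from urllib.parse import parse_qs


def scrape(engine, query):
    """Placeholder scraper; a real one would fetch and parse result pages."""
    return [{"title": "result for %s" % query, "engine": engine}]


def app(environ, start_response):
    """Plain WSGI app: the scraper is called in-process, no subprocess spawn."""
    prefix = "/api/v1/search/"
    path = environ.get("PATH_INFO", "")
    params = parse_qs(environ.get("QUERY_STRING", ""))
    query = params.get("query", [""])[0]
    if not path.startswith(prefix) or not query:
        start_response("400 Bad Request", [("Content-Type", "application/json")])
        return [b'{"error": "bad request"}']
    engine = path[len(prefix):]
    body = json.dumps(scrape(engine, query)).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]
```

For local experiments, `wsgiref.simple_server.make_server("127.0.0.1", 7001, app).serve_forever()` would serve it. Because the server and the scraper are ordinary Python functions, both can be exercised directly from unit tests.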

Set up Yacy Grid and Query Server and add Query Server to provide service

One goal of Query Server should be to provide search results to the Yacy Grid server. The results can be used as a way to verify results or to add an additional layer of search results.

Please set up Yacy Grid (https://github.com/yacy/yacy_grid_mcp) and check possibilities to run Query Server as a component.

Create a LICENSE file

Please choose an open-source license for the project.

Set up travis and other tests

We have added a starter file for travis. Please help to make travis work and add other tests.

Increase number of results to 100

Currently search results are limited to 10 results. Please increase the output to 100 results per search.

Travis .yml file is incorrect or doesn't exist.

There currently doesn't seem to be a travis yml file for this repository, hence the pull requests run against some ruby tests (?). Let's add proper tests.

add a field for number of results

Referring to Issue : #59

I suggest adding a drop-down for the number of results,
as it should be the user's choice how many results to get.

Thanks!

pressing the active format button should not make it inactive

When an active format button is clicked, it becomes inactive and format=None is set in the request parameters. This leads to a 400 error.
before pressing the button

after pressing the button

Improve scraper to provide teaser text

Currently the teaser text is being shown only for BING on query-server, but it isn't being shown for other search engines. Please improve the scraper to provide teaser text for Google and other search engines as well.

Repetition of query strings in query_list.txt

In query_list.txt the same query strings are stored multiple times. For example, searching for the following:
"fossasia", "opensource", "opensource is fun" one after the other stores the following in query_list.txt:


fossasia

fossasia
opensource

opensource
fossasia
opensource is fun

(I deleted the existing contents of query_list.txt before executing the above)

I think this is happening because the entire query list is being appended to the query_list.txt file each time we make a new search.

I am not sure whether it is a bug or it has been done on purpose.
If it's a bug I would love to send a PR :)
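If the cause is what the report suspects, a fix could be as small as appending only the new query on each search. A minimal sketch, with `log_query` as a hypothetical helper and a temporary directory standing in for the real file location:

```python
import os
import tempfile


def log_query(path, query):
    """Append only the new query, so each search adds exactly one line."""
    with open(path, "a") as f:
        f.write(query + "\n")


# Demo: three searches in a row, written to a throwaway file.
path = os.path.join(tempfile.mkdtemp(), "query_list.txt")
for q in ["fossasia", "opensource", "opensource is fun"]:
    log_query(path, q)

with open(path) as f:
    print(f.read().splitlines())
```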

Add deploy to docker button

Button for uploading to docker cloud

The query server does not give an xml output

No XML output yet, but it should.

Re-organize the repository

@enigmaeth

Let's move all the app code into an src directory.
Let the root have all the meta files such as .travis.yml, package.json, requirements.txt etc

Return appropriate error responses.

Incorrect search engine - return 404 - Not Found
Zero responses - return 404 - Not Found
if format is something other than xml or json - return 400 - Bad Request
if query is empty or missing - return 400 - Bad Request

Along with the proper HTTP Status code header, the body should also contain the error message in either json or xml (depending on what was asked by the user)
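The rules above could be sketched as follows. The function names, the message wording and the XML element names are assumptions for illustration, and `ENGINES` is only a subset of the supported engines.

```python
import json
from xml.sax.saxutils import escape

ENGINES = {"google", "bing", "duckduckgo", "yahoo"}  # illustrative subset
FORMATS = {"json", "xml"}


def validate_request(engine, query, fmt):
    """Return (status_code, message); 200 means the request can proceed."""
    if fmt not in FORMATS:
        return 400, "Bad Request: format must be json or xml"
    if not query:
        return 400, "Bad Request: query is empty or missing"
    if engine not in ENGINES:
        return 404, "Not Found: incorrect search engine"
    return 200, "OK"


def status_for_results(results):
    """Zero responses from the scraper also map to 404."""
    if not results:
        return 404, "Not Found: zero responses"
    return 200, "OK"


def error_body(status, message, fmt):
    """Render the error in the requested format, falling back to JSON."""
    if fmt == "xml":
        return "<error><status>%d</status><message>%s</message></error>" % (
            status, escape(message))
    return json.dumps({"status": status, "message": message})
```

The status code would go into the HTTP status line and the string from `error_body` into the response body.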

Change administrative branch policy?

fossasia/badgeyay#99
@niccokunzmann

Implement Yaydoc docs generation in query-server

Yaydoc is FOSSASIA's own automatic documentation generation and deployment project. At the crux of it, Yaydoc generates a documentation website using the markdown present in a project's repository and keeps the website in sync with the changes made in project's documentation.

The project has been under development since the start of GSoC 2017 and we believe that it can be brought into production. For that, we'd like to start with the Query-server project. The project requires a .yaydoc.yml configuration file at the project's root directory. Adding necessary information in the configuration file and registering the project at https://yaydoc.herokuapp.com/ will be enough to generate the documentation website.

Develop different scrapers that can plug into the query server

As per the discussion on the gitter channel by @mariobehling, different scrapers for each domain, similar to the loklak server, should be implemented.

Number of queries fetched from duckduckgo is not equal to the num value

As shown in the image, even if the value of num is set to 10 we are getting 30 results.
URL: http://0.0.0.0:7001/api/v1/search/duckduckgo?query=fossasia&format=json&num=10
The issue occurs with the DuckDuckGo search engine only. For the rest it's working fine.
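Whatever the cause inside the DuckDuckGo scraper turns out to be, one defensive option is to truncate the collected list before returning it. A minimal sketch, with `limit_results` as a hypothetical helper:

```python
def limit_results(results, num):
    """Return at most `num` results, whatever the scraper collected."""
    return results[:max(num, 0)]


# Stand-in for the 30 results reported above.
scraped = [{"title": "result %d" % i} for i in range(30)]
print(len(limit_results(scraped, 10)))
```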

Add Bing as an option

If it's alright, I would like to add Bing as an option for the search engines.

Add a Graphical User Interface

The server is a command line tool currently. Add a GUI so that when deploying this server as a service, we can have a landing/home page similar to the one at https://loklak.org/ or http://api.asksusi.com/ .

AttributeError for DuckDuckGo search

I have mentioned this problem in searss in fossasia/searss#13 and also have done a PR with its fix fossasia/searss#14

If the PR there is accepted, and if allowed, I would also like to update the rss-generator.py here.

Python3.x: Making scrapers Python3 exclusive and adhere to PEP 8 standards

Given compatibility issues, all scrapers and the app in general should be ported to Python3.

Update all functions and modules to Python 3.
Adhere to PEP 8 styling standards.

Provide json output along with xml output

to be done after #45

Are there unused dependencies in requirements.txt?

README.md currently lists the project dependencies as:
Dependencies

Python 2.x or Python 3.x
Node.js
Pip
Flask
BeautifulSoup4

But requirements.txt currently lists many more dependencies.

This repo only contains two Python scripts and a quick read of them says that they use:

server.py: dicttoxml, flask, and pymongo
scraper.py: requests, BeautifulSoup4

Could the following dependencies be safely removed from requirements.txt?

coverage>=4.3.4  # could be pip installed in .travis.yml but not shipped in production
coveralls>=1.1  # could be pip installed in .travis.yml but not shipped in production
feedgen>=0.5.1
futures>=3.0.5
html5lib>=0.9999999
Jinja2>=2.9.5
librabbitmq>=1.5.1
mechanize>=0.2.5
pytest>=3.0.6  # there are no tests in the repo
pytest-cov>=2.4.0  # no pytest means no pytest-cov?
webencodings>=0.5

Responsiveness of website

Current Behaviour:
As of now, there is little or no responsiveness: in mobile mode the search box disappears and the other buttons behave abnormally.

Expected Behaviour:
Website should adapt to various screen sizes

I am interested in working on this issue

Use Duckduckgo instead of google

There are several reasons to support duckduckgo instead of google for your query server:

As of now, there is no python 3 version of Mechanize library.
Dependency of mechanize library is removed if we try to scrape the duckduckgo search results.
Your IP is more prone to getting blocked by google as compared to duckduckgo if you exceed the no. of permissible requests.

So, should I make a pull request with a modified version of query search using duckduckgo?

Migration of Bootstrap v3 to Bootstrap v4

Need of migration:

New 5 tier grid system for smaller screens
Bootstrap has a sophisticated responsive grid system that allows developers to target devices with different viewports. Bootstrap 3 currently has 4 grid classes for columns, .col-xs-XX for mobile phones, .col-sm-XX for tablets, .col-md-XX for desktops, and .col-lg-XX for larger desktops. Bootstrap 4 will enhance the grid system with a fifth one that will facilitate developers to target smaller devices under 480px viewport width.
Relative CSS units
Instead of pixels, the new major release will use REMs and EMs that make it possible to implement responsive typography on Bootstrap sites. This will also increase readability, and make sites more accessible for disabled users.
Bootstrap cards
Introduced new UI components called cards. Cards will replace the former wells, thumbnails and panels, and will provide users with a more streamlined workflow.
Flexbox support
Bootstrap 4 makes it possible to leverage CSS3's Flexbox Layout and makes use of the float and display CSS properties to implement a fluid layout.
All the JavaScript plugins have been rewritten to support ES6. This means you have better plugin support in Bootstrap 4 and can take advantage of the new ES6 features

Benefit: Overall UI improvements + Responsiveness

I want to work on this issue

Api for query server

@niranjan94 I am currently working on creating an API for query server. Can you please specify the parameters to be passed to query server by an external user to get the response?

Update README

The readme contains html5lib and mechanize as dependencies. Now that we're using requests, we don't require html5lib and mechanize, hence the readme should be revamped.

@mariobehling May I change the readme and also add workflow gifs,etc just like in searss?

What needs to be done to make query-server deployable

@enigmaeth @mariobehling
This is in regard to issue fossasia/open-event-server#2675.
What is left to do to get the query server hosted?
If it is deployable, can I work on deploying the query-server?

Set proper content-type headers when returning data

Right now, for both json and xml the return content type is text/html. Ensure proper content type and no-cache headers are set.
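A small sketch of what choosing the headers by format might look like. The helper name `response_headers` and the exact `Cache-Control` value are assumptions; `application/json` and `application/xml` are the standard media types for these formats.

```python
CONTENT_TYPES = {
    "json": "application/json",
    "xml": "application/xml",
}


def response_headers(fmt):
    """Headers for a response body in the given format (hypothetical helper)."""
    return {
        "Content-Type": CONTENT_TYPES[fmt] + "; charset=utf-8",
        "Cache-Control": "no-cache, no-store, must-revalidate",
    }


print(response_headers("xml")["Content-Type"])
```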

librabbitmq contains at least one Python 3 syntax error

The version of librabbitmq that this project is using contains at least one Python 3 syntax error.

See: celery/librabbitmq#99

https://travis-ci.org/fossasia/query-server/jobs/275757559#L479

    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-eiupkwho/librabbitmq/setup.py", line 191
        except Exception, exc:
                        ^
    SyntaxError: invalid syntax
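The failing line uses the Python 2-only `except Exception, exc:` spelling. The `as` form shown below is accepted by Python 2.6+ and Python 3. The snippet only illustrates the syntax and is not a patch for librabbitmq; `parse_int` is a made-up example function.

```python
def parse_int(text):
    """Convert text to int, returning None when it cannot be parsed."""
    try:
        return int(text)
    except Exception as exc:  # Python 2-only spelling: "except Exception, exc:"
        print("could not parse %r: %s" % (text, exc))
        return None
```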

Empty search result for Yahoo

Yahoo search is not working.

More info here: #40 (comment)

Syntax-highlighting missing for JSON response

queries are not being added to database

The "things" being searched are not added to the database.

The condition inside this if is wrong.

In the above screenshot the databases are not actually created, since MongoDB doesn't create a database until something is inserted into it.

I would like to update the code so that it starts adding the searched queries to the database.

Look and feel of website

Current Behaviour:
All buttons look the same

Expected Behaviour:
Buttons should have the logo of the search engine

I am interested in working on this issue

TypeError on query-server.herokuapp.com

I used this query: https://query-server.herokuapp.com/api/v1/search/google?query=anshul&format=json&num=10

And this is what I got :

Auto-deploy query server on merged pull request on Heroku

Please set up the query server on Heroku from the master branch and add the info to the Readme.md.

The search results sometimes provide a link with a cryptic ending

Some search results have a cryptic addition at the end, e.g. as follows
<item> <title>FOSSASIA Summit 2017 | Asia's Open Technology Event about ...</title> <link>http://2017.fossasia.org//RK=1/RS=KKHGnF8gw2hmJFCYS8.5sUEeQ6o-</link> </item> <item>
Please find the reason for this and take it out.
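One possible cleanup step, sketched under the assumption that the unwanted suffix always has the `/RK=<digits>/RS=<token>` shape seen in the example above. The pattern is inferred from that single link and `clean_link` is a hypothetical helper.

```python
import re

# Assumption: the unwanted suffix always looks like "/RK=<digits>/RS=<token>".
TRACKING_SUFFIX = re.compile(r"/RK=\d+/RS=[^/]*$")


def clean_link(url):
    """Drop a trailing /RK=.../RS=... fragment if one is present."""
    return TRACKING_SUFFIX.sub("", url)


print(clean_link("http://2017.fossasia.org//RK=1/RS=KKHGnF8gw2hmJFCYS8.5sUEeQ6o-"))
```

Links without the suffix pass through unchanged.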

Adding ask.com support

This will be an enhancement to the existing project

app not executing on Python 3.x

README.md shows that :

Dependencies

Python 2.x or Python 3.x

Node.js

Pip

Flask

BeautifulSoup4

But the dependencies could not be installed using pip for Python 3.x.

Quoting the error message :

File "/tmp/pip-build-s4ac1_v3/librabbitmq/setup.py", line 191
except Exception, exc:

This is because librabbitmq is not supported on Python 3.x.
Please change the README.md.

Update readme.md with the latest parameters, i.e. including the number of results to be searched for.
Handle the TypeError for the same, whenever a 500 response is sent from the server.

I would like to work on this issue.
Thanks :D
