
amazon_scraper's Introduction

Amazon Scraper

A hybrid web scraper / API client. It supplements the standard Amazon API with web-scraping functionality to get extra data, specifically product reviews.

Uses the Amazon Simple Product API to provide API accessible data. API search functions are imported directly into the amazon_scraper module.

Parameters follow the same style as the Amazon Simple Product API, which in turn uses Bottlenose-style parameters; hence the non-Pythonic parameter names (e.g. ItemId).

The AmazonScraper constructor passes *args and **kwargs through to Bottlenose (via Amazon Simple Product API). Bottlenose supports AWS regions, queries-per-second limiting, query caching, and other useful features. Please consult Bottlenose's API documentation for more information.

The latest released version of python-amazon-simple-product-api (1.5.0 at the time of writing) doesn't support these arguments, only Region. If you require them, please install the latest code from their repository with the following command:

pip install git+https://github.com/yoavaviram/python-amazon-simple-product-api.git#egg=python-amazon-simple-product-api

Caveat

Amazon continually try to keep scrapers from working. They do this by:

  • A/B testing (clients randomly receive different HTML).
  • Huge numbers of HTML layouts for the same product categories.
  • Changing HTML layouts.
  • Moving content inside iFrames.

Amazon have resorted to moving more and more content into iFrames, which this scraper can't handle. I envisage a time when most data will be inaccessible without more complex logic.

I've spent a long time trying to keep these scrapers working, and it's a never-ending battle. I don't have the time to continually keep pace with Amazon. If you are interested in improving Amazon Scraper, please let me know (creating an issue is fine). Any help is appreciated.

Installation

pip install amazon_scraper

Dependencies

Examples

All Products All The Time

Create an API instance:

>>> from amazon_scraper import AmazonScraper
>>> amzn = AmazonScraper("put your access key", "secret key", "and associate tag here")

The constructor also accepts **kwargs, which are passed to the bottlenose.Amazon constructor:

>>> from amazon_scraper import AmazonScraper
>>> amzn = AmazonScraper("put your access key", "secret key", "and associate tag here", Region='UK', MaxQPS=0.9, Timeout=5.0)

Search:

>>> from __future__ import print_function
>>> import itertools
>>> for p in itertools.islice(amzn.search(Keywords='python', SearchIndex='Books'), 5):
...     print(p.title)
Learning Python, 5th Edition
Python Programming: An Introduction to Computer Science 2nd Edition
Python In A Day: Learn The Basics, Learn It Quick, Start Coding Fast (In A Day Books) (Volume 1)
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
Python Cookbook

Lookup by ASIN/ItemId:

>>> p = amzn.lookup(ItemId='B00FLIJJSA')
>>> p.title
Kindle, Wi-Fi, 6" E Ink Display - for international shipment
>>> p.url
http://www.amazon.com/Kindle-Wi-Fi-Ink-Display-international/dp/B0051QVF7A/ref=cm_cr_pr_product_top

Batch Lookups:

>>> for p in amzn.lookup(ItemId='B0051QVF7A,B007HCCNJU,B00BTI6HBS'):
...     print(p.title)
Kindle, Wi-Fi, 6" E Ink Display - for international shipment
Kindle, 6" E Ink Display, Wi-Fi - Includes Special Offers (Black)
Kindle Paperwhite 3G, 6" High Resolution Display with Next-Gen Built-in Light, Free 3G + Wi-Fi - Includes Special Offers

By URL:

>>> p = amzn.lookup(URL='http://www.amazon.com/Kindle-Wi-Fi-Ink-Display-international/dp/B0051QVF7A/ref=cm_cr_pr_product_top')
>>> p.title
Kindle, Wi-Fi, 6" E Ink Display - for international shipment
>>> p.asin
B0051QVF7A

Product Ratings:

>>> p = amzn.lookup(ItemId='B00FLIJJSA')
>>> p.ratings
[8, 4, 6, 4, 13]

Alternative Bindings:

>>> p = amzn.lookup(ItemId='B000GRFTPS')
>>> p.alternatives
['B00IVM5X7E', '9163192993', '0899669433', 'B00IPXPQ9O', '1482998742', '0441444814', '1497344824']
>>> for asin in p.alternatives:
...     alt = amzn.lookup(ItemId=asin)
...     print(alt.title, alt.binding)
The King in Yellow Kindle Edition
The King in Yellow Unknown Binding
King in Yellow Hardcover
The Yellow Sign Audible Audio Edition
The King in Yellow MP3 CD
THE KING IN YELLOW Mass Market Paperback
The King in Yellow Paperback

Supplemental text not available via the API:

>>> p = amzn.lookup(ItemId='0441016685')
>>> p.supplemental_text
[u"Bob Howard is a computer-hacker desk jockey ... ", u"Lovecraft\'s Cthulhu meets Len Deighton\'s spies ... ", u"This dark, funny blend of SF and ... "]

Review API

View lists of reviews:

>>> p = amzn.lookup(ItemId='B0051QVF7A')
>>> rs = p.reviews()
>>> rs.asin
B0051QVF7A
>>> # print the reviews on this first page
>>> rs.ids
['R3MF0NIRI3BT1E', 'R3N2XPJT4I1XTI', 'RWG7OQ5NMGUMW', 'R1FKKJWTJC4EAP', 'RR8NWZ0IXWX7K', 'R32AU655LW6HPU', 'R33XK7OO7TO68E', 'R3NJRC6XH88RBR', 'R21JS32BNNQ82O', 'R2C9KPSEH78IF7']
>>> rs.url
http://www.amazon.com/product-reviews/B0051QVF7A/ref=cm_cr_pr_top_sort_recent?&sortBy=bySubmissionDateDescending
>>> # iterate over reviews on this page only
>>> for r in rs.brief_reviews:
...     print(r.id)
'R3MF0NIRI3BT1E'
'R3N2XPJT4I1XTI'
'RWG7OQ5NMGUMW'
...
>>> # iterate over all brief reviews on all pages
>>> for r in rs:
...     print(r.id)
'R3MF0NIRI3BT1E'
'R3N2XPJT4I1XTI'
'RWG7OQ5NMGUMW'
...

View detailed reviews:

>>> rs = amzn.reviews(ItemId='B0051QVF7A')
>>> # this will iterate over all reviews on all pages
>>> # each review requires a download, as it is on a separate page
>>> for r in rs.full_reviews():
...     print(r.id)
'R3MF0NIRI3BT1E'
'R3N2XPJT4I1XTI'
'RWG7OQ5NMGUMW'
...
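Because every full review lives on its own page, each iteration step triggers a download, so throttling between fetches is prudent. A generic sketch (the throttled helper below is not part of the package):

```python
import time

def throttled(iterable, delay=1.0):
    """Yield items from iterable, sleeping `delay` seconds between them."""
    for i, item in enumerate(iterable):
        if i and delay:
            time.sleep(delay)
        yield item

# usage sketch, with rs from the example above:
# for r in throttled(rs.full_reviews(), delay=1.0):
#     print(r.id)
```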

Convert a brief review to a full review:

>>> rs = amzn.reviews(ItemId='B0051QVF7A')
>>> # this will iterate over all reviews on all pages
>>> # each full review requires a download, as it is on a separate page
>>> for r in rs:
...     print(r.id)
...     fr = r.full_review()
...     print(fr.id)

Quickly get a list of all reviews on a review page using the all_reviews property. This uses the brief reviews provided on the review page to avoid downloading each review separately. As such, some information may not be accessible:

>>> p = amzn.lookup(ItemId='B0051QVF7A')
>>> rs = p.reviews()
>>> all_reviews_on_page = list(rs)
>>> len(all_reviews_on_page)
10
>>> r = all_reviews_on_page[0]
>>> r.title
'Fantastic device - pick your Kindle!'
>>> fr = r.full_review()
>>> fr.title
'Fantastic device - pick your Kindle!'

By ASIN/ItemId:

>>> rs = amzn.reviews(ItemId='B0051QVF7A')
>>> rs.asin
B0051QVF7A
>>> rs.ids
['R3MF0NIRI3BT1E', 'R3N2XPJT4I1XTI', 'RWG7OQ5NMGUMW', 'R1FKKJWTJC4EAP', 'RR8NWZ0IXWX7K', 'R32AU655LW6HPU', 'R33XK7OO7TO68E', 'R3NJRC6XH88RBR', 'R21JS32BNNQ82O', 'R2C9KPSEH78IF7']

For individual reviews use the review method:

>>> review_id = 'R3MF0NIRI3BT1E'
>>> r = amzn.review(Id=review_id)
>>> r.id
R3MF0NIRI3BT1E
>>> r.asin
B00492CIC8
>>> r.url
http://www.amazon.com/review/R3MF0NIRI3BT1E
>>> r.date
2011-09-29 18:27:14+00:00
>>> r.author
FreeSpirit
>>> r.text
Having been a little overwhelmed by the choices between all the new Kindles ... <snip>

By URL:

>>> r = amzn.review(URL='http://www.amazon.com/review/R3MF0NIRI3BT1E')
>>> r.id
R3MF0NIRI3BT1E

User Reviews API

This package also supports getting reviews written by a specific user.

Get reviews that a single author has created:

>>> ur = amzn.user_reviews(Id="A2W0GY64CJSV5D")
>>> ur.brief_reviews
>>> ur.name
>>> fr = list(ur.brief_reviews)[0].full_review()

Get reviews for a user, from a review object:

>>> r = amzn.review(Id="R3MF0NIRI3BT1E")
>>> # we can get the reviews directly, or via the API with a URL or ID
>>> ur = r.user_reviews()
>>> ur = amzn.user_reviews(URL=r.author_reviews_url)
>>> ur = amzn.user_reviews(Id=r.author_id)
>>> ur.brief_reviews
>>> ur.name

Iterate over the current page's reviews:

>>> ur = amzn.user_reviews(Id="A2W0GY64CJSV5D")
>>> for r in ur.brief_reviews:
...     print(r.id)

Iterate over all author reviews:

>>> ur = amzn.user_reviews(Id="A2W0GY64CJSV5D")
>>> for r in ur:
...     print(r.id)

Authors

amazon_scraper's People

Contributors

adamlwgriffiths, hahnicity, muety, yaph


amazon_scraper's Issues

Random stopping on multiples of 10

I am randomly having stops occur as I try to scrape all the reviews for a product (using 'reviews'). I don't receive an error, and the 'soup' output for the last review it scraped doesn't seem all that informative (i.e., it looks like the reviews before it). This doesn't happen for specific products in particular nor does it occur after the same review each time. Specifically, sometimes if I re-scrape the same product, it starts or stops at a different review and sometimes all of the reviews will be scraped. If I am scraping multiple products one-after-the-other, the scraper will just continue on to the next product, even though it didn't scrape all the reviews for the first product. Finally, the scraper tends to stop on multiples of 10 (e.g., 80, 110, etc). This makes me believe it has something to do with continuing on to the next page.

Here is the code I'm using (along with a product ID where the scraper randomly stopped):

p = amzn.lookup(ItemId='B008LX6OC6') #also ItemID='B000F8EUFI'
rs = p.reviews()
for review in rs:
    print review.asin
    print review.url
    print review.soup
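Stopping silently on a multiple of the 10-review page size is consistent with the next-page fetch failing. A cheap post-scrape sanity check can at least flag suspicious runs (looks_truncated is a hypothetical helper, not part of the package):

```python
def looks_truncated(count, page_size=10):
    """Heuristic: a scrape that ends exactly on a page boundary may
    have stopped because the next page failed to load."""
    return count > 0 and count % page_size == 0

# scraped = [review for review in rs]  # rs as in the snippet above
# if looks_truncated(len(scraped)):
#     print('warning: scrape ended on a page boundary; consider retrying')
```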

Can't get ASIN for reviews

I'm running into an odd error when trying to get the ASIN for reviews. I get an RS object, but oddly can't get at the ASIN despite the fact it has that attribute on the span. Any help?

...
RS value: <amazon_scraper.reviews.Reviews object at 0x10c0c9890>
...

Here's my traceback:

Traceback (most recent call last):
    File "amazon_reviews.py", line 112, in <module>
scrapeAmazonReviews(filepath, title, amzn, url)
    File "amazon_reviews.py", line 83, in scrapeAmazonReviews
print "RS ASID: %s" % rs.asin
    File "/Users/pbeeson/.virtualenvs/custom_data_pulls/lib/python2.7/site-packages/amazon_scraper/reviews.py", line 41, in asin
return unicode(span['name'])
    TypeError: 'NoneType' object has no attribute '__getitem__'

Average Review Rating

I didn't see this existed, so I tried parsing it out myself.

avg_rating = float(rs.soup.find("i", {'class':re.compile('averageStarRating')}).text.split(' ')[0])

Should I submit a pull request?
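An alternative that avoids depending on the class name is to match the visible text pattern instead. This sketch assumes the page text contains something like "4.2 out of 5 stars" (the phrasing may vary by layout):

```python
import re

# Pattern assumed from Amazon's visible rating text, e.g. "4.2 out of 5 stars".
_AVG_RE = re.compile(r'([\d.]+)\s+out of 5 stars', re.I)

def average_star_rating(html_text):
    """Return the average star rating as a float, or None if absent."""
    m = _AVG_RE.search(html_text)
    return float(m.group(1)) if m else None
```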

Review ids are not being parsed correctly

with url: http://www.amazon.com/product-reviews/1449355730/ref=cm_cr_pr_top_sort_recent?&sortBy=bySubmissionDateDescending

I did:

from amazon_scraper import AmazonScraper
amzn = AmazonScraper(stuff...)
revs = amz.reviews(URL="http://www.amazon.com/product-reviews/1449355730/ref=cm_cr_pr_top_sort_recent?&sortBy=bySubmissionDateDescending")
revs.ids

I get an empty list. The cause might be that Amazon changed their HTML. I'd like to make this change:

     @property
     def ids(self):
         return [
-            extract_review_id(anchor['href'])
-            for anchor in self.soup.find_all('a', text=re.compile(ur'permalink', flags=re.I))
+            anchor["id"]
+            for anchor in self.soup.find_all('div', class_="a-section review")
         ]

This matches up a bit more closely with amazon html which looks like

<div id="R2UBSL6L1T8MIF" class="a-section review"><div class="a-row helpful-votes-count"></div>
...
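The proposed selector can be sanity-checked against markup like the above using only the standard library (a sketch; the package itself uses BeautifulSoup):

```python
from html.parser import HTMLParser

class ReviewIdParser(HTMLParser):
    """Collect the id of every <div> whose class list contains 'review'."""
    def __init__(self):
        HTMLParser.__init__(self)
        self.ids = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        classes = a.get('class', '').split()
        if tag == 'div' and 'review' in classes and a.get('id'):
            self.ids.append(a['id'])

parser = ReviewIdParser()
parser.feed('<div id="R2UBSL6L1T8MIF" class="a-section review">'
            '<div class="a-row helpful-votes-count"></div></div>')
# parser.ids  ->  ['R2UBSL6L1T8MIF']
```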

Fix up tests

I'm going to try to take on fixing up the tests and getting them to work properly. Right now there are basic issues just getting the tests to run and we need to turn off the MaxQPS property in order to get them to go. Also there are three test failures and several errors that can be fixed. Hell maybe I'll even figure out how to throw travis on here; but that can be a separate ticket.

Page sometimes not loading?

I'm having sporadic trouble when extracting the ASIN using reviews/full_review:

rs = amzn.reviews(ItemId='006001203X')
fr = r.full_review()
myfile.write("%s," % (fr.asin))

I'm sometimes getting the error:

asin = unicode(tag.string)
AttributeError: 'NoneType' object has no attribute 'string'

My guess is that I'm not getting the content of the page when this error is occurring because the individual review's URL is passed on correctly (fr.url) and I can see that the content exists in my browser, but I am getting "None" when asking for the text of the review (fr.text). Furthermore, sometimes the scraper errors on a specific review and sometimes it doesn't, again making me think this is a loading issue.

In case it helps, I'm using the scraper in conjunction with Tor and PySocks (maybe not necessary?). What would lead to pages sometimes not loading? Any solutions to this issue?

UPDATE:

Here is some output when just printing out the reviews (rather than writing them). The format is the review URL followed by the text. What you'll notice is that "None" just seems to appear randomly and when you visit the actual page, there is writing there.

http://www.amazon.com/review/R1GLFST9IJDL3Z
None
http://www.amazon.com/review/R3O5KSEJ5BONJ7
Written by Dr. Atkins, this book is definitely a good way to get started on the diet. My only reservation is that he spends an awful long time convincing the reader to start the diet. But a good resource for a low/no carb diet.
http://www.amazon.com/review/R353I88IYNVGZJ
Thank you it is what I was looking for
http://www.amazon.com/review/R22GIPYTEYX7IK
None

Also, I have seen this happen both with and without using Tor/PySocks.
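When a page intermittently fails to load, simply re-fetching often succeeds. A generic retry helper, sketched here under the assumption that a bad load shows up as a None result (the package itself has no built-in retry):

```python
import time

def retry(fetch, attempts=3, delay=2.0):
    """Call fetch() up to `attempts` times, sleeping between tries,
    until it returns something other than None."""
    result = None
    for i in range(attempts):
        result = fetch()
        if result is not None:
            return result
        if i < attempts - 1 and delay:
            time.sleep(delay)
    return result

# usage sketch:
# text = retry(lambda: amzn.review(Id='R1GLFST9IJDL3Z').text)
```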

ImportError: No module named tests

Hi Adam,

first of all - thanks for sharing this project! I am having a small issue. This is my first python project so please bear with me. I cloned this project on a c9.io environment and I am getting a few errors when running the tests, namely this one:

ImportError: No module named tests

I would be very grateful if you could point me in the right direction.

Thanks

-Dat

Edit: things I have tried: Changing the CWD, reinstalling all the dependencies

Add ability to set amazon_base

Regardless of the Region parameter, product reviews are always fetched from http://www.amazon.com/product-reviews/ due to the amazon_base constant.
I'd suggest that this base URL either be set dynamically depending on the Region, or be specifiable as a parameter, without having to statically overwrite the variable.
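One way to implement this is to derive the base URL from the Region parameter via a lookup table. A sketch (the domain mapping below is illustrative and would need to cover all supported regions):

```python
# Illustrative Region-to-domain mapping (keys follow the Region parameter style).
_AMAZON_DOMAINS = {
    'US': 'www.amazon.com',
    'UK': 'www.amazon.co.uk',
    'DE': 'www.amazon.de',
    'FR': 'www.amazon.fr',
    'JP': 'www.amazon.co.jp',
}

def amazon_base(region='US'):
    """Return the product-reviews base URL for a region,
    falling back to .com for unknown regions."""
    domain = _AMAZON_DOMAINS.get(region, 'www.amazon.com')
    return 'http://{0}/product-reviews/'.format(domain)
```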

help

hello

I'm new to coding and want to set up a script to pull prices from warehousedeals.com for items that are 70%+ off versus the new Amazon price. Can someone assist me with this?
So far I have downloaded Python 2 and Notepad++.

Missing something obvious: urllib 400 error on simple requests

When I try to run the scraper using a simple test and request as follows, it consistently fails with the error traceback shown below. Any idea where I am going awry? Apologies for my idiocy here, but it's not obvious to me why urllib is throwing a 400 error.

from amazon_scraper import AmazonScraper
amzn = AmazonScraper("XXXX", "XXXX", "XXXX")
import itertools
for p in itertools.islice(amzn.search(Keywords='python', SearchIndex='Books'), 5):
   print p.title

And here is the result:

Traceback (most recent call last):
  File "amazon-scraper.py", line 4, in <module>
    for p in itertools.islice(amzn.search(Keywords='python', SearchIndex='Books'), 5):
  File "/Users/pk/anaconda/lib/python2.7/site-packages/amazon_scraper/__init__.py", line 188, in search
    for p in self.api.search(**kwargs):
  File "/Users/pk/anaconda/lib/python2.7/site-packages/amazon/api.py", line 519, in __iter__
    for page in self.iterate_pages():
  File "/Users/pk/anaconda/lib/python2.7/site-packages/amazon/api.py", line 535, in iterate_pages
    yield self._query(ItemPage=self.current_page, **self.kwargs)
  File "/Users/pk/anaconda/lib/python2.7/site-packages/amazon/api.py", line 548, in _query
    response = self.api.ItemSearch(ResponseGroup=ResponseGroup, **kwargs)
  File "/Users/pk/anaconda/lib/python2.7/site-packages/bottlenose/api.py", line 242, in __call__
    {'api_url': api_url, 'cache_url': cache_url})
  File "/Users/pk/anaconda/lib/python2.7/site-packages/bottlenose/api.py", line 203, in _call_api
    return urllib2.urlopen(api_request, timeout=self.Timeout)
  File "/Users/pk/anaconda/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/pk/anaconda/lib/python2.7/urllib2.py", line 437, in open
    response = meth(req, response)
  File "/Users/pk/anaconda/lib/python2.7/urllib2.py", line 550, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Users/pk/anaconda/lib/python2.7/urllib2.py", line 475, in error
    return self._call_chain(*args)
  File "/Users/pk/anaconda/lib/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/Users/pk/anaconda/lib/python2.7/urllib2.py", line 558, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 400: Bad Request

Problem with InsecureRequest

/home/####/.local/lib/python2.7/site-packages/urllib3/connectionpool.py:858: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)

This is the warning I get when trying this:

from amazon_scraper import AmazonScraper
amzn = AmazonScraper(...)  # I've passed correct arguments
rs = amzn.reviews(ItemId='B0734X8GW5')
for r in rs.ids:
    rvn = amzn.review(Id=r)
    print (rvn.id)
    print (rvn.text)

Add captcha detection

Detect when amazon shoots a captcha at us and raise an appropriate error instead of letting our soup code fail with None dereferences.

See this issue for more information on the format of the captcha and result of it being sent:
#25
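A minimal detection sketch: Amazon's interstitial typically includes a "Robot Check" title and a prompt to type characters from an image (an assumption about the page's wording), so scanning the raw HTML for those markers lets callers fail loudly before the soup code dereferences None:

```python
class CaptchaError(Exception):
    """Raised when Amazon serves a captcha page instead of content."""

# Markers assumed to appear on Amazon's captcha interstitial.
_CAPTCHA_MARKERS = ('robot check', 'type the characters you see in this image')

def check_for_captcha(html_text):
    """Raise CaptchaError if the page looks like a captcha challenge."""
    lowered = html_text.lower()
    if any(marker in lowered for marker in _CAPTCHA_MARKERS):
        raise CaptchaError('Amazon served a captcha page')

check_for_captcha('<html><title>Kindle</title></html>')  # passes silently
```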

Incorrect Syntax in init

Hi,

I get an error when installing and running, in __init__.py at line 60.
It says "incorrect syntax": _price_regexp = re.compile(ur'(?P<price>[$ú][\d,.]+)', flags=re.I).
The arrow is on the last '.

Also in product.py line 50, reviews.py line 60.

Get Product Price

Can you please tell me how to get the product price through your amazon_scraper API in Python, and also how to get seller information?

Problems with .text command

Hello,

I've been able to use many of the commands and functions in the amazon_scraper package, but .text doesn't appear to work. I tried to pull the text of a valid review on Amazon using the instructions laid out in your documentation. However, I continue to get 'None' instead of the text. I've tried this with several review IDs but all return 'None'. Can someone help me with this?

Thanks,

Brad

Create a new release

Let me know when you're happy with what you've committed.
We'll tag it and release it on PyPI.

Do you have a pypi account?
If so, let me know the username and I'll give you release permissions.

Reviews API broken?

I don't know if this is an issue, but I can't get the reviews API to work. Even if I use the values in your unit tests they do not work. Do the unit tests work for you? Here are the results:
Fitbit FB401BK Flex Wireless Activity + Sleep Wristband, Black, Small - 5.5" - 6.5" & Large - 6.5" - 7.9"
[0, 0, 0, 0, 0]
<built-in method title of unicode object at 0x0000000004CEAED0>
<built-in method title of unicode object at 0x0000000004D66600>
<built-in method title of unicode object at 0x0000000004D8BCC0>
<built-in method title of unicode object at 0x0000000004DB53C0>
<built-in method title of unicode object at 0x0000000004E24A80>
<built-in method title of unicode object at 0x0000000004E4C180>
<built-in method title of unicode object at 0x0000000004E77870>
<built-in method title of unicode object at 0x0000000004E9CF30>
<built-in method title of unicode object at 0x0000000004F10630>
<built-in method title of unicode object at 0x0000000004F76CF0>
Traceback (most recent call last):
  File "test.py", line 11, in <module>
    for r in rs:
  File "build\bdist.win-amd64\egg\amazon_scraper\reviews.py", line 180, in __iter__
TypeError: __init__() takes at least 2 arguments (2 given)

GUI?

Is there a recommended software that will provide a gui?

how to properly iterate over reviews

I'm trying to run the example block of code that looks like this:

p = amzn.lookup(ItemId='B0051QVF7A')
rs = amzn.reviews(URL=p.reviews_url)

for r in rs:
    print(r)

but at first I get an error like this:

Traceback (most recent call last):
  File "review-scraper/review-scraper.py", line 19, in <module>
    for r in rs:
  File "/usr/local/lib/python2.7/site-packages/amazon_scraper/reviews.py", line 178, in __iter__
    for id in page.ids:
  File "/usr/local/lib/python2.7/site-packages/amazon_scraper/reviews.py", line 202, in ids
    for anchor in self.soup.find_all('div', class_="a-section review")
  File "/usr/local/lib/python2.7/site-packages/amazon_scraper/__init__.py", line 113, in decorator
    raise e
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: html5lib. Do you need to install a parser library?

And when I install the html5lib package things work a little better (I'm able to print out the first page of reviews) but then I hit another error:

R1V8OBW4HRDV5W
R38AV3D6I8CHS6
R1R19OOAWIN48U
RL37IWIVVB5B4
R3S9D4LLRP7AQN
R1CAZXTXQ6F5A
R36R23EPPWW6UQ
RA751EK4W8EV4
RGZ3A10EDUYQ1
RP149JO3VJ31O
Traceback (most recent call last):
  File "review-scraper/review-scraper.py", line 19, in <module>
    for r in rs:
  File "/usr/local/lib/python2.7/site-packages/amazon_scraper/reviews.py", line 180, in __iter__
    page = Reviews(URL=page.next_page_url) if page.next_page_url else None
TypeError: __init__() takes at least 2 arguments (2 given)

Is there a different package I should be using?

AWS Account

Do I need an AWS account for this to work?

Problem installing amazon_scraper

Either installing from command line using pip or through PyCharm, I get the following error message:

//
Collecting amazon-scraper
Using cached amazon_scraper-0.3.2.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "", line 20, in
File "C:\Users...\AppData\Local\Temp\pycharm-packaging0.tmp\amazon-scraper\setup.py", line 4, in
del os.link
AttributeError: link
//

By deleting this line I think it installs OK (I have tried it and used the scraper some time ago).
The problem is when trying to install the scraper automatically using an IDE. I am running PyCharm on Windows 8.1, if this could indicate the source of the problem. Are any Linux users getting this error?
Any help would be appreciated.

Problem with BeautifulSoup import

I ran into an import problem when setting up a simple scraper.
Here is the content of my script:

from __future__ import print_function
import itertools
from amazon_scraper import AmazonScraper
amzn = AmazonScraper("AKILSM4QQOKUVAX3lNPO", "s8a0hVmtfLL1TLZsKKiCBVvZTdQVG7x1HqhHZ1+E", "")
for p in itertools.islice(amzn.search(Keywords='python', SearchIndex='Books'), 5):
    print(p.title)

Here is the output when running the script:

root@nivose:~/amazon# python product.py
Traceback (most recent call last):
  File "product.py", line 3, in <module>
    from amazon_scraper import AmazonScraper
  File "/usr/local/lib/python2.7/dist-packages/amazon_scraper/__init__.py", line 16, in <module>
    from bs4 import BeautifulSoup
  File "build/bdist.linux-x86_64/egg/bs4/__init__.py", line 30, in <module>
  File "build/bdist.linux-x86_64/egg/bs4/builder/__init__.py", line 314, in <module>
  File "build/bdist.linux-x86_64/egg/bs4/builder/_html5lib.py", line 70, in <module>
AttributeError: 'module' object has no attribute '_base'
