karan / hackernewsapi

:newspaper: Unofficial Python API for Hacker News. RESTful API at https://github.com/karan/HNify

License: MIT License

Python 98.78% Shell 1.22%

hackernewsapi's Introduction

Unofficial Python API for Hacker News.


Features

Compatible with Python 2 (2.7+).
Supports 'top', 'news2', 'newest' and 'best' posts
Retrieve comments from posts (flat list for now) (story.get_comments())
Pagination support for comments
Handles external posts, self posts and job posts
Get post details for any post (Story.fromid(7024626))

Installation

$ pip install HackerNews

Usage

NOTE: Do not make a lot of requests in a short period of time. HN has its own throttling system.

from hn import HN

hn = HN()

# print the first 2 pages of newest stories
for story in hn.get_stories(story_type='newest', limit=60):
    print(story.rank, story.title)
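
Because HN throttles clients, it can help to space calls out. The following is a minimal sketch of a generic minimum-interval limiter. It is not part of this library: `MinIntervalLimiter`, its `min_interval` parameter, and the injectable `clock` and `sleep` arguments are hypothetical names chosen for illustration.

```python
import time


class MinIntervalLimiter(object):
    """Block so that successive wait() calls are at least min_interval seconds apart.

    Hypothetical helper, not provided by the hn package.
    """

    def __init__(self, min_interval, clock=time.time, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call; None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Too soon since the last call: sleep for the rest of the interval.
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `limiter.wait()` right before each `hn.get_stories(...)` call would keep those calls at least `min_interval` seconds apart. A suitable interval is a judgment call on the caller's side.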

API Reference

Class: `HN`

Get stories from Hacker News

`get_stories`

Parameters:

Name	Type	Required	Description	Default
`story_type`	string	No	Returns the stories from this page. One of `(empty string)`, `news2`, `newest`, `best`	`(empty string)` (top)
`limit`	integer	No	Number of stories required from the given page. Cannot be more than 30.	30

Example:

from hn import HN
hn = HN()
hn.get_stories(story_type='newest', limit=10)

`get_leaders`

Parameters:

Name	Type	Required	Description	Default
`limit`	integer	No	Number of top leaders to return	10

Example:

from hn import HN
hn = HN()

# get top 20 users of HN
hn.get_leaders(limit=20)

Class: `Story`

Each Story has the following properties

rank - the rank of story on the page (keep pagination in mind)
story_id - the story's id
title - the title of the story
is_self - true for self/job stories
link - the URL it points to ('' for self posts)
domain - the domain of the link ('' for self posts)
points - the points/karma on the story
submitter - the user who submitted the story ('' for job posts)
submitter_profile - the above user's profile link (can be '')
published_time - the published time
num_comments - the number of comments a story has
comments_link - the link to the comments page

Make an object from the ID of a story

`fromid`

Parameters:

Name	Type	Required	Description	Default
`item_id`	integer	Yes	Initializes an instance of Story for given item_id. Must be a valid story id.

Example:

from hn import Story
story = Story.fromid(6374031)
print(story.title)

Get a list of Comments for this story

`get_comments`

Parameters:

Name	Type	Required	Description	Default

Example:

from hn import Story
story = Story.fromid(6374031)
comments = story.get_comments()

Class: `Comment`

Each Comment has the following properties

comment_id - the comment's item id
level - comment's nesting level
user - user's name who submitted the post
time_ago - time when it was submitted
body - text representation of comment (unformatted)
body_html - html of comment, may not be valid
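
Since comments are returned as a flat list and each one carries a `level`, a nested structure can be rebuilt on the caller's side. The sketch below assumes the list is in page order (each reply follows its parent) and that top-level comments have level 0. `FakeComment` and `build_tree` are hypothetical names used only for illustration, not part of the library.

```python
from collections import namedtuple

# Stand-in for hn.Comment carrying only the fields this sketch needs.
FakeComment = namedtuple('FakeComment', ['comment_id', 'level', 'user'])


def build_tree(comments):
    """Turn a flat, page-ordered comment list into nested nodes.

    Each node is a dict: {'comment': <comment>, 'replies': [<node>, ...]}.
    """
    roots = []
    stack = []  # stack[i] is the most recent node seen at level i
    for comment in comments:
        node = {'comment': comment, 'replies': []}
        # Drop nodes at this level or deeper so the stack top is the parent.
        del stack[comment.level:]
        if stack:
            stack[-1]['replies'].append(node)
        else:
            roots.append(node)
        stack.append(node)
    return roots
```

With real data, the input would be the list returned by `story.get_comments()`.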

Class: `User`

Each User has the following properties

username - user's profile name
date_created - when the profile was created
karma - user's e-points
avg - user's average karma per day

Examples

See my_test_bot.py

Tests

To run the tests locally just do:

$ chmod 777 runtests.sh
$ ./runtests.sh

To run individual tests,

$ python -m unittest tests.<module name>

The tests are run on a local test server with predownloaded original responses.

Donations

If HackerNewsAPI has helped you in any way, and you'd like to help the developer, please consider donating.

- BTC: 19dLDL4ax7xRmMiGDAbkizh6WA6Yei2zP5

- Flattr: https://flattr.com/profile/thekarangoel

Contribute

If you want to add any new features, or improve existing ones, feel free to send a pull request!

hackernewsapi's Issues

newest and best options do not work in get_stories

For the function get_stories, the parameter options 'newest' and 'best' do not work, because by default the function only fetches top stories from the default page https://news.ycombinator.com/news

I can fix this either by updating the get_stories function or by creating separate functions for default, newest and best. The advantage of the latter is that people using the API do not have to pass parameters to get_stories every time and can just call the relevant function; the disadvantage is that a lot of code will have to be rewritten. The advantage of the former is that there is less code to write.

Let me know what you think :)

Can't access my accounts

https://accounts.shopify.com/select?rid=0db02f2b-c73c-4856-afc5-1bb8430bd498&sid=139d1b86-5ff6-4d13-b574-6471dadd903b

packages/modules/adbd

Publish time for job stories

user data.

Update the API to fetch user data: submissions, comments, karma, etc.

Is there any way to find stories by title?

A question, not a bug. Is there any way to find stories by title? Say,

hn.get_stories(title='some title%')

# or

reg_exp = re.compile('something')
hn.get_stories(title=reg_exp)
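
The documented parameters of get_stories are only story_type and limit, so a title search would have to happen on the caller's side. A minimal sketch of such a filter follows. `filter_by_title` and `FakeStory` are hypothetical names, and the case-insensitive matching for plain strings is an arbitrary choice for illustration.

```python
import re
from collections import namedtuple

# Stand-in for hn.Story; real objects would come from hn.get_stories().
FakeStory = namedtuple('FakeStory', ['rank', 'title'])


def filter_by_title(stories, pattern):
    """Yield stories whose title matches pattern (a string or a compiled regex)."""
    if isinstance(pattern, str):
        regex = re.compile(pattern, re.IGNORECASE)
    else:
        regex = pattern  # assume an already compiled regular expression
    for story in stories:
        if regex.search(story.title):
            yield story
```

This only filters stories that were already fetched; it does not search Hacker News itself.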

Get comments for a story is breaking when no comments are there on Hackernews page

When we call the get_comments method on a story that doesn't have any comments, the API breaks: it keeps looking for the table of comments in the page, can't find it, and fails.

Comment.body_html is incomplete

from hn import Story
story = Story.fromid(7324236)
comments = story.get_comments()
print comments[0].body
print comments[0].body_html

Seems like .body_html only returns a portion of the comment body's HTML.

Polls support

At first glance doesn't seem to parse votes for polls.

IndexError: list index out of range (comments)

table = soup.findChildren('table')[3] # the table holding all comments
IndexError: list index out of range

Pagination

Pagination on HN is a little tricky.

Pages are in the format: https://news.ycombinator.com/x?fnid=6ZNSo6ZztcTOk5nXmNIDa6

I think 6ZNSo6ZztcTOk5nXmNIDa6 refers to a session ID or something along those lines, and it has a timeout.

https://news.ycombinator.com/item?id=3623268

Pip install fails on latest version 1.1.0

See the output below:

~ » pip install HackerNews
Downloading/unpacking HackerNews
  Downloading HackerNews-1.1.0.zip
  Running setup.py egg_info for package HackerNews

Installing collected packages: HackerNews
  Running setup.py install for HackerNews
      File "/home/danclaudiupop/projects/.virtualenvs/caca/lib/python2.7/site-packages/hn/hn.py", line 79
        <<<<<<< HEAD
         ^
    SyntaxError: invalid syntax


Successfully installed HackerNews
Cleaning up...

Can you also PEPify the code? Remove extra whitespace, limit all lines to a maximum of 79 characters, etc.

Thanks,

Wf

https://www.wordfence.com/products/wordfence-central/liscene

Respect robots.txt

https://news.ycombinator.com/robots.txt

User-Agent: *
Disallow: /x?
Disallow: /vote?
Disallow: /reply?
Disallow: /submitted?
Disallow: /submitlink?
Disallow: /threads?
Crawl-delay: 30

We need to respect this; the codebase will need a lot of changes.
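
As a rough illustration of what respecting these rules could look like, the sketch below feeds the rules quoted above to the standard library's robots.txt parser (Python 3 module path; `crawl_delay` requires Python 3.6+). It parses the text inline rather than downloading it, and the variable names are chosen only for this example.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in this issue, parsed offline for illustration.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /x?
Disallow: /vote?
Disallow: /reply?
Disallow: /submitted?
Disallow: /submitlink?
Disallow: /threads?
Crawl-delay: 30
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

BASE_URL = 'https://news.ycombinator.com'

# A listing page versus a "more" link of the /x?fnid=... form.
allowed_newest = parser.can_fetch('*', BASE_URL + '/newest')
allowed_more = parser.can_fetch('*', BASE_URL + '/x?fnid=6ZNSo6ZztcTOk5nXmNIDa6')

# Seconds to wait between requests, as declared by Crawl-delay.
delay = parser.crawl_delay('*')
```

A client could check `can_fetch` before each request and sleep for `delay` seconds between requests.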

collections.namedtuple instead of Story

The Story class seems somewhat redundant. You could possibly use collections.namedtuple as a container for properties or simply a dictionary. The print_story method could just be the __str__ special method.
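
A minimal sketch of this suggestion, assuming only a subset of the documented Story properties. The field selection, the output format of `__str__`, and the sample values are illustrative and not taken from the library.

```python
from collections import namedtuple

# Field names follow a subset of the Story properties documented in the README.
_StoryFields = namedtuple('Story', ['rank', 'story_id', 'title', 'link', 'points'])


class Story(_StoryFields):
    """Immutable story record; __str__ stands in for a separate print_story method."""

    __slots__ = ()  # keep instances as lightweight as a plain namedtuple

    def __str__(self):
        return '{0}. {1} ({2} points)'.format(self.rank, self.title, self.points)


# Sample values are made up for the example.
story = Story(rank=1, story_id=6374031, title='Example title', link='', points=42)
summary = str(story)
```

The trade-off is that a namedtuple is immutable, so any code that assigns to story attributes after construction would need to change.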

Get any story's comments.

This would mean a user should be able to import Story and do something like com = Story(item_id=xxxxxxx).get_comments()

Add python 3 support and test it

Why limit = 30?

Hi again.

def get_stories(self, story_type='', limit=30):
    """
    Yields a list of stories from the passed page
    of HN.
    'story_type' can be:
    \t'' = top stories (homepage) (default)
    \t'newest' = most recent stories
    \t'best' = best stories

    'limit' is the number of stories required. Defaults to 30
    """
    if limit == None or limit < 30:
        limit = 30 # we need at least 30 items

why do you set limit back to 30?

Weird encoding error in str of Story

For some reason, the API is throwing a UnicodeEncodeError.

Modify __str__ of Story to return self.title only.

Then, try this code:

from hn import HN

hn = HN()
for story in hn.get_stories():
    print story.title
    print story

The output is this:

Turn O(n^2) reverse into O(n)
Turn O(n^2) reverse into O(n)
My run-in with unauthorised Litecoin mining on AWS
My run-in with unauthorised Litecoin mining on AWS
Amazon takes away access to purchased Christmas movie during Christmas
Traceback (most recent call last):
  File "my_test_bot.py", line 11, in <module>
    print story
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 60: ordinal not in range(128)

I have no idea why this is happening, and I created an SO question for this too.

I have downloaded the page that is causing this error right now for offline debugging.
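
In Python 2 this error appears when a unicode string containing a non-ASCII character (here u'\xa0', a no-break space) is implicitly encoded with the ascii codec, for example when `__str__` returns a unicode object. The sketch below reproduces the failing conversion explicitly and shows explicit UTF-8 encoding as one common workaround. The title string is illustrative, and whether UTF-8 is the right target depends on the output stream.

```python
# -*- coding: utf-8 -*-

# Illustrative title containing a no-break space (u'\xa0').
title = u'Christmas movie\xa0during Christmas'

try:
    title.encode('ascii')  # what the implicit conversion attempts
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly with a codec that covers the character succeeds.
utf8_bytes = title.encode('utf-8')
```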

usage of Python-3-style syntax, which is also valid in Python 2.7

Enhancement request: HTTPS

Let's use HTTPS/SSL whenever we can, please set BASE_URL = 'https://news.ycombinator.com'. This, in turn, will set comments link and the link to submitter profile to HTTPS as well.
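
A small sketch of how links derived from BASE_URL would pick up the scheme change. The helper names `comments_link` and `submitter_profile` are hypothetical and may not match how the library builds these URLs. The `item?id=` and `user?id=` paths are the ones Hacker News uses for story and profile pages.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2

BASE_URL = 'https://news.ycombinator.com'


def comments_link(story_id):
    """Build the comments page URL for a story id (hypothetical helper)."""
    return urljoin(BASE_URL, 'item?id={0}'.format(story_id))


def submitter_profile(username):
    """Build the profile URL for a username (hypothetical helper)."""
    return urljoin(BASE_URL, 'user?id={0}'.format(username))
```

Because both helpers start from BASE_URL, switching that one constant to HTTPS changes every derived link.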

get_comments : IndexError: list index out of range

Hi, I'm having this issue:

On story Snapchat Phone Number Database Leaked, id 6993968

Traceback (most recent call last):
  File "hello.py", line 9, in <module>
    for comment in story.get_comments():
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/hn/hn.py", line 252, in get_comments
    return self._build_comments(soup)
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/hn/hn.py", line 218, in _build_comments
    level = int(row.findChildren('td')[1].find('img').get('width')) // 40
IndexError: list index out of range

any idea?

Fix simple typo: explititly -> explicitly

Issue Type

[x] Bug (Typo)

Steps to Replicate

Examine hn/hn.py.
Search for explititly.

Expected Behaviour

Should read explicitly.

Semi-automated issue generated by
https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md

To avoid wasting CI processing resources a branch with the fix has been
prepared but a pull request has not yet been created. A pull request fixing
the issue can be prepared from the link below, feel free to create it or
request @timgates42 create the PR. Alternatively if the fix is undesired please
close the issue with a small comment about the reasoning.

https://github.com/timgates42/HackerNewsAPI/pull/new/bugfix_typo_explicitly

Thanks.

Cannot parse result

I wrote a few lines to get the top or newest stories; when I print them I get an error like this:

./hackernews_api.py
/usr/lib64/python2.7/site-packages/bs4/__init__.py:181: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 12 of the file ./hackernews_api.py. To get rid of this warning, change code that looks like this:

BeautifulSoup(YOUR_MARKUP})

to this:

BeautifulSoup(YOUR_MARKUP, "html.parser")

markup_type=markup_type))
Traceback (most recent call last):
  File "./hackernews_api.py", line 12, in <module>
    for story in top_iter:
  File "/usr/lib/python2.7/site-packages/hn/hn.py", line 145, in get_stories
    stories = self._build_story(all_rows) # get a list of stories on current page
  File "/usr/lib/python2.7/site-packages/hn/hn.py", line 71, in _build_story
    domain = info_cells[2].find('span').string[2:-2] # slice " (abc.com) "
TypeError: 'NoneType' object has no attribute '__getitem__'

Something went wrong, please fix.
Thanks

UnicodeEncodeError in my_test_bot.py

Hi,

When you fetch a story and try to print story.title in my_test_bot.py, it gives the following error:
'ascii' codec can't encode character u'\u2013' in position 9: ordinal not in range(128)

Solution seems to be :
http://stackoverflow.com/questions/19232385/ascii-codec-cant-encode-character-u-u2013-in-position-9-ordinal-not-in-ran

I will send a pull request in a while.

it would be more "Pythonic" to return a str object, which was formatted using str.format.
Use Requests (http://docs.python-requests.org/en/latest/)

$ python2
Python 2.7.9 (default, Dec 11 2014, 04:42:00) 
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from hn import HN
>>> hn = HN()
>>> for story in hn.get_stories(story_type='newest', limit=10):
...     pass
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/hn/hn.py", line 145, in get_stories
    stories = self._build_story(all_rows) # get a list of stories on current page
  File "/usr/lib/python2.7/site-packages/hn/hn.py", line 71, in _build_story
    domain = info_cells[2].find('span').string[2:-2] # slice " (abc.com) "
TypeError: 'NoneType' object has no attribute '__getitem__'
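
The message shows that `.string` on the span returned None, so the slice `[2:-2]` was applied to None. (Had `find('span')` itself returned None, the error would mention the attribute `string` instead.) A minimal defensive sketch of the slicing step follows. `extract_domain` is a hypothetical helper, and falling back to '' is an assumption borrowed from the documented convention that `domain` is '' for self posts.

```python
def extract_domain(span_text):
    """Slice ' (abc.com) ' down to 'abc.com'; return '' when there is no text.

    span_text is expected to be the value of info_cells[2].find('span').string,
    which can be None.
    """
    if span_text is None:
        return ''
    return span_text[2:-2]
```

This only avoids the crash; if the page markup changed, the slice offsets may need revisiting as well.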

AttributeError: 'NoneType' object has no attribute 'groups'

Getting the following error:

AttributeError: 'NoneType' object has no attribute 'groups'
