hackernewsapi's Introduction

Hacker News API

Unofficial Python API for Hacker News.

Features

  • Compatible with Python 2 (2.7+).
  • Supports 'top', 'news2', 'newest' and 'best' posts
  • Retrieve comments from posts (flat list for now) (story.get_comments())
  • Pagination support for comments
  • Handles external posts, self posts and job posts
  • Get post details for any post (Story.fromid(7024626))

Installation

$ pip install HackerNews

Usage

NOTE: Do not make a lot of requests in a short period of time. HN has its own throttling system.

from hn import HN

hn = HN()

# print the first 2 pages of newest stories
for story in hn.get_stories(story_type='newest', limit=60):
    print(story.rank, story.title)
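
The note above about throttling can be handled on the client side by spacing requests out. The following is a minimal sketch, not part of this library: the Throttle class and the 2-second interval are assumptions chosen for illustration, since HN's actual limits are not documented here.

```python
import time


class Throttle(object):
    """Hypothetical helper: enforce a minimum gap between calls to wait()."""

    def __init__(self, min_interval, clock=time.time, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests (assumed value)
        self.clock = clock                # injectable for testing
        self.sleep = sleep                # injectable for testing
        self.last = None                  # time of the previous call, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        if self.last is not None:
            remaining = self.min_interval - (self.clock() - self.last)
            if remaining > 0:
                self.sleep(remaining)
        self.last = self.clock()


# Intended use (comments only, so the sketch runs without the library):
#   throttle = Throttle(2.0)
#   throttle.wait(); stories = hn.get_stories(story_type='newest')
#   throttle.wait(); story = Story.fromid(6374031)
```

Calling wait() before each library call that hits the network keeps consecutive requests at least min_interval seconds apart.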

API Reference

Class: HN

Get stories from Hacker News

get_stories

Parameters:

Name       | Type    | Required | Description | Default
story_type | string  | No       | Returns the stories from this page. One of '' (empty string), 'news2', 'newest', 'best'. | '' (empty string, top)
limit      | integer | No       | Number of stories required from the given page. Cannot be more than 30. | 30

Example:

from hn import HN
hn = HN()
hn.get_stories(story_type='newest', limit=10)

get_leaders

Parameters:

Name  | Type    | Required | Description | Default
limit | integer | No       | Number of top leaders to return | 10

Example:

from hn import HN
hn = HN()

# get top 20 users of HN
hn.get_leaders(limit=20)

Class: Story

Each Story has the following properties

  • rank - the rank of story on the page (keep pagination in mind)
  • story_id - the story's id
  • title - the title of the story
  • is_self - true for self/job stories
  • link - the URL it points to ('' for self posts)
  • domain - the domain of the link ('' for self posts)
  • points - the points/karma on the story
  • submitter - the user who submitted the story ('' for job posts)
  • submitter_profile - the above user's profile link (can be '')
  • published_time - the published time
  • num_comments - the number of comments a story has
  • comments_link - the link to the comments page

Make an object from the ID of a story

fromid

Parameters:

Name    | Type    | Required | Description | Default
item_id | integer | Yes      | Initializes an instance of Story for the given item_id. Must be a valid story id. | (none)

Example:

from hn import Story
story = Story.fromid(6374031)
print(story.title)

Get a list of Comments for this story

get_comments

Parameters:

None.

Example:

from hn import Story
story = Story.fromid(6374031)
comments = story.get_comments()

Class: Comment

Each Comment has the following properties

  • comment_id - the comment's item id
  • level - comment's nesting level
  • user - name of the user who submitted the comment
  • time_ago - time when it was submitted
  • body - text representation of comment (unformatted)
  • body_html - html of comment, may not be valid
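
Since get_comments() returns a flat list, the level property is what lets a caller rebuild the thread structure. The sketch below assumes that comments arrive in page order, that top-level comments have level 0, and that a reply's level is exactly one more than its parent's. FlatComment is a stand-in for the library's Comment, and nest_comments is a hypothetical helper, not part of the API.

```python
from collections import namedtuple

# Stand-in for the library's Comment, with only the fields used here.
FlatComment = namedtuple('FlatComment', ['comment_id', 'level', 'user', 'body'])


def nest_comments(flat_comments):
    """Turn a flat, page-ordered list of comments into a nested list of dicts.

    Each node is {'comment': <comment>, 'replies': [<node>, ...]}.
    """
    roots = []
    stack = []  # stack[i] holds the most recent node seen at level i
    for comment in flat_comments:
        node = {'comment': comment, 'replies': []}
        # Forget nodes at this level or deeper; they cannot be the parent.
        del stack[comment.level:]
        if stack:
            stack[-1]['replies'].append(node)
        else:
            roots.append(node)
        stack.append(node)
    return roots
```

With the real library, the input would be the list returned by story.get_comments().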

Class: User

Each User has the following properties

  • username - user's profile name
  • date_created - when the profile was created
  • karma - user's e-points
  • avg - user's average karma per day

Examples

See my_test_bot.py

Tests

To run the tests locally just do:

$ chmod +x runtests.sh
$ ./runtests.sh

To run individual tests:

$ python -m unittest tests.<module name>

The tests are run on a local test server with predownloaded original responses.

Donations

If HackerNewsAPI has helped you in any way, and you'd like to help the developer, please consider donating.

- BTC: 19dLDL4ax7xRmMiGDAbkizh6WA6Yei2zP5

- Flattr: https://flattr.com/profile/thekarangoel

Contribute

If you want to add any new features, or improve existing ones, feel free to send a pull request!

hackernewsapi's People

Contributors

arch119, bitdeli-chef, danclaudiupop, digital-shokunin, gauravaror, habi, nprescott, nucleartide, taylan, ueg1990

hackernewsapi's Issues

newest and best options do not work in get_stories

For the function get_stories, the parameter options 'newest' and 'best' do not work. The reason is that, by default, the function only fetches top stories from the default page, https://news.ycombinator.com/news

I can fix this either by updating the get_stories function or by creating separate functions for default, newest and best. The advantage of the latter is that people using the API do not have to pass parameters to get_stories every time and can just call the relevant function; the disadvantage is that a lot of code will have to be rewritten. The advantage of the former is that less code needs to be written.

Let me know what you think :)

User data

Update the API to fetch user data: submissions, comments, karma, etc.

Is there any way to find stories by title?

A question, not a bug. Is there any way to find stories by title? Say,

hn.get_stories(title='some title%')

# or

reg_exp = re.compile('something')
hn.get_stories(title=reg_exp)
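
The API reference lists only story_type and limit as parameters of get_stories, so a title search would have to happen on the client after fetching. A minimal sketch of that idea follows; filter_by_title is a hypothetical helper, not part of the library, and it relies only on each story having a title attribute.

```python
import re


def filter_by_title(stories, pattern):
    """Yield stories whose title matches pattern.

    pattern may be a regular-expression string or a compiled regex.
    """
    regex = pattern if hasattr(pattern, 'search') else re.compile(pattern)
    for story in stories:
        if regex.search(story.title):
            yield story


# Intended use (comments only, so the sketch runs without the library):
#   for story in filter_by_title(hn.get_stories(), re.compile('something')):
#       print(story.title)
```

This only searches the stories that were fetched; it cannot find stories that are not on the requested page.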


Comment.body_html is incomplete

from hn import Story
story = Story.fromid(7324236)
comments = story.get_comments()
print comments[0].body
print comments[0].body_html

Seems like .body_html only returns a portion of the comment body's HTML.

Polls support

At first glance, the API doesn't seem to parse votes for polls.

Pagination

Pagination on HN is a little tricky.

Pages are in the format: https://news.ycombinator.com/x?fnid=6ZNSo6ZztcTOk5nXmNIDa6

I think 6ZNSo6ZztcTOk5nXmNIDa6 refers to a session ID or something along those lines, and it has a timeout.

https://news.ycombinator.com/item?id=3623268
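
As a small illustration of the URL format described above, the fnid token can be read from a page URL with the standard library. This is only a sketch; extract_fnid is a hypothetical helper, and it does nothing about the token expiring.

```python
try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs      # Python 2


def extract_fnid(url):
    """Return the value of the fnid query parameter, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get('fnid')
    return values[0] if values else None
```

Because the token appears to time out, any code that stores it would need to fetch a fresh page once the old link stops working.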

Pip install fails on latest version 1.1.0

See the output below:

~ » pip install HackerNews
Downloading/unpacking HackerNews
  Downloading HackerNews-1.1.0.zip
  Running setup.py egg_info for package HackerNews

Installing collected packages: HackerNews
  Running setup.py install for HackerNews
      File "/home/danclaudiupop/projects/.virtualenvs/caca/lib/python2.7/site-packages/hn/hn.py", line 79
        <<<<<<< HEAD
         ^
    SyntaxError: invalid syntax


Successfully installed HackerNews
Cleaning up...

Can you also make the code PEP 8 compliant? Remove extra whitespace, limit all lines to a maximum of 79 characters, etc.

Thanks,

collections.namedtuple instead of Story

The Story class seems somewhat redundant. You could use collections.namedtuple, or simply a dictionary, as a container for the properties. The print_story method could just be the __str__ special method.
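
To make the suggestion concrete, here is a minimal sketch of a namedtuple-backed Story with __str__ in place of print_story. Only a few of the documented properties are included, and the output format is made up for illustration; it is not the library's actual print_story output.

```python
from collections import namedtuple

# A subset of the documented Story properties, for brevity.
_StoryFields = namedtuple('_StoryFields', ['rank', 'story_id', 'title', 'points'])


class Story(_StoryFields):
    """Immutable container for story properties with a readable str()."""

    __slots__ = ()  # keep instances as light as a plain namedtuple

    def __str__(self):
        # Illustrative format only.
        return '{0}. {1} ({2} points)'.format(self.rank, self.title, self.points)
```

Subclassing the namedtuple keeps attribute access (story.title) while still allowing methods such as a fromid constructor to be added later.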

Get any story's comments

This would mean the user should be able to import Story and do something like com = Story(item_id=xxxxxxx).get_comments()

Why limit = 30?

Hi again.

def get_stories(self, story_type='', limit=30):
    """
    Yields a list of stories from the passed page
    of HN.
    'story_type' can be:
    \t'' = top stories (homepage) (default)
    \t'newest' = most recent stories
    \t'best' = best stories

    'limit' is the number of stories required. Defaults to 30
    """
    if limit == None or limit < 30:
        limit = 30 # we need at least 30 items

Why do you set limit back to 30?
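
If get_stories does yield a full page when a smaller limit is passed, a caller can still trim the result on the client side. A minimal sketch using the standard library follows; take is a hypothetical helper name.

```python
from itertools import islice


def take(iterable, n):
    """Return at most the first n items of an iterable as a list."""
    return list(islice(iterable, n))


# Intended use (comments only, so the sketch runs without the library):
#   first_ten = take(hn.get_stories(story_type='newest'), 10)
```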

Weird encoding error in __str__ of Story

For some reason, the API is throwing a UnicodeEncodeError.

Modify __str__ of Story to return self.title only.

Then, try this code:

from hn import HN

hn = HN()
for story in hn.get_stories():
    print story.title
    print story

The output is this:

Turn O(n^2) reverse into O(n)
Turn O(n^2) reverse into O(n)
My run-in with unauthorised Litecoin mining on AWS
My run-in with unauthorised Litecoin mining on AWS
Amazon takes away access to purchased Christmas movie during Christmas
Traceback (most recent call last):
  File "my_test_bot.py", line 11, in <module>
    print story
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 60: ordinal not in range(128)

I have no idea why this is happening, and I created a Stack Overflow question for this too.

I have downloaded the page that is causing this error right now for offline debugging.
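
A likely explanation, assuming Python 2 semantics: when __str__ returns a unicode object, Python 2 implicitly encodes it with the ASCII codec, which fails on the no-break space (u'\xa0') named in the traceback. Printing story.title directly avoids that implicit step. The sketch below reproduces the encoding failure in isolation; the title string is a made-up stand-in, not the actual page content.

```python
# Hypothetical title containing a no-break space (U+00A0).
title = u'Amazon takes away access\xa0to purchased movie'

# Encoding to ASCII fails, which is what the implicit conversion attempts.
try:
    title.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding to UTF-8 succeeds and round-trips without loss.
utf8_bytes = title.encode('utf-8')
```

One common Python 2 workaround is to have __str__ return the title encoded as UTF-8 and to expose the unicode value through __unicode__.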

Enhancement request: HTTPS

Let's use HTTPS/SSL whenever we can. Please set BASE_URL = 'https://news.ycombinator.com'. This, in turn, will set the comments link and the link to the submitter's profile to HTTPS as well.
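
The effect of the request can be sketched as follows: if every link is derived from BASE_URL, changing that one constant switches them all to HTTPS. The helper names comments_link and profile_link are hypothetical and may not match how the library builds its links.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2

BASE_URL = 'https://news.ycombinator.com'


def comments_link(story_id):
    """Build the comments page URL for a story id."""
    return urljoin(BASE_URL + '/', 'item?id={0}'.format(story_id))


def profile_link(username):
    """Build the profile page URL for a username."""
    return urljoin(BASE_URL + '/', 'user?id={0}'.format(username))
```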

get_comments : IndexError: list index out of range

Hi, I'm having this issue:

On the story "Snapchat Phone Number Database Leaked" (id 6993968):

Traceback (most recent call last):
  File "hello.py", line 9, in <module>
    for comment in story.get_comments():
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/hn/hn.py", line 252, in get_comments
    return self._build_comments(soup)
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/hn/hn.py", line 218, in _build_comments
    level = int(row.findChildren('td')[1].find('img').get('width')) // 40
IndexError: list index out of range

Any idea?

Fix simple typo: explititly -> explicitly

Issue Type

[x] Bug (Typo)

Steps to Replicate

  1. Examine hn/hn.py.
  2. Search for explititly.

Expected Behaviour

  1. Should read explicitly.

Semi-automated issue generated by
https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md

To avoid wasting CI processing resources a branch with the fix has been
prepared but a pull request has not yet been created. A pull request fixing
the issue can be prepared from the link below, feel free to create it or
request @timgates42 create the PR. Alternatively if the fix is undesired please
close the issue with a small comment about the reasoning.

https://github.com/timgates42/HackerNewsAPI/pull/new/bugfix_typo_explicitly

Thanks.

Cannot parse result

I wrote a few lines to get the top or newest stories. When I print them, I get an error like this:

./hackernews_api.py
/usr/lib64/python2.7/site-packages/bs4/__init__.py:181: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 12 of the file ./hackernews_api.py. To get rid of this warning, change code that looks like this:

BeautifulSoup(YOUR_MARKUP})

to this:

BeautifulSoup(YOUR_MARKUP, "html.parser")

markup_type=markup_type))
Traceback (most recent call last):
  File "./hackernews_api.py", line 12, in <module>
    for story in top_iter:
  File "/usr/lib/python2.7/site-packages/hn/hn.py", line 145, in get_stories
    stories = self._build_story(all_rows) # get a list of stories on current page
  File "/usr/lib/python2.7/site-packages/hn/hn.py", line 71, in _build_story
    domain = info_cells[2].find('span').string[2:-2] # slice " (abc.com) "
TypeError: 'NoneType' object has no attribute '__getitem__'

Something went wrong; please fix.
Thanks

Make tests work

So I added some code to mock the tests. The idea is to serve pre-downloaded HTML files when testing, instead of querying HN again and again. This is not working, though. Does anyone want to have a look at it?

HackerNews HTML has changed, API has stopped working

$ python2
Python 2.7.9 (default, Dec 11 2014, 04:42:00) 
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from hn import HN
>>> hn = HN()
>>> for story in hn.get_stories(story_type='newest', limit=10):
...     pass
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/hn/hn.py", line 145, in get_stories
    stories = self._build_story(all_rows) # get a list of stories on current page
  File "/usr/lib/python2.7/site-packages/hn/hn.py", line 71, in _build_story
    domain = info_cells[2].find('span').string[2:-2] # slice " (abc.com) "
TypeError: 'NoneType' object has no attribute '__getitem__'
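
The traceback shows the failing line slicing the text of a span that turned out to be None. Whatever the new HTML looks like, a guard around that step would avoid the crash. A minimal sketch follows, assuming the span text still has the " (abc.com) " shape when present; domain_from_span is a hypothetical helper, not the library's code.

```python
def domain_from_span(span_text):
    """Extract 'abc.com' from ' (abc.com) ', or return '' when there is no text."""
    if span_text is None:
        # No domain span was found, for example on a self post or after a layout change.
        return ''
    return span_text.strip().strip('()')
```

Returning '' matches the documented behaviour of the domain property for self posts.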
