haxor's Introduction

haxor

Unofficial Python wrapper for the official Hacker News API.

Installation

pip install haxor

Usage

Import and initialization:

from hackernews import HackerNews
hn = HackerNews()

Items

Stories, comments, jobs, Ask HNs, and even polls are all items, each with a unique item id.

To query item information by id:

item = hn.get_item(8863)
# >>> item.title
# 'My YC app: Dropbox - Throw away your USB drive'
# >>> item.item_type
# 'story'
# >>> item.kids
# [ 8952, 9224, 8917, ...]

Since most results are returned as integer IDs (like item.kids above), these results require further iteration. Instead of doing this yourself, use the expand flag to get object-oriented, detailed item info by id:

item = hn.get_item(8863, expand=True)
# >>> item.kids
# [<hackernews.Item: 9224 - None>, <hackernews.Item: 8952 - None>, ...]
# >>> item.by
# <hackernews.User: dhouston>

To query a list of Item IDs:

items = hn.get_items_by_ids([8863, 37236, 2345])
# >>> items
# [<hackernews.Item: 8863 - My YC app: Dropbox - Throw away your USB drive>,
#  <hackernews.Item: 37236 - None>,
#  <hackernews.Item: 2345 - The Best Buy Scam.>]

Use the item_type filter to specifically select 'story', 'comment', 'job', or 'poll' items:

items = hn.get_items_by_ids([8863, 37236, 2345], item_type='story')
# >>> items
# [<hackernews.Item: 8863 - My YC app: Dropbox - Throw away your USB drive>,
#  <hackernews.Item: 2345 - The Best Buy Scam.>]

Stories

The HN API allows for real-time querying for New, Top, Best, Ask HN, Show HN, and Jobs stories.

As an example, to get Item objects of current top stories:

top_stories = hn.top_stories()
# >>> top_stories
# [<hackernews.Item: 16924667 - Ethereum Sharding FAQ>, ...]

Useful Item Queries

To get the current largest item id (the most recent story, comment, job, or poll):

max_item = hn.get_max_item()
# >>> max_item
# 16925673

Once again, use the expand flag to get an object-oriented, detailed Item representation:

max_item = hn.get_max_item(expand=True)
# >>> max_item
# <hackernews.Item: 16925673 - None>

To get the n most recent Items (here n = 10):

last_ten = hn.get_last(10)
# >>> last_ten
# [<hackernews.Item: 16925688 - Show HN: Eventbot – Group calendar for Slack teams>, ...]

Users

HN users are also queryable.

To query users by user_id (i.e. username on Hacker News):

user = hn.get_user('pg')
# >>> user.user_id
# 'pg'
# >>> user.karma
# 155040

Use the expand flag to turn a User's submitted item ids into detailed Item objects, grouped by type:

user = hn.get_user('dhouston', expand=True)
# >>> user.stories
# [<hackernews.Item: 1481914 - Dropbox is hiring a Web Engineer>, ...]
# >>> user.comments
# [<hackernews.Item: 16660140 - None>, <hackernews.Item: 15692914 - None>, ...]
# >>> user.jobs
# [<hackernews.Item: 3955262 - Dropbox seeking iOS and Android engineers>, ...]

To query a list of users:

users = hn.get_users_by_ids(['pg','dhouston'])
# >>> users
# [<hackernews.User: pg>, <hackernews.User: dhouston>]

Examples

Get top 10 stories:

hn.top_stories(limit=10)

# [<hackernews.Item: 16924667 - Ethereum Sharding FAQ>,
#  <hackernews.Item: 16925499 - PipelineDB v0.9.9 – One More Release Until PipelineDB Is a PostgreSQL Extension>, ...]

Find all the 'job' posts among the Top Stories:

stories = hn.top_stories()
for story in stories:
    if story.item_type == 'job':
        print(story)

# <hackernews.Item: 16925047 - Taplytics (YC W14) is solving hard engineering problems in
# Toronto and hiring>
# ...
# ...

Find Python jobs in the monthly "Who is hiring?" thread:

# Who is hiring - April 2018
# https://news.ycombinator.com/item?id=16735011

who_is_hiring = hn.get_item(16735011, expand=True)

for comment in who_is_hiring.kids:
    if comment.text and 'python' in comment.text.lower():  # deleted comments have no text
        print(comment)

# <hackernews.Item: 16735358 - None>
# <hackernews.Item: 16737152 - None>
# ...
# ...
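Because the text property is HTML, a plain substring search can also match tag or attribute names. A minimal sketch, using only the standard library, of stripping the markup before searching; the helper names `comment_to_text` and `mentions` are hypothetical and not part of haxor:

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and turns <p> tags into line breaks."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def comment_to_text(html_text):
    """Return plain text for an HTML comment body; None or '' becomes ''."""
    if not html_text:
        return ""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts).strip()


def mentions(html_text, keyword):
    """Case-insensitive keyword search on the visible text only."""
    return keyword.lower() in comment_to_text(html_text).lower()
```

In the loop above this would be used as `if mentions(comment.text, 'python'):`, which also covers comments whose text is None.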

API Reference

Class: HackerNews

Parameters:

| Name | Type | Required | Description | Default |
|---|---|---|---|---|
| version | string | No | specifies Hacker News API version | v0 |
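To illustrate what the version parameter selects, here is a minimal sketch of how request URLs for the official Hacker News API are laid out. The helper names `item_url` and `user_url` are hypothetical, and whether haxor builds its URLs exactly this way internally is an assumption:

```python
# Base URL of the official Hacker News API (hosted on Firebase).
BASE_URL = "https://hacker-news.firebaseio.com"


def item_url(item_id, version="v0"):
    """URL of a single item (story, comment, job, poll, or pollopt)."""
    return f"{BASE_URL}/{version}/item/{item_id}.json"


def user_url(user_id, version="v0"):
    """URL of a single user profile."""
    return f"{BASE_URL}/{version}/user/{user_id}.json"


print(item_url(8863))
print(user_url("pg"))
```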

get_item

Description: Returns Item object

Parameters:

| Name | Type | Required | Description | Default |
|---|---|---|---|---|
| item_id | string/int | Yes | unique item id of Hacker News story, comment etc | None |
| expand | bool | No | flag to indicate whether to transform all IDs into objects | False |

get_items_by_ids

Description: Returns list of Item objects

Parameters:

| Name | Type | Required | Description | Default |
|---|---|---|---|---|
| item_ids | list of string/int | Yes | unique item ids of Hacker News stories, comments etc | None |
| item_type | string | No | item type to filter results with | None |

get_user

Description: Returns User object

Parameters:

| Name | Type | Required | Description | Default |
|---|---|---|---|---|
| user_id | string | Yes | unique user id of a Hacker News user | None |
| expand | bool | No | flag to indicate whether to transform all IDs into objects | False |

get_users_by_ids

Description: Returns list of User objects

Parameters:

| Name | Type | Required | Description | Default |
|---|---|---|---|---|
| user_ids | list of string/int | Yes | unique user ids of Hacker News users | None |

top_stories

Description: Returns list of Item objects of current top stories

Parameters:

| Name | Type | Required | Description | Default |
|---|---|---|---|---|
| raw | bool | No | indicate whether to represent all objects in raw json | False |
| limit | int | No | specifies the number of stories to be returned | None |

new_stories

Description: Returns list of Item objects of current new stories

Parameters:

| Name | Type | Required | Description | Default |
|---|---|---|---|---|
| raw | bool | No | indicate whether to represent all objects in raw json | False |
| limit | int | No | specifies the number of stories to be returned | None |

ask_stories

Description: Returns list of Item objects of latest Ask HN stories

Parameters:

| Name | Type | Required | Description | Default |
|---|---|---|---|---|
| raw | bool | No | indicate whether to represent all objects in raw json | False |
| limit | int | No | specifies the number of stories to be returned | None |

show_stories

Description: Returns list of Item objects of latest Show HN stories

Parameters:

| Name | Type | Required | Description | Default |
|---|---|---|---|---|
| raw | bool | No | indicate whether to represent all objects in raw json | False |
| limit | int | No | specifies the number of stories to be returned | None |

job_stories

Description: Returns list of Item objects of latest Job stories

Parameters:

| Name | Type | Required | Description | Default |
|---|---|---|---|---|
| raw | bool | No | indicate whether to represent all objects in raw json | False |
| limit | int | No | specifies the number of stories to be returned | None |

updates

Description: Returns list of Item and User objects that have been changed/updated recently.

Parameters: N/A

get_max_item

Description: Returns current largest item id or current largest Item object

Parameters:

| Name | Type | Required | Description | Default |
|---|---|---|---|---|
| expand | bool | No | flag to indicate whether to transform ID into object | False |

get_all

Description: Returns all Item objects from HN

Parameters: N/A

get_last

Description: Returns list of num most recent Item objects

Parameters:

| Name | Type | Required | Description | Default |
|---|---|---|---|---|
| num | int | No | number of most recent records to pull from HN | 10 |

Class: Item

From Official HackerNews Item:

| Property | Description |
|---|---|
| item_id | The item’s unique id. |
| deleted | true if the item is deleted. |
| item_type | The type of item. One of “job”, “story”, “comment”, “poll”, or “pollopt”. |
| by | The username of the item’s author. |
| submission_time | Creation date of the item, in Python datetime. |
| text | The comment, Ask HN, or poll text. HTML. |
| dead | true if the item is dead. |
| parent | The item’s parent. For comments, either another comment or the relevant story. For pollopts, the relevant poll. |
| poll | The pollopt’s associated poll. |
| kids | The ids of the item’s comments, in ranked display order. |
| url | The URL of the story. |
| score | The story’s score, or the votes for a pollopt. |
| title | The title of the story or poll. |
| parts | A list of related pollopts, in display order. |
| descendants | In the case of stories or polls, the total comment count. |
| raw | The original JSON response. |
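The official API reports an item's creation date as Unix time in seconds, while submission_time is a Python datetime. A minimal sketch of that conversion follows; the helper name `to_submission_time` is hypothetical, and whether haxor produces a UTC-aware or a naive local datetime is an assumption not confirmed here:

```python
from datetime import datetime, timezone


def to_submission_time(unix_seconds):
    """Convert Unix seconds into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


# An example Unix timestamp, converted to an ISO 8601 string.
print(to_submission_time(1175714200).isoformat())
# → 2007-04-04T19:16:40+00:00
```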

Class: User

From Official HackerNews User:

| Property | Description |
|---|---|
| user_id | The user’s unique username. Case-sensitive. |
| delay | Delay in minutes between a comment’s creation and its visibility to other users. |
| created | Creation date of the user, in Python datetime. |
| karma | The user’s karma. |
| about | The user’s optional self-description. HTML. |
| submitted | List of the user’s stories, polls and comments. |
| raw | The original JSON response. |

Additional properties when expand is used

| Property | Description |
|---|---|
| stories | The user’s submitted stories. |
| comments | The user's submitted comments. |
| jobs | The user's submitted jobs. |
| polls | The user's submitted polls. |
| pollopts | The user's submitted poll options. |

Development

For local development, install the dependencies listed in requirements-dev.txt:

pip install -r requirements-dev.txt

Testing

Run the test suite by running:

echo "0.0.0-dev" > version.txt
python setup.py develop
pytest tests

LICENSE

The mighty MIT license. Please check LICENSE for more details.

haxor's People

Contributors

avinassh, bryant1410, daturkel, ddevault, dependabot[bot], fayazkhan, formerlychucks, hadalin, markus00000, robertjkeck2, sriram-mv, tony

haxor's Issues

Update to latest standards of Python 2024

  • I am pretty sure I am not using the asyncio as intended
  • Possibly explore the right ways to do async requests and alternatives
  • types / mypy support
  • Move to Poetry for packaging and publishing (#36)
  • Optimisations where async can be used. e.g.
    if expand:
        item.by = self.get_user(item.by)
        item.kids = self.get_items_by_ids(item.kids) if item.kids else None
        item.parent = self.get_item(item.parent) if item.parent else None
        item.poll = self.get_item(item.poll) if item.poll else None
        item.parts = (
            self.get_items_by_ids(item.parts) if item.parts else None
        )
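One possible direction for the expand optimisation is to issue the independent lookups concurrently with `asyncio.gather` instead of one after another. This is only a sketch: `fetch_user`, `fetch_item`, and `maybe_fetch_item` are hypothetical stand-ins that simulate network calls, and items are plain dicts rather than haxor's Item objects:

```python
import asyncio


async def fetch_user(user_id):
    # Hypothetical stand-in for an async user request.
    await asyncio.sleep(0.01)
    return {"user": user_id}


async def fetch_item(item_id):
    # Hypothetical stand-in for an async item request.
    await asyncio.sleep(0.01)
    return {"item": item_id}


async def maybe_fetch_item(item_id):
    """Fetch an item only when an id is present."""
    return await fetch_item(item_id) if item_id else None


async def expand_item(item):
    """Resolve by, parent, and kids concurrently instead of sequentially."""
    kids = item.get("kids") or []
    by, parent, *kid_items = await asyncio.gather(
        fetch_user(item["by"]),
        maybe_fetch_item(item.get("parent")),
        *(fetch_item(kid) for kid in kids),
    )
    expanded = dict(item)
    expanded.update(by=by, parent=parent, kids=kid_items or None)
    return expanded


expanded = asyncio.run(
    expand_item({"by": "dhouston", "kids": [8952, 9224], "parent": None})
)
print(expanded)
```

With real requests, the total wait would be roughly that of the slowest lookup rather than the sum of all of them.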

`asyncio.get_event_loop()`: DeprecationWarning: There is no current event loop

While running tests, the following is printed:

=============================== warnings summary ===============================
tests/test_ask_stories.py::TestAskStories::test_ask_stories
  /home/runner/work/haxor/haxor/hackernews/__init__.py:139: DeprecationWarning: There is no current event loop
    loop = asyncio.get_event_loop()

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

https://github.com/avinassh/haxor/actions/runs/7166888744/job/19511836841

on stackoverflow: https://stackoverflow.com/questions/73361664/asyncio-get-event-loop-deprecationwarning-there-is-no-current-event-loop
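A common way to avoid this warning is to let `asyncio.run()` create and close the event loop rather than calling `asyncio.get_event_loop()` from synchronous code. A minimal sketch, where `fetch_all` and `run_sync` are hypothetical names and the coroutine only simulates work; whether this fits haxor's internals unchanged is an assumption:

```python
import asyncio


async def fetch_all(ids):
    # Placeholder coroutine standing in for the real async requests.
    await asyncio.sleep(0)
    return [i * 2 for i in ids]


def run_sync(coro):
    """Run a coroutine from synchronous code without get_event_loop().

    asyncio.run() creates a fresh event loop, runs the coroutine to
    completion, and closes the loop afterwards.
    """
    return asyncio.run(coro)


result = run_sync(fetch_all([1, 2, 3]))
print(result)
```

Note that `asyncio.run()` cannot be called while another event loop is already running in the same thread, so this pattern suits a purely synchronous public API.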

Speeding up performance?

Hi avinassh,

I'm running into performance issues running the examples. It seems the HN API route can be considerably slower than just screen scraping.

Is there a way for me to speed this up? I wonder if the connection to Firebase is being reused--seems like with requests this should be done automatically?

Using timeit here are my results:

for story_id in hn.top_stories(limit=10):
    print hn.get_item(story_id)

timeit: 4.0882999897 sec

who_is_hiring = hn.get_item(8394339)

for comment_id in who_is_hiring.kids:
    comment = hn.get_item(comment_id)
    if 'python' in comment.text.lower():
        print comment.item_id

timeit: 125.318946123 sec

Thanks!
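One generic way to cut the wall-clock time of many independent lookups is to run them in a thread pool. The sketch below uses a fake fetch function so that it runs without network access; `fetch_many` and `fake_fetch` are hypothetical names, and whether a HackerNews instance is safe to share across threads is not confirmed here:

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_many(fetch, ids, max_workers=8):
    """Call fetch(id) for every id in parallel threads, preserving order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, ids))


def fake_fetch(item_id):
    # Hypothetical stand-in for a per-item call such as hn.get_item.
    return {"id": item_id}


items = fetch_many(fake_fetch, [1, 2, 3])
print(items)
```

The README's get_items_by_ids, which accepts a whole list of ids in one call, may be the simpler option when it is available in the installed version.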

ImportError: No module named 'pypandoc'

When installing haxor I get this error:

Complete output from command python setup.py egg_info:
Traceback (most recent call last):
  File "/tmp/pip-install-zl_kpo7s/haxor/setup.py", line 6, in <module>
    import pypandoc
ImportError: No module named 'pypandoc'

The error won’t occur if I install pypandoc before haxor.

The project’s requirements-dev.txt contains pypandoc but requirements.txt does not. Is this intended?
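A typical way to keep pypandoc optional in a setup.py is to guard the import and fall back to the raw README text. This is a sketch of that pattern, not the project's actual setup.py; the function name `read_long_description` and the default README path are assumptions:

```python
from pathlib import Path


def read_long_description(readme_path="README.md"):
    """Return the README as reStructuredText if pypandoc works, else raw text."""
    try:
        import pypandoc  # optional: only needed when building a release

        return pypandoc.convert_file(readme_path, "rst")
    except (ImportError, OSError, RuntimeError):
        # pypandoc or the pandoc binary is missing, or conversion failed:
        # fall back to the unconverted file so installation still succeeds.
        return Path(readme_path).read_text(encoding="utf-8")
```

With this guard, end users installing from source would not need pypandoc at all, while maintainers with pypandoc installed still get a converted description.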

Issue while fetching comments

from bs4 import BeautifulSoup
from hackernews import HackerNews

hn = HackerNews()
comments = []  # collected plain-text comment bodies
who_is_hiring = hn.get_item(12202865)
for comment_id in who_is_hiring.kids:
    print('processing:- ' + str(comment_id))
    comment = hn.get_item(comment_id)  # this line caused the error
    if comment.text is not None:
        cleantext = BeautifulSoup(comment.text.lower(), 'lxml').text.strip()
        comments.append(cleantext)

Error is:

Traceback (most recent call last):
  File "/anaconda3/anaconda/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 331, in _make_request
    httplib_response = conn.getresponse(buffering=True)
TypeError: getresponse() got an unexpected keyword argument 'buffering'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/anaconda3/anaconda/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 516, in urlopen
    body=body, headers=headers)
  File "/anaconda3/anaconda/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 333, in _make_request
    httplib_response = conn.getresponse()
  File "/anaconda3/anaconda/lib/python3.5/http/client.py", line 1197, in getresponse
    response.begin()
  File "/anaconda3/anaconda/lib/python3.5/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/anaconda3/anaconda/lib/python3.5/http/client.py", line 266, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/anaconda3/anaconda/lib/python3.5/site-packages/requests/adapters.py", line 362, in send
    timeout=timeout
  File "/anaconda3/anaconda/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 559, in urlopen
    _pool=self, _stacktrace=stacktrace)
  File "/anaconda3/anaconda/lib/python3.5/site-packages/requests/packages/urllib3/util/retry.py", line 245, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/anaconda3/anaconda/lib/python3.5/site-packages/requests/packages/urllib3/packages/six.py", line 309, in reraise
    raise value.with_traceback(tb)
  File "/anaconda3/anaconda/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 516, in urlopen
    body=body, headers=headers)
  File "/anaconda3/anaconda/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 333, in _make_request
    httplib_response = conn.getresponse()
  File "/anaconda3/anaconda/lib/python3.5/http/client.py", line 1197, in getresponse
    response.begin()
  File "/anaconda3/anaconda/lib/python3.5/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/anaconda3/anaconda/lib/python3.5/http/client.py", line 266, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
requests.packages.urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

During handling of the above exception, another exception occurred:
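Dropped connections like the one in this traceback are often transient, so one workaround is to retry the call with exponential backoff. A minimal sketch follows; `with_retries` is a hypothetical helper, not part of haxor. With requests, the exception to catch would be `requests.exceptions.ConnectionError`, passed via the `exceptions` argument, since it does not derive from the built-in `ConnectionError` used as the default here:

```python
import time


def with_retries(func, *args, retries=3, base_delay=0.5,
                 exceptions=(ConnectionError,), **kwargs):
    """Call func, retrying on the given exceptions with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return func(*args, **kwargs)
        except exceptions:
            if attempt == retries:
                raise  # out of retries: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            time.sleep(base_delay * 2 ** attempt)
```

In the loop above, the failing line could then be written as `comment = with_retries(hn.get_item, comment_id)`.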

Not compatible with python 3.11 due to possible bug in aiohttp

my code:

from hackernews import HackerNews
hn = HackerNews()

output:

:!python /Users/A78751003/projects/py/hn/main.py
Traceback (most recent call last):
  File "/Users/A78751003/projects/py/hn/main.py", line 1, in <module>
    from hackernews import HackerNews
  File "/Users/A78751003/projects/py/hn/.venv/lib/python3.11/site-packages/hackernews/__init__.py", line 19, in <module>
    import aiohttp
  File "/Users/A78751003/projects/py/hn/.venv/lib/python3.11/site-packages/aiohttp/__init__.py", line 6, in <module>
    from .client import *  # noqa
    ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/A78751003/projects/py/hn/.venv/lib/python3.11/site-packages/aiohttp/client.py", line 16, in <module>
    from . import client_exceptions, client_reqrep
  File "/Users/A78751003/projects/py/hn/.venv/lib/python3.11/site-packages/aiohttp/client_reqrep.py", line 18, in <module>
    from . import hdrs, helpers, http, multipart, payload
  File "/Users/A78751003/projects/py/hn/.venv/lib/python3.11/site-packages/aiohttp/helpers.py", line 23, in <module>
    import async_timeout
  File "/Users/A78751003/projects/py/hn/.venv/lib/python3.11/site-packages/async_timeout/__init__.py", line 10, in <module>
    class timeout:
  File "/Users/A78751003/projects/py/hn/.venv/lib/python3.11/site-packages/async_timeout/__init__.py", line 40, in timeout
    @asyncio.coroutine
     ^^^^^^^^^^^^^^^^^
AttributeError: module 'asyncio' has no attribute 'coroutine'. Did you mean: 'coroutines'?

According to Stack Overflow, this is because the asyncio.coroutine decorator was removed in Python 3.11.

Outdated version on pip

Hi, I noticed that the version on pip is outdated as it does not support the descendants field added recently.

You might want to update it. Temporarily I can solve it by using the github url as the dependency.

Update requests dependency

Is it possible to make haxor support requests>=2.4.0 to solve this dependency conflict I have?

haxor 1.1 has requirement requests==2.0.0, but you'll have requests 2.18.4 which is incompatible.

Missing parentheses in call to 'print' on Python 3.4.3 pip install

Hi, I'm getting the following error trying to install on Python 3.4.3, but it works fine on 2.7.10:

$ pip install haxor
Collecting haxor
  Using cached haxor-0.3.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 20, in <module>
      File "/private/var/folders/rp/8nf53wk57bgd12w6fr31rrwh0000gn/T/pip-build-yuswawpl/haxor/setup.py", line 12
        print long_description
                             ^
    SyntaxError: Missing parentheses in call to 'print'

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/rp/8nf53wk57bgd12w6fr31rrwh0000gn/T/pip-build-yuswawpl/haxor

RecursionError when running tests

Running tests:

pip install -r requirements-dev.txt
python setup.py develop
pytest tests

produces the following error

RecursionError: maximum recursion depth exceeded while calling a Python object

Python version: 3.6.5

TimeoutError Top Stories

Calling https://hacker-news.firebaseio.com/v0/topstories.json?print=pretty from within the browser works fine, however if I try to leverage your wrapper

hn = HackerNews()
top_stories = hn.top_stories()
top_stories

I run into ProtocolError: ('Connection aborted.', TimeoutError(10060,...) most of the time.

Any hints?
Thanks.
