
python_aws_polly_hacker_news_article_reader's Introduction

Read Top 10 Article Titles and Top Comment from Hacker News out loud with Python, AWS Polly and Raspberry Pi Zero

Grabs the top 10 articles from Hacker News, passes them to Amazon Polly with basic formatting (limited to 1500 characters), and plays the audio through the Raspberry Pi PWM audio output.

Raspberry Pi Zero with PWM Audio Output through GPIO pin 13, mono parallel wired

video

https://www.youtube.com/watch?v=fWfatVYML9o?t=23s

requirements

  • AWS Polly Account (free 5M characters per month)
    • Needs the AWS CLI configured with an access key, secret key, and region
  • Raspberry Pi connected to the internet
    • Low-pass filter circuit if using a Raspberry Pi Zero (no headphone jack); parts needed: resistor, capacitor, film capacitor
  • Python Boto3 SDK

notes on AWS CLI

I should go over the AWS CLI part as this was confusing to me; I mostly referred to their docs.

Currently this runs on Python 2.7.9 on Raspbian, but I developed it on Python 3.7. In particular, the HTML parser used to unescape HTML may be problematic between the two versions.
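One way to sidestep the Python 2 vs. Python 3 difference is to pick the unescape function at import time. This is only a minimal sketch, not what the script currently does, and the name unescape_html is made up for illustration.

```python
# Pick an HTML-unescape function that works on both Python 2.7 and Python 3.
try:
    # Python 3.4+: html.unescape is a plain function.
    from html import unescape as unescape_html
except ImportError:
    # Python 2.7: unescape is a method on an HTMLParser instance.
    from HTMLParser import HTMLParser
    unescape_html = HTMLParser().unescape
```

After this, unescape_html("Tom &amp; Jerry") can be called the same way on either version.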

Files

  • hn_article_top_comment_reader.py (main file)
  • findurls.py (replaces full URLs, e.g. http://somewebsite.com, with the word "link")
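The URL replacement can be sketched roughly like this. It is not the actual contents of findurls.py; the regex and the names URL_PATTERN and replace_urls are assumptions for illustration.

```python
import re

# Matches http:// or https:// followed by everything up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def replace_urls(text, placeholder="link"):
    """Swap every full URL in text for a short word the synthesizer can say."""
    return URL_PATTERN.sub(placeholder, text)
```

Note that this simple pattern also swallows punctuation stuck to the end of a URL, such as a trailing period, which is probably fine for speech.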

Note:

This is intended to be run from your crontab, and it checks whether a certain device is currently connected to your network by pinging its local IP. Don't forget to change that IP: it's blank by default, so the script will not run until you set it. It's at the very top of the hn_article_top_comment_reader.py file.
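The "is my device on the network" check could look something like the sketch below. The names REFERENCE_IP and is_device_connected are hypothetical, and the real script may do this differently; the only behavior taken from the description above is that a blank IP means the script does nothing.

```python
import os
import platform
import subprocess

# Hypothetical setting: blank by default, so nothing runs until it is filled in.
REFERENCE_IP = ""


def is_device_connected(ip):
    """Return True if a single ping to ip gets a reply."""
    if not ip:
        # A blank IP means "not configured", so report "not connected".
        return False
    # Windows ping uses -n for the count; Linux and macOS use -c.
    count_flag = "-n" if platform.system().lower() == "windows" else "-c"
    with open(os.devnull, "w") as devnull:
        try:
            exit_code = subprocess.call(
                ["ping", count_flag, "1", ip], stdout=devnull, stderr=devnull
            )
        except OSError:
            # The ping command itself is missing or could not be started.
            return False
    # ping exits with 0 when it received a reply.
    return exit_code == 0
```

The main script would then only continue when is_device_connected(REFERENCE_IP) is True.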

Intro

After getting a taste of Amazon Polly's capability (text-to-speech synthesis) I wanted to make something that reads Hacker News articles out loud. In general I like to read the comments, not so much the linked article itself. The links also bring the problem of scraping pages with unknown structure, which is not my intent. So this script runs through crontab every hour and checks if my desktop is connected to the local network (meaning I'm home), as this desktop is usually off when I'm not home.

I have to be honest, I had no idea what I was doing. I initially wanted to develop this with JavaScript, but that didn't make sense as far as running it "back end" or "headless". I'm not speaking for Node, which I currently don't use; I mean regular client-side JavaScript, which would require a front-end page being open to run the script. So I switched to Python entirely. I had never really used Python before, so it was cool to learn some of it. I also apologize beforehand if something looks really stupid, like the URL replacement. Initially the synthesizer would read every character in a URL out loud, which quickly became annoying. There are also other random quirks to correct, like reading a -> out loud as "minus greater than" or something like that.
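Quirks like the -> one can be handled with a small cleanup pass before the text goes to Polly. This is a rough sketch and not the script's actual code: the replacement words and the names SPOKEN_REPLACEMENTS and prepare_for_speech are my own assumptions. The 1500-character cap is the limit mentioned at the top of this README.

```python
# Hypothetical table of symbols and what should be said instead.
SPOKEN_REPLACEMENTS = {
    "->": " to ",
    "=>": " to ",
}

# Character limit used before sending text to Polly.
MAX_CHARS = 1500


def prepare_for_speech(text, max_chars=MAX_CHARS):
    """Replace awkward symbols, tidy whitespace, and trim to max_chars."""
    for symbol, spoken in SPOKEN_REPLACEMENTS.items():
        text = text.replace(symbol, spoken)
    # Collapse runs of whitespace left behind by the replacements.
    text = " ".join(text.split())
    if len(text) <= max_chars:
        return text
    # Cut at the last space at or before the limit so no word is split.
    cut = text.rfind(" ", 0, max_chars + 1)
    return text[:cut] if cut > 0 else text[:max_chars]
```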

General steps

  1. Get the data from Hacker News' API
  2. Pass the data to Amazon Polly
    • Here is where I really did not know what I was doing. I went through a lot of their docs, but ultimately I stripped down part of their server.py script, had to figure out how to write the streaming data into a file, and got lucky.
    • This is where you need the AWS CLI to be set up correctly on your local system. Log in to Amazon and generate a (non-root) user for an access key/secret key; I just gave it full permissions to Amazon Polly only.
  3. Go through a directory's contents and play the sounds
    • I used pygame, as several Stack Overflow threads mentioned it. On Windows I used os.startfile and it would use Groove to play the sound, which is not ideal for a headless Raspberry Pi.
    • I just named the files 1.mp3, 2.mp3, etc. and limited it to 10 articles, as reading an article out loud takes time.
    • Every time it runs, it deletes the contents of the directory where the sound files are stored and recreates it (not ideal, I realize).
    • I have a Pi Zero, which doesn't have a headphone jack, so I used a tutorial by Adafruit to get sound output through GPIO pin 13 and had to build a little low-pass filter.
    • The audio is surprisingly good. I mean, it's not playing music, just speech, but it's good enough. I had to up the volume using the command amixer sset PCM,0 200%
    • Link: https://learn.adafruit.com/adding-basic-audio-ouput-to-raspberry-pi-zero/pi-zero-pwm-audio
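The three steps above can be sketched end to end as follows. This is a simplified illustration, not the code in hn_article_top_comment_reader.py. It assumes the public Hacker News Firebase API, a Boto3 Polly client created with boto3.client("polly"), and pygame for playback; every function name here is made up for the example.

```python
import json

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2

HN_API = "https://hacker-news.firebaseio.com/v0"


def fetch_json(url):
    """Download a URL and decode the JSON body."""
    response = urlopen(url)
    try:
        return json.loads(response.read().decode("utf-8"))
    finally:
        response.close()


def top_articles_with_comment(limit=10, fetch=fetch_json):
    """Return (title, top comment text) pairs for the first `limit` stories."""
    story_ids = fetch(HN_API + "/topstories.json")[:limit]
    results = []
    for story_id in story_ids:
        story = fetch("{}/item/{}.json".format(HN_API, story_id)) or {}
        kids = story.get("kids") or []  # comment ids; missing if no comments
        comment_text = ""
        if kids:
            comment = fetch("{}/item/{}.json".format(HN_API, kids[0])) or {}
            comment_text = comment.get("text", "")
        results.append((story.get("title", ""), comment_text))
    return results


def synthesize_to_file(polly_client, text, path, voice_id="Kendra"):
    """Ask Polly for MP3 audio and write the streamed bytes to path."""
    response = polly_client.synthesize_speech(
        Text=text, OutputFormat="mp3", VoiceId=voice_id
    )
    stream = response["AudioStream"]
    try:
        with open(path, "wb") as audio_file:
            audio_file.write(stream.read())
    finally:
        stream.close()
    return path


def play_file(path):
    """Play one sound file with pygame and block until it finishes."""
    import pygame  # imported here so the rest works without pygame installed

    pygame.mixer.init()
    pygame.mixer.music.load(path)
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():
        pygame.time.wait(100)


def main():
    import boto3  # needs credentials already configured through the AWS CLI

    polly = boto3.client("polly")
    for number, (title, comment) in enumerate(top_articles_with_comment(), 1):
        path = "{}.mp3".format(number)
        play_file(synthesize_to_file(polly, title + ". " + comment, path))
```

Comment text from the API comes back as HTML, so in practice it would need unescaping and cleanup before being sent to Polly. The fetch parameter is only there so the Hacker News part can be tried out without a network connection.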

Closing thoughts

Yeah, it's pretty cool. I got all of my "goals" accomplished. I had to build a lot of random small things and then join them together. I apologize if the code looks like trash (it most likely does); I am new to Python and need to improve/optimize the code in general.

It's kind of funny, now that I have it I don't know if I actually want to keep it. The voice kind of drones on. In my opinion the Kendra voice ID is good (I do have a preference for female voices), but the "Lexicons" could use work, and I have no idea about that at this time. In particular, questions get that general "raise voice at the end" pattern and it doesn't work. Usually I just leave out question marks in a question, but that's not implemented in this code.

Update - 09-22-2017

I modified the main script a lot, namely:

  • changed from the notion of "top 10 articles" to "some article comments"
  • save previously read article_ids and comment_ids to a file so they are not read again unless the comment changed
  • added predefined sounds, e.g. for text that shouldn't be synthesized over and over
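The "don't read it again" bookkeeping might look something like this. It is a guess at the approach rather than the script's real code: it assumes one article_id:comment_id pair per line in previous_articles.txt and treats "the comment changed" as "the top comment id is different". The function names are hypothetical.

```python
import os

PREVIOUS_FILE = "previous_articles.txt"


def load_previous(path=PREVIOUS_FILE):
    """Return the set of "article_id:comment_id" pairs already read."""
    if not os.path.exists(path):
        return set()
    with open(path) as handle:
        return set(line.strip() for line in handle if line.strip())


def should_read(article_id, comment_id, previous):
    """True if this article/top-comment pair has not been read before."""
    return "{}:{}".format(article_id, comment_id) not in previous


def remember(article_id, comment_id, path=PREVIOUS_FILE):
    """Append the pair to the file after it has been read out loud."""
    with open(path, "a") as handle:
        handle.write("{}:{}\n".format(article_id, comment_id))
```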

Overall though, after my free year trial of AWS runs out I'm not sure if I'll keep this running (and actually pay for it). I tried to reduce calls with the change check (most recent update 09-20-2017), but this time it goes beyond the first 10 articles.

Note

The numbering is misleading: I don't actually match the article number on Hacker News. Articles are read in the same start-to-finish order, but some may not have any comments yet, and if an article has already been read aloud before, it won't be read again (if there's no change), so that spot is skipped. The counter is just a general iterator.

Caution

Currently previous_articles.txt has no check to reset it. I have to think about what check to use, because the file will just keep growing if it's not reset every now and then. Possibly, if there is no new content, reset it to empty.

Actually I'll just add it as an else to the if connected section of the main script. If the script is not supposed to run, reset the file.

Another caution

It just occurred to me: if previous_articles.txt is not reset (the reference IP is still connected), it grows after each read run as new article_id:article_top_comment_id pairs are added. If two entries exist for the same article_id with different comment IDs, I think that would still run, as it only checks the first match... hmm

Not sure I can solve it right now; I'm working on something else at this time.
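One possible way around the first-match problem is to load the file into a dict keyed by article_id, so a later line for the same article overwrites the earlier one. This is only a sketch of the idea, not something the script does today, and latest_pairs and compact_file are hypothetical names.

```python
def latest_pairs(lines):
    """Map each article_id to the most recent comment_id seen for it."""
    latest = {}
    for line in lines:
        line = line.strip()
        if ":" not in line:
            continue  # skip blank or malformed lines
        article_id, comment_id = line.split(":", 1)
        latest[article_id] = comment_id  # later lines overwrite earlier ones
    return latest


def compact_file(path):
    """Rewrite the file with a single line per article_id."""
    with open(path) as handle:
        latest = latest_pairs(handle)
    with open(path, "w") as handle:
        for article_id, comment_id in sorted(latest.items()):
            handle.write("{}:{}\n".format(article_id, comment_id))
    return latest
```

With a lookup like latest.get(article_id) == comment_id there is only ever one entry per article to compare against, and compacting the file now and then also keeps it from growing with duplicates.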

Contributors

  • jdc-cunningham
