adashofdata / nlp-in-python-tutorial
Comparing stand-up comedians using natural language processing
data = {}
for i, c in enumerate(comedians):
    with open("transcripts/" + c + ".txt", "rb") as file:
        data[c] = pickle.load(file)
FileNotFoundError Traceback (most recent call last)
in
      2 data = {}
      3 for i, c in enumerate(comedians):
----> 4     with open("transcripts/" + c + ".txt", "rb") as file:
      5         data[c] = pickle.load(file)
FileNotFoundError: [Errno 2] No such file or directory: 'transcripts/louis.txt'
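A likely cause (an assumption, since the issue does not say) is that the pickled transcript files were never written, or that the notebook is running from a directory that has no transcripts/ folder. A minimal sketch that loads whatever is present and reports what is missing instead of stopping at the first file; the helper name load_transcripts and the sample data are hypothetical:

```python
import pickle
import tempfile
from pathlib import Path


def load_transcripts(comedians, directory="transcripts"):
    """Load one pickled transcript per comedian and collect missing paths."""
    data, missing = {}, []
    for name in comedians:
        path = Path(directory) / f"{name}.txt"
        if not path.is_file():
            # Record the gap rather than raising FileNotFoundError.
            missing.append(str(path))
            continue
        with path.open("rb") as file:
            data[name] = pickle.load(file)
    return data, missing


# Demonstration with a throwaway directory (sample data, not the real transcripts).
with tempfile.TemporaryDirectory() as tmp:
    with open(Path(tmp) / "louis.txt", "wb") as f:
        pickle.dump(["sample line"], f)
    data, missing = load_transcripts(["louis", "dave"], directory=tmp)

print("loaded:", sorted(data))
print("missing:", missing)
```

If missing is not empty, the step that scrapes and pickles the transcripts probably needs to be run first.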
I went through this tutorial on 2019-07-06 with vanilla installs of the recommended software and updates on a Windows 10 Home x64 machine and encountered a deprecation warning when running 2-Exploratory-Data-Analysis.ipynb in Jupyter Notebook:
Series.nonzero() is deprecated and will be removed in a future version.
Update code in 2-Exploratory-Data-Analysis.ipynb from original:
uniques = data[comedian].nonzero()[0].size
to
# uniques = data[comedian].nonzero()[0].size (deprecated)
uniques = data[comedian].to_numpy().nonzero()[0].size
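To check that the replacement counts the same thing, here is a small sketch on made-up word counts. The Series contents and the helper name count_unique_words are hypothetical; to_numpy() needs pandas 0.24 or later.

```python
import pandas as pd


def count_unique_words(word_counts: pd.Series) -> int:
    """Count the entries with a nonzero value, without Series.nonzero()."""
    return word_counts.to_numpy().nonzero()[0].size


# Made-up counts for one comedian: three words appear, two do not.
counts = pd.Series([3, 0, 1, 0, 2], index=["joke", "cat", "dog", "tree", "car"])

uniques = count_unique_words(counts)
# An equivalent spelling that stays inside pandas.
uniques_alt = int((counts != 0).sum())

print(uniques, uniques_alt)
```

Both expressions count the nonzero entries, so either should work as a drop-in for the deprecated call.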
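The code:
re.sub('\[.*?\]', '', text)
relies on the lazy quantifier to stop at the first closing ']'. A greedy version ('\[.*\]') would consume all content between the first '[' and the last ']'. This means something like this:
one [two] three [four] five
would become
one  five
Something like this RE would be more robust (IMHO), since it can never run past a ']' and also matches across newlines:
re.sub(r'\[[^]]*\]', '', text)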
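Assuming the backslashes were lost in formatting and the pattern in question is '\[.*?\]', a quick comparison of the lazy, greedy and negated-class variants on the sample string (a sketch, not the notebook's code):

```python
import re

text = "one [two] three [four] five"

# Lazy quantifier: each match stops at the first closing bracket.
lazy = re.sub(r"\[.*?\]", "", text)

# Greedy quantifier: one match runs from the first '[' to the last ']'.
greedy = re.sub(r"\[.*\]", "", text)

# Negated character class: cannot cross a ']' regardless of greediness.
negated = re.sub(r"\[[^]]*\]", "", text)

print(repr(lazy))     # 'one  three  five'
print(repr(greedy))   # 'one  five'
print(repr(negated))  # 'one  three  five'

# Unlike '.', the negated class also matches newlines inside the brackets.
multiline = "one [two\nlines] three"
print(repr(re.sub(r"\[.*?\]", "", multiline)))    # unchanged
print(repr(re.sub(r"\[[^]]*\]", "", multiline)))  # 'one  three'
```

Only the greedy form swallows the text between the two bracketed parts; the lazy and negated-class forms differ when a bracketed part spans more than one line.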
P.S. Great talk!
I went through this tutorial on 2019-07-07 with vanilla installs of the recommended software and updates on a Windows 10 Home x64 machine and encountered deprecation warnings when running 3-Sentiment-Analysis.ipynb in Jupyter Notebook:
The ymin argument was deprecated in Matplotlib 3.0 and will be removed in 3.2.
The ymax argument was deprecated in Matplotlib 3.0 and will be removed in 3.2.
Update code in 3-Sentiment-Analysis.ipynb from original:
plt.ylim(ymin=-.2, ymax=.3)
to
# plt.ylim(ymin=-.2, ymax=.3) (deprecated)
plt.ylim(bottom=-.2, top=.3)
How was dtm_stop.pkl created? I am getting an error with it.
Hi,
I would like to know what went wrong in the code below. I'm supposed to get a list of texts from the transcripts scraped from the URLs below, but all I get is an empty list. Any help would be much appreciated!
The code:
`def url_to_transcript(url):
    '''Returns transcript data specifically from scrapsfromtheloft.com.'''
    page = requests.get(url).text
    soup = BeautifulSoup(page, "lxml")
    text = [p.text for p in soup.find(class_="elementor-widget-container").find_all('p')]
    print(url)
    return text

urls = ['http://scrapsfromtheloft.com/2017/05/06/louis-ck-oh-my-god-full-transcript/',
'http://scrapsfromtheloft.com/2017/04/11/dave-chappelle-age-spin-2017-full-transcript/',
'http://scrapsfromtheloft.com/2018/03/15/ricky-gervais-humanity-transcript/',
'http://scrapsfromtheloft.com/2017/08/07/bo-burnham-2013-full-transcript/',
'http://scrapsfromtheloft.com/2017/05/24/bill-burr-im-sorry-feel-way-2014-full-transcript/',
'http://scrapsfromtheloft.com/2017/04/21/jim-jefferies-bare-2014-full-transcript/',
'http://scrapsfromtheloft.com/2017/08/02/john-mulaney-comeback-kid-2015-full-transcript/',
'http://scrapsfromtheloft.com/2017/10/21/hasan-minhaj-homecoming-king-2017-full-transcript/',
'http://scrapsfromtheloft.com/2017/09/19/ali-wong-baby-cobra-2016-full-transcript/',
'http://scrapsfromtheloft.com/2017/08/03/anthony-jeselnik-thoughts-prayers-2015-full-transcript/',
'http://scrapsfromtheloft.com/2018/03/03/mike-birbiglia-my-girlfriends-boyfriend-2013-full-transcript/',
'http://scrapsfromtheloft.com/2017/08/19/joe-rogan-triggered-2016-full-transcript/']
comedians = ['louis', 'dave', 'ricky', 'bo', 'bill', 'jim', 'john', 'hasan', 'ali', 'anthony', 'mike', 'joe']`
It should be 'elementor-widget-theme-post-content' instead of 'post-content'
I ran your code with Python 3 on Colab and got an error.
Error:
in transcriptMaker(url)
      8     page = requests.get(url).text
      9     soup = BeautifulSoup(page, "lxml")
---> 10     text = [p.text for p in soup.find(class_="post-content").find_all('p')]
     11     print(url)
     12     return text
AttributeError: 'NoneType' object has no attribute 'find_all'
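The AttributeError comes from soup.find() returning None when no element has the requested class, which can happen when the site changes its markup. A defensive sketch that tries several class names in order and fails with a clearer message; the helper name extract_paragraphs, the candidate list and the sample HTML are assumptions for illustration, and the built-in "html.parser" is used here only so the example does not depend on lxml:

```python
from bs4 import BeautifulSoup

# Class names mentioned in this thread, newest first (the list is an assumption).
CANDIDATE_CLASSES = ("elementor-widget-theme-post-content", "post-content")


def extract_paragraphs(html, candidate_classes=CANDIDATE_CLASSES):
    """Return the text of every <p> inside the first matching container."""
    soup = BeautifulSoup(html, "html.parser")
    for class_name in candidate_classes:
        container = soup.find(class_=class_name)
        if container is not None:
            return [p.get_text() for p in container.find_all("p")]
    # Nothing matched: say so instead of failing with a NoneType error.
    raise ValueError(f"no element with any of the classes {candidate_classes}")


# Made-up page fragment standing in for a downloaded transcript page.
sample_html = """
<div class="elementor-widget-theme-post-content">
  <p>First line.</p>
  <p>Second line.</p>
</div>
"""
paragraphs = extract_paragraphs(sample_html)
print(paragraphs)
```

With a real page, the html argument would be requests.get(url).text, as in the original function.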
Thanks for your great job