Comments (8)
@aejuice-github
OK, we'll look into (1) and post back. Thanks for letting us know.
from aws-kendra-transcribe-media-search.
@rstrahan 1. Incorrect. Your playlist works because the results are cached; according to your documentation, videos are not parsed twice. It does not work with any other playlist, and I've tried plenty. In fact, we've debugged why it fails: your app uses the pytube package, which is outdated. YouTube has changed its link structure, so pytube cannot fetch a video. You can find more information and the error message at https://stackoverflow.com/questions/68945080/pytube-exceptions-regexmatcherror-get-throttling-function-name-could-not-find
Here is a link to the playlist https://www.youtube.com/playlist?list=PLr7J3R1sT1C5pcB_xuhH1cTV9W19z1iDk
I hope you'll be able to resolve it. We would be very interested in using the service.
- You're correct. The issue is resolved.
from aws-kendra-transcribe-media-search.
Confirming that I can easily repro the problem.
Referring this to the colleague who implemented this feature. Thanks.
from aws-kendra-transcribe-media-search.
Regarding issue (1), after investigation:
The solution "does not" cache a playlist. When the playlist is changed, the solution indexes the videos in the new playlist (if they have not been indexed previously).
The issue currently causing the stack to fail is in pytube version 15.0.0 (see pytube/pytube#1707).
We are working on an interim fix while a permanent fix is made to the pytube main branch.
Also, if you do not want to index the YouTube media, you can leave the playlist empty and specify only the S3 bucket where your media is stored. The stack then deploys successfully and indexes the media from S3.
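To illustrate the "index only what has not been indexed before" behavior described above, here is a minimal sketch. It is not the solution's actual code: the function name `select_new_videos` and the idea of keeping indexed video IDs in a set are assumptions made purely for illustration.

```python
from typing import Iterable, List


def select_new_videos(playlist_video_ids: Iterable[str],
                      indexed_video_ids: Iterable[str]) -> List[str]:
    """Return the playlist video IDs that have not been indexed yet.

    Hypothetical helper: keeps playlist order and skips IDs that were
    already indexed or that appear more than once in the playlist.
    """
    seen = set(indexed_video_ids)
    new_ids = []
    for video_id in playlist_video_ids:
        if video_id not in seen:
            seen.add(video_id)  # guard against duplicates within the playlist
            new_ids.append(video_id)
    return new_ids


# Example: "a" was indexed earlier, "b" appears twice in the new playlist.
print(select_new_videos(["a", "b", "c", "b"], {"a"}))  # ['b', 'c']
```

With a check like this, switching to a different playlist only triggers downloads for videos that were not seen before, which matches the behavior described in this comment.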
from aws-kendra-transcribe-media-search.
Tested v0.3.1, which contains the pytube 15.0.0 fix.
I am now able to index YouTube videos from the playlist provided as the default value of the CFN template parameter. I was also able to change the playlist to the one quoted in the issue above, https://www.youtube.com/playlist?list=PLr7J3R1sT1C5pcB_xuhH1cTV9W19z1iDk, and the indexer picks up the new videos as well.
This issue is fixed and can be closed.
from aws-kendra-transcribe-media-search.
Release v0.3.1, based on your PR @roshansthomas, addresses this issue. Updated artifacts have been published to the public S3 bucket.
(The fix is temporary and applies only to the current version of pytube, 15.0.0. We expect that the next release of pytube will address the issue officially.)
@aejuice-github please deploy again and report back if you encounter any remaining issues. Thanks again for letting us know about the problem.
from aws-kendra-transcribe-media-search.
Closing the issue, as the solution now uses the yt_dlp package.
from aws-kendra-transcribe-media-search.
Related Issues (5)
- Pytube package errors out in the latest build 15.0.0 while downloading audio from YouTube
- YouTube Indexer Lambda function downloads already indexed and downloaded media
- At times when a YouTube video within a playlist becomes unavailable, the playlist is not updated. On downloading this video, the ytindexer errors out
- MediaSearch Finder app deployment in Code Build fails