The aparatdownloader from edrisranjbar

Add a requirements file

Hi, it would be great if you added a requirements.txt file.

write a good README file

We should update the README, taking ideas from other public open-source repos.

Some of the changes we want to make:

add a better description
add a license badge
add the contributors list as a table
update the contributing section
improve the installation guide
add the languages and technologies used in the project

Selenium4 compatibility problem

We get the warning below when trying to run the script:
DeprecationWarning: executable_path has been deprecated, please pass in a Service object

Problem

Note that the project uses a very old version of Selenium (I wrote it 3 years ago 😎). Selenium 4 deprecated the executable_path argument and expects the ChromeDriver path to be passed through a Service object, so we need to migrate our code to the new API.

A proper solution

We need to add the imports and change our code to use Selenium's Chrome Service object.
Here are the imports, including the new Service class:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

and then pass a Service when creating the browser:

browser = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

Use a better way of waiting for the page to load

Currently we call time.sleep() to make the script wait X seconds before running the next line of code, but we need a proper way to wait until the page has actually loaded.

Problem

We do not know exactly how long Aparat pages take to load, so using a large sleep value makes the script needlessly slow, while a small one risks running the next step before the page is ready.

Solution

There is a proper way to do this: WebDriverWait, which lives in selenium.webdriver.support.ui and takes two main arguments, the driver and a timeout in seconds. Its until() method then blocks until a given condition becomes true or the timeout expires.
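A minimal sketch of that approach, assuming the elements we need can be located with a CSS selector. The helper name wait_for_elements and the 10-second default are illustrative and not taken from the project; Selenium is imported inside the function only so that the module still loads where Selenium is not installed.

```python
def wait_for_elements(browser, css_selector, timeout=10):
    """Return all elements matching css_selector as soon as at least one exists.

    Raises selenium.common.exceptions.TimeoutException if nothing matches
    within `timeout` seconds.
    """
    # Imported here so this helper can live in a module that is importable
    # even on machines where Selenium is not installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # WebDriverWait polls the condition repeatedly and returns its result
    # the moment it is truthy, instead of sleeping for a fixed time.
    wait = WebDriverWait(browser, timeout)
    return wait.until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, css_selector))
    )
```

A call such as wait_for_elements(browser, "a.video-link", timeout=15) would then replace a time.sleep(15) followed by a lookup; the selector "a.video-link" is a made-up placeholder.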

Use PyQt5 for GUI

For several reasons I prefer PyQt5 over tkinter, so we need to port the code to PyQt5. PyQt also comes with a graphical designer (Qt Designer) where we can drag and drop the UI, then load the .ui file in the Python script and add functionality to it.

Selenium AttributeError while trying to download

When we click the download button in the GUI, we get an error in the terminal that says AttributeError: 'WebDriver' object has no attribute 'find_elements_by_css_selector'. The full error log is below:

DevTools listening on ws://127.0.0.1:2621/devtools/browser/4a3d03fb-d71e-4bf1-93e9-67f63797b891
[6084:5316:0803/110134.485:ERROR:device_event_log_impl.cc(214)] [11:01:34.485] Bluetooth: bluetooth_adapter_winrt.cc:1074 Getting Default Adapter failed.
Exception in Tkinter callback
Traceback (most recent call last):
  File "C:\Users\edris\AppData\Local\Programs\Python\Python310\lib\tkinter\__init__.py", line 1921, in __call__
    return self.func(*args)
  File "C:\Users\edris\Desktop\Projects\AparatDownloader\AparatDownloader.py", line 80, in download
    downloader.downloadFromPlayList(txtUrl.get())
  File "C:\Users\edris\Desktop\Projects\AparatDownloader\AparatDownloader.py", line 54, in downloadFromPlayList
    links = browser.find_elements_by_css_selector(
AttributeError: 'WebDriver' object has no attribute 'find_elements_by_css_selector'
[6084:11296:0803/110232.263:ERROR:util.cc(127)] Can't create base directory: C:\Program Files\Google\GoogleUpdater
Traceback (most recent call last):
  File "C:\Users\edris\Desktop\Projects\AparatDownloader\AparatDownloader.py", line 94, in <module>
    root.mainloop()
  File "C:\Users\edris\AppData\Local\Programs\Python\Python310\lib\tkinter\__init__.py", line 1458, in mainloop
    self.tk.mainloop(n)
KeyboardInterrupt

What the heck is wrong here? 🙄

Selenium 4 locates elements differently than it did 3 years ago: the find_element_by_* and find_elements_by_* helper methods were deprecated and later removed, so we need to port the code to the new API.

How to fix the issue?

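As the docs say, we should add from selenium.webdriver.common.by import By and use browser.find_elements(By.CSS_SELECTOR, selector) for the failing call, or browser.find_element(By.XPATH, selector) when a single element is located by XPath.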
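A short sketch of the migrated lookup, assuming the playlist entries are anchor elements matched by a CSS selector. The function name get_link_urls is hypothetical; the old call is shown in a comment for comparison.

```python
def get_link_urls(browser, css_selector):
    """Collect the href of every element matching css_selector (Selenium 4 style)."""
    # Imported lazily so the module loads even where Selenium is missing.
    from selenium.webdriver.common.by import By

    # Old (removed):  browser.find_elements_by_css_selector(css_selector)
    # New:            browser.find_elements(By.CSS_SELECTOR, css_selector)
    elements = browser.find_elements(By.CSS_SELECTOR, css_selector)
    return [element.get_attribute("href") for element in elements]
```

The same pattern applies to the other locator types: By.XPATH, By.ID, By.CLASS_NAME and so on are passed as the first argument instead of being part of the method name.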

write unit tests

We should write tests to make sure that the scraper still works fine.

TODO:

Here are some steps to take

test that all requirements are met
test that URL validation works
test that we can get the download link of a single video from its URL
test that the app returns a proper error when the URL does not exist
test that we can download and save the actual video file
test that we can get all video links from a playlist
test that the given URL is a playlist
test that we can get all resolutions available for a video
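Two of the items above (URL validation and playlist detection) can be sketched with the standard unittest module. The functions is_valid_aparat_url and is_playlist_url do not exist in the project; they are hypothetical stand-ins, and the URL shapes (/v/<id> for videos, /playlist/<id> for playlists) are assumptions that should be checked against real Aparat links.

```python
import unittest
from urllib.parse import urlparse

APARAT_HOSTS = {"aparat.com", "www.aparat.com"}  # assumed set of valid hosts


def is_valid_aparat_url(url):
    """Hypothetical validator: http(s) URL on an Aparat host with a non-empty path."""
    parts = urlparse(url)
    return (
        parts.scheme in ("http", "https")
        and parts.hostname in APARAT_HOSTS
        and parts.path.strip("/") != ""
    )


def is_playlist_url(url):
    """Hypothetical check, assuming playlist URLs contain a /playlist/ segment."""
    return is_valid_aparat_url(url) and "/playlist/" in urlparse(url).path


class UrlValidationTests(unittest.TestCase):
    def test_accepts_video_url(self):
        self.assertTrue(is_valid_aparat_url("https://www.aparat.com/v/abc123"))

    def test_rejects_other_sites_and_garbage(self):
        self.assertFalse(is_valid_aparat_url("https://example.com/v/abc123"))
        self.assertFalse(is_valid_aparat_url("not a url"))
        self.assertFalse(is_valid_aparat_url("https://www.aparat.com/"))

    def test_detects_playlist(self):
        self.assertTrue(is_playlist_url("https://www.aparat.com/playlist/42"))
        self.assertFalse(is_playlist_url("https://www.aparat.com/v/abc123"))
```

Saved in a test file, this can be run with python -m unittest. The tests that need the network (fetching download links, saving the video file) would follow the same structure but are left out here because they depend on the live site.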

Project is extremely focused on Chrome

The script does not work with Arch Linux and Firefox.
It's very difficult to make this script run.
I suggest basing the project on youtube-dl.

Migrate from selenium to BeautifulSoup

We should migrate from Selenium to the Beautiful Soup Python library to make the script faster and more efficient. The difference between the two is that Selenium uses a browser driver to open the page in a real browser on the machine and crawl it, whereas with requests and bs4 we can fetch the page over plain HTTP, parse the HTML directly, and return the data needed for the download.

See the Beautiful Soup (bs4) documentation for details.
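A rough sketch of what the requests and bs4 version could look like. The function names and the default selector are hypothetical, and it is an assumption that the links we need are present in the static HTML; if Aparat builds them with JavaScript, this approach would need a different data source.

```python
from urllib.parse import urljoin


def extract_links(html, base_url, css_selector="a[href]"):
    """Return absolute URLs of all elements in `html` that match css_selector."""
    # Lazy import: bs4 is only needed when parsing actually happens.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    return [
        urljoin(base_url, tag["href"])  # resolve relative links like /v/abc
        for tag in soup.select(css_selector)
        if tag.has_attr("href")  # skip matches that carry no link
    ]


def fetch_links(url, css_selector="a[href]", timeout=10):
    """Download `url` with requests (no browser involved) and extract its links."""
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # surface 404s and other HTTP errors early
    return extract_links(response.text, url, css_selector)
```

Keeping the parsing in extract_links separate from the download in fetch_links means the parsing can be tested against saved HTML without touching the network.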

INVALID: Open up browser immediately after running the script

Right now, running the script immediately opens a new Chrome window with an empty tab, which is not good at all because the user loses focus from the GUI app. We should open the browser only after the user clicks download.
This way the user stays focused on what they are doing.

Separate core scraping functionality from GUI

We need to refactor the code so that the GUI code is separated from the rest. In our case we can create a new class called Scrapper and put all of the code for fetching and scraping pages inside it. Then we can import that class wherever we need it.

TODO:

Here are some of the steps we should take to make the code cleaner:

have a separate core class for scraping called Scrapper
put Scrapper in another file and import it in AparatDownloader.py
rename methods
refactor the Scrapper class
main file should only contain GUI code
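The split described above might look roughly like this. Only the class name Scrapper and the file AparatDownloader.py come from the issue; the file name scrapper.py, the method names, the injected find_links callable and the /playlist/ check are all assumptions made for illustration.

```python
# --- scrapper.py (hypothetical file): core logic, no GUI imports ---
class Scrapper:
    def __init__(self, find_links):
        # find_links is any callable url -> list of video page URLs.
        # Injecting it keeps the class independent of Selenium or bs4
        # and easy to exercise in tests with a fake.
        self._find_links = find_links

    def is_playlist(self, url):
        # Assumption: playlist URLs contain a /playlist/ segment.
        return "/playlist/" in url

    def get_video_links(self, url):
        """Return every video URL behind `url` (one item for a single video)."""
        if self.is_playlist(url):
            return self._find_links(url)
        return [url]


# --- AparatDownloader.py (GUI side) ---
# from scrapper import Scrapper
def on_download_clicked(url, scrapper, show_message):
    """Button handler: delegates all scraping to the Scrapper instance."""
    links = scrapper.get_video_links(url.strip())
    show_message(f"Found {len(links)} video(s)")
    return links
```

With this shape the GUI file only wires widgets to on_download_clicked, and swapping tkinter for another toolkit would not touch the Scrapper class.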

[Docs]: Dependencies not complete in ReadMe.md and bloated in requirements.txt

The README.md doesn't include a full list of the required dependencies.
For example, webdriver_manager also has to be installed.

So I looked into requirements.txt, which lists a lot of dependencies that are not needed, not even for testing.

Suggestion:
Put every module that is required to run the script into requirements.txt and create a secondary file for the testing requirements.
