Comments (28)
I've tried with overwriting on and off. I opened the .json file and it just shows one big line of everything, so I searched it to count how many "https" matches there were in the whole file. It returned 2,598 matches, so I'm guessing it got the links to 2,598 videos, whereas the OF page shows 2,680. From this I guess some are private...
I ran it with the arguments B and C to try to stop videos and images from getting mixed.
For images, it managed to download 2,404, while the OF page shows 15,981. I used the same method on the .json file and it showed 15,981 "https" matches, so the script is scraping the links correctly, as shown in the JSON files, but not all of the links are getting downloaded.
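For anyone who wants to repeat the "https" count without a text editor, here is a minimal sketch. It assumes nothing about the scraper's JSON layout: it just walks whatever structure is in the file and collects every string that starts with http:// or https://. The function name `count_links` is made up for this example and is not part of the script.

```python
import json


def count_links(json_text):
    """Return (total, unique) counts of http(s) URL strings in a JSON document."""
    found = []

    def walk(node):
        # Recurse through dicts and lists; collect strings that look like URLs.
        if isinstance(node, dict):
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for value in node:
                walk(value)
        elif isinstance(node, str) and node.startswith(("http://", "https://")):
            found.append(node)

    walk(json.loads(json_text))
    return len(found), len(set(found))


# Usage (the path is whatever .json file the scraper wrote for you):
# from pathlib import Path
# total, unique = count_links(Path("your_file.json").read_text(encoding="utf-8"))
```

If the total and unique numbers differ, the same link is listed more than once, which is worth knowing before comparing against the number of files on disk.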
from ultimascraper.
Can you also do me a favour and sort the links by the Post ID and check the date column to see what is the earliest date for the links? I'm willing to bet it stops before early 2017.
Jesus, that's a lot. Try with overwriting on. Have you checked if the link JSON contains 2600 videos or just 350?
Also, some videos get downloaded into the image folder. OnlyFans changed how their API works, so I need to fix that. 😢
What you can do is use the link below to convert the JSON file to an Excel spreadsheet. You should then be able to find out exactly how many links are available for download.
You can also copy the links into a download manager to make the download faster, and it's resumable as well. I'm using Internet Download Manager and it works really well.
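If you'd rather not use an online converter, the same thing can be roughly sketched in Python. This assumes the JSON is a flat list of records and that each record has `post_id`, `date` and `link` keys. Those key names are guesses for illustration, so swap in whatever your file actually uses. Dates are assumed to be ISO-style strings so that plain string comparison sorts them correctly.

```python
import csv
import io
import json


def links_to_csv(json_text, fields=("post_id", "date", "link")):
    """Sort records by post ID and return them as CSV text."""
    rows = sorted(json.loads(json_text), key=lambda row: row["post_id"])
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def earliest_date(json_text):
    """Return the smallest date string among the records."""
    return min(row["date"] for row in json.loads(json_text))
```

Write the returned text to a .csv file and open it in Excel. The link column can then be pasted straight into a download manager.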
@kave333
I've fixed the image and video mixing issue but I'm not ready to push the commit yet.
I'm also overhauling the json exporting.
@dilemmax I found corrupted posts in 2017 that wouldn't download. It was due to them having the same filename. It WAS the same photo but OnlyFans decided to duplicate them...
Since the script is multithreaded, the script downloads both files with the same name at the same time which causes them to overwrite each other and become corrupted. That's why you'll see fewer files than you accounted for. To fix this I'll just tell the script to search for duplicates and rename them and it should solve the problem.
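To illustrate the rename idea, here is a rough sketch (not the actual code in the script). It assigns every download a unique target filename in a single pass before any threads start, so two workers never write to the same path. The helper name `assign_unique_names` and the " (1)" suffix style are assumptions made for this example.

```python
from pathlib import PurePosixPath


def assign_unique_names(filenames):
    """Return a list of filenames where repeats get a ' (n)' suffix."""
    used = set()
    result = []
    for name in filenames:
        path = PurePosixPath(name)
        candidate = name
        counter = 1
        # Keep bumping the counter until the candidate is not taken yet.
        while candidate in used:
            candidate = f"{path.stem} ({counter}){path.suffix}"
            counter += 1
        used.add(candidate)
        result.append(candidate)
    return result
```

Because the names are settled up front, the worker threads only need the (link, unique name) pairs and no locking is needed around the filename choice.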
I also found a problem with the math in my script that would get 1 page less when rounding to the floor.
Fixed this as well. Few more tests and I'll push a commit.
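The rounding problem is easy to show with a small sketch. This is only an illustration of floor versus ceiling when working out how many pages to request. The function name, the offset-based paging and the page size of 100 are assumptions, not the script's real values.

```python
import math


def page_offsets(total_posts, page_size):
    """Return the starting offset of every page needed to cover total_posts."""
    # Rounding up matters: flooring would drop the final partial page.
    pages = math.ceil(total_posts / page_size)
    return [index * page_size for index in range(pages)]
```

With 250 posts and a page size of 100, flooring gives 2 pages and misses the last 50 posts, while rounding up gives the offsets 0, 100 and 200.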
Pushed the commit. Open/reopen if you still got a problem.
@dilemmax any way I can contact you to discuss IDM further?
@kave333 sure can. my email address is [email protected]
I also found a problem with the math in my script that would get 1 page less when rounding to the floor.
Fixed this as well. Few more tests and I'll push a commit.
Thanks DigitalCriminal, I'll test it out now
@DIGITALCRIMINAL, I don't think I can re-open this issue since I'm not the OP. I'm still having the same problem. It's still scraping the same video links, but I think maybe the content creator posted the videos at some stage and then made them private? Just a theory.
Hmm, what's the total number of videos on the account @dilemmax because I've noticed this with another account
312 videos, but it only scraped 307 video links.
@dilemmax have you checked the archive.json to check if there are any invalid links?
Is archive.json in the same folder as links.json?
username/metadata/archive.json
okay thanks. Nope it didn't create one.
Are you using the latest release? The script doesn't create links.json anymore.
yeah I am. Sorry I meant videos.json
@dilemmax also, my bad. The archive.json is for a different module and images/videos/audio.json is correct. It seems like we have the same problem, but for me, it's for images. I'm going to look into it further now though to see what's happening.
Okay, no worries. I wish I had enough programming knowledge to help you troubleshoot this problem.
Do you think it has something to do with pinned posts? There are 4 pinned posts in mine but 307+4 doesn't equal 312
@dilemmax It's all good, you've helped a lot c:
And nah, it's not the pinned posts, because mine are in the .json.
I honestly just think that the posts are private. The missing content isn't even in the API request so there's nothing we can do...
@dilemmax
Added an option to export to csv instead of json in the latest release.
Thanks @DIGITALCRIMINAL! That would help a lot! While you're at it, is there any way to strip out text in angle brackets and unicode text from the "text" column when scraping to a json/csv file?
Currently, I open the json file in Word and use the wildcard search function to delete any text starting and ending with angle brackets, replace \n with a space etc. Then convert it to a spreadsheet and make the relevant changes before creating a batch file to rename the files with relevant titles.
Basically I have to do a lot before I end up with something like this
"ren "5d9ec0518452aaf05bd2d.mp4" "[2019-10-11] 194 - HOOK UP P2 JAYJAY & MM ENJOY!.mp4""
Hope I make sense!
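The manual Word steps described above could be sketched in Python roughly like this. It is only an illustration of the cleanup, not something the script does: the function names are made up, "unicode text" is interpreted here as "anything that isn't plain ASCII" (which also drops accented letters), and characters Windows doesn't allow in filenames are removed as an extra assumption.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]*>")            # anything between angle brackets
INVALID_RE = re.compile(r'[\\/:*?"<>|]')   # characters Windows rejects in filenames


def clean_title(text):
    """Strip tags, HTML entities, non-ASCII characters and extra whitespace."""
    text = TAG_RE.sub(" ", text)
    text = html.unescape(text)
    text = text.encode("ascii", "ignore").decode("ascii")
    text = INVALID_RE.sub("", text)
    # split() with no arguments also swallows newlines and tabs.
    return " ".join(text.split())


def rename_command(old_name, date, title):
    """Build one 'ren' line for a batch file, keeping the original extension."""
    extension = old_name.rsplit(".", 1)[-1]
    return f'ren "{old_name}" "[{date}] {clean_title(title)}.{extension}"'
```

Looping `rename_command` over the rows of the exported csv and writing each result to a .bat file would give the same kind of output as the `ren` line shown above.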
@dilemmax open new issue for this please
Will do :)