Are there any examples of saving multiple files? For example, saving multiple images f

Downloading Files about crawly HOT 11 CLOSED

elixir-crawly commented on May 14, 2024

Downloading Files

from crawly.

Comments (11)

Ziinc commented on May 14, 2024

https://stackoverflow.com/questions/30267943/elixir-download-a-file-image-from-a-url

Use a custom pipeline to manage the downloading. In your spider, scrape the media URLs and pass them along in the item under a nested map key. Then pattern match on that key in the pipeline.

https://hexdocs.pm/crawly/basic_concepts.html#custom-item-pipelines.
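
A minimal sketch of what such a pipeline could look like. The module name, the `:media`/`:urls` keys, and the `:dest`/`:fetch` options are assumptions made up for illustration, not part of Crawly. The `run/3` shape follows the custom pipeline docs linked above, and the default downloader uses Erlang's built-in `:httpc`.

```elixir
defmodule MyApp.Pipelines.DownloadMedia do
  # Hypothetical module. A real pipeline would also declare
  # `@behaviour Crawly.Pipeline`; omitted so this compiles without Crawly.

  def run(item, state, opts \\ [])

  # Match only items that carry a nested list of media URLs.
  def run(%{media: %{urls: urls}} = item, state, opts) when is_list(urls) do
    dest = Keyword.get(opts, :dest, "downloads")
    fetch = Keyword.get(opts, :fetch, &__MODULE__.fetch/1)
    File.mkdir_p!(dest)

    # URLs whose download fails are skipped by the `{:ok, body}` pattern.
    paths =
      for url <- urls, {:ok, body} <- [fetch.(url)] do
        path = Path.join(dest, file_name(url))
        File.write!(path, body)
        path
      end

    {Map.put(item, :files, paths), state}
  end

  # Items without media pass through untouched.
  def run(item, state, _opts), do: {item, state}

  # Default downloader. HTTPS URLs may need explicit `ssl:` options
  # (CA certificates), depending on the OTP version.
  def fetch(url) do
    {:ok, _} = Application.ensure_all_started(:inets)
    {:ok, _} = Application.ensure_all_started(:ssl)

    case :httpc.request(:get, {String.to_charlist(url), []}, [], body_format: :binary) do
      {:ok, {{_version, 200, _reason}, _headers, body}} -> {:ok, body}
      other -> {:error, other}
    end
  end

  # Use the last path segment as the file name, with a fallback.
  defp file_name(url) do
    case Path.basename(URI.parse(url).path || "") do
      "" -> "download"
      name -> name
    end
  end
end
```

It would then be listed in the `pipelines` setting, for example as `{MyApp.Pipelines.DownloadMedia, dest: "/tmp/images"}`.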

Crawly processes items sequentially, so for long downloads you might want to offload them to a queue or use an async Task.
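
A rough sketch of the async Task variant, assuming a `Task.Supervisor` is already running under the hypothetical name `MyApp.DownloadSupervisor` and that the item carries its URLs under `media.urls` (both are illustrative choices, not Crawly conventions). The pipeline returns immediately and the downloads run in supervised tasks.

```elixir
defmodule MyApp.Pipelines.AsyncDownload do
  # Hypothetical module; a real one would declare `@behaviour Crawly.Pipeline`.

  def run(item, state, opts \\ [])

  def run(%{media: %{urls: urls}} = item, state, opts) when is_list(urls) do
    sup = Keyword.get(opts, :supervisor, MyApp.DownloadSupervisor)
    # `:download` is a one-argument function that fetches and stores a URL.
    download = Keyword.fetch!(opts, :download)

    # Fire and forget: each download runs in its own supervised task,
    # so the pipeline does not wait for it to finish.
    Enum.each(urls, fn url ->
      {:ok, _pid} = Task.Supervisor.start_child(sup, fn -> download.(url) end)
    end)

    {item, state}
  end

  def run(item, state, _opts), do: {item, state}
end
```

The supervisor could be started in the application's supervision tree with `{Task.Supervisor, name: MyApp.DownloadSupervisor}`. Note that with this approach a failed download is no longer visible to the pipeline, so errors would need to be logged or retried inside the task.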

Ziinc commented on May 14, 2024

@oltarasenko Sounds like a good idea. I'll think a bit more about the API and update here. I should have time for it in the coming weeks.

@s0kil I think it would be more appropriate to have a how-to article in the docs. Having many example repos brings inherent issues, such as maintenance and keeping them in sync.

michaltrzcinka commented on May 14, 2024

Shall we create a pipeline capable of auto-downloading images? e.g. {Crawly.Pipelines.DownloadMedia, field: image, dest: /folder_name}?

Just a heads-up: I've started working on such a generic pipeline today.

s0kil commented on May 14, 2024

If you do not mind, could you also mention streaming large files to disk?
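
For reference, one possible way to stream a large response straight to disk without holding the body in memory is the `stream` option of Erlang's built-in `:httpc`. This is only a sketch under that assumption; the module and function names are made up, and other HTTP clients offer their own streaming APIs.

```elixir
defmodule MyApp.StreamDownload do
  # Hypothetical helper: writes the response body directly to `path`.
  def to_file(url, path) do
    {:ok, _} = Application.ensure_all_started(:inets)
    {:ok, _} = Application.ensure_all_started(:ssl)
    File.mkdir_p!(Path.dirname(path))

    request = {String.to_charlist(url), []}

    # With `stream: filename`, :httpc writes a successful body to the file
    # and replies with `{:ok, :saved_to_file}`.
    case :httpc.request(:get, request, [], stream: String.to_charlist(path)) do
      {:ok, :saved_to_file} -> {:ok, path}
      # Non-streamable responses (e.g. 404) come back as a normal result.
      {:ok, {{_version, status, _reason}, _headers, _body}} -> {:error, {:http_status, status}}
      {:error, reason} -> {:error, reason}
    end
  end
end
```

Called as `MyApp.StreamDownload.to_file(url, "/tmp/big.bin")`, it returns `{:ok, path}` on success. HTTPS URLs may need explicit `ssl:` options depending on the OTP version.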

oltarasenko commented on May 14, 2024

@s0kil I think @Ziinc gave a good answer; a pipeline is a good way to go! Otherwise, in my own projects, I download media directly from the parse_item callback. Crawly is itself a queue management system, so technically your worker will just spend a bit more time downloading the image, that's it.

@Ziinc Shall we create a pipeline capable of auto-downloading images? e.g. {Crawly.Pipelines.DownloadMedia, field: image, dest: /folder_name}?

s0kil commented on May 14, 2024

@oltarasenko Will downloading a larger file in parse_item block the spider from continuing to crawl and parse?

oltarasenko commented on May 14, 2024

No, it does not block Crawly itself. Only the one worker that is downloading is busy; all other workers remain operational. (Compare this with Scrapy, where downloads that bypass the reactor block everything; Crawly keeps running without problems.)
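
As an illustration, the number of workers that can run side by side is driven by the `concurrent_requests_per_domain` setting, so raising it leaves more workers free while one is busy with a download. The value below is an arbitrary example, not a recommendation.

```elixir
# config/config.exs
import Config

config :crawly,
  # Example value only; tune it to what the target site tolerates.
  concurrent_requests_per_domain: 8
```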

s0kil commented on May 14, 2024

Is it too much to ask for an example project such as https://github.com/oltarasenko/crawly-spider-example that saves each blog post into its own file?
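
A small sketch of how this could be done with a custom pipeline that writes every item to its own file. The `:title` and `:text` item keys, the `:dir` option, and the module name are assumptions for illustration; the actual fields depend on what the spider emits.

```elixir
defmodule MyApp.Pipelines.WriteToOwnFile do
  # Hypothetical module; a real one would declare `@behaviour Crawly.Pipeline`.

  def run(item, state, opts \\ [])

  # Write items that have a title and text to "<dir>/<slug>.txt".
  def run(%{title: title, text: text} = item, state, opts)
      when is_binary(title) and is_binary(text) do
    dir = Keyword.get(opts, :dir, "posts")
    File.mkdir_p!(dir)
    File.write!(Path.join(dir, slug(title) <> ".txt"), text)
    {item, state}
  end

  # Anything else passes through unchanged.
  def run(item, state, _opts), do: {item, state}

  # Turn a title into a file-system friendly name.
  defp slug(title) do
    title
    |> String.downcase()
    |> String.replace(~r/[^a-z0-9]+/, "-")
    |> String.trim("-")
  end
end
```

Two posts with the same title would overwrite each other here, so a real version might add the URL or a hash to the file name.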

Ziinc commented on May 14, 2024

@s0kil could you give some info on how you are working around the downloading of files now?

s0kil commented on May 14, 2024

@Ziinc I could not get it working yet.

Ziinc commented on May 14, 2024

@oltarasenko I will implement a generic supervised task execution process as mentioned here #88 (comment) for pipelines to hook into.
