I believe an on_finish/0 optional callback would be very beneficial (I personally need


on_finish callback for spiders about crawly HOT 12 CLOSED

AoRakurai commented on May 14, 2024

on_finish callback for spiders

from crawly.

Comments (12)

Ziinc commented on May 14, 2024 1

I agree that having event callbacks is useful and that a concerted effort to cover most or all events would be good. However, I don't agree with having this as a specific callback on the spider.

If anything, this feels like something that should live in the settings, as an option. Then, with the custom_settings callback being implemented in #69, hooking into the events would be as simple as doing:

# Assuming the eventually implemented callback expects a map of the settings
# that already contains :on_spider_stop (%{map | key: value} raises KeyError otherwise).
def custom_settings(global) do
  %{global | on_spider_stop: &MyApp.my_callback/1}
end

Where the my_callback function receives the spider state.

This lets the developer define a default callback in the global settings, then override it per spider as necessary.
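
The default-then-override idea can be sketched roughly as below. This is only an illustration, not Crawly's API: the module name `MyApp.SettingsSketch`, the `:on_spider_stop` key, and every function in it are hypothetical. `Map.put/3` is used instead of the `%{map | key: value}` update syntax so the override also works when the key is not yet present in the global settings.

```elixir
defmodule MyApp.SettingsSketch do
  # Hypothetical default callback; it receives the spider state.
  def default_on_stop(_state), do: :default

  # Hypothetical per-spider callback that replaces the default.
  def custom_on_stop(_state), do: :custom

  # Assumed shape of the global settings: a plain map with a default callback.
  def global_settings do
    %{on_spider_stop: &__MODULE__.default_on_stop/1}
  end

  # Per-spider override. Map.put/3 adds or replaces the key,
  # so it does not raise if :on_spider_stop is missing.
  def custom_settings(global) do
    Map.put(global, :on_spider_stop, &__MODULE__.custom_on_stop/1)
  end
end

# Usage: the spider-specific settings replace the global default.
global = MyApp.SettingsSketch.global_settings()
settings = MyApp.SettingsSketch.custom_settings(global)
settings.on_spider_stop.(%{})
```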


oltarasenko commented on May 14, 2024 1

Ok, now the related PR is merged. I will run the release procedure soon.


Ziinc commented on May 14, 2024

Seems easy for the Engine to run code that is beyond Crawly's context.

What's wrong with polling the Engine for the Spider status? Seems like a good solution to me.

Another alternative is for the Engine to push messages on the spider status to a GenServer of choice.
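
The push-based alternative could look something like the sketch below. It assumes a hypothetical message shape, `{:spider_status, spider, status}`, that an engine would cast to a listener process; neither the message nor the `SpiderStatusListener` module is part of Crawly.

```elixir
defmodule SpiderStatusListener do
  use GenServer

  # Start the listener with an empty map of spider statuses.
  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, %{}, opts)
  end

  # Read the last status pushed for a spider (:unknown if none was pushed).
  def status(server, spider), do: GenServer.call(server, {:status, spider})

  @impl true
  def init(state), do: {:ok, state}

  # Hypothetical message an engine could push when a spider changes status.
  @impl true
  def handle_cast({:spider_status, spider, status}, state) do
    {:noreply, Map.put(state, spider, status)}
  end

  @impl true
  def handle_call({:status, spider}, _from, state) do
    {:reply, Map.get(state, spider, :unknown), state}
  end
end

# Usage: the cast below stands in for the engine pushing a status update.
{:ok, listener} = SpiderStatusListener.start_link()
GenServer.cast(listener, {:spider_status, MySpider, :stopped})
SpiderStatusListener.status(listener, MySpider)
```

With this shape the interested process reacts to a message instead of polling, and the engine does not need to know what the listener does with the status.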


oltarasenko commented on May 14, 2024

@Ziinc I agree with @AoRakurai that it's better not to poll the status if possible. As I understand it, the desired feature is to call a custom callback once the spider is exiting. It's similar to what we can find in Scrapy. I used it quite a lot when I was working with Python and Scrapy, and I agree it's useful. Maybe we should consider introducing all callbacks from the page above...


AoRakurai commented on May 14, 2024

That would make Crawly a good fit for a lot more use cases, I believe. The fact that there was no on_finish callback forced me to fork it temporarily so I could keep using it in my project.


AoRakurai commented on May 14, 2024

I agree that, assuming settings can be implemented per spider, being able to define an on_spider_stop in settings would be ideal.


Ziinc commented on May 14, 2024

Furthermore, I feel that these callback functions should be fired off using async Tasks, instead of synchronously.

https://hexdocs.pm/elixir/Task.html


oltarasenko commented on May 14, 2024

> Furthermore, I feel that these callback functions should be fired off using async Tasks, instead of synchronously.
>
> https://hexdocs.pm/elixir/Task.html

Could you explain it?


Ziinc commented on May 14, 2024

@oltarasenko since there isn't a need for the callback to interact with the main engine process, the spawned Task process can be executed concurrently as a side effect.

A few benefits:

- it is non-blocking
- thrown errors don't crash the main engine process
- the tasks can have their own supervisor/tree, if necessary

There are quite a few patterns for using Tasks in the docs, but I think just firing the callbacks as Task.async/1 will suffice.


Ziinc commented on May 14, 2024

Actually, the docs recommend using a supervisor to oversee Task execution. So each callback can be executed as a Task when the event occurs.

> For example, we recommend developers to always start tasks under a supervisor. This provides more visibility and allows you to control how those tasks are terminated when a node shuts down.

https://hexdocs.pm/elixir/Task.html#module-ancestor-and-caller-tracking
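
A minimal sketch of running a callback as a supervised task, under stated assumptions: `CallbackRunner` and its functions are hypothetical names, not part of Crawly. It uses `Task.Supervisor.start_child/2`, which starts the task under the supervisor without linking it to the caller, so the caller neither waits for the callback nor exits if the callback raises.

```elixir
defmodule CallbackRunner do
  # Hypothetical registered name for the supervisor that owns callback tasks.
  @supervisor CallbackRunner.TaskSupervisor

  # Start the Task.Supervisor under which callbacks will run.
  def start_link do
    Task.Supervisor.start_link(name: @supervisor)
  end

  # Run the callback in its own supervised process.
  # Returns {:ok, pid} as soon as the task is started, without waiting for it.
  def fire(callback, spider_state) when is_function(callback, 1) do
    Task.Supervisor.start_child(@supervisor, fn -> callback.(spider_state) end)
  end
end

# Usage: the callback reports back to the calling process by message.
{:ok, _sup} = CallbackRunner.start_link()
parent = self()
CallbackRunner.fire(fn state -> send(parent, {:spider_stopped, state}) end, %{name: :demo})
```

In a real application the supervisor would normally be placed in the application's supervision tree rather than started by hand as in the usage lines.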


oltarasenko commented on May 14, 2024

Ok, I think this will be the last issue to go into the Crawly 0.9.0 release. I will prepare a rollout as soon as we find a common vision for the solution to the problem.


oltarasenko commented on May 14, 2024

@AoRakurai This is released now!

