Comments (12)

Ziinc commented on May 14, 2024

I agree that event callbacks are useful and that a concerted effort to cover most or all events would be good. However, I don't agree with making this a dedicated callback on the spider.

If anything, this feels like something that should live in the settings as an option. Then, once the custom_settings callback from #69 is implemented, hooking into the events would be as simple as:

```elixir
# assuming the eventually implemented callback expects a map of the settings...
def custom_settings(global) do
  %{global | on_spider_stop: &MyApp.my_callback/1}
end
```

Here, the my_callback function receives the spider state.

This lets developers set a default callback and then override it where necessary.
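A minimal sketch of the "default, then override" idea, assuming settings are plain maps and that the engine looks up an :on_spider_stop key when a spider stops. SettingsSketch, default_on_stop/1, my_callback/1 and spider_stopped/2 are hypothetical names for illustration, not Crawly's API. Note that the map-update syntax `%{map | key: value}` only works when the key already exists, which is exactly what a global default guarantees.

```elixir
defmodule SettingsSketch do
  # Hypothetical global settings: every spider inherits this default
  # on_spider_stop callback unless it overrides it.
  def global_settings do
    %{concurrent_requests: 4, on_spider_stop: &__MODULE__.default_on_stop/1}
  end

  def default_on_stop(state), do: {:default, state.name}

  # Hypothetical per-spider custom_settings/1: receives the global map and
  # returns an updated copy. The update syntax relies on the key existing.
  def custom_settings(global) do
    %{global | on_spider_stop: &__MODULE__.my_callback/1}
  end

  def my_callback(state), do: {:custom, state.name}

  # What an engine might do when a spider stops: fetch the callback from
  # the effective settings and call it with the spider state.
  def spider_stopped(settings, state) do
    case Map.get(settings, :on_spider_stop) do
      nil -> :no_callback
      callback when is_function(callback, 1) -> callback.(state)
    end
  end
end

state = %{name: :books_spider}
global = SettingsSketch.global_settings()

IO.inspect(SettingsSketch.spider_stopped(global, state))
# prints {:default, :books_spider}
IO.inspect(SettingsSketch.spider_stopped(SettingsSketch.custom_settings(global), state))
# prints {:custom, :books_spider}
```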

from crawly.

oltarasenko commented on May 14, 2024

OK, the related PR is now merged. I will run the release procedure soon.

Ziinc commented on May 14, 2024

This would make it easy for the Engine to end up running code that lies beyond Crawly's context.

What's wrong with polling the Engine for the spider status? That seems like a good solution to me.

Another alternative is for the Engine to push spider status messages to a GenServer of the developer's choice.
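A sketch of the push-based alternative, assuming the Engine would simply `send/2` a message to a process the developer configures. The SpiderStatusListener module and the `{:spider_status, spider, status}` message shape are hypothetical, not an existing Crawly protocol.

```elixir
defmodule SpiderStatusListener do
  # Hypothetical GenServer that receives pushed spider status messages,
  # so callers don't have to poll the Engine.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, %{}, opts)

  # Returns the last status recorded for each spider.
  def statuses(pid), do: GenServer.call(pid, :statuses)

  @impl true
  def init(state), do: {:ok, state}

  # Assumed message shape: {:spider_status, spider, status}.
  @impl true
  def handle_info({:spider_status, spider, status}, state) do
    if status == :stopped, do: IO.puts("#{inspect(spider)} stopped")
    {:noreply, Map.put(state, spider, status)}
  end

  @impl true
  def handle_call(:statuses, _from, state), do: {:reply, state, state}
end

{:ok, listener} = SpiderStatusListener.start_link()

# Stand-in for the Engine pushing status updates.
send(listener, {:spider_status, :books_spider, :started})
send(listener, {:spider_status, :books_spider, :stopped})

IO.inspect(SpiderStatusListener.statuses(listener))
# prints %{books_spider: :stopped}
```

Because messages from one process to another arrive in order, the `statuses/1` call is handled only after both status messages have been processed.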

oltarasenko commented on May 14, 2024

@Ziinc I agree with @AoRakurai that it's better not to poll the status if possible. As I understand it, the desired behavior is to call a custom callback once the spider exits. It's similar to what we can find in Scrapy. I used it quite a lot when I was working with Python and Scrapy, and I agree it's useful. Maybe we should consider introducing all the callbacks from the page above...

AoRakurai commented on May 14, 2024

I believe that would make Crawly a good fit for many more use cases. The lack of an on_finish callback forced me to fork it temporarily so I could keep using it in my project.

AoRakurai commented on May 14, 2024

I agree: assuming settings can be defined per spider, being able to define an on_spider_stop option in the settings would be ideal.

Ziinc commented on May 14, 2024

Furthermore, I feel that these callback functions should be fired off using async Tasks, instead of synchronously.

https://hexdocs.pm/elixir/Task.html

oltarasenko commented on May 14, 2024

> Furthermore, I feel that these callback functions should be fired off using async Tasks, instead of synchronously.
>
> https://hexdocs.pm/elixir/Task.html

Could you explain it?

Ziinc commented on May 14, 2024

@oltarasenko Since the callback doesn't need to interact with the main engine process, it can run concurrently as a side effect in a spawned Task process.

A few benefits are:

  • non-blocking
  • errors raised in the callback don't crash the main engine process
  • the Tasks can have their own supervisor/tree, if necessary.

There are quite a few patterns for using Tasks in the docs, but I think just firing the callbacks with Task.start/1 will suffice.
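A minimal sketch of fire-and-forget callbacks. It uses Task.start/1, which starts an unlinked task; Task.async/1, by contrast, links the task to the caller (so a raising callback would also take the calling process down) and is meant to be paired with Task.await/2. CallbackRunner and fire/2 are hypothetical names for illustration.

```elixir
defmodule CallbackRunner do
  # Hypothetical helper: run a spider-stop callback in a separate, unlinked
  # process so the caller (e.g. the engine) never blocks on it and is not
  # affected if the callback raises.
  def fire(callback, spider_state) when is_function(callback, 1) do
    {:ok, _pid} = Task.start(fn -> callback.(spider_state) end)
    :ok
  end
end

parent = self()

# One callback that reports back, and one that crashes (the crash is only
# logged; the calling process keeps running).
CallbackRunner.fire(fn state -> send(parent, {:done, state.name}) end, %{name: :books_spider})
CallbackRunner.fire(fn _state -> raise "boom" end, %{name: :books_spider})

receive do
  {:done, name} -> IO.puts("callback finished for #{inspect(name)}")
after
  1_000 -> IO.puts("no reply")
end
```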

Ziinc commented on May 14, 2024

Actually, the docs recommend using a supervisor to oversee Task execution, so each callback can be executed as a supervised Task when the event occurs.

> For example, we recommend developers to always start tasks under a supervisor. This provides more visibility and allows you to control how those tasks are terminated when a node shuts down.

https://hexdocs.pm/elixir/Task.html#module-ancestor-and-caller-tracking
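A sketch of the supervised variant, assuming a dedicated Task.Supervisor for event callbacks. In a real application the supervisor would be a child in the supervision tree (for example `{Task.Supervisor, name: ...}`); here it is started directly so the example is self-contained. SupervisedCallbacks and its functions are hypothetical names.

```elixir
defmodule SupervisedCallbacks do
  # Hypothetical name for the Task.Supervisor that owns callback tasks.
  @sup __MODULE__.TaskSup

  def start_supervisor, do: Task.Supervisor.start_link(name: @sup)

  # Start the callback as a supervised child that is not linked to the
  # caller. Children default to restart: :temporary, so a crashed callback
  # is not retried.
  def fire(callback, spider_state) when is_function(callback, 1) do
    Task.Supervisor.start_child(@sup, fn -> callback.(spider_state) end)
  end
end

{:ok, _sup} = SupervisedCallbacks.start_supervisor()
parent = self()

{:ok, _pid} =
  SupervisedCallbacks.fire(fn state -> send(parent, {:stopped, state.name}) end, %{name: :books_spider})

receive do
  {:stopped, name} -> IO.puts("#{inspect(name)} callback ran under the supervisor")
after
  1_000 -> IO.puts("no reply")
end
```

Compared with bare Task.start/1, the supervisor makes the callback processes visible in the supervision tree and lets shutdown be controlled when the node stops, which is the point the quoted docs make.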

oltarasenko commented on May 14, 2024

OK, I think this will be the last issue included in the Crawly 0.9.0 release. I will prepare a rollout as soon as we agree on a common approach to the problem.

oltarasenko commented on May 14, 2024

@AoRakurai This is released now!