chromiumoxide's Introduction

chromiumoxide

chromiumoxide provides a high-level and async API to control Chrome or Chromium over the DevTools Protocol. It comes with support for all types of the Chrome DevTools Protocol and can launch a headless or full (non-headless) Chrome or Chromium instance or connect to an already running instance.

Usage

use futures::StreamExt;

use chromiumoxide::browser::{Browser, BrowserConfig};

#[async_std::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    
    // create a `Browser` that spawns a `chromium` process running with UI (`with_head()`, headless is default)
    // and the handler that drives the websocket etc.
    let (mut browser, mut handler) =
        Browser::launch(BrowserConfig::builder().with_head().build()?).await?;

    // spawn a new task that continuously polls the handler
    let handle = async_std::task::spawn(async move {
        while let Some(h) = handler.next().await {
            if h.is_err() {
                break;
            }
        }
    });

    // create a new browser page and navigate to the url
    let page = browser.new_page("https://en.wikipedia.org").await?;

    // find the search bar, type into the search field and hit `Enter`;
    // this triggers a new navigation to the search result page
    page.find_element("input#searchInput")
        .await?
        .click()
        .await?
        .type_str("Rust programming language")
        .await?
        .press_key("Enter")
        .await?;

    let html = page.wait_for_navigation().await?.content().await?;
   
    browser.close().await?;
    handle.await;
    Ok(())
}

The current API still lacks some functionality, but the Page::execute function allows sending all chromiumoxide_types::Command types (see Generated Code). Most Element and Page functions are basically just simplified command constructions and combinations, like Page::pdf:

pub async fn pdf(&self, params: PrintToPdfParams) -> Result<Vec<u8>> {
    let res = self.execute(params).await?;
    Ok(base64::decode(&res.data)?)
}

If you need something else, the Page::execute function allows for writing your own command wrappers. PRs are very welcome if you think a meaningful command is missing a designated function.

Add chromiumoxide to your project

chromiumoxide comes with support for the async-std and tokio runtime.

By default chromiumoxide is configured with async-std.

Use chromiumoxide with the async-std runtime:

chromiumoxide = { git = "https://github.com/mattsse/chromiumoxide", branch = "main"}

To use the tokio runtime instead add features = ["tokio-runtime"] and set default-features = false to disable the default runtime (async-std):

chromiumoxide = { git = "https://github.com/mattsse/chromiumoxide", features = ["tokio-runtime"], default-features = false, branch = "main"}

This configuration is made possible primarily by the websocket crate of choice: async-tungstenite.

Generated Code

The chromiumoxide_pdl crate contains a PDL parser, which is a Rust rewrite of a Python script in the chromium source tree, and a Generator that turns the parsed PDL files into Rust code. The chromiumoxide_cdp crate's only purpose is to invoke the generator during its build process and include the generated output before compiling the crate itself. This separation exists merely because the generated output is ~60K lines of Rust code (not including all the proc macro expansions), so expect compilation to take some time. The generator can be configured and used independently, see chromiumoxide_cdp/build.rs.

Every Chrome PDL domain is put in its own Rust module: the types for the page domain of the browser_protocol are in chromiumoxide_cdp::cdp::browser_protocol::page, the types for the runtime domain of the js_protocol are in chromiumoxide_cdp::cdp::js_protocol::runtime, and so on.

vanilla.aslushnikov.com is a great resource to browse all the types defined in the pdl files. This site displays Command types as defined in the pdl files as Method. chromiumoxide sticks to the Command nomenclature. So for everything that is defined as a command type in the pdl (=marked as Method on vanilla.aslushnikov.com) chromiumoxide contains a type for the command and a designated type for the return type. For every command there is a <name of command>Params type with builder support (<name of command>Params::builder()) and its corresponding return type: <name of command>Returns. All commands share an implementation of the chromiumoxide_types::Command trait. All Events are bundled in a single enum (CdpEvent).
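The naming scheme described above can be sketched with a simplified stand-in. The trait below is a hypothetical, reduced version and not the actual chromiumoxide_types::Command definition; NavigateParams, NavigateReturns and their fields are illustrative names only, chosen to show how a Params type with a builder pairs with its Returns type.

```rust
/// Simplified stand-in for a command trait (an assumption for illustration,
/// not the real `chromiumoxide_types::Command`).
pub trait Command {
    /// The designated return type of the command.
    type Response;
    /// The protocol method name of the command.
    fn identifier(&self) -> &'static str;
}

/// Hypothetical `<name of command>Params` type.
#[derive(Debug, Clone, PartialEq)]
pub struct NavigateParams {
    pub url: String,
    pub referrer: Option<String>,
}

/// Hypothetical `<name of command>Returns` type.
#[derive(Debug, Clone, PartialEq)]
pub struct NavigateReturns {
    pub frame_id: String,
}

/// Builder that collects optional input and validates it in `build`.
#[derive(Debug, Default)]
pub struct NavigateParamsBuilder {
    url: Option<String>,
    referrer: Option<String>,
}

impl NavigateParams {
    pub fn builder() -> NavigateParamsBuilder {
        NavigateParamsBuilder::default()
    }
}

impl NavigateParamsBuilder {
    pub fn url(mut self, url: impl Into<String>) -> Self {
        self.url = Some(url.into());
        self
    }

    pub fn referrer(mut self, referrer: impl Into<String>) -> Self {
        self.referrer = Some(referrer.into());
        self
    }

    /// Fails when a required field was never set.
    pub fn build(self) -> Result<NavigateParams, String> {
        let url = self.url.ok_or_else(|| "url is required".to_string())?;
        Ok(NavigateParams { url, referrer: self.referrer })
    }
}

impl Command for NavigateParams {
    type Response = NavigateReturns;

    fn identifier(&self) -> &'static str {
        "Page.navigate"
    }
}
```

Tying the response type to the params type through an associated type is what lets a single generic execute function return the matching Returns type for whatever command is passed in.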

Fetcher

By default chromiumoxide will try to find an installed version of chromium on the computer it runs on. It is possible to download and install one automatically for some platforms using the fetcher.

The features are currently a bit messy due to a Cargo bug and will be changed once it is resolved. Based on your runtime and TLS configuration you should enable one of the following:

  • _fetcher-rustls-async-std
  • _fetcher-rusttls-tokio
  • _fetcher-native-async-std
  • _fetcher-native-tokio

use std::path::Path;

use futures::StreamExt;

use chromiumoxide::browser::{BrowserConfig};
use chromiumoxide::fetcher::{BrowserFetcher, BrowserFetcherOptions};

#[async_std::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let download_path = Path::new("./download");
    async_std::fs::create_dir_all(&download_path).await?;
    let fetcher = BrowserFetcher::new(
        BrowserFetcherOptions::builder()
            .with_path(&download_path)
            .build()?,
    );
    let info = fetcher.fetch().await?;

    // point the browser config at the downloaded executable
    let config = BrowserConfig::builder()
        .chrome_executable(info.executable_path)
        .build()?;

    Ok(())
}

Known Issues

  • The Rust files generated for the PDL files in chromiumoxide_cdp don't compile when support for experimental types is manually turned off (export CDP_NO_EXPERIMENTAL=true). This is because some uses of experimental pdl types in the *.pdl files themselves are not marked as experimental.

Troubleshooting

Q: A new chromium instance is being launched but then times out.

A: Check that your chromium language settings are set to English. chromiumoxide tries to parse the debugging port from the chromium process output, and that parsing only works for English output.
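To see why the language matters, here is a minimal sketch of this kind of parsing. It assumes the browser announces its endpoint with a line starting with the English text "DevTools listening on "; the function names are hypothetical and the crate's actual matching logic may differ.

```rust
/// Extracts the websocket URL from one line of the browser's process output.
/// Assumption: the line uses the English prefix "DevTools listening on ".
pub fn ws_url_from_line(line: &str) -> Option<&str> {
    const PREFIX: &str = "DevTools listening on ";
    let rest = line.trim().strip_prefix(PREFIX)?;
    if rest.starts_with("ws://") {
        Some(rest)
    } else {
        None
    }
}

/// Extracts the port from a URL shaped like `ws://host:port/path`.
pub fn port_from_ws_url(url: &str) -> Option<u16> {
    // Take everything between the scheme and the first slash.
    let authority = url.strip_prefix("ws://")?.split('/').next()?;
    let (_host, port) = authority.rsplit_once(':')?;
    port.parse().ok()
}
```

A line with a translated prefix never matches, so a launcher that waits for such a line would keep waiting until its timeout fires, which is consistent with the symptom in the question.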

License

Licensed under either of these:

chromiumoxide's People

Contributors

alexstorm1313, bobajeff, chirok11, d4h0, dcjanus, demurgos, dfrankland, djc, ernestas-poskus, escritorio-gustavo, hackermondev, hakr, j-mendez, janu-cambrelen, jtplouffe, jvatic, krant, mattsse, mirror-kt, mozgiii, ongchi, privaterookie, ryo33, sandmail32, shulcsm, starrify, sytten, t4rp, try876, williamvenner

chromiumoxide's Issues

Add the same options for screenshots as Puppeteer

Would it be of any interest to add the same options for screenshots that Puppeteer has? Specifically, this is the option to take full-page screenshots and omit the page background.

Here are the options, which are the same as CaptureScreenshotParams, but with the added fullPage and omitBackground properties:
https://github.com/puppeteer/puppeteer/blob/b57f3fcd5393c68f51d82e670b004f5b116dcbc3/src/common/Page.ts#L141-L149

Here's where full-page and omitting of background screenshots happen:
https://github.com/puppeteer/puppeteer/blob/b57f3fcd5393c68f51d82e670b004f5b116dcbc3/src/common/Page.ts#L1657-L1725

An example of taking a full-page screenshot:

use chromiumoxide::{
    browser::{Browser, BrowserConfig},
    cdp::browser_protocol::{
        dom::Rect,
        emulation::{ScreenOrientation, ScreenOrientationType, SetDeviceMetricsOverrideParams},
        page::{CaptureScreenshotFormat, CaptureScreenshotParams, GetLayoutMetricsReturns},
    },
};
use futures::StreamExt;

#[async_std::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let (browser, mut handler) = Browser::launch(BrowserConfig::builder().build()?).await?;

    async_std::task::spawn(async move {
        loop {
            let _event = handler.next().await.unwrap();
        }
    });

    let page = browser.new_page("https://en.wikipedia.org").await?;

    page.wait_for_navigation().await?;

    // Set the emulated device to the same size as the page
    let GetLayoutMetricsReturns {
        content_size: Rect { width, height, .. },
        ..
    } = page.layout_metrics().await?;
    page.execute(
        SetDeviceMetricsOverrideParams::builder()
            .mobile(false)
            .width(width.ceil() as i64)
            .height(height.ceil() as i64)
            .device_scale_factor(1.0)
            .screen_orientation(
                ScreenOrientation::builder()
                    .angle(0)
                    .r#type(ScreenOrientationType::PortraitPrimary)
                    .build()?,
            )
            .build()?,
    )
    .await?;

    page.save_screenshot(
        CaptureScreenshotParams::builder()
            .format(CaptureScreenshotFormat::Png)
            .build(),
        "wiki.png",
    )
    .await?;

    Ok(())
}

Closing a page should also close the receiver half of the message channel in the target

I'm not sure if waiting for Target.targetdestroyed is necessary if assuming target eventually gets cleaned up. Page gets consumed and if you are dealing with raw ids you are on your own.

Agreed. But there is a potential race condition in the following scenario:

let el =  page.find_element("input#searchInput").await?;
...
page.close().await?;
...
// this will fail because page is already closed.
el.click().await?;

This el.click() will fail regardless, but at the moment the reason it fails is because the click command is sent to the browser and the browser returns an error. This happens because the Target of the Page is not removed yet and is still able to receive events sent from an Element. Preferably the el.click() should fail with a SendError because the receiver should already be dropped at this point, preventing the overhead of sending a request that is guaranteed to fail.

I think it's more important to figure out a way to just destroy the page target, since in the majority of cases you don't care about unload and just want to clean up.

This can be done with Target.closeTarget instead, that's how puppeteer does it.

Originally posted by @mattsse in #12 (comment)

How to use contexts etc?

I went to examples and couldn't find any code which creates contexts etc?

I want to translate this code

let browser = await firefox.launch({headless:false})
let context = await browser.newContext();
let page = await context.newPage();
await page.goto("https://jsonip.com");

Browser launch viewport option

In Node.js, if you start Puppeteer with a null viewport, the viewport is set to fit the window size.

await puppeteer.launch( {
  defaultViewport : null //viewport auto adjust to window size
} )

So I suggest changing

BrowserConfigBuilder::viewport(vp:Viewport)

->

BrowserConfigBuilder::viewport(vp:Option<Viewport>)
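A minimal sketch of what such a signature could look like. The types below are simplified stand-ins, not the crate's real BrowserConfigBuilder or Viewport, and accepting impl Into<Option<Viewport>> instead of a plain Option<Viewport> is a variation on the suggestion so that callers can pass either a Viewport or None.

```rust
/// Stand-in viewport type (hypothetical, reduced to two fields).
#[derive(Debug, Clone, PartialEq)]
pub struct Viewport {
    pub width: u32,
    pub height: u32,
}

impl Default for Viewport {
    fn default() -> Self {
        // Default emulated size used for this sketch.
        Viewport { width: 800, height: 600 }
    }
}

/// Stand-in config: `None` means "do not emulate a viewport".
#[derive(Debug, Clone, PartialEq)]
pub struct BrowserConfig {
    pub viewport: Option<Viewport>,
}

#[derive(Debug)]
pub struct BrowserConfigBuilder {
    viewport: Option<Viewport>,
}

impl Default for BrowserConfigBuilder {
    fn default() -> Self {
        // Emulation stays enabled unless the caller opts out.
        BrowserConfigBuilder { viewport: Some(Viewport::default()) }
    }
}

impl BrowserConfigBuilder {
    /// Accepts `Viewport { .. }` as well as `None`.
    pub fn viewport(mut self, vp: impl Into<Option<Viewport>>) -> Self {
        self.viewport = vp.into();
        self
    }

    pub fn build(self) -> BrowserConfig {
        BrowserConfig { viewport: self.viewport }
    }
}
```

With this shape, `BrowserConfigBuilder::default().viewport(None).build()` would correspond to Puppeteer's `defaultViewport: null`, while existing calls that pass a Viewport value would keep compiling.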

unresolved import `crate::browser::process::get_chrome_path_from_registry`

windows 10
cargo run
Updating git repository https://github.com/mattsse/chromiumoxide
Updating git repository https://github.com/sdroege/async-tungstenite
Updating crates.io index
Blocking waiting for file lock on package cache
Blocking waiting for file lock on package cache
Downloaded input_buffer v0.4.0
Downloaded tungstenite v0.12.0
Downloaded 2 crates (62.1 KB) in 2.28s
Blocking waiting for file lock on package cache
Compiling syn v1.0.60
Compiling winapi v0.3.9
Compiling getrandom v0.2.2
Compiling input_buffer v0.4.0
Compiling rand_core v0.6.1
Compiling rand_chacha v0.3.0
Compiling rand v0.8.3
Compiling nb-connect v1.0.2
Compiling winapi-util v0.1.5
Compiling async-process v1.0.2
Compiling atty v0.2.14
Compiling termcolor v1.1.2
Compiling ctor v0.1.19
Compiling serde_derive v1.0.123
Compiling futures-macro v0.3.12
Compiling thiserror-impl v1.0.23
Compiling async-attributes v1.1.2
Compiling value-bag v1.0.0-alpha.6
Compiling log v0.4.14
Compiling futures-util v0.3.12
Compiling polling v2.0.2
Compiling kv-log-macro v1.0.7
Compiling tungstenite v0.12.0
Compiling env_logger v0.7.1
Compiling thiserror v1.0.23
Compiling async-io v1.3.1
Compiling which v4.0.2
Compiling pretty_env_logger v0.4.0
Compiling async-global-executor v2.0.2
Compiling async-std v1.9.0
Compiling futures-executor v0.3.12
Compiling futures v0.3.12
Compiling serde v1.0.123
Compiling serde_json v1.0.62
Compiling async-tungstenite v0.12.0 (https://github.com/sdroege/async-tungstenite#f51f0744)
Compiling chromiumoxide_types v0.2.0 (https://github.com/mattsse/chromiumoxide?branch=main#7d7ef5a5)
Compiling chromiumoxide_pdl v0.2.0 (https://github.com/mattsse/chromiumoxide?branch=main#7d7ef5a5)
Compiling chromiumoxide_cdp v0.2.0 (https://github.com/mattsse/chromiumoxide?branch=main#7d7ef5a5)
Compiling chromiumoxide v0.2.0 (https://github.com/mattsse/chromiumoxide?branch=main#7d7ef5a5)
error[E0432]: unresolved import crate::browser::process::get_chrome_path_from_registry
--> C:\Users\52752.cargo\bin\git\checkouts\chromiumoxide-b0a6fd9b32d92502\7d7ef5a\src\browser.rs:560:13
|
560 | use crate::browser::process::get_chrome_path_from_registry;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ no get_chrome_path_from_registry in process

error: aborting due to previous error

For more information about this error, try rustc --explain E0432.
error: could not compile chromiumoxide

To learn more, run the command again with --verbose.
warning: build failed, waiting for other jobs to finish...
error: build failed

F:\web\xiaolu\rust\greetings\web>

Allow disabling of viewport emulation

Currently, the viewport will emulate a specified size or the default (800x600).
I'd like to be able to disable this when using the Browser::connect() as it looks weird otherwise.

  • Maybe we can pass a BrowserConfig as a second parameter to Browser::connect()
    • And add disable_viewport_emulation field to the Viewport struct (set to true to disable viewport emulation.)

Similar to what Puppeteer does when you set defaultViewport to null.

find_element and CSS selector

Hello,

I just started playing with this tool, but I can't get any selector to work besides [name='j_username'].

The code below fails with the error "-32000: DOM Error while querying". The same thing happens for [id='j_username'].

Did I miss something?

let page = browser.new_page("https://www.google.com/").await?;
std::thread::sleep(std::time::Duration::from_secs(3));
page.find_element("a[href=https://policies.google.com/privacy?hl=en-MA&fg=1]")
    .await?
    .click()
    .await?;

Thanks
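One possible cause, offered as an assumption rather than a confirmed diagnosis: in CSS, an attribute value that is not a plain identifier (for example a URL containing ':', '/', '?' or '&') has to be written as a quoted string, and the selector above passes the URL unquoted. The hypothetical helper below builds a selector with a properly quoted value.

```rust
/// Builds `tag[attr="value"]` with the value written as a double-quoted
/// CSS string. Hypothetical helper, not part of chromiumoxide.
/// Limitation: newlines inside `value` are not handled.
pub fn attr_selector(tag: &str, attr: &str, value: &str) -> String {
    let mut escaped = String::with_capacity(value.len());
    for c in value.chars() {
        // Backslashes and double quotes must be escaped inside the string.
        if c == '\\' || c == '"' {
            escaped.push('\\');
        }
        escaped.push(c);
    }
    format!("{}[{}=\"{}\"]", tag, attr, escaped)
}
```

The result could then be passed to find_element, for example `attr_selector("a", "href", "https://policies.google.com/privacy?hl=en-MA&fg=1")`.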

Page::event_listener() doesn't receive any events

Page::event_listener() doesn't receive events. The following is a small test listening for Fetch.requestPaused: (I've also tried listening for other events and receive nothing).

use futures::StreamExt;
use chromiumoxide::error::Result;
use chromiumoxide::browser::Browser;
use chromiumoxide_cdp::cdp::browser_protocol::fetch;
mod get_debug_ws_url;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let debug_ws_url = get_debug_ws_url::get_ws_url().await?;
    let (browser, mut handler) = 
    Browser::connect(debug_ws_url).await?;

    let handle = async_std::task::spawn(async move {
        loop {
            let _ = handler.next().await.unwrap();
        }
    });
    //Open new page to get page context
    let page = browser.new_page("http://www.example.com").await?;
    //send `Fetch.enable` - tells browser to pause on request to pattern (http://test/)
    page.execute(fetch::EnableParams {
        patterns: vec![
            fetch::RequestPattern {
            url_pattern: "http://test/*".to_string().into(),
            resource_type: None,
            request_stage: fetch::RequestStage::Request.into(),
        }].into(),
        handle_auth_requests: None,
    }).await?;
    //listen for `Fetch.requestPaused`
    let mut events = page.event_listener::<fetch::EventRequestPaused>().await?;
    async_std::task::spawn(async move {
        while let Some(event) = events.next().await {
            println!("Fetch.requestPaused");// Fetch.requestPaused is never recieved
        }
    });
    //navigate to http://test/
    page.goto("http://test/").await?; //this correctly pauses
        handle.await;
    Ok(())
}

Edit: Also, here's get_debug_ws_url.rs.

use std::collections::HashMap;

pub async fn get_ws_url() -> Result<std::string::String, Box<dyn std::error::Error>> {
    let resp = reqwest::get("http://127.0.0.1:9222/json/version")
        .await?
        .json::<HashMap<String, String>>()
        .await?;
        let web_socket_debugger_url = resp["webSocketDebuggerUrl"].clone();
    Ok(web_socket_debugger_url)
}

Add support for event subscriptions

Listening for specific events is currently not supported. Instead, the Handler is a Stream of CdpEvents; however, it doesn't actually yield them, only the CdpErrors.

It would be nice to listen to a specific kind of event like Page.animationCreated asynchronously.
The Handler should probably be refactored as Stream<Item = CdpError> or as a simple Future that never finishes.

The individual Targets should allow for registering event listeners that subscribe to a specific event and get the event via a channel. Since there can be several event listeners for each topic, the subscriptions are basically HashMap<Topic, Vec<EventListener>>. Also, we don't want to clone the event for every listener; instead, the event in question should be wrapped in an Arc and sent to every event listener channel without cloning.
Sending events as trait objects (something along the lines of Arc<dyn Event>) would be useful and would make using event listeners more convenient.

Also support for introducing custom events should be considered. In order to prevent mass cloning of serde_json::Value, somehow we need to send some converter that turns serde_json::Value into Arc<dyn Event> along with the registration of the event listener.
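The subscription idea can be sketched with standard library channels. Everything here is a hypothetical stand-in: the Event trait, the LoadEventFired type, and the Subscriptions registry are illustrative names, and a real implementation would use async channels rather than std::sync::mpsc.

```rust
use std::collections::HashMap;
use std::fmt::Debug;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Arc;

/// Hypothetical event trait; the topic identifies the kind of event.
pub trait Event: Debug + Send + Sync {
    fn topic(&self) -> &'static str;
}

/// Example event type used only for this sketch.
#[derive(Debug)]
pub struct LoadEventFired {
    pub timestamp: f64,
}

impl Event for LoadEventFired {
    fn topic(&self) -> &'static str {
        "Page.loadEventFired"
    }
}

/// Registry mapping each topic to the senders of its listeners.
#[derive(Default)]
pub struct Subscriptions {
    listeners: HashMap<&'static str, Vec<Sender<Arc<dyn Event>>>>,
}

impl Subscriptions {
    /// Registers a listener and returns the receiving half of its channel.
    pub fn subscribe(&mut self, topic: &'static str) -> Receiver<Arc<dyn Event>> {
        let (tx, rx) = channel();
        self.listeners.entry(topic).or_default().push(tx);
        rx
    }

    /// Sends the shared event to every listener of its topic, removes
    /// listeners whose receiver was dropped, and returns how many got it.
    pub fn publish(&mut self, event: Arc<dyn Event>) -> usize {
        match self.listeners.get_mut(event.topic()) {
            Some(listeners) => {
                // Only the `Arc` pointer is cloned, never the event itself.
                listeners.retain(|tx| tx.send(Arc::clone(&event)).is_ok());
                listeners.len()
            }
            None => 0,
        }
    }
}
```

Removing a sender as soon as a send fails keeps the registry from growing when listeners go away without unsubscribing.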

Does this support waitUntil: networkidle?

I saw references to NetworkIdle in the source, I wonder if it's supported yet to wait until network has been idle X amount of time. This is a huge benefit of puppeteer vs webdriver, especially for JS web crawlers, or to just simplify actions on dynamic content. If it is supported and I'm not blind/stupid- examples or documentation on this particular feature would really help boost this library I believe! :)

Change Handler Stream Item

Right now the Handler is a Stream<Item= Result<CdpEventMessage>>, however, nothing meaningful is returned. Since event subscriptions are supported now with #4 the handler should be Stream<Item= Result<()>>, so basically a stream of errors.

How to implement the 'Copy' trait for `chromiumoxide::Browser`?

The code looks like this:

async fn main() {
...
	let (browser, _handler) = Browser::launch(
		BrowserConfig::builder()
			.build()
			.unwrap()
		)
		.await
		.unwrap();
...
	for u in urls {
		check(browser, &u).await // it's supposed to run with threading
	}
...
}

async fn check(browser: Browser, url: &str) {
	let page = browser.new_page(url).await.unwrap();
	let check: bool = page.evaluate("some_script")
		.await
		.unwrap()
		.into_value()
		.unwrap();
...
}

I'm really new to Rust, and this confuses me when it comes to threading and concurrency.

Any workarounds for those?
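A common workaround, sketched here with a stand-in type instead of the real chromiumoxide::Browser: Copy is only available for types whose values can be duplicated bit for bit, so a handle that owns resources generally cannot implement it. The other examples on this page call new_page on a non-mutable browser binding, which suggests that a shared reference is enough; the sketch below therefore borrows the browser in check and uses an Arc to share it across threads. The struct, its field, and check_all are hypothetical.

```rust
use std::sync::Arc;
use std::thread;

/// Stand-in for a browser handle (hypothetical; the real type is async).
pub struct Browser {
    name: String,
}

impl Browser {
    pub fn new(name: &str) -> Self {
        Browser { name: name.to_string() }
    }

    /// Takes `&self`, so callers only need a shared reference.
    pub fn new_page(&self, url: &str) -> String {
        format!("{} -> {}", self.name, url)
    }
}

/// Borrows the browser instead of taking ownership of it.
pub fn check(browser: &Browser, url: &str) -> bool {
    !browser.new_page(url).is_empty()
}

/// Shares one browser between worker threads through an `Arc`.
pub fn check_all(browser: Arc<Browser>, urls: Vec<String>) -> Vec<bool> {
    let handles: Vec<_> = urls
        .into_iter()
        .map(|url| {
            // Cloning the `Arc` copies a pointer, not the browser.
            let browser = Arc::clone(&browser);
            thread::spawn(move || check(&browser, &url))
        })
        .collect();
    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker panicked"))
        .collect()
}
```

In the loop from the question, the smallest change along these lines would be to declare check as taking browser: &Browser and to call check(&browser, &u).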

publish new version?

Hi, There are some changes since last publish, is there any plan to publish chromiumoxide 0.3.0?

deadlock found

Hi, I was trying to implement an HTML-to-PNG service on top of this crate, and thanks to your great work the service is almost done. Before using it in production, we tested it with a million random requests. At first everything looked fine, but after some time, maybe an hour, maybe a day, the service would hang forever.

Each service instance creates 8 chromium instances. Each request to my service generates HTML to be rendered. With debug=true in the query string, the generated HTML is returned directly; without this flag, my service randomly chooses a chromium instance and sets the content on it.

When this issue happens, my service thread is busy (100% CPU in htop), and some (1 to 2) chromium threads use 30% to 70% CPU, while the rest are sleeping. Requests with debug=true, which respond without calling chromiumoxide, succeed. Requests without debug=true hang forever without any error, even though I've set request_timeout to 5 seconds.

I attached GDB to my service thread, paused it, and printed the thread backtrace, then repeated this a dozen times. Here is the result:

(gdb) c
Continuing.

Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790dd0032 in chromiumoxide::cmd::CommandChain::poll ()
(gdb) c
Continuing.

Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790e8d12a in chromiumoxide::handler::target::Target::poll ()
(gdb) c
Continuing.

Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790dd0032 in chromiumoxide::cmd::CommandChain::poll ()
(gdb) c
Continuing.

Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790dcff9c in chromiumoxide::cmd::CommandChain::poll ()
(gdb) c
Continuing.

Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790dd0214 in chromiumoxide::cmd::CommandChain::poll ()
(gdb) c
Continuing.

Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790e8d17f in chromiumoxide::handler::target::Target::poll ()
(gdb) c
Continuing.

Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790e8d192 in chromiumoxide::handler::target::Target::poll ()
(gdb) c
Continuing.

Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790dd0032 in chromiumoxide::cmd::CommandChain::poll ()
(gdb) c
Continuing.

Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790dd0032 in chromiumoxide::cmd::CommandChain::poll ()
(gdb) c
Continuing.

Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790dcff94 in chromiumoxide::cmd::CommandChain::poll ()
(gdb) c
Continuing.

Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790e8d1a1 in chromiumoxide::handler::target::Target::poll ()
(gdb) c
Continuing.

Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790e8d101 in chromiumoxide::handler::target::Target::poll ()
(gdb) c
Continuing.

Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x00005647913a220e in core::tuple::<impl core::cmp::Ord for (A,B)>::cmp () at /rustc/88f19c6dab716c6281af7602e30f413e809c5974/library/core/src/tuple.rs:58
58	/rustc/88f19c6dab716c6281af7602e30f413e809c5974/library/core/src/tuple.rs: No such file or directory.
(gdb) c
Continuing.

Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790e8d187 in chromiumoxide::handler::target::Target::poll ()
(gdb) c
Continuing.

Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790e8d18d in chromiumoxide::handler::target::Target::poll ()
(gdb) c
Continuing.

Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790dd0214 in chromiumoxide::cmd::CommandChain::poll ()
(gdb) c
Continuing.

Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790dd0032 in chromiumoxide::cmd::CommandChain::poll ()

Improve javascript function evaluation

The Page::evaluate interface expects an expression to be evaluated in the browser.

page.evaluate("1+2") results in a value of 3, whereas page.evaluate("() => { return 1+2;}") results in a value of Function, because the Runtime.evaluate command expects an expression.

To evaluate (async) functions, we need a way to determine whether the expression passed to Page::evaluate is in fact a JavaScript function; if so, we send it as Runtime.callFunctionOn instead.

This could be achieved by writing a poor parser to distinguish between expression | function, or by changing the type of the expression param that the evaluate functions accept to a new trait (something like Evaluable) with two types, Expression and JsFunction, that are bundled in an enum.

There should be a new api function: Page::evaluate_function that uses Runtime.callFunctionOn.
The most intuitive solution would probably be Page::evaluate strictly for Runtime.evaluate and Page::evaluate_function for Runtime.callFunctionOn and possibly additional checks on Page::evaluate
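The "poor parser" option could look roughly like the sketch below. It is a heuristic under stated assumptions: the Evaluation enum and its methods are hypothetical names, and the check only looks at the start of the source, so parentheses inside string literals or comments will confuse it.

```rust
/// Hypothetical classification of a script passed to an evaluate call.
#[derive(Debug, Clone, PartialEq)]
pub enum Evaluation {
    Expression(String),
    Function(String),
}

impl Evaluation {
    pub fn classify(source: &str) -> Self {
        if looks_like_function(source) {
            Evaluation::Function(source.to_string())
        } else {
            Evaluation::Expression(source.to_string())
        }
    }

    /// The protocol command that would be used for this kind of input.
    pub fn method(&self) -> &'static str {
        match self {
            Evaluation::Expression(_) => "Runtime.evaluate",
            Evaluation::Function(_) => "Runtime.callFunctionOn",
        }
    }
}

/// Returns the text after the parenthesis that closes the leading `(`.
fn after_matching_paren(src: &str) -> Option<&str> {
    let mut depth = 0usize;
    for (i, c) in src.char_indices() {
        match c {
            '(' => depth += 1,
            ')' => {
                depth = depth.checked_sub(1)?;
                if depth == 0 {
                    return Some(&src[i + 1..]);
                }
            }
            _ => {}
        }
    }
    None
}

fn looks_like_function(src: &str) -> bool {
    let src = src.trim_start();
    // Skip a leading `async` keyword, but not identifiers like `asyncFoo`.
    let src = match src.strip_prefix("async") {
        Some(rest) if rest.starts_with(|c: char| c.is_whitespace() || c == '(') => {
            rest.trim_start()
        }
        _ => src,
    };
    // `function ...`, `function(...)` and generator functions.
    if let Some(rest) = src.strip_prefix("function") {
        return rest.starts_with(|c: char| c.is_whitespace() || c == '(' || c == '*');
    }
    // Arrow functions: `(args) => ...` or `arg => ...`.
    let tail = if src.starts_with('(') {
        match after_matching_paren(src) {
            Some(rest) => rest,
            None => return false,
        }
    } else {
        src.trim_start_matches(|c: char| c.is_alphanumeric() || c == '_' || c == '$')
    };
    tail.trim_start().starts_with("=>")
}
```

Matching the closing parenthesis by depth is what keeps an immediately invoked function such as `((a) => a)(1)` classified as an expression. Keeping Page::evaluate and Page::evaluate_function separate, as proposed above, avoids relying on such a heuristic at all.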

Update Readme to correct dependency line

Because cargo --git still assumes the "master" branch instead of HEAD, and because GitHub has switched repositories over to main, the dependency lines in the README fail with:

cannot locate remote-tracking branch 'origin/master'; class=Reference (4); code=NotFound

I had to specify the branch with:
chromiumoxide = { git = "https://github.com/mattsse/chromiumoxide", branch = "main"}

Add support for browser.on events

I'm needing to use browser.on events similar to what puppeteer documents here:
https://pptr.dev/#?product=Puppeteer&version=v5.5.0&show=api-class-browser

The main ones I'm interested in are browser.on('targetchanged') and browser.on('targetcreated').
I think Puppeteer sends Target.setDiscoverTargets [1] on startup and emits those events when it receives Target.targetInfoChanged and Target.targetCreated respectively.

I'm willing to take this on however I'm unsure on the correct way to send messages to the connection.

browser.pages() doesn't return pages that are opened manually (via browser UI)

Running the following main.rs file will only print 1 (the one opened via browser.new_page) even if other pages are already open or new ones get opened manually through the browser UI:

use futures::StreamExt;
use chromiumoxide::error::Result;
use chromiumoxide::browser::Browser;

mod get_debug_ws_url;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let debug_ws_url = get_debug_ws_url::get_ws_url().await?;
    let (browser, mut handler) = 
    Browser::connect(debug_ws_url).await?;

    let handle = async_std::task::spawn(async move {
        loop {
            let _ = handler.next().await.unwrap();
        }
    });

    let my_handle = async_std::task::spawn(async move {
        let page = browser.new_page("https://en.wikipedia.org").await;
        loop {
            let pages = browser.pages().await;
            let pages = pages.unwrap_or_default();
            println!("{:?}", pages.len());
            async_std::task::sleep(std::time::Duration::from_secs(5)).await;
        }
    });
        handle.await;
    Ok(())
}

Edit: Also, this is the get_debug_ws_url.rs:

use std::collections::HashMap;

pub async fn get_ws_url() -> Result<std::string::String, Box<dyn std::error::Error>> {
    let resp = reqwest::get("http://127.0.0.1:9222/json/version")
        .await?
        .json::<HashMap<String, String>>()
        .await?;
        let web_socket_debugger_url = resp["webSocketDebuggerUrl"].clone();
    Ok(web_socket_debugger_url)
}

Handle Websocket disconnect gracefully

If the websocket crashes, no more commands can be sent or received, so in that case we should

  • drop the Handler entirely short term
  • add the option to try a reconnect; the number of retries should be configurable.
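The reconnect option could be sketched as a small retry helper. This is a synchronous illustration with hypothetical names; in the handler itself the connect step and the delay would be async, and the closure stands in for whatever re-establishes the websocket.

```rust
use std::thread;
use std::time::Duration;

/// Calls `connect` until it succeeds or `max_retries` additional attempts
/// have failed, sleeping for `delay` between attempts.
pub fn connect_with_retries<T, E>(
    max_retries: u32,
    delay: Duration,
    mut connect: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match connect() {
            Ok(connection) => return Ok(connection),
            // Out of retries: surface the last error to the caller.
            Err(err) if attempt >= max_retries => return Err(err),
            Err(_) => {
                attempt += 1;
                thread::sleep(delay);
            }
        }
    }
}
```

With max_retries set to 0 the helper makes exactly one attempt, which matches the short-term behavior of giving up as soon as the connection is lost.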

HeapProfiler support

Is it supported?

Running browser.execute(EnableParams {}).await? fails with:
Error: Chrome(Error { code: -32601, message: "'HeapProfiler.enable' wasn't found" })

Getting entries from the browser console

Working on a project where it would be super useful to know what gets logged to the browser console, and we're handling it like so:

let mut events = page.event_listener::<EventEntryAdded>().await?;

eprintln!("Logging to console...");

page
  .evaluate("console.log('foo')")
  .await?;

while let Some(event) = events.next().await {
  eprintln!("{:?}", event);
}

The problem is that events.next() seems to be hanging forever.

Is this the correct way to get messages from the browser console?

Navigation result

Hello,
Is there a way to tell whether a page has navigated successfully or not? Ideally I want a response code. Puppeteer's goto and wait_for_navigation seem to resolve to an HTTPResponse. Is that accessible somewhere, or is it not implemented yet?

Drop stderr/stdout reader may cause some unexpected behavior

In this function, we take and then drop the stderr reader, which, according to my test, is not a good idea.

In my simple test, the parent process drops the reader for the child's stdout, and the child gets a broken pipe error if it tries to write something to stdout. In Rust, this causes a panic because of this line, but I don't know what would happen in a C++ program.

parent code:

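use std::io::{BufRead, BufReader};
use std::process::Stdio;
use std::thread::sleep;
use std::time::Duration;

fn main() {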
    let file = "target/release/child";
    let mut result = std::process::Command::new(file)
        .stdout(Stdio::piped())
        .spawn()
        .unwrap();
    let stdout = result.stdout.take().unwrap();
    let mut reader = BufReader::new(stdout);
    let mut line = String::new();
    reader.read_line(&mut line).unwrap();
    println!("read {} bytes", line.len());
    drop(reader);
    loop {
        println!("waiting");
        sleep(Duration::from_secs(1));
    }
}

child code

use std::thread::sleep;
use std::time::{Duration, UNIX_EPOCH};

fn main() {
    loop {
        println!("{}", "FOOFOOFOOFOOFOO".repeat(128));
        std::fs::write(
            "child.timestamp",
            UNIX_EPOCH.elapsed().unwrap().as_secs().to_string(),
        )
        .unwrap();
        sleep(Duration::from_secs(1));
    }
}

html page does not expand to window extents

Below is the code I am using to test oxide.

    let baseurl = "http://www.google.com".to_owned();
    let (browser, mut handler) =
      Browser::launch(BrowserConfig::builder().with_head().build()?).await?;
    let handle = tokio::task::spawn(async move {
        loop {
            let _ = handler.next().await.unwrap();
        }
    });

    let _p = browser.new_page(baseurl.clone()).await?;
    handle.await?;
    Ok(())

The page comes up and the viewport/webpage is not expanding to the window extents. Is there something that would cause this resize issue?

Never get timed out from `BrowserConfigBuilder`

Requests never time out, even though request_timeout is set through chromiumoxide::browser::BrowserConfigBuilder.

Environment

Chromiumoxide version: git+https://github.com/mattsse/chromiumoxide?branch=main#dd1003e68e9d9914c28cbb7d6022996833a465d9

Steps to Replicate

Code:

use {
    chromiumoxide::browser::{Browser, BrowserConfig},
    futures::StreamExt,
    std::time::{Duration, Instant}
};

#[async_std::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let (browser, mut handler) = Browser::launch(
        BrowserConfig::builder()
            .request_timeout(Duration::from_secs(1))
            .build()?
        ).await?;

    let _handle = async_std::task::spawn(async move {
        loop {
            let _ = handler.next().await.unwrap();
        }
    });

    let start = Instant::now();

    let site = "https://photricity.com/flw/"; // this is endless loading website prank
    let page = browser.new_page(site).await?;
    let _html = page.wait_for_navigation().await?.content().await?;

    println!("Time elapsed: {:?}", start.elapsed());

    Ok(())
}

Proof of Concept

The elapsed time is expected not to exceed the specified timeout (1 sec), but the opposite happens.

(Screenshot: Screenshot_2021-07-01_07-03-06)

Additional context

The same thing happens with the timeout field in chromiumoxide::cdp::js_protocol::runtime::EvaluateParams.
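One way to bound the wait regardless of request_timeout is for the caller to enforce its own deadline. The sketch below shows the idea with only the standard library (a helper thread plus recv_timeout); in the async code above, the runtime's own timeout combinator would play that role. run_with_deadline is a hypothetical helper and not part of chromiumoxide.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `work` on a helper thread and waits at most `limit` for its result.
///
/// Returns `None` if the deadline passes first (or if `work` panics). The
/// helper thread is detached and keeps running after a missed deadline.
pub fn run_with_deadline<T, F>(limit: Duration, work: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone if the deadline passed; ignore that.
        let _ = tx.send(work());
    });
    rx.recv_timeout(limit).ok()
}
```

Note that this only stops the caller from waiting; it does not cancel the underlying work.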

Example "wiki-tokio" can't click "input#searchInput" correctly

I tried to run cargo run --example wiki-tokio --features="tokio-runtime". It should click input#searchInput and type the text "Rust programming language". Instead it clicks the element #ca-history, so the "wiki-tokio" example ends up opening https://en.wikipedia.org/w/index.php?title=Main_Page&action=history. I've been thinking about this for a long time, but I don't know why. My Chrome version is Version 89.0.4389.90 (Official Build) (64-bit).

Browser hangs indefinitely if new_page() fails

I'm seeing an issue that appears to be similar to the closed issue #52 in which browsers failing to open to new_page don't use the browser's timeout and instead hang indefinitely. It should be noted that I'm on v0.3.1 and so should have the fix from PR #56.

I have a timeout on the new_page() call (longer than the browser's request timeout).
If this timeout is hit and I move on, the browser refuses to allow any other requests until it is torn down.

Here's a minimal repro:

use futures_util::stream::{StreamExt};

use chromiumoxide::browser::{Browser, BrowserConfig};
use tokio::time::{timeout, sleep, Duration};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    println!("Started...");

    let (browser, mut handler) = Browser::launch(
        BrowserConfig::builder()
            .request_timeout(Duration::from_secs(3))
            .build()?,
    )
    .await?;

    tokio::spawn(async move {
        loop {
            let _ = handler.next().await.unwrap();
        }
    });

    // Navigation to a valid sample page succeeds
    attempt_navigation("https://example.com".to_owned(), &browser).await;
    
    // Navigation to invalid url fails (and should). This is a silly example, it 
    // is simply to demonstrate the behaviour. When the browser cannot successfully
    // open the page, it hangs indefinitely. Follow up requests (even those that
    // previously worked) will fail until the browser is torn down
    println!("Attempting to navigate to invalid page, should fail and timeout");
    attempt_navigation("https://%wxample.com".to_owned(), &browser).await;

    println!("Sleeping for a few seconds to allow timeout to happen");
    sleep(Duration::from_secs(5)).await;

    // Attempting the same page navigation that previously succeeded.
    // -> should now fail
    println!("Re-attempting to open previously successful page...");
    attempt_navigation("https://example.com".to_owned(), &browser).await;

    return Ok(());
}

async fn attempt_navigation(url: String, browser: &Browser) {
    // The outer `Result` comes from `timeout`, the inner one from `new_page`.
    match timeout(Duration::from_secs(5), browser.new_page(&url)).await {
        Ok(Ok(_page)) => println!("Page {} navigated to successfully", &url),
        Ok(Err(err)) => println!("Request for new page failed: {}", err),
        Err(_) => println!("Request for new page timeout"),
    }
}

Add support for navigation history

Support for API functions Page::go_back and Page::go_forward should be added.

This would require tracking the history in the Target and probably needs additional internal Message variants.
Size of the stored history should be configurable.
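A minimal sketch of what such a bounded history could look like, using only the standard library. The NavigationHistory type and its methods are hypothetical names chosen for illustration; they are not part of chromiumoxide, and the real implementation would still have to be wired into the Target and its messages.

```rust
use std::collections::VecDeque;

/// Hypothetical bounded navigation history (illustration only).
#[derive(Debug)]
pub struct NavigationHistory {
    entries: VecDeque<String>,
    /// Index of the entry currently shown, if any.
    current: Option<usize>,
    /// Maximum number of entries to retain (the configurable size).
    capacity: usize,
}

impl NavigationHistory {
    pub fn new(capacity: usize) -> Self {
        Self { entries: VecDeque::new(), current: None, capacity: capacity.max(1) }
    }

    /// Records a fresh navigation; entries ahead of the current one are dropped.
    pub fn push(&mut self, url: impl Into<String>) {
        if let Some(idx) = self.current {
            self.entries.truncate(idx + 1);
        }
        self.entries.push_back(url.into());
        // Evict the oldest entries once the configured size is exceeded.
        while self.entries.len() > self.capacity {
            self.entries.pop_front();
        }
        self.current = Some(self.entries.len() - 1);
    }

    /// Moves one entry back, or returns `None` if already at the oldest entry.
    pub fn go_back(&mut self) -> Option<&str> {
        let idx = self.current?.checked_sub(1)?;
        self.current = Some(idx);
        self.entries.get(idx).map(String::as_str)
    }

    /// Moves one entry forward, or returns `None` if already at the newest entry.
    pub fn go_forward(&mut self) -> Option<&str> {
        let idx = self.current? + 1;
        if idx >= self.entries.len() {
            return None;
        }
        self.current = Some(idx);
        self.entries.get(idx).map(String::as_str)
    }

    pub fn current(&self) -> Option<&str> {
        self.current.and_then(|i| self.entries.get(i)).map(String::as_str)
    }
}
```

Dropping the forward entries on push mirrors how browsers usually behave after navigating away from a page reached via "back"; whether Page::go_back and Page::go_forward should follow that rule is a design decision left open here.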

Add support for incognito sessions

Support for several BrowserContextIds is missing.

Allow activating incognito mode for an already running browser as well as from the get go.

Ws(Capacity(MessageTooLong { .. }))

I get this error whilst using Page::pdf. I assume the PDF is too large for the internal channel that is used to communicate data?

thread 'actix-server worker 1' panicked at 'called `Result::unwrap()` on an `Err` value: Ws(Capacity(MessageTooLong { size: 23800033, max_size: 16777216 }))', src/main.rs:28:43

I don't have a backtrace unfortunately because async destroys any useful information in it.

Relevant code

actix_web::rt::spawn(async move {
    loop {
        handler.next().await.unwrap().unwrap(); // panics here
    }
});
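The numbers in the panic message can be checked directly. The sketch below compares the reported size with the reported limit and, assuming the oversized message is the Page.printToPDF response (whose data field the DevTools Protocol returns base64-encoded by default), estimates how large a raw PDF can be before its encoded form exceeds that limit. The function names are hypothetical, and the JSON envelope around the payload is ignored.

```rust
/// Limit reported as `max_size` in the error above (16 MiB).
pub const MAX_MESSAGE_SIZE: usize = 16 * 1024 * 1024;

/// Returns true if a message of `message_len` bytes stays within the limit.
pub fn fits(message_len: usize) -> bool {
    message_len <= MAX_MESSAGE_SIZE
}

/// Length of the padded base64 encoding of `raw_len` bytes:
/// every group of up to 3 input bytes becomes 4 output characters.
pub fn base64_len(raw_len: usize) -> usize {
    (raw_len + 2) / 3 * 4
}

/// Largest raw payload whose base64 form still fits into `limit` bytes.
pub fn max_raw_len(limit: usize) -> usize {
    limit / 4 * 3
}
```

With these helpers, fits(23_800_033) is false, and max_raw_len(MAX_MESSAGE_SIZE) gives 12_582_912 bytes (12 MiB) as a rough ceiling for the raw PDF under the stated assumption.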

Chrome stays idle infinitely

I'm on macOS and set up chromiumoxide for use with async-std. I tried to scrape text from a webpage, and it looks like it hangs when calling browser.new_page(). Chrome says "Establishing secure connection..." and stays like that indefinitely. It only terminates if I kill the process or Cmd+Q Chrome itself.

(I've also tried other URLs such as https://www.google.com too to see what happens and it loads fully (no throbber or "Establishing secure connection..." message), but still stays stuck endlessly.)

Is there something I'm missing that's causing things to stay running infinitely like this?

This is the code I'm running; it's an adaptation of the Usage example on the crates.io page:

use chromiumoxide::browser::{Browser, BrowserConfig};
use futures::StreamExt;

#[async_std::main]
pub async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let url =
        "https://www.allrecipes.com/recipe/283601/sesame-seared-tuna-and-sushi-bar-spinach-salad/"
            .to_owned();
    let (browser, mut handler) =
        Browser::launch(BrowserConfig::builder().with_head().build()?).await?;
    println!("Created browser");
    let handle = async_std::task::spawn(async move {
        loop {
            let _ = handler.next().await.unwrap();
        }
    });
    println!("Created handle"); // Last print statement that actually executes.
    let page = browser.new_page(&url).await?;
    println!("Opened page to URL: {}", &url);
    let title = page
        .find_element("h1.headline.heading-content")
        .await?
        .inner_text()
        .await?
        .map(|text| text.trim().to_owned());
    println!(
        "title = {}",
        match title {
            Some(text) => text,
            None => "".to_owned(),
        }
    );
    handle.await;
    Ok(())
}

`find_elements` sometimes can't find `<a>` nodes

I've been having problems with find_elements, but only on some pages like this one, where it returns an error:

Chrome(Error { code: -32000, message: "Could not find node with given id" })

I'm basically doing:

page.goto("https://www.newegg.ca/westinghouse-wh27fx9019-27-full-hd/p/N82E16824569002").await?;
page.find_elements("a").await.unwrap();

Selecting other tags (e.g. <button>) on that page works fine.

Given there is a sizeable number of <a> tags (approx. 1900) on the page, I thought that could be the problem, but it can successfully find them in Wikipedia pages with a lot more links (upwards of 2500).

I also tried wait_for_navigation(), but that didn't help either.

Any idea what could be the problem?

Method trait shouldn't take `self`

The Method trait requires an instance for the functions that return the identifier.
Since all identifiers are known beforehand and are constants anyway, making this a type-level function would make things easier and allow for more convenience in some parts of the API.

Drawback of this approach: custom events would be limited in how they determine their identifiers, since these would be restricted to static values.
However, dynamic identifiers can't be used when listening to events anyway, since the identifiers must be known beforehand.

The best solution would be to introduce another trait that focuses on the type level.
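A rough sketch of the two-trait idea, under the assumption that identifiers are string constants. MethodType, identifier_of, Navigate and CustomEvent are hypothetical names used only for illustration; the trait shapes are not taken from the crate.

```rust
use std::borrow::Cow;

/// Type-level view: the identifier is available without an instance.
pub trait MethodType {
    const IDENTIFIER: &'static str;
}

/// Instance-level view, still usable for identifiers chosen at runtime.
pub trait Method {
    fn identifier(&self) -> Cow<'static, str>;
}

/// Every type with a static identifier gets the instance-level method for free.
impl<T: MethodType> Method for T {
    fn identifier(&self) -> Cow<'static, str> {
        Cow::Borrowed(T::IDENTIFIER)
    }
}

/// Looks up an identifier from the type alone, e.g. to register a listener
/// before any value of that type exists.
pub fn identifier_of<T: MethodType>() -> &'static str {
    T::IDENTIFIER
}

/// Example command type with a fixed identifier.
pub struct Navigate;

impl MethodType for Navigate {
    const IDENTIFIER: &'static str = "Page.navigate";
}

/// Example custom event whose identifier is only known at runtime,
/// so it implements the instance-level trait directly.
pub struct CustomEvent {
    pub name: String,
}

impl Method for CustomEvent {
    fn identifier(&self) -> Cow<'static, str> {
        Cow::Owned(self.name.clone())
    }
}
```

In this arrangement, code that listens for events can require MethodType, while code that only sends or logs a value can accept anything implementing Method.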

Set language to English

I am not a Chrome expert, but regarding the README notice about English (A: Check that your chromium language settings are set to English), I am wondering if it would help to pass --lang=en_US to the process?
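As an illustration of what passing such a flag to the process means, the sketch below builds a command line with std::process::Command without spawning anything. browser_command is a hypothetical helper, the binary name is supplied by the caller, and whether --lang actually resolves the README notice is not verified here.

```rust
use std::process::Command;

/// Builds (but does not spawn) a browser command with the UI language flag set.
/// Both `binary` and `lang` are passed through unchanged.
pub fn browser_command(binary: &str, lang: &str) -> Command {
    let mut cmd = Command::new(binary);
    cmd.arg(format!("--lang={}", lang));
    cmd
}
```

For example, browser_command("chromium", "en-US") yields a command whose only argument is --lang=en-US. Chromium locale codes are usually written with a hyphen, but treat the exact value as an assumption to check against your browser.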

Request interception

Is request interception supported? I see NetworkManager functions that seem like they may enable request interception (they appear to be related to Handler and not Page), but I don't see how to implement them in my own code.
