mattsse / chromiumoxide
Chrome DevTools Protocol Rust API
License: Apache License 2.0
Is it possible to use the Network.webSocketFrameReceived CDP event? If so, how would you use it?
I went through the examples and couldn't find any code that creates contexts, etc.
I want to translate this code
let browser = await firefox.launch({headless:false})
let context = await browser.newContext();
let page = await context.newPage();
await page.goto("https://jsonip.com");
I'm on macOS and set up chromiumoxide
for use with async-std
. I tried to scrape text from a webpage, and it looks like it hangs when calling browser.new_page()
. Chrome says "Establishing secure connection..." and stays like that infinitely. It only terminates if I kill the process or Cmd+Q Chrome itself.
(I've also tried other URLs such as https://www.google.com to see what happens; it loads fully (no throbber or "Establishing secure connection..." message), but still stays stuck endlessly.)
Is there something I'm missing that's causing things to stay running infinitely like this?
This is the code I'm running; it's an adaptation of the Usage example on the crates.io page:
use chromiumoxide::browser::{Browser, BrowserConfig};
use futures::StreamExt;
#[async_std::main]
pub async fn main() -> Result<(), Box<dyn std::error::Error>> {
let url =
"https://www.allrecipes.com/recipe/283601/sesame-seared-tuna-and-sushi-bar-spinach-salad/"
.to_owned();
let (browser, mut handler) =
Browser::launch(BrowserConfig::builder().with_head().build()?).await?;
println!("Created browser");
let handle = async_std::task::spawn(async move {
loop {
let _ = handler.next().await.unwrap();
}
});
println!("Created handle"); // Last print statement that actually executes.
let page = browser.new_page(&url).await?;
println!("Opened page to URL: {}", &url);
let title = page
.find_element("h1.headline.heading-content")
.await?
.inner_text()
.await?
.map(|text| text.trim().to_owned());
println!(
"title = {}",
match title {
Some(text) => text,
None => "".to_owned(),
}
);
handle.await;
Ok(())
}
I am not a Chrome expert, but concerning the README notice about English ("A: Check that your chromium language settings are set to English"), I am wondering if it would help to pass --lang=en_US to the process?
The request_timeout configured via chromiumoxide::browser::BrowserConfigBuilder never seems to be honored.
Chromiumoxide version: git+https://github.com/mattsse/chromiumoxide?branch=main#dd1003e68e9d9914c28cbb7d6022996833a465d9
Code:
use {
chromiumoxide::browser::{Browser, BrowserConfig},
futures::StreamExt,
std::time::{Duration, Instant}
};
#[async_std::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let (browser, mut handler) = Browser::launch(
BrowserConfig::builder()
.request_timeout(Duration::from_secs(1))
.build()?
).await?;
let _handle = async_std::task::spawn(async move {
loop {
let _ = handler.next().await.unwrap();
}
});
let start = Instant::now();
let site = "https://photricity.com/flw/"; // this is endless loading website prank
let page = browser.new_page(site).await?;
let _html = page.wait_for_navigation().await?.content().await?;
println!("Time elapsed: {:?}", start.elapsed());
Ok(())
}
I expected the call not to exceed the specified timeout (1 second), but quite the opposite happened.
The same thing happens with the timeout field in chromiumoxide::cdp::js_protocol::runtime::EvaluateParams.
Currently chromiumoxide relies on chromium being already installed on the system.
Like puppeteer, an option to bundle a chromium executable directly should be supported.
I saw references to NetworkIdle in the source; I wonder if it's supported yet to wait until the network has been idle for X amount of time. This is a huge benefit of puppeteer vs webdriver, especially for JS web crawlers, or just to simplify actions on dynamic content. If it is supported and I've simply missed it, examples or documentation on this particular feature would really help boost this library, I believe! :)
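The bookkeeping behind such a network-idle wait can be sketched in plain Rust. This is only an illustration of the idea; the NetworkIdleTracker type and its methods are invented for this sketch and are not part of chromiumoxide:

```rust
use std::time::{Duration, Instant};

/// Tracks in-flight requests and reports when the network has been
/// "idle" (no more than `max_inflight` pending requests) for at least
/// `idle_for`. Mirrors the bookkeeping behind Puppeteer's
/// networkidle0/networkidle2 waits, purely as an illustration.
struct NetworkIdleTracker {
    inflight: usize,
    max_inflight: usize,
    idle_for: Duration,
    idle_since: Option<Instant>,
}

impl NetworkIdleTracker {
    fn new(max_inflight: usize, idle_for: Duration) -> Self {
        let mut t = Self { inflight: 0, max_inflight, idle_for, idle_since: None };
        t.update();
        t
    }

    fn request_started(&mut self) {
        self.inflight += 1;
        self.update();
    }

    fn request_finished(&mut self) {
        self.inflight = self.inflight.saturating_sub(1);
        self.update();
    }

    /// Start or reset the idle clock depending on the in-flight count.
    fn update(&mut self) {
        if self.inflight <= self.max_inflight {
            self.idle_since.get_or_insert_with(Instant::now);
        } else {
            self.idle_since = None;
        }
    }

    fn is_idle(&self) -> bool {
        self.idle_since
            .map(|t| t.elapsed() >= self.idle_for)
            .unwrap_or(false)
    }
}

fn main() {
    let mut tracker = NetworkIdleTracker::new(0, Duration::from_millis(50));
    tracker.request_started();
    assert!(!tracker.is_idle()); // a request is still in flight
    tracker.request_finished();
    std::thread::sleep(Duration::from_millis(60));
    assert!(tracker.is_idle()); // quiet for longer than the threshold
    println!("network idle detected");
}
```

In a real handler, request_started/request_finished would be driven by Network.requestWillBeSent and Network.loadingFinished/loadingFailed events.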
Hello,
Is there a way to tell if a page has successfully navigated or not? Ideally I want a response code. Puppeteer's goto and waitForNavigation seem to resolve to an HTTPResponse.
Is that accessible somewhere, or not implemented yet?
In this function, the stderr reader is taken and dropped, which, in my testing, is not a good idea.
In my simple test, the parent process drops the reader for the child's stdout, and the child gets a broken-pipe error if it tries to write anything to stdout. In Rust this causes a panic because of this line; I don't know what would happen in a C++ program.
parent code:
use std::io::{BufRead, BufReader};
use std::process::{Command, Stdio};
use std::thread::sleep;
use std::time::Duration;

fn main() {
    let file = "target/release/child";
    let mut result = Command::new(file)
        .stdout(Stdio::piped())
        .spawn()
        .unwrap();
    let stdout = result.stdout.take().unwrap();
    let mut reader = BufReader::new(stdout);
    let mut line = String::new();
    reader.read_line(&mut line).unwrap();
    println!("read {} bytes", line.len());
    drop(reader);
    loop {
        println!("waiting");
        sleep(Duration::from_secs(1));
    }
}
child code:
use std::thread::sleep;
use std::time::{Duration, UNIX_EPOCH};

fn main() {
    loop {
        println!("{}", "FOOFOOFOOFOOFOO".repeat(128));
        std::fs::write(
            "child.timestamp",
            UNIX_EPOCH.elapsed().unwrap().as_secs().to_string(),
        )
        .unwrap();
        sleep(Duration::from_secs(1));
    }
}
I can't figure out how to download images; I need an example like this:
Hi, I was trying to implement an HTML-to-PNG service on top of this crate, and with your great work the service is almost done. But unfortunately, before putting the service into a production environment, we tested it with a million random requests. At first everything looked fine, but after some time, maybe an hour, maybe a day, the service would hang forever.
Each service instance creates 8 chromium instances. Each request to my service generates HTML to be rendered; with debug=true in the query string the generated HTML is returned directly, and without this flag my service randomly chooses a chromium instance and sets the content on it.
When this issue happens, my service thread is busy (100% CPU used in htop), and some (1 to 2) chromium threads use 30% to 70% CPU; the rest are sleeping. Requests with debug=true, which respond without calling chromiumoxide, succeed. Requests without debug=true hang forever without any error, even though I've set request_timeout to 5 secs.
I attached GDB to my service thread, then repeatedly paused it and printed the thread backtrace. After a dozen repetitions, here is my result:
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790dd0032 in chromiumoxide::cmd::CommandChain::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790e8d12a in chromiumoxide::handler::target::Target::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790dd0032 in chromiumoxide::cmd::CommandChain::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790dcff9c in chromiumoxide::cmd::CommandChain::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790dd0214 in chromiumoxide::cmd::CommandChain::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790e8d17f in chromiumoxide::handler::target::Target::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790e8d192 in chromiumoxide::handler::target::Target::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790dd0032 in chromiumoxide::cmd::CommandChain::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790dd0032 in chromiumoxide::cmd::CommandChain::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790dcff94 in chromiumoxide::cmd::CommandChain::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790e8d1a1 in chromiumoxide::handler::target::Target::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790e8d101 in chromiumoxide::handler::target::Target::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x00005647913a220e in core::tuple::<impl core::cmp::Ord for (A,B)>::cmp () at /rustc/88f19c6dab716c6281af7602e30f413e809c5974/library/core/src/tuple.rs:58
58 /rustc/88f19c6dab716c6281af7602e30f413e809c5974/library/core/src/tuple.rs: No such file or directory.
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790e8d187 in chromiumoxide::handler::target::Target::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790e8d18d in chromiumoxide::handler::target::Target::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790dd0214 in chromiumoxide::cmd::CommandChain::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790dd0032 in chromiumoxide::cmd::CommandChain::poll ()
Running the following main.rs
file will only print 1
(the one opened via browser.new_page
) even if other pages are already open or new ones get opened manually through the browser UI:
use futures::StreamExt;
use chromiumoxide::error::Result;
use chromiumoxide::browser::Browser;
mod get_debug_ws_url;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let debug_ws_url = get_debug_ws_url::get_ws_url().await?;
let (browser, mut handler) =
Browser::connect(debug_ws_url).await?;
let handle = async_std::task::spawn(async move {
loop {
let _ = handler.next().await.unwrap();
}
});
let my_handle = async_std::task::spawn(async move {
let page = browser.new_page("https://en.wikipedia.org").await;
loop {
let pages = browser.pages().await;
let pages = pages.unwrap_or_default();
println!("{:?}", pages.len());
async_std::task::sleep(std::time::Duration::from_secs(5)).await;
}
});
handle.await;
Ok(())
}
Edit: Also, this is the get_debug_ws_url.rs
:
use std::collections::HashMap;
pub async fn get_ws_url() -> Result<std::string::String, Box<dyn std::error::Error>> {
let resp = reqwest::get("http://127.0.0.1:9222/json/version")
.await?
.json::<HashMap<String, String>>()
.await?;
let web_socket_debugger_url = resp["webSocketDebuggerUrl"].clone();
Ok(web_socket_debugger_url)
}
Ref #64
Illegal URLs should be detected before even attempting to navigate to them.
The Page::evaluate interface expects an expression to be evaluated in the browser.
page.evaluate("1+2") results in a value of 3, whereas page.evaluate("() => { return 1+2; }") results in a value of Function, because the Runtime.evaluate command expects an expression.
To evaluate (async) functions, we need a way to determine whether the expression passed to Page::evaluate is in fact a javascript function; if so, we send it via Runtime.callFunctionOn instead.
This could be achieved by writing a poor man's parser to distinguish between expression and function, or by changing the type of the expression param that the evaluate functions accept to a new trait (something like Evaluable) with two types, Expression and JsFunction, that are bundled in an enum.
There should be a new API function, Page::evaluate_function, that uses Runtime.callFunctionOn.
The most intuitive solution would probably be Page::evaluate strictly for Runtime.evaluate and Page::evaluate_function for Runtime.callFunctionOn, with possibly additional checks on Page::evaluate.
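The "poor man's parser" option could start out as a small heuristic like the one below. This is just a sketch (the is_js_function helper is invented here), and a real implementation would need something more robust than prefix checks:

```rust
/// Heuristically decide whether a snippet is a JS function (arrow
/// function or `function` keyword) rather than a plain expression.
/// Only a sketch: prefix checks like these have false positives; a
/// real implementation would want an actual parser.
fn is_js_function(expr: &str) -> bool {
    let trimmed = expr.trim();
    // `function foo() {}` (optionally prefixed with `async `)
    let rest = trimmed.strip_prefix("async ").unwrap_or(trimmed);
    if rest.starts_with("function") {
        return true;
    }
    // Arrow functions: `() => ...`, `x => ...`, `(a, b) => ...`
    if let Some(idx) = rest.find("=>") {
        let params = rest[..idx].trim();
        return (params.starts_with('(') && params.ends_with(')'))
            || (!params.is_empty()
                && params.chars().all(|c| c.is_alphanumeric() || c == '_' || c == '$'));
    }
    false
}

fn main() {
    assert!(!is_js_function("1+2"));
    assert!(is_js_function("() => { return 1+2; }"));
    assert!(is_js_function("async () => fetch('/')"));
    assert!(is_js_function("function add(a, b) { return a + b; }"));
    println!("ok");
}
```

A classifier like this could back an Evaluable enum (Expression vs JsFunction) so Page::evaluate can route to Runtime.evaluate or Runtime.callFunctionOn accordingly.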
Below is the code I am using to test oxide.
let baseurl = "http://www.google.com".to_owned();
let (browser, mut handler) =
Browser::launch(BrowserConfig::builder().with_head().build()?).await?;
let handle = tokio::task::spawn(async move {
loop {
let _ = handler.next().await.unwrap();
}
});
let _p = browser.new_page(baseurl.clone()).await?;
handle.await?;
Ok(())
The page comes up, but the viewport/webpage does not expand to the window extents. Is there something that would cause this resize issue?
In the event the websocket crashes, no more commands can be sent or received, so in that case we should
I need to use browser.on events similar to what puppeteer documents here:
https://pptr.dev/#?product=Puppeteer&version=v5.5.0&show=api-class-browser
The main ones I'm interested in are browser.on('targetchanged')
and browser.on('targetcreated')
.
I think puppeteer sends Target.setDiscoverTargets [1] on startup and emits those events when it receives Target.targetInfoChanged and Target.targetCreated respectively.
I'm willing to take this on however I'm unsure on the correct way to send messages to the connection.
Hi Team,
I need to generate a pdf file from static HTML content instead of a URL.
There does not seem to be a way to set a timeout for navigation.
Support for several BrowserContextId
s is missing.
Allow activating incognito mode for an already running browser as well as from the get go.
Working on a project where it would be super useful to know what gets logged to the browser console, and we're handling it like so:
let mut events = page.event_listener::<EventEntryAdded>().await?;
eprintln!("Logging to console...");
page
.evaluate("console.log('foo')")
.await?;
while let Some(event) = events.next().await {
eprintln!("{:?}", event);
}
The problem is that events.next() seems to be hanging forever.
Is this the correct way to get messages from the browser console?
Headless chrome added support to detect more browsers (https://github.com/atroche/rust-headless-chrome/blob/aaa6cb03efefbe67996f35802860c6d7fce3b14b/src/browser/mod.rs#L430-L442), do you want me to port those?
I don't think this would be an issue, but we probably want to rework the order in which they are tried.
The biggest gain is most likely from the support of Edge.
That's it.
If a Node inspect process is launched with --inspect-brk, the first paused event is fired with pause reason "Break on start", but PauseReason is missing this variant.
My env:
Hi, there have been some changes since the last publish; is there any plan to publish chromiumoxide 0.3.0?
windows 10
cargo run
Updating git repository https://github.com/mattsse/chromiumoxide
Updating git repository https://github.com/sdroege/async-tungstenite
Updating crates.io index
Blocking waiting for file lock on package cache
Blocking waiting for file lock on package cache
Downloaded input_buffer v0.4.0
Downloaded tungstenite v0.12.0
Downloaded 2 crates (62.1 KB) in 2.28s
Blocking waiting for file lock on package cache
Compiling syn v1.0.60
Compiling winapi v0.3.9
Compiling getrandom v0.2.2
Compiling input_buffer v0.4.0
Compiling rand_core v0.6.1
Compiling rand_chacha v0.3.0
Compiling rand v0.8.3
Compiling nb-connect v1.0.2
Compiling winapi-util v0.1.5
Compiling async-process v1.0.2
Compiling atty v0.2.14
Compiling termcolor v1.1.2
Compiling ctor v0.1.19
Compiling serde_derive v1.0.123
Compiling futures-macro v0.3.12
Compiling thiserror-impl v1.0.23
Compiling async-attributes v1.1.2
Compiling value-bag v1.0.0-alpha.6
Compiling log v0.4.14
Compiling futures-util v0.3.12
Compiling polling v2.0.2
Compiling kv-log-macro v1.0.7
Compiling tungstenite v0.12.0
Compiling env_logger v0.7.1
Compiling thiserror v1.0.23
Compiling async-io v1.3.1
Compiling which v4.0.2
Compiling pretty_env_logger v0.4.0
Compiling async-global-executor v2.0.2
Compiling async-std v1.9.0
Compiling futures-executor v0.3.12
Compiling futures v0.3.12
Compiling serde v1.0.123
Compiling serde_json v1.0.62
Compiling async-tungstenite v0.12.0 (https://github.com/sdroege/async-tungstenite#f51f0744)
Compiling chromiumoxide_types v0.2.0 (https://github.com/mattsse/chromiumoxide?branch=main#7d7ef5a5)
Compiling chromiumoxide_pdl v0.2.0 (https://github.com/mattsse/chromiumoxide?branch=main#7d7ef5a5)
Compiling chromiumoxide_cdp v0.2.0 (https://github.com/mattsse/chromiumoxide?branch=main#7d7ef5a5)
Compiling chromiumoxide v0.2.0 (https://github.com/mattsse/chromiumoxide?branch=main#7d7ef5a5)
error[E0432]: unresolved import crate::browser::process::get_chrome_path_from_registry
--> C:\Users\52752.cargo\bin\git\checkouts\chromiumoxide-b0a6fd9b32d92502\7d7ef5a\src\browser.rs:560:13
|
560 | use crate::browser::process::get_chrome_path_from_registry;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ no get_chrome_path_from_registry
in process
error: aborting due to previous error
For more information about this error, try rustc --explain E0432
.
error: could not compile chromiumoxide
To learn more, run the command again with --verbose.
warning: build failed, waiting for other jobs to finish...
error: build failed
F:\web\xiaolu\rust\greetings\web>
I'm not sure if waiting for Target.targetdestroyed is necessary if assuming target eventually gets cleaned up. Page gets consumed and if you are dealing with raw ids you are on your own.
Agree. But there is a potential race condition in the following scenario:
let el = page.find_element("input#searchInput").await?;
...
page.close().await?;
...
// this will fail because page is already closed.
el.click().await?;
This el.click() will fail regardless, but at the moment the reason it fails is that the click command is sent to the browser and the browser returns an error. This happens because the Target of the Page has not been removed yet and is still able to receive events sent from an Element. Preferably, el.click() should fail with a SendError, because the receiver should already be dropped at this point, preventing the overhead of sending a request that is guaranteed to fail.
I think it's more important to figure out a way to just destroy the page target, since for the majority of cases you don't care about unload and just want to clean up.
This can be done with Target.closeTarget instead; that's how puppeteer does it.
Originally posted by @mattsse in #12 (comment)
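The fail-fast behavior argued for above can be illustrated with a plain std channel. This is only a sketch of the principle, not chromiumoxide internals: once the receiving half is dropped, send fails locally with a SendError and nothing goes over the wire.

```rust
use std::sync::mpsc;

fn main() {
    let (tx, rx) = mpsc::channel::<&str>();

    // While the receiver is alive, sending succeeds.
    assert!(tx.send("click").is_ok());

    // Dropping the receiver models the Page's Target being destroyed.
    drop(rx);

    // Now the send fails locally with a SendError, with no request
    // ever leaving the process -- the desired fail-fast behavior.
    assert!(tx.send("click").is_err());

    println!("send after drop failed fast, as expected");
}
```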
I tried to run cargo run --example wiki-tokio --features="tokio-runtime"; it should click input#searchInput and type text about the "Rust programming language". But it actually clicks the element #ca-history, so the example "wiki-tokio" ends up opening https://en.wikipedia.org/w/index.php?title=Main_Page&action=history. I've been thinking about this for a long time, but I don't know why. My Chrome version is Version 89.0.4389.90 (Official Build) (64-bit).
The pdl generator runs rustfmt internally; this shouldn't result in a panic if rustfmt is not installed.
Listening for specific events is currently not supported. Instead, the Handler is a Stream of CdpEvents; however, it doesn't actually yield them, only the CdpErrors.
It would be nice to listen for a specific kind of event, like Page.animationCreated, asynchronously.
The Handler should probably be refactored as Stream<Item = CdpError> or as a simple Future that never finishes.
The individual Targets should allow registering event listeners that subscribe to a specific event and receive it via a channel. Since there can be several event listeners for each topic, the subscriptions are basically HashMap<Topic, Vec<EventListener>>. Also, we don't want to clone the event for every listener; instead, the event in question should be wrapped in an Arc and sent to every event listener channel without cloning.
Sending events as trait objects (something along the lines of Arc<dyn Event>) would be useful and would make using event listeners more convenient.
Support for introducing custom events should also be considered. In order to prevent mass cloning of serde_json::Value, we somehow need to pass a converter that turns serde_json::Value into Arc<dyn Event> along with the registration of the event listener.
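The subscription map and Arc-based fan-out described above can be sketched with std channels. The Topic, Event, and Subscriptions names are invented for this illustration and are not chromiumoxide's types:

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Arc;

// Hypothetical stand-ins for this sketch: a topic key and an event payload.
type Topic = &'static str;

#[derive(Debug)]
struct Event {
    name: String,
}

/// Subscriptions per topic; each listener gets an `Arc` clone (a cheap
/// refcount bump), never a deep copy of the event itself.
struct Subscriptions {
    listeners: HashMap<Topic, Vec<Sender<Arc<Event>>>>,
}

impl Subscriptions {
    fn new() -> Self {
        Self { listeners: HashMap::new() }
    }

    fn subscribe(&mut self, topic: Topic) -> Receiver<Arc<Event>> {
        let (tx, rx) = channel();
        self.listeners.entry(topic).or_default().push(tx);
        rx
    }

    fn publish(&self, topic: Topic, event: Event) {
        let event = Arc::new(event);
        if let Some(subs) = self.listeners.get(topic) {
            for tx in subs {
                // Cloning the Arc only increments a reference count.
                let _ = tx.send(Arc::clone(&event));
            }
        }
    }
}

fn main() {
    let mut subs = Subscriptions::new();
    let rx1 = subs.subscribe("Page.animationCreated");
    let rx2 = subs.subscribe("Page.animationCreated");

    subs.publish("Page.animationCreated", Event { name: "anim".into() });

    assert_eq!(rx1.recv().unwrap().name, "anim");
    assert_eq!(rx2.recv().unwrap().name, "anim");
    println!("both listeners received the same Arc'd event");
}
```

The trait-object variant would swap Arc<Event> for Arc<dyn Event>, at the cost of downcasting on the listener side.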
I get this error whilst using Page::pdf
, I assume the PDF is too large for the internal channel that is used to communicate data?
thread 'actix-server worker 1' panicked at 'called `Result::unwrap()` on an `Err` value: Ws(Capacity(MessageTooLong { size: 23800033, max_size: 16777216 }))', src/main.rs:28:43
I don't have a backtrace unfortunately because async destroys any useful information in it.
Relevant code
actix_web::rt::spawn(async move {
loop {
handler.next().await.unwrap().unwrap(); // panics here
}
});
Because cargo --git still assumes the "master" branch instead of HEAD, and because GitHub has changed all repositories over to main, the dependency lines in the README will fail with:
cannot locate remote-tracking branch 'origin/master'; class=Reference (4); code=NotFound
I had to specify the branch with:
chromiumoxide = { git = "https://github.com/mattsse/chromiumoxide", branch = "main"}
I'm seeing an issue that appears to be similar to the closed issue #52, in which new_page calls that fail don't use the browser's timeout and instead hang indefinitely. It should be noted that I'm on v0.3.1 and so should have the fix from PR #56.
I have a timeout on the new_page() call (longer than the browser's request timeout).
If this timeout is hit and I move on, the browser refuses to allow any other requests until it is torn down.
Here's a minimal repro:
use futures_util::stream::{StreamExt};
use chromiumoxide::browser::{Browser, BrowserConfig};
use tokio::time::{timeout, sleep, Duration};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
println!("Started...");
let (browser, mut handler) = Browser::launch(
BrowserConfig::builder()
.request_timeout(Duration::from_secs(3))
.build()?,
)
.await?;
tokio::spawn(async move {
loop {
let _ = handler.next().await.unwrap();
}
});
// Navigation to a valid sample page succeeds
attempt_navigation("https://example.com".to_owned(), &browser).await;
// Navigation to invalid url fails (and should). This is a silly example, it
// is simply to demonstrate the behaviour. When the browser cannot successfully
// open the page, it hangs indefinitely. Follow up requests (even those that
// previously worked) will fail until the browser is torn down
println!("Attempting to navigate to invalid page, should fail and timeout");
attempt_navigation("https://%wxample.com".to_owned(), &browser).await;
println!("Sleeping for a few seconds to allow timeout to happen");
sleep(Duration::from_secs(5)).await;
// Attempting the same page navigation that previously succeeded.
// -> should now fail
println!("Re-attempting to open previously successful page...");
attempt_navigation("https://example.com".to_owned(), &browser).await;
return Ok(());
}
async fn attempt_navigation(url: String, browser: &Browser) {
    let page_result = timeout(Duration::from_secs(5), browser.new_page(&url)).await;
    match page_result {
        Ok(_page) => println!("Page {} navigated to successfully", &url),
        Err(_) => println!("Request for new page timeout"),
    }
}
Hello,
I just started playing with this tool, but I can't get any selector to work besides [name='j_username'].
The code below will fail with the error "-32000: DOM Error while querying"; the same happens with [id='j_username']. Did I miss something?
let page = browser.new_page("https://www.google.com/").await?;
std::thread::sleep(std::time::Duration::from_secs(3));
page.find_element("a[href=https://policies.google.com/privacy?hl=en-MA&fg=1]")
    .await?
    .click()
    .await?;
Thanks
Would it be of any interest to add the same options for screenshots that Puppeteer has? Specifically, this is the option to take full-page screenshots and omit the page background.
Here's the options, which are the same as CaptureScreenshotParams
, but with the added fullPage
and omitBackground
properties:
https://github.com/puppeteer/puppeteer/blob/b57f3fcd5393c68f51d82e670b004f5b116dcbc3/src/common/Page.ts#L141-L149
Here's where full-page and omitting of background screenshots happen:
https://github.com/puppeteer/puppeteer/blob/b57f3fcd5393c68f51d82e670b004f5b116dcbc3/src/common/Page.ts#L1657-L1725
An example of taking a full-page screenshot:
use chromiumoxide::{
browser::{Browser, BrowserConfig},
cdp::browser_protocol::{
dom::Rect,
emulation::{ScreenOrientation, ScreenOrientationType, SetDeviceMetricsOverrideParams},
page::{CaptureScreenshotFormat, CaptureScreenshotParams, GetLayoutMetricsReturns},
},
};
use futures::StreamExt;
#[async_std::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let (browser, mut handler) = Browser::launch(BrowserConfig::builder().build()?).await?;
async_std::task::spawn(async move {
loop {
let _event = handler.next().await.unwrap();
}
});
let page = browser.new_page("https://en.wikipedia.org").await?;
page.wait_for_navigation().await?;
// Set the emulated device to the same size as the page
let GetLayoutMetricsReturns {
content_size: Rect { width, height, .. },
..
} = page.layout_metrics().await?;
page.execute(
SetDeviceMetricsOverrideParams::builder()
.mobile(false)
.width(width.ceil() as i64)
.height(height.ceil() as i64)
.device_scale_factor(1.0)
.screen_orientation(
ScreenOrientation::builder()
.angle(0)
.r#type(ScreenOrientationType::PortraitPrimary)
.build()?,
)
.build()?,
)
.await?;
page.save_screenshot(
CaptureScreenshotParams::builder()
.format(CaptureScreenshotFormat::Png)
.build(),
"wiki.png",
)
.await?;
Ok(())
}
Hi, I was trying to implement a network hook like headless_chrome::enable_request_interception, and I hope to expose a Hook trait, like this:
#[async_trait]
pub trait Hook {
async fn call(&self, request, output);
}
And as you can see, this would introduce async-trait as a dependency, and I'm not sure whether that's acceptable or not.
The code looks like this:
async fn main() {
...
let (browser, _handler) = Browser::launch(
BrowserConfig::builder()
.build()
.unwrap()
)
.await
.unwrap();
...
for u in urls {
check(browser, &u).await // it's supposed to run with threading
}
...
}
async fn check(browser: Browser, url: &str) {
let page = browser.new_page(url).await.unwrap();
let check: bool = page.evaluate("some_script")
.await
.unwrap()
.into_value()
.unwrap();
...
}
I'm really new to Rust, but this confuses me for threading/concurrency purposes.
Are there any workarounds for this?
Right now the Handler is a Stream<Item = Result<CdpEventMessage>>; however, nothing meaningful is returned. Since event subscriptions are supported now with #4, the handler should be Stream<Item = Result<()>>, so basically a stream of errors.
Upgrade tokio to v1, sdroege/async-tungstenite#72
In nodejs, if you start Puppeteer with a null viewport, the viewport is set to fit the window size.
await puppeteer.launch( {
defaultViewport : null //viewport auto adjust to window size
} )
So I suggest changing
BrowserConfigBuilder::viewport(vp: Viewport)
to
BrowserConfigBuilder::viewport(vp: Option<Viewport>)
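The suggested signature change could look roughly like this; a sketch with stub types (the Viewport and BrowserConfigBuilder here are stand-ins for the example, not the real ones):

```rust
// Stub standing in for chromiumoxide's Viewport in this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Viewport {
    width: u32,
    height: u32,
}

#[derive(Default)]
struct BrowserConfigBuilder {
    // `None` would mean "don't emulate, fit the window", mirroring
    // Puppeteer's `defaultViewport: null`.
    viewport: Option<Viewport>,
}

impl BrowserConfigBuilder {
    fn viewport(mut self, vp: Option<Viewport>) -> Self {
        self.viewport = vp;
        self
    }
}

fn main() {
    let fixed = BrowserConfigBuilder::default()
        .viewport(Some(Viewport { width: 800, height: 600 }));
    assert_eq!(fixed.viewport, Some(Viewport { width: 800, height: 600 }));

    // Null viewport: let the page fit the window instead of emulating.
    let auto = BrowserConfigBuilder::default().viewport(None);
    assert!(auto.viewport.is_none());
    println!("ok");
}
```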
The Method trait requires an instance for the functions that return the identifier.
Since all identifiers are known beforehand and are constants anyway, making this a type-level function would make things easier and allow for more convenience in some parts of the API.
Drawback of this approach: custom events face limitations in determining their identifiers, since they are restricted to statics.
However, that can't be used dynamically when listening to events, since the identifiers must be known beforehand.
The best solution would be to introduce another trait that focuses on the type level.
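One possible shape for such a type-level companion trait, sketched with invented names (MethodType and the blanket impl are assumptions for illustration, not the crate's API):

```rust
// Instance-level trait, as described above: you need a value to ask
// for the identifier.
trait Method {
    fn identifier(&self) -> &'static str;
}

// Hypothetical type-level companion: the identifier is an associated
// constant, available without an instance.
trait MethodType {
    const IDENTIFIER: &'static str;
}

struct CaptureScreenshotParams;

impl MethodType for CaptureScreenshotParams {
    const IDENTIFIER: &'static str = "Page.captureScreenshot";
}

// A blanket impl keeps the dynamic, instance-based trait working for
// everything that knows its identifier at the type level.
impl<T: MethodType> Method for T {
    fn identifier(&self) -> &'static str {
        T::IDENTIFIER
    }
}

fn main() {
    // No instance needed:
    assert_eq!(CaptureScreenshotParams::IDENTIFIER, "Page.captureScreenshot");
    // Instance-based access still works via the blanket impl:
    assert_eq!(CaptureScreenshotParams.identifier(), "Page.captureScreenshot");
    println!("ok");
}
```

Custom events that only know their identifier at runtime would keep implementing the instance-level trait directly, which is the limitation noted above.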
Currently, the viewport will emulate a specified size or the default (800x600).
I'd like to be able to disable this when using Browser::connect(), as it looks weird otherwise.
Two options come to mind:
- Accept a BrowserConfig as a second parameter to Browser::connect().
- Add a disable_viewport_emulation field to the Viewport struct (set to true to disable viewport emulation), similar to what Puppeteer does when you set defaultViewport to null.
I've been having problems with find_elements, but only on some pages like this one, where it returns an error:
Chrome(Error { code: -32000, message: "Could not find node with given id" })
I'm basically doing:
page.goto("https://www.newegg.ca/westinghouse-wh27fx9019-27-full-hd/p/N82E16824569002").await?;
page.find_elements("a").await.unwrap();
Selecting other tags (ex <button>
) on that page works fine.
Given there is a sizeable number of <a> tags (approx 1900) on the page, I thought that could be the problem, but it can successfully find them on wikipedia pages with a lot more links (upwards of 2500).
I tried to wait_for_navigation()
but that also didn't pan out.
Any idea what could be the problem?
Is request interception supported? I see NetworkManager
functions that seem like they may enable request interception (they appear to be related to Handler
and not Page
), but I don't see how to implement them in my own code.
Support for the API functions Page::go_back and Page::go_forward should be added.
This would require tracking the history in the Target and probably needs additional internal Message variants.
The size of the stored history should be configurable.
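The configurable-size history could be a simple bounded deque; the following is a hedged sketch with invented names, not the crate's actual design:

```rust
use std::collections::VecDeque;

/// Bounded navigation history: `go_back` moves the current entry onto
/// a forward stack, new navigations clear the forward stack, and the
/// oldest back entries are evicted once `max_size` is reached.
struct NavigationHistory {
    back: VecDeque<String>,
    forward: Vec<String>,
    current: Option<String>,
    max_size: usize,
}

impl NavigationHistory {
    fn new(max_size: usize) -> Self {
        Self { back: VecDeque::new(), forward: Vec::new(), current: None, max_size }
    }

    fn navigate(&mut self, url: &str) {
        if let Some(prev) = self.current.take() {
            if self.back.len() == self.max_size {
                self.back.pop_front(); // evict the oldest entry
            }
            self.back.push_back(prev);
        }
        self.forward.clear();
        self.current = Some(url.to_string());
    }

    fn go_back(&mut self) -> Option<&str> {
        let prev = self.back.pop_back()?;
        if let Some(cur) = self.current.replace(prev) {
            self.forward.push(cur);
        }
        self.current.as_deref()
    }

    fn go_forward(&mut self) -> Option<&str> {
        let next = self.forward.pop()?;
        if let Some(cur) = self.current.replace(next) {
            self.back.push_back(cur);
        }
        self.current.as_deref()
    }
}

fn main() {
    let mut h = NavigationHistory::new(2);
    h.navigate("a");
    h.navigate("b");
    h.navigate("c");
    assert_eq!(h.go_back(), Some("b"));
    assert_eq!(h.go_forward(), Some("c"));
    println!("ok");
}
```

In the real Target this would be fed by navigation events rather than direct calls, and go_back/go_forward would issue Page.navigateToHistoryEntry commands.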
Does this project support frames yet?
Hi,
If a browser is started with the following config:
BrowserConfig::builder().incognito().with_head().build()
...later calls of Browser::start_incognito_context
do not open an incognito context.
If .incognito()
above is removed, everything works as expected.
Page::event_listener() doesn't receive events. The following is a small test listening for Fetch.requestPaused (I've also tried listening for other events and received nothing):
use futures::StreamExt;
use chromiumoxide::error::Result;
use chromiumoxide::browser::Browser;
use chromiumoxide_cdp::cdp::browser_protocol::fetch;
mod get_debug_ws_url;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let debug_ws_url = get_debug_ws_url::get_ws_url().await?;
let (browser, mut handler) =
Browser::connect(debug_ws_url).await?;
let handle = async_std::task::spawn(async move {
loop {
let _ = handler.next().await.unwrap();
}
});
//Open new page to get page context
let page = browser.new_page("http://www.example.com").await?;
//send `Fetch.enable` - tells browser to pause on request to pattern (http://test/)
page.execute(fetch::EnableParams {
patterns: vec![
fetch::RequestPattern {
url_pattern: "http://test/*".to_string().into(),
resource_type: None,
request_stage: fetch::RequestStage::Request.into(),
}].into(),
handle_auth_requests: None,
}).await?;
//listen for `Fetch.requestPaused`
let mut events = page.event_listener::<fetch::EventRequestPaused>().await?;
async_std::task::spawn(async move {
while let Some(event) = events.next().await {
println!("Fetch.requestPaused"); // Fetch.requestPaused is never received
}
});
//navigate to http://test/
page.goto("http://test/").await?; //this correctly pauses
handle.await;
Ok(())
}
Edit: Also, here's get_debug_ws_url.rs
.
use std::collections::HashMap;
pub async fn get_ws_url() -> Result<std::string::String, Box<dyn std::error::Error>> {
let resp = reqwest::get("http://127.0.0.1:9222/json/version")
.await?
.json::<HashMap<String, String>>()
.await?;
let web_socket_debugger_url = resp["webSocketDebuggerUrl"].clone();
Ok(web_socket_debugger_url)
}
Is it supported?
Running browser.execute(EnableParams {}).await?
fails with:
Error: Chrome(Error { code: -32601, message: "'HeapProfiler.enable' wasn't found" })