mattsse / chromiumoxide
Chrome DevTools Protocol Rust API
License: Apache License 2.0
Is it possible to use the Network.webSocketFrameReceived CDP event? If so, how would you use it?
I went through the examples and couldn't find any code that creates contexts, etc.
I want to translate this code
let browser = await firefox.launch({headless:false})
let context = await browser.newContext();
let page = await context.newPage();
await page.goto("https://jsonip.com");
I'm on macOS and set up chromiumoxide
for use with async-std
. I tried to scrape text from a webpage, and it looks like it hangs when calling browser.new_page()
. Chrome says "Establishing secure connection..." and stays like that infinitely. It only terminates if I kill the process or Cmd+Q Chrome itself.
(I've also tried other URLs such as https://www.google.com to see what happens; it loads fully (no throbber or "Establishing secure connection..." message), but still stays stuck endlessly.)
Is there something I'm missing that's causing things to stay running infinitely like this?
This is the code I'm running; it's an adaptation of the Usage example on the crates.io page:
use chromiumoxide::browser::{Browser, BrowserConfig};
use futures::StreamExt;
#[async_std::main]
pub async fn main() -> Result<(), Box<dyn std::error::Error>> {
let url =
"https://www.allrecipes.com/recipe/283601/sesame-seared-tuna-and-sushi-bar-spinach-salad/"
.to_owned();
let (browser, mut handler) =
Browser::launch(BrowserConfig::builder().with_head().build()?).await?;
println!("Created browser");
let handle = async_std::task::spawn(async move {
loop {
let _ = handler.next().await.unwrap();
}
});
println!("Created handle"); // Last print statement that actually executes.
let page = browser.new_page(&url).await?;
println!("Opened page to URL: {}", &url);
let title = page
.find_element("h1.headline.heading-content")
.await?
.inner_text()
.await?
.map(|text| text.trim().to_owned());
println!(
"title = {}",
match title {
Some(text) => text,
None => "".to_owned(),
}
);
handle.await;
Ok(())
}
I am not a Chrome expert, but concerning the README notice about English ("A: Check that your chromium language settings are set to English"), I am wondering if it would help to pass --lang=en_US to the process?
The request_timeout configured via chromiumoxide::browser::BrowserConfigBuilder never seems to be honored.
Chromiumoxide version: git+https://github.com/mattsse/chromiumoxide?branch=main#dd1003e68e9d9914c28cbb7d6022996833a465d9
Code:
use {
chromiumoxide::browser::{Browser, BrowserConfig},
futures::StreamExt,
std::time::{Duration, Instant}
};
#[async_std::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let (browser, mut handler) = Browser::launch(
BrowserConfig::builder()
.request_timeout(Duration::from_secs(1))
.build()?
).await?;
let _handle = async_std::task::spawn(async move {
loop {
let _ = handler.next().await.unwrap();
}
});
let start = Instant::now();
let site = "https://photricity.com/flw/"; // this is endless loading website prank
let page = browser.new_page(site).await?;
let _html = page.wait_for_navigation().await?.content().await?;
println!("Time elapsed: {:?}", start.elapsed());
Ok(())
}
I expected the call not to exceed the specified timeout (1 second), but quite the opposite happened.
The same thing happens with the timeout field in chromiumoxide::cdp::js_protocol::runtime::EvaluateParams.
Currently chromiumoxide relies on chromium being already installed on the system.
Like puppeteer, an option to bundle a chromium executable directly should be supported.
I saw references to NetworkIdle in the source; I wonder if it's supported yet to wait until the network has been idle for X amount of time. This is a huge benefit of puppeteer vs webdriver, especially for JS web crawlers, or just to simplify actions on dynamic content. If it is supported and I've simply missed it, examples or documentation on this particular feature would really help boost this library, I believe! :)
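The bookkeeping behind such a network-idle wait can be sketched in plain Rust. This is only an illustration of the idea; the NetworkIdleTracker type and its methods are invented for this sketch and are not part of chromiumoxide:

```rust
use std::time::{Duration, Instant};

/// Tracks in-flight requests and reports when the network has been
/// "idle" (no more than `max_inflight` pending requests) for at least
/// `idle_for`. Mirrors the bookkeeping behind Puppeteer's
/// networkidle0/networkidle2 waits, purely as an illustration.
struct NetworkIdleTracker {
    inflight: usize,
    max_inflight: usize,
    idle_for: Duration,
    idle_since: Option<Instant>,
}

impl NetworkIdleTracker {
    fn new(max_inflight: usize, idle_for: Duration) -> Self {
        let mut t = Self { inflight: 0, max_inflight, idle_for, idle_since: None };
        t.update();
        t
    }

    fn request_started(&mut self) {
        self.inflight += 1;
        self.update();
    }

    fn request_finished(&mut self) {
        self.inflight = self.inflight.saturating_sub(1);
        self.update();
    }

    /// Start or reset the idle clock depending on the in-flight count.
    fn update(&mut self) {
        if self.inflight <= self.max_inflight {
            self.idle_since.get_or_insert_with(Instant::now);
        } else {
            self.idle_since = None;
        }
    }

    fn is_idle(&self) -> bool {
        self.idle_since
            .map(|t| t.elapsed() >= self.idle_for)
            .unwrap_or(false)
    }
}

fn main() {
    let mut tracker = NetworkIdleTracker::new(0, Duration::from_millis(50));
    tracker.request_started();
    assert!(!tracker.is_idle()); // a request is still in flight
    tracker.request_finished();
    std::thread::sleep(Duration::from_millis(60));
    assert!(tracker.is_idle()); // quiet for longer than the threshold
    println!("network idle detected");
}
```

In a real handler, request_started/request_finished would be driven by Network.requestWillBeSent and Network.loadingFinished/loadingFailed events.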
Hello,
Is there a way to tell if a page has successfully navigated or not? Ideally I want a response code. Puppeteer's goto and waitForNavigation seem to resolve to an HTTPResponse.
Is that accessible somewhere, or not implemented yet?
In this function, the stderr reader is taken and dropped, which, in my testing, is not a good idea.
In my simple test, the parent process drops the reader for the child's stdout, and the child gets a broken-pipe error if it tries to write anything to stdout. In Rust this causes a panic because of this line; I don't know what would happen in a C++ program.
parent code:
use std::io::{BufRead, BufReader};
use std::process::{Command, Stdio};
use std::thread::sleep;
use std::time::Duration;

fn main() {
    let file = "target/release/child";
    let mut result = Command::new(file)
        .stdout(Stdio::piped())
        .spawn()
        .unwrap();
    let stdout = result.stdout.take().unwrap();
    let mut reader = BufReader::new(stdout);
    let mut line = String::new();
    reader.read_line(&mut line).unwrap();
    println!("read {} bytes", line.len());
    drop(reader);
    loop {
        println!("waiting");
        sleep(Duration::from_secs(1));
    }
}
child code:
use std::thread::sleep;
use std::time::{Duration, UNIX_EPOCH};

fn main() {
    loop {
        println!("{}", "FOOFOOFOOFOOFOO".repeat(128));
        std::fs::write(
            "child.timestamp",
            UNIX_EPOCH.elapsed().unwrap().as_secs().to_string(),
        )
        .unwrap();
        sleep(Duration::from_secs(1));
    }
}
I can't figure out how to download images; I need an example like this:
Hi, I was trying to implement an HTML-to-PNG service on top of this crate, and with your great work the service is almost done. But unfortunately, before putting the service into a production environment, we tested it with a million random requests. At first everything looked fine, but after some time, maybe an hour, maybe a day, the service would hang forever.
Each service instance creates 8 chromium instances. Each request to my service generates HTML to be rendered; with debug=true in the query string the generated HTML is returned directly, and without this flag my service randomly chooses a chromium instance and sets the content on it.
When this issue happens, my service thread is busy (100% CPU used in htop), and some (1 to 2) chromium threads use 30% to 70% CPU; the rest are sleeping. Requests with debug=true, which respond without calling chromiumoxide, succeed. Requests without debug=true hang forever without any error, even though I've set request_timeout to 5 secs.
I attached GDB to my service thread, then repeatedly paused it and printed the thread backtrace. After a dozen repetitions, here is my result:
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790dd0032 in chromiumoxide::cmd::CommandChain::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790e8d12a in chromiumoxide::handler::target::Target::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790dd0032 in chromiumoxide::cmd::CommandChain::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790dcff9c in chromiumoxide::cmd::CommandChain::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790dd0214 in chromiumoxide::cmd::CommandChain::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790e8d17f in chromiumoxide::handler::target::Target::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790e8d192 in chromiumoxide::handler::target::Target::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790dd0032 in chromiumoxide::cmd::CommandChain::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790dd0032 in chromiumoxide::cmd::CommandChain::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790dcff94 in chromiumoxide::cmd::CommandChain::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790e8d1a1 in chromiumoxide::handler::target::Target::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790e8d101 in chromiumoxide::handler::target::Target::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x00005647913a220e in core::tuple::<impl core::cmp::Ord for (A,B)>::cmp () at /rustc/88f19c6dab716c6281af7602e30f413e809c5974/library/core/src/tuple.rs:58
58 /rustc/88f19c6dab716c6281af7602e30f413e809c5974/library/core/src/tuple.rs: No such file or directory.
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790e8d187 in chromiumoxide::handler::target::Target::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790e8d18d in chromiumoxide::handler::target::Target::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790dd0214 in chromiumoxide::cmd::CommandChain::poll ()
(gdb) c
Continuing.
Thread 1 "fireshot" received signal SIGINT, Interrupt.
0x0000564790dd0032 in chromiumoxide::cmd::CommandChain::poll ()
Running the following main.rs
file will only print 1
(the one opened via browser.new_page
) even if other pages are already open or new ones get opened manually through the browser UI:
use futures::StreamExt;
use chromiumoxide::error::Result;
use chromiumoxide::browser::Browser;
mod get_debug_ws_url;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let debug_ws_url = get_debug_ws_url::get_ws_url().await?;
let (browser, mut handler) =
Browser::connect(debug_ws_url).await?;
let handle = async_std::task::spawn(async move {
loop {
let _ = handler.next().await.unwrap();
}
});
let my_handle = async_std::task::spawn(async move {
let page = browser.new_page("https://en.wikipedia.org").await;
loop {
let pages = browser.pages().await;
let pages = pages.unwrap_or_default();
println!("{:?}", pages.len());
async_std::task::sleep(std::time::Duration::from_secs(5)).await;
}
});
handle.await;
Ok(())
}
Edit: Also, this is the get_debug_ws_url.rs
:
use std::collections::HashMap;
pub async fn get_ws_url() -> Result<std::string::String, Box<dyn std::error::Error>> {
let resp = reqwest::get("http://127.0.0.1:9222/json/version")
.await?
.json::<HashMap<String, String>>()
.await?;
let web_socket_debugger_url = resp["webSocketDebuggerUrl"].clone();
Ok(web_socket_debugger_url)
}
Ref #64
Illegal URLs should be detected before even attempting to navigate to them.
The Page::evaluate interface expects an expression to be evaluated in the browser.
page.evaluate("1+2") results in a value of 3, whereas page.evaluate("() => { return 1+2; }") results in a value of Function, because the Runtime.evaluate command expects an expression.
To evaluate (async) functions, we need a way to determine whether the expression passed to Page::evaluate is in fact a javascript function; if so, we send it via Runtime.callFunctionOn instead.
This could be achieved by writing a poor man's parser to distinguish between expression and function, or by changing the type of the expression param that the evaluate functions accept to a new trait (something like Evaluable) with two types, Expression and JsFunction, that are bundled in an enum.
There should be a new API function, Page::evaluate_function, that uses Runtime.callFunctionOn.
The most intuitive solution would probably be Page::evaluate strictly for Runtime.evaluate and Page::evaluate_function for Runtime.callFunctionOn, with possibly additional checks on Page::evaluate.
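The "poor man's parser" option could start out as a small heuristic like the one below. This is just a sketch (the is_js_function helper is invented here), and a real implementation would need something more robust than prefix checks:

```rust
/// Heuristically decide whether a snippet is a JS function (arrow
/// function or `function` keyword) rather than a plain expression.
/// Only a sketch: prefix checks like these have false positives; a
/// real implementation would want an actual parser.
fn is_js_function(expr: &str) -> bool {
    let trimmed = expr.trim();
    // `function foo() {}` (optionally prefixed with `async `)
    let rest = trimmed.strip_prefix("async ").unwrap_or(trimmed);
    if rest.starts_with("function") {
        return true;
    }
    // Arrow functions: `() => ...`, `x => ...`, `(a, b) => ...`
    if let Some(idx) = rest.find("=>") {
        let params = rest[..idx].trim();
        return (params.starts_with('(') && params.ends_with(')'))
            || (!params.is_empty()
                && params.chars().all(|c| c.is_alphanumeric() || c == '_' || c == '$'));
    }
    false
}

fn main() {
    assert!(!is_js_function("1+2"));
    assert!(is_js_function("() => { return 1+2; }"));
    assert!(is_js_function("async () => fetch('/')"));
    assert!(is_js_function("function add(a, b) { return a + b; }"));
    println!("ok");
}
```

A classifier like this could back an Evaluable enum (Expression vs JsFunction) so Page::evaluate can route to Runtime.evaluate or Runtime.callFunctionOn accordingly.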
Below is the code I am using to test oxide.
let baseurl = "http://www.google.com".to_owned();
let (browser, mut handler) =
Browser::launch(BrowserConfig::builder().with_head().build()?).await?;
let handle = tokio::task::spawn(async move {
loop {
let _ = handler.next().await.unwrap();
}
});
let _p = browser.new_page(baseurl.clone()).await?;
handle.await?;
Ok(())
The page comes up, but the viewport/webpage does not expand to the window extents. Is there something that would cause this resize issue?
In the event the websocket crashes, no more commands can be sent or received, so in that case we should
I need to use browser.on events similar to what puppeteer documents here:
https://pptr.dev/#?product=Puppeteer&version=v5.5.0&show=api-class-browser
The main ones I'm interested in are browser.on('targetchanged')
and browser.on('targetcreated')
.
I think puppeteer sends Target.setDiscoverTargets [1] on startup and emits those events when it receives Target.targetInfoChanged and Target.targetCreated respectively.
I'm willing to take this on however I'm unsure on the correct way to send messages to the connection.
Hi Team,
I need to generate a pdf file from static HTML content instead of a URL.
There does not seem to be a way to set a timeout for navigation.
Support for several BrowserContextId
s is missing.
Allow activating incognito mode for an already running browser as well as from the get go.
Working on a project where it would be super useful to know what gets logged to the browser console, and we're handling it like so:
let mut events = page.event_listener::<EventEntryAdded>().await?;
eprintln!("Logging to console...");
page
.evaluate("console.log('foo')")
.await?;
while let Some(event) = events.next().await {
eprintln!("{:?}", event);
}
The problem is that events.next() seems to be hanging forever.
Is this the correct way to get messages from the browser console?
Headless chrome added support to detect more browsers (https://github.com/atroche/rust-headless-chrome/blob/aaa6cb03efefbe67996f35802860c6d7fce3b14b/src/browser/mod.rs#L430-L442), do you want me to port those?
I don't think this would be an issue, but we probably want to rework the order in which they are tried.
The biggest gain is most likely from the support of Edge.
That's it.
If a Node inspect process is launched with --inspect-brk, the first paused event is fired with pause reason "Break on start", but PauseReason is missing this variant.
My env:
Hi, there have been some changes since the last publish; is there any plan to publish chromiumoxide 0.3.0?
windows 10
cargo run
Updating git repository https://github.com/mattsse/chromiumoxide
Updating git repository https://github.com/sdroege/async-tungstenite
Updating crates.io index
Blocking waiting for file lock on package cache
Blocking waiting for file lock on package cache
Downloaded input_buffer v0.4.0
Downloaded tungstenite v0.12.0
Downloaded 2 crates (62.1 KB) in 2.28s
Blocking waiting for file lock on package cache
Compiling syn v1.0.60
Compiling winapi v0.3.9
Compiling getrandom v0.2.2
Compiling input_buffer v0.4.0
Compiling rand_core v0.6.1
Compiling rand_chacha v0.3.0
Compiling rand v0.8.3
Compiling nb-connect v1.0.2
Compiling winapi-util v0.1.5
Compiling async-process v1.0.2
Compiling atty v0.2.14
Compiling termcolor v1.1.2
Compiling ctor v0.1.19
Compiling serde_derive v1.0.123
Compiling futures-macro v0.3.12
Compiling thiserror-impl v1.0.23
Compiling async-attributes v1.1.2
Compiling value-bag v1.0.0-alpha.6
Compiling log v0.4.14
Compiling futures-util v0.3.12
Compiling polling v2.0.2
Compiling kv-log-macro v1.0.7
Compiling tungstenite v0.12.0
Compiling env_logger v0.7.1
Compiling thiserror v1.0.23
Compiling async-io v1.3.1
Compiling which v4.0.2
Compiling pretty_env_logger v0.4.0
Compiling async-global-executor v2.0.2
Compiling async-std v1.9.0
Compiling futures-executor v0.3.12
Compiling futures v0.3.12
Compiling serde v1.0.123
Compiling serde_json v1.0.62
Compiling async-tungstenite v0.12.0 (https://github.com/sdroege/async-tungstenite#f51f0744)
Compiling chromiumoxide_types v0.2.0 (https://github.com/mattsse/chromiumoxide?branch=main#7d7ef5a5)
Compiling chromiumoxide_pdl v0.2.0 (https://github.com/mattsse/chromiumoxide?branch=main#7d7ef5a5)
Compiling chromiumoxide_cdp v0.2.0 (https://github.com/mattsse/chromiumoxide?branch=main#7d7ef5a5)
Compiling chromiumoxide v0.2.0 (https://github.com/mattsse/chromiumoxide?branch=main#7d7ef5a5)
error[E0432]: unresolved import crate::browser::process::get_chrome_path_from_registry
--> C:\Users\52752.cargo\bin\git\checkouts\chromiumoxide-b0a6fd9b32d92502\7d7ef5a\src\browser.rs:560:13
|
560 | use crate::browser::process::get_chrome_path_from_registry;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ no get_chrome_path_from_registry
in process
error: aborting due to previous error
For more information about this error, try rustc --explain E0432
.
error: could not compile chromiumoxide
To learn more, run the command again with --verbose.
warning: build failed, waiting for other jobs to finish...
error: build failed
F:\web\xiaolu\rust\greetings\web>
I'm not sure if waiting for Target.targetdestroyed is necessary if assuming target eventually gets cleaned up. Page gets consumed and if you are dealing with raw ids you are on your own.
Agree. But there is a potential race condition in the following scenario:
let el = page.find_element("input#searchInput").await?;
...
page.close().await?;
...
// this will fail because page is already closed.
el.click().await?;
This el.click() will fail regardless, but at the moment the reason it fails is that the click command is sent to the browser and the browser returns an error. This happens because the Target of the Page has not been removed yet and is still able to receive events sent from an Element. Preferably, el.click() should fail with a SendError, because the receiver should already be dropped at this point, preventing the overhead of sending a request that is guaranteed to fail.
I think it's more important to figure out a way to just destroy the page target, since for the majority of cases you don't care about unload and just want to clean up.
This can be done with Target.closeTarget instead; that's how puppeteer does it.
Originally posted by @mattsse in #12 (comment)
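The fail-fast behavior argued for above can be illustrated with a plain std channel. This is only a sketch of the principle, not chromiumoxide internals: once the receiving half is dropped, send fails locally with a SendError and nothing goes over the wire.

```rust
use std::sync::mpsc;

fn main() {
    let (tx, rx) = mpsc::channel::<&str>();

    // While the receiver is alive, sending succeeds.
    assert!(tx.send("click").is_ok());

    // Dropping the receiver models the Page's Target being destroyed.
    drop(rx);

    // Now the send fails locally with a SendError, with no request
    // ever leaving the process -- the desired fail-fast behavior.
    assert!(tx.send("click").is_err());

    println!("send after drop failed fast, as expected");
}
```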
I tried to run cargo run --example wiki-tokio --features="tokio-runtime"; it should click input#searchInput and type text about the "Rust programming language". But it actually clicks the element #ca-history, so the example "wiki-tokio" ends up opening https://en.wikipedia.org/w/index.php?title=Main_Page&action=history. I've been thinking about this for a long time, but I don't know why. My Chrome version is Version 89.0.4389.90 (Official Build) (64-bit).
The pdl generator runs rustfmt internally; this shouldn't result in a panic if rustfmt is not installed.
Listening for specific events is currently not supported. Instead, the Handler is a Stream of CdpEvents; however, it doesn't actually yield them, only the CdpErrors.
It would be nice to listen for a specific kind of event, like Page.animationCreated, asynchronously.
The Handler should probably be refactored as Stream<Item = CdpError> or as a simple Future that never finishes.
The individual Targets should allow registering event listeners that subscribe to a specific event and receive it via a channel. Since there can be several event listeners for each topic, the subscriptions are basically HashMap<Topic, Vec<EventListener>>. Also, we don't want to clone the event for every listener; instead, the event in question should be wrapped in an Arc and sent to every event listener channel without cloning.
Sending events as trait objects (something along the lines of Arc<dyn Event>) would be useful and would make using event listeners more convenient.
Support for introducing custom events should also be considered. In order to prevent mass cloning of serde_json::Value, we somehow need to pass a converter that turns serde_json::Value into Arc<dyn Event> along with the registration of the event listener.
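The subscription map and Arc-based fan-out described above can be sketched with std channels. The Topic, Event, and Subscriptions names are invented for this illustration and are not chromiumoxide's types:

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Arc;

// Hypothetical stand-ins for this sketch: a topic key and an event payload.
type Topic = &'static str;

#[derive(Debug)]
struct Event {
    name: String,
}

/// Subscriptions per topic; each listener gets an `Arc` clone (a cheap
/// refcount bump), never a deep copy of the event itself.
struct Subscriptions {
    listeners: HashMap<Topic, Vec<Sender<Arc<Event>>>>,
}

impl Subscriptions {
    fn new() -> Self {
        Self { listeners: HashMap::new() }
    }

    fn subscribe(&mut self, topic: Topic) -> Receiver<Arc<Event>> {
        let (tx, rx) = channel();
        self.listeners.entry(topic).or_default().push(tx);
        rx
    }

    fn publish(&self, topic: Topic, event: Event) {
        let event = Arc::new(event);
        if let Some(subs) = self.listeners.get(topic) {
            for tx in subs {
                // Cloning the Arc only increments a reference count.
                let _ = tx.send(Arc::clone(&event));
            }
        }
    }
}

fn main() {
    let mut subs = Subscriptions::new();
    let rx1 = subs.subscribe("Page.animationCreated");
    let rx2 = subs.subscribe("Page.animationCreated");

    subs.publish("Page.animationCreated", Event { name: "anim".into() });

    assert_eq!(rx1.recv().unwrap().name, "anim");
    assert_eq!(rx2.recv().unwrap().name, "anim");
    println!("both listeners received the same Arc'd event");
}
```

The trait-object variant would swap Arc<Event> for Arc<dyn Event>, at the cost of downcasting on the listener side.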
I get this error whilst using Page::pdf
, I assume the PDF is too large for the internal channel that is used to communicate data?
thread 'actix-server worker 1' panicked at 'called `Result::unwrap()` on an `Err` value: Ws(Capacity(MessageTooLong { size: 23800033, max_size: 16777216 }))', src/main.rs:28:43
I don't have a backtrace unfortunately because async destroys any useful information in it.
Relevant code
actix_web::rt::spawn(async move {
loop {
handler.next().await.unwrap().unwrap(); // panics here
}
});
Because cargo --git still assumes the "master" branch instead of HEAD, and because GitHub has changed all repositories over to main, the dependency lines in the README will fail with:
cannot locate remote-tracking branch 'origin/master'; class=Reference (4); code=NotFound
I had to specify the branch with:
chromiumoxide = { git = "https://github.com/mattsse/chromiumoxide", branch = "main"}
I'm seeing an issue that appears to be similar to the closed issue #52, in which new_page calls that fail don't use the browser's timeout and instead hang indefinitely. It should be noted that I'm on v0.3.1 and so should have the fix from PR #56.
I have a timeout on the new_page() call (longer than the browser's request timeout).
If this timeout is hit and I move on, the browser refuses to allow any other requests until it is torn down.
Here's a minimal repro:
use futures_util::stream::{StreamExt};
use chromiumoxide::browser::{Browser, BrowserConfig};
use tokio::time::{timeout, sleep, Duration};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
println!("Started...");
let (browser, mut handler) = Browser::launch(
BrowserConfig::builder()
.request_timeout(Duration::from_secs(3))
.build()?,
)
.await?;
tokio::spawn(async move {
loop {
let _ = handler.next().await.unwrap();
}
});
// Navigation to a valid sample page succeeds
attempt_navigation("https://example.com".to_owned(), &browser).await;
// Navigation to invalid url fails (and should). This is a silly example, it
// is simply to demonstrate the behaviour. When the browser cannot successfully
// open the page, it hangs indefinitely. Follow up requests (even those that
// previously worked) will fail until the browser is torn down
println!("Attempting to navigate to invalid page, should fail and timeout");
attempt_navigation("https://%wxample.com".to_owned(), &browser).await;
println!("Sleeping for a few seconds to allow timeout to happen");
sleep(Duration::from_secs(5)).await;
// Attempting the same page navigation that previously succeeded.
// -> should now fail
println!("Re-attempting to open previously successful page...");
attempt_navigation("https://example.com".to_owned(), &browser).await;
return Ok(());
}
async fn attempt_navigation(url: String, browser: &Browser) {
    let page_result = timeout(Duration::from_secs(5), browser.new_page(&url)).await;
    match page_result {
        Ok(_page) => println!("Page {} navigated to successfully", &url),
        Err(_) => println!("Request for new page timeout"),
    }
}
Hello,
I just started playing with this tool, but I can't get any selector to work besides [name='j_username'].
The code below will fail with the error "-32000: DOM Error while querying"; the same happens with [id='j_username']. Did I miss something?
let page = browser.new_page("https://www.google.com/").await?;
std::thread::sleep(std::time::Duration::from_secs(3));
page.find_element("a[href=https://policies.google.com/privacy?hl=en-MA&fg=1]")
    .await?
    .click()
    .await?;
Thanks
Would it be of any interest to add the same options for screenshots that Puppeteer has? Specifically, this is the option to take full-page screenshots and omit the page background.
Here's the options, which are the same as CaptureScreenshotParams
, but with the added fullPage
and omitBackground
properties:
https://github.com/puppeteer/puppeteer/blob/b57f3fcd5393c68f51d82e670b004f5b116dcbc3/src/common/Page.ts#L141-L149
Here's where full-page and omitting of background screenshots happen:
https://github.com/puppeteer/puppeteer/blob/b57f3fcd5393c68f51d82e670b004f5b116dcbc3/src/common/Page.ts#L1657-L1725
An example of taking a full-page screenshot:
use chromiumoxide::{
browser::{Browser, BrowserConfig},
cdp::browser_protocol::{
dom::Rect,
emulation::{ScreenOrientation, ScreenOrientationType, SetDeviceMetricsOverrideParams},
page::{CaptureScreenshotFormat, CaptureScreenshotParams, GetLayoutMetricsReturns},
},
};
use futures::StreamExt;
#[async_std::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let (browser, mut handler) = Browser::launch(BrowserConfig::builder().build()?).await?;
async_std::task::spawn(async move {
loop {
let _event = handler.next().await.unwrap();
}
});
let page = browser.new_page("https://en.wikipedia.org").await?;
page.wait_for_navigation().await?;
// Set the emulated device to the same size as the page
let GetLayoutMetricsReturns {
content_size: Rect { width, height, .. },
..
} = page.layout_metrics().await?;
page.execute(
SetDeviceMetricsOverrideParams::builder()
.mobile(false)
.width(width.ceil() as i64)
.height(height.ceil() as i64)
.device_scale_factor(1.0)
.screen_orientation(
ScreenOrientation::builder()
.angle(0)
.r#type(ScreenOrientationType::PortraitPrimary)
.build()?,
)
.build()?,
)
.await?;
page.save_screenshot(
CaptureScreenshotParams::builder()
.format(CaptureScreenshotFormat::Png)
.build(),
"wiki.png",
)
.await?;
Ok(())
}
Hi, I was trying to implement a network hook like headless_chrome::enable_request_interception, and I hope to expose a Hook trait, like this:
#[async_trait]
pub trait Hook {
async fn call(&self, request, output);
}
And as you can see, this would introduce async-trait as a dependency, and I'm not sure whether that's acceptable or not.
The code looks like this:
async fn main() {
...
let (browser, _handler) = Browser::launch(
BrowserConfig::builder()
.build()
.unwrap()
)
.await
.unwrap();
...
for u in urls {
check(browser, &u).await // it's supposed to run with threading
}
...
}
async fn check(browser: Browser, url: &str) {
let page = browser.new_page(url).await.unwrap();
let check: bool = page.evaluate("some_script")
.await
.unwrap()
.into_value()
.unwrap();
...
}
I'm really new to Rust, but this confuses me for threading/concurrency purposes.
Are there any workarounds for this?
Right now the Handler is a Stream<Item = Result<CdpEventMessage>>; however, nothing meaningful is returned. Since event subscriptions are supported now with #4, the handler should be Stream<Item = Result<()>>, so basically a stream of errors.
Upgrade tokio to v1, sdroege/async-tungstenite#72
In nodejs, if you start Puppeteer with a null viewport, the viewport is set to fit the window size.
await puppeteer.launch( {
defaultViewport : null //viewport auto adjust to window size
} )
So I suggest changing
BrowserConfigBuilder::viewport(vp: Viewport)
to
BrowserConfigBuilder::viewport(vp: Option<Viewport>)
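The suggested signature change could look roughly like this; a sketch with stub types (the Viewport and BrowserConfigBuilder here are stand-ins for the example, not the real ones):

```rust
// Stub standing in for chromiumoxide's Viewport in this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Viewport {
    width: u32,
    height: u32,
}

#[derive(Default)]
struct BrowserConfigBuilder {
    // `None` would mean "don't emulate, fit the window", mirroring
    // Puppeteer's `defaultViewport: null`.
    viewport: Option<Viewport>,
}

impl BrowserConfigBuilder {
    fn viewport(mut self, vp: Option<Viewport>) -> Self {
        self.viewport = vp;
        self
    }
}

fn main() {
    let fixed = BrowserConfigBuilder::default()
        .viewport(Some(Viewport { width: 800, height: 600 }));
    assert_eq!(fixed.viewport, Some(Viewport { width: 800, height: 600 }));

    // Null viewport: let the page fit the window instead of emulating.
    let auto = BrowserConfigBuilder::default().viewport(None);
    assert!(auto.viewport.is_none());
    println!("ok");
}
```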
The Method trait requires an instance for the functions that return the identifier.
Since all identifiers are known beforehand and are constants anyway, making this a type-level function would make things easier and allow for more convenience in some parts of the API.
Drawback of this approach: custom events face limitations in determining their identifiers, since they are restricted to statics.
However, that can't be used dynamically when listening to events, since the identifiers must be known beforehand.
The best solution would be to introduce another trait that focuses on the type level.
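One possible shape for such a type-level companion trait, sketched with invented names (MethodType and the blanket impl are assumptions for illustration, not the crate's API):

```rust
// Instance-level trait, as described above: you need a value to ask
// for the identifier.
trait Method {
    fn identifier(&self) -> &'static str;
}

// Hypothetical type-level companion: the identifier is an associated
// constant, available without an instance.
trait MethodType {
    const IDENTIFIER: &'static str;
}

struct CaptureScreenshotParams;

impl MethodType for CaptureScreenshotParams {
    const IDENTIFIER: &'static str = "Page.captureScreenshot";
}

// A blanket impl keeps the dynamic, instance-based trait working for
// everything that knows its identifier at the type level.
impl<T: MethodType> Method for T {
    fn identifier(&self) -> &'static str {
        T::IDENTIFIER
    }
}

fn main() {
    // No instance needed:
    assert_eq!(CaptureScreenshotParams::IDENTIFIER, "Page.captureScreenshot");
    // Instance-based access still works via the blanket impl:
    assert_eq!(CaptureScreenshotParams.identifier(), "Page.captureScreenshot");
    println!("ok");
}
```

Custom events that only know their identifier at runtime would keep implementing the instance-level trait directly, which is the limitation noted above.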
Currently, the viewport will emulate a specified size or the default (800x600).
I'd like to be able to disable this when using Browser::connect(), as it looks weird otherwise.
Two options come to mind:
- Accept a BrowserConfig as a second parameter to Browser::connect().
- Add a disable_viewport_emulation field to the Viewport struct (set to true to disable viewport emulation), similar to what Puppeteer does when you set defaultViewport to null.
I've been having problems with find_elements, but only on some pages like this one, where it returns an error:
Chrome(Error { code: -32000, message: "Could not find node with given id" })
I'm basically doing:
page.goto("https://www.newegg.ca/westinghouse-wh27fx9019-27-full-hd/p/N82E16824569002").await?;
page.find_elements("a").await.unwrap();
Selecting other tags (ex <button>
) on that page works fine.
Given there is a sizeable number of <a> tags (approx 1900) on the page, I thought that could be the problem, but it can successfully find them on wikipedia pages with a lot more links (upwards of 2500).
I tried to wait_for_navigation()
but that also didn't pan out.
Any idea what could be the problem?
Is request interception supported? I see NetworkManager
functions that seem like they may enable request interception (they appear to be related to Handler
and not Page
), but I don't see how to implement them in my own code.
Support for the API functions Page::go_back and Page::go_forward should be added.
This would require tracking the history in the Target and probably needs additional internal Message variants.
The size of the stored history should be configurable.
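The configurable-size history could be a simple bounded deque; the following is a hedged sketch with invented names, not the crate's actual design:

```rust
use std::collections::VecDeque;

/// Bounded navigation history: `go_back` moves the current entry onto
/// a forward stack, new navigations clear the forward stack, and the
/// oldest back entries are evicted once `max_size` is reached.
struct NavigationHistory {
    back: VecDeque<String>,
    forward: Vec<String>,
    current: Option<String>,
    max_size: usize,
}

impl NavigationHistory {
    fn new(max_size: usize) -> Self {
        Self { back: VecDeque::new(), forward: Vec::new(), current: None, max_size }
    }

    fn navigate(&mut self, url: &str) {
        if let Some(prev) = self.current.take() {
            if self.back.len() == self.max_size {
                self.back.pop_front(); // evict the oldest entry
            }
            self.back.push_back(prev);
        }
        self.forward.clear();
        self.current = Some(url.to_string());
    }

    fn go_back(&mut self) -> Option<&str> {
        let prev = self.back.pop_back()?;
        if let Some(cur) = self.current.replace(prev) {
            self.forward.push(cur);
        }
        self.current.as_deref()
    }

    fn go_forward(&mut self) -> Option<&str> {
        let next = self.forward.pop()?;
        if let Some(cur) = self.current.replace(next) {
            self.back.push_back(cur);
        }
        self.current.as_deref()
    }
}

fn main() {
    let mut h = NavigationHistory::new(2);
    h.navigate("a");
    h.navigate("b");
    h.navigate("c");
    assert_eq!(h.go_back(), Some("b"));
    assert_eq!(h.go_forward(), Some("c"));
    println!("ok");
}
```

In the real Target this would be fed by navigation events rather than direct calls, and go_back/go_forward would issue Page.navigateToHistoryEntry commands.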
Does this project support frames yet?
Hi,
If a browser is started with the following config:
BrowserConfig::builder().incognito().with_head().build()
...later calls of Browser::start_incognito_context
do not open an incognito context.
If .incognito()
above is removed, everything works as expected.
Page::event_listener() doesn't receive events. The following is a small test listening for Fetch.requestPaused (I've also tried listening for other events and received nothing):
use futures::StreamExt;
use chromiumoxide::error::Result;
use chromiumoxide::browser::Browser;
use chromiumoxide_cdp::cdp::browser_protocol::fetch;
mod get_debug_ws_url;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let debug_ws_url = get_debug_ws_url::get_ws_url().await?;
let (browser, mut handler) =
Browser::connect(debug_ws_url).await?;
let handle = async_std::task::spawn(async move {
loop {
let _ = handler.next().await.unwrap();
}
});
//Open new page to get page context
let page = browser.new_page("http://www.example.com").await?;
//send `Fetch.enable` - tells browser to pause on request to pattern (http://test/)
page.execute(fetch::EnableParams {
patterns: vec![
fetch::RequestPattern {
url_pattern: "http://test/*".to_string().into(),
resource_type: None,
request_stage: fetch::RequestStage::Request.into(),
}].into(),
handle_auth_requests: None,
}).await?;
//listen for `Fetch.requestPaused`
let mut events = page.event_listener::<fetch::EventRequestPaused>().await?;
async_std::task::spawn(async move {
while let Some(event) = events.next().await {
println!("Fetch.requestPaused"); // Fetch.requestPaused is never received
}
});
//navigate to http://test/
page.goto("http://test/").await?; //this correctly pauses
handle.await;
Ok(())
}
Edit: Also, here's get_debug_ws_url.rs
.
use std::collections::HashMap;
pub async fn get_ws_url() -> Result<std::string::String, Box<dyn std::error::Error>> {
let resp = reqwest::get("http://127.0.0.1:9222/json/version")
.await?
.json::<HashMap<String, String>>()
.await?;
let web_socket_debugger_url = resp["webSocketDebuggerUrl"].clone();
Ok(web_socket_debugger_url)
}
Is it supported?
Running browser.execute(EnableParams {}).await?
fails with:
Error: Chrome(Error { code: -32601, message: "'HeapProfiler.enable' wasn't found" })