Comments (12)
I would do that as follows:
- set the focus on the input element. We need to run JavaScript in the browser to do that, using the Runtime.evaluate method
- insert text in the input field with the Input.insertText method
- "press" the Enter key. We need to simulate both the keydown and the keyup events using the Input.dispatchKeyEvent method
For instance,
# Helpers -----------------------------------------------------------------
# Set the focus on an element
set_focus <- function(client, selector) {
  client$Runtime$evaluate(
    sprintf("document.querySelector('%s').focus()", selector)
  )
}
# Press the 'Enter' key
press_enter_key <- function(client) {
  dispatch_enter_key_event <- function(client, type) {
    client$Input$dispatchKeyEvent(
      type = type,
      windowsVirtualKeyCode = 13,
      code = "Enter",
      key = "Enter",
      text = "\r",
      unmodifiedText = "\r"
    )
  }
  promises::then(
    dispatch_enter_key_event(client, "keyDown"),
    ~ dispatch_enter_key_event(client, "keyUp")
  )
}
# Main script -------------------------------------------------------------
library(crrri)
chrome <- Chrome$new(bin = find_chrome_binary())
client <- chrome$connect(callback = function(client) {
  client$inspect()
})
Page <- client$Page
Input <- client$Input
Page$enable() %...>% {
  Page$navigate(url = "https://orcid.org/")
  Page$loadEventFired() # await the load event
} %...>% {
  # the selector may be different for another website
  # you may need to modify it
  set_focus(client, selector = "input")
} %...>% {
  Input$insertText("0000-0002-0721-5595") # type an ORCID
} %...>% {
  press_enter_key(client)
}
from crrri.
You get this issue because this page has several input elements. The JavaScript method document.querySelector() returns the first matched element. In this case, the first input element is not the input field that you want.
As you can see, the NÚMERO DE DOCUMENTO input element has an id. You can pass this id to the document.getElementById() JavaScript method.
# Helpers -----------------------------------------------------------------
# Set the focus on an element
set_focus_on_id <- function(client, id) {
  client$Runtime$evaluate(
    sprintf("document.getElementById('%s').focus()", id)
  )
}
# Press the 'Enter' key
press_enter_key <- function(client) {
  dispatch_enter_key_event <- function(client, type) {
    client$Input$dispatchKeyEvent(
      type = type,
      windowsVirtualKeyCode = 13,
      code = "Enter",
      key = "Enter",
      text = "\r",
      unmodifiedText = "\r"
    )
  }
  promises::then(
    dispatch_enter_key_event(client, "keyDown"),
    ~ dispatch_enter_key_event(client, "keyUp")
  )
}
# Main script -------------------------------------------------------------
library(crrri)
chrome <- Chrome$new(bin = find_chrome_binary())
client <- chrome$connect(callback = function(client) {
  client$inspect()
})
Page <- client$Page
Input <- client$Input
Page$enable() %...>% {
  Page$navigate(url = "https://dependenciasectorpublico.trabajo.gob.ec/DependenciaLaboralSectorPublico/")
  Page$loadEventFired() # await the load event
} %...>% {
  # the id may be different for another website
  # you may need to modify it
  set_focus_on_id(client, id = "frmTodo:txtCedula")
} %...>% {
  Input$insertText("0952330223") # type the ID number
} %...>% {
  press_enter_key(client)
}
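A side note on why document.getElementById() is the simpler route here: the id contains a colon, and in a CSS selector something like #frmTodo:txtCedula would be read as the id frmTodo followed by a pseudo-class, so it cannot be passed to document.querySelector() as is. If you prefer to keep a selector-based helper, an attribute selector is one way around this. The id_selector() function below is a hypothetical helper (not part of crrri), and this is only a minimal sketch that assumes the id contains no quotes or backslashes:

```r
# Hypothetical helper: build a CSS attribute selector that matches an element
# by its id, even when the id contains characters such as ':'.
id_selector <- function(id) {
  # Assumption: the id contains no quote or backslash characters,
  # so it can be embedded in the selector without escaping.
  forbidden <- c("'", "\"", "\\")
  has_forbidden <- vapply(
    forbidden,
    function(ch) grepl(ch, id, fixed = TRUE),
    logical(1)
  )
  stopifnot(!any(has_forbidden))
  sprintf('[id="%s"]', id)
}

selector <- id_selector("frmTodo:txtCedula")
cat(selector, "\n")
```

The resulting string could then be given to a querySelector-based helper such as the set_focus() function defined earlier in this thread, since the double quotes sit safely inside the single-quoted JavaScript string that helper builds.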
from crrri.
Sorry, I don't have enough time to dig deeper into the captcha case.
In order to do that, I'd try to use the Network domain to intercept the images. You can find an example with the chrome-remote-interface Node module here: https://stackoverflow.com/a/46079635.
from crrri.
Thanks for using crrri! I observed that when you type your ID in the search bar, it creates a new URL of the form
https://orcid.org/orcid-search/search?searchQuery=0000-0001-6300-9350
That means you can replicate the search by building this type of URL yourself and filling in the ID in the query (the paste() function, or the glue or urltools packages, will help to do that).
Then you'll get to the desired page, where you can retrieve the table I guess.
Does it make sense? Have you already tried that?
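To illustrate the idea, here is a minimal base R sketch. The function name build_orcid_search_url() is made up for this example, and the URL pattern is only the one observed above, so it may not hold if the site changes:

```r
# Hypothetical helper: build the search URL observed on orcid.org
# for a given identifier, using base R only.
build_orcid_search_url <- function(orcid_id) {
  base_url <- "https://orcid.org/orcid-search/search"
  # Percent-encode the value so that it is safe inside a query string
  query <- utils::URLencode(orcid_id, reserved = TRUE)
  paste0(base_url, "?searchQuery=", query)
}

url <- build_orcid_search_url("0000-0002-0721-5595")
cat(url, "\n")
```

The resulting string could then be passed to Page$navigate(url = url) instead of the home page URL, which skips the typing and key press steps altogether.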
from crrri.
Dear Dr. Christophe Dervieux, thanks for your message. What you mention is correct. The easy way is to take that string and use a package to extract the data. But I am working with different web pages and most of them don't have that option.
That is why I would like to do the task with crrri, at least up to the point where I have to send the string to the page and trigger the search.
Many thanks for your help. I have been trying to do that extraction with your package but my knowledge is limited.
Kind regards!
from crrri.
I don't know how to do that. It requires some searching. It seems you can do that with puppeteer but I don't know how they do it.
@RLesur is there a JS way to do that?
from crrri.
So it seems you can do that in JS.
With puppeteer, it seems very easy!
- page.type => https://devdocs.io/puppeteer/index#pagetypeselector-text-options
- page.submit => https://devdocs.io/puppeteer/index#pageclickselector-options
- snippet: https://www.codota.com/code/javascript/functions/puppeteer/Page/type
Not easy to reproduce here I think :(
from crrri.
@cderv puppeteer has dozens of great high-level functions!
page.type, page.press, ... are defined here: https://github.com/puppeteer/puppeteer/blob/main/src/common/Input.ts.
For the press_enter_key() function, I've stolen the parameters here.
The page.click method is definitely great and wouldn't be so easy to reproduce (but this is feasible).
from crrri.
@RLesur Dear Dr. Lesur, many thanks for your time on this issue. It is amazing to see how to reach the final output. As I mentioned, I am trying different pages. When I apply the code to the following page: https://dependenciasectorpublico.trabajo.gob.ec/DependenciaLaboralSectorPublico/
I am not able to get any results. Maybe, due to my lack of web page knowledge, I am not using or defining the proper features. I use the following code. It is the same as yours:
# Helpers -----------------------------------------------------------------
# Set the focus on an element
set_focus <- function(client, selector) {
  client$Runtime$evaluate(
    sprintf("document.querySelector('%s').focus()", selector)
  )
}
# Press the 'Enter' key
press_enter_key <- function(client) {
  dispatch_enter_key_event <- function(client, type) {
    client$Input$dispatchKeyEvent(
      type = type,
      windowsVirtualKeyCode = 13,
      code = "Enter",
      key = "Enter",
      text = "\r",
      unmodifiedText = "\r"
    )
  }
  promises::then(
    dispatch_enter_key_event(client, "keyDown"),
    ~ dispatch_enter_key_event(client, "keyUp")
  )
}
# Main script -------------------------------------------------------------
library(crrri)
chrome <- Chrome$new(bin = find_chrome_binary())
client <- chrome$connect(callback = function(client) {
  client$inspect()
})
Page <- client$Page
Input <- client$Input
Page$enable() %...>% {
  Page$navigate(url = "https://dependenciasectorpublico.trabajo.gob.ec/DependenciaLaboralSectorPublico/")
  Page$loadEventFired() # await the load event
} %...>% {
  # the selector may be different for another website
  # you may need to modify it
  set_focus(client, selector = "input")
} %...>% {
  Input$insertText("0952330223") # type the ID number
} %...>% {
  press_enter_key(client)
}
I have inspected the web page and the selector is an input, as seen next:
So set_focus() should work but it doesn't.
The output I get is next:
Which is empty. I have tried changing the selector but it doesn't work.
I would be grateful for any help in solving this issue. The code works perfectly for the first page but it does not work for this second page.
from crrri.
Dear Dr. Lesur @RLesur, infinite thanks for your help. Now it is easier for me to handle the page structure to obtain the result. I am curious whether it is possible to apply a similar method to pages with a captcha. I have some of them in the set of pages I have to work with. If it is not possible I will understand, and I will have to avoid those pages. I used the code you kindly helped me with on the following page:
https://www.senescyt.gob.ec/web/guest/consultas
It has a field that can be filled in with the same approach you shared, but it includes a captcha like this:
Here is the code I used:
# Helpers -----------------------------------------------------------------
# Set the focus on an element
set_focus_on_id <- function(client, id) {
  client$Runtime$evaluate(
    sprintf("document.getElementById('%s').focus()", id)
  )
}
# Press the 'Enter' key
press_enter_key <- function(client) {
  dispatch_enter_key_event <- function(client, type) {
    client$Input$dispatchKeyEvent(
      type = type,
      windowsVirtualKeyCode = 13,
      code = "Enter",
      key = "Enter",
      text = "\r",
      unmodifiedText = "\r"
    )
  }
  promises::then(
    dispatch_enter_key_event(client, "keyDown"),
    ~ dispatch_enter_key_event(client, "keyUp")
  )
}
# Main script -------------------------------------------------------------
library(crrri)
chrome <- Chrome$new(bin = find_chrome_binary())
client <- chrome$connect(callback = function(client) {
  client$inspect()
})
Page <- client$Page
Input <- client$Input
Page$enable() %...>% {
  Page$navigate(url = "https://www.senescyt.gob.ec/web/guest/consultas")
  Page$loadEventFired() # await the load event
} %...>% {
  # the id may be different for another website
  # you may need to modify it
  set_focus_on_id(client, id = "formPrincipal:identificacion")
} %...>% {
  #1720245768
  Input$insertText("0952330223") # type the ID number
} %...>% {
  press_enter_key(client)
}
The same scheme works, as the value for the first field can be inserted using the previous functions. With that code I can get to this stage:
After exploring the structure of this page, I noticed this:
The captcha is generated and saved to a .jpg file, whose address appears in the src attribute of the element with the id formPrincipal:capimg. I was wondering whether there is a way to obtain the image from that src attribute and then extract the text.
The text could be extracted from the image with functions from the tesseract package, like this: eng <- tesseract("eng") and text <- tesseract::ocr("thepathfromsrc", engine = eng). This would give a new string that I could then pass with Input$insertText to another id.
What would be an approach to obtain the captcha image and then feed the extracted text to the page? Many thanks for your help.
from crrri.
Dear Dr. Lesur @RLesur, many thanks. With that info I will research the topic. I have a question about whether it is possible to turn the press_enter_key function into a function that clicks on a search button in a web page. Could you please recommend which crrri functions I can use to build a function of this kind?
from crrri.
I've never tried to click on an element but I would adopt the following strategy:
- DOM.getDocument to obtain the root nodeId
- DOM.querySelector to obtain the nodeId of the selected element (the root nodeId is required here)
- DOM.getBoxModel to obtain the coordinates of the element bounding box. Then, compute the element centroid.
- Input.dispatchMouseEvent twice (the first one with the mousePressed event, the second one with the mouseReleased event). The element centroid is used here.
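The "compute the element centroid" step can be sketched in plain R. In the DevTools protocol, the box model returned by DOM.getBoxModel describes each box (content, padding, border, margin) as a quad, i.e. 8 numbers x1, y1, x2, y2, x3, y3, x4, y4. The quad_centroid() helper and the sample coordinates below are made up for illustration, and how the quad is extracted from the crrri response is left out, since I haven't tested it:

```r
# Hypothetical helper: compute the centre of a quad, given as 8 numbers
# (x1, y1, x2, y2, x3, y3, x4, y4).
quad_centroid <- function(quad) {
  # Accept either a numeric vector or a list of numbers (as parsed from JSON)
  quad <- as.numeric(unlist(quad))
  stopifnot(length(quad) == 8)
  xs <- quad[c(1, 3, 5, 7)]
  ys <- quad[c(2, 4, 6, 8)]
  list(x = mean(xs), y = mean(ys))
}

# Made-up example: a 100 x 40 box whose top-left corner is at (10, 20)
center <- quad_centroid(c(10, 20, 110, 20, 110, 60, 10, 60))
str(center)
```

The x and y values would then be the coordinates given to both Input.dispatchMouseEvent calls, together with the button and clickCount parameters of that method (for instance button = "left" and clickCount = 1).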
I hope this will help you.
from crrri.