vdrmota / social-media-and-contact-info-extractor
Run this scraper for free: https://apify.com/vdrmota/contact-info-scraper
License: Apache License 2.0
Sometimes you want to enforce a minimum depth (for example, to scrape only a subsection of a website), so that results come only from pages deeper down, up to the max depth parameter, and not from the top-level pages.
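A minimal sketch of how such a filter might look. The `minDepth` option and both helper names are hypothetical and not part of the actor's current input; the idea is that pages shallower than `minDepth` are still crawled for links but are not written to the results.

```javascript
// Hypothetical helper: decide whether a page found at `depth` should be saved.
// `minDepth` is an assumed new option; `maxDepth` mirrors the existing input field.
function shouldSaveResult(depth, { minDepth = 0, maxDepth = Infinity } = {}) {
    return depth >= minDepth && depth <= maxDepth;
}

// Hypothetical helper: links are still followed on shallow pages,
// otherwise the deeper pages could never be reached.
function shouldEnqueueLinks(depth, { maxDepth = Infinity } = {}) {
    return depth < maxDepth;
}
```

With `minDepth: 1` and `maxDepth: 2`, the start page (depth 0) would be used only as a source of links, while pages at depth 1 and 2 would be saved.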
Hey guys, I'm a noob. How can I install it? Thank you.
We want to pass additional information in the request object and have it included in the results.
{ url: 'https://example.com', userData: { foo: 'bar' } }
If I fork the repo, will I be able to access it in the handlePageFunction?
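In the Apify SDK, the `userData` set on a request is available as `request.userData` inside the page handler. Whether the actor copies it into the results is a separate matter; the following is only a sketch of how a fork might do that. The `mergeUserData` helper and the list of internal keys (`depth`, `referrer`) are assumptions for illustration, not the actor's actual code.

```javascript
// Sketch: copy custom fields from request.userData into the scraped result.
// `internalKeys` is an assumed list of bookkeeping keys that should not leak into the output.
function mergeUserData(result, request, internalKeys = ['depth', 'referrer']) {
    const extra = {};
    for (const [key, value] of Object.entries(request.userData || {})) {
        if (!internalKeys.includes(key)) extra[key] = value;
    }
    // Scraped fields win if a custom key collides with a result field.
    return { ...extra, ...result };
}
```

Calling it with the request above would add `foo: 'bar'` to the result object before it is pushed to the dataset.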
Create an INPUT_SCHEMA option under Advanced Configuration, such as "Create a webhook task",
that can extract URLs to visit from a field of the target dataset, like website
(for seamless integration with Google Maps, Web Scraper, Yelp, etc.)
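One way this could work is sketched below, assuming the dataset items have already been downloaded as an array of objects. The `startUrlsFromItems` helper and the `fieldName` parameter are hypothetical names, not an existing option of the actor.

```javascript
// Sketch: build start URLs from the items of another actor's dataset.
// `fieldName` (for example "website") is the assumed name of the field holding the URL.
function startUrlsFromItems(items, fieldName) {
    const seen = new Set();
    const startUrls = [];
    for (const item of items) {
        const value = item[fieldName];
        if (typeof value !== 'string' || value.trim() === '') continue;
        let url;
        try {
            // The WHATWG URL parser validates and normalizes the value.
            url = new URL(value.trim()).href;
        } catch (err) {
            continue; // skip values that are not absolute URLs
        }
        if (!seen.has(url)) {
            seen.add(url);
            startUrls.push({ url });
        }
    }
    return startUrls;
}
```

Items without the field, with an empty value, or with a value that is not an absolute URL are skipped, and duplicates are dropped.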
Can we limit the extractions? Right now, for some URLs it returns 10 phone numbers, Twitter handles, etc. Is it possible to limit the number of extractions?
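A sketch of what such a limit could look like as a post-processing step. The `limitExtractions` helper and the `maxPerField` parameter are hypothetical; the actor does not necessarily offer an option like this.

```javascript
// Sketch: keep at most `maxPerField` items in every array field of a result.
// Non-array fields (such as the page URL) are passed through unchanged.
function limitExtractions(result, maxPerField) {
    const limited = {};
    for (const [key, value] of Object.entries(result)) {
        limited[key] = Array.isArray(value) ? value.slice(0, maxPerField) : value;
    }
    return limited;
}
```

For example, `limitExtractions(result, 3)` would keep only the first three phone numbers, emails, and handles found on each page.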
On apify.com, YouTube has not been scraped from the page:
https://my.apify.com/view/runs/T1h2s6zbYyGoEVHQN
What are your plans to upgrade to Apify SDK v3 (Crawlee)?
Useful for adding specific page data along with the dataset
What does the NormalizeURL method do?
Does it convert all URLs to http://?
This is really a dev question. I am developing and running locally. I am not using the apify platform.
I want to customize your Actor but do not want to lose the changes when I pull your updates into my local repo.
I guess git should handle that for me, so I guess there is no question after all :)
Add options to make sure you extract at least something from the page. The minimum counts could be set manually and checked as .length >=
for email, facebook, etc.
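A minimal sketch of such a check. The `meetsMinimumCounts` helper and the `minCounts` object are hypothetical names used only for illustration.

```javascript
// Sketch: check that a result contains at least the required number of items per field.
// `minCounts` maps a result field to its minimum count, for example { emails: 1, facebooks: 1 }.
function meetsMinimumCounts(result, minCounts) {
    return Object.entries(minCounts).every(([field, min]) => {
        // A missing or non-array field counts as zero extracted items.
        const values = Array.isArray(result[field]) ? result[field] : [];
        return values.length >= min;
    });
}
```

A page whose result fails the check could then be skipped or retried instead of being saved.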
When I try to run this locally, I am getting this error:
(node:1723466) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 16)
(node:1723466) UnhandledPromiseRejectionWarning: ApifyApiError: By launching this job you would exceed the memory limit of 4096MB for all your actors (currently used: 4096MB, requested: 4096MB). Please upgrade to a paid plan to increase your actor memory limit.
clientMethod: undefined
statusCode: 402
type: actor-memory-limit-exceeded
attempt: 1
httpMethod: post
path: /v2/acts/vdrmota~contact-info-scraper/runs
stack:
at makeRequest (/home/msihadmin/Documents/apify-pricelocal/node_modules/apify-client/src/http_client.js:136:30)
at process._tickCallback (internal/process/next_tick.js:68:7)
(node:1723466) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 17)
Only the first 10 digits of the mobile number are extracted.
Suppose the mobile number is +91 95... (10 digits); then only the first 10 digits (starting with the country code) get scraped, leaving the last 2 digits unscraped and the mobile number invalid.
Also, rename the current field "Max pages" to "Max pages total".
The scraper cannot work with requestsFromUrl in the startUrls param.
Example input:
{
    "startUrls": [
        {
            "requestsFromUrl": "https://apify-uploads-prod.s3.amazonaws.com/files-with-urls.txt"
        }
    ],
    "proxyConfig": {
        "useApifyProxy": true
    },
    "maxDepth": 2,
    "maxRequests": 100,
    "sameDomain": false,
    "liveView": false,
    "considerChildFrames": true
}
The run with this input failed:
2019-07-19T13:03:45.052Z ERROR: The function passed to Apify.main() threw an exception: (error details: type=invalid-parameter)
2019-07-19T13:03:45.053Z ApifyClientError: Parameter "url" of type String must be provided
2019-07-19T13:03:45.055Z at exports.checkParamOrThrow (/home/myuser/node_modules/apify-client/build/utils.js:222:15)
2019-07-19T13:03:45.056Z at new Request (/home/myuser/node_modules/apify/build/request.js:137:34)
2019-07-19T13:03:45.058Z at input.startUrls.map (/home/myuser/src/main.js:16:21)
2019-07-19T13:03:45.059Z at Array.map (<anonymous>)
2019-07-19T13:03:45.061Z at Apify.main (/home/myuser/src/main.js:15:35)
2019-07-19T13:03:45.062Z at process._tickCallback (internal/process/next_tick.js:68:7)
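The stack trace suggests that every startUrls entry is turned directly into a request, which requires a `url`, so an entry that only has `requestsFromUrl` fails. A possible workaround is sketched below under stated assumptions: `splitSources` and `parseUrlList` are hypothetical helpers, and downloading the remote file is left out.

```javascript
// Sketch: separate direct URLs from entries that point to a remote list of URLs.
function splitSources(startUrls) {
    const direct = [];
    const remote = [];
    for (const source of startUrls) {
        if (source.requestsFromUrl) remote.push(source.requestsFromUrl);
        else if (source.url) direct.push({ url: source.url });
    }
    return { direct, remote };
}

// Sketch: turn the downloaded text of a remote list into request-like objects.
// Assumes the file contains http(s) URLs separated by whitespace.
function parseUrlList(text) {
    const matches = text.match(/https?:\/\/[^\s"'<>]+/g) || [];
    return matches.map((url) => ({ url }));
}
```

After downloading each file listed in `remote`, the objects returned by `parseUrlList` could be appended to `direct`, so that only entries with a `url` are converted to requests.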
I am running this locally on my computer running Linux.
When I run the script, many Chrome windows open up on my screen.
How can I run this headless?
Hello there,
I'm trying to use the "Social-Media-and-Contact-Info-Extractor" actor from Apify, but it extracts incomplete phone numbers using the apify-js function. Brazilian phone numbers have a different format.
Can I change the regex code to (xx)xxxx-xxxx or (xx)xxxxx-xxxx?
It would help me so much.
Brazilian phone examples:
(51) 5667-9987
(19) 94138-9398
(11) 96944-2436
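A sketch of a pattern for the two formats requested above. This is not the regex the actor or the Apify SDK uses; `BRAZILIAN_PHONE_REGEX` and `extractBrazilianPhones` are hypothetical names, and the pattern assumes a two-digit area code in parentheses followed by an optional space.

```javascript
// Sketch: matches (xx)xxxx-xxxx and (xx)xxxxx-xxxx, with an optional space after the area code.
const BRAZILIAN_PHONE_REGEX = /\(\d{2}\)\s?\d{4,5}-\d{4}/g;

// Returns all matches found in `text`, or an empty array if there are none.
function extractBrazilianPhones(text) {
    return text.match(BRAZILIAN_PHONE_REGEX) || [];
}
```

The pattern is deliberately strict: numbers written without parentheses or without the hyphen are not matched.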
An example website: acasadospets.com (screenshots and the result were attached).
Hey!
On pages where the actor could find phone numbers like
0175/234234, 0160/345345 and +49151/456456,
it just adds 234234, 345345 and 456456 to the result set "phonesUncertain".
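A sketch of a pattern that would keep the prefix for numbers written in this style. It is not the actor's regex; `SLASH_PHONE_REGEX` and `extractSlashPhones` are hypothetical names, and the digit ranges are assumptions chosen to cover the three examples above.

```javascript
// Sketch: a prefix starting with 0 or +49, a slash, then the subscriber number.
// The lengths (2-4 prefix digits, 4-8 subscriber digits) are assumed, not taken from a standard.
const SLASH_PHONE_REGEX = /(?:\+49|0)\d{2,4}\/\d{4,8}/g;

// Returns all matches found in `text`, or an empty array if there are none.
function extractSlashPhones(text) {
    return text.match(SLASH_PHONE_REGEX) || [];
}
```

Matching the prefix and the subscriber part as one unit avoids reporting only the digits after the slash.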
Configuration per Puppeteer/Cheerio/Web scraper standards