linkedtales / scrapedin
LinkedIn Scraper (currently working 2020)
License: Apache License 2.0
Hi,
I love the library so far. The only missing part I would like to have is the option to click on the Contact Info link and fetch personal info such as birthday and email.
Can you give me some clues on how to add this functionality, or is it already planned?
Cheers,
Christoph
When a user has many recommendations received and given, we currently expand both sections and then try to scrape the data. However, when one section is expanded, the other can be "slightly" collapsed, losing some of the items.
For example, a user with 11 received and 8 given is initially shown with 2 received and 2 given, each with a "Show More" button. When you expand received by clicking "Show More", all 11 are shown. However, when you then expand given, all 8 given are shown but only 9 of the received remain visible, and a NEW "show 2 more" button is displayed in received. The same thing happens when you then fully expand received again: the given list is shortened to 4 items and the new button displays "show 4 more".
I believe the fix will need to update "seeMoreButtons" so that, instead of looping through and expanding all sections at once, it ONLY expands received (including the inline "see more" items), THEN collects the received item list (data), THEN switches to recommendations given, expands that section (including the inline "see more" items), and THEN collects the recommendations given items.
Otherwise, currently the system can only collect 11 received and 2 given -OR- 8 received and 4 given out of a possible 11 received and 8 given.
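A rough sketch of that sequential approach, in case it helps. This is only an illustration and not scrapedin's actual code: `scrapeSectionsInOrder`, `expand` and `collect` are hypothetical names, and the two callbacks stand in for whatever Puppeteer logic clicks "Show More" and reads the items of one section.

```javascript
// Hypothetical helper: expand ONE section, collect it, then move on to the next,
// so that expanding a later section cannot shrink data that was already gathered.
async function scrapeSectionsInOrder(sectionNames, expand, collect) {
  const results = {};
  for (const name of sectionNames) {
    // e.g. click every "Show More" / inline "see more" in this section only
    await expand(name);
    // read the items before touching any other section
    results[name] = await collect(name);
  }
  return results;
}
```

It would be called along the lines of `scrapeSectionsInOrder(['received', 'given'], expandSection, collectSection)`, where both callbacks are assumed to exist on the caller's side.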
The description for each position gets truncated.
Example:
positions: [
{
...
description: "my very long descr... "
...
}
];
Is it possible to fix this or is it due to limitations on LinkedIn's APIs?
Just running this for the first time and received that error message with the second profile I scraped. Removed the actual profile for obvious reasons.
Here's the total error message:
(node:1944) UnhandledPromiseRejectionWarning: ReferenceError: relatedProfiles is not defined
at crawl (C:\Program Files\nodejs\scrapedin-linkedin-crawler-master\src\crawler.js:31:55)
at runMicrotasks ()
at processTicksAndRejections (internal/process/task_queues.js:93:5)
(node:1944) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:1944) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
scrapedin: 2019-10-23T18:01:11.285Z info: [profile] finished scraping url: https://www.linkedin.com/in/xxxxxxxxxxxxxxxxxx
scrapedin: 2019-10-23T18:01:11.287Z error: [cleanMessageData] LinkedIn website changed and scrapedin can't read basic data. Please report this issue at https://github.com/linkedtales/scrapedin/issues
2019-10-23T18:01:11.290Z error: error on crawling profile: https://www.linkedin.com/in/xxxxxxxxxxxxxxx
Error: LinkedIn website changed and scrapedin can't read basic data. Please report this issue at https://github.com/linkedtales/scrapedin/issues
(node:1944) UnhandledPromiseRejectionWarning: ReferenceError: relatedProfiles is not defined
at crawl (C:\Program Files\nodejs\scrapedin-linkedin-crawler-master\src\crawler.js:31:55)
at processTicksAndRejections (internal/process/task_queues.js:93:5)
(node:1944) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 2)
I upgraded to a Recruiter account and entered the profile link, but I got this "error on crawling profile" message. The link from a Recruiter account search looks longer than a normal profile link.
Has anyone else had this issue before?
It looks like the CSS rules have changed. I tried to fix it, but I don't have enough time.
ERROR:
UnhandledPromiseRejectionWarning: Error: linkedin: manual check was required, verify if your login is properly working manually or report this issue
To avoid a repetitive login process, the script could store cookies [2] and local storage data [1] in temporary files after the first login and reuse them in subsequent runs while the credentials are still valid. That way, users of the module can avoid repeated logins and reduce the chance of getting their account blocked.
If the credentials in the temporary files have expired, the page would be redirected to the default login page and the script can repeat the login process.
This also speeds up the crawling process because the login setup can be skipped.
References
[1] https://stackoverflow.com/questions/51789038/set-localstorage-items-before-page-loads-in-puppeteer
[2] https://pptr.dev/#?product=Puppeteer&version=v1.19.0&show=api-pagesetcookiecookies
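A minimal sketch of the cookie part of this proposal, assuming a Puppeteer `page` object is available. `saveCookies`, `restoreCookies` and the `COOKIE_FILE` path are hypothetical names chosen for illustration; only `page.cookies()` and `page.setCookie()` are Puppeteer calls. Local storage is not covered here, and detecting expired credentials (the redirect to the login page) is left to the caller.

```javascript
const fs = require('fs');

// Hypothetical location for the saved session; any writable path would do.
const COOKIE_FILE = './linkedin-cookies.json';

// Save the cookies of an already logged-in Puppeteer page to disk.
async function saveCookies(page, file = COOKIE_FILE) {
  const cookies = await page.cookies();
  fs.writeFileSync(file, JSON.stringify(cookies, null, 2));
}

// Restore previously saved cookies. Resolves to false when nothing usable is
// stored, in which case the caller should fall back to the normal login flow.
async function restoreCookies(page, file = COOKIE_FILE) {
  if (!fs.existsSync(file)) return false;
  let cookies;
  try {
    cookies = JSON.parse(fs.readFileSync(file, 'utf8'));
  } catch (err) {
    return false; // unreadable or corrupt file: treat as "no saved session"
  }
  if (!Array.isArray(cookies) || cookies.length === 0) return false;
  await page.setCookie(...cookies);
  return true;
}
```

The idea would be to call `restoreCookies(page)` before navigating, and to run the regular login followed by `saveCookies(page)` only when it resolves to false or the site still redirects to the login page.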
Hi @leonardiwagner, I'm still working on the accomplishments; however, the way its subsections expand seems to be quite different from other sections. Here is what I'm currently able to obtain for the profile linkedin.com/in/elinlehrmann:
{"contact":[{"type":"Elin’s Profile","values":["linkedin.com/in/elinlehrmann"]},{"type":"Website","values":["orcid.org/0000-0002-9869-9475 (Portfolio)"]}],
"profileAlternative":{"name":"Elin Lehrmann, Ph.D.","headline":"Scientist | Versatilist | Mentor | Life-long learner |","imageurl":"https://media.licdn.com/dms/image/C5603AQElAkv6ZVoRUg/profile-displayphoto-shrink_200_200/0?e=1578528000&v=beta&t=IjvhkUhwXMoYY3pmmPqTKV7JJ5IwqHNOLAijZn6N-ZI","location":"Baltimore, Maryland","connections":"406","summary":"Ph.D.-level biomedical scientist skilled in designing and implementing effective cross-disciplinary collaborations with individuals and teams from diverse backgrounds and organizational levels that translate novel information into validated data and peer-reviewed knowledge using complex analytical and problem-solving abilities to optimize workflow, information quality, and project delivery."},
"aboutAlternative":{"text":"Ph.D.-level biomedical scientist skilled in designing and implementing effective cross-disciplinary collaborations with individuals and teams from diverse backgrounds and organizational levels that translate novel information into validated data and peer-reviewed knowledge using complex analytical and problem-solving abilities to optimize workflow, information quality, and project delivery."},
"positions":[{"title":"Biologist","companyName":"National Institutes of Health (NIH): Intramural Research Program (IRP)","date1":"Oct 2008 – Present","date2":"11 yrs 2 mos"},{"title":"Staff Fellow","companyName":"National Institute on Drug Abuse (NIDA IRP), NIH","date1":"2000 – 2008","date2":"8 yrs"},{"title":"Postdoctoral Research Fellow","companyName":"Maryland Psychiatric Research Center (MRPC), University of Maryland School of Medicine","date1":"1996 – 2000","date2":"4 yrs"},{"title":"Ph.D. (Neuroscience)","companyName":"University of Southern Denmark, School of Medicine","date1":"1992 – 1996","date2":"4 yrs"},{"title":"Department of Anatomy (SOM), Aarhus University","location":"Aarhus, Denmark","date1":"1990 – 1992","date2":"2 yrs","roles":[{"title":"Research Assistant","date1":"1990 – 1992","date2":"2 yrs","location":"Aarhus, Denmark"},{"title":"Cand. Scient. (M.Sc.; Biology/Chemistry)","date1":"1984 – 1992","date2":"8 yrs"}]},{"title":"Title\nResearch Assistant","location":"Aarhus, Denmark","date1":"1990 – 1992","date2":"2 yrs","roles":[{"title":"Research Assistant","date1":"1990 – 1992","date2":"2 yrs","location":"Aarhus, Denmark"}]},{"title":"Title\nCand. Scient. (M.Sc.; Biology/Chemistry)","date1":"1984 – 1992","date2":"8 yrs","roles":[{"title":"Cand. Scient. (M.Sc.; Biology/Chemistry)","date1":"1984 – 1992","date2":"8 yrs"}]}],
"educations":[{"title":"University of Southern Denmark","degree":"Ph.D.","fieldofstudy":"Neuroscience","date1":"1992","date2":"1996"},{"title":"Aarhus University","degree":"Cand. Scient.","fieldofstudy":"Biology (MSc), Chemistry (BSc)","date1":"1984","date2":"1992"}],
"skills":[{"title":"Analysis","count":"1"},{"title":"Problem Solving","count":"1"},{"title":"Project Planning","count":"1"},{"title":"Project Management"},{"title":"Foreign Languages"},{"title":"Nonprofit Organizations"},{"title":"Lifesciences","count":"3"},{"title":"Neuroscience","count":"41"},{"title":"Molecular Biology","count":"34"},{"title":"Genetics","count":"15"},{"title":"Bioinformatics","count":"6"},{"title":"Strategic Planning"},{"title":"Immunohistochemistry","count":"22"},{"title":"qPCR","count":"23"},{"title":"PCR","count":"14"},{"title":"Research","count":"6"},{"title":"Genomics","count":"6"},{"title":"Biotechnology","count":"6"},{"title":"Clinical Research","count":"3"},{"title":"Writing"},{"title":"Life Sciences","count":"19"},{"title":"Data Analysis","count":"2"},{"title":"Social Media"},{"title":"Marketing"},{"title":"Spreadsheets"},{"title":"Event Planning"},{"title":"Event Management"},{"title":"PowerPoint"},{"title":"Microsoft Word"},{"title":"Microsoft Excel"},{"title":"Microsoft Office"},{"title":"Time Management"},{"title":"Team Leadership"},{"title":"Public Speaking"},{"title":"Leadership"},{"title":"Teamwork"},{"title":"Management"},{"title":"Personal Development"},{"title":"Communication"},{"title":"Professional Ethics"},{"title":"Cross-disciplinary collaboration"},{"title":"Leading Meetings"},{"title":"In Vivo","count":"19"},{"title":"Life Skills"},{"title":"Sequencing","count":"3"},{"title":"Translational Medicine","count":"1"},{"title":"Toxicology","count":"1"},{"title":"Peer Reviews"},{"title":"Peer Mentoring"},{"title":"Computational Biology","count":"1"}],
"recommendations":{"givenCount":"0","receivedCount":"0","given":[],"received":[]},
"accomplishments":[{"Publications":["Elin has 75 publications\n75\nPublications\npublication title\nLoss of miR-451a enhances SPARC production during myogenesis.\n\npublication date\nMar 29, 2019 \npublication description\nPLoS One. 2019 Mar 29;14(3):e0214301.\n\npublication description\nAbstract. MicroRNAs (miRNAs) are small noncoding RNAs that critically regulate gene expression. Their abundance and function have been linked to a range of physiologic and pathologic processes. In aged monkey muscle, miR-451a and miR-144-3p were far more abundant than in young monkey muscle. This observation led us to hypothesize that miR-451a and miR-144-3p may influence muscle homeostasis. To test if these conserved microRNAs were implicated in myogenesis, we investigated their function in the mouse myoblast line C2C12. The levels of both microRNAs declined with myogenesis; however, only overexpression of miR-451a, but not miR-144-3p, robustly impeded C2C12 differentiation, suggesting an inhibitory role for miR-451a in myogenesis. Further investigation of the regulatory influence of miR-451a identified as one of the major targets Sparc mRNA, which encodes a secreted protein acidic and rich in cysteine (SPARC) that functions in wound healing and cellular differentiation. In mouse myoblasts, miR-451a suppressed Sparc mRNA translation. Together, our findings indicate that miR-451a is downregulated in differentiated myoblasts and suggest that it decreases C2C12 differentiation at least in part by suppressing SPARC biosynthesis.\n\nLoss of miR-451a enhances SPARC production during myogenesis. Munk R, Martindale JL, Yang X, Yang JH, Grammatikakis I, Di Germanio C, Mitchell SJ, de Cabo R, Lehrmann E, Zhang Y, Becker KG, Raz V, Gorospe M, Abdelmohsen K, Panda AC. PLoS One. 2019 Mar 29;14(3):e0214301. doi: 10.1371/journal.pone.0214301. 
eCollection 2019.\nPMID: 30925184\n\nSee publication Loss of miR-451a enhances SPARC production during myogenesis.\nSee publication\npublication title\nMuscle cannabinoid 1 receptor regulates Il-6 and myostatin expression, governing physical performance and whole-body metabolism.\n\npublication date\nFeb 6, 2019 \npublication description\nFASEB J. 2019 Feb 6:fj201801145R. doi: 10.1096/fj.201801145R. [Epub ahead of print]\n\npublication description\nAbstract. Sarcopenic obesity, the combination of skeletal muscle mass and function loss with an increase in body fat, is associated with physical limitations, cardiovascular diseases, metabolic stress, and increased risk of mortality. Cannabinoid receptor type 1 (CB1R) plays a critical role in the regulation of whole-body energy metabolism because of its involvement in controlling appetite, fuel distribution, and utilization. Inhibition of CB1R improves insulin secretion and insulin sensitivity in pancreatic β-cells and hepatocytes. We have now developed a skeletal muscle–specific CB1R-knockout (Skm-CB1R−/−) mouse to study the specific role of CB1R in muscle. Muscle-CB1R ablation prevented diet-induced and age-induced insulin resistance by increasing IR signaling. Moreover, muscle-CB1R ablation enhanced AKT signaling, reducing myostatin expression and increasing IL-6 secretion. Subsequently, muscle-CB1R ablation increased myogenesis through its action on MAPK-mediated myogenic gene expression. Consequently, Skm-CB1R−/− mice had increased muscle mass and whole-body lean/fat ratio in obesity and aging. Muscle-CB1R ablation improved mitochondrial performance, leading to increased whole-body muscle energy expenditure and improved physical endurance, with no change in body weight. 
These results collectively show that CB1R in muscle is sufficient to regulate whole-body metabolism and physical performance and is a novel target for the treatment of sarcopenic obesity.\n\nGonzález-Mariscal I, Montoro RA, O'Connell JF, Kim Y, Gonzalez-Freire M, Liu QR, Alfaras I, Carlson OD, Lehrmann E, Zhang Y, Becker KG, Hardivillé S, Ghosh P, Egan JM. Muscle cannabinoid 1 receptor regulates Il-6 and myostatin expression, governing physical performance and whole-body metabolism. FASEB J. 2019 Feb 6:fj201801145R. doi: 10.1096/fj.201801145R. [Epub ahead of print]\nPMID: 30726112\n\nSee publication Muscle cannabinoid 1 receptor regulates Il-6 and myostatin expression, governing physical performance and whole-body metabolism.\nSee publication\npublication title\nTopoisomerase 3β interacts with RNAi machinery to promote heterochromatin formation and transcriptional silencing in Drosophila.\n\npublication date\nNov 23, 2018 \npublication description\nNat Commun .9(1): 4946.\n\npublication description\nLee SK, Xue Y, Shen W, Zhang Y, Joo Y, Ahmad M, Chinen M, Ding Y, Ku WL, De S, Lehrmann E, Becker KG, Lei EP, Zhao K, Zou S, Sharov A, Wang W. Topoisomerase 3β interacts with RNAi machinery to promote heterochromatin formation and transcriptional silencing in Drosophila. Nat Commun. 2018 Nov 23;9(1):4946. doi: 10.1038/s41467-018-07101-4.\nPMID: 30470739\nDOI: 10.1038/s41467-018-07101-4\n\nAbstract. Topoisomerases solve topological problems during DNA metabolism, but whether they participate in RNA metabolism remains unclear. Top3β represents a family of topoisomerases carrying activities for both DNA and RNA. Here we show that in Drosophila, Top3β interacts biochemically and genetically with the RNAi-induced silencing complex (RISC) containing AGO2, p68 RNA helicase, and FMRP. Top3β and RISC mutants are similarly defective in heterochromatin formation and transcriptional silencing by position-effect variegation assay. 
Moreover, both Top3β and AGO2 mutants exhibit reduced levels of heterochromatin protein HP1 in heterochromatin. Furthermore, expression of several genes and transposable elements in heterochromatin is increased in the Top3β mutant. Notably, Top3β mutants defective in either RNA binding or catalytic activity are deficient in promoting HP1 recruitment and silencing of transposable elements. Our data suggest that Top3β may act as an RNA topoisomerase in siRNA-guided heterochromatin formation and transcriptional silencing.\n\nSee publication Topoisomerase 3β interacts with RNAi machinery to promote heterochromatin formation and transcriptional silencing in Drosophila.\nSee publication\npublication title\nHydroxyurea attenuates oxidative, metabolic, and excitotoxic stress in rat hippocampal neurons and improves spatial memory in a mouse model of Alzheimer’s disease\n\npublication date\nAug 29, 2018 \npublication description\nNeurobiology of Aging\n\npublication description\nRD Brose, E Lehrmann, Y Zhang, RH Reeves, KD Smith, MP Mattson (2018).\nHydroxyurea attenuates oxidative, metabolic, and excitotoxic stress in rat hippocampal neurons and improves spatial memory in a mouse model of Alzheimer’s disease\nNeurobiology of Aging\nhttps://doi.org/10.1016/j.neurobiolaging.2018.08.021 [Epub ahead of print]\n\nSee publication Hydroxyurea attenuates oxidative, metabolic, and excitotoxic stress in rat hippocampal neurons and improves spatial memory in a mouse model of Alzheimer’s disease\nSee publication\npublication title\nMIR100 host gene-encoded lncRNAs regulate cell cycle by modulating the interaction between HuR and its target mRNAs.\n\npublication date\nAug 8, 2018 \npublication description\nNucleic Acids Res.\n\npublication description\nSun Q, Tripathi V, Yoon JH, Singh DK, Hao Q, Min KW, Davila S, Zealy RW, Li XL, Polycarpou-Schwarz M, Lehrmann E, Zhang Y, Becker KG, Freier SM, Zhu Y, Diederichs S, Prasanth SG, Lal A, Gorospe M, Prasanth KV. 
MIR100 host gene-encoded lncRNAs regulate cell cycle by modulating the interaction between HuR and its target mRNAs.\nNucleic Acids Res. 2018 Aug 8. doi: 10.1093/nar/gky696. [Epub ahead of print]\nPMID: 30102375\n\nSee publication MIR100 host gene-encoded lncRNAs regulate cell cycle by modulating the interaction between HuR and its target mRNAs.\nSee publication\nShow more"]},{},{}],
"peopleAlsoViewed":[{"user":"https://www.linkedin.com/in/kate-wilson-0a42407/","text":"Senior Director of Sustainability at Vail Resorts"},{"user":"https://www.linkedin.com/in/maire-doyle-b85a262b/","text":"Scientist (C) Principal Investigator, Role of insulin in taste transduction, cultivation of taste cell precursors"},{"user":"https://www.linkedin.com/in/marcellebergeron/","text":"Drug Development Professional"},{"user":"https://www.linkedin.com/in/saeed-azimi-a5bb1544/","text":"Immunotherapy"},{"user":"https://www.linkedin.com/in/hyun-k-1435627/","text":"Drug development"},{"user":"https://www.linkedin.com/in/jared-kartchner-b5229770/","text":"Attorney at Northern Virginia Estate Planning Services"},{"user":"https://www.linkedin.com/in/emily-hm-wong-919b7975/","text":"Scientist II, Computational Biology"},{"user":"https://www.linkedin.com/in/tomrohmann/","text":"Service Order Program Manager at J&J Worldwide Services"},{"user":"https://www.linkedin.com/in/lukecartin/","text":"Environmental Sustainability Manager at Park City Municipal Corporation"},{"user":"https://www.linkedin.com/in/martina-molsbergen-ba9b438/","text":"CEO at C14 Consulting Group, LLC"}],
"volunteerExperience":[{"title":"Organizing Committee member","experience":"Rhodesian Ridgeback World Congress 2016","description":"Animal Welfare","date1":"Sep 2014 – Aug 2016","date2":"2 yrs"},{"title":"Vice-President","experience":"Chesapeake Bay Area Rhodesian Ridgeback Club","description":"Animal Welfare","date1":"Jul 2015 – Present","date2":"4 yrs 5 mos"},{"title":"Treasurer","experience":"Chesapeake Bay Area Rhodesian Ridgeback Club","description":"Animal Welfare","date1":"Aug 2013 – Nov 2014","date2":"1 yr 4 mos"},{"title":"Member","experience":"Rhodesian Ridgeback World Congress Health Committee","description":"Animal Welfare","date1":"Jul 2016 – Present","date2":"3 yrs 5 mos"},{"title":"Regional News Column","experience":"Chesapeake Bay Area Rhodesian Ridgeback Club","description":"Animal Welfare","date1":"Aug 2017 – Present","date2":"2 yrs 4 mos"}],"profile":{"name":"Elin Lehrmann, Ph.D.","headline":"Scientist | Versatilist | Mentor | Life-long learner |","imageurl":"https://media.licdn.com/dms/image/C5603AQElAkv6ZVoRUg/profile-displayphoto-shrink_200_200/0?e=1578528000&v=beta&t=IjvhkUhwXMoYY3pmmPqTKV7JJ5IwqHNOLAijZn6N-ZI","location":"Baltimore, Maryland","connections":"406","summary":"Ph.D.-level biomedical scientist skilled in designing and implementing effective cross-disciplinary collaborations with individuals and teams from diverse backgrounds and organizational levels that translate novel information into validated data and peer-reviewed knowledge using complex analytical and problem-solving abilities to optimize workflow, information quality, and project delivery."}}
As you can see, I only scrape 5 of the 75 publications displayed on that profile, I cannot obtain the links to the publications, and the languages part is not scraped. I think this issue is related to two other issues, #1 and #36. I have tried to make some modifications to the seemore functions without any results. I have attached the modified seemore file. I think making the code a bit more sequential could help us tackle this issue, but I don't know how to do that.
seemore.txt
Thanks man!
Protocol error (Runtime.evaluate): Target closed.
Error when running the scraper
Position titles are coming through as "Title\nPhD Student in Computer Science"
at page.waitFor.then.catch (/home/ubuntu/node_modules/scrapedin/src/login.js:62:31)
at process._tickCallback (internal/process/next_tick.js:109:7)
(node:8341) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 2): Error: Protocol error (Runtime.callFunctionOn): Target closed.
(node:8341) DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
I verified the password manually and there was no problem. What is the reason? Thanks.
Again! We need to find a more generic way to ensure that it's a profile page.
I attempted to run it on a remote server like Heroku and encountered this error:
(node:9167) UnhandledPromiseRejectionWarning: Error: linkedin: manual check was required, verify if your login is properly working manually or report this issue: https://github.com/leonardiwagner/scrapedin/issues
at page.waitFor.then.catch (/var/app/current/node_modules/scrapedin/src/login.js:62:31)
at process._tickCallback (internal/process/next_tick.js:68:7)
(node:9167) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 2)
(node:9167) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
(node:9167) UnhandledPromiseRejectionWarning: Error: Protocol error (Runtime.callFunctionOn): Target closed.
at Promise (/var/app/current/node_modules/puppeteer/lib/Connection.js:183:56)
at new Promise ()
at CDPSession.send (/var/app/current/node_modules/puppeteer/lib/Connection.js:182:12)
at ExecutionContext.evaluateHandle (/var/app/current/node_modules/puppeteer/lib/ExecutionContext.js:106:44)
at ExecutionContext. (/var/app/current/node_modules/puppeteer/lib/helper.js:109:23)
at ElementHandle.$ (/var/app/current/node_modules/puppeteer/lib/JSHandle.js:378:50)
at ElementHandle. (/var/app/current/node_modules/puppeteer/lib/helper.js:109:23)
at DOMWorld.$ (/var/app/current/node_modules/puppeteer/lib/DOMWorld.js:114:34)
at process._tickCallback (internal/process/next_tick.js:68:7)
-- ASYNC --
at Frame. (/var/app/current/node_modules/puppeteer/lib/helper.js:108:27)
at Page.$ (/var/app/current/node_modules/puppeteer/lib/Page.js:300:29)
at Page. (/var/app/current/node_modules/puppeteer/lib/helper.js:109:23)
at page.waitFor.then.catch (/var/app/current/node_modules/scrapedin/src/login.js:60:16)
at process._tickCallback (internal/process/next_tick.js:68:7)
Manual verification is not possible because a browser cannot be opened on such a server to complete the check. Is there any workaround for this issue?
First, thank you for the time and work you have put into developing this solution.
I added the code required by "https://github.com/jontewks/puppeteer-heroku-buildpack" to use puppeteer:
const args = proxyAddress && [`--proxy-server=${proxyAddress}`, '--no-sandbox', '--disable-setuid-sandbox']
But I get this error:
28 Feb 2019 05:31:29.549168 <190>1 2019-02-28T10:31:28.908987+00:00 app web.1 - - (node:4) UnhandledPromiseRejectionWarning: Error: Failed to launch chrome!
Heroku / Syslog drain
28 Feb 2019 05:31:29.549447 <190>1 2019-02-28T10:31:28.909012+00:00 app web.1 - - [0228/103126.902390:FATAL:zygote_host_impl_linux.cc(116)] No usable sandbox! Update your kernel or see https://chromium.googlesource.com/chromium/src/+/master/docs/linux_suid_sandbox_development.md for more information on developing with the SUID sandbox. If you want to live dangerously and need an immediate workaround, you can try using --no-sandbox.
Heroku / Syslog drain
Do you know of any solution to this kernel/sandbox problem? I appreciate any help.
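One thing that may be worth checking, offered as a guess rather than a diagnosis: in the snippet above, `args` is only an array when `proxyAddress` is set, so without a proxy the `--no-sandbox` flags would never reach Chrome. A small sketch that always includes the sandbox flags follows; `buildLaunchArgs` is a hypothetical helper name, not part of scrapedin or Puppeteer.

```javascript
// Hypothetical helper: always include the sandbox flags, and add the proxy flag
// only when a proxy address is actually supplied.
function buildLaunchArgs(proxyAddress) {
  const args = ['--no-sandbox', '--disable-setuid-sandbox'];
  if (proxyAddress) {
    args.push(`--proxy-server=${proxyAddress}`);
  }
  return args;
}

console.log(buildLaunchArgs());
console.log(buildLaunchArgs('http://server:port'));
```

The resulting array would then need to be handed to whatever launches Chrome; with plain Puppeteer that is `puppeteer.launch({ args })`.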
Regards
error while loading shared libraries: libX11-xcb.so.1: cannot open shared object file: No such file or directory
TROUBLESHOOTING: https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md
at onClose (/home/scrapedin-linkedin-crawler/node_modules/puppeteer/lib/Launcher.js:348:14)
at Interface.helper.addEventListener (/home/scrapedin-linkedin-crawler/node_modules/puppeteer/lib/Launcher.js:337:50)
at Interface.emit (events.js:203:15)
at Interface.close (readline.js:397:8)
at Socket.onend (readline.js:173:10)
at Socket.emit (events.js:203:15)
at endReadableNT (_stream_readable.js:1143:12)
at process._tickCallback (internal/process/next_tick.js:63:19)
(node:2999) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:2999) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
I attempted to scrape my own profile and saw that both the company name and the location are not being collected, due to changes in the CSS of the LinkedIn website.
Modifying the "profilescrapertemplate.js" file in two places corrects the issue. I have the fix in place if you would like me to submit a pull request.
Otherwise, the two lines should be:
companyName: 'p.pv-entity__secondary-title',
location: 'h4.pv-entity__location span:nth-child(2)',
An error about not finding the profile selector is thrown.
When I use the scraper with the login/password options it works fine.
But when I use my cookie, I get a 'linkedin: profile not found' error:
const fs = require('fs');
const scrapedin = require("scrapedin");
// const options = { email: "[email protected]", password: "abcdefgh" };
// cookie.txt holds the cookies as JSON
const cookies = fs.readFileSync('./cookie.txt', 'utf8');
const options = {
  cookies: JSON.parse(cookies)
};
console.log(options);
const url = 'https://www.linkedin.com/in/wassimazirar/';
scrapedin(options)
  .then(profileScraper => profileScraper(url, 2000))
  .then(profile => {
    console.log(profile);
  });
(node:13224) UnhandledPromiseRejectionWarning: Error: linkedin: profile not found
at ..\node_modules\scrapedin\src\profile.js:19:13
at async module.exports (..\node_modules\scrapedin\src\profile.js:16:3)
(node:13224) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:13224) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
When the user has a lot of items in sections such as positions (experiences) or recommendations, the "show more" button only reveals 5 more positions. If the user has more than that, another "show more X positions" button is displayed, so we need to click that button again before gathering the positions.
I'm labeling this issue as "low" since the last 10 items are already being gathered.
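A possible shape for the repeated clicking, as a sketch only. `expandFully` and its two callbacks are hypothetical names; the loop simply keeps clicking until no button is found, with an upper bound as a safety net.

```javascript
// Hypothetical helper: keep clicking a section's "show more" button until it
// disappears, with an upper bound so a misbehaving page cannot loop forever.
async function expandFully(findButton, clickButton, maxClicks = 50) {
  let clicks = 0;
  while (clicks < maxClicks) {
    // findButton should resolve to null when no "show more" button is left
    const button = await findButton();
    if (!button) break;
    await clickButton(button);
    clicks += 1;
  }
  return clicks; // number of clicks performed
}
```

With Puppeteer, `findButton` could be something like `() => page.$(selector)`, since `page.$` resolves to null when nothing matches, and `clickButton` could be `(button) => button.click()`. The selector itself is an assumption that depends on LinkedIn's current markup.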
Is there any interest in adding this sort of scraping? It would require changing the API as it's pretty much centered around profiles only. I have a potential use-case for it, though.
It seems that no matter what I set waitTimeMs to, it is ignored and the reCAPTCHA is closed before I can finish it. Sometimes it shows me more than three reCAPTCHAs.
Hi - thank you for putting this together, I'm excited to use it!
The issue I am having at the moment is that, when I scrape my personal profile, which has multiple positions at the same company, the result does not have a clean data structure for that company. What is returned is one entry with the company name and the most recent description, followed by 2 separate entries for the individual positions, which don't have the company name and whose title has "Title" preceding the name of the position.
My profile is https://www.linkedin.com/in/alexandercaulfield/
Response
[ { title: 'Wayfair',
[0] location: 'Boston, MA',
[0] description:
[0] '- Mentor for 25 entry level engineers on a month long project, holding office hour sessions and giving code reviews (for PHP & React)',
[0] date1: 'Jul 2019 – Present',
[0] date2: '5 mos',
[0] roles: [ [Object], [Object] ] },
[0] { title: 'Title\nSoftware Engineer II',
[0] location: 'Boston, MA',
[0] description:
[0] '- Mentor for 25 entry level engineers on a month long project, holding office hour sessions and giving code reviews (for PHP & React)',
[0] date1: 'Jul 2019 – Present',
[0] date2: '5 mos',
[0] roles: [ [Object] ] },
[0] { title: 'Title\nSoftware Engineer I',
[0] location: 'Boston, MA',
[0] description:
[0] '- Full stack development for Wayfair\'s customer-facing storefront (Shop the Look) in React, PHP, SQL and GraphQL\n- Developed the backend for a large A/B test that resulted in reduced exit rate on the page by 3% and an improved bounce rate by -4.5% by leveraging recommended content to users based on their likes and dislikes\n- Tech lead the implementation of a feature on the new page redesign. This included scoping out tickets, gathering requirements with product, constructing the frontend architecture, executing on creating the UI and delegating tasks to other engineers when appropriate\n- Proactive about sharing new ideas for Shop the Look\'s product and supported ideas with data and customer insights\n...\nsee more',
[0] date1: 'Sep 2018 – Jul 2019',
[0] date2: '11 mos',
[0] roles: [ [Object] ] },
Let me know if I can do anything to help!
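Until this is fixed in the library, the response above could be cleaned up after scraping. The sketch below is a workaround under assumptions, not scrapedin behaviour: `normalizePositions` is a hypothetical name, and it assumes that an entry with more than one role is a company header and that the entries whose title starts with "Title\n" directly after it belong to that company.

```javascript
const TITLE_PREFIX = 'Title\n';

// Hypothetical post-processing step for the scraped `positions` array.
function normalizePositions(positions) {
  const result = [];
  let currentCompany = null;
  for (const position of positions) {
    const title = position.title || '';
    if (title.startsWith(TITLE_PREFIX)) {
      // Role-level entry: strip the label and inherit the company name
      // from the preceding company header, if there was one.
      result.push({
        ...position,
        title: title.slice(TITLE_PREFIX.length),
        companyName: position.companyName || currentCompany
      });
    } else if (Array.isArray(position.roles) && position.roles.length > 1) {
      // Company header (assumption): remember its name, do not emit it.
      currentCompany = title;
    } else {
      // Ordinary single position: keep it unchanged.
      currentCompany = null;
      result.push(position);
    }
  }
  return result;
}
```

For the Wayfair example this would yield two entries, "Software Engineer II" and "Software Engineer I", each with `companyName: 'Wayfair'`.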
Since gathering all recommendations can be faulty, getting the number would be great!
When I use "args":["--no-sandbox","--proxy-server=http://server:port"]}, I cannot log in to LinkedIn.
It always prompts "manual check was required".
I just want to know: has anybody scraped a profile through a proxy successfully?
@leonardiwagner thanks for this great stuff. I am using your library to crawl, but it says there is no profile to crawl right now. Can you please fix this?
After the v1 milestone, we need to update the README docs.
Since it's a background image, it needs to be gathered via JS using page.$eval
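A sketch of what that could look like. `extractBackgroundImageUrl` and `getBackgroundImageUrl` are hypothetical names, and the selector to pass in is an assumption that depends on LinkedIn's markup; only `page.$eval` is a Puppeteer call.

```javascript
// Pull the URL out of a CSS background-image value such as: url("https://example.com/a.jpg")
function extractBackgroundImageUrl(cssValue) {
  const match = /url\(\s*(['"]?)(.*?)\1\s*\)/.exec(cssValue || '');
  return match ? match[2] : null; // null for values like "none"
}

// Hypothetical wrapper: read the computed style inside the page, then parse it in Node.
async function getBackgroundImageUrl(page, selector) {
  const cssValue = await page.$eval(
    selector,
    (el) => window.getComputedStyle(el).backgroundImage
  );
  return extractBackgroundImageUrl(cssValue);
}
```

Note that `page.$eval` rejects when no element matches the selector, so the caller would need to handle that case.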
Hi everyone, when I use this scraper, the past and current position information is not kept. Does anyone know what causes this issue? Thanks.
(node:7117) UnhandledPromiseRejectionWarning: Error: Page crashed!
at Page._onTargetCrashed (/home/scrapedin-linkedin-crawler/node_modules/scrapedin/node_modules/puppeteer/lib/Page.js:215:24)
at CDPSession.Page.client.on.event (/home/scrapedin-linkedin-crawler/node_modules/scrapedin/node_modules/puppeteer/lib/Page.js:123:56)
at CDPSession.emit (events.js:198:13)
at CDPSession._onMessage (/home/scrapedin-linkedin-crawler/node_modules/scrapedin/node_modules/puppeteer/lib/Connection.js:200:12)
at Connection._onMessage (/home/scrapedin-linkedin-crawler/node_modules/scrapedin/node_modules/puppeteer/lib/Connection.js:112:17)
at WebSocketTransport._ws.addEventListener.event (/home/tony/scrapedin-linkedin-crawler/node_modules/scrapedin/node_modules/puppeteer/lib/WebSocketTransport.js:44:24)
at WebSocket.onMessage (/home//scrapedin-linkedin-crawler/node_modules/ws/lib/event-target.js:120:16)
at WebSocket.emit (events.js:198:13)
at Receiver.receiverOnMessage (/home/tony/scrapedin-linkedin-crawler/node_modules/ws/lib/websocket.js:789:20)
at Receiver.emit (events.js:198:13)
(node:7117) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
Error: linkedin: manual check was required, verify if your login is properly working manually or report this issue
Hello there 👋
The library works like a charm to retrieve information about the profile general info, thanks!!
I was wondering if I could also use it to get the latest posts of a person that I follow or that has a public profile.
Or maybe, if it's easier, the posts related to a certain hashtag, pointing this kind of URL: https://www.linkedin.com/feed/hashtag/crypto/
Thank you 😊
https://github.com/linkedtales/scrapedin/blob/master/src/login.js#L32 Where does node come from?
When a LinkedIn user has multiple positions at the same company, the scraped company name is correct. When the user has only one position at a company, the scraped company name is instead the user's position title at that company.
As described on #18
Hello there 👋
The library works pretty well to retrieve information about the profile general info, thanks!!
I was wondering if I could also use it to get a more detailed picture of accomplishments, i.e. Publications, Patents, Languages, etc.
Or, if it's easier to visualize the suggestion, you can have a look at this profile, which contains many publications: https://www.linkedin.com/in/elinlehrmann/. In that case it would be useful for some applications to have the descriptions of all 75 publications.
Thank you 😊
I put the code from the README in a scrap.js file and ran node scrap.js, but I keep getting this error:
const profileScraper = await scrapedin({ email: '[email protected]', password: 'pass' })
^^^^^
SyntaxError: await is only valid in async function
How can I bypass this?
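One common way around this error is to wrap the README code in an async function, since `await` is not allowed at the top level of a CommonJS script. The sketch below assumes that `scrapedin(options)` resolves to a scraper function that takes a profile URL. `fakeScrapedin` is a hypothetical stand-in so the example runs without credentials; in a real script it would be replaced by `require('scrapedin')`.

```javascript
// Hypothetical stub standing in for `require('scrapedin')`.
const fakeScrapedin = async (options) => async (url) => ({ url, scrapedWith: options.email })

// `await` is only valid inside an async function in a CommonJS script,
// so the calls are moved into one.
async function main() {
  const profileScraper = await fakeScrapedin({ email: 'user@example.com', password: 'pass' })
  const profile = await profileScraper('https://www.linkedin.com/in/some-profile/')
  return profile
}

// Handle rejections explicitly to avoid an UnhandledPromiseRejectionWarning.
main()
  .then(profile => console.log(profile))
  .catch(err => console.error(err))
```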
correct me if I am wrong
const clickAll = async(page) => {
for(let i = 0; i < seeMoreButtons.length; i ++){
const button = seeMoreButtons[i]
if (button.selector) {
const elem = await page.$(button.selector)
if (elem) {
If a selector matches more than one element, we need to loop over all matches and click each one.
I observed, for example, that only one see_more is clicked and the rest are ignored.
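A possible shape for that loop, as a sketch rather than the library's actual code: it swaps `page.$` for `page.$$`, Puppeteer's "query all" call, which resolves to an array of element handles. The function name `clickAllMatches` is hypothetical, and `seeMoreButtons` is passed in as an argument here so the example is self-contained.

```javascript
// Click every element matching each selector, not just the first match.
const clickAllMatches = async (page, seeMoreButtons) => {
  let clicked = 0
  for (const button of seeMoreButtons) {
    if (!button.selector) continue
    // page.$$ resolves to all matching element handles (an empty array if none).
    const elems = await page.$$(button.selector)
    for (const elem of elems) {
      try {
        await elem.click()
        clicked++
      } catch (err) {
        // A handle may be detached if an earlier click re-rendered the section; skip it.
      }
    }
  }
  // Number of clicks that succeeded.
  return clicked
}
```

Because expanding one section can re-render another, the handles collected up front may go stale, so a real fix might need to re-query after each click.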
Hi Team,
This works fine on my local server, but the same code does not work on the live server. Please check and help me solve this problem.
Error: Protocol error (Runtime.callFunctionOn): Target closed.
ScrapedIn is able to open the profiles in browser but does not scrape or save anything.
The code for my profileScraper is below. It opens the Chromium window but closes it before I can finish the reCAPTCHA. If you know what's going on and could fix it, that would be amazing.
const profileScraper = await scrapedin({
email: LINKEDIN_EMAIL, password: LINKEDIN_PASSWORD, isHeadless: false, hasToLog: true
})
.catch(err => {
console.error(err);
});
This is what it logs, including the error message:
scrapedin: 2019-10-01T13:56:57.535Z info: [login] logging at: https://www.linkedin.com/login
scrapedin: 2019-10-01T13:57:13.591Z warn: [login] successful login element was not found
scrapedin: 2019-10-01T13:57:13.594Z warn: [login] manual check was required
Error: linkedin: manual check was required, verify if your login is properly working manually or report this issue: https://github.com/leonardiwagner/scrapedin/issues
at page.waitFor.then.catch (/Users/nickedwards/RupieNetwork/node_modules/scrapedin/src/login.js:62:31)
at process._tickCallback (internal/process/next_tick.js:68:7)
There is also another message:
Unhandled Rejection at: Error: Protocol error (Runtime.callFunctionOn): Target closed.
@leonardiwagner this template works well. I have tried it, but I have some trouble expanding the sections entirely. Maybe you could have a look and tell me where I should focus my efforts now.
If someone has an idea for extracting links to co-authors or co-inventors, it would also be very useful to integrate that.
Thanks man
Template_function.txt
Is it possible to scrape the "Contact Info" of every profile? Thanks a lot
Currently dates are raw as displayed on website
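As an illustration of what structured dates could look like, here is a minimal sketch that parses strings in the form seen in the scraped output, such as 'Jul 2019 – Present' or 'Sep 2018 – Jul 2019'. The function names and the returned shape are assumptions, not part of scrapedin, and only English month abbreviations are handled.

```javascript
const MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

// Parse 'Jul 2019' or '2019' into { year, month }; month is null when absent.
function parseMonthYear(text) {
  const match = /^(?:([A-Za-z]{3})[a-z]*\.?\s+)?(\d{4})$/.exec(text.trim())
  if (!match) return null
  let month = null
  if (match[1]) {
    const abbr = match[1][0].toUpperCase() + match[1].slice(1).toLowerCase()
    const index = MONTHS.indexOf(abbr)
    if (index === -1) return null
    month = index + 1
  }
  return { year: Number(match[2]), month }
}

// Parse a raw range split on an en dash or hyphen surrounded by whitespace.
function parseDateRange(raw) {
  const [startText, endText = ''] = raw.split(/\s+[–-]\s+/)
  const isCurrent = /^present$/i.test(endText.trim())
  return {
    start: parseMonthYear(startText),
    end: isCurrent ? null : parseMonthYear(endText),
    isCurrent
  }
}

console.log(parseDateRange('Sep 2018 – Jul 2019'))
```

Returning plain year and month numbers instead of `Date` objects avoids inventing a day of the month that the website never displays.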