
scrapedin's People

Contributors

avifatal, emasuriano, fgobin, gautierdag, kf6kjg, kristianlauttamus, lcalvy, leonardiwagner, mstniy, muchai-mercy, mvegter, netgator, radamant24, yosefc

scrapedin's Issues

Feature Request: Contact info click and retrieve data like Email

Hi,

I love the library so far. The only part I'm missing is an option to click the ContactInfo link and fetch personal info such as birthday and email.

Can you give me some pointers on how to add this functionality, or is it already planned?

Cheers,
Christoph

Recommendations Received and Given will not work the way data is being collected right now.

When a user has many recommendations received and given, we currently expand both sections and then try to scrape the data. However, when one section is expanded the other can be partially collapsed again, losing some of the items.

For example, a user with 11 received and 8 given is initially shown with 2 received and 2 given, each with a "Show More" button. When you expand received with the "Show More" button, all 11 show. However, when you then expand given, all 8 given are shown but only 9 of the received are still visible, and a NEW "show 2 more" button is displayed in received. The same thing happens when you then expand received fully again: the given list is shortened to 4 items and the new button displays "show 4 more".

I believe the fix will need to update "seeMoreButtons" so that, instead of looping through all sections and expanding everything at once, it expands ONLY received (including the inline "see more" items) and THEN collects the received item list (data). It should THEN switch to recommendations given, expand that section (including the inline "see more" items), and THEN collect the recommendations given items.

Otherwise, the system can currently only collect 11 received and 2 given -OR- 8 received and 4 given out of a possible 11 received and 8 given.
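The sequential approach described above could be sketched roughly as follows. This is only an illustration: `expandSection` and `collectItems` are hypothetical callbacks standing in for whatever the "seeMoreButtons" logic and the item scraping actually do, and the section names passed in are assumptions.

```javascript
// Hypothetical sketch: expand ONE section, collect it, then move on,
// so that expanding a later section cannot shrink one already collected.
async function collectSectionsInOrder(sectionNames, expandSection, collectItems) {
  const result = {};
  for (const name of sectionNames) {
    // Click "Show More" (and any inline "see more") for this section only.
    await expandSection(name);
    // Read the items before touching the next section.
    result[name] = await collectItems(name);
  }
  return result;
}
```

Called as `collectSectionsInOrder(['received', 'given'], expandSection, collectItems)`, each list is read while it is fully expanded, at the cost of one extra pass per section.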

Position descriptions get truncated

The description for each position gets truncated

Example:

positions: [
  {
    ...
    description: "my very long descr... "
    ...
  }
];

Is it possible to fix this or is it due to limitations on LinkedIn's APIs?

Error: LinkedIn website changed and scrapedin can't read basic data. Please report this issue at https://github.com/linkedtales/scrapedin/issues

I'm running this for the first time and received this error message on the second profile I scraped. I removed the actual profile URL for obvious reasons.

Here's the full error output:

(node:1944) UnhandledPromiseRejectionWarning: ReferenceError: relatedProfiles is not defined
at crawl (C:\Program Files\nodejs\scrapedin-linkedin-crawler-master\src\crawler.js:31:55)
at runMicrotasks (<anonymous>)
at processTicksAndRejections (internal/process/task_queues.js:93:5)
(node:1944) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:1944) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
scrapedin: 2019-10-23T18:01:11.285Z info: [profile] finished scraping url: https://www.linkedin.com/in/xxxxxxxxxxxxxxxxxx
scrapedin: 2019-10-23T18:01:11.287Z error: [cleanMessageData] LinkedIn website changed and scrapedin can't read basic data. Please report this issue at https://github.com/linkedtales/scrapedin/issues
2019-10-23T18:01:11.290Z error: error on crawling profile: https://www.linkedin.com/in/xxxxxxxxxxxxxxx
Error: LinkedIn website changed and scrapedin can't read basic data. Please report this issue at https://github.com/linkedtales/scrapedin/issues
(node:1944) UnhandledPromiseRejectionWarning: ReferenceError: relatedProfiles is not defined
at crawl (C:\Program Files\nodejs\scrapedin-linkedin-crawler-master\src\crawler.js:31:55)
at processTicksAndRejections (internal/process/task_queues.js:93:5)
(node:1944) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 2)

Error: linkedin : profile not found

I upgraded to a Recruiter account and entered a profile link, but I got this "error on crawling profile" message. The link shown in the Recruiter account search looks longer than a normal profile link:

https://www.linkedin.com/recruiter/profile/509045058,VFKE,CAP?searchController=smartSearch&searchId=2047071505&total=215768&searchCacheKey=5c443134-3b67-41d1-a2f8-432927baafb0%2CNIlV&searchRequestId=88512f30-2445-45bc-b4d6-efd08af5f746%2CQBJL&searchSessionId=2047071505&origin=GHDS&memberAuth=509045058%2CVFKE%2CCAP

Has anyone else run into this issue before?
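One way to catch this early is to check the URL shape before crawling. The sketch below assumes (this is not confirmed anywhere in this thread) that the scraper expects public profile URLs of the form `https://www.linkedin.com/in/<slug>`, which is the form used in the other reports here; `isPublicProfileUrl` is a hypothetical helper, not part of scrapedin.

```javascript
// Hypothetical helper: returns true only for URLs shaped like
// https://www.linkedin.com/in/<slug> (with an optional trailing slash).
function isPublicProfileUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch (err) {
    return false; // not a parseable absolute URL
  }
  const hostOk = url.hostname === 'www.linkedin.com' || url.hostname === 'linkedin.com';
  return hostOk && /^\/in\/[^/]+\/?$/.test(url.pathname);
}
```

A Recruiter search link such as the one above has a `/recruiter/profile/...` path, so this check would reject it before the crawl starts.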

Profile not found

It looks like the CSS rules have changed. I tried to fix it but didn't have enough time.

ERROR:

scrapedin: 2019-05-29T13:03:43.554Z warn: [profile] profile selector was not found

(node:21488) UnhandledPromiseRejectionWarning: Error: linkedin: profile not found

Store cookies and local storage data to avoid repetitive login

To avoid a repetitive login process, the script could store cookies [2] and local storage data [1] in temporary files after the first login and reuse them on subsequent runs while the credentials are still valid. Users of the module would then avoid logging in repeatedly and reduce the chance of getting their account blocked.

If the credentials in the temporary files have expired, the page will be redirected to the default login page and the script can repeat the login process.

This also accelerates the crawling process because the login setup can be skipped.

References
[1] https://stackoverflow.com/questions/51789038/set-localstorage-items-before-page-loads-in-puppeteer
[2] https://pptr.dev/#?product=Puppeteer&version=v1.19.0&show=api-pagesetcookiecookies
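A minimal sketch of the cookie half of this proposal, assuming the cookies are kept as a JSON array in a file. `saveCookies` and `loadCookies` are hypothetical helper names and the file location is up to the caller; local storage would need a similar treatment and is not covered here.

```javascript
const fs = require('fs');

// Save an array of cookie objects (for example, the result of
// Puppeteer's page.cookies()) as JSON.
function saveCookies(filePath, cookies) {
  fs.writeFileSync(filePath, JSON.stringify(cookies, null, 2), 'utf8');
}

// Load previously saved cookies. Returns null when the file is missing
// or unreadable, so the caller can fall back to a normal login.
function loadCookies(filePath) {
  try {
    const parsed = JSON.parse(fs.readFileSync(filePath, 'utf8'));
    return Array.isArray(parsed) ? parsed : null;
  } catch (err) {
    return null;
  }
}
```

With Puppeteer, the loaded array could be applied with `await page.setCookie(...cookies)` before navigating. If the page still redirects to the login form, the saved file should be treated as expired and replaced after a fresh login.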

Seemore Accomplishments

Hi @leonardiwagner, I'm still working on the accomplishments; however, the way its subsections expand seems to be quite different from other sections. This is what I'm currently able to obtain for the profile linkedin.com/in/elinlehrmann:
{"contact":[{"type":"Elin’s Profile","values":["linkedin.com/in/elinlehrmann"]},{"type":"Website","values":["orcid.org/0000-0002-9869-9475 (Portfolio)"]}],
"profileAlternative":{"name":"Elin Lehrmann, Ph.D.","headline":"Scientist | Versatilist | Mentor | Life-long learner |","imageurl":"https://media.licdn.com/dms/image/C5603AQElAkv6ZVoRUg/profile-displayphoto-shrink_200_200/0?e=1578528000&v=beta&t=IjvhkUhwXMoYY3pmmPqTKV7JJ5IwqHNOLAijZn6N-ZI","location":"Baltimore, Maryland","connections":"406","summary":"Ph.D.-level biomedical scientist skilled in designing and implementing effective cross-disciplinary collaborations with individuals and teams from diverse backgrounds and organizational levels that translate novel information into validated data and peer-reviewed knowledge using complex analytical and problem-solving abilities to optimize workflow, information quality, and project delivery."},
"aboutAlternative":{"text":"Ph.D.-level biomedical scientist skilled in designing and implementing effective cross-disciplinary collaborations with individuals and teams from diverse backgrounds and organizational levels that translate novel information into validated data and peer-reviewed knowledge using complex analytical and problem-solving abilities to optimize workflow, information quality, and project delivery."},
"positions":[{"title":"Biologist","companyName":"National Institutes of Health (NIH): Intramural Research Program (IRP)","date1":"Oct 2008 – Present","date2":"11 yrs 2 mos"},{"title":"Staff Fellow","companyName":"National Institute on Drug Abuse (NIDA IRP), NIH","date1":"2000 – 2008","date2":"8 yrs"},{"title":"Postdoctoral Research Fellow","companyName":"Maryland Psychiatric Research Center (MRPC), University of Maryland School of Medicine","date1":"1996 – 2000","date2":"4 yrs"},{"title":"Ph.D. (Neuroscience)","companyName":"University of Southern Denmark, School of Medicine","date1":"1992 – 1996","date2":"4 yrs"},{"title":"Department of Anatomy (SOM), Aarhus University","location":"Aarhus, Denmark","date1":"1990 – 1992","date2":"2 yrs","roles":[{"title":"Research Assistant","date1":"1990 – 1992","date2":"2 yrs","location":"Aarhus, Denmark"},{"title":"Cand. Scient. (M.Sc.; Biology/Chemistry)","date1":"1984 – 1992","date2":"8 yrs"}]},{"title":"Title\nResearch Assistant","location":"Aarhus, Denmark","date1":"1990 – 1992","date2":"2 yrs","roles":[{"title":"Research Assistant","date1":"1990 – 1992","date2":"2 yrs","location":"Aarhus, Denmark"}]},{"title":"Title\nCand. Scient. (M.Sc.; Biology/Chemistry)","date1":"1984 – 1992","date2":"8 yrs","roles":[{"title":"Cand. Scient. (M.Sc.; Biology/Chemistry)","date1":"1984 – 1992","date2":"8 yrs"}]}],
"educations":[{"title":"University of Southern Denmark","degree":"Ph.D.","fieldofstudy":"Neuroscience","date1":"1992","date2":"1996"},{"title":"Aarhus University","degree":"Cand. Scient.","fieldofstudy":"Biology (MSc), Chemistry (BSc)","date1":"1984","date2":"1992"}],
"skills":[{"title":"Analysis","count":"1"},{"title":"Problem Solving","count":"1"},{"title":"Project Planning","count":"1"},{"title":"Project Management"},{"title":"Foreign Languages"},{"title":"Nonprofit Organizations"},{"title":"Lifesciences","count":"3"},{"title":"Neuroscience","count":"41"},{"title":"Molecular Biology","count":"34"},{"title":"Genetics","count":"15"},{"title":"Bioinformatics","count":"6"},{"title":"Strategic Planning"},{"title":"Immunohistochemistry","count":"22"},{"title":"qPCR","count":"23"},{"title":"PCR","count":"14"},{"title":"Research","count":"6"},{"title":"Genomics","count":"6"},{"title":"Biotechnology","count":"6"},{"title":"Clinical Research","count":"3"},{"title":"Writing"},{"title":"Life Sciences","count":"19"},{"title":"Data Analysis","count":"2"},{"title":"Social Media"},{"title":"Marketing"},{"title":"Spreadsheets"},{"title":"Event Planning"},{"title":"Event Management"},{"title":"PowerPoint"},{"title":"Microsoft Word"},{"title":"Microsoft Excel"},{"title":"Microsoft Office"},{"title":"Time Management"},{"title":"Team Leadership"},{"title":"Public Speaking"},{"title":"Leadership"},{"title":"Teamwork"},{"title":"Management"},{"title":"Personal Development"},{"title":"Communication"},{"title":"Professional Ethics"},{"title":"Cross-disciplinary collaboration"},{"title":"Leading Meetings"},{"title":"In Vivo","count":"19"},{"title":"Life Skills"},{"title":"Sequencing","count":"3"},{"title":"Translational Medicine","count":"1"},{"title":"Toxicology","count":"1"},{"title":"Peer Reviews"},{"title":"Peer Mentoring"},{"title":"Computational Biology","count":"1"}],
"recommendations":{"givenCount":"0","receivedCount":"0","given":[],"received":[]},
"accomplishments":[{"Publications":["Elin has 75 publications\n75\nPublications\npublication title\nLoss of miR-451a enhances SPARC production during myogenesis.\n\npublication date\nMar 29, 2019 \npublication description\nPLoS One. 2019 Mar 29;14(3):e0214301.\n\npublication description\nAbstract. MicroRNAs (miRNAs) are small noncoding RNAs that critically regulate gene expression. Their abundance and function have been linked to a range of physiologic and pathologic processes. In aged monkey muscle, miR-451a and miR-144-3p were far more abundant than in young monkey muscle. This observation led us to hypothesize that miR-451a and miR-144-3p may influence muscle homeostasis. To test if these conserved microRNAs were implicated in myogenesis, we investigated their function in the mouse myoblast line C2C12. The levels of both microRNAs declined with myogenesis; however, only overexpression of miR-451a, but not miR-144-3p, robustly impeded C2C12 differentiation, suggesting an inhibitory role for miR-451a in myogenesis. Further investigation of the regulatory influence of miR-451a identified as one of the major targets Sparc mRNA, which encodes a secreted protein acidic and rich in cysteine (SPARC) that functions in wound healing and cellular differentiation. In mouse myoblasts, miR-451a suppressed Sparc mRNA translation. Together, our findings indicate that miR-451a is downregulated in differentiated myoblasts and suggest that it decreases C2C12 differentiation at least in part by suppressing SPARC biosynthesis.\n\nLoss of miR-451a enhances SPARC production during myogenesis. Munk R, Martindale JL, Yang X, Yang JH, Grammatikakis I, Di Germanio C, Mitchell SJ, de Cabo R, Lehrmann E, Zhang Y, Becker KG, Raz V, Gorospe M, Abdelmohsen K, Panda AC. PLoS One. 2019 Mar 29;14(3):e0214301. doi: 10.1371/journal.pone.0214301. 
eCollection 2019.\nPMID: 30925184\n\nSee publication Loss of miR-451a enhances SPARC production during myogenesis.\nSee publication\npublication title\nMuscle cannabinoid 1 receptor regulates Il-6 and myostatin expression, governing physical performance and whole-body metabolism.\n\npublication date\nFeb 6, 2019 \npublication description\nFASEB J. 2019 Feb 6:fj201801145R. doi: 10.1096/fj.201801145R. [Epub ahead of print]\n\npublication description\nAbstract. Sarcopenic obesity, the combination of skeletal muscle mass and function loss with an increase in body fat, is associated with physical limitations, cardiovascular diseases, metabolic stress, and increased risk of mortality. Cannabinoid receptor type 1 (CB1R) plays a critical role in the regulation of whole-body energy metabolism because of its involvement in controlling appetite, fuel distribution, and utilization. Inhibition of CB1R improves insulin secretion and insulin sensitivity in pancreatic β-cells and hepatocytes. We have now developed a skeletal muscle–specific CB1R-knockout (Skm-CB1R−/−) mouse to study the specific role of CB1R in muscle. Muscle-CB1R ablation prevented diet-induced and age-induced insulin resistance by increasing IR signaling. Moreover, muscle-CB1R ablation enhanced AKT signaling, reducing myostatin expression and increasing IL-6 secretion. Subsequently, muscle-CB1R ablation increased myogenesis through its action on MAPK-mediated myogenic gene expression. Consequently, Skm-CB1R−/− mice had increased muscle mass and whole-body lean/fat ratio in obesity and aging. Muscle-CB1R ablation improved mitochondrial performance, leading to increased whole-body muscle energy expenditure and improved physical endurance, with no change in body weight. 
These results collectively show that CB1R in muscle is sufficient to regulate whole-body metabolism and physical performance and is a novel target for the treatment of sarcopenic obesity.\n\nGonzález-Mariscal I, Montoro RA, O'Connell JF, Kim Y, Gonzalez-Freire M, Liu QR, Alfaras I, Carlson OD, Lehrmann E, Zhang Y, Becker KG, Hardivillé S, Ghosh P, Egan JM. Muscle cannabinoid 1 receptor regulates Il-6 and myostatin expression, governing physical performance and whole-body metabolism. FASEB J. 2019 Feb 6:fj201801145R. doi: 10.1096/fj.201801145R. [Epub ahead of print]\nPMID: 30726112\n\nSee publication Muscle cannabinoid 1 receptor regulates Il-6 and myostatin expression, governing physical performance and whole-body metabolism.\nSee publication\npublication title\nTopoisomerase 3β interacts with RNAi machinery to promote heterochromatin formation and transcriptional silencing in Drosophila.\n\npublication date\nNov 23, 2018 \npublication description\nNat Commun .9(1): 4946.\n\npublication description\nLee SK, Xue Y, Shen W, Zhang Y, Joo Y, Ahmad M, Chinen M, Ding Y, Ku WL, De S, Lehrmann E, Becker KG, Lei EP, Zhao K, Zou S, Sharov A, Wang W. Topoisomerase 3β interacts with RNAi machinery to promote heterochromatin formation and transcriptional silencing in Drosophila. Nat Commun. 2018 Nov 23;9(1):4946. doi: 10.1038/s41467-018-07101-4.\nPMID: 30470739\nDOI: 10.1038/s41467-018-07101-4\n\nAbstract. Topoisomerases solve topological problems during DNA metabolism, but whether they participate in RNA metabolism remains unclear. Top3β represents a family of topoisomerases carrying activities for both DNA and RNA. Here we show that in Drosophila, Top3β interacts biochemically and genetically with the RNAi-induced silencing complex (RISC) containing AGO2, p68 RNA helicase, and FMRP. Top3β and RISC mutants are similarly defective in heterochromatin formation and transcriptional silencing by position-effect variegation assay. 
Moreover, both Top3β and AGO2 mutants exhibit reduced levels of heterochromatin protein HP1 in heterochromatin. Furthermore, expression of several genes and transposable elements in heterochromatin is increased in the Top3β mutant. Notably, Top3β mutants defective in either RNA binding or catalytic activity are deficient in promoting HP1 recruitment and silencing of transposable elements. Our data suggest that Top3β may act as an RNA topoisomerase in siRNA-guided heterochromatin formation and transcriptional silencing.\n\nSee publication Topoisomerase 3β interacts with RNAi machinery to promote heterochromatin formation and transcriptional silencing in Drosophila.\nSee publication\npublication title\nHydroxyurea attenuates oxidative, metabolic, and excitotoxic stress in rat hippocampal neurons and improves spatial memory in a mouse model of Alzheimer’s disease\n\npublication date\nAug 29, 2018 \npublication description\nNeurobiology of Aging\n\npublication description\nRD Brose, E Lehrmann, Y Zhang, RH Reeves, KD Smith, MP Mattson (2018).\nHydroxyurea attenuates oxidative, metabolic, and excitotoxic stress in rat hippocampal neurons and improves spatial memory in a mouse model of Alzheimer’s disease\nNeurobiology of Aging\nhttps://doi.org/10.1016/j.neurobiolaging.2018.08.021 [Epub ahead of print]\n\nSee publication Hydroxyurea attenuates oxidative, metabolic, and excitotoxic stress in rat hippocampal neurons and improves spatial memory in a mouse model of Alzheimer’s disease\nSee publication\npublication title\nMIR100 host gene-encoded lncRNAs regulate cell cycle by modulating the interaction between HuR and its target mRNAs.\n\npublication date\nAug 8, 2018 \npublication description\nNucleic Acids Res.\n\npublication description\nSun Q, Tripathi V, Yoon JH, Singh DK, Hao Q, Min KW, Davila S, Zealy RW, Li XL, Polycarpou-Schwarz M, Lehrmann E, Zhang Y, Becker KG, Freier SM, Zhu Y, Diederichs S, Prasanth SG, Lal A, Gorospe M, Prasanth KV. 
MIR100 host gene-encoded lncRNAs regulate cell cycle by modulating the interaction between HuR and its target mRNAs.\nNucleic Acids Res. 2018 Aug 8. doi: 10.1093/nar/gky696. [Epub ahead of print]\nPMID: 30102375\n\nSee publication MIR100 host gene-encoded lncRNAs regulate cell cycle by modulating the interaction between HuR and its target mRNAs.\nSee publication\nShow more"]},{},{}],

"peopleAlsoViewed":[{"user":"https://www.linkedin.com/in/kate-wilson-0a42407/","text":"Senior Director of Sustainability at Vail Resorts"},{"user":"https://www.linkedin.com/in/maire-doyle-b85a262b/","text":"Scientist (C) Principal Investigator, Role of insulin in taste transduction, cultivation of taste cell precursors"},{"user":"https://www.linkedin.com/in/marcellebergeron/","text":"Drug Development Professional"},{"user":"https://www.linkedin.com/in/saeed-azimi-a5bb1544/","text":"Immunotherapy"},{"user":"https://www.linkedin.com/in/hyun-k-1435627/","text":"Drug development"},{"user":"https://www.linkedin.com/in/jared-kartchner-b5229770/","text":"Attorney at Northern Virginia Estate Planning Services"},{"user":"https://www.linkedin.com/in/emily-hm-wong-919b7975/","text":"Scientist II, Computational Biology"},{"user":"https://www.linkedin.com/in/tomrohmann/","text":"Service Order Program Manager at J&J Worldwide Services"},{"user":"https://www.linkedin.com/in/lukecartin/","text":"Environmental Sustainability Manager at Park City Municipal Corporation"},{"user":"https://www.linkedin.com/in/martina-molsbergen-ba9b438/","text":"CEO at C14 Consulting Group, LLC"}],
"volunteerExperience":[{"title":"Organizing Committee member","experience":"Rhodesian Ridgeback World Congress 2016","description":"Animal Welfare","date1":"Sep 2014 – Aug 2016","date2":"2 yrs"},{"title":"Vice-President","experience":"Chesapeake Bay Area Rhodesian Ridgeback Club","description":"Animal Welfare","date1":"Jul 2015 – Present","date2":"4 yrs 5 mos"},{"title":"Treasurer","experience":"Chesapeake Bay Area Rhodesian Ridgeback Club","description":"Animal Welfare","date1":"Aug 2013 – Nov 2014","date2":"1 yr 4 mos"},{"title":"Member","experience":"Rhodesian Ridgeback World Congress Health Committee","description":"Animal Welfare","date1":"Jul 2016 – Present","date2":"3 yrs 5 mos"},{"title":"Regional News Column","experience":"Chesapeake Bay Area Rhodesian Ridgeback Club","description":"Animal Welfare","date1":"Aug 2017 – Present","date2":"2 yrs 4 mos"}],"profile":{"name":"Elin Lehrmann, Ph.D.","headline":"Scientist | Versatilist | Mentor | Life-long learner |","imageurl":"https://media.licdn.com/dms/image/C5603AQElAkv6ZVoRUg/profile-displayphoto-shrink_200_200/0?e=1578528000&v=beta&t=IjvhkUhwXMoYY3pmmPqTKV7JJ5IwqHNOLAijZn6N-ZI","location":"Baltimore, Maryland","connections":"406","summary":"Ph.D.-level biomedical scientist skilled in designing and implementing effective cross-disciplinary collaborations with individuals and teams from diverse backgrounds and organizational levels that translate novel information into validated data and peer-reviewed knowledge using complex analytical and problem-solving abilities to optimize workflow, information quality, and project delivery."}}

As you can see, I only scrape 5 of the 75 publications displayed on that profile, I cannot obtain the links to the publications, and the languages part is not scraped. I think this issue is related to two other issues, #1 and #36. I have tried making some modifications to the seemore functions without any results. I've attached the modified seemore file. I think making the code a bit more sequential could help us tackle this issue, but I don't know how to do that.
seemore.txt

Thanks man!

Login issue on server

image
I'm getting a login issue on the server, but if I run this project locally it works fine.

Positions: roles are not gathered

  • LinkedIn profiles may have more than one role in each position
  • Currently, only the description is gathered, but a position may have a description or roles (or possibly both)

Error: linkedin: manual check was required, verify if your login is properly working manually or report this issue:

at page.waitFor.then.catch (/home/ubuntu/node_modules/scrapedin/src/login.js:62:31)
at process._tickCallback (internal/process/next_tick.js:109:7)
(node:8341) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 2): Error: Protocol error (Runtime.callFunctionOn): Target closed.
(node:8341) DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

I verified the password manually and there was no problem. What is the reason? Thanks.

profile div changed

Again! We need to find a more generic way to ensure that it's a profile page.

Not able to deploy it in remote servers

I attempted to run it on a remote server like Heroku and encountered this error:

(node:9167) UnhandledPromiseRejectionWarning: Error: linkedin: manual check was required, verify if your login is properly working manually or report this issue: https://github.com/leonardiwagner/scrapedin/issues
at page.waitFor.then.catch (/var/app/current/node_modules/scrapedin/src/login.js:62:31)
at process._tickCallback (internal/process/next_tick.js:68:7)
(node:9167) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 2)
(node:9167) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
(node:9167) UnhandledPromiseRejectionWarning: Error: Protocol error (Runtime.callFunctionOn): Target closed.
at Promise (/var/app/current/node_modules/puppeteer/lib/Connection.js:183:56)
at new Promise (<anonymous>)
at CDPSession.send (/var/app/current/node_modules/puppeteer/lib/Connection.js:182:12)
at ExecutionContext.evaluateHandle (/var/app/current/node_modules/puppeteer/lib/ExecutionContext.js:106:44)
at ExecutionContext.<anonymous> (/var/app/current/node_modules/puppeteer/lib/helper.js:109:23)
at ElementHandle.$ (/var/app/current/node_modules/puppeteer/lib/JSHandle.js:378:50)
at ElementHandle.<anonymous> (/var/app/current/node_modules/puppeteer/lib/helper.js:109:23)
at DOMWorld.$ (/var/app/current/node_modules/puppeteer/lib/DOMWorld.js:114:34)
at process._tickCallback (internal/process/next_tick.js:68:7)
-- ASYNC --
at Frame.<anonymous> (/var/app/current/node_modules/puppeteer/lib/helper.js:108:27)
at Page.$ (/var/app/current/node_modules/puppeteer/lib/Page.js:300:29)
at Page.<anonymous> (/var/app/current/node_modules/puppeteer/lib/helper.js:109:23)
at page.waitFor.then.catch (/var/app/current/node_modules/scrapedin/src/login.js:60:16)
at process._tickCallback (internal/process/next_tick.js:68:7)

Manual verification is not possible because a browser cannot be opened on such a server to complete the verification. Is there any workaround for this issue?

Deploy in heroku - chrome error

First, thank you for the time and work you have put into developing this solution.

I added the code requested at "https://github.com/jontewks/puppeteer-heroku-buildpack" to use Puppeteer:

const args = proxyAddress && [`--proxy-server=${proxyAddress}`, '--no-sandbox', '--disable-setuid-sandbox']

But I get this error:
28 Feb 2019 05:31:29.549168 <190>1 2019-02-28T10:31:28.908987+00:00 app web.1 - - (node:4) UnhandledPromiseRejectionWarning: Error: Failed to launch chrome!
Heroku / Syslog drain
28 Feb 2019 05:31:29.549447 <190>1 2019-02-28T10:31:28.909012+00:00 app web.1 - - [0228/103126.902390:FATAL:zygote_host_impl_linux.cc(116)] No usable sandbox! Update your kernel or see https://chromium.googlesource.com/chromium/src/+/master/docs/linux_suid_sandbox_development.md for more information on developing with the SUID sandbox. If you want to live dangerously and need an immediate workaround, you can try using --no-sandbox.
Heroku / Syslog drain

Do you know of any solution to this sandbox/kernel problem? I appreciate any help.
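For the sandbox error above, one possible way to build the launch arguments is sketched below. It assumes, as the snippet in this issue suggests, that the flags reach Chromium through Puppeteer's `args` launch option; `buildChromeArgs` is a hypothetical helper, not part of scrapedin. Note that a guard of the form `proxyAddress && [...]` evaluates to a falsy value when no proxy is set, in which case the sandbox flags would never be passed. Whether that is what happened here is not confirmed.

```javascript
// Hypothetical helper: always include the sandbox flags, and add the
// proxy flag only when a proxy address is actually provided.
function buildChromeArgs(proxyAddress) {
  const args = ['--no-sandbox', '--disable-setuid-sandbox'];
  if (proxyAddress) {
    args.push(`--proxy-server=${proxyAddress}`);
  }
  return args;
}
```

The result could then be passed as `puppeteer.launch({ args: buildChromeArgs(proxyAddress) })`.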

Regards

libX11-xcb.so.1: cannot open shared object file: No such file or directory

error while loading shared libraries: libX11-xcb.so.1: cannot open shared object file: No such file or directory

TROUBLESHOOTING: https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md

at onClose (/home/scrapedin-linkedin-crawler/node_modules/puppeteer/lib/Launcher.js:348:14)
at Interface.helper.addEventListener (/home/scrapedin-linkedin-crawler/node_modules/puppeteer/lib/Launcher.js:337:50)
at Interface.emit (events.js:203:15)
at Interface.close (readline.js:397:8)
at Socket.onend (readline.js:173:10)
at Socket.emit (events.js:203:15)
at endReadableNT (_stream_readable.js:1143:12)
at process._tickCallback (internal/process/next_tick.js:63:19)

(node:2999) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:2999) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

Company Name and Location are not being collected due to changes in css

I attempted to scrape my own profile and saw that both company name and location are not being collected due to changes in the CSS of the LinkedIn website.

Modifications to the "profilescrapertemplate.js" file in two places correct the issue. I have the fix in place if you would like me to submit a pull request.

Otherwise, the two lines should be:

  companyName: 'p.pv-entity__secondary-title',
  location: 'h4.pv-entity__location span:nth-child(2)',

linkedin: profile not found when using cookies

When I use the scraper with the login/password options it works fine.
But when I use my cookie, I get a 'linkedin: profile not found' error.

const fs = require('fs');
const scrapedin = require("scrapedin");

// const options = { email: "[email protected]", password: "abcdefgh" };
// Read the cookie file as text before parsing it as JSON.
const cookies = fs.readFileSync('./cookie.txt', 'utf8');
const options = {
  cookies: JSON.parse(cookies)
};

console.log(options);

const url = 'https://www.linkedin.com/in/wassimazirar/';
scrapedin(options)
    .then(profileScraper => profileScraper(url, 2000))
    .then(profile => {
        console.log(profile);
    });

(node:13224) UnhandledPromiseRejectionWarning: Error: linkedin: profile not found
    at ..\node_modules\scrapedin\src\profile.js:19:13
    at async module.exports (..\node_modules\scrapedin\src\profile.js:16:3)
(node:13224) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:13224) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

Not gathering more than 10 items in some sections

When the user has a lot of items in sections such as positions (experiences) or recommendations, the "show more" button only reveals 5 more positions. If the user has more than 5 additional items, another "show more X positions" button is displayed, so we need to click that button again before gathering the positions.

I'm labeling this issue as "low" since the most recent 10 items are already being gathered.
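The repeated clicking described here could be sketched as a simple loop. `findButton` and `click` are hypothetical callbacks (for example, thin wrappers around a page query and an element click), and `maxClicks` is an assumed safety limit so a button that never disappears cannot loop forever.

```javascript
// Hypothetical sketch: keep clicking a section's "show more" button
// until no such button is found any more.
// findButton resolves to a button handle, or null when none is left.
async function expandFully(findButton, click, maxClicks = 50) {
  let clicks = 0;
  let button = await findButton();
  while (button && clicks < maxClicks) {
    await click(button);
    clicks += 1;
    // The page may render a new "show more X" button after each click.
    button = await findButton();
  }
  return clicks;
}
```

The returned click count is only there for logging; the items would be gathered after the loop finishes.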

waitTimeMs not respected

It seems that no matter what I set waitTimeMs to, it is ignored and the reCAPTCHA is closed before I can finish it. Sometimes it shows me more than three reCAPTCHAs.

unable to select nested positions (multiple positions under the same company)

Hi - thank you for putting this together, I'm excited to use it!

The issue I'm having at the moment is that when I scrape my personal profile, where I have multiple positions at the same company, the result doesn't have a clean data structure for that company. What's returned is one entry with the company name and the most recent description, followed by 2 separate entries for the individual positions, which lack the company name and whose title has "Title" preceding the name of the position.

My profile is https://www.linkedin.com/in/alexandercaulfield/

Response

[ { title: 'Wayfair',
[0]     location: 'Boston, MA',
[0]     description:
[0]      '- Mentor for 25 entry level engineers on a month long project, holding office hour sessions and giving code reviews (for PHP & React)',
[0]     date1: 'Jul 2019 – Present',
[0]     date2: '5 mos',
[0]     roles: [ [Object], [Object] ] },
[0]   { title: 'Title\nSoftware Engineer II',
[0]     location: 'Boston, MA',
[0]     description:
[0]      '- Mentor for 25 entry level engineers on a month long project, holding office hour sessions and giving code reviews (for PHP & React)',
[0]     date1: 'Jul 2019 – Present',
[0]     date2: '5 mos',
[0]     roles: [ [Object] ] },
[0]   { title: 'Title\nSoftware Engineer I',
[0]     location: 'Boston, MA',
[0]     description:
[0]      '- Full stack development for Wayfair\'s customer-facing storefront (Shop the Look) in React, PHP, SQL and GraphQL\n- Developed the backend for a large A/B test that resulted in reduced exit rate on the page by 3% and an improved bounce rate by -4.5% by leveraging recommended content to users based on their likes and dislikes\n- Tech lead the implementation of a feature on the new page redesign. This included scoping out tickets, gathering requirements with product, constructing the frontend architecture, executing on creating the UI and delegating tasks to other engineers when appropriate\n- Proactive about sharing new ideas for Shop the Look\'s product and supported ideas with data and customer insights\n...\nsee more',
[0]     date1: 'Sep 2018 – Jul 2019',
[0]     date2: '11 mos',
[0]     roles: [ [Object] ] },

Let me know if I can do anything to help!
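
Until the scraper returns a nested structure itself, the output above could be tidied on the caller's side. The sketch below is a workaround under assumptions, not the library's behavior: groupNestedPositions is a hypothetical name, and it assumes that nested roles always carry the 'Title\n' prefix and directly follow their company entry, as in the response shown.

```javascript
// Caller-side workaround: fold the stray 'Title\n...' entries into the
// company entry that precedes them.
function groupNestedPositions(positions) {
  const PREFIX = 'Title\n';
  const grouped = [];
  for (const position of positions) {
    const isNestedRole =
      typeof position.title === 'string' && position.title.startsWith(PREFIX);
    const parent = grouped[grouped.length - 1];
    if (isNestedRole && parent) {
      // Drop the nested `roles` field and strip the 'Title\n' prefix.
      const { roles, ...role } = position;
      parent.positions.push({ ...role, title: position.title.slice(PREFIX.length) });
    } else {
      // Top-level entry: kept as-is, with an empty list for its roles.
      grouped.push({ ...position, positions: [] });
    }
  }
  return grouped;
}
```

Applied to the three entries above, this yields a single 'Wayfair' entry whose positions array holds 'Software Engineer II' and 'Software Engineer I'.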

proxy problem

When I use "args":["--no-sandbox","--proxy-server=http://server:port"], I cannot log in to LinkedIn.
It always reports "manual check was required".

I just want to know: has anybody successfully scraped a profile through a proxy?

Update docs

After the v1 milestone, we need to update the README docs.

(node:7117) UnhandledPromiseRejectionWarning: Error: Page crashed!

(node:7117) UnhandledPromiseRejectionWarning: Error: Page crashed!
at Page._onTargetCrashed (/home/scrapedin-linkedin-crawler/node_modules/scrapedin/node_modules/puppeteer/lib/Page.js:215:24)
at CDPSession.Page.client.on.event (/home/scrapedin-linkedin-crawler/node_modules/scrapedin/node_modules/puppeteer/lib/Page.js:123:56)
at CDPSession.emit (events.js:198:13)
at CDPSession._onMessage (/home/scrapedin-linkedin-crawler/node_modules/scrapedin/node_modules/puppeteer/lib/Connection.js:200:12)
at Connection._onMessage (/home/scrapedin-linkedin-crawler/node_modules/scrapedin/node_modules/puppeteer/lib/Connection.js:112:17)
at WebSocketTransport._ws.addEventListener.event (/home/tony/scrapedin-linkedin-crawler/node_modules/scrapedin/node_modules/puppeteer/lib/WebSocketTransport.js:44:24)
at WebSocket.onMessage (/home//scrapedin-linkedin-crawler/node_modules/ws/lib/event-target.js:120:16)
at WebSocket.emit (events.js:198:13)
at Receiver.receiverOnMessage (/home/tony/scrapedin-linkedin-crawler/node_modules/ws/lib/websocket.js:789:20)
at Receiver.emit (events.js:198:13)
(node:7117) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)

Accomplishments only collect basic data

  • Only the basic data shown before clicking 'show more' is collected
  • It is also known that the number of accomplishment items collected can be lower than the actual number: if there are too many items, LinkedIn doesn't show all of them before 'show more' is clicked

manual check was required

Error: linkedin: manual check was required, verify if your login is properly working manually or report this issue

More than just profile information

Hello there 👋
The library works like a charm to retrieve information about the profile general info, thanks!!
I was wondering if I could also use it to get the latest posts of a person that I follow or that has a public profile.
Or maybe, if it's easier, the posts related to a certain hashtag, pointing to this kind of URL: https://www.linkedin.com/feed/hashtag/crypto/

Thank you 😊

Falsy Company Name in positions coming as title

When a LinkedIn user has multiple positions at the same company, the scraped company name is correct. But when the user has just one position at a company, the scraped company name is the user's position title instead.

More detailed accomplishments

Hello there 👋
The library works pretty well to retrieve information about the profile general info, thanks!!
I was wondering if I could also use it to get a more detailed picture of accomplishments, i.e. Publications, Patents, Languages, etc.
To visualize the suggestion, you can have a look at this profile, which contains many publications: https://www.linkedin.com/in/elinlehrmann/. In that case it would be useful for some applications to have the descriptions of all 75 publications.

Thank you 😊

await is only valid in async function

I put the code from the README in a scrap.js file and then ran node scrap.js, but I keep getting this error:

const profileScraper = await scrapedin({ email: '[email protected]', password: 'pass' })
                       ^^^^^

SyntaxError: await is only valid in async function

How can I bypass this?
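
In a CommonJS script, await is only allowed inside a function declared with async, so one common way around this error is to wrap the README code in such a function. The sketch below assumes exactly that; main is a hypothetical name, and scrapedin is passed in as a parameter so the snippet stays independent of how the module is loaded.

```javascript
// `await` must sit inside an `async` function, so the README snippet is
// wrapped in one. `scrapedin` is the function exported by the library.
async function main(scrapedin, credentials, url) {
  const profileScraper = await scrapedin(credentials);
  const profile = await profileScraper(url);
  return profile;
}
```

In scrap.js this could then be run as main(require('scrapedin'), { email: '...', password: '...' }, profileUrl).then(console.log).catch(console.error), which also takes care of rejected promises.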

lt-line-clamp__more is only clicked for the first match

Correct me if I am wrong:

const clickAll = async(page) => {
for(let i = 0; i < seeMoreButtons.length; i ++){
const button = seeMoreButtons[i]

if (button.selector) {
  const elem = await page.$(button.selector)
  if (elem) {

If a selector matches more than one element, then we need to loop over the matches and click each one.
I observed, for example, that only one see_more is being clicked and the rest are ignored.
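
The loop suggested above could look roughly like the sketch below. It relies on Puppeteer's page.$$, which resolves to every element matching a selector, whereas page.$ resolves only to the first. clickAllMatches is a hypothetical name, and seeMoreButtons is assumed to have the same shape as in the excerpt (objects with a selector property). This is not the library's actual fix.

```javascript
// Click every element matched by each selector, not just the first one.
async function clickAllMatches(page, seeMoreButtons) {
  let clicked = 0;
  for (const button of seeMoreButtons) {
    if (!button.selector) continue;
    // page.$$ resolves to an array of all matches (empty when none).
    const elems = await page.$$(button.selector);
    for (const elem of elems) {
      try {
        await elem.click();
        clicked += 1;
      } catch (err) {
        // A click can fail if the page re-rendered and detached the element;
        // skip it and carry on with the remaining matches.
      }
    }
  }
  return clicked; // number of successful clicks
}
```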

Issue in live server

Hi Team,

This works fine on my local server, but the same code does not work on the live server. Please check and help me solve this problem.

Error: Protocol error (Runtime.callFunctionOn): Target closed.

Manual check not working

The code for my profileScraper is below. It opens the Chromium window but closes it before I can finish the reCAPTCHA. If you know what's going on and could fix it, that would be amazing.

const profileScraper = await scrapedin({
  email: LINKEDIN_EMAIL, password: LINKEDIN_PASSWORD, isHeadless: false, hasToLog: true
})
  .catch(err => {
    console.error(err);
  });

This is what it logs, including the error message:

scrapedin: 2019-10-01T13:56:57.535Z info: [login] logging at: https://www.linkedin.com/login
scrapedin: 2019-10-01T13:57:13.591Z warn: [login] successful login element was not found
scrapedin: 2019-10-01T13:57:13.594Z warn: [login] manual check was required
Error: linkedin: manual check was required, verify if your login is properly working manually or report this issue: https://github.com/leonardiwagner/scrapedin/issues
    at page.waitFor.then.catch (/Users/nickedwards/RupieNetwork/node_modules/scrapedin/src/login.js:62:31)
    at process._tickCallback (internal/process/next_tick.js:68:7)

There is also another message:
Unhandled Rejection at: Error: Protocol error (Runtime.callFunctionOn): Target closed.

New template to integrate

@leonardiwagner this template works well. I have tried it, but I have some trouble when it comes to expanding the sections entirely. Maybe you could have a look and tell me where I should put my effort now.
If someone has an idea for extracting the links to co-authors or co-inventors, it would also be very useful to integrate that.
Thanks man
Template_function.txt

Format dates

Currently, dates are returned raw, exactly as displayed on the website.
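
As an illustration of what formatted dates could look like, the sketch below parses raw strings of the kind seen in other reports on this page, such as 'Jul 2019 – Present'. The function names and the returned shape are assumptions rather than an agreed design, and only English month abbreviations are handled.

```javascript
const MONTHS = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
                'jul', 'aug', 'sep', 'oct', 'nov', 'dec'];

// Parse 'Jul 2019' (or just '2019') into { year, month }; month is 1-12 or null.
function parseMonthYear(text) {
  const match = /^(?:([A-Za-z]{3})[a-z]*\.?\s+)?(\d{4})$/.exec(text.trim());
  if (!match) return null;
  const index = match[1] ? MONTHS.indexOf(match[1].toLowerCase()) : -1;
  return { year: Number(match[2]), month: index >= 0 ? index + 1 : null };
}

// Split a raw range on an en dash or hyphen surrounded by spaces.
function parseDateRange(raw) {
  const [startText, endText = ''] = raw.split(/\s+[–-]\s+/);
  const isCurrent = /^present$/i.test(endText.trim());
  return {
    start: parseMonthYear(startText),
    end: isCurrent ? null : parseMonthYear(endText),
    isCurrent,
  };
}
```

For 'Sep 2018 – Jul 2019' this gives a start of { year: 2018, month: 9 } and an end of { year: 2019, month: 7 }; for 'Jul 2019 – Present' the end is null and isCurrent is true.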
