
Home Page: https://scholarly.readthedocs.io/

License: The Unlicense



scholarly

scholarly is a module that allows you to retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to solve CAPTCHAs.

Installation


scholarly can be installed either with conda or with pip. To install using conda, simply run

conda install -c conda-forge scholarly

Alternatively, use pip to install the latest release from PyPI:

pip3 install scholarly

or use pip to install from GitHub:

pip3 install -U git+https://github.com/scholarly-python-package/scholarly.git

We are constantly developing new features, so please update your local package regularly (e.g. `pip3 install -U scholarly`). scholarly follows Semantic Versioning: code written against an earlier version of scholarly will keep working with newer releases that share the same major version.

Optional dependencies

  • Tor:

    scholarly comes with a handful of APIs to set up proxies to circumvent anti-bot measures. The Tor methods have been deprecated since v1.5 and are no longer actively tested or supported. If you still wish to use Tor, install scholarly with the tor extra:

    pip3 install scholarly[tor]

    If you use zsh (now the default shell on macOS), quote the extra:

    pip3 install scholarly'[tor]'

    Note: the Tor option is unavailable with a conda installation.

Tests

To check that your installation is successful, run the tests by executing the test_module.py file:

python3 test_module.py

or

python3 -m unittest -v test_module.py

Documentation

Check the documentation for a complete API reference and a quickstart guide.

Examples

from scholarly import scholarly

# Retrieve the author's data, fill-in, and print
# Get an iterator for the author results
search_query = scholarly.search_author('Steven A Cholewiak')
# Retrieve the first result from the iterator
first_author_result = next(search_query)
scholarly.pprint(first_author_result)

# Retrieve all the details for the author
author = scholarly.fill(first_author_result)
scholarly.pprint(author)

# Take a closer look at the first publication
first_publication = author['publications'][0]
first_publication_filled = scholarly.fill(first_publication)
scholarly.pprint(first_publication_filled)

# Print the titles of the author's publications
publication_titles = [pub['bib']['title'] for pub in author['publications']]
print(publication_titles)

# Which papers cited that publication?
citations = [citation['bib']['title'] for citation in scholarly.citedby(first_publication_filled)]
print(citations)

IMPORTANT: Making certain types of queries, such as scholarly.citedby or scholarly.search_pubs, will lead to Google Scholar blocking your requests and may eventually block your IP address. You must use proxy services to avoid this situation. See the "Using proxies" section in the documentation for more details. Here's a short example:

from scholarly import ProxyGenerator

# Set up a ProxyGenerator object to use free proxies
# This needs to be done only once per session
pg = ProxyGenerator()
pg.FreeProxies()
scholarly.use_proxy(pg)

# Now search Google Scholar from behind a proxy
search_query = scholarly.search_pubs('Perception of physical stability and center of mass of 3D objects')
scholarly.pprint(next(search_query))

scholarly also has APIs that work with several premium (paid) proxy services. scholarly is smart enough to know which queries need proxies and which do not, so it is recommended to always set up a proxy at the beginning of your application.

Disclaimer

The developers use ScraperAPI to run the tests in GitHub Actions. The developers of scholarly are not affiliated with any of the proxy services and do not profit from them. If your favorite service is not supported, please submit an issue, or even better, follow it up with a pull request.

Contributing

We welcome contributions from you. Please create an issue, fork this repository and submit a pull request. Read the contributing document for more information.

Acknowledging scholarly

If you have used this codebase in a scientific publication, please cite this software as follows:

@software{cholewiak2021scholarly,
  author  = {Cholewiak, Steven A. and Ipeirotis, Panos and Silva, Victor and Kannawadi, Arun},
  title   = {{SCHOLARLY: Simple access to Google Scholar authors and citation using Python}},
  year    = {2021},
  doi     = {10.5281/zenodo.5764801},
  license = {Unlicense},
  url = {https://github.com/scholarly-python-package/scholarly},
  version = {1.5.1}
}

License

The original code that this project was forked from was released by Luciano Bello under a WTFPL license. In keeping with this mentality, all code is released under the Unlicense.

scholarly's People

Contributors

1ucian0, abspoel, arunkannawadi, bielsnohr, bryant1410, cako, firefly-cpp, franciscoknebel, guicho271828, ipeirotis, jjshoots, jonasengelmann, klipitkaspro, louiskirsch, ltalirz, marcoscarpetta, mmontevil, nikitabalabin, organicirradiation, papr, percolator, programize-admin, remram44, rnatella, silvavn, spferical, stefanct, tallalnparis4ev, tombrien, waynehuu


scholarly's Issues

Simple example usage fails on fresh install of Python 3.7.2 (AttributeError: 'NoneType' object has no attribute 'text')

This is running on Windows with Python 3 and a fresh install of scholarly from GitHub.

Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import scholarly
>>> print(next(scholarly.search_author('Steven A. Cholewiak')))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\hackr\Python64\Python372\lib\site-packages\scholarly\scholarly.py", line 110, in _search_citation_soup
    yield Author(row)
  File "C:\Users\hackr\Python64\Python372\lib\site-packages\scholarly\scholarly.py", line 226, in __init__
    self.name = __data.find('h3', class_='gsc_oai_name').text
AttributeError: 'NoneType' object has no attribute 'text'

search_pubs_query() did not work (was I banned by Google Scholar?)

Background

It is tricky to use scholarly from mainland China, so I use a VPS in the US (Ubuntu 18.04, Python 3.6) to make it work.
I put the names of the teachers at my institute into the query and collected the results into .json files.
Everything was fine when I succeeded in reading my first JSON. So I wrote a loop to query all the teachers of the institute and collect the results. But I found the subsequent results were empty (like []).
I wondered why and tested it through the terminal, where it returns:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Doubt

The other functions work properly, and I tried the example after this situation. I suspect that I tested search_pubs_query() too many times while figuring out how to extract data from the generator, and Google banned the IP of my VPS because I requested too many results.
(I set a limit in my code so that it gets at most 100 results. The number of teachers at my institute is 75, and I got an empty result at the second teacher.)
I will test the function again tomorrow; maybe Google sets a limit per day?

Any suggestion for error 404?

Hi! Thanks for your great work!

It seems that Google blocks me after collecting 100 papers (2 keywords × 50 papers).
Any advice for handling this? Should I add more request headers? Thanks!

Cannot show citeBy with the tutorial example

Cannot show citeBy with the tutorial example:

search_query = scholarly.search_pubs_query('Perception of physical stability and center of mass of 3D objects')
print(next(search_query))

output:
{'_filled': False,
'bib': {'abstract': 'Humans can judge from vision alone whether an object is '
'physically stable or not. Such judgments allow observers '
'to predict the physical behavior of objects, and hence '
'to guide their motor actions. We investigated the visual '
'estimation of physical stability of 3-D objects (shown '
'in stereoscopically viewed rendered scenes) and how it '
'relates to visual estimates of their center of mass '
'(COM). In Experiment 1, observers viewed an object near '
'the edge of a table and adjusted its tilt to the '
'perceived critical angle, ie, the tilt angle at which '
'the object …',
'author': 'SA Cholewiak and RW Fleming and M Singh',
'eprint': 'https://jov.arvojournals.org/article.aspx?articleID=2213254',
'title': 'Perception of physical stability and center of mass of 3-D '
'objects',
'url': 'https://jov.arvojournals.org/article.aspx?articleID=2213254'},
'source': 'scholar'}

fill() is returning TypeError!

Hi there,
I've been trying to follow the search_author example as shown.

results = scholarly.search_author('Steven A Cholewiak')
author = next(results).fill()

it keeps throwing:
TypeError: 'unicode' object is not callable

any help would be really appreciated
Thanks

No results after repeated searches

hello,
I have a script to search a list of keywords using scholarly.
But it will not return anything after about 50 searches.
Any solution?

import scholarly
import json
import numpy as np
import time


with open("list2.txt") as fp:
	line = fp.readline()
	cnt = 1
	delays = [7, 4, 6, 2, 10, 19]
	while line:
		delay = np.random.choice(delays)
		print("delay is {}".format(delay))
		time.sleep(delay)
		search_query = scholarly.search_pubs_query(line)
		search_query = list(search_query)
		res = {}
		res['articles'] = []
		for s in search_query:
			article = s.__str__()
			# print(article)
			# print("+++++++++++++++++")
			res['articles'].append(article)
		print(len(search_query))
		if len(search_query) > 0:
			filepath = line.strip() + ".json"
			with open(filepath, 'w') as f:
				jsonfile = json.dumps(res)
				json.dump(jsonfile, f)
				f.close()
		cnt += 1
		line = fp.readline()

How to get list of co-authors?

Thank you for the nice module.

I would like to get a list of co-authors of an author as follows.

author = next(scholarly.search_author('Steven A. Cholewiak')).fill();
for ca in author.coauthors:
  print ca;

However, the profile object does not have the list but only the one of publications.
How can I get the list of co-authors?
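In recent releases of scholarly, author records are plain dicts, and co-authors appear under a 'coauthors' key after filling (the field name follows recent releases and should be verified against your installed version). A sketch with a stand-in record rather than live data:

```python
# Stand-in for a record returned by scholarly.fill(author); the 'coauthors'
# field name follows recent scholarly releases (verify against your version).
author = {
    "name": "Steven A Cholewiak",
    "coauthors": [
        {"name": "Roland W Fleming"},
        {"name": "Manish Singh"},
    ],
}

# Older profile objects exposed attributes (author.coauthors), which is why
# the snippet above fails; with the dict-based API, index by key instead:
for ca in author.get("coauthors", []):
    print(ca["name"])
```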

Store the retrieved data from print(next(search_query))

Edited: Sorry if this is a naive question.

The module runs amazingly well. It would be of great help if I could store the results from a query into a variable.

print(next(search_query))

Looking around, I realized that the result is a "not JSON serializable" object.

Any help for that?
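In current releases, search results are plain Python dicts, so the standard json module serializes them directly (older versions returned custom objects, which is where the "not JSON serializable" error came from). A minimal sketch with a stand-in result dict:

```python
import json

# Stand-in for a dict returned by next(search_query) in current releases.
result = {
    "bib": {
        "title": "Perception of physical stability and center of mass of 3-D objects",
        "author": "SA Cholewiak and RW Fleming and M Singh",
    },
    "source": "scholar",
}

serialized = json.dumps(result, indent=2)  # a string you can write to a file
restored = json.loads(serialized)
assert restored == result                  # round-trips losslessly
```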

Connection over a proxy server

Dear all,

The module doesn't support connection over a proxy server. It would be very useful if the module includes support for proxy-based internet connection.

scholarly has no attribute use_proxy()

I initially installed the PyPI distribution, then realized that use_proxy() wasn't included in the PyPI distribution but only in the GitHub repo. So I uninstalled the PyPI distribution with

pip uninstall scholarly

and reinstalled from GitHub with

pip install git+https://github.com/OrganicIrradiation/scholarly.git

I've looked at the scholarly.py file in my library now and confirmed that it includes the use_proxy() function.

However, when I try to call scholarly.use_proxy(), I get the following error:

AttributeError: module 'scholarly' has no attribute 'use_proxy'

I'm not sure what could be causing this. Any suggestions?

A more complete abstract?

Hello,
Thanks for this awesome piece of work! I am wondering if there is any suggestion for how to retrieve the entire abstract? Right now I see a lot of broken sentences. Assume that I have proper access to retrieve the document at the URL. Thanks!

Exception: Error: 503 Service Unavailable while searching authors

This strange kind of problem is happening while I am searching for authors.
I am using

search = next(scholarly.search_author(nameaffiliation))

to get an author's info. For some authors it returns a result, but for others the following error is raised:

Traceback (most recent call last):
  File "/home/rajeev/Desktop/work/Desktop/Rajeev/codestogeneratefeatures/write_auth_affl_file.py", line 42, in <module>
    print hindex('Zbigniew S Szewczak')
  File "/home/rajeev/Desktop/work/Desktop/Rajeev/codestogeneratefeatures/affl.py", line 29, in hindex
    search = next(scholarly.search_author(nameaffiliation))
  File "/home/rajeev/.local/lib/python2.7/site-packages/scholarly.py", line 296, in search_author
    soup = _get_soup(_HOST+url)
  File "/home/rajeev/.local/lib/python2.7/site-packages/scholarly.py", line 92, in _get_soup
    html = _get_page(pagerequest)
  File "/home/rajeev/.local/lib/python2.7/site-packages/scholarly.py", line 79, in _get_page
    raise Exception('Error: {0} {1}'.format(resp.status_code, resp.reason))
Exception: Error: 503 Service Unavailable

Retrieving "Related articles" list

How can we retrieve the "Related articles" list of any given paper using scholarly?

Is it possible to get related article list (provided by scholar google) for any article? I want to retrieve related articles and their order in which they are displayed.

Tests failing due to search queries changing over time

These tests look for equality in the length of a google scholar query, and currently fail because the length of the query is now different.

  • test_empty_keyword
  • test_multiple_publications
  • test_multiple_authors

Perhaps these tests should be framed as greater than, rather than strictly equal. For example:

 self.assertEqual(len(authors), 54)

could be

 self.assertTrue(len(authors) >= 10)

But maybe someone else has a better idea on how to do this.
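The lower-bound idea above can be written as an ordinary unittest assertion; this sketch substitutes a fixed list for a live Scholar query so it runs offline:

```python
import unittest

class SearchCountTest(unittest.TestCase):
    def test_author_count_lower_bound(self):
        authors = ["author"] * 54  # stand-in for a live Google Scholar result list
        # A lower bound keeps the test green as Scholar's result counts drift
        # over time, unlike a strict equality against a snapshot value.
        self.assertGreaterEqual(len(authors), 10)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(SearchCountTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```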

search_author_custom_url function throws an error

I am trying to search for an author by URL and get the author's profile details using fill, but the next method isn't working.

search_author_custom_url returns a generator object, but

author = search_author_custom_url('some author url here')
next(author)

throws an error. What could be the possible cause?

Here's the code I am trying to run

author = scholarly.search_author_custom_url('/citations?user=Uyz-xYwAAAAJ&hl=en&oi=sra')
author1 = next(author)
Traceback (most recent call last):
File "", line 1, in
StopIteration

scholarly.search_pubs_query() not working

When I do

print(next(scholarly.search_pubs_query('Perception of physical stability and center of mass of 3D objects')))

in an iPython kernel, I get a StopIteration error.

Scholarly.py stopped working in .get_citedby() function

Hi,
("'NoneType' object has no attribute 'find'",) is generated when trying to get cited_by information, in function _search_scholar_soup(soup), at the line:

yield Publication(row, 'scholarly')

I think the Google Scholar format changed. I have tracked the issue up to the line

databox = __data.find('div', class_='gs_ri')

in the Publication class. However, I am not that familiar with Beautiful Soup; please help.
Many thanks

Publications search error in Jupyter

Hello,

The following commands are successful in Jupyter:

!pip install scholarly
import scholarly
print(next(scholarly.search_author('Mircea Trifan')))

However, the article query by title fails as below:

pub = next(scholarly.search_pubs_query('Perception of physical stability and center of mass of 3D objects'),None)
print(pub)

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-13-1bb21bdece30> in <module>()
----> 1 pub = next(scholarly.search_pubs_query('Perception of physical stability and center of mass of 3D objects'),None)
      2 print(pub)

/opt/conda/lib/python3.6/site-packages/scholarly.py in search_pubs_query(query)
    278     """Search by scholar query and return a generator of Publication objects"""
    279     url = _PUBSEARCH.format(requests.utils.quote(query))
--> 280     soup = _get_soup(_HOST+url)
    281     return _search_scholar_soup(soup)
    282 

/opt/conda/lib/python3.6/site-packages/scholarly.py in _get_soup(pagerequest)
     89 def _get_soup(pagerequest):
     90     """Return the BeautifulSoup for a page on scholar.google.com"""
---> 91     html = _get_page(pagerequest)
     92     return BeautifulSoup(html, 'html.parser')
     93 

/opt/conda/lib/python3.6/site-packages/scholarly.py in _get_page(pagerequest)
     76     if resp.status_code == 503:
     77         # Inelegant way of dealing with the G captcha
---> 78         raise Exception('Error: {0} {1}'.format(resp.status_code, resp.reason))
     79         # TODO: Need to fix captcha handling
     80         # dest_url = requests.utils.quote(_SCHOLARHOST+pagerequest)

Exception: Error: 503 Service Unavailable

It appears that _get_soup with https://scholar.google.ca/scholar?q=... does not work anymore:

soup = scholarly._get_soup( 'https://scholar.google.ca/scholar?q=A%20Graph%20Digital%20Signal%20Processing%20Method%20for%20Semantic%20Analysis')
#soup = scholarly._get_soup( 'https://scholar.google.ca/scholar?q=A+Graph+Digital+Signal+Processing+Method+for+Semantic+Analysis')

The author query is OK:

soup = scholarly._get_soup( 'https://scholar.google.ca/citations?view_op=search_authors&hl=en&mauthors=Mircea+Trifan')
print( soup )

Thanks,
Mircea

Get one paper bibliography

Hi,

Thanks for the nice tool! :)

I was wondering whether it's possible to retrieve all the papers cited (bibliography) in one paper.

Cheers,
Mathieu

search_author didn't work

Traceback (most recent call last):
  File "~project/test.py", line 4, in <module>
    print(next(search_query))
  File "~project\venv\lib\site-packages\scholarly.py", line 99, in _search_citation_soup
    yield Author(tablerow)
  File "~project\venv\lib\site-packages\scholarly.py", line 209, in __init__
    self.name = __data.find('h3', class_='gsc_1usr_name').text
AttributeError: 'NoneType' object has no attribute 'text'

Got this error when I try to search for an author, but searching for publications still works. Please fix this issue, thank you.

publication dates parsing

arrow parser fails when publication date is e.g. 2013/7. need to explicitly add YYYY/M:

line 201 of scholarly.py should be:

self.bib['year'] = arrow.get(val.text, ['YYYY/M','YYYY/MM/DD', 'YYYY', 'YYYY/M/DD', 'YYYY/M/D', 'YYYY/MM/D']).year
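The same multi-format fix can be sketched with the standard library alone (scholarly itself uses the arrow parser shown above; the helper name here is hypothetical). Trying the most specific layouts first handles dates like 2013/7:

```python
from datetime import datetime

def parse_scholar_year(text):
    """Extract the year from the date layouts Scholar emits (hypothetical
    stdlib stand-in for the arrow-based parsing in scholarly)."""
    for fmt in ("%Y/%m/%d", "%Y/%m", "%Y"):
        try:
            return datetime.strptime(text, fmt).year
        except ValueError:
            continue
    raise ValueError("unrecognized date: %r" % text)

print(parse_scholar_year("2013/7"))  # the single-digit-month case that broke parsing
```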

module 'scholarly' has no attribute 'use_proxy'

First of all, thank you for your amazing work! I am interested in running scholarly through a proxy but when I install it I always obtain the same error: module 'scholarly' has no attribute 'use_proxy'. I have tried everything: installing through pip, cloning the repository, etc. I would appreciate any help.

Error: 403 Forbidden Error

Hello,

I am trying to use scholarly, but I got 403 error

The code I run is below 2 lines:
search_query = scholarly.search_author('Marty Banks, Berkeley')
print(next(search_query))

The error I got is:

----> 1 search_query = scholarly.search_author('Marty Banks, Berkeley')
      2 print(next(search_query))

~/.local/lib/python3.6/site-packages/scholarly/scholarly.py in search_author(name)
    309     """Search by author name and return a generator of Author objects"""
    310     url = _AUTHSEARCH.format(requests.utils.quote(name))
--> 311     soup = _get_soup(_HOST+url)
    312     return _search_citation_soup(soup)
    313

~/.local/lib/python3.6/site-packages/scholarly/scholarly.py in _get_soup(pagerequest)
     87 def _get_soup(pagerequest):
     88     """Return the BeautifulSoup for a page on scholar.google.com"""
---> 89     html = _get_page(pagerequest)
     90     html = html.replace(u'\xa0', u' ')
     91     return BeautifulSoup(html, 'html.parser')

~/.local/lib/python3.6/site-packages/scholarly/scholarly.py in _get_page(pagerequest)
     82     # return _get_page(re.findall(r'https://(?:.?)(/.)', resp)[0])
     83     else:
---> 84         raise Exception('Error: {0} {1}'.format(resp.status_code, resp.reason))
     85
     86

Exception: Error: 403 Forbidden

I've been searching online about this for hours. Could someone kindly point me in the right direction?

Thank you a lot in advance!

Koala

Any strategies to speed up search time?

Sorry, this may not be the best place to post this, but it takes a good 10-15 seconds to retrieve results when calling search_pubs_query(). Is there any way to speed this up? (e.g., specify to retrieve only the first n results, or something to that effect)

edit: it seems to also take a very long time to simply go next(generator_object_of_pubs).fill()

Improvements to test suite

It has been a perennial problem for scholarly that the data used for tests is constantly changing, meaning tests fail not because there is an error in the code, but because the static test numbers are out of date with what Google Scholar returns. The current solution is #82 (which copies from #60).

Longer term, it would be nice to have a static set of pages that return constant data for each test, as mentioned in #77 (comment) .

Of course, this has its own downside in that the static page will also become out of date with Google Scholar. A hybrid approach will likely be best. Thoughts welcome.

Retrieving all results publication information

I've been trying to write the results of the query to a CSV file.

I can do that for authors' names and profiles but not for bibliographic information. Does anyone have an idea to deal with that?

Thanks!
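One way to do this is csv.DictWriter over each filled record's bib dict; a sketch with a stand-in record (the bib field names follow the output shown elsewhere in these issues, and the file name is arbitrary):

```python
import csv

# Stand-in for filled publication records from a scholarly query.
publications = [
    {"bib": {"title": "Perception of physical stability and center of mass of 3-D objects",
             "author": "SA Cholewiak and RW Fleming and M Singh",
             "url": "https://jov.arvojournals.org/article.aspx?articleID=2213254"}},
]

fields = ["title", "author", "url"]
with open("results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fields)
    writer.writeheader()
    for pub in publications:
        bib = pub["bib"]
        # Missing bib keys become empty cells rather than raising KeyError.
        writer.writerow({k: bib.get(k, "") for k in fields})
```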

Did Google Scholar change its format?

I have been running my script periodically, it throws the following error today, looks like the page returned from Google Scholar might have a different format now? Thanks!

Traceback (most recent call last):
  File "./crawl.py", line 27, in <module>
    for x in rslt:
  File "/usr/local/lib/python3.6/site-packages/scholarly.py", line 100, in _search_scholar_soup
    yield Publication(row, 'scholar')
  File "/usr/local/lib/python3.6/site-packages/scholarly.py", line 138, in __init__
    title = databox.find('h3', class_='gs_rt')
AttributeError: 'NoneType' object has no attribute 'find'

empty get_citedby()

The method pub.get_citedby() returns an empty set even for publications with more than zero citations. I suspect that the attribute id_scholarcitedby is not set by fill() as expected.

Showing wrong publication year

I tried using scholarly; it is a great help for extracting publication information from Google Scholar. However, I am facing a problem with the year of publication: it shows 1970 as the publication year for many papers. Consider the publication titled "Trust and reputation aware geographic routing method for wireless ad hoc networks": it was published in 2018, but scholarly shows 1970. Kindly resolve this issue. I have used the following code to get the publication year:

author = scholarly.Author('scholar_id').fill()
for pub in author.publications:
    pub_data = pub.fill()
    if 'year' in pub_data.bib:
        print(pub_data.bib['title'])
        print(pub_data.bib['year'])

Backlog of pull requests

I was about to make a pull request from my fork https://github.com/bielsnohr/scholarly, but then I checked the pending pull requests on this repository and realised that there is a large backlog. A good number are close to being 1 year old, and there hasn't been any activity from the owner for nearly the same amount of time.

The number of pull requests with useful features suggests that there is quite an active community that could support and improve this package a great deal, but it seems to me everything is being held back by an absent maintainer. Not an accusation or criticism, just the reality of the situation. Life happens.

I'm wondering what the best way to rectify this would be. Create a new fork hosted by one of the more active contributors and then transfer PyPI to point at that one? Or is it possible to get more contributors with write access on this repository?

Adding number of citations from web of science

It would be a great addition to also query the number of citations from the web of science core collection, which I am sure a lot of people would appreciate as it shows the number of citations in the peer-reviewed literature only.

The number of web of science citations are automatically a part of the google scholar search result if one is using an institutional Wifi connection (so no need for passwords) and if it is included in the core collection.

I am asking for this tool, as I am unable the use the web of science API, as I, and I think most people, have only institutional access to the web of science. Unfortunately, I am too much of a novice to web data mining to add this feature myself. So, I would be very grateful!

No "citedby" key

I wanted to retrieve citations of an article. But bib dictionary has no "citedby" key:

import scholarly
search_query = scholarly.search_pubs_query("Perception of physical stability and center of mass of 3D objects")
p = next(search_query)
print(p)

it shows this:

{'_filled': False,
'bib': {'abstract': 'Humans can judge from vision alone whether an object is '
'physically stable or not. Such judgments allow observers '
'to predict the physical behavior of objects, and hence '
'to guide their motor actions. We investigated the visual '
'estimation of physical stability of 3-D objects (shown '
'in stereoscopically viewed rendered scenes) and how it '
'relates to visual estimates of their center of mass '
'(COM). In Experiment 1, observers viewed an object near '
'the edge of a table and adjusted its tilt to the '
'perceived critical angle, ie, the tilt angle at which '
'the object …',
'author': 'SA Cholewiak and RW Fleming and M Singh',
'eprint': 'https://jov.arvojournals.org/article.aspx?articleID=2213254',
'title': 'Perception of physical stability and center of mass of 3-D '
'objects',
'url': 'https://jov.arvojournals.org/article.aspx?articleID=2213254'},
'source': 'scholar'}

I found that _KEYWORDSEARCH should be as follows to solve the problem:
_KEYWORDSEARCH = '/citations?view_op=search_authors&hl=en&mauthors=label:{0}'
But in my version of scholarly.py it is already _KEYWORDSEARCH = '/citations?view_op=search_authors&hl=en&mauthors=label:{0}'.

Does anybody know how to get citations of an article using scholarly.py?

Rate limit

Is there any rate limit? Is it possible to hit a captcha or other limitation?

AttributeError: 'NoneType' object has no attribute 'text'

Python 3.7.3 (default, Mar 26 2019, 21:43:19) 
[GCC 8.2.1 20181127] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import scholarly
>>> print(next(scholarly.search_author('Steven A. Cholewiak')))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/site-packages/scholarly/scholarly.py", line 110, in _search_citation_soup
    yield Author(row)
  File "/usr/lib/python3.7/site-packages/scholarly/scholarly.py", line 226, in __init__
    self.name = __data.find('h3', class_='gsc_oai_name').text
AttributeError: 'NoneType' object has no attribute 'text'

OS: Manjaro Linux x86_64

--citation yields blank output

Looks like this command does not work any more:
py scholar.py -c 1 --author "albert einstein" --phrase "quantum theory" --citation bt

Import Error

I have all the dependencies installed correctly, but I am getting an import error when I call "import scholarly". Please help if you can. Here is the error trace:

import arrow
  File "/usr/local/lib/python2.7/site-packages/arrow/__init__.py", line 3, in <module>
    from .arrow import Arrow
  File "/usr/local/lib/python2.7/site-packages/arrow/arrow.py", line 19, in <module>
    from arrow import util, locales, parser, formatter
  File "/usr/local/lib/python2.7/site-packages/arrow/parser.py", line 12, in <module>
    from backports.functools_lru_cache import lru_cache  # pragma: no cover
ImportError: No module named functools_lru_cache

SSLError

Hi,

I am constantly getting the following error:

requests.exceptions.SSLError: ("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')],)",)

All requisite packages are installed - is it Google's policy on Scholar?

Thank you in advance!

Scholarly stopped working.

The error happens when trying to fill author information:

Traceback (most recent call last):
  File "test_scholarly.py", line 4, in <module>
    author = next(scholarly.search_author('Einstein')).fill()
  File "scholarly.py", line 111, in _search_citation_soup
    yield Author(row)
  File "scholarly.py", line 227, in __init__
    self.name = __data.find('h3', class_='gsc_1usr_name').text
AttributeError: 'NoneType' object has no attribute 'text'

I'm not familiar enough with beautifulsoup to track the problem, but my best guess is a change of the Google Scholar html format.

How to get year wise citations of a publication?

I have paper "Systematic Design of Trust Management Systems for Wireless Sensor Networks: A Review" which was published in 2014. Now, I want to see the year wise citation of this publication (i.e., number of citations in 2014, 2015,2016, 2017, 2018, and 2019). Is it possible with this package?

AttributeError when publication was not yet cited

When trying to use pub.citedby on a publication that has not yet been cited, scholarly throws an AttributeError.

I believe that initializing "citedby" to 0 by default during __init__ would be beneficial:

    self.bib = dict()
    self.source = pubtype
    # Init citedby with 0 to avoid AttributeError
    self.citedby = 0

Error when the year is in the format "YYYY/M"

arrow.parser.ParserError: Could not match input '2011/5' to any of the following formats: YYYY-MM-DD, YYYY-M-DD, YYYY-M-D, YYYY/MM/DD, YYYY/M/DD, YYYY/M/D, YYYY.MM.DD, YYYY.M.DD, YYYY.M.D, YYYYMMDD, YYYY-DDDD, YYYYDDDD, YYYY-MM, YYYY/MM, YYYY.MM, YYYY

bibtex information, Publication fill,and url_scholarbib

Hi,
I am always receiving AttributeError: 'Publication' object has no attribute 'url_scholarbib' when trying to do pub.fill()

I believe that the error lies in https://github.com/OrganicIrradiation/scholarly/blob/867f740f61d05c10925e502b56d4bcdf1b1849cb/scholarly/scholarly.py#L168

Scholar changed their simple link to a modal created via js. (class gs_or_cit gs_nph)
I don't really know how would you like to implement this kind of click, but would be glad to help.

pub.fill() fails to parse because it does not enforce hl=en

I noticed that pub.fill() couldn't retrieve all the fields. It was fetching a localized version of the page, which was in German. I could fix the problem by adding an hl=en option to all the search strings (lines 25-31 in scholarly.py) that didn't have it.
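The fix described above amounts to forcing hl=en onto every request URL; a small stdlib sketch of that idea (the helper name is hypothetical, not scholarly's API):

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

def force_english(url):
    """Append or overwrite hl=en so Scholar serves the English page layout
    that the parsing code expects (hypothetical helper, not scholarly's API)."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query["hl"] = "en"  # overrides any localized hl value already present
    return urlunparse(parts._replace(query=urlencode(query)))

print(force_english("https://scholar.google.com/scholar?q=stability&hl=de"))
```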
