osome-iu / botometer-python
A Python API for Botometer by OSoMe
Home Page: https://botometer.osome.iu.edu
License: MIT License
I often run into an error where Botometer must be checking the followers of a user (or something else?) and returns an error (see below) if that user has not tweeted, rather than skipping that user and moving on. To be clear, this user is not the one being searched. How might I avoid this sort of problem in the future?
Error in py_call_impl(callable, dots$args, dots$keywords) : NoTimelineError: user '{'id_str': '557225288', 'screen_name': 'luanods'}' has no tweets in timeline
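One way to keep a batch run moving is to catch the timeline error per account and record the skip. A minimal sketch: `NoTimelineError` here is a local stand-in for `botometer.NoTimelineError`, and `check_account` is whatever callable performs the lookup (e.g. `bom.check_account` with the real library).

```python
class NoTimelineError(Exception):
    """Local stand-in for botometer.NoTimelineError in this sketch."""

def check_many(check_account, screen_names):
    """Score each account, skipping those whose timelines are empty.

    Returns (results, skipped): a dict of successful responses and a
    list of accounts that raised NoTimelineError and were passed over.
    """
    results, skipped = {}, []
    for name in screen_names:
        try:
            results[name] = check_account(name)
        except NoTimelineError:
            skipped.append(name)  # no tweets: record the account and move on
    return results, skipped
```

With the real package you would catch `botometer.NoTimelineError` around `bom.check_account` in exactly the same way, instead of defining the stand-in class.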
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://botometer-pro.p.rapidapi.com/2/check_account
Do you know the reason?
Hi, my research into bot detection led me here – well, actually, your website; it unfortunately took me forever to actually find your GitHub (your usage of "repository" on the overview site combined with not actually linking to here from the "Tools" page is confusing).
Having had a look at your API, and having skimmed the code in here, I'm a bit confused about... well, the structure of the API itself, as well as your usage of "Python API" (or "our official Python client library" on the website) for what's presented here. From what I can see, what you are mostly doing here is using Tweepy to get info from Twitter, which you eventually send off to the Botometer API. Which I guess is nice and convenient for people not used to working with Twitter, but not equally helpful when you are already using (and intending to stick with) a different Python library to connect to the Twitter API. So basically I'm wondering if you are going to offer a simple, library-agnostic wrapper as well?
Relatedly – because all the data is collected beforehand anyway – I was wondering what the API's actual limits are, and what JSON structure the code on your end expects. How many tweets can I feed into your API? Which of the key/value pairs of tweet objects need to be present for your code to interpret them as such? I imagine you don't make use of all the (meta)data points in full tweet objects, so it might make sense to limit queries to what's absolutely necessary (particularly if that then means I can send more tweet objects over to be examined).
This also ties into my next question: what about an option to exclude individual calculations/scores that are irrelevant, reducing the work on your side, which I assume would in turn speed things up on the client side? Are there any plans for that, or to make it possible to query only some scores? This is actually more what I'd expect of an API: offering finer-grained queries and splitting of requests. I'd find this particularly relevant for the analysis of non-English tweets – the sentiment analysis seems to play an important role in your calculations, but is of course completely redundant for non-English tweets. It would generally be nice if it were possible to exclude calculations, because I find the (continued) significance of some of them unclear or questionable.
With regard to that last bit: are there any plans to also document the calculations you are doing in any way outside of the research papers you reference? I'm asking because not all these resources are freely available, and to have to read through them all and then try to piece together/make an educated guess about what is currently/still being used as basis for the calculations – nevermind how they are actually done – is... well, not ideal. As said, I'd have to ditch some calculations and redo them incorporating missing bits.
I'd generally be interested to know if the project is still being worked on, or being developed further, though discussion of that is probably better saved for another channel.
According to RapidAPI's announcement, https://p.mashape.com => https://p.rapidapi.com . This needs to be done before September.
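The migration itself is just a host rename in the hard-coded URLs. A minimal sketch (hypothetical helper, not part of the package) that matches the announcement:

```python
def migrate_api_url(url):
    """Rewrite a retired Mashape host to its RapidAPI successor."""
    return url.replace('p.mashape.com', 'p.rapidapi.com')
```

For example, `migrate_api_url('https://botometer-pro.p.mashape.com')` yields `'https://botometer-pro.p.rapidapi.com'`; URLs that never pointed at Mashape pass through unchanged.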
This might be a beginner's issue, sorry about that:
To make full use of the Pro API I followed the guide on this wiki page (https://github.com/IUNetSci/botometer-python/wiki/Using-the-Pro-API). Yet, when I omit the 'access_token' and 'access_token_secret' in the Botometer constructor, I receive the following error for the last line of code:
Traceback (most recent call last):
File "<ipython-input-2-16cf1b0c8b83>", line 15, in <module>
**twitter_app_auth)
TypeError: __init__() missing 2 required positional arguments: 'access_token' and 'access_token_secret'
Any ideas?
The code I use:
import botometer

twitter_app_auth = {
    'consumer_key': 'XX',
    'consumer_secret': 'XX',
}
botometer_api_url = 'https://botometer-pro.p.mashape.com'
bom = botometer.Botometer(botometer_api_url=botometer_api_url,
                          mashape_key='XX',
                          **twitter_app_auth)
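The error message itself is the clue: the installed release treats access_token and access_token_secret as required constructor arguments, so the wiki example and the installed package disagree. A tiny pre-flight check (a hypothetical helper, not part of botometer) makes the mismatch visible before the constructor runs:

```python
# Credential names the Botometer constructor reported as required.
REQUIRED_TWITTER_KEYS = {
    'consumer_key', 'consumer_secret',
    'access_token', 'access_token_secret',
}

def missing_twitter_keys(twitter_app_auth):
    """Return the credential names absent from a twitter_app_auth dict."""
    return sorted(REQUIRED_TWITTER_KEYS - set(twitter_app_auth))
```

Running it on the dict above returns `['access_token', 'access_token_secret']`, matching the two arguments named in the TypeError.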
The readme of this repo states that the Ultra plan allows up to 450 requests per 15-minute window, which would mean a daily quota of 450 * 4 * 24 = 43200. However, the pricing section on the RapidAPI website states that daily requests are capped at 17280, with no 15-minute rate limit between requests.
Which of these quotas is correct, and is there in fact a 450 requests per 15 min rate limit?
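For reference, the arithmetic behind the two figures; if both limits applied, the daily cap would be the binding constraint:

```python
windows_per_day = 4 * 24             # four 15-minute windows per hour
by_window = 450 * windows_per_day    # ceiling if only the window limit applied
daily_cap = 17280                    # cap shown on the RapidAPI pricing page
effective_daily_max = min(by_window, daily_cap)
```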
I purchased BotometerLite; it can return botscore, tweet_id, and user_id. My problem is that my Twitter data only has user_name but no user_id, and in BotometerLite batch analysis, accounts that have been deleted are automatically skipped, so I have no way to match the bot analysis results with user_name. How can I add user_name to the returned result?
After properly configuring the application, I'm getting no output from the python program.
Also, hitting the REST endpoint returns a 500.
Hello, according to your pages over at https://botometer.iuni.iu.edu/#!/api, there is a link to an R package, https://github.com/marsha5813/botcheck, but it is very out of date. Is it possible to have it updated? The last update was 2 years ago, and many things have changed since then. I would really appreciate your help!
We probably need a brief guide to errors coming from the server. The best example is the 502 error mentioned in #4; this is indeed the server throwing 502, we should probably document why.
This is as opposed to the 500 error in #2, which probably does warrant a code fix. The 500 error was fixed in a7e9916.
Hi, I have a .csv file containing 17200 usernames (and one for loop that iterates through those usernames and gets the bot score). I have a Pro key, but it took more than a day just to process a little over 3000 accounts?
I am doing as per the Pro API documentation:
# Pro API endpoint
botometer_api_url = 'https://botometer-pro.p.rapidapi.com'
# create a Botometer instance
bom = botometer.Botometer(botometer_api_url=botometer_api_url,
                          wait_on_ratelimit=True,
                          rapidapi_key=rapidapi_key,
                          **twitter_app_auth)
I know your servers are under stress, so is that the reason? I am using my institution's internet connection for this task; I don't think I/O is the culprit. Should wait_on_ratelimit be False?
I have a huge amount of data to collect, so I created 4 Botometer Pro accounts. For all 4 of them I am using consumer keys from different Twitter users. However, when I try to track per-day API calls, usage barely touches 4k calls (approx. 12% of the Pro quota), which is less than even the freemium version. What is the possible reason for this? How should I change the code so that I can use 100% of the Botometer Pro API?
My code looks like this:
# Pro API endpoint
botometer_api_url = 'https://botometer-pro.p.mashape.com'
twitter_app_auth = {
    'consumer_key': "xxxxxxxxxxxxxxxxxxxxx",
    'consumer_secret': "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
}
api_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
bom = botometer.Botometer(botometer_api_url=botometer_api_url,
                          mashape_key=api_key,
                          wait_on_ratelimit=True,
                          **twitter_app_auth)

for user_id in list_of_ids:
    result = bom.check_account(user_id)
    # save result
Now that the paid API is released, it's about time for some work on this, our official Python API implementation. Share with us anything you'd like to see included!
The most important focus for version 2 at this moment is to enable better error handling for bulk workloads.
{
"cap": {
"english": 0.0011785984309163565,
"universal": 0.0016912294273666159
},
"categories": {
"content": 0.058082395351262375,
"friend": 0.044435259626385865,
"network": 0.07064549990637549,
"sentiment": 0.07214003430676995,
"temporal": 0.07924665710801207,
"user": 0.027817972609638725
},
"display_scores": {
"content": 0.3,
"english": 0.1,
"friend": 0.2,
"network": 0.4,
"sentiment": 0.4,
"temporal": 0.4,
"universal": 0.1,
"user": 0.1
},
"scores": {
"english": 0.0215615093045025,
"universal": 0.0254864249403189
},
"user": {
"id_str": "1548959833",
"screen_name": "clayadavis",
"...": "..."
}
}
From this result, how do I interpret the scores? Which field indicates a bot?
Can you please provide a description?
Is there a standard threshold for determining whether an account is a bot, assuming the returned scores are probabilities?
Which score is commonly used?
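No single field flags a bot, and there is no official cutoff. In responses like the one above, the cap values are Complete Automation Probabilities, the scores values are the classifier outputs on a 0-1 scale, and display_scores are the same numbers rescaled to 0-5 (note 0.0216 * 5 ≈ 0.1 for the english score above). A common ad-hoc approach is to pick a threshold yourself; a sketch, where the 0.5 default is an assumption rather than an official recommendation:

```python
def is_likely_bot(result, threshold=0.5, lang_key='english'):
    """Heuristic: flag the account if its overall score meets a chosen cutoff.

    `result` is a Botometer response dict like the sample above. Whether to
    threshold 'scores' or 'cap', and at what value, is a research decision.
    """
    return result['scores'][lang_key] >= threshold
```

For the sample response above, `result['scores']['english']` is about 0.02, so `is_likely_bot(result)` is False.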
raise HTTPError(http_error_msg, response=self)
HTTPError: 502 Server Error: Bad Gateway for url: http://truthy.indiana.edu/botornot/api/1/check_account
I have a Pro API key for Botometer, but I keep getting:
raise HTTPError(http_error_msg, response=self) requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://osome-botometer.p.rapidapi.com/2/check_account
I followed the example on the readme.md :( And the wiki pages have old info too. Thank you!
Dear botornot's authors,
I am sorry if my issue is too basic.
I tried to follow the quick guide to check a list of 180 twitter user accounts in Python. I got the message
AttributeError: 'BotOrNot' object has no attribute 'check_accounts_in'
I then switched to use 'check_account' using list comprehension, but when the list had more than 1 user, it gave the error
tweepy.error.TweepError: Not authorized.
I installed botornot from pip and its version is 0.2.
Thanks,
Hieu
I'm running Botometer on a dataset scraped from Twitter. It's a relatively large dataset.
I'm wondering if there are any known ways to speed up Botometer's calls to the Twitter server?
I think the documentation of the X-Mashape-Key is outdated and the key has a new name, X-RapidAPI-Key.
I clicked the hyperlink to the Mashape marketplace and got redirected to the RapidAPI website. I could still find the Botometer API on that marketplace, but not the mentioned X-Mashape-Key.
Therefore I tried the X-RapidAPI-Key and it worked. That's why I think they renamed the key.
It looks like when trying to use the Pro API and the check_accounts_in method, the api_url is set back to the basic endpoint. If you would like a pull request, I'm happy to give it a try.
https://github.com/IUNetSci/botometer-python/blob/master/botometer/__init__.py#L130
Steps to reproduce:
curl -X POST --include 'https://osome-botometer.p.mashape.com/2/check_account' -H 'X-Mashape-Key: mSXhy96RQmmshCpHVswh4bZis3nzp1Q1v6mjsnoyDK4jFVc1ki' -H 'Content-Type: application/json' -H 'Accept: application/json' --data-binary '@data.json'
or:
headers = {
'X-Mashape-Key': MASHAPE_KEY
}
resp = requests.post('https://osome-botometer.p.mashape.com/2/check_account', json=data, headers=headers)
resp.raise_for_status()
Result:
HTTP/1.1 100 Continue
HTTP/1.1 500 INTERNAL SERVER ERROR
Content-Type: text/html; charset=UTF-8
Date: Fri, 29 Dec 2017 21:02:47 GMT
Server: Mashape/5.0.6
Vary: Accept-Encoding,User-Agent
X-RateLimit-requests-Limit: 17280
X-RateLimit-requests-Remaining: 17270
Content-Length: 291
Connection: keep-alive
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.</p>
Note that these were tweets that were extracted from an existing dataset that was collected from the Twitter stream API. I can successfully call check_account for the same account using botometer-python.
I know this is the repo for botometer-python, not botometer. If I should report this elsewhere, please let me know. Thanks in advance for your assistance.
data.json.zip
I am on free account and getting this error while running the starter code from git repo:
requests.exceptions.HTTPError: 401 Client Error: UNAUTHORIZED for url: https://botometer.osome.iu.edu/api-test/4/check_account
I have been observing some strange behavior from the API today and thought it was a problem on my side, but just now I got a 504 Server Error: GATEWAY_TIMEOUT. Is the API server going through some maintenance now, or is there any way I can debug this problem?
Thanks much!
my full code is below:
import botometer
import tweepy

name_list = ['username']
mashape_key = ""  # now it's called the RapidAPI key
twitter_app_auth = {
    'consumer_key': '',
    'consumer_secret': '',
    'access_token': '',
    'access_token_secret': '**',
}
bom = botometer.Botometer(wait_on_ratelimit=True,
                          mashape_key=mashape_key,
                          **twitter_app_auth)
bot = []
for name in name_list:
    try:
        result = bom.check_account(name)
        if result["cap"]["english"] > 0.5:
            bot.append(name)
            print(name)
    except tweepy.TweepError as e1:
        pass
    except botometer.NoTimelineError as e2:
        pass
print(bot)
and I got this error:
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='osome-botometer.p.mashape.com', port=443): Max retries exceeded with url: /2/check_account (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x10b9da2e8>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known',))
How can I solve this issue? This worked well before the update.
I used to use the BotOrNot API when it was on Mashape, but now my code does not work. So I created an app with RapidAPI and obtained the rapidapi_key. I also created a new Twitter app. I installed all the dependencies and my code still does not work! So I decided to run the provided example to make sure the API works, but still no luck. I keep getting this error:
Traceback (most recent call last):
  File "BotOrNot.py", line 16, in <module>
    result = bom.check_account('@ACCOUNTNAME')
  File "~/pythonEnvs/Google_Env/lib/python3.7/site-packages/botometer/__init__.py", line 132, in check_account
    bom_resp.raise_for_status()
  File "~/pythonEnvs/Google_Env/lib/python3.7/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://botometer-pro.p.rapidapi.com/2/check_account
which does not seem to be in the list of possible errors provided here. I would really appreciate any help on why I am getting this error!
The following are the (anonymised) data returned for a user by the Botometer Python API, using the sample code provided in the README:
{'scores': {'universal': 0.22, 'english': 0.19},
 'user': {'id_str': '135791113', 'screen_name': 'abcdefg'},
 'categories': {'network': 0.15, 'content': 0.31, 'temporal': 0.16, 'sentiment': 0.43, 'friend': 0.3, 'user': 0.14}}
The following are the (anonymised) data returned from the Botometer browser client, using the "download data" option:
[
{
"user": {
"screen_name": "abcdefg",
"id_str": "135791113",
"lang": "en-gb"
},
"categories": {
"content": 0.29,
"friend": 0.26,
"network": 0.14,
"sentiment": 0.44,
"temporal": 0.13,
"user": 0.15
},
"score": 0.22,
"scores": {
"english": 0.22,
"universal": 0.21
}
}
]
These were created within a minute of each other.
Note that the downloaded JSON contains two extra elements:
"lang" in the user section, and
"score" at the same level as "user","categories" and "scores"
Note also that the values of nearly all elements are at slight variance between the two methods. Possibly not enough to be grossly misleading but nevertheless not insignificant.
Hi,
recently I am getting the following error during batch operations using check_accounts_in:
500 Server Error: Internal Server Error for url: https://osome-botometer.p.mashape.com/2/check_account
I was trying to figure it out, but with no success.
requests.exceptions.HTTPError: 404 Client Error: Not Found for URL:
"Missing RapidAPI application key.
Is it because of the new version? What part of my code should I change?
Hi guys! I have been trying to make the example script work. I installed all the dependencies and got all the keys and such. Here is the error I am getting:
Traceback (most recent call last):
File "twitter_bot.py", line 13, in <module>
result = bom.check_account('@clayadavis')
File "/Users/aabhishek/anaconda/lib/python2.7/site-packages/botometer/__init__.py", line 113, in check_account
full_user_object=full_user_object)
File "/Users/aabhishek/anaconda/lib/python2.7/site-packages/botometer/__init__.py", line 67, in _get_twitter_data
user_timeline = self.twitter_api.user_timeline(user, count=200)
File "/Users/aabhishek/anaconda/lib/python2.7/site-packages/tweepy/binder.py", line 245, in _call
return method.execute()
File "/Users/aabhishek/anaconda/lib/python2.7/site-packages/tweepy/binder.py", line 229, in execute
raise TweepError(error_msg, resp, api_code=api_error_code)
tweepy.error.TweepError: [{u'message': u'Bad Authentication data.', u'code': 215}]
Any help will be appreciated!
Is there any way to know how much of the daily quota I have used while running a code? Can I automatically wait until the next day when the API refreshes the quota limit? How and when is this limit refreshed?
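RapidAPI reports daily-quota usage in response headers; the 500-error transcript earlier in this thread shows X-RateLimit-requests-Limit and X-RateLimit-requests-Remaining. If you issue the POST yourself (e.g. with requests), you can read them after each call. A sketch, assuming those header names; the helper itself is hypothetical:

```python
def quota_remaining(headers):
    """Pull RapidAPI daily-quota counters out of response headers.

    `headers` is any mapping of header name to value; lookup is done
    case-insensitively. Returns (limit, remaining) as ints, with None
    for any counter that is absent from the response.
    """
    lowered = {k.lower(): v for k, v in headers.items()}

    def read(name):
        value = lowered.get(name)
        return int(value) if value is not None else None

    return (read('x-ratelimit-requests-limit'),
            read('x-ratelimit-requests-remaining'))
```

For the transcript above this would return (17280, 17270). When the remaining counter nears zero, you could sleep until the quota refreshes rather than burning failed requests.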
Currently, I cannot retrieve any results. Using the Python API, I encountered the error 502 (HTTPError: 502 Server Error: Bad Gateway for url: https://osome-botometer.p.mashape.com/2/check_account). Using the website, I also receive the same error (bad gateway or bad proxy). Testing my mashape keys, it produced the same error saying:
502 Proxy Error
The proxy server received an invalid response from an upstream server.
The proxy server could not handle the request POST /api/2/check_account.
Reason: Error reading from remote server
I used different laptops, different OS, and different keys/tokens (Twitter/mashape).
I have noticed that the quota of the Basic plan is reached too early, without exceeding the limit of 500 requests. For example, with a total of 6 requests (seen on RapidAPI), the limit error is raised (in code and also in the RapidAPI interface notification). I have also added a delay between requests, so it does not seem to be a Twitter rate-limit problem...
PS: Version 4 of Botometer is installed and executes properly, and the successful requests work fine.
The API Overview on Mashape does not cover the entire response object, including information regarding how to interpret the CAP scores.
Hello,
I tried to get the bot score for only 400 accounts in 24 hours. Why am I getting the error below?
Failed to send request: HTTPSConnectionPool(host='api.twitter.com', port=443): Max retries exceeded with url: /1.1/statuses/user_timeline.json?id=%40defenceforces&include_rts=True&count=200 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x000001D037C04F88>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))
This is a Quality of Life ticket.
A Google search on Tweepy's "Not authorized." errors turns up discussion of failures when fetching a private friend list. However, the users I got this error for have user IDs that cannot be looked up at https://tweeterid.com/ ; they are most likely deleted or suspended accounts. Currently these two types of error cannot be distinguished from the error thrown by bom.
One potential way (not tested) to separate the deleted-user case is to catch the 404: the Twitter API documentation says 404 is returned for lookup queries with empty results, and I suspect a missing user produces the same.
If Tweepy doesn't allow catching the 404 before it throws TweepError, it may be too much work :( Forget about it then.
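If Tweepy does surface the Twitter error payload, the api_code attribute on the raised TweepError can separate the cases without inspecting HTTP status at all. A sketch using the documented Twitter v1.1 error codes (50 = user not found, 63 = user suspended); the helper itself is hypothetical:

```python
USER_NOT_FOUND = 50   # Twitter v1.1 error code: "User not found."
USER_SUSPENDED = 63   # Twitter v1.1 error code: "User has been suspended."

def classify_lookup_failure(api_code):
    """Map a TweepError's api_code to a coarse account status."""
    if api_code == USER_NOT_FOUND:
        return 'deleted'
    if api_code == USER_SUSPENDED:
        return 'suspended'
    # 'Not authorized.' for protected accounts typically carries no api_code.
    return 'protected_or_other'
```

In a real loop you would catch TweepError and pass `e.api_code` to this function to decide whether to retry, skip, or log the account.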
There is an edge case failure that is not being handled gracefully. A 500 server error doesn't seem right in this case. I believe it is caused by a user not having any tweets in their timeline. See error below:
HTTPError Traceback (most recent call last)
<ipython-input-80-8584ca4403c4> in <module>()
----> 1 bon_response = bon.check_account('@bennewmannnn')
2
/usr/local/lib/python2.7/site-packages/botornot/__init__.pyc in check_account(self, user)
104 def check_account(self, user):
105 user_data, tweets = self._get_user_and_tweets(user)
--> 106 classification = self._check_account(user_data, tweets)
107
108 return classification
/usr/local/lib/python2.7/site-packages/botornot/__init__.pyc in _check_account(self, user_data, tweets)
98
99 bon_resp = self._bon_post(self._bon_api_method('check_account'),
--> 100 data=post_body)
101 return bon_resp.json()
102
/usr/local/lib/python2.7/site-packages/botornot/__init__.pyc in wrapper(*args, **kwargs)
36 resp = func(*args, **kwargs)
37 try:
---> 38 resp.raise_for_status()
39 except requests.HTTPError as e:
40 if resp.status_code == 429:
/usr/local/lib/python2.7/site-packages/requests/models.pyc in raise_for_status(self)
823
824 if http_error_msg:
--> 825 raise HTTPError(http_error_msg, response=self)
826
827 def close(self):
HTTPError: 500 Server Error: INTERNAL SERVER ERROR
The API was working well but suddenly I am getting the client error,
404 Client Error: Not Found for url: https://osome-botometer.p.rapidapi.com/2/check_account
How to resolve it?
Hello Botometer,
I keep on receiving random 502 errors while checking lists of accounts.
Is this situation circumstantial, or is there something I can do to improve the request rate per day?
It sharply fell from around 15k-17k to less than 3k per day, and I need to manually adapt and restart the requests a dozen times a day.
Thanks in advance for your help,
DV7
Error format:
(...)HTTPError (http_error_msg, response=self)
HTTPError: 502 Server Error: Bad Gateway for url: https://botometer-pro.p.rapidapi.com/2/check_account
Original code:
low_memory = False
import nest_asyncio
nest_asyncio.apply()

import pandas as pd
import os
import glob
import csv
import botometer
import json
import time

column_usernames_corrected = pd.read_csv('column_usernames_corrected.csv')
column_usernames_corrected = column_usernames_corrected.astype(str)
error_accounts = ['error accounts']

botometer_api_url = 'https://botometer-pro.p.rapidapi.com'
twitter_app_auth = {
    'consumer_key': 'XXX',
    'consumer_secret': 'XXX',
    'access_token': 'XXX',
    'access_token_secret': 'XXX',
}
rapidapi_key = 'XXX'
bom = botometer.Botometer(botometer_api_url=botometer_api_url,
                          rapidapi_key=rapidapi_key,
                          wait_on_ratelimit=True,
                          **twitter_app_auth)

print(time.time())
print("Starting...")
for i in range(len(column_usernames_corrected)):
    try:
        result = bom.check_account('@' + column_usernames_corrected.loc[i, 'username'])
        with open(column_usernames_corrected.loc[i, 'username'] + '.json', 'a', encoding='utf-8') as js:
            json.dump(result, js, ensure_ascii=False, indent=4)
    except botometer.TweepError:
        error_accounts.append(column_usernames_corrected.loc[i, 'username'])
        error_accounts_csv = pd.DataFrame(error_accounts, columns=['error accounts'])
        error_accounts_csv.to_csv('error_accounts.csv', index=False, encoding='utf-8-sig')
print("Step 1 - Done")
print(time.time())
From the API results, everything is returned except the Complete Automation Probability. Is it a feature only for paid API access?
I'm getting a 500 server error when I call your endpoint. The app is in Node.js, but I don't think that's the main problem; I got some other error messages before, but now I mostly get this one. Do you have any idea what could cause it? It's difficult to debug, as the response doesn't include any data.
Is there any way to retrieve the Complete Automation Probability with the API? Or at least provide a way that it is calculated?
Hello, your wiki pages about the Pro key contain wrong information;
it should be:
botometer_api_url = 'https://botometer-pro.p.rapidapi.com'
bom = botometer.Botometer(botometer_api_url=botometer_api_url,
                          wait_on_ratelimit=True,
                          rapidapi_key=rapidapi_key,
                          **twitter_app_auth)
The rest of the process is the same. Thank you!
I have to collect data for a huge number of users, so I purchased multiple Botometer Pro accounts. I was wondering whether I can use the same Twitter app authorization keys with those multiple Pro accounts. Will this in any way reduce the speed at which Botometer extracts data for me?
Hi Botometer,
I have two questions. First, I notice that the time to get a result varies: on my side it can take 40 seconds, while my friend in Germany gets a result within 10 seconds. We both query only one account at a time. Also, it gets slower after we do more queries. How can I fix this? I hope it's not related to my network.
Second, I noticed that the bot score of an account can change. How often do you suggest I update my database? Is it OK if the score changes within 1?
BTW, I'm using a pro account.
Best,
Scott
I am just trying to run the sample code, but it throws an error:
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python37-32\lib\site-packages\tweepy\streaming.py", line 358
    def _start(self, async):
                     ^
SyntaxError: invalid syntax
I'm running it with Python 3.7.
Kindly assist with the below error when I try to execute the script:
bom = botometer.Botometer
AttributeError: module 'botometer' has no attribute 'Botometer'
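An AttributeError like this usually means `import botometer` resolved to something other than the installed package, most often a local file named botometer.py in the working directory that shadows it. A quick way to check where Python would load a module from (hypothetical helper, plain stdlib):

```python
import importlib.util

def module_location(name):
    """Return the file a module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None

# If module_location('botometer') points into your project directory rather
# than site-packages, rename the local file and delete its stale .pyc cache.
```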