benpowis / search-console-bq

Python code to grab data from Google Search Console and send it to BigQuery.
Hi Ben,
I would first like to thank you for the work put into this. I am trying out your code for streaming Search Console data into BigQuery.
I have followed all the steps so far, but I hit a permission problem at this snippet of code:
```python
response = service.searchanalytics().query(siteUrl=site_url, body=request).execute()
res = response['rows']
df = pd.DataFrame.from_dict(res)
```
I get the following error (I've replaced the real domain with xxx.com):

```
HttpError 403 when requesting https://www.googleapis.com/webmasters/v3/sites/https%3A%2F%2Fwww.xxx.com/searchAnalytics/query?alt=json returned "User does not have sufficient permission for site 'https://www.xxx.com'."
```
What's going wrong?
Thanks in advance for your help.
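(Not the repo author, but for anyone hitting the same 403: the usual cause is that the service account's `client_email` has never been added as a user on the Search Console property. A minimal sanity-check sketch, assuming you have already built the `service` object the same way the script does; the helper name below is mine, not from the script:

```python
def list_properties(service):
    """Return (siteUrl, permissionLevel) for every Search Console
    property the authenticated account is allowed to query.

    `service` is the googleapiclient service object the script builds.
    If your property is missing from this list, add the service
    account's client_email as a user on the property in Search Console.
    """
    entries = service.sites().list().execute().get('siteEntry', [])
    return [(e['siteUrl'], e['permissionLevel']) for e in entries]
```

If your site shows up with `permissionLevel` of `siteUnverifiedUser`, the account is added but not yet granted enough access to query Search Analytics.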
Have you considered deploying this as a Cloud Function, or would it time out at the 9 minute limit?
A couple of things:
`'dimensions': ["country","page","device","query"], # uneditable to enforce a nice clean dataframe at the end!`
It's working now; the previous error was because the table schema expects FLOAT/INT but I was using STRING.
Thanks,
--- Updated ---
Hi @benpowis, I tried using the Python script, but I got an error on `job_config.destination`:
```
D:\Users\jimi_\Downloads\search-console-bq-master>py search_console_bq.py
Traceback (most recent call last):
  File "D:\Users\jimi_\Downloads\search-console-bq-master\search_console_bq.py", line 74, in <module>
    y = get_sc_df(p,"2021-10-15","2021-10-15",x)
  File "D:\Users\jimi_\Downloads\search-console-bq-master\search_console_bq.py", line 61, in get_sc_df
    job_config.destination = table_ref
  File "D:\Users\jimi_\AppData\Local\Programs\Python\Python39\lib\site-packages\google\cloud\bigquery\job\base.py", line 755, in __setattr__
    raise AttributeError(
AttributeError: Property destination is unknown for <class 'google.cloud.bigquery.job.load.LoadJobConfig'>.
```
I searched on Google and found this answer: https://stackoverflow.com/questions/67687141/property-destination-is-unknown-for-class-google-cloud-bigquery-job-load-loadj
I tried the suggested solution of removing the line `job_config.destination = table_ref`,
and got another error:

```
/gsc$ python3 search_console_bq.py
Traceback (most recent call last):
  File "search_console_bq.py", line 74, in <module>
    y = get_sc_df(p,"2021-10-15","2021-10-15",x)
  File "search_console_bq.py", line 64, in get_sc_df
    load_job = client.load_table_from_dataframe(result, table_ref, job_config=job_config)
  File "/home/james_tamba/.local/lib/python3.7/site-packages/google/cloud/bigquery/client.py", line 2578, in load_table_from_dataframe
    raise ValueError("This method requires pyarrow to be installed")
ValueError: This method requires pyarrow to be installed
```
I installed pyarrow with `pip3 install pyarrow`, but now I get another error:
```
D:\Users\jimi_\Downloads\search-console-bq-master>py search_console_bq.py
Traceback (most recent call last):
  File "D:\Users\jimi_\Downloads\search-console-bq-master\search_console_bq.py", line 74, in <module>
    y = get_sc_df(p,"2021-10-15","2021-10-15",x)
  File "D:\Users\jimi_\Downloads\search-console-bq-master\search_console_bq.py", line 64, in get_sc_df
    load_job = client.load_table_from_dataframe(result, table_ref, job_config=job_config)
  File "D:\Users\jimi_\AppData\Local\Programs\Python\Python39\lib\site-packages\google\cloud\bigquery\client.py", line 2649, in load_table_from_dataframe
    _pandas_helpers.dataframe_to_parquet(
  File "D:\Users\jimi_\AppData\Local\Programs\Python\Python39\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py", line 586, in dataframe_to_parquet
    arrow_table = dataframe_to_arrow(dataframe, bq_schema)
  File "D:\Users\jimi_\AppData\Local\Programs\Python\Python39\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py", line 529, in dataframe_to_arrow
    bq_to_arrow_array(get_column_or_index(dataframe, bq_field.name), bq_field)
  File "D:\Users\jimi_\AppData\Local\Programs\Python\Python39\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py", line 290, in bq_to_arrow_array
    return pyarrow.Array.from_pandas(series, type=arrow_type)
  File "pyarrow\array.pxi", line 904, in pyarrow.lib.Array.from_pandas
  File "pyarrow\array.pxi", line 302, in pyarrow.lib.array
  File "pyarrow\array.pxi", line 83, in pyarrow.lib._ndarray_to_array
TypeError: an integer is required (got type str)
```
I tried on a Linux VM as well:

```
/gsc$ python3 search_console_bq.py
Traceback (most recent call last):
  File "search_console_bq.py", line 74, in <module>
    y = get_sc_df(p,"2021-10-15","2021-10-15",x)
  File "search_console_bq.py", line 64, in get_sc_df
    load_job = client.load_table_from_dataframe(result, table_ref, job_config=job_config)
  File "/home/james_tamba/.local/lib/python3.7/site-packages/google/cloud/bigquery/client.py", line 2654, in load_table_from_dataframe
    parquet_use_compliant_nested_type=True,
  File "/home/james_tamba/.local/lib/python3.7/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 586, in dataframe_to_parquet
    arrow_table = dataframe_to_arrow(dataframe, bq_schema)
  File "/home/james_tamba/.local/lib/python3.7/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 529, in dataframe_to_arrow
    bq_to_arrow_array(get_column_or_index(dataframe, bq_field.name), bq_field)
  File "/home/james_tamba/.local/lib/python3.7/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 290, in bq_to_arrow_array
    return pyarrow.Array.from_pandas(series, type=arrow_type)
  File "pyarrow/array.pxi", line 904, in pyarrow.lib.Array.from_pandas
  File "pyarrow/array.pxi", line 302, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
TypeError: an integer is required (got type str)
```
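For what it's worth, both tracebacks point at the same mismatch: the DataFrame columns are strings while the BigQuery schema (and pyarrow) expect numbers. A sketch of a fix, assuming the standard Search Console metric columns (`clicks`, `impressions`, `ctr`, `position`); the helper name is mine, not from the script:

```python
import pandas as pd

def coerce_sc_dtypes(df):
    """Cast Search Console metric columns to numeric dtypes so pyarrow
    can serialise the frame for BigQuery: clicks/impressions become
    int64, ctr/position become float64. Dimension columns stay strings.
    """
    out = df.copy()
    for col in ('clicks', 'impressions'):
        if col in out:
            out[col] = pd.to_numeric(out[col]).astype('int64')
    for col in ('ctr', 'position'):
        if col in out:
            out[col] = pd.to_numeric(out[col]).astype('float64')
    return out

# With newer google-cloud-bigquery versions, pass the table reference to
# the load call itself instead of setting job_config.destination:
#
#   job_config = bigquery.LoadJobConfig()
#   load_job = client.load_table_from_dataframe(
#       coerce_sc_dtypes(result), table_ref, job_config=job_config)
```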
Hey @benpowis,
over the last 3 years I have downloaded the data for a GSC property and never got more than 50,000 rows back.
I think it is not necessary to loop up to 100,000 rows.
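Most properties probably never hit the cap, though the Search Analytics API only returns up to 25,000 rows per request, so some paging is still useful as a safety net. A sketch of how the loop could look (function name and structure are mine, not the script's):

```python
def fetch_all_rows(service, site_url, request, page_size=25000):
    """Page through Search Analytics results `page_size` rows at a time
    (25,000 is the API's per-request maximum), stopping as soon as a
    short page comes back instead of looping to a fixed row count.
    """
    rows, start = [], 0
    while True:
        body = dict(request, rowLimit=page_size, startRow=start)
        resp = service.searchanalytics().query(
            siteUrl=site_url, body=body).execute()
        page = resp.get('rows', [])
        rows.extend(page)
        if len(page) < page_size:  # last (possibly empty) page
            return rows
        start += page_size
```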
Hi @benpowis,
I created a service account file, but the API won't grant me access to the property.
I followed the instructions in the Google Cloud documentation, but could there be something else to consider?
This is the error message I got:

```
googleapiclient.errors.HttpError: <HttpError 403 when requesting https://www.googleapis.com/webmasters/v3/sites/XXXPropertyXXX/searchAnalytics/query?alt=json returned "User does not have sufficient permission for site 'XXXPropertyXXX'. See also: https://support.google.com/webmasters/answer/2451999.">
```
The account used definitely has access to Search Console as a delegated owner.
Greetings
Daniel
I added timeframe calculation to the script for daily downloads:
```python
from datetime import datetime, date, time, timedelta

########### CALCULATE TIMEFRAME FOR REGULAR DOWNLOADS (OPTIONAL) #######
now = datetime.now()
date_calc_start = now - timedelta(days=4)
date_calc_end = now - timedelta(days=3)
timeframe_start = date_calc_start.strftime("%Y-%m-%d")
timeframe_end = date_calc_end.strftime("%Y-%m-%d")
########### CALCULATE TIMEFRAME DOWNLOADS END ###########################

########### SET YOUR PARAMETERS HERE ####################################
PROPERTIES = ["https://www.website.com"]
BQ_DATASET_NAME = 'sc_dataset'
BQ_TABLE_NAME = 'sc_dataset_table_1'
SERVICE_ACCOUNT_FILE = 'company-key-56843ec6463.json'
start_date = timeframe_start
end_date = timeframe_end
################ END OF PARAMETERS ######################################
```
Hi Ben,
First of all thanks for sharing this code!
I'm new to Python so you helped me a lot with this.
Now, what I am trying to achieve:
I have several (30+) properties of which I want to have the results in GBQ. It's no problem to get the results of all properties by using the "Properties" parameter you have included.
However, as you wrote in the README.md, the size of the table gets very big so partitioning is advised.
In the schema in GBQ I see that the "date" column is type = "string" which cannot be used for partitioning if I am correct.
Could you please advice on how to partition on the "date" column?
Thanks in advance for helping out!
Kr,
Merijn
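(Not the author, but one way to approach the partitioning question: BigQuery can only partition on a DATE/TIMESTAMP column, so the string `date` column needs to be converted before loading, and the load job needs a `time_partitioning` setting. A sketch, assuming the DataFrame has the script's `date` column; the helper name is illustrative:

```python
import pandas as pd

def with_date_column(df):
    """Convert the string 'date' column to real date objects so
    BigQuery loads it as DATE (partitionable) instead of STRING."""
    out = df.copy()
    out['date'] = pd.to_datetime(out['date']).dt.date
    return out

# Then ask BigQuery to day-partition the destination table on it:
#
#   from google.cloud import bigquery
#   job_config = bigquery.LoadJobConfig(
#       time_partitioning=bigquery.TimePartitioning(
#           type_=bigquery.TimePartitioningType.DAY, field='date'),
#   )
#   client.load_table_from_dataframe(
#       with_date_column(result), table_ref, job_config=job_config)
```

Note that partitioning has to be set when the table is first created; an existing non-partitioned table would need to be recreated (or copied into a new partitioned table) first.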
https://developers.google.com/search/blog/2020/12/search-console-api-updates#discovery-doc-migration
I suppose the following is what needs to be changed, at least? Tested with the updated service names, and working. Is there anything else that needs to be updated after support for the current library drops? BW
```python
service = build(
    'searchconsole',
    'v1',
    credentials=credentials
)
```
Hi, I think the search country could be added to the script. It won't noticeably affect table size if searches originate from one main country, but at the same time it will matter for international domains, no? bw
Hi, I'm getting the following AttributeError: "Property destination is unknown for <class 'google.cloud.bigquery.job.load.LoadJobConfig'>." Any advice? Thank you!