
search-console-bq's People

Contributors

benpowis, dependabot[bot], hoenie-ams, nyzl, whatevaman


search-console-bq's Issues

User permissions problem despite following all instructed steps

Hi Ben,

I would like to first thank you for the work put into this. I am trying to test your code for streaming Search Console data into BigQuery.

I have so far followed all the steps:

  • Activated the 'Google Search Console API'
  • Created a service account and gave it 'Full' access in Search Console and the 'Data Owner' role on Google Cloud Platform
  • Added the parameters

I hit a permission problem at this snippet of the code:

```python
# Make the request to grab the data
response = service.searchanalytics().query(siteUrl=site_url, body=request).execute()

# Go down one level in the response object
res = response['rows']

# Create a DataFrame of the results
df = pd.DataFrame.from_dict(res)
```

I get the following error:

```
HttpError 403 when requesting https://www.googleapis.com/webmasters/v3/sites/https%3A%2F%2Fwww.xxx.com/searchAnalytics/query?alt=json returned "User does not have sufficient permission for site 'https://www.xxx.com'."
```

(I changed the real website to xxx.com above.)

What's going wrong?

Thanks in advance for your help.
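For context, a minimal sketch of how the `service` object above is typically built from a service-account key for this API (the key file name and scope here are assumptions, not the repo's exact code). A 403 like this usually means the service account's e-mail address has not been added as a user on the Search Console property itself, which is separate from any IAM roles on Google Cloud:

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

# Read-only scope for the Search Console (webmasters) API
SCOPES = ['https://www.googleapis.com/auth/webmasters.readonly']

# 'service-account-key.json' is a placeholder for the downloaded key file
credentials = service_account.Credentials.from_service_account_file(
    'service-account-key.json', scopes=SCOPES)
service = build('webmasters', 'v3', credentials=credentials)
```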

Cloud Function

Have you considered deploying this as a Cloud Function, or would it time out at the 9-minute limit?
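For what it's worth, a minimal sketch of what an HTTP-triggered entry point might look like; everything here except `get_sc_df`, whose call signature appears in the tracebacks below, is an assumption. Note that 1st-gen Cloud Functions cap execution at 540 seconds, so a large backfill might need to be split into smaller date ranges:

```python
def run_export(request):
    """Hypothetical HTTP entry point for Cloud Functions.

    PROPERTIES, start_date, end_date and get_sc_df refer to the names used
    in search_console_bq.py; wiring them in (e.g. refactoring the script
    into an importable module) is left as an assumption.
    """
    for site in PROPERTIES:
        get_sc_df(site, start_date, end_date, 0)  # 0: first offset, as in the script's own loop
    return 'done', 200
```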

Data validation and expanding dimensions - not an "issue" but a request for help

A couple of things:

  1. The reason I am here using your cool code is that we had issues with another transfer service that failed to bring in all the data but never reported that to us. What could we add to the code to have it report success, or any issues it hit (such as timeouts with the service)? We just want to know with certainty whether we received all the data or not.
  2. I'd like to get all dimensions, aggregated by date (each combination of dimensions for every date in the range of dates). I see:
    'dimensions': ["country","page","device","query",""], # uneditable to enforce a nice clean dataframe at the end!
    How would I go about modifying the code to get this? (A sketch follows below.)
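On point 2, a hedged sketch of what the Search Analytics request body could look like with date included, so each row carries its own date; the surrounding script would still need its DataFrame handling and BigQuery schema adjusted to the extra columns:

```python
request = {
    'startDate': '2021-01-01',
    'endDate': '2021-01-31',
    # 'date' makes the API return one row per day per dimension combination
    'dimensions': ['date', 'country', 'page', 'device', 'query'],
    'rowLimit': 25000,  # the API's maximum rows per request
}
response = service.searchanalytics().query(siteUrl=site_url, body=request).execute()
```

On point 1, one approach would be wrapping the `.execute()` call in a try/except that logs failures, and comparing the number of rows returned per request against `rowLimit` to detect truncated pages, though that is beyond this sketch.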

Error on job_config.destination

It's working now; the previous error occurred because the table schema expects FLOAT/INTEGER values but I was using STRING.

Thanks,
--- Updated ---
Hi @benpowis, I tried using the Python script, but I got an error on `job_config.destination`:

```
D:\Users\jimi_\Downloads\search-console-bq-master>py search_console_bq.py
Traceback (most recent call last):
  File "D:\Users\jimi_\Downloads\search-console-bq-master\search_console_bq.py", line 74, in <module>
    y = get_sc_df(p,"2021-10-15","2021-10-15",x)
  File "D:\Users\jimi_\Downloads\search-console-bq-master\search_console_bq.py", line 61, in get_sc_df
    job_config.destination = table_ref
  File "D:\Users\jimi_\AppData\Local\Programs\Python\Python39\lib\site-packages\google\cloud\bigquery\job\base.py", line 755, in __setattr__
    raise AttributeError(
AttributeError: Property destination is unknown for <class 'google.cloud.bigquery.job.load.LoadJobConfig'>.
```

I searched on Google and found this answer: https://stackoverflow.com/questions/67687141/property-destination-is-unknown-for-class-google-cloud-bigquery-job-load-loadj

I tried the suggested solution of removing the line `job_config.destination = table_ref`

and got another error:

```
/gsc$ python3 search_console_bq.py
Traceback (most recent call last):
  File "search_console_bq.py", line 74, in <module>
    y = get_sc_df(p,"2021-10-15","2021-10-15",x)
  File "search_console_bq.py", line 64, in get_sc_df
    load_job = client.load_table_from_dataframe(result, table_ref, job_config=job_config)
  File "/home/james_tamba/.local/lib/python3.7/site-packages/google/cloud/bigquery/client.py", line 2578, in load_table_from_dataframe
    raise ValueError("This method requires pyarrow to be installed")
ValueError: This method requires pyarrow to be installed
```

I installed pyarrow with `pip3 install pyarrow`, but now I get another error:

```
D:\Users\jimi_\Downloads\search-console-bq-master>py search_console_bq.py
Traceback (most recent call last):
  File "D:\Users\jimi_\Downloads\search-console-bq-master\search_console_bq.py", line 74, in <module>
    y = get_sc_df(p,"2021-10-15","2021-10-15",x)
  File "D:\Users\jimi_\Downloads\search-console-bq-master\search_console_bq.py", line 64, in get_sc_df
    load_job = client.load_table_from_dataframe(result, table_ref, job_config=job_config)
  File "D:\Users\jimi_\AppData\Local\Programs\Python\Python39\lib\site-packages\google\cloud\bigquery\client.py", line 2649, in load_table_from_dataframe
    _pandas_helpers.dataframe_to_parquet(
  File "D:\Users\jimi_\AppData\Local\Programs\Python\Python39\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py", line 586, in dataframe_to_parquet
    arrow_table = dataframe_to_arrow(dataframe, bq_schema)
  File "D:\Users\jimi_\AppData\Local\Programs\Python\Python39\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py", line 529, in dataframe_to_arrow
    bq_to_arrow_array(get_column_or_index(dataframe, bq_field.name), bq_field)
  File "D:\Users\jimi_\AppData\Local\Programs\Python\Python39\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py", line 290, in bq_to_arrow_array
    return pyarrow.Array.from_pandas(series, type=arrow_type)
  File "pyarrow\array.pxi", line 904, in pyarrow.lib.Array.from_pandas
  File "pyarrow\array.pxi", line 302, in pyarrow.lib.array
  File "pyarrow\array.pxi", line 83, in pyarrow.lib._ndarray_to_array
TypeError: an integer is required (got type str)
```

I tried on a Linux VM as well:

```
/gsc$ python3 search_console_bq.py
Traceback (most recent call last):
  File "search_console_bq.py", line 74, in <module>
    y = get_sc_df(p,"2021-10-15","2021-10-15",x)
  File "search_console_bq.py", line 64, in get_sc_df
    load_job = client.load_table_from_dataframe(result, table_ref, job_config=job_config)
  File "/home/james_tamba/.local/lib/python3.7/site-packages/google/cloud/bigquery/client.py", line 2654, in load_table_from_dataframe
    parquet_use_compliant_nested_type=True,
  File "/home/james_tamba/.local/lib/python3.7/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 586, in dataframe_to_parquet
    arrow_table = dataframe_to_arrow(dataframe, bq_schema)
  File "/home/james_tamba/.local/lib/python3.7/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 529, in dataframe_to_arrow
    bq_to_arrow_array(get_column_or_index(dataframe, bq_field.name), bq_field)
  File "/home/james_tamba/.local/lib/python3.7/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 290, in bq_to_arrow_array
    return pyarrow.Array.from_pandas(series, type=arrow_type)
  File "pyarrow/array.pxi", line 904, in pyarrow.lib.Array.from_pandas
  File "pyarrow/array.pxi", line 302, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
TypeError: an integer is required (got type str)
```
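For reference, a hedged sketch of one way around both errors: recent google-cloud-bigquery releases do not accept a `destination` property on `LoadJobConfig` (the destination table is passed to `load_table_from_dataframe` directly), and the final `TypeError` suggests the DataFrame holds strings in a column the table schema declares as INTEGER/FLOAT. Table and column names below are placeholders:

```python
import pandas as pd
from google.cloud import bigquery

# Stand-in for the Search Console results frame (values are illustrative)
df = pd.DataFrame({'query': ['example'], 'clicks': ['3'], 'impressions': ['10']})

# Coerce metric columns to numbers so pyarrow can match a numeric BigQuery schema
for col in ('clicks', 'impressions'):
    df[col] = pd.to_numeric(df[col])

client = bigquery.Client()
table_id = 'my-project.sc_dataset.sc_dataset_table_1'  # placeholder table ID

# The destination goes to load_table_from_dataframe, not to LoadJobConfig
job_config = bigquery.LoadJobConfig(write_disposition='WRITE_APPEND')
client.load_table_from_dataframe(df, table_id, job_config=job_config).result()
```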

Service Account File

Hi @benpowis,

I created a service account file, but the API won't grant me access to the property.
I followed the instructions in the Google Cloud documentation, but could there be something else to consider?

This is the error message I got:
```
googleapiclient.errors.HttpError: <HttpError 403 when requesting https://www.googleapis.com/webmasters/v3/sites/XXXPropertyXXX/searchAnalytics/query?alt=json returned "User does not have sufficient permission for site 'XXXPropertyXXX'. See also: https://support.google.com/webmasters/answer/2451999.">
```

The account used definitely has access to Search Console as a delegated owner.

Greetings
Daniel
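One hedged way to check what the authenticated account can actually see is to list its sites (a sketch, assuming the `service` object built in the script):

```python
# List the properties visible to the authenticated service account;
# if the target property is missing here, the delegation did not take effect.
site_list = service.sites().list().execute()
for entry in site_list.get('siteEntry', []):
    print(entry['siteUrl'], entry['permissionLevel'])
```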

Optional timeframe config for regular downloads

I added a timeframe calculation to the script for daily downloads:

```python
from datetime import datetime, timedelta

########### CALCULATE TIMEFRAME FOR REGULAR DOWNLOADS (OPTIONAL) #######
now = datetime.now()
date_calc_start = now - timedelta(days=4)
date_calc_end = now - timedelta(days=3)
timeframe_start = date_calc_start.strftime("%Y-%m-%d")
timeframe_end = date_calc_end.strftime("%Y-%m-%d")
########### CALCULATE TIMEFRAME DOWNLOADS END ###########################

########### SET YOUR PARAMETERS HERE ####################################
PROPERTIES = ["https://www.website.com"]
BQ_DATASET_NAME = 'sc_dataset'
BQ_TABLE_NAME = 'sc_dataset_table_1'
SERVICE_ACCOUNT_FILE = 'company-key-56843ec6463.json'
start_date = timeframe_start
end_date = timeframe_end
################ END OF PARAMETERS ######################################
```

Partitioned table

Hi Ben,

First of all thanks for sharing this code!
I'm new to Python so you helped me a lot with this.

Now, what I am trying to achieve:
I have several (30+) properties of which I want to have the results in GBQ. It's no problem to get the results of all properties by using the "Properties" parameter you have included.
However, as you wrote in the README.md, the table gets very big, so partitioning is advised.
In the GBQ schema I see that the "date" column has type STRING, which cannot be used for partitioning, if I'm correct.

Could you please advise on how to partition on the "date" column? (See the sketch below.)

Thanks in advance for helping out!

Kr,

Merijn
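A hedged sketch of one way to do this: cast the STRING `date` column to real dates so BigQuery infers a DATE field, then load into a table configured for daily time partitioning on that column. Project, dataset, and table names are placeholders, and the sample frame stands in for the script's results:

```python
import pandas as pd
from google.cloud import bigquery

# Stand-in for the Search Console results frame with a string date column
df = pd.DataFrame({'date': ['2021-10-15'], 'clicks': [3]})

# Cast the STRING column to datetime.date so BigQuery infers a DATE field
df['date'] = pd.to_datetime(df['date']).dt.date

client = bigquery.Client()
table_id = 'my-project.sc_dataset.sc_partitioned'  # placeholder

job_config = bigquery.LoadJobConfig(
    write_disposition='WRITE_APPEND',
    time_partitioning=bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field='date',  # partition on the date column instead of ingestion time
    ),
)
client.load_table_from_dataframe(df, table_id, job_config=job_config).result()
```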

Country missing in table schema

Hi, I think the search country could be added to the script: it won't noticeably affect table size when searches originate from one main country, but it will matter for international domains, no? Best wishes.
