
cloud-for-marketing's Introduction

Gathers various non-official Google products around the theme of running marketing workloads using Google Cloud Platform's products.

There are two main categories:

  • Marketing Analytics: mostly aimed at marketing departments.
  • Marketing Technology: mostly aimed at the marketing technology vertical.

cloud-for-marketing's People

Contributors

andgandolfi, copybara-github, dependabot[bot], dulacp, lushu, m-mayran, margaretzou, miguelfc, mohabfekry, nooage, taiconley


cloud-for-marketing's Issues

Channel credentials must be a ChannelCredentials object

After upgrading to the latest Tentacles version, I'm getting the following errors when sending data to the Google Ads API.

I reinstalled Tentacles on Friday, Sep 29.

I can't downgrade Tentacles because I was getting some Google Ads API deprecation errors (v.12 deprecated) - that's why I reinstalled Tentacles.

I am not sure yet whether the data is coming through to Google Ads; I need some more time to test, but it's likely that it is not.

tentacles_api Cloud Function fails with the error below.

Error: Process exited with code 16
    at process.<anonymous> (/layers/google.nodejs.functions-framework/functions-framework/node_modules/@google-cloud/functions-framework/build/src/invoker.js:92:22)
    at process.emit (events.js:400:28)
    at process.emit (domain.js:475:12)
    at process.exit (internal/process/per_thread.js:179:15)
    at sendCrashResponse (/layers/google.nodejs.functions-framework/functions-framework/node_modules/@google-cloud/functions-framework/build/src/logger.js:44:9)
    at process.<anonymous> (/layers/google.nodejs.functions-framework/functions-framework/node_modules/@google-cloud/functions-framework/build/src/invoker.js:88:44)
    at process.emit (events.js:400:28)
    at process.emit (domain.js:475:12)
    at processPromiseRejections (internal/process/promises.js:245:33)
    at processTicksAndRejections (internal/process/task_queues.js:96:32)

TypeError: Channel credentials must be a ChannelCredentials object
    at new ChannelImplementation (/workspace/node_modules/google-ads-node/node_modules/@grpc/grpc-js/build/src/channel.js:28:19)
    at new Client (/workspace/node_modules/google-ads-node/node_modules/@grpc/grpc-js/build/src/client.js:65:36)
    at new ServiceClientImpl (/workspace/node_modules/google-ads-node/node_modules/@grpc/grpc-js/build/src/make-client.js:58:5)
    at GrpcClient.createStub (/workspace/node_modules/google-ads-node/node_modules/google-gax/build/src/grpc.js:334:22)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)

Please let me know if you have any questions or need any more information.

Has anyone else experienced the same error? Any insight into what the root cause might be?

Update Node.js 8 code base to 10

I just received a notification on one of my Google Cloud projects that the Node.js 8 code base should be updated to 10, as v8 won't be supported as of March 15, 2021.

I have successfully deployed an instance of the Tentacles project without actually knowing any Node.js; I'm a Python guy :)

How do I update to v10? Or could you update this project?

Issue deploying config to Cloud Datastore

A client has already set up Datastore rather than Firestore in their project (for something unrelated to Tentacles). Although the database is empty, it can't be upgraded because, according to the message shown, 'data has been created and deleted'.

When running the './deploy.sh update_api_config' Tentacles command, the following message appears:

'In detecting Firestore mode: 9 FAILED_PRECONDITION: The Cloud Firestore API is not available for Cloud Datastore projects.' (see screenshot for more).

Is there a way around this? What should the process be? Can Tentacles use Cloud Datastore, and can the config be created manually?

Many thanks

Unrecognized arguments during deployment of Cloud Function 3

Sometimes the deployment script fails when it tries to create Cloud Function 3. The error seems to indicate there's a blank argument.

For additional context, I am only setting up Measurement Protocol and using europe-west2. I've previously created an MP and Data Import setup and it worked fine.

(Screenshot of the deployment error attached.)

GoogleAdsError: Version v10 is deprecated. Requests to this version will be blocked

We have a daily job which sends data to Google Ads.

tentacles_api function started throwing this error on March 1, 2023:

[API.ADS] Error in upload conversions batch: {batch-number} GoogleAdsFailure {
  errors: [
    GoogleAdsError {
      error_code: [ErrorCode],
      message: 'Version v10 is deprecated. Requests to this version will be blocked.

Can you please fix the error? Would we then need to reinstall Tentacles?

tentacles_api Cloud Function crashes and times out while sending data to Google Ads

Background

We use Tentacles to send data to Google Ads API.

We last reinstalled Tentacles on Feb 12, 2024, which is fairly recent.

Issue

Starting on March 15, 2024, tentacles_api Cloud Function started timing out and crashing with the below error.

[T.MAIN] Error in API[ACLC], config[test_NC_pltv_2_2]:  Error: Task[KjTtfB4HIzfCPqPjyLcG] has been processed 4 times.
    at TentaclesTaskOnFirestore.start (/workspace/src/tentacles_task/tentacles_task_firestore.js:182:34)
    at async sendApiData (/workspace/src/tentacles.js:461:30)
Function execution took 539999 ms, finished with status: 'timeout'

The function appears to be restarting and different instances of it run for 40 minutes.

Before these issues started, the function would usually run for under 5 minutes, do all the work, and succeed.

When I run this query:

SELECT
  DATETIME(TIMESTAMP(createdAt), "US/Mountain") createdAtMST,
  *,
  SAFE_DIVIDE(numberOfFailed, numberOfLines) error_percent
FROM
  `adswerve-355706.tentacles.TentaclesTask`
WHERE
  DATE(TIMESTAMP(createdAt), "US/Mountain") >= "2024-03-01"
ORDER BY
  createdAt DESC

I see that starting from March 15, there are NULLs in numberOfLines and numberOfFailed. Error says: Error: Task[KjTtfB4HIzfCPqPjyLcG] has been processed 4 times.

When examining data in the Google Ads UI, it appears that some data is coming through, but I am almost certain it's only part of it, not all.

Questions

Is it a known issue? Can you replicate it on your end? Can it be fixed?

Steps to replicate the issue

  • Try sending some test data to the Google Ads API endpoint.
  • Examine the Cloud Function logs.
  • Run the query for the TentaclesTask table (see above; a programmatic version is sketched below).
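
For reference, the same check can be scripted; this is only a minimal sketch, assuming the google-cloud-bigquery client library and the project/dataset names used in the query above:

from google.cloud import bigquery

# Minimal sketch: list the affected TentaclesTask rows programmatically.
# Assumes the google-cloud-bigquery library and the same project/dataset
# names as in the query above.
client = bigquery.Client(project="adswerve-355706")

sql = """
SELECT createdAt, numberOfLines, numberOfFailed
FROM `adswerve-355706.tentacles.TentaclesTask`
WHERE DATE(TIMESTAMP(createdAt), "US/Mountain") >= "2024-03-01"
  AND numberOfLines IS NULL  -- affected tasks show NULL counters
ORDER BY createdAt DESC
"""

for row in client.query(sql).result():
    print(row.createdAt, row.numberOfLines, row.numberOfFailed)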

Tentacles Not Finding GCS File

We have been having an issue executing a BQ -> GA import with Tentacles. The problem is that the tentacles-init cloud function can't find the .csv file containing our data in the expected location (see the attached image).

  1. The export to the specified outbound folder works and it triggers the cloud function
  2. The file is deleted from the “outbound” folder and placed in “processed/outbound”
  3. But then the “init” cloud function can’t find the file using the full path “processed/outbound/filename.csv” in outbound.
  4. The configuration specifies the folder “outbound” as the outbound folder.
  5. This results in a failed data import in GA saying that it can't find the file.

Does anyone have a solution?


Why Google Cloud Dataflow or Apache Beam

I love the idea of Future Customer Value Segments. However, from an implementation perspective, Dataflow or Apache Beam looks like overkill as a batch prediction solution. Are there reasons why a custom Python solution (e.g., scikit-learn) or BigQuery ML would not be a better fit?

Google Ads Customer Match upload issue - tentacles

I've been receiving an error across multiple tests (new installation, existing installation, different Google Ads accounts) in the final step (tentacles_api) of hashed email upload to Google Ads.

The input file is a copy of the "Sample Data file content" from the documentation: a single-line JSONL file with a hashed_email field.

[API.ADS] Error in Customer Match upload batch[3646537312283590-0-0] TypeError: Cannot read property 'replace' of undefined
    at GoogleAds.uploadUserDataToUserList (/workspace/node_modules/@google-cloud/nodejs-common/src/apis/google_ads.js:531:67)
    at /workspace/node_modules/@google-cloud/nodejs-common/src/apis/google_ads.js:499:37
    at /workspace/node_modules/@google-cloud/nodejs-common/src/components/utils.js:278:25
    at Array.map (<anonymous>)
    at sendSingleRound (/workspace/node_modules/@google-cloud/nodejs-common/src/components/utils.js:277:36)
    at reduceFn (/workspace/node_modules/@google-cloud/nodejs-common/src/components/utils.js:342:17)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at async /workspace/node_modules/@google-cloud/nodejs-common/src/components/utils.js:348:24
    at async sendApiData (/workspace/src/tentacles.js:411:20)
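
For context, the single input line was produced along these lines (a minimal sketch; the hashed_email field name follows the sample file, and the normalization is the standard trim + lowercase + SHA-256 that Customer Match expects):

import hashlib
import json

def hashed_email_line(email):
    # Normalize the address (trim, lowercase), SHA-256 hash it,
    # and emit one JSONL line with a hashed_email field.
    normalized = email.strip().lower()
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return json.dumps({"hashed_email": digest})

print(hashed_email_line("Test.User@example.com"))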

FoCVS - Apache Beam failing to run with InteractiveRunner (instead of DirectRunner / DataflowRunner)

To better understand what is going on in the Future-Customer-Value-Segments (FoCVS) solution and to do additional exploration on the data in intermediate steps, I'm trying to run fcvs_pipeline_csv.py in a Jupyter Notebook step by step with InteractiveRunner. Unfortunately I've been running into some issues, which don't appear when the runner is set to DirectRunner.

Notebook showing attempt so far

.ipynb hosted on Colab (originally GCP, see below):
https://colab.research.google.com/drive/1sbKkh2goXQU1NhYOQ8MWH_vT1GTvqVgw?usp=sharing

InteractiveRunner error

To remove the with statement, so the pipeline can be executed step by step, I replaced this line:

with beam.Pipeline(options=options) as pipeline:

with:

# --runner=InteractiveRunner
# ...

pipeline = beam.Pipeline(options=options)

it goes well until this step in the pipeline:

limits_dates = (
    min_max_dates
    | beam.FlatMap(c.limit_dates_boundaries, pvalue.AsSingleton(options))
)

which will throw the following error:

ValueError: No producer for ref_PCollection_PCollection_25
Full error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-20-5ba7514f8494> in <module>
      4 )
      5 
----> 6 ib.show(limits_dates)

~/apache-beam-2.31.0/lib/python3.7/site-packages/apache_beam/runners/interactive/utils.py in run_within_progress_indicator(*args, **kwargs)
    243   def run_within_progress_indicator(*args, **kwargs):
    244     with ProgressIndicator('Processing...', 'Done.'):
--> 245       return func(*args, **kwargs)
    246 
    247   return run_within_progress_indicator

~/apache-beam-2.31.0/lib/python3.7/site-packages/apache_beam/runners/interactive/interactive_beam.py in show(include_window_info, visualize_data, n, duration, *pcolls)
    473   recording_manager = ie.current_env().get_recording_manager(
    474       user_pipeline, create_if_absent=True)
--> 475   recording = recording_manager.record(pcolls, max_n=n, max_duration=duration)
    476 
    477   # Catch a KeyboardInterrupt to gracefully cancel the recording and

~/apache-beam-2.31.0/lib/python3.7/site-packages/apache_beam/runners/interactive/recording_manager.py in record(self, pcolls, max_n, max_duration)
    453           category=DeprecationWarning)
    454       pf.PipelineFragment(list(uncomputed_pcolls),
--> 455                           self.user_pipeline.options).run()
    456       result = ie.current_env().pipeline_result(self.user_pipeline)
    457     else:

~/apache-beam-2.31.0/lib/python3.7/site-packages/apache_beam/runners/interactive/pipeline_fragment.py in run(self, display_pipeline_graph, use_cache, blocking)
    111       self._runner_pipeline.runner._force_compute = not use_cache
    112       self._runner_pipeline.runner._blocking = blocking
--> 113       return self.deduce_fragment().run()
    114     finally:
    115       self._runner_pipeline.runner._skip_display = preserved_skip_display

~/apache-beam-2.31.0/lib/python3.7/site-packages/apache_beam/pipeline.py in run(self, test_runner_api)
    562         finally:
    563           shutil.rmtree(tmpdir)
--> 564       return self.runner.run_pipeline(self, self._options)
    565     finally:
    566       shutil.rmtree(self.local_tempdir, ignore_errors=True)

~/apache-beam-2.31.0/lib/python3.7/site-packages/apache_beam/runners/interactive/interactive_runner.py in run_pipeline(self, pipeline, options)
    166 
    167     pipeline_to_execute = beam.pipeline.Pipeline.from_runner_api(
--> 168         pipeline_instrument.instrumented_pipeline_proto(),
    169         self._underlying_runner,
    170         options)

~/apache-beam-2.31.0/lib/python3.7/site-packages/apache_beam/runners/interactive/pipeline_instrument.py in instrumented_pipeline_proto(self)
    124       # instrumented pipeline run cares.
    125       return pf.PipelineFragment(
--> 126           list(targets)).deduce_fragment().to_runner_api()
    127     return self._pipeline.to_runner_api()
    128 

~/apache-beam-2.31.0/lib/python3.7/site-packages/apache_beam/runners/interactive/pipeline_fragment.py in deduce_fragment(self)
     98         self._runner_pipeline.to_runner_api(),
     99         self._runner_pipeline.runner,
--> 100         self._options)
    101     ie.current_env().add_derived_pipeline(self._runner_pipeline, fragment)
    102     return fragment

~/apache-beam-2.31.0/lib/python3.7/site-packages/apache_beam/pipeline.py in from_runner_api(proto, runner, options, return_context)
    932       pcollection.pipeline = p
    933       if not pcollection.producer:
--> 934         raise ValueError('No producer for %s' % id)
    935 
    936     # Inject PBegin input where necessary.

ValueError: No producer for ref_PCollection_PCollection_25
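
For reference, this is the standard interactive-Beam pattern I am trying to follow (a minimal, self-contained sketch that is unrelated to the FoCVS code itself):

import apache_beam as beam
import apache_beam.runners.interactive.interactive_beam as ib
from apache_beam.runners.interactive.interactive_runner import InteractiveRunner

# Toy pipeline built without a `with` block and run interactively.
pipeline = beam.Pipeline(InteractiveRunner())
doubled = (
    pipeline
    | beam.Create([1, 2, 3])
    | beam.Map(lambda x: x * 2)
)
ib.show(doubled)  # materializes and displays the PCollection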

Questions

  1. Would it be possible to fix this error (and probably more after it), so that the pipeline can be fully run in a Jupyter Notebook with InteractiveRunner?
  2. How did you develop this pipeline? The DAG generated from it has many connected parts; it is hard to imagine it was written in one go rather than by interactively testing each part.
  3. If question 1 is not possible, how would you suggest learning about the internals of this solution and experimenting with the intermediate data transforms?

(FoCVS pipeline graph attached.)

Execution environment

  • GCP Dataflow Notebook with Kernel: apache-beam-2.31.0 (last updated 2021.08.30)
  • Python 3.7

Google Ads Customer Match upload issue - tentacles - login customer id error

We are trying to use the Customer Match upload for Google Ads, but we always receive the following error:

[API.ADS] Error in Customer Match upload batch[4089054709043335-0-0] GoogleAdsFailure {
  errors: [
    GoogleAdsError {
      error_code: [ErrorCode],
      message: "The login customer id header 'Optional[XXXXXXXXXXXXX]' could not be validated."
    }
  ],
  request_id: '0FJKDYRnIqxsHIIHIM2zXg'
}

Can you explain the format of login_customer_id?
How can I obtain the right value to use?
Is this something we can find in the Google Ads admin web UI?

Can tentacles_init cloud function be delayed?

My input file to the 'outbound' Cloud Storage bucket is being generated by Dataprep/Dataflow. What I find happening is that the cloud function is too eager. Dataflow/Dataprep starts to write the output file, and the cloud function executes before the file has been completely written, resulting in invalid or partial data being sent to GA.

Would it be possible to add a mechanism to delay the cloud function execution? Maybe a parameter in the filename to tell the init function to pause for X seconds once a new file is detected, or even a way to change how the Cloud Storage trigger of the Cloud Function behaves.
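
To illustrate the filename idea: the actual Tentacles functions are written in Node.js, but a minimal Python sketch of the proposed behaviour (with a made-up delay_<seconds> token in the file name) could look like this:

import re
import time

def maybe_delay_for(filename):
    # Hypothetical convention: a "delay_<seconds>" token in the file name
    # (e.g. "ga_upload_delay_60.csv") tells the init step to wait before
    # processing, so a file that is still being written can finish first.
    match = re.search(r"delay_(\d+)", filename)
    if match:
        time.sleep(int(match.group(1)))

maybe_delay_for("ga_upload_delay_60.csv")  # waits 60 seconds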
