
Comments (10)

rafaelrodrigues3092 avatar rafaelrodrigues3092 commented on September 26, 2024 2

Hi @bbernays

Magic! The sync completed successfully from the Linux machine, still with the very low batch_size though.
I will run several additional syncs adjusting the concurrency and batch_size values to see if the original error is even reproducible in Linux.

I will report back with my findings.

Thank you

from cloudquery.

rafaelrodrigues3092 avatar rafaelrodrigues3092 commented on September 26, 2024 1

Hey all

I can confirm that moving to Linux resolved my problems, and I will proceed to close this issue.
For full context, the only change was moving to a machine running a different OS.
The Ubuntu VM was colocated with the Windows VM in the same subnet, and both VMs are the same size.

Thank you all again for the support.
You guys rock!

erezrokah avatar erezrokah commented on September 26, 2024

Hi @rafaelrodrigues3092, in #12577 reducing the concurrency for the Azure plugin seems to have solved the issue. Do you mind trying again? Maybe a value of 1000, as in #12577 (comment), could work.
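For reference, a minimal sketch of where the concurrency setting lives in a source spec; the table list, destination name, and version here are illustrative placeholders, not taken from your config:

```yaml
kind: source
spec:
  name: "azure"
  path: "cloudquery/azure"
  registry: "cloudquery"
  version: "v11.3.0"
  tables: ["*"]
  destinations: ["postgresql"]
  spec:
    concurrency: 1000  # lowered to limit parallel API requests
```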

Also, can you share which authentication method you're using? We don't recommend using az login in production environments, as that leads to memory issues; see https://hub.cloudquery.io/plugins/source/cloudquery/azure/v11.3.0/docs?search=az#overview-authentication-with-az-login

If you're using az login, you could try environment variables instead; see https://hub.cloudquery.io/plugins/source/cloudquery/azure/v11.3.0/docs?search=az#overview-authentication-with-environment-variables
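As a sketch, environment-variable authentication for a service principal typically looks like this; the variable names are the standard Azure SDK ones, and the values below are placeholders:

```shell
# Standard Azure SDK environment variables for service-principal
# authentication; the values below are placeholders.
export AZURE_TENANT_ID="00000000-0000-0000-0000-000000000000"
export AZURE_CLIENT_ID="11111111-1111-1111-1111-111111111111"
export AZURE_CLIENT_SECRET="<client-secret>"
```

Set these in the environment of the process that runs the sync (e.g. the service or scheduled task), not just an interactive shell.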

rafaelrodrigues3092 avatar rafaelrodrigues3092 commented on September 26, 2024

Hi @erezrokah, thanks for following up.

I tested setting the concurrency to 10000 and then lowering it to 1000, and I still see the same issue.
Regarding authentication, yes, I am already using the environment variable approach.

I would like to note that the process fails less than 5 minutes into the execution, typically within 2.5-4 minutes.

Also, before it fails, I don't see any memory pressure, but I do see CPU pressure: CPU utilization goes up to 100% (from a baseline of <10%) while memory sits around ~50% (from a baseline of ~39%).
This is on a 4-core, 32 GB RAM Windows system.

This was the log for execution with concurrency 1000:

{"level":"warn","module":"cli","time":"2024-01-13T15:00:29Z","message":"when using the CloudQuery registry, it's recommended to log in via `cloudquery login`. Logging in allows for better rate limits and downloading of premium plugins"}
{"level":"warn","module":"cli","client":"subscriptions/<AZURE_SUBSCRIPTION_ID>/resourceGroups/<AZURE_RESOURCE_GROUP>","message":"multiplex returned duplicate client","module":"azure-src","table":"azure_compute_capacity_reservation_groups","time":"2024-01-13T15:00:38Z"}
{"level":"warn","module":"cli","client":"subscriptions/<AZURE_SUBSCRIPTION_ID>/resourceGroups/<AZURE_RESOURCE_GROUP>","message":"multiplex returned duplicate client","module":"azure-src","table":"azure_network_virtual_network_gateways","time":"2024-01-13T15:00:38Z"}
{"level":"warn","module":"cli","client":"subscriptions/<AZURE_SUBSCRIPTION_ID>/resourceGroups/<AZURE_RESOURCE_GROUP>","message":"multiplex returned duplicate client","module":"azure-src","table":"azure_network_virtual_network_gateway_connections","time":"2024-01-13T15:00:38Z"}
{"level":"error","module":"cli","err":"write tcp <SOURCE_IP>:56218-><DESTINATION_PG_IP>:6432: wsasend: An existing connection was forcibly closed by the remote host.","message":"BatchClose","module":"pgx","pid":2165069110,"time":587.5166,"time":"2024-01-13T15:03:33Z"}
{"level":"error","module":"cli","duration":1509.1935,"error":"failed to execute batch: write tcp <SOURCE_IP>:56218-><DESTINATION_PG_IP>:6432: wsasend: An existing connection was forcibly closed by the remote host.","len":10000,"message":"failed to write batch","module":"pg-dest","time":"2024-01-13T15:03:33Z"}
{"level":"error","module":"cli","client":"subscriptions/41ae70bf-262c-4f13-85f9-dc46e7e4f48d","error":"context canceled","message":"pre resource resolver failed","module":"azure-src","table":"azure_keyvault_keyvault","time":"2024-01-13T15:03:33Z"}
{"level":"error","module":"cli","error":"failed to sync v3 source azure: write client returned error (insert): plugin returned error: failed to execute batch: write tcp <SOURCE_IP>:56218-><DESTINATION_PG_IP>:6432: wsasend: An existing connection was forcibly closed by the remote host.","time":"2024-01-13T15:03:34Z","message":"exiting with error"}

Thank you!

yevgenypats avatar yevgenypats commented on September 26, 2024

Hey @rafaelrodrigues3092, is Postgres running on the same machine?

rafaelrodrigues3092 avatar rafaelrodrigues3092 commented on September 26, 2024

Hi @yevgenypats,

Thanks for the reply. It is not.
CloudQuery is running from an Azure Windows VM with the specs listed above.
Postgres is running on Azure Database for PostgreSQL (flexible server): https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/overview

Thanks

erezrokah avatar erezrokah commented on September 26, 2024

Hi @rafaelrodrigues3092, thanks for the reply. Are you able to sync data at all, or does the sync fail at the beginning?

If it fails at the beginning, can you verify the DB connection string with a tool like psql? If not, can you try modifying the batch settings? See https://hub.cloudquery.io/plugins/destination/cloudquery/postgresql/v7.1.5/docs?search=post#overview-postgresql-spec (batch_size, batch_size_bytes and batch_timeout). Maybe try batch_size: 1 first to debug.
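For example, a quick way to sanity-check the connection string outside of CloudQuery; the host, credentials, and database name here are placeholders:

```shell
# Assemble the connection string with placeholder values; passing the
# same string to psql verifies reachability and TLS outside of CloudQuery.
PG_CONN="postgresql://user:password@host:6432/dbname?sslmode=require"
echo "$PG_CONN"
# psql "$PG_CONN" -c "SELECT 1;"   # run this against the real server
```

If `SELECT 1` succeeds, the credentials, network path, and TLS handshake are all working, which narrows the problem to the sync itself.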

rafaelrodrigues3092 avatar rafaelrodrigues3092 commented on September 26, 2024

Hi @erezrokah

Thank you for the suggestions.
I ran a successful sync by lowering the batch_size to 1, then increased it successfully up to 8000 for a single Azure subscription.
When going above 8000 for a single subscription, or when adding all the Azure subscriptions back with batch_size < 8000, I started re-encountering the same errors.

I am limited in what logs I can see from the PG side (since it's a managed service), but I do see the message "incomplete message from client" consistently appearing every time the error occurs.

After making several tweaks through trial and error, my destination spec values are as shown here:

kind: destination
spec:
  name: "postgresql"
  registry: "github"
  path: "cloudquery/postgresql"
  version: "v7.1.5"
  migrate_mode: "forced"
  write_mode: "overwrite-delete-stale"
  spec:
    connection_string: "${PG_CONNECTION_STRING}"
    pgx_log_level: "warn"
    batch_size: 1000
    batch_timeout: 120s
    batch_size_bytes: 10000000

The sync ran for ~18 minutes (I got data in the database) but then errored out with "failed to sync v3 source azure: write client returned error (insert): plugin returned error: failed to execute batch: remote error: tls: bad record MAC".
This is a repeatable issue: I got the same message after 3 consecutive executions.

As an FYI, my connection string requires SSL: postgresql://<USER_NAME>:<PASSWORD>@<SERVER_NAME>:6432/<DB_NAME>?sslmode=require

Here's the log:

{"level":"error","module":"cli","err":"remote error: tls: bad record MAC","message":"BatchClose","module":"pgx","pid":3691910899,"time":596.6804,"time":"2024-01-16T15:46:01Z"}
{"level":"error","module":"cli","duration":794.06,"error":"failed to execute batch: remote error: tls: bad record MAC","len":1000,"message":"failed to write batch","module":"pg-dest","time":"2024-01-16T15:46:01Z"}
{"level":"error","module":"cli","client":"subscriptions/<AZURE_SUBSCRIPTION>","error":"context canceled","message":"table resolver finished with error","module":"azure-src","table":"azure_compute_virtual_machine_extensions","time":"2024-01-16T15:46:01Z"}
{"level":"error","module":"cli","client":"subscriptions/<AZURE_SUBSCRIPTION>","error":"context canceled","message":"table resolver finished with error","module":"azure-src","table":"azure_cosmos_locations","time":"2024-01-16T15:46:01Z"}
{"level":"error","module":"cli","grpc.code":"Internal","grpc.component":"server","grpc.error":"rpc error: code = Internal desc = failed to send message: rpc error: code = Unavailable desc = transport is closing","grpc.method":"Sync","grpc.method_type":"server_stream","grpc.service":"cloudquery.plugin.v3.Plugin","grpc.start_time":"2024-01-16T15:27:38Z","grpc.time_ms":"1.1029789e+06","message":"finished call","peer.address":"@","protocol":"grpc","time":"2024-01-16T15:46:01Z"}
{"level":"error","module":"cli","error":"failed to sync v3 source azure: write client returned error (insert): plugin returned error: failed to execute batch: remote error: tls: bad record MAC","time":"2024-01-16T15:46:01Z","message":"exiting with error"}

I am still going through trial and error to find the best tuning values for the sync, but I was wondering if you have any insight into the "bad record MAC" error, or any rules of thumb for an optimal combination of source concurrency and destination batching that could help me here.

Thank you for the continued support!

bbernays avatar bbernays commented on September 26, 2024

@rafaelrodrigues3092 Very interesting! Thank you for that detailed update! A few questions:

  1. Does the data flow through any sort of firewall? Either an appliance on the network or locally on the machine?
  2. Are you able to try running this sync on a Linux machine, or at least under WSL2? I want to see if the issue is with the Windows networking stack.

rafaelrodrigues3092 avatar rafaelrodrigues3092 commented on September 26, 2024

Hi @bbernays
Thanks for getting back to me.

  1. Networking is not my forte, but to the best of my knowledge, yes, the data does flow through an SD-WAN network device. I will see if I can route the traffic directly to the database, since the PG database and the client server are on the same virtual network (just in different subnets).

  2. Yes, I will run the sync from a Linux machine and report back.

Thank you again
