Comments (10)
Hi @bbernays
Magic! The sync completed successfully from the Linux machine, still with the very low batch_size, though.
I will run several additional syncs adjusting the concurrency and batch_size values to see if the original error is even reproducible in Linux.
I will report back with my findings.
Thank you
from cloudquery.
Hey all
I can confirm that moving to Linux resolved my problems, and I will proceed to close this issue.
For full context, the only change was switching to a machine with a different OS. The Ubuntu VM was co-located with the Windows VM in the same subnet, and both VMs are the same size.
Thank you all again for the support.
You guys rock!
Hi @rafaelrodrigues3092, in #12577 reducing the concurrency for the Azure plugin seems to have solved the issue. Do you mind trying it again? Maybe a value of 1000, like in #12577 (comment), could work.
Also, can you share which authentication method you're using? We don't recommend using az login in production environments, as that leads to memory issues; see https://hub.cloudquery.io/plugins/source/cloudquery/azure/v11.3.0/docs?search=az#overview-authentication-with-az-login
If you're using az login, you could try environment variables instead; see https://hub.cloudquery.io/plugins/source/cloudquery/azure/v11.3.0/docs?search=az#overview-authentication-with-environment-variables
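For reference, a minimal source spec reflecting that suggestion might look like the sketch below. The table list and version are illustrative placeholders, and the environment-variable authentication assumes a service principal exposed via the standard Azure SDK variables (AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET):

```yaml
kind: source
spec:
  name: "azure"
  path: "cloudquery/azure"
  version: "v11.3.0"                            # illustrative; match your installed version
  tables: ["azure_compute_virtual_machines"]    # illustrative table selection
  destinations: ["postgresql"]
  concurrency: 1000                             # reduced from the default to limit resource use
```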
Hi @erezrokah, thanks for following up.
I tested setting the concurrency first to 10000 and then back down to 1000, and I still see the same issue.
Regarding authentication: yes, I am already using the environment-variable approach.
I would like to note that the process fails less than 5 minutes into the execution, typically within 2.5-4 minutes.
Also, before it fails, I don't see any memory pressure, but I do see CPU pressure: CPU utilization goes up to 100% (from a baseline of <10%) while memory stays around ~50% (from a baseline of ~39%).
This is on a 4-core, 32 GB RAM Windows system.
This was the log for the execution with concurrency 1000:
{"level":"warn","module":"cli","time":"2024-01-13T15:00:29Z","message":"when using the CloudQuery registry, it's recommended to log in via `cloudquery login`. Logging in allows for better rate limits and downloading of premium plugins"}
{"level":"warn","module":"cli","client":"subscriptions/<AZURE_SUBSCRIPTION_ID>/resourceGroups/<AZURE_RESOURCE_GROUP>","message":"multiplex returned duplicate client","module":"azure-src","table":"azure_compute_capacity_reservation_groups","time":"2024-01-13T15:00:38Z"}
{"level":"warn","module":"cli","client":"subscriptions/<AZURE_SUBSCRIPTION_ID>/resourceGroups/<AZURE_RESOURCE_GROUP>","message":"multiplex returned duplicate client","module":"azure-src","table":"azure_network_virtual_network_gateways","time":"2024-01-13T15:00:38Z"}
{"level":"warn","module":"cli","client":"subscriptions/<AZURE_SUBSCRIPTION_ID>/resourceGroups/<AZURE_RESOURCE_GROUP>","message":"multiplex returned duplicate client","module":"azure-src","table":"azure_network_virtual_network_gateway_connections","time":"2024-01-13T15:00:38Z"}
{"level":"error","module":"cli","err":"write tcp <SOURCE_IP>:56218-><DESTINATION_PG_IP>:6432: wsasend: An existing connection was forcibly closed by the remote host.","message":"BatchClose","module":"pgx","pid":2165069110,"time":587.5166,"time":"2024-01-13T15:03:33Z"}
{"level":"error","module":"cli","duration":1509.1935,"error":"failed to execute batch: write tcp <SOURCE_IP>:56218-><DESTINATION_PG_IP>:6432: wsasend: An existing connection was forcibly closed by the remote host.","len":10000,"message":"failed to write batch","module":"pg-dest","time":"2024-01-13T15:03:33Z"}
{"level":"error","module":"cli","client":"subscriptions/41ae70bf-262c-4f13-85f9-dc46e7e4f48d","error":"context canceled","message":"pre resource resolver failed","module":"azure-src","table":"azure_keyvault_keyvault","time":"2024-01-13T15:03:33Z"}
{"level":"error","module":"cli","error":"failed to sync v3 source azure: write client returned error (insert): plugin returned error: failed to execute batch: write tcp <SOURCE_IP>:56218-><DESTINATION_PG_IP>:6432: wsasend: An existing connection was forcibly closed by the remote host.","time":"2024-01-13T15:03:34Z","message":"exiting with error"}
Thank you!
Hey @rafaelrodrigues3092, is the Postgres instance running on the same machine?
Hi @yevgenypats,
Thanks for the reply. It is not.
CloudQuery is running on an Azure Windows VM with the specs listed above.
Postgres is running on Azure Database for PostgreSQL (flexible server): https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/overview
Thanks
Hi @rafaelrodrigues3092, thanks for the reply. Are you able to sync any data at all, or does the sync fail right at the beginning?
If it fails at the beginning, can you verify the DB connection string with a tool like psql? If it fails later, can you try modifying the batch settings? See https://hub.cloudquery.io/plugins/destination/cloudquery/postgresql/v7.1.5/docs?search=post#overview-postgresql-spec (batch_size, batch_size_bytes and batch_timeout). Maybe try first with batch_size: 1 to debug.
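To illustrate that suggestion, a minimal destination spec for such a debugging run might look like this sketch (the connection-string variable and version are placeholders; batch_size: 1 sacrifices throughput to isolate the failing write):

```yaml
kind: destination
spec:
  name: "postgresql"
  path: "cloudquery/postgresql"
  version: "v7.1.5"
  spec:
    connection_string: "${PG_CONNECTION_STRING}"
    batch_size: 1          # write one record per batch while debugging
    batch_timeout: "30s"   # flush quickly so failures surface early
```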
Hi @erezrokah,
Thank you for the suggestions.
I ran a successful sync by lowering the batch_size to 1, then increased it successfully up to 8000 for a single Azure subscription.
When going above 8000 for a single subscription, or when adding all the Azure subscriptions back with batch_size < 8000, I started hitting the same errors again.
I am limited in what logs I can see on the PG side (since it's a managed service), but I do consistently see the message "incomplete message from client" every time the error occurs.
After several trial-and-error tweaks to the destination spec values, shown here:
kind: destination
spec:
  name: "postgresql"
  registry: "github"
  path: "cloudquery/postgresql"
  version: "v7.1.5"
  migrate_mode: "forced"
  write_mode: "overwrite-delete-stale"
  spec:
    connection_string: "${PG_CONNECTION_STRING}"
    pgx_log_level: "warn"
    batch_size: 1000
    batch_timeout: 120s
    batch_size_bytes: 10000000
The sync ran for ~18 minutes (I got data in the database) but then errored out with: failed to sync v3 source azure: write client returned error (insert): plugin returned error: failed to execute batch: remote error: tls: bad record MAC
This is a repeatable issue; I got the same message after 3 consecutive executions.
As an FYI, my connection string requires SSL: postgresql://<USER_NAME>:<PASSWORD>@<SERVER_NAME>:6432/<DB_NAME>?sslmode=require
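As a side note, one quick way to sanity-check the components of such a connection string (host, port, sslmode) without actually connecting is to parse it; the credentials and host below are hypothetical placeholders:

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical placeholder connection string; substitute real values.
conn = "postgresql://myuser:mypass@myserver.example.com:6432/mydb?sslmode=require"

parts = urlsplit(conn)
params = parse_qs(parts.query)

print(parts.hostname)         # myserver.example.com
print(parts.port)             # 6432
print(params.get("sslmode"))  # ['require']
```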
Here's the log:
{"level":"error","module":"cli","err":"remote error: tls: bad record MAC","message":"BatchClose","module":"pgx","pid":3691910899,"time":596.6804,"time":"2024-01-16T15:46:01Z"}
{"level":"error","module":"cli","duration":794.06,"error":"failed to execute batch: remote error: tls: bad record MAC","len":1000,"message":"failed to write batch","module":"pg-dest","time":"2024-01-16T15:46:01Z"}
{"level":"error","module":"cli","client":"subscriptions/<AZURE_SUBSCRIPTION>","error":"context canceled","message":"table resolver finished with error","module":"azure-src","table":"azure_compute_virtual_machine_extensions","time":"2024-01-16T15:46:01Z"}
{"level":"error","module":"cli","client":"subscriptions/<AZURE_SUBSCRIPTION>","error":"context canceled","message":"table resolver finished with error","module":"azure-src","table":"azure_cosmos_locations","time":"2024-01-16T15:46:01Z"}
{"level":"error","module":"cli","grpc.code":"Internal","grpc.component":"server","grpc.error":"rpc error: code = Internal desc = failed to send message: rpc error: code = Unavailable desc = transport is closing","grpc.method":"Sync","grpc.method_type":"server_stream","grpc.service":"cloudquery.plugin.v3.Plugin","grpc.start_time":"2024-01-16T15:27:38Z","grpc.time_ms":"1.1029789e+06","message":"finished call","peer.address":"@","protocol":"grpc","time":"2024-01-16T15:46:01Z"}
{"level":"error","module":"cli","error":"failed to sync v3 source azure: write client returned error (insert): plugin returned error: failed to execute batch: remote error: tls: bad record MAC","time":"2024-01-16T15:46:01Z","message":"exiting with error"}
I am still going through trial and error to find the best values to tune the sync, but I was wondering whether you have any insight into the "bad record MAC" error, and/or any rules of thumb for an optimal combination of source concurrency and batch settings that could help me here.
Thank you for the continued support!
@rafaelrodrigues3092 Very interesting! Thank you for that detailed update! A few questions:
- Does the data flow through any sort of firewall, either an appliance on the network or locally on the machine?
- Are you able to try running this sync on a Linux machine, or at least under WSL2? I want to see whether the issue is with the Windows networking stack.
Hi @bbernays
Thanks for getting back to me.
- Networking is not my forte (I just dabble a bit), but from my tribal knowledge, yes, the data does flow through a network SD-WAN device. I will see if I can route the traffic directly to the database, as the PG database and the client server are on the same virtual network (just different subnets).
- Yes, I will run the sync from a Linux machine and report back.
Thank you again