Comments (9)
Thanks for catching the lack of caching on _get_redshift_constraints. Sounds like an easy optimization.
When running something like Alembic, which does a lot of reflection, this can slow things down considerably, because the dialect ends up querying information about the entire database every time Alembic asks for anything.
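A minimal, self-contained sketch of the problem described above. It assumes a simplified dialect in which every per-table reflection call delegates to an uncached whole-database query. `FakeDialect`, `get_constraints`, and `query_count` are hypothetical names for illustration, not the sqlalchemy-redshift API.

```python
class FakeDialect:
    """Hypothetical stand-in for a dialect; not the real sqlalchemy-redshift class."""

    def __init__(self, tables):
        self.tables = tables
        self.query_count = 0  # how many whole-database queries were issued

    def _get_all_constraint_info(self):
        # Simulates one expensive query that returns constraints for every table.
        self.query_count += 1
        return {name: ["pk_" + name] for name in self.tables}

    def get_constraints(self, table_name):
        # Uncached: each per-table call re-runs the whole-database query.
        return self._get_all_constraint_info()[table_name]


tables = ["users", "orders", "events"]
dialect = FakeDialect(tables)
for name in tables:
    dialect.get_constraints(name)
print(dialect.query_count)  # 3: one full query per table reflected
```

With N tables, a tool like Alembic that reflects each table in turn triggers N full-database queries, even though a single one would have been enough.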
I believe there's a fundamental constraint in SQLAlchemy's reflection API that allows caching only within the context of reflecting a single table. I originally wrote the reflection code in this dialect with bulk queries like _get_all_relation_info, hoping that we could reconstruct the entire database schema with just a few queries, but I don't believe this is possible.
I do hope that using cache_info for the constraints query will speed things up a bit, but reflection across many tables is likely going to continue to be slow.
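For context, SQLAlchemy's `Inspector` keeps an `info_cache` dict and passes it to dialect methods as the `info_cache` keyword argument, and methods decorated with `sqlalchemy.engine.reflection.cache` memoize their results in that dict. The sketch below is a simplified re-implementation of that idea for illustration only (not SQLAlchemy's actual source). It shows that cached results live as long as the dict that is passed in, and that a helper called without forwarding `**kw` gets no caching at all. `cache`, `Dialect`, and `calls` are hypothetical names.

```python
import functools


def cache(fn):
    """Simplified memoizer modeled loosely on sqlalchemy.engine.reflection.cache."""

    @functools.wraps(fn)
    def wrapper(self, connection, *args, **kw):
        info_cache = kw.get("info_cache")
        if info_cache is None:
            # No cache dict supplied: always run the underlying query.
            return fn(self, connection, *args, **kw)
        key = (
            fn.__name__,
            tuple(a for a in args if isinstance(a, str)),
            tuple((k, v) for k, v in kw.items() if k != "info_cache"),
        )
        if key not in info_cache:
            info_cache[key] = fn(self, connection, *args, **kw)
        return info_cache[key]

    return wrapper


class Dialect:
    def __init__(self):
        self.calls = 0  # counts simulated whole-database queries

    @cache
    def _get_all_constraint_info(self, connection, **kw):
        self.calls += 1
        return {"users": ["pk_users"], "orders": ["pk_orders"]}

    def get_constraints(self, connection, table_name, **kw):
        # Forwarding **kw passes info_cache through to the cached helper.
        return self._get_all_constraint_info(connection, **kw)[table_name]


dialect = Dialect()
info_cache = {}  # plays the role of Inspector.info_cache
for name in ["users", "orders"]:
    dialect.get_constraints(None, name, info_cache=info_cache)
print(dialect.calls)  # 1: the second lookup is served from info_cache

for name in ["users", "orders"]:
    dialect.get_constraints(None, name)  # no info_cache passed
print(dialect.calls)  # 3: one query per call without a cache
```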
from sqlalchemy-redshift.
PR for this is in #102.
Also see #103 for more thoughts on persisting schema info across multiple tables.
I think your approach of doing just a few queries actually does work. You're right that the reflection methods that make up the dialect's public interface generally require a table name, and that caching the results of those methods alone won't reduce the number of queries you have to make. However, caching the results of your _get_all_* methods, which return information about the whole database, definitely speeds things up considerably. With the change you made in PR #102, a test that we have for making sure that our migrations are up to date goes from 2+ minutes to about 3 seconds.
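The bulk-query approach described above can be sketched as follows. It assumes a hypothetical `run_bulk_query` that returns one row per constraint across the whole database as `(schema, table, constraint_name)` tuples. The rows are grouped once into a dict keyed by `(schema, table)`, so reflecting any number of tables costs one query plus a dict lookup per table. All names here are illustrative, not the dialect's actual helpers.

```python
from collections import defaultdict


class BulkReflector:
    """Hypothetical reflector: one whole-database query, then per-table lookups."""

    def __init__(self, run_bulk_query):
        self._run_bulk_query = run_bulk_query
        self._by_table = None  # filled on first use, reused afterwards

    def _all_constraints(self):
        if self._by_table is None:
            grouped = defaultdict(list)
            for schema, table, constraint in self._run_bulk_query():
                grouped[(schema, table)].append(constraint)
            self._by_table = dict(grouped)
        return self._by_table

    def get_constraints(self, table, schema="public"):
        # Tables with no constraints simply return an empty list.
        return self._all_constraints().get((schema, table), [])


query_log = []


def run_bulk_query():
    # Simulated result of a single catalog query covering every table.
    query_log.append("bulk")
    return [
        ("public", "users", "users_pkey"),
        ("public", "orders", "orders_pkey"),
        ("public", "orders", "orders_user_fk"),
    ]


reflector = BulkReflector(run_bulk_query)
print(reflector.get_constraints("orders"))  # ['orders_pkey', 'orders_user_fk']
print(reflector.get_constraints("users"))   # ['users_pkey']
print(reflector.get_constraints("events"))  # []
print(len(query_log))                       # 1
```

Caching on the instance is a simplification here. In a real dialect the cache is more naturally scoped to one inspection run (for example SQLAlchemy's per-`Inspector` `info_cache`), so that schema changes made later are still picked up by a fresh inspection.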
"a test that we have for making sure that our migrations are up to date goes from 2+ minutes to about 3 seconds"
Wow. That is a much bigger improvement than I expected.
Well, previously that query in _get_all_constraint_info was being run over and over and over again. Now it only gets run once.
Thank you again for inspecting the code enough to figure out what was going on. I look forward to seeing whether this significantly improves some of my own use cases.
And thank you for the quick response. :)
It's an especially large gain for us because we often have a bunch of stale schemas in this cluster. As a result, the _get_all_constraint_info query is slightly slower, and the cumulative effect of running that slightly slower query over and over again can make the test I mentioned take 2 minutes or longer. You probably won't see as large a speedup, but you should still see some.
This change is now merged to master.