soundcloud / lhm
Online MySQL schema migrations
License: BSD 3-Clause "New" or "Revised" License
Do you think it would be difficult to add options to LHM to allow for monitoring and adjustment of load if slave lag or server load grows above a threshold? At Shopify we see our replication lag grow very large, and quite quickly, for our bigger tables when we run LHMs, which is to be expected, but it would be nice if LHM were able to monitor how hard it was pushing the cluster and back off if it got to be too hard. This would be at the expense of how long the migration took to run, but I think I'd rather have a longer-running migration that could come closer to guaranteeing our replication lag remains inside some threshold.
If you guys don't want to add this or don't think you'll find the time would you give me some pointers on how I might accomplish it? Thanks!
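A minimal sketch of what such a backoff policy could look like, as a pure-Ruby policy object. The `LagThrottler` name and the injected `lag_reader` callable are hypothetical, not part of LHM's API:

```ruby
# Hypothetical backoff policy: sleep longer between chunks while
# replication lag is above a threshold, and return to the base
# pause once the slaves catch up.
class LagThrottler
  def initialize(lag_reader:, max_lag: 10, base_pause: 0.1,
                 backoff_factor: 2.0, max_pause: 30.0)
    @lag_reader     = lag_reader  # callable returning current slave lag in seconds
    @max_lag        = max_lag
    @base_pause     = base_pause
    @backoff_factor = backoff_factor
    @max_pause      = max_pause
    @pause          = base_pause
  end

  # Pause to apply before copying the next chunk: doubles (up to a cap)
  # while lag exceeds the threshold, resets once lag recovers.
  def next_pause
    if @lag_reader.call > @max_lag
      @pause = [@pause * @backoff_factor, @max_pause].min
    else
      @pause = @base_pause
    end
    @pause
  end
end
```

In a real setup, `lag_reader` would poll something like `Seconds_Behind_Master` from `SHOW SLAVE STATUS` on each replica and return the worst value; the chunker would call `next_pause` before copying each chunk.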
All foreign keys to the migrated table point to the temporary table after migration:
CONSTRAINT `referral_visitor_offer_id_fk` FOREIGN KEY (`visitor_offer_id`)
REFERENCES `lhma_2013_06_14_12_33_29_347_visitor_offers` (`id`)
Hello,
How can you run lhm with binlog turned on?
Creating/dropping triggers fails because the user running the migration does not have the SUPER privilege.
Currently LHM has two phases to a migration. During the first phase the site stays live while writes are applied to old+new tables. During the second phase the site goes down while the journal is played back.
The amount of time the site is down during the second phase depends directly on the amount of write traffic received during the first phase.
It seems conceivable that if the first phase took long enough, LHM could automate juggling the table roles, and could perform the play-back while the site was still up. Implemented correctly, this could almost completely eliminate downtime since the process would be repeatable and the playback windows would get shorter and shorter.
Is this an issue you've experienced with your dataset? I've only experienced this once, and the downtime introduced by the playback time was only like 10 minutes after a day-long migration, but it still felt unnecessary.
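The shrinking-window intuition above can be sketched with a toy model (pure arithmetic, not LHM code): as long as the journal replays faster than writes arrive, each playback window is a fixed fraction of the previous one.

```ruby
# Toy model of repeated journal playback: writes arrive at `write_rate`
# rows/sec while the journal is replayed at `replay_rate` rows/sec.
# Each pass replays the backlog accumulated during the previous pass,
# so the playback window shrinks geometrically when replay_rate > write_rate.
def playback_windows(backlog:, write_rate:, replay_rate:, passes:)
  windows = []
  passes.times do
    duration = backlog.fdiv(replay_rate)  # time to replay current backlog
    windows << duration
    backlog = write_rate * duration       # writes that arrived meanwhile
  end
  windows
end
```

With a 10x replay-to-write ratio, each pass is 10x shorter than the last, so after a few passes the remaining window is short enough to play back under a brief lock.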
Is it possible to include an option to clean up the leftover tables after the migration?
They have the form lhma_*.
I see that LHM supports only a primary key column named 'id'. Any other column name, irrespective of being an integer column, is not supported.
Any historical reason behind this?
I also see that pull requests fixing this issue have been pending for a long time. See #49 and #73.
Any help is highly appreciated.
Thanks!
--Sudhir
I ran into an issue where I couldn't easily use Lhm to create an index with a specific key length.
Any plans to add support for sequel? https://github.com/jeremyevans/sequel
Hi,
I was writing a migration on a table, and came across a weird issue while testing in the development environment. When I have a single row in a table, that row is lost during the migration process. It works well with 2, 5, or more rows.
I believe this comes from https://github.com/soundcloud/lhm/blob/master/lib/lhm/chunker.rb#L30.
If I understand that line well, the `|| (@next_to_insert == 1 && @start == 1)` part was added for the case `@next_to_insert == 0 && @start == 1 && @limit == 1` (if `@limit != 1`, the second part of the or statement implies the first one). Shouldn't that line be changed from `while @next_to_insert < @limit || (@next_to_insert == 1 && @start == 1)` to `while @next_to_insert <= @limit`?
Moreover, I had a feeling that if `@limit - @start % stride == 1`, the last row would be lost. I tested that it is actually the case.
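The off-by-one can be reproduced with a pure-Ruby simplification of the chunking loop. The real chunker copies via SQL; `copied_rows` below is only a model of the loop condition, with a `fixed:` flag to compare the current and proposed conditions:

```ruby
# Model of the chunking loop: copy row ids in [start, limit] in blocks
# of `stride`. `fixed: false` uses the current condition from
# chunker.rb#L30; `fixed: true` uses the proposed `<=` condition.
def copied_rows(start:, limit:, stride:, fixed: false)
  copied = []
  next_to_insert = start
  loop do
    if fixed
      break unless next_to_insert <= limit                  # proposed condition
    else
      break unless next_to_insert < limit ||                # current condition
                   (next_to_insert == 1 && start == 1)
    end
    top = [next_to_insert + stride - 1, limit].min
    copied.concat((next_to_insert..top).to_a)
    next_to_insert = top + 1
  end
  copied
end
```

The current condition drops a single row whose id is not 1 (the special case only rescues `start == 1`), and also drops the last row when the final block is one row wide; the `<=` condition handles both.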
Lhm::Table::Parser is used to parse the structure of the origin and destination tables in a migration. The resulting intersection determines which columns are copied in the chunking phase.
Lhm::Table::Parser currently parses the output of `SHOW CREATE TABLE`. An error in the regular expression for columns causes the following table definition to parse incorrectly:
CREATE TABLE `users` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `reference` int(11) DEFAULT NULL,
  `username` varchar(255) DEFAULT NULL,
  `group` varchar(255) DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  `comment` varchar(20) DEFAULT NULL,
  `description` text,
  PRIMARY KEY (`id`),
  UNIQUE KEY `index_users_on_reference` (`reference`),
  KEY `index_users_on_username_and_created_at` (`username`,`created_at`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Parsing this DDL will fail to add the `description` column to the parsed table.
A more robust Table parser using information_schema and no regular expressions fixes this problem.
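A toy illustration of why regex parsing is fragile here. This is a simplified pattern, not the actual expression in Lhm::Table::Parser, but it shows the same failure class: a pattern that expects a parenthesised type argument silently skips a bare `text` column.

```ruby
# Simplified column pattern in the style of a DDL-scraping parser:
# it expects `name` followed by a type with a parenthesised argument,
# so a bare `text` (or `datetime`, `blob`, ...) column never matches.
COLUMN_WITH_ARGS = /^\s*`(?<name>[^`]+)`\s+\w+\(\d+\)/

ddl_lines = [
  "  `id` int(11) NOT NULL AUTO_INCREMENT,",
  "  `comment` varchar(20) DEFAULT NULL,",
  "  `description` text,",
]

# `description` is silently dropped from the parsed column list.
names = ddl_lines.map { |l| l.match(COLUMN_WITH_ARGS) }.compact.map { |m| m[:name] }
```

A parser that reads `information_schema.columns` (e.g. `SELECT column_name FROM information_schema.columns WHERE table_schema = ? AND table_name = ?`) never has to guess at DDL syntax, which is why it fixes this whole class of bug.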
We need to add Travis CI to test under Rails 4.0 and 4.1.
Simple table alteration:
Lhm.change_table :shopping_carts do |m|
m.add_column :created_at, "datetime"
end
I run the script and go to my database to see if the original table is locked:
mysql> SHOW OPEN TABLES where `Table` = 'shopping_carts';
+----------------------+----------------+--------+-------------+
| Database | Table | In_use | Name_locked |
+----------------------+----------------+--------+-------------+
| inventory_production | shopping_carts | 1 | 0 |
+----------------------+----------------+--------+-------------+
The table is locked. Isn't LHM supposed to prevent this? I assume I missed something obvious but don't see what.
Gemfile:
source 'https://rubygems.org'
gem 'activerecord'
gem 'mysql2'
gem 'lhm'
The documentation mentions, "The test suite is also run against MRI 2.0.0 in Continuous Integration, but there are a few bugs left to fix.". What are some of these bugs? I'd like to help, if possible.
Even if the migration runs successfully, the temporary tables remain. I have to manually call `Lhm.cleanup(true)` after each migration.
Hi there!
I'm currently trying to solve some pains we're having at work with DDL migrations on really huge tables. We've got a reasonably standard 3-node Galera cluster set up. Right now, we're running these migrations by hand using RSU mode and a console, since the default TOI method really screws us up.
Would you be able to please answer these two questions?
Has LHM been tested with Galera clusters, and is it still able to play nice and be all non-locky while being executed in TOI mode?
How well does it cope with writes coming to the table being altered while the migration is in progress? I assume it behaves comparably to pt-online-schema-change?
Thanks so much for your time!
<3 Ry
From the MySQL docs on the matter:
You might run this statement in your setup scripts after representative data has been loaded into the table, and run it periodically after DML operations significantly change the contents of indexed columns, or on a schedule at times of low activity.
Because it involves a table read-lock, I would propose it's a good opportunity to do this after copying all the data into the new table, before swapping the names back. Thoughts?
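A sketch of where such a hook could live: the statement under discussion (presumably `ANALYZE TABLE`, given the quoted docs) would run after the chunker finishes copying and before the atomic rename. The hook itself is hypothetical; `connection` is any object with an ActiveRecord-style `#execute`.

```ruby
# Hypothetical hook: refresh index statistics on the freshly copied
# destination table while the application is still reading the origin.
def analyze_before_switch(connection, destination_table)
  connection.execute("ANALYZE TABLE `#{destination_table}`")
end

# Tiny stand-in connection that records the SQL it is given,
# so the sketch can be exercised without a database.
class RecordingConnection
  attr_reader :last_sql
  def execute(sql)
    @last_sql = sql
  end
end

conn = RecordingConnection.new
analyze_before_switch(conn, "lhmn_users")
```

Running it at this point means the read lock is taken against a table no client is using yet, which is the opportunity the proposal describes.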
It would be great to have an example of the project README of how to add a column with a default value.
Hi,
I'm currently using a :git reference in my Gemfile for LHM because I need the Mysql2 support that's not in the current gem release, but I'd much prefer to switch back to using the gem so bundler can pack it properly. Any chance you could push out a new version for me please?
Cheers,
Lucas Parry
We define the method `extract_primary_key` (https://github.com/soundcloud/lhm/blob/master/lib/lhm/table.rb#L96). I am pretty sure that was done because we had to support different adapters. Now that we only support ActiveRecord, we can use the method from its connection object.
If you add an index with a few columns, the default index name function can build a name that's too long for MySQL.
It would be great if there was an option to specify a custom index name.
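A sketch of the problem and one possible fallback. The `default_index_name` format mirrors the usual Rails/LHM style but may not match LHM's helper exactly, and `safe_index_name` is purely hypothetical:

```ruby
require 'digest'

# MySQL rejects identifier names longer than 64 characters.
MAX_IDENT = 64

# Default name in the style of LHM's index-name helper
# (the exact format in LHM may differ).
def default_index_name(table, columns)
  "index_#{table}_on_#{columns.join('_and_')}"
end

# One possible fallback: keep the default when it fits, otherwise
# digest the long name into a short, stable suffix.
def safe_index_name(table, columns)
  name = default_index_name(table, columns)
  return name if name.length <= MAX_IDENT
  "index_#{table}_#{Digest::SHA1.hexdigest(name)[0, 10]}"
end

long = default_index_name(:user_account_settings,
                          [:organization_id, :notification_channel, :created_at])
long.length  # 86 characters, over MySQL's 64-char limit
```

A custom-name option would make the digest trick unnecessary, since the caller could just pick a short name.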
There's a fair few patches applied to master that haven't been released yet. I would love to use some of the new capability and I'm sure many others would as well. Are there any plans for an imminent release? Is there any work that needs to be done to get to a release? (Can I help?)
FYI here's a list:
Lately we used LHM for some bigger migrations on a big (22 million rows) database table with a high load. During that migration (which took 20 minutes with the default stride size and throttle time) we had several deadlocks that occurred because the app tried to update table rows that were currently being copied in a chunk.
Previously we used the Percona Toolkit, where we didn't have this problem while migrating. After a bit of investigating, we figured out that they do something really smart: they calculate the size of the chunk being copied dynamically, to always stay within a given time (`--chunk_size_time`). I found a blog article about that here: http://www.xaprb.com/blog/2012/05/06/how-percona-toolkit-divides-tables-into-chunks/.
Of course we can estimate (or try out different values for) the appropriate `stride` value for the throttler, depending on the write load on that table. But this would be a bit cumbersome and has to be done per table we want to migrate.
So I would suggest updating the throttler (or creating a dynamic throttler) so it calculates the size of the next chunk to copy depending on the runtime of the last one.
What do you think?
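A sketch of what such a percona-style dynamic stride could look like. Class and method names are hypothetical, not LHM API:

```ruby
# Hypothetical dynamic throttler: scale the next stride so that each
# chunk copy takes roughly `target_time` seconds, clamped to sane bounds.
class DynamicStride
  def initialize(stride: 40_000, target_time: 0.5, min: 1_000, max: 200_000)
    @stride, @target_time, @min, @max = stride, target_time, min, max
  end

  attr_reader :stride

  # Feed back the duration (in seconds) of the last chunk copy;
  # returns the stride to use for the next chunk.
  def adjust!(last_duration)
    scaled = (@stride * @target_time / last_duration).to_i
    @stride = scaled.clamp(@min, @max)
  end
end
```

If a chunk took 2s against a 0.5s budget, the next stride shrinks to a quarter; a fast chunk grows it again, so the copier converges on the largest chunk the server can absorb within the time budget.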
This is a question not an issue, but I wasn't sure the best place to ask.
After a migration (using lhm) has run does the rails server require a restart to clear the ActiveRecord Cache? I originally thought it did but I ran a few quick tests and am now not sure.
Example
User model
class User < ActiveRecord::Base
...
def get_name
"TEST - #{self.name}"
end
end
Before Migration
> User.first
#<User id: 1>
> User.first.name
undefined method `name' for #<User:...>
> User.first.get_name
undefined method `name' for #<User:...>
Run a migration to add name column to user model
After Migration
> User.first
#<User id: 1>
> User.first.name
nil
> User.first.get_name
=> "TEST - "
Hi,
Until today, I was (happily) using large-hadron-migrator v0.2.1 and running my deployments with Capistrano.
Each migration with LHM was very verbose, but it was also very useful for knowing the progression.
A couple of hours ago, I noticed that you released lhm v1.1.0, and I've started using it.
It seems better, but I don't have any output during the migration.
Since my migrations can take hours, I'd like to know what the progression is.
Is there a way to activate a logging level (for future migrations), or a way to see the progress of a running migration?
Thanks a lot for your help and for making/maintaining such a great tool.
Is anyone else experiencing failing tests on master as of now?
I am seeing 2 failing tests:
This is a snippet of the error output:
Mysql::Error: Table 'addscolumn' already exists: rename table new_addscolumn to addscolumn, addscolumn to 2011_06_12_16_51_51_3N_addscolumn
Looks like the Hash order changes when passed into the rename_tables method.
I am running Ruby 1.8.7, and running `spec spec` in my terminal window.
Hello,
we migrated from a RDS-backed MySQL 5.5 to a newer generation RDS MySQL 5.6, and the first attempt to add an index using lhm gives us this error:
I, [2014-10-20T13:56:11.100569 #8934] INFO -- : Starting LHM run on table=lhmn_pushes
I, [2014-10-20T13:56:11.102629 #8934] INFO -- : Starting run of class=Lhm::Migrator
I, [2014-10-20T13:56:11.344552 #8934] INFO -- : Starting run of class=Lhm::Entangler
E, [2014-10-20T13:56:11.350324 #8934] ERROR -- : Error in class=Lhm::Entangler, reverting. exception=ActiveRecord::StatementInvalid message=Mysql2::Error: You do not have the SUPER privilege and binary logging is enabled (you *might* want to use the less safe log_bin_trust_function_creators variable): create trigger `lhmt_del_pushes`
after delete on `pushes` for each row
delete ignore from `lhmn_pushes` /* large hadron migration */
where `lhmn_pushes`.`id` = OLD.`id` /* large hadron migration */
rake aborted!
An error has occurred, all later migrations canceled:
Mysql2::Error: You do not have the SUPER privilege and binary logging is enabled (you *might* want to use the less safe log_bin_trust_function_creators variable): drop trigger if exists `lhmt_del_pushes` /* large hadron migration */
After discussing this with my client's sysadmin, SUPER cannot be used on RDS apparently, and binary logging is always on for point in time recovery.
This is also the first time we use LHM to add an index (previously it was just for columns).
Is the error message reliable here?
Did anyone meet this issue and can it be solved?
thanks!
Tables that support deletions could have large gaps between ids and have large max id values, despite not having as many rows.
Lhm currently does not detect whether the block that it is copying is empty or not, which slows down the migration significantly because the delay is still incurred.
Just wondering if this feature is in the pipeline to be supported in the next release? I'm hoping to adopt lhm as a solution for large table migrations and it'll be awesome to have this feature.
Thanks
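A sketch of what empty-range detection could look like. Here `ids` is an in-memory stand-in for a `SELECT 1 ... WHERE id BETWEEN ? AND ? LIMIT 1` existence check against the origin table; the function name is hypothetical:

```ruby
# Walk the id range in blocks of `stride`, keeping only the blocks that
# actually contain rows. Empty blocks are skipped entirely, so neither
# the INSERT nor the throttle delay is incurred for them.
def chunks_to_copy(ids, stride:)
  min, max = ids.minmax
  (min..max).step(stride).filter_map do |bottom|
    top = [bottom + stride - 1, max].min
    [bottom, top] if ids.any? { |id| id.between?(bottom, top) }
  end
end
```

For a table with ids 1, 2, and 1,000,001, only two chunks survive instead of ten thousand, which is exactly the win described above for heavily deleted tables.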
Curious if anyone else has experienced a deadlock during an LHM migration where, for instance, on a busy table Transaction1 (`replace into ...`) is waiting for Transaction2 (`insert into ...`) with lock mode AUTO-INC, and the transactions get rolled back?
In the past we've seen this once, and the most obvious answer seems to be that it's just an auto-increment lock deadlock. However, in most cases you're copying over the `id` field from table to table, which is typically an auto-increment field.
To fix this, one could remove the auto-increment from that field before the LHM run and add it back to the new table's field right before the table switch. Another option would be to set `innodb_autoinc_lock_mode = 2` in MySQL temporarily until the table switch.
My question is, would this be a good configurable option for LHM, since it applies to a typical database setup?
Does it work with any master/slave adapters that work with ActiveRecord? I see the mention about https://github.com/d11wtq/dm-master-slave-adapter which works with DataMapper.
I wanted to know if it works with https://github.com/taskrabbit/makara or something similar.
I am sorry to ask this question as an issue! I couldn't find a mailing list where I could ask questions. If there's one could anyone let me know about it please?
I have a large table that gets regularly archived, so the IDs don't start at 1. This makes the percentage complete LHM reports way off.
jasons-mbp:panda jason$ rake db:migrate
SESSION STORE: active_record_store
CACHE STORE: file_store
== AuditedChanges: migrating =================================================
-- index_exists?(:audits, [:association_id, :association_type], {:name=>"association_index"})
-> 0.0024s
I, [2015-12-14T09:43:14.193077 #6202] INFO -- : Starting LHM run on table=lhmn_audits
I, [2015-12-14T09:43:14.195034 #6202] INFO -- : Starting run of class=Lhm::Migrator
I, [2015-12-14T09:43:14.296403 #6202] INFO -- : Starting run of class=Lhm::Entangler
I, [2015-12-14T09:43:14.392986 #6202] INFO -- : Starting run of class=Lhm::Chunker
I, [2015-12-14T09:43:14.395018 #6202] INFO -- : Starting run of class=Lhm::Throttler::Time
89.22% (332992893/373246177) completeI, [2015-12-14T09:43:16.593463 #6202] INFO -- : Starting run of class=Lhm::Throttler::Time
89.23% (333032893/373246177) completeI, [2015-12-14T09:43:18.900360 #6202] INFO -- : Starting run of class=Lhm::Throttler::Time
89.24% (333072893/373246177) completeI, [2015-12-14T09:43:21.189020 #6202] INFO -- : Starting run of class=Lhm::Throttler::Time
89.25% (333112893/373246177) completeI, [2015-12-14T09:43:23.605645 #6202] INFO -- : Starting run of class=Lhm::Throttler::Time
I realize that the percentage complete will not be perfect if there are missing IDs, but to make it closer to perfect, perhaps check what the first ID is in the table and calculate based on that rather than assuming 1?
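A sketch of the suggested fix, assuming `lowest` comes from a `SELECT MIN(id)` and `highest` from a `SELECT MAX(id)` on the table (the function name is hypothetical):

```ruby
# Report progress against the table's actual id range instead of
# assuming ids start at 1. Still imperfect with gaps, but it no longer
# starts at a huge percentage for archived tables.
def percent_complete(current:, lowest:, highest:)
  return 100.0 if highest == lowest   # single-row (or single-id) table
  100.0 * (current - lowest) / (highest - lowest)
end
```

With the archived table above, progress would start near 0% at the first surviving id rather than at 89%.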
See http://dev.mysql.com/doc/innodb/1.1/en/innodb-other-changes-innodb_lock_wait_timeout.html -- before v5.5, you can't set this parameter without restarting the server.
Since `Invoker#set_session_lock_wait_timeouts` is not documented, I'm not sure what the consequences of skipping this step are, but I commented out this line -- https://github.com/soundcloud/lhm/blob/58317e374dbf7f799ec592521307e8c4ae0d277b/lib/lhm/invoker.rb#L42 -- and it seems to be working on a DB table with ~1 million records. Maybe it's worth checking the MySQL version and skipping this step (and logging a warning?) if it's an older version. At the very least, please provide some documentation explaining the purpose of `Invoker#set_session_lock_wait_timeouts`.
Docs:
Lhm chooses the strategy automatically based on the used MySQL server version, but you can override the behavior with an option.
Reality:
Using MySQL 5.1.63-rel13.4-log, you must explicitly set `options[:atomic_switch]` (re `SqlHelper#supports_atomic_switch?`).
Occasionally I have a large-ish table copied in my dev db and kick off an LHM migration, but I don't want to take the minutes/hours it takes to run in my dev environment. In these cases, I generally kill the migration, `TRUNCATE` the table, and kick the migration off again, but sometimes I need to clean up the `lhmn_*` and `lhma_*` tables created by LHM before I can do so. I wrote this Rake task to do just that. Just thought I'd check to see if there's any interest in including this or something similar, either in LHM itself or even just in the README. If there is interest in either, I'm happy to work up a pull request. If not, no sweat. :)
namespace :db do
  desc "Cleanup after botched LHM migrations"
  task :cleanup_lhm => :environment do
    db = ActiveRecord::Base.connection
    tables = []
    # `_` is a single-character wildcard in LIKE patterns, so escape the
    # trailing underscore to match the prefix literally.
    %w(lhmn lhma).each do |prefix|
      tables << db.execute("SHOW TABLES LIKE '#{prefix}\\_%'").to_a
    end
    tables.flatten!
    if tables.any?
      puts "Dropping #{tables.join(', ')}"
      tables.each do |table|
        db.execute("DROP TABLE #{table}")
      end
    else
      puts "No LHM tables found to clean up!"
    end
  end
end
Any plans to add Rails 3 support?
Is there any obvious way to throttle I/O?
AWS instances are not very good at I/O, and running an LHM migration pretty much takes all of it.
I was wondering if it would be possible to create a new release of this gem. It seems like there have been some things merged, but never released (like this #39).
Could someone tell me the correct syntax for renaming a table using Lhm? When I do something like this I get an undefined method error.
Lhm.change_table :foo do |t|
t.rename_table :bar
end
Has anyone else had a need for a verification step? An assurance that the migration has completed successfully, e.g.:
Lhm.change_table :my_table do |table|
table.ddl "ALTER TABLE `#{table.name}` DEFAULT CHARACTER SET utf8mb4, DEFAULT COLLATE utf8mb4_unicode_ci"
verification do |table|
result = table.query "SELECT count(*) FROM #{table.name} as new_table JOIN #{table.name} as old_table USING (id) WHERE new_table.name != old_table.name"
result.first[0] == 0
end
end
The verification block would be executed after the Chunker and before the Switcher. If it evaluated to false the change would be rolled back.
Does anyone else have a need for this and what strategies do you currently use? Does a feature like this fit into this gem?
I don't think it makes sense to always check unique & foreign key constraints within the background copy operation, since all the inserts are done against the primary key. Copied data would already be consistent in the original table.
Something like this could speed things up, I'm seeing an improvement in my use case.
module Lhm
class Chunker
module DisableKeyChecks
def without_key_checks
connection.execute(
"SET @old_unique_checks = @@unique_checks, " \
"@old_foreign_key_checks = @@foreign_key_checks, " \
"unique_checks = 0, " \
"foreign_key_checks = 0;"
)
yield
ensure
connection.execute(
"SET unique_checks = @old_unique_checks, " \
"foreign_key_checks = @old_foreign_key_checks;"
)
end
def execute_without_key_checks
without_key_checks do
execute_with_key_checks
end
end
def self.included(m)
m.class_eval do
alias_method :execute_with_key_checks, :execute
alias_method :execute, :execute_without_key_checks
end
end
end
include DisableKeyChecks
end
end
This could be enabled with an option, or when it can be proven safe (when the migration doesn't add any new constraints). Thoughts?
While copying data to the table, I experienced a deadlock that aborted the migration halfway through.
I'm guessing deadlocks happen if the INSERT statement and a trigger both try to touch the same row. One possible solution: a `deadlock_retries: 1` option to handle it when the query gets dropped?
I am using Lhm to alter a table, but the existing triggers don't get copied when the lhm table is renamed to the old table name.
Consider this: I have a table with a trigger 'x'. LHM adds three more triggers (for create, update, and delete), and when the copying of the table is complete it drops its own triggers and renames the lhm table to the old table name, but after the rename, trigger 'x' is missing.
Since LHM tasks are the ones creating lhma_ tables when renaming the old tables during an online schema change, it would be nice if the same code provided an option to drop those that were created 'n' days ago.
Something that could be run like:
bundle exec rake lhm:clean --older_than=10
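Since the archive name embeds its creation timestamp, such a task could decide staleness by parsing the table name alone. A sketch, with the format inferred from names like `lhma_2013_06_14_12_33_29_347_visitor_offers` seen earlier in this thread (the helper name is hypothetical):

```ruby
# Archive tables appear to be named lhma_<YYYY_MM_DD_hh_mm_ss>_<ms>_<origin>.
LHMA_NAME = /\Alhma_(\d{4})_(\d{2})_(\d{2})_(\d{2})_(\d{2})_(\d{2})_\d+_/

# True when the archive table encoded in `table_name` was created more
# than `days` days before `now`; non-archive tables are never stale.
def stale_archive?(table_name, now: Time.now, days: 10)
  m = LHMA_NAME.match(table_name)
  return false unless m
  created = Time.new(*m.captures.map(&:to_i))
  (now - created) > days * 86_400
end
```

The rake task would then `SHOW TABLES`, filter with `stale_archive?`, and drop the survivors.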
Hi! Are there any plans for a new versioned release that includes the slave lag throttler that's currently on master?