soundcloud / lhm
Online MySQL schema migrations
License: BSD 3-Clause "New" or "Revised" License
Do you think it would be difficult to add options to LHM to allow for monitoring and adjustment of load if slave lag or server load grows above a threshold? At Shopify we see our replication lag grow very large, and quite quickly, for our bigger tables when we run LHMs, which is to be expected, but it would be nice if LHM were able to monitor how hard it was pushing the cluster and back off if it got to be too hard. This would be at the expense of how long the migration took to run, but I think I'd rather have a longer-running migration that could come closer to guaranteeing our replication lag remains inside some threshold.
If you guys don't want to add this or don't think you'll find the time would you give me some pointers on how I might accomplish it? Thanks!
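A minimal sketch of what such a backoff policy could look like, as a pure-Ruby policy object. The `LagThrottler` name and the injected `lag_reader` callable are hypothetical, not part of LHM's API:

```ruby
# Hypothetical backoff policy: sleep longer between chunks while
# replication lag is above a threshold, and return to the base
# pause once the slaves catch up.
class LagThrottler
  def initialize(lag_reader:, max_lag: 10, base_pause: 0.1,
                 backoff_factor: 2.0, max_pause: 30.0)
    @lag_reader     = lag_reader  # callable returning current slave lag in seconds
    @max_lag        = max_lag
    @base_pause     = base_pause
    @backoff_factor = backoff_factor
    @max_pause      = max_pause
    @pause          = base_pause
  end

  # Pause to apply before copying the next chunk: doubles (up to a cap)
  # while lag exceeds the threshold, resets once lag recovers.
  def next_pause
    if @lag_reader.call > @max_lag
      @pause = [@pause * @backoff_factor, @max_pause].min
    else
      @pause = @base_pause
    end
    @pause
  end
end
```

In a real setup, `lag_reader` would poll something like `Seconds_Behind_Master` from `SHOW SLAVE STATUS` on each replica and return the worst value; the chunker would call `next_pause` before copying each chunk.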
All foreign keys to the migrated table point to the temporary table after migration:
CONSTRAINT `referral_visitor_offer_id_fk` FOREIGN KEY (`visitor_offer_id`)
REFERENCES `lhma_2013_06_14_12_33_29_347_visitor_offers` (`id`)
Hello,
How can you run lhm with binlog turned on?
Creating/dropping triggers fails because the user running the migration does not have the SUPER privilege.
Currently LHM has two phases to a migration. During the first phase the site stays live while writes are applied to old+new tables. During the second phase the site goes down while the journal is played back.
The amount of time the site is down during the second phase depends directly on the amount of write traffic received during the first phase.
It seems conceivable that if the first phase took long enough, LHM could automate juggling the table roles, and could perform the play-back while the site was still up. Implemented correctly, this could almost completely eliminate downtime since the process would be repeatable and the playback windows would get shorter and shorter.
Is this an issue you've experienced with your dataset? I've only experienced this once, and the downtime introduced by the playback time was only like 10 minutes after a day-long migration, but it still felt unnecessary.
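The shrinking-window intuition above can be sketched with a toy model (pure arithmetic, not LHM code): as long as the journal replays faster than writes arrive, each playback window is a fixed fraction of the previous one.

```ruby
# Toy model of repeated journal playback: writes arrive at `write_rate`
# rows/sec while the journal is replayed at `replay_rate` rows/sec.
# Each pass replays the backlog accumulated during the previous pass,
# so the playback window shrinks geometrically when replay_rate > write_rate.
def playback_windows(backlog:, write_rate:, replay_rate:, passes:)
  windows = []
  passes.times do
    duration = backlog.fdiv(replay_rate)  # time to replay current backlog
    windows << duration
    backlog = write_rate * duration       # writes that arrived meanwhile
  end
  windows
end
```

With a 10x replay-to-write ratio, each pass is 10x shorter than the last, so after a few passes the remaining window is short enough to play back under a brief lock.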
Is it possible to include an option to clean up the leftover tables after the migration?
They have the form lhma_*.
I see that LHM supports only a primary key column named 'id'. Any other column name, irrespective of being an integer column, is not supported.
Any historical reason behind this?
I also see that pull requests fixing this issue have been pending for a long time. See #49 and #73.
Any help is highly appreciated.
Thanks!
--Sudhir
I ran into an issue where I couldn't easily use Lhm to create an index with a specific key length.
Any plans to add support for sequel? https://github.com/jeremyevans/sequel
Hi,
I was writing a migration on a table, and came across a weird issue while testing in the development environment. When I have a single row in a table, that row is lost during the migration process. It works well with 2, 5, or more rows.
I believe this comes from https://github.com/soundcloud/lhm/blob/master/lib/lhm/chunker.rb#L30.
If I understand that line well, the `|| (@next_to_insert == 1 && @start == 1)` part was added for the case `@next_to_insert == 0 && @start == 1 && @limit == 1` (if `@limit != 1`, the second part of the or statement implies the first one). Shouldn't that line be changed from `while @next_to_insert < @limit || (@next_to_insert == 1 && @start == 1)` to `while @next_to_insert <= @limit`?
Moreover, I had a feeling that if `@limit - @start % stride == 1`, the last row would be lost. I tested that it is actually the case.
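The off-by-one can be reproduced with a pure-Ruby simplification of the chunking loop. The real chunker copies via SQL; `copied_rows` below is only a model of the loop condition, with a `fixed:` flag to compare the current and proposed conditions:

```ruby
# Model of the chunking loop: copy row ids in [start, limit] in blocks
# of `stride`. `fixed: false` uses the current condition from
# chunker.rb#L30; `fixed: true` uses the proposed `<=` condition.
def copied_rows(start:, limit:, stride:, fixed: false)
  copied = []
  next_to_insert = start
  loop do
    if fixed
      break unless next_to_insert <= limit                  # proposed condition
    else
      break unless next_to_insert < limit ||                # current condition
                   (next_to_insert == 1 && start == 1)
    end
    top = [next_to_insert + stride - 1, limit].min
    copied.concat((next_to_insert..top).to_a)
    next_to_insert = top + 1
  end
  copied
end
```

The current condition drops a single row whose id is not 1 (the special case only rescues `start == 1`), and also drops the last row when the final block is one row wide; the `<=` condition handles both.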
Lhm::Table::Parser is used to parse the structure of the origin and destination tables in a migration. The resulting intersection determines which columns are copied in the chunking phase.
Lhm::Table::Parser currently parses the output of `SHOW CREATE TABLE`. An error in the regular expression for columns causes the following table definition to parse incorrectly:
CREATE TABLE `users` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `reference` int(11) DEFAULT NULL,
  `username` varchar(255) DEFAULT NULL,
  `group` varchar(255) DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  `comment` varchar(20) DEFAULT NULL,
  `description` text,
  PRIMARY KEY (`id`),
  UNIQUE KEY `index_users_on_reference` (`reference`),
  KEY `index_users_on_username_and_created_at` (`username`,`created_at`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Parsing this DDL will fail to add the `description` column to the parsed table.
A more robust Table parser using information_schema and no regular expressions fixes this problem.
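A toy illustration of why regex parsing is fragile here. This is a simplified pattern, not the actual expression in Lhm::Table::Parser, but it shows the same failure class: a pattern that expects a parenthesised type argument silently skips a bare `text` column.

```ruby
# Simplified column pattern in the style of a DDL-scraping parser:
# it expects `name` followed by a type with a parenthesised argument,
# so a bare `text` (or `datetime`, `blob`, ...) column never matches.
COLUMN_WITH_ARGS = /^\s*`(?<name>[^`]+)`\s+\w+\(\d+\)/

ddl_lines = [
  "  `id` int(11) NOT NULL AUTO_INCREMENT,",
  "  `comment` varchar(20) DEFAULT NULL,",
  "  `description` text,",
]

# `description` is silently dropped from the parsed column list.
names = ddl_lines.map { |l| l.match(COLUMN_WITH_ARGS) }.compact.map { |m| m[:name] }
```

A parser that reads `information_schema.columns` (e.g. `SELECT column_name FROM information_schema.columns WHERE table_schema = ? AND table_name = ?`) never has to guess at DDL syntax, which is why it fixes this whole class of bug.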
We need to add Travis CI to test under Rails 4.0 and 4.1.
Simple table alteration:
Lhm.change_table :shopping_carts do |m|
m.add_column :created_at, "datetime"
end
I run the script and go to my database to see if the original table is locked:
mysql> SHOW OPEN TABLES where `Table` = 'shopping_carts';
+----------------------+----------------+--------+-------------+
| Database | Table | In_use | Name_locked |
+----------------------+----------------+--------+-------------+
| inventory_production | shopping_carts | 1 | 0 |
+----------------------+----------------+--------+-------------+
The table is locked. Isn't LHM supposed to prevent this? I assume I missed something obvious but don't see what.
Gemfile:
source 'https://rubygems.org'
gem 'activerecord'
gem 'mysql2'
gem 'lhm'
The documentation mentions, "The test suite is also run against MRI 2.0.0 in Continuous Integration, but there are a few bugs left to fix.". What are some of these bugs? I'd like to help, if possible.
Even if the migration runs successfully, the temporary tables remain. I have to manually call `Lhm.cleanup(true)` after each migration.
Hi there!
I'm currently trying to solve some pains we're having at work with DDL migrations on really huge tables. We've got a reasonably standard 3-node Galera cluster set up. Right now, we're running these migrations by hand using RSU mode and a console, since the default TOI method really screws us up.
Would you be able to please answer these two questions?
Has LHM been tested with Galera clusters, and is it still able to play nice and be all non-locky while being executed in TOI mode?
How well does it cope with writes coming to the table being altered while the migration is in progress? I assume it behaves comparably to pt-online-schema-change?
Thanks so much for your time!
<3 Ry
From the MySQL docs on the matter:
You might run this statement in your setup scripts after representative data has been loaded into the table, and run it periodically after DML operations significantly change the contents of indexed columns, or on a schedule at times of low activity.
Because it involves a table read-lock, I would propose it's a good opportunity to do this after copying all the data into the new table, before swapping the names back. Thoughts?
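A sketch of where such a hook could live: the statement under discussion (presumably `ANALYZE TABLE`, given the quoted docs) would run after the chunker finishes copying and before the atomic rename. The hook itself is hypothetical; `connection` is any object with an ActiveRecord-style `#execute`.

```ruby
# Hypothetical hook: refresh index statistics on the freshly copied
# destination table while the application is still reading the origin.
def analyze_before_switch(connection, destination_table)
  connection.execute("ANALYZE TABLE `#{destination_table}`")
end

# Tiny stand-in connection that records the SQL it is given,
# so the sketch can be exercised without a database.
class RecordingConnection
  attr_reader :last_sql
  def execute(sql)
    @last_sql = sql
  end
end

conn = RecordingConnection.new
analyze_before_switch(conn, "lhmn_users")
```

Running it at this point means the read lock is taken against a table no client is using yet, which is the opportunity the proposal describes.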
It would be great to have an example of the project README of how to add a column with a default value.
Hi,
I'm currently using a :git reference in my Gemfile for LHM because I need the Mysql2 support that's not in the current gem release, but I'd much prefer to switch back to using the gem so bundler can pack it properly. Any chance you could push out a new version for me please?
Cheers,
Lucas Parry
We define the method `extract_primary_key` (https://github.com/soundcloud/lhm/blob/master/lib/lhm/table.rb#L96). I am pretty sure that was done because we had to support different adapters. Now that we only support ActiveRecord, we can use the method from its connection object.
If you add an index with a few columns, the default index name function can build a name that's too long for MySQL.
It would be great if there was an option to specify a custom index name.
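A sketch of the problem and one possible fallback. The `default_index_name` format mirrors the usual Rails/LHM style but may not match LHM's helper exactly, and `safe_index_name` is purely hypothetical:

```ruby
require 'digest'

# MySQL rejects identifier names longer than 64 characters.
MAX_IDENT = 64

# Default name in the style of LHM's index-name helper
# (the exact format in LHM may differ).
def default_index_name(table, columns)
  "index_#{table}_on_#{columns.join('_and_')}"
end

# One possible fallback: keep the default when it fits, otherwise
# digest the long name into a short, stable suffix.
def safe_index_name(table, columns)
  name = default_index_name(table, columns)
  return name if name.length <= MAX_IDENT
  "index_#{table}_#{Digest::SHA1.hexdigest(name)[0, 10]}"
end

long = default_index_name(:user_account_settings,
                          [:organization_id, :notification_channel, :created_at])
long.length  # 86 characters, over MySQL's 64-char limit
```

A custom-name option would make the digest trick unnecessary, since the caller could just pick a short name.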
There's a fair few patches applied to master that haven't been released yet. I would love to use some of the new capability and I'm sure many others would as well. Are there any plans for an imminent release? Is there any work that needs to be done to get to a release? (Can I help?)
FYI here's a list:
Lately we used LHM for some bigger migrations on a big (22 million rows) database table with a high load. During that migration (which took 20 minutes with the default stride size and throttle time) we had several deadlocks that occurred because the app tried to update table rows that were currently being copied in a chunk.
Previously we used the Percona Toolkit, where we didn't have this problem while migrating. After a bit of investigating, we figured out that they do something really smart: they calculate the size of the chunk being copied dynamically, to always stay within a given time (`--chunk_size_time`). I found a blog article about that here: http://www.xaprb.com/blog/2012/05/06/how-percona-toolkit-divides-tables-into-chunks/.
Of course we can estimate (or try out different values for) the appropriate `stride` value for the throttler, depending on the write load on that table. But this would be a bit cumbersome and has to be done per table we want to migrate.
So I would suggest updating the throttler (or creating a dynamic throttler) so it calculates the size of the next chunk to copy depending on the runtime of the last one.
What do you think?
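A sketch of what such a percona-style dynamic stride could look like. Class and method names are hypothetical, not LHM API:

```ruby
# Hypothetical dynamic throttler: scale the next stride so that each
# chunk copy takes roughly `target_time` seconds, clamped to sane bounds.
class DynamicStride
  def initialize(stride: 40_000, target_time: 0.5, min: 1_000, max: 200_000)
    @stride, @target_time, @min, @max = stride, target_time, min, max
  end

  attr_reader :stride

  # Feed back the duration (in seconds) of the last chunk copy;
  # returns the stride to use for the next chunk.
  def adjust!(last_duration)
    scaled = (@stride * @target_time / last_duration).to_i
    @stride = scaled.clamp(@min, @max)
  end
end
```

If a chunk took 2s against a 0.5s budget, the next stride shrinks to a quarter; a fast chunk grows it again, so the copier converges on the largest chunk the server can absorb within the time budget.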
This is a question not an issue, but I wasn't sure the best place to ask.
After a migration (using lhm) has run does the rails server require a restart to clear the ActiveRecord Cache? I originally thought it did but I ran a few quick tests and am now not sure.
Example
User model
class User < ActiveRecord::Base
...
def get_name
"TEST - #{self.name}"
end
end
Before Migration
> User.first
#<User id: 1>
> User.first.name
undefined method `name' for #<User:...>
> User.first.get_name
undefined method `name' for #<User:...>
Run a migration to add name column to user model
After Migration
> User.first
#<User id: 1>
> User.first.name
nil
> User.first.get_name
=> "TEST - "
Hi,
Until today, I was (happily) using large-hadron-migrator v0.2.1 and running my deployments with Capistrano.
Each migration with LHM was very verbose, but it was also very useful for knowing the progression.
A couple of hours ago, I noticed that you released lhm v1.1.0, and I've started using it.
It seems better, but I don't have any output during the migration.
Since my migrations can take hours, I'd like to know what the progression is.
Is there a way to activate a logging level (for future migrations), or a way to see the progress of a running migration?
Thanks a lot for your help and for making/maintaining such a great tool.
Is anyone else experiencing failing tests on master as of now?
I am seeing 2 failing tests:
This is a snippet of the error output:
Mysql::Error: Table 'addscolumn' already exists: rename table new_addscolumn to addscolumn, addscolumn to 2011_06_12_16_51_51_3N_addscolumn
Looks like the Hash order changes when passed into the rename_tables method.
I am running Ruby 1.8.7, and running `spec spec` in my terminal window.
Hello,
we migrated from a RDS-backed MySQL 5.5 to a newer generation RDS MySQL 5.6, and the first attempt to add an index using lhm gives us this error:
I, [2014-10-20T13:56:11.100569 #8934] INFO -- : Starting LHM run on table=lhmn_pushes
I, [2014-10-20T13:56:11.102629 #8934] INFO -- : Starting run of class=Lhm::Migrator
I, [2014-10-20T13:56:11.344552 #8934] INFO -- : Starting run of class=Lhm::Entangler
E, [2014-10-20T13:56:11.350324 #8934] ERROR -- : Error in class=Lhm::Entangler, reverting. exception=ActiveRecord::StatementInvalid message=Mysql2::Error: You do not have the SUPER privilege and binary logging is enabled (you *might* want to use the less safe log_bin_trust_function_creators variable): create trigger `lhmt_del_pushes`
after delete on `pushes` for each row
delete ignore from `lhmn_pushes` /* large hadron migration */
where `lhmn_pushes`.`id` = OLD.`id` /* large hadron migration */
rake aborted!
An error has occurred, all later migrations canceled:
Mysql2::Error: You do not have the SUPER privilege and binary logging is enabled (you *might* want to use the less safe log_bin_trust_function_creators variable): drop trigger if exists `lhmt_del_pushes` /* large hadron migration */
After discussing this with my client's sysadmin, SUPER cannot be used on RDS apparently, and binary logging is always on for point in time recovery.
This is also the first time we use LHM to add an index (previously it was just for columns).
Is the error message reliable here?
Did anyone meet this issue and can it be solved?
thanks!
Tables that support deletions could have large gaps between ids and have large max id values, despite not having as many rows.
Lhm currently does not detect whether the block that it is copying is empty or not, which slows down the migration significantly because the delay is still incurred.
Just wondering if this feature is in the pipeline to be supported in the next release? I'm hoping to adopt lhm as a solution for large table migrations and it'll be awesome to have this feature.
Thanks
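A sketch of what empty-range detection could look like. Here `ids` is an in-memory stand-in for a `SELECT 1 ... WHERE id BETWEEN ? AND ? LIMIT 1` existence check against the origin table; the function name is hypothetical:

```ruby
# Walk the id range in blocks of `stride`, keeping only the blocks that
# actually contain rows. Empty blocks are skipped entirely, so neither
# the INSERT nor the throttle delay is incurred for them.
def chunks_to_copy(ids, stride:)
  min, max = ids.minmax
  (min..max).step(stride).filter_map do |bottom|
    top = [bottom + stride - 1, max].min
    [bottom, top] if ids.any? { |id| id.between?(bottom, top) }
  end
end
```

For a table with ids 1, 2, and 1,000,001, only two chunks survive instead of ten thousand, which is exactly the win described above for heavily deleted tables.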
Curious if anyone else has experienced a deadlock during an LHM migration where, for instance, on a busy table Transaction1 (`replace into ...`) is waiting for Transaction2 (`insert into ...`) with lock mode AUTO-INC, and the transactions get rolled back?
In the past we've seen this once, and the most obvious answer seems to be that it's just an auto-increment lock deadlock. However, in most cases you're copying over the `id` field from table to table, which is typically an auto-increment field.
To fix this, one could remove the auto-increment from that field before the LHM run and add it back to the new table's field right before the table switch. Another option would be to set `innodb_autoinc_lock_mode = 2` in MySQL temporarily until the table switch.
My question is, would this be a good configurable option for LHM, since it applies to a typical database setup?
Does it work with any master/slave adapters that work with ActiveRecord? I see the mention about https://github.com/d11wtq/dm-master-slave-adapter which works with DataMapper.
I wanted to know if it works with https://github.com/taskrabbit/makara or something similar.
I am sorry to ask this question as an issue! I couldn't find a mailing list where I could ask questions. If there's one could anyone let me know about it please?
I have a large table that gets regularly archived, so the IDs don't start at 1. This makes the percentage complete LHM reports way off.
jasons-mbp:panda jason$ rake db:migrate
SESSION STORE: active_record_store
CACHE STORE: file_store
== AuditedChanges: migrating =================================================
-- index_exists?(:audits, [:association_id, :association_type], {:name=>"association_index"})
-> 0.0024s
I, [2015-12-14T09:43:14.193077 #6202] INFO -- : Starting LHM run on table=lhmn_audits
I, [2015-12-14T09:43:14.195034 #6202] INFO -- : Starting run of class=Lhm::Migrator
I, [2015-12-14T09:43:14.296403 #6202] INFO -- : Starting run of class=Lhm::Entangler
I, [2015-12-14T09:43:14.392986 #6202] INFO -- : Starting run of class=Lhm::Chunker
I, [2015-12-14T09:43:14.395018 #6202] INFO -- : Starting run of class=Lhm::Throttler::Time
89.22% (332992893/373246177) completeI, [2015-12-14T09:43:16.593463 #6202] INFO -- : Starting run of class=Lhm::Throttler::Time
89.23% (333032893/373246177) completeI, [2015-12-14T09:43:18.900360 #6202] INFO -- : Starting run of class=Lhm::Throttler::Time
89.24% (333072893/373246177) completeI, [2015-12-14T09:43:21.189020 #6202] INFO -- : Starting run of class=Lhm::Throttler::Time
89.25% (333112893/373246177) completeI, [2015-12-14T09:43:23.605645 #6202] INFO -- : Starting run of class=Lhm::Throttler::Time
I realize that the percentage complete will not be perfect if there are missing IDs, but to make it closer to perfect, perhaps check what the first ID is in the table and calculate based on that rather than assuming 1?
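A sketch of the suggested fix, assuming `lowest` comes from a `SELECT MIN(id)` and `highest` from a `SELECT MAX(id)` on the table (the function name is hypothetical):

```ruby
# Report progress against the table's actual id range instead of
# assuming ids start at 1. Still imperfect with gaps, but it no longer
# starts at a huge percentage for archived tables.
def percent_complete(current:, lowest:, highest:)
  return 100.0 if highest == lowest   # single-row (or single-id) table
  100.0 * (current - lowest) / (highest - lowest)
end
```

With the archived table above, progress would start near 0% at the first surviving id rather than at 89%.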
See http://dev.mysql.com/doc/innodb/1.1/en/innodb-other-changes-innodb_lock_wait_timeout.html -- before v5.5, you can't set this parameter without restarting the server.
Since `Invoker#set_session_lock_wait_timeouts` is not documented, I'm not sure what the consequences of skipping this step are, but I commented out this line -- https://github.com/soundcloud/lhm/blob/58317e374dbf7f799ec592521307e8c4ae0d277b/lib/lhm/invoker.rb#L42 -- and it seems to be working on a DB table with ~1 million records. Maybe it's worth checking the MySQL version and skipping this step (and logging a warning?) if it's an older version. At the very least, please provide some documentation explaining the purpose of `Invoker#set_session_lock_wait_timeouts`.
Docs:
Lhm chooses the strategy automatically based on the used MySQL server version, but you can override the behavior with an option.
Reality:
Using MySQL 5.1.63-rel13.4-log, you must explicitly set `options[:atomic_switch]` (re `SqlHelper#supports_atomic_switch?`).
Occasionally I have a large-ish table copied in my dev db and kick off an LHM migration, but I don't want to take the minutes/hours it takes to run in my dev environment. In these cases, I generally kill the migration, `TRUNCATE` the table, and kick the migration off again, but sometimes I need to clean up the `lhmn_*` and `lhma_*` tables created by LHM before I can do so. I wrote this Rake task to do just that. Just thought I'd check to see if there's any interest in including this or something similar, either in LHM itself or even just in the README. If there is interest in either, I'm happy to work up a pull request. If not, no sweat. :)
namespace :db do
  desc "Cleanup after botched LHM migrations"
  task :cleanup_lhm => :environment do
    db = ActiveRecord::Base.connection
    tables = []
    # `_` is a single-character wildcard in LIKE patterns, so escape the
    # trailing underscore to match the prefix literally.
    %w(lhmn lhma).each do |prefix|
      tables << db.execute("SHOW TABLES LIKE '#{prefix}\\_%'").to_a
    end
    tables.flatten!
    if tables.any?
      puts "Dropping #{tables.join(', ')}"
      tables.each do |table|
        db.execute("DROP TABLE #{table}")
      end
    else
      puts "No LHM tables found to clean up!"
    end
  end
end
Any plans to add Rails 3 support?
Is there any obvious way to throttle I/O?
AWS instances are not very good at I/O, and running an LHM migration pretty much takes all of it.
I was wondering if it would be possible to create a new release of this gem. It seems like there have been some things merged, but never released (like this #39).
Could someone tell me the correct syntax for renaming a table using Lhm? When I do something like this I get an undefined method error.
Lhm.change_table :foo do |t|
t.rename_table :bar
end
Has anyone else had a need for a verification step? An assurance that the migration has completed successfully, e.g.:
Lhm.change_table :my_table do |table|
table.ddl "ALTER TABLE `#{table.name}` DEFAULT CHARACTER SET utf8mb4, DEFAULT COLLATE utf8mb4_unicode_ci"
verification do |table|
result = table.query "SELECT count(*) FROM #{table.name} as new_table JOIN #{table.name} as old_table USING (id) WHERE new_table.name != old_table.name"
result.first[0] == 0
end
end
The verification block would be executed after the Chunker and before the Switcher. If it evaluated to false the change would be rolled back.
Does anyone else have a need for this and what strategies do you currently use? Does a feature like this fit into this gem?
I don't think it makes sense to always check unique & foreign key constraints within the background copy operation, since all the inserts are done against the primary key. Copied data would already be consistent in the original table.
Something like this could speed things up, I'm seeing an improvement in my use case.
module Lhm
class Chunker
module DisableKeyChecks
def without_key_checks
connection.execute(
"SET @old_unique_checks = @@unique_checks, " \
"@old_foreign_key_checks = @@foreign_key_checks, " \
"unique_checks = 0, " \
"foreign_key_checks = 0;"
)
yield
ensure
connection.execute(
"SET unique_checks = @old_unique_checks, " \
"foreign_key_checks = @old_foreign_key_checks;"
)
end
def execute_without_key_checks
without_key_checks do
execute_with_key_checks
end
end
def self.included(m)
m.class_eval do
alias_method :execute_with_key_checks, :execute
alias_method :execute, :execute_without_key_checks
end
end
end
include DisableKeyChecks
end
end
This could be enabled with an option, or when it can be proven safe (when the migration doesn't add any new constraints). Thoughts?
While copying data to the table, I experienced a deadlock that aborted the migration halfway through.
I'm guessing deadlocks happen if the INSERT statement and a trigger both try to touch the same row. One possible solution: a `deadlock_retries: 1` option to handle it when the query gets dropped?
I am using Lhm to alter a table, but the existing triggers don't get copied when the lhm table is renamed to the old table name.
Consider this: I have a table with a trigger 'x'. LHM adds three more triggers (for create, update, and delete), and when the copying of the table is complete it drops its own triggers and renames the lhm table to the old table name, but after the rename, trigger 'x' is missing.
Since LHM tasks are the ones creating lhma_ tables when renaming the old tables during an online schema change, it would be nice if the same code provided an option to drop those that were created 'n' days ago.
Something that could be run like:
bundle exec rake lhm:clean --older_than=10
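Since the archive name embeds its creation timestamp, such a task could decide staleness by parsing the table name alone. A sketch, with the format inferred from names like `lhma_2013_06_14_12_33_29_347_visitor_offers` seen earlier in this thread (the helper name is hypothetical):

```ruby
# Archive tables appear to be named lhma_<YYYY_MM_DD_hh_mm_ss>_<ms>_<origin>.
LHMA_NAME = /\Alhma_(\d{4})_(\d{2})_(\d{2})_(\d{2})_(\d{2})_(\d{2})_\d+_/

# True when the archive table encoded in `table_name` was created more
# than `days` days before `now`; non-archive tables are never stale.
def stale_archive?(table_name, now: Time.now, days: 10)
  m = LHMA_NAME.match(table_name)
  return false unless m
  created = Time.new(*m.captures.map(&:to_i))
  (now - created) > days * 86_400
end
```

The rake task would then `SHOW TABLES`, filter with `stale_archive?`, and drop the survivors.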
Hi! Are there any plans for a new versioned release that includes the slave lag throttler that's currently on master?