
MoSQL: a MongoDB → SQL streaming translator

MoSQL is no longer being actively maintained. If you are interested in helping maintain this repository, please let us know. We would love for it to find a forever home with someone who can give it the love it needs!

At Stripe, we love MongoDB. We love the flexibility it gives us in changing data schemas as we grow and learn, and we love its operational properties. We love replsets. We love the uniform query language that doesn't require generating and parsing strings, tracking placeholder parameters, or any of that nonsense.

The thing is, we also love SQL. We love the ease of doing ad-hoc data analysis over small-to-mid-size datasets in SQL. We love doing JOINs to pull together reports summarizing properties across multiple datasets. We love the fact that virtually every employee we hire already knows SQL and is comfortable using it to ask and answer questions about data.

So, we thought, why can't we have the best of both worlds? Thus: MoSQL.

MoSQL: Put Mo' SQL in your NoSQL

MoSQL imports the contents of your MongoDB database cluster into a PostgreSQL instance, using an oplog tailer to keep the SQL mirror up-to-date. This lets you run production services against a MongoDB database, and then run offline analytics or reporting using the full power of SQL.

Installation

Install from Rubygems as:

$ gem install mosql

Or build from source by:

$ gem build mosql.gemspec

And then install the built gem.

The Collection Map file

In order to define a SQL schema and import your data, MoSQL needs a collection map file describing the schema of your MongoDB data. (Don't worry -- MoSQL can handle it if your Mongo data doesn't always exactly fit the stated schema. More on that later.)

The collection map is a YAML file describing the databases and collections in Mongo that you want to import, in terms of their SQL types. An example collection map might be:

mongodb:
  blog_posts:
    :columns:
    - id:
      :source: _id
      :type: TEXT
    - author_name:
      :source: author.name
      :type: TEXT
    - author_bio:
      :source: author.bio
      :type: TEXT
    - title: TEXT
    - created: DOUBLE PRECISION
    :meta:
      :table: blog_posts
      :extra_props: true

Said another way, the collection map is a YAML file containing a hash mapping

<Mongo DB name> -> { <Mongo Collection Name> -> <Collection Definition> }

Where a <Collection Definition> is a hash with :columns and :meta fields.

:columns is a list of hashes mapping SQL column names to a hash describing that column. This hash may contain the following fields:

  • :source: The name of the attribute inside of MongoDB.
  • :type: (Mandatory) The SQL type.

Use of the :source attribute allows for renaming attributes, and extracting elements of a nested hash using MongoDB's dot notation. In the above example, the name and bio fields of the author sub-document will be expanded, and the MongoDB _id field will be mapped to an SQL id column.

At present, MoSQL does not support using the dot notation to access elements inside arrays.
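
To make the extraction rule concrete, here is a minimal Python sketch (illustrative only -- MoSQL itself is written in Ruby, and the names below are hypothetical) of how a dotted :source path walks a nested document:

```python
# Illustrative sketch (not MoSQL's actual Ruby code) of how a dotted
# :source path like "author.name" is resolved against a nested document.

def extract_source(doc, source):
    """Walk a nested document along a dotted path such as 'author.name'."""
    value = doc
    for key in source.split("."):
        if not isinstance(value, dict):
            return None  # path runs through a non-document (e.g. an array)
        value = value.get(key)
    return value

doc = {
    "_id": "507f191e810c19729de860ea",
    "author": {"name": "A. Writer", "bio": "Writes things."},
    "title": "Hello",
}

row = {
    "id": extract_source(doc, "_id"),
    "author_name": extract_source(doc, "author.name"),  # -> "A. Writer"
    "title": extract_source(doc, "title"),
}
```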

As a shorthand, you can specify a one-element hash of the form name: TYPE, in which case name will be used for both the source attribute and the name of the destination column. You can see this shorthand for the title and created attributes, above.
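
For example, the following two column specifications describe the same mapping (a sketch in the same collection-map syntax as the example above):

```yaml
:columns:
# shorthand:
- title: TEXT
# equivalent expanded form:
- title:
  :source: title
  :type: TEXT
```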

Every defined collection must include a mapping for the _id attribute.

:meta contains metadata about this collection/table. It is required to include at least :table, naming the SQL table this collection will be mapped to. :extra_props determines the handling of unknown fields in MongoDB objects -- more about that later.

By default, mosql looks for a collection map in a file named collections.yml in your current working directory, but you can specify a different one with -c or --collections.

Usage

Once you have a collection map, MoSQL usage is easy. The basic form is:

mosql [-c collections.yml] [--sql postgres://sql-server/sql-db] [--mongo mongodb://mongo-uri]

By default, mosql connects to both PostgreSQL and MongoDB instances running on default ports on localhost without authentication. You can point it at different targets using the --sql and --mongo command-line parameters.

mosql will:

  1. Create the appropriate SQL tables
  2. Import data from the Mongo database
  3. Start tailing the mongo oplog, propagating changes from MongoDB to SQL.

After the first run, mosql will store the state of the oplog tailer in the mosql_tailers table in your SQL database, and automatically resume where it left off. mosql uses the replset name to keep track of which mongo database it's tailing, so that you can tail multiple databases into the same SQL database. If for some reason you want to tail the same replset more than once, or tail multiple replsets with the same name, you can use the --service flag to change the name mosql uses to track state.

You likely want to run mosql against a secondary node, at least for the initial import, which will cause large amounts of disk activity on the target node. One option is to specify this in your connect URI:

mosql --mongo mongodb://node1,node2,node3?readPreference=secondary

(Note that this requires mongo-ruby-driver version 1.8.3 or later.)

Advanced usage

For advanced scenarios, you can pass options to control mosql's behavior. If you pass --skip-tail, mosql will do the initial import, but not tail the oplog. This could be used, for example, to do an import off of a backup snapshot, and then start the tailer on the live cluster. This can also be useful for hosted services where you do not have access to the oplog.

If you need to force a fresh reimport, pass --reimport, which will cause mosql to drop its tables, create them anew, and do another import.

Normally, MoSQL will scan through a list of the databases on the mongo server you connect to. You can avoid this behavior by specifying a specific mongo db to connect to with the --only-db [dbname] option. This is useful for hosted services which do not let you list all databases (via the listDatabases command).

Schema mismatches and _extra_props

If MoSQL encounters values in the MongoDB database that don't fit within the stated schema (e.g. a floating-point value in an INTEGER field), it will log a warning, ignore the entire object, and continue.

If it encounters a MongoDB object with fields not listed in the collection map, it will discard the extra fields, unless :extra_props is set in the :meta hash. If it is, MoSQL will collect the extra fields, JSON-encode them in a hash, and store the resulting text in an _extra_props column in SQL. You can set :extra_props to use JSON, JSONB, or TEXT.
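
A rough Python sketch of that split (the names here are hypothetical, not MoSQL's internals):

```python
import json

# Illustrative sketch (hypothetical names, not MoSQL's internals) of how a
# document splits into mapped columns plus an _extra_props JSON blob.

def split_row(doc, mapped_fields):
    row = {f: doc[f] for f in mapped_fields if f in doc}
    extra = {k: v for k, v in doc.items() if k not in mapped_fields}
    row["_extra_props"] = json.dumps(extra)
    return row

doc = {"_id": "abc", "title": "Hello", "tags": ["a", "b"], "draft": True}
row = split_row(doc, mapped_fields={"_id", "title"})
# row["_extra_props"] == '{"tags": ["a", "b"], "draft": true}'
```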

As of PostgreSQL 9.3, you can declare columns as type "JSON" and use the native JSON support to inspect inside of JSON-encoded types. In earlier versions, you can write code in an extension language, such as plv8.
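
For instance, once _extra_props holds JSON text, you can query inside it with the database's JSON functions. The sketch below uses Python's bundled sqlite3 and its json_extract function purely so the example is self-contained; in PostgreSQL 9.3+ the analogous query would use its JSON operators, e.g. SELECT _extra_props::json ->> 'draft' FROM blog_posts.

```python
import sqlite3

# Illustrative only: demonstrates querying inside a JSON-encoded
# _extra_props column. SQLite's json_extract stands in for PostgreSQL's
# JSON operators so this sketch runs without a Postgres server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blog_posts (id TEXT, _extra_props TEXT)")
conn.execute(
    "INSERT INTO blog_posts VALUES (?, ?)",
    ("1", '{"tags": ["mongo", "sql"], "draft": false}'),
)

# Pull the first tag out of the JSON blob.
first_tag = conn.execute(
    "SELECT json_extract(_extra_props, '$.tags[0]') FROM blog_posts"
).fetchone()[0]
# first_tag == "mongo"
```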

Non-scalar types

MoSQL supports array types, using syntax like INTEGER ARRAY for the :type. This causes MoSQL to create the column as an array type in PostgreSQL and to insert appropriately formatted array values.

Fields with hash values, or array values that are not in an ARRAY-typed column, will be transformed into JSON TEXT strings before being inserted into PostgreSQL.
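
Said as a sketch (hypothetical names, not MoSQL's code), the rule for non-scalar values looks like:

```python
import json

# Illustrative sketch of the rule above: lists headed for an ARRAY-typed
# column pass through; any other hash or array value is serialized to a
# JSON TEXT string. Function name is hypothetical.

def transform_value(value, sql_type):
    if isinstance(value, list) and sql_type.upper().endswith("ARRAY"):
        return value  # inserted as a native PostgreSQL array
    if isinstance(value, (list, dict)):
        return json.dumps(value)  # becomes a JSON TEXT string
    return value  # scalars pass through unchanged

print(transform_value([1, 2, 3], "INTEGER ARRAY"))  # native array
print(transform_value([1, 2, 3], "TEXT"))           # JSON text
print(transform_value({"a": 1}, "TEXT"))            # JSON text
```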

Authentication

At present, in order to use MoSQL with a MongoDB instance requiring authentication, you must:

  • Have a user with access to the admin database.
  • Specify the admin database in the --mongo argument.
  • Specify the username and password in the --mongo argument.

e.g.

mosql --mongo mongodb://$USER:$PASSWORD@$HOST/admin

In order to use MongoDB 2.4's "roles" support (which is different from that in 2.6), you need to create the user in the admin database, give it explicit read access to the databases you want to copy and to the local database, and specify authSource in the URL. e.g., connect to the admin database with the mongo shell and run:

> db.addUser({user: "replicator", pwd: "PASSWORD", roles: [], otherDBRoles: {local: ["read"], sourceDb: ["read"]}})

(Note that roles: [] ensures that this user has no special access to the admin database.) Now specify:

mosql --mongo mongodb://$USER:$PASSWORD@$HOST/sourceDb?authSource=admin

I have not yet tested using MoSQL with 2.6's rewritten "roles" support. Drop me a note if you figure out anything I should know.

Sharded clusters

MoSQL does not have special support for sharded Mongo clusters at this time. It should be possible to run a separate MoSQL instance against each of the individual backend shard replica sets, streaming into separate PostgreSQL instances, but we have not actually tested this yet.

Development

Patches and contributions are welcome! Please fork the project and open a pull request on GitHub, or just report issues.

MoSQL includes a small but hopefully-growing test suite. It assumes a running PostgreSQL and MongoDB instance on the local host. To run the test suite, first install all of MoSQL's dependencies:

bundle install

Then, run the tests:

rake test

You can also point the suite at a different target via environment variables; see test/functional/_lib.rb for more information.

mosql's People

Contributors

andrewjshults, dpatti, ebroder, evan-stripe, fedot, flavio, glasser, jperichon, mark-roggenkamp-snapav, nelhage, nelhage-stripe, tomeara, toothrot


mosql's Issues

Symbols in fields being treated as columns

If a field value in Mongo is a Ruby symbol, mosql tries to create it as a column.

example:

Have a field in Mongo named role that is set to :deactivated

[COPPER] column "deactivated" does not exist
PG::Error: ERROR: column "deactivated" does not exist

Mongd DB 'local' not found in config file. Skipping.

Hi all! I am new to Mongodb and am struggling porting over my database to Postgresql. I run the "mosql --collections collections.yaml --skip-tail" command, a table gets created, and then no data seems to get posted. The error is:

INFO MoSQL: Mongd DB 'local' not found in config file. Skipping.
INFO MoSQL: Mongd DB 'admin' not found in config file. Skipping.

I have found the config file, what am I supposed to add in?

Thanks!

Collections with names having "." in them

If a collection has a name like ss.country, we get the "No mapping for namespace" exception. Is this a bug, or is there a way to work with such collection names?

Example stack trace is as follows.
My mongo db is test and collection name is ss.country.

DEBUG MoSQL: No mapping for ns: test.ss.country
/usr/local/Cellar/ruby/2.1.2/lib/ruby/gems/2.1.0/gems/mosql-0.3.1/lib/mosql/schema.rb:120:in `find_ns!': No mapping for namespace: test.ss.country (MoSQL::SchemaError)
    from /usr/local/Cellar/ruby/2.1.2/lib/ruby/gems/2.1.0/gems/mosql-0.3.1/lib/mosql/schema.rb:255:in `table_for_ns'
    from /usr/local/Cellar/ruby/2.1.2/lib/ruby/gems/2.1.0/gems/mosql-0.3.1/lib/mosql/sql.rb:25:in `table_for_ns'
    from /usr/local/Cellar/ruby/2.1.2/lib/ruby/gems/2.1.0/gems/mosql-0.3.1/lib/mosql/streamer.rb:116:in `import_collection'
    from /usr/local/Cellar/ruby/2.1.2/lib/ruby/gems/2.1.0/gems/mosql-0.3.1/lib/mosql/streamer.rb:102:in `block (2 levels) in initial_import'
    from /usr/local/Cellar/ruby/2.1.2/lib/ruby/gems/2.1.0/gems/mosql-0.3.1/lib/mosql/streamer.rb:100:in `each'
    from /usr/local/Cellar/ruby/2.1.2/lib/ruby/gems/2.1.0/gems/mosql-0.3.1/lib/mosql/streamer.rb:100:in `block in initial_import'
    from /usr/local/Cellar/ruby/2.1.2/lib/ruby/gems/2.1.0/gems/mosql-0.3.1/lib/mosql/streamer.rb:96:in `each'
    from /usr/local/Cellar/ruby/2.1.2/lib/ruby/gems/2.1.0/gems/mosql-0.3.1/lib/mosql/streamer.rb:96:in `initial_import'
    from /usr/local/Cellar/ruby/2.1.2/lib/ruby/gems/2.1.0/gems/mosql-0.3.1/lib/mosql/streamer.rb:28:in `import'
    from /usr/local/Cellar/ruby/2.1.2/lib/ruby/gems/2.1.0/gems/mosql-0.3.1/lib/mosql/cli.rb:158:in `run'
    from /usr/local/Cellar/ruby/2.1.2/lib/ruby/gems/2.1.0/gems/mosql-0.3.1/lib/mosql/cli.rb:16:in `run'
    from /Users/goutamdan/ruby_test/mosql-master/bin/mosql:5:in `<top (required)>'
    from -e:1:in `load'
    from -e:1:in `<main>'

Duplicating records and tailing the oplog?

  1. Why if I run the command in this way

    mosql -c comments.yaml --mongo mongodb://django:django@localhost:27017/ --sql postgres://django:django@localhost/garage -v -v -v --no-drop-tables

    It quits when it's done importing?
    Should it not run forever reading the oplog?

    Is it not possible to let it run forever tailing the oplog (until stopped)?

  2. If I run the migration with a schema as below multiple times, my records get duplicated.

    This may be because I am not specifying an ID, so I'm letting Postgres create new records with new ids.
    But is it not possible to make it create a new record only if one doesn't already exist?

    I just want to make sure that if I have to run it twice I don't get duplicate data..

    alexandria:
      searchable_comment:
        :columns:
        - text: TEXT
        - status: TEXT
        - author_id:
          :source: user.user_id
          :type: INT
        - product_id: INT
        - date: TIMESTAMP

        :meta:
          :table: dp_comments_comment

Thanks

Bug in version on master?

Hi everyone again, I wanted to use the fix done for the date and milliseconds so I installed mosql from github using this command:

gem specific_install -l https://github.com/stripe/mosql

After I do that and try to run it again however I get the following error.
Is that something wrong I've done or a bug?
Thanks

D, [2014-08-22T12:26:39.830805 #13367] DEBUG -- : (0.001841s) SET standard_conforming_strings = ON
D, [2014-08-22T12:26:39.832235 #13367] DEBUG -- : (0.001114s) SET client_min_messages = 'WARNING'
D, [2014-08-22T12:26:39.833352 #13367] DEBUG -- : (0.000774s) SET DateStyle = 'ISO'
D, [2014-08-22T12:26:39.838507 #13367] DEBUG -- : (0.003212s) CREATE TABLE IF NOT EXISTS "mosql_tailers" ("service" TEXT, "timestamp" INTEGER, PRIMARY KEY ("service"))
D, [2014-08-22T12:26:39.844902 #13367] DEBUG -- : (0.005652s) SELECT ("timestamp") FROM "mosql_tailers" WHERE ("service" = 'mosql') LIMIT 1
 INFO MoSQL: Creating table 'dp_comments_comment'...
E, [2014-08-22T12:26:39.847054 #13367] ERROR -- : PG::SyntaxError: ERROR:  syntax error at or near "NULL"
LINE 1: ...INT, "product_id" INT, "date" TIMESTAMP, PRIMARY KEY (NULL))
                                                                 ^: CREATE TABLE IF NOT EXISTS "dp_comments_comment" ("text" TEXT, "status" TEXT, "author_id" INT, "product_id" INT, "date" TIMESTAMP, PRIMARY KEY (NULL))
/var/lib/gems/2.0.0/gems/sequel-4.13.0/lib/sequel/adapters/postgres.rb:161:in `async_exec': PG::SyntaxError: ERROR:  syntax error at or near "NULL" (Sequel::DatabaseError)
LINE 1: ...INT, "product_id" INT, "date" TIMESTAMP, PRIMARY KEY (NULL))
                                                                 ^
    from /var/lib/gems/2.0.0/gems/sequel-4.13.0/lib/sequel/adapters/postgres.rb:161:in `block in execute_query'
    from /var/lib/gems/2.0.0/gems/sequel-4.13.0/lib/sequel/database/logging.rb:37:in `log_yield'
    from /var/lib/gems/2.0.0/gems/sequel-4.13.0/lib/sequel/adapters/postgres.rb:161:in `execute_query'
    from /var/lib/gems/2.0.0/gems/sequel-4.13.0/lib/sequel/adapters/postgres.rb:148:in `block in execute'
    from /var/lib/gems/2.0.0/gems/sequel-4.13.0/lib/sequel/adapters/postgres.rb:124:in `check_disconnect_errors'
    from /var/lib/gems/2.0.0/gems/sequel-4.13.0/lib/sequel/adapters/postgres.rb:148:in `execute'
    from /var/lib/gems/2.0.0/gems/sequel-4.13.0/lib/sequel/adapters/postgres.rb:492:in `_execute'
    from /var/lib/gems/2.0.0/gems/sequel-4.13.0/lib/sequel/adapters/postgres.rb:316:in `block (2 levels) in execute'
    from /var/lib/gems/2.0.0/gems/sequel-4.13.0/lib/sequel/adapters/postgres.rb:513:in `check_database_errors'
    from /var/lib/gems/2.0.0/gems/sequel-4.13.0/lib/sequel/adapters/postgres.rb:316:in `block in execute'
    from /var/lib/gems/2.0.0/gems/sequel-4.13.0/lib/sequel/database/connecting.rb:250:in `block in synchronize'
    from /var/lib/gems/2.0.0/gems/sequel-4.13.0/lib/sequel/connection_pool/threaded.rb:104:in `hold'
    from /var/lib/gems/2.0.0/gems/sequel-4.13.0/lib/sequel/database/connecting.rb:250:in `synchronize'
    from /var/lib/gems/2.0.0/gems/sequel-4.13.0/lib/sequel/adapters/postgres.rb:316:in `execute'
    from /var/lib/gems/2.0.0/gems/sequel-4.13.0/lib/sequel/database/query.rb:50:in `execute_dui'
    from /var/lib/gems/2.0.0/gems/sequel-4.13.0/lib/sequel/database/query.rb:43:in `execute_ddl'
    from /var/lib/gems/2.0.0/gems/sequel-4.13.0/lib/sequel/database/schema_methods.rb:632:in `create_table_from_generator'
    from /var/lib/gems/2.0.0/gems/sequel-4.13.0/lib/sequel/database/schema_methods.rb:188:in `create_table'
    from /var/lib/gems/2.0.0/gems/sequel-4.13.0/lib/sequel/database/schema_methods.rb:212:in `create_table?'
    from /var/lib/gems/2.0.0/gems/mosql-0.3.2/lib/mosql/schema.rb:83:in `block (2 levels) in create_schema'
    from /var/lib/gems/2.0.0/gems/mosql-0.3.2/lib/mosql/schema.rb:77:in `each'
    from /var/lib/gems/2.0.0/gems/mosql-0.3.2/lib/mosql/schema.rb:77:in `block in create_schema'
    from /var/lib/gems/2.0.0/gems/mosql-0.3.2/lib/mosql/schema.rb:76:in `each'
    from /var/lib/gems/2.0.0/gems/mosql-0.3.2/lib/mosql/schema.rb:76:in `create_schema'
    from /var/lib/gems/2.0.0/gems/mosql-0.3.2/lib/mosql/streamer.rb:90:in `initial_import'
    from /var/lib/gems/2.0.0/gems/mosql-0.3.2/lib/mosql/streamer.rb:28:in `import'
    from /var/lib/gems/2.0.0/gems/mosql-0.3.2/lib/mosql/cli.rb:162:in `run'
    from /var/lib/gems/2.0.0/gems/mosql-0.3.2/lib/mosql/cli.rb:16:in `run'
    from /var/lib/gems/2.0.0/gems/mosql-0.3.2/bin/mosql:5:in `<top (required)>'
    from /usr/local/bin/mosql:23:in `load'
    from /usr/local/bin/mosql:23:in `<main>'

Unsupported Signal error - not working on windows?

From what I have seen trying to fix this issue on my end, SIGUSR2 is not supported on the Windows platform -- or am I having some other noob-related problem?

ruby 1.9.3p385 (2013-02-06) [i386-mingw32]

C:/Ruby193/lib/ruby/gems/1.9.1/gems/mosql-0.1.0/lib/mosql/cli.rb:28:in `trap': unsupported signal SIGUSR2 (ArgumentError)
        from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mosql-0.1.0/lib/mosql/cli.rb:28:in `block in setup_signal_handlers'
        from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mosql-0.1.0/lib/mosql/cli.rb:27:in `each'
        from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mosql-0.1.0/lib/mosql/cli.rb:27:in `setup_signal_handlers'
        from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mosql-0.1.0/lib/mosql/cli.rb:23:in `initialize'
        from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mosql-0.1.0/lib/mosql/cli.rb:15:in `new'
        from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mosql-0.1.0/lib/mosql/cli.rb:15:in `run'
        from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mosql-0.1.0/bin/mosql:7:in `<top (required)>'
        from C:/Ruby193/bin/mosql:23:in `load'
        from C:/Ruby193/bin/mosql:23:in `<main>'

hstore for _extra_props (instead of JSON)

I'm interested in implementing the hstore datatype for _extra_props in MoSQL.

Does anyone know if this has been worked on before?

Are there any foreseeable gotchas or roadblocks? I know that hstore only allows simple key/value storage (no nested "documents"), I'm not sure yet how I'll deal with that.

Reimport takes several hours

Great project, it's been super effective for us when exporting small amounts of data from mongo, but as we've added more data we're seeing it take several hours rather than several minutes. Has anyone hit a similar wall? It is a substantial amount of data, but maybe we're doing something wrong?

Fields  Objects
3       42K
8       1.4M
5       18K
10      180K
4       1.5K
5       13K

Creating relational tables

One extension that I am looking at working on is some sort of support for many-to-many relationships. Typically you would have a table that contains the primary keys of each of the relevant rows, but in MongoDB you might use an array within a document. So I was thinking about something like this:

my_blog:
  posts:
    :columns:
      - _id: VARCHAR(24)
      - title: VARCHAR(150)
    :meta:
      :table: posts
      :extra_props: true
    :related:
      user_post_likes:
        - _id: VARCHAR(24)
        - users_liked[]: VARCHAR(24)
      user_post_ratings:
        - _id: VARCHAR(24)
        - ratings[].user_id: VARCHAR(24)
        - ratings[].score: INT

So basically it creates two relational tables, and in the first it just reads an array of ObjectIDs or strings from users_liked, whereas the second reads an array of subdocs containing user_id and score attributes. The [] indicates that a new row should be inserted for each item in the array. I don't know much about how the oplog tailer works, but there could be massive problems that I'm overlooking.

Any thoughts or feedback?

Issue with non english postgreSql server and error checking

The type of exception is checked through a test on the message of the exception, like in sql.rb:

raise e unless e.message =~ /duplicate key value violates unique constraint/

The problem is that this message is localized. I had to patch two files (sql.rb and cli.rb) to change this string.

I'm not sure what the best way to solve this would be -- ideally a check on the exception type rather than the exception message.

Can't express ObjectId as a SQL literal for update ops

As mentioned in #8, it looks like another case is not being handled in updates.

Stack here:

DEBUG MoSQL: processing op: {"ts"=>seconds: 1360870248, increment: 2, "h"=>-5242060077508462325, "v"=>2, "op"=>"u", "ns"=>"user.favorites", "o2"=>{"_id"=>BSON::ObjectId('511d3b6677f7ebd6d1c9c2e6')}, "o"=>{"$set"=>{"userId"=>BSON::ObjectId('511d3a8677f7ebd6d1c9c257')}}}
DEBUG MoSQL: resync user.favorites: 511d3b6677f7ebd6d1c9c2e6 (update was: {"$set"=>{"userId"=>BSON::ObjectId('511d3a8677f7ebd6d1c9c257')}})
/var/lib/gems/1.9.1/gems/sequel-3.44.0/lib/sequel/dataset/sql.rb:1127:in `literal_other_append': can't express BSON::ObjectId('511d3b6677f7ebd6d1c9c2e6') as a SQL literal (Sequel::Error)
        from /var/lib/gems/1.9.1/gems/sequel-3.44.0/lib/sequel/dataset/sql.rb:117:in `literal_append'
        from /var/lib/gems/1.9.1/gems/sequel-3.44.0/lib/sequel/dataset/sql.rb:465:in `complex_expression_sql_append'
        from /var/lib/gems/1.9.1/gems/sequel-3.44.0/lib/sequel/adapters/shared/postgres.rb:1057:in `complex_expression_sql_append'
        from /var/lib/gems/1.9.1/gems/sequel-3.44.0/lib/sequel/sql.rb:92:in `to_s_append'
        from /var/lib/gems/1.9.1/gems/sequel-3.44.0/lib/sequel/dataset/sql.rb:1087:in `literal_expression_append'
        from /var/lib/gems/1.9.1/gems/sequel-3.44.0/lib/sequel/dataset/sql.rb:95:in `literal_append'
        from /var/lib/gems/1.9.1/gems/sequel-3.44.0/lib/sequel/dataset/sql.rb:1319:in `select_where_sql'
        from /var/lib/gems/1.9.1/gems/sequel-3.44.0/lib/sequel/dataset/sql.rb:835:in `block in clause_sql'
        from /var/lib/gems/1.9.1/gems/sequel-3.44.0/lib/sequel/dataset/sql.rb:835:in `each'
        from /var/lib/gems/1.9.1/gems/sequel-3.44.0/lib/sequel/dataset/sql.rb:835:in `clause_sql'
        from /var/lib/gems/1.9.1/gems/sequel-3.44.0/lib/sequel/dataset/sql.rb:15:in `delete_sql'
        from /var/lib/gems/1.9.1/gems/sequel-3.44.0/lib/sequel/dataset/actions.rb:131:in `delete'
        from /var/lib/gems/1.9.1/gems/mosql-0.1.0/lib/mosql/cli.rb:260:in `sync_object'
        from /var/lib/gems/1.9.1/gems/mosql-0.1.0/lib/mosql/cli.rb:293:in `handle_op'
        from /var/lib/gems/1.9.1/gems/mosql-0.1.0/lib/mosql/cli.rb:250:in `block in optail'
        from /var/lib/gems/1.9.1/gems/mongoriver-0.1.0/lib/mongoriver/abstract_persistent_tailer.rb:27:in `block in stream'
        from /var/lib/gems/1.9.1/gems/mongoriver-0.1.0/lib/mongoriver/tailer.rb:94:in `stream'
        from /var/lib/gems/1.9.1/gems/mongoriver-0.1.0/lib/mongoriver/abstract_persistent_tailer.rb:26:in `stream'
        from /var/lib/gems/1.9.1/gems/mosql-0.1.0/lib/mosql/cli.rb:249:in `optail'
        from /var/lib/gems/1.9.1/gems/mosql-0.1.0/lib/mosql/cli.rb:143:in `run'
        from /var/lib/gems/1.9.1/gems/mosql-0.1.0/lib/mosql/cli.rb:16:in `run'
        from /var/lib/gems/1.9.1/gems/mosql-0.1.0/bin/mosql:7:in `<top (required)>'
        from /usr/local/bin/mosql:19:in `load'
        from /usr/local/bin/mosql:19:in `<main>'

Awesome project by the way :)

Can't express BSON::ObjectId() as a SQL literal

I just finished my initial sync and am starting to tail the oplog, getting the following:

EBUG MoSQL: processing op: {"ts"=>seconds: 1360869334, increment: 8, "h"=>7185246412687638613, "v"=>2, "op"=>"d", "ns"=>"user.favorites", "b"=>true, "o"=>{"_id"=>BSON::ObjectId('50e097b777f7ebd6d1cd88c4')}}
/var/lib/gems/1.9.1/gems/sequel-3.44.0/lib/sequel/dataset/sql.rb:1127:in `literal_other_append': can't express BSON::ObjectId('50e097b777f7ebd6d1cd88c4') as a SQL literal (Sequel::Error)
        from /var/lib/gems/1.9.1/gems/sequel-3.44.0/lib/sequel/dataset/sql.rb:117:in `literal_append'
        from /var/lib/gems/1.9.1/gems/sequel-3.44.0/lib/sequel/dataset/sql.rb:465:in `complex_expression_sql_append'
        from /var/lib/gems/1.9.1/gems/sequel-3.44.0/lib/sequel/adapters/shared/postgres.rb:1057:in `complex_expression_sql_append'
        from /var/lib/gems/1.9.1/gems/sequel-3.44.0/lib/sequel/sql.rb:92:in `to_s_append'
        from /var/lib/gems/1.9.1/gems/sequel-3.44.0/lib/sequel/dataset/sql.rb:1087:in `literal_expression_append'
        from /var/lib/gems/1.9.1/gems/sequel-3.44.0/lib/sequel/dataset/sql.rb:95:in `literal_append'
        from /var/lib/gems/1.9.1/gems/sequel-3.44.0/lib/sequel/dataset/sql.rb:1319:in `select_where_sql'
        from /var/lib/gems/1.9.1/gems/sequel-3.44.0/lib/sequel/dataset/sql.rb:835:in `block in clause_sql'
        from /var/lib/gems/1.9.1/gems/sequel-3.44.0/lib/sequel/dataset/sql.rb:835:in `each'
        from /var/lib/gems/1.9.1/gems/sequel-3.44.0/lib/sequel/dataset/sql.rb:835:in `clause_sql'
        from /var/lib/gems/1.9.1/gems/sequel-3.44.0/lib/sequel/dataset/sql.rb:15:in `delete_sql'
        from /var/lib/gems/1.9.1/gems/sequel-3.44.0/lib/sequel/dataset/actions.rb:131:in `delete'
        from /var/lib/gems/1.9.1/gems/mosql-0.1.0/lib/mosql/cli.rb:298:in `handle_op'
        from /var/lib/gems/1.9.1/gems/mosql-0.1.0/lib/mosql/cli.rb:246:in `block in optail'
        from /var/lib/gems/1.9.1/gems/mongoriver-0.1.0/lib/mongoriver/abstract_persistent_tailer.rb:27:in `block in stream'
        from /var/lib/gems/1.9.1/gems/mongoriver-0.1.0/lib/mongoriver/tailer.rb:94:in `stream'
        from /var/lib/gems/1.9.1/gems/mongoriver-0.1.0/lib/mongoriver/abstract_persistent_tailer.rb:26:in `stream'
        from /var/lib/gems/1.9.1/gems/mosql-0.1.0/lib/mosql/cli.rb:245:in `optail'
        from /var/lib/gems/1.9.1/gems/mosql-0.1.0/lib/mosql/cli.rb:139:in `run'
        from /var/lib/gems/1.9.1/gems/mosql-0.1.0/lib/mosql/cli.rb:16:in `run'
        from /var/lib/gems/1.9.1/gems/mosql-0.1.0/bin/mosql:7:in `<top (required)>'
        from /usr/local/bin/mosql:19:in `load'
        from /usr/local/bin/mosql:19:in `<main>'

Oplog tailing is *significantly* slower than importing.

I have a dozen MongoDB collections that are updated periodically, causing ~80,000 documents to be removed and replaced in each of them.

When I have tailing enabled, this change takes forever (about 3.5 hours) to replicate into PostgreSQL, because tailing occurs at a rate of ~100 documents/sec on the machine I'm running it on.

For comparison, doing --reimport on exactly the same collections runs at ~3000 documents/sec on the same machine. This process finishes in less than 2 minutes.

I'm aware that processing oplog entries introduces some overhead, but it surely shouldn't reduce throughput by a factor of ten. Am I doing something wrong, or is this simply a limitation that exists in the current mosql implementation?

Better error message if collections.yml is invalid.

/home/nelhage/.rbenv/versions/1.9.3-p374-fast-require/lib/ruby/1.9.1/psych.rb:203:in `parse': (<unknown>): mapping values are not allowed in this context at line 16 column 64 (Psych::SyntaxError)
    from /home/nelhage/.rbenv/versions/1.9.3-p374-fast-require/lib/ruby/1.9.1/psych.rb:203:in `parse_stream'
    from /home/nelhage/.rbenv/versions/1.9.3-p374-fast-require/lib/ruby/1.9.1/psych.rb:151:in `parse'
    from /home/nelhage/.rbenv/versions/1.9.3-p374-fast-require/lib/ruby/1.9.1/psych.rb:127:in `load'
    from /home/nelhage/stripe/mosql/lib/mosql/cli.rb:124:in `load_collections'
    from /home/nelhage/stripe/mosql/lib/mosql/cli.rb:130:in `run'
    from /home/nelhage/stripe/mosql/lib/mosql/cli.rb:16:in `run'
    from bin/mosql:7:in `<main>'

Could be more helpful -- for example, by mentioning collections.yaml at all.

import inner document

Hi! How can I import an inner document into PostgreSQL?

e.g. {"_id": ObjectId("1111"), "f1": "f1", "innerDocument" : { "f2" : "f2", "f3" : "f3" } };

How to do this mapping ?

UndefinedColumn error

We're getting strange errors in our MoSQL log like the one below when a Mongo document is updated.

WARN MoSQL: Ignoring row (id=5233b00caddfe6c60c000003): PG::UndefinedColumn: ERROR:  column "shipped" does not exist
LINE 1: ...ncisco', '94109', 'CA', 'United States', 'ed8fd', "shipped",...
                                                             ^

The odd thing is that the caret is pointing not to a column, but to a value. Also, our MoSQL config for this collection does include the column that the value is for.

Has anyone else seen anything like this?

[Question] Replication back to MongoDB

Hey Guys,

Kudos

Just started exploring mosql, and I just want to thank you for deciding to open source this tool. It was a breeze to set up and has scaled very nicely. I budgeted 3 days to get this working and it took 10 minutes...

Background

My current setup looks something like this ->

  • Dozens of client side meteor applications writing to their own minimongo dbs in the browser
  • Master mongodb on the server that handles client reactivity, publications, and subscriptions
  • MoSQL replicating the mongoDB data to postgreSQL

I've load tested this setup by running 10 simultaneous meteor clients all creating new documents every millisecond. MoSQL was able to handle this no problem, and the PostgreSQL db was only a couple hundred milliseconds behind.

You can check out the app here

Question

The goal of this whole exercise was to try to create what amounts to Master-Master replication between MongoDB and PostgreSQL. MongoDB -> PostgreSQL is working fine thanks to mosql, but I was wondering if you had any insight on PostgreSQL -> MongoDB.

I've seen a couple of independent tutorials using Bucardo, but I am afraid to introduce another replication engine as I'm sure it will just create a replication loop.

Have you guys invested any brain time in trying to solve this problem? If so, what was your approach, and if not, do you see any potential gotchas that I may be missing?

Thanks,
Austin Rivas

Connecting to Mongo

I'm hosting my db on MongoHQ and followed the connection string format they specify (I'm able to log in from my nodejs/express app). I still get this error when I try to connect:

/home/ubuntu/.rvm/gems/ruby-2.1.1/gems/mongo-1.10.0.rc1/lib/mongo/functional/authentication.rb:205:in `issue_authentication': Failed to authenticate user 'analytics' on db 'admin'. (Mongo::AuthenticationError)

mongodb://user:[email protected]:10280/dbname
mongodb://user:[email protected]:10280/admin

(I tried both and neither works; the errors differ slightly.)

Any thoughts?

Optimisation with update operations

Update ops seem to take a looong time, especially when they have no impact on the Postgres side.

Why not check, in the case of a $set/$unset, whether the affected field is part of the Postgres schema? If it isn't, skip the update, unless the table has the extra-props column with the JSON struct.
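The suggested check might look something like this sketch (plain Ruby, not MoSQL's actual internals; the op layout, the `schema_fields` argument, and the function name are all assumptions for illustration):

```ruby
# Sketch: decide whether a $set/$unset oplog op can be skipped because
# none of the touched fields map to Postgres columns and the table has
# no extra_props JSON column. Full-document replaces (ops without
# $set/$unset) would need separate handling.
def relevant_update?(op, schema_fields, extra_props: false)
  return true if extra_props # the JSON blob must be refreshed regardless

  touched = (op['$set'] || {}).keys + (op['$unset'] || {}).keys
  # Compare on the top-level field, so 'utm.utm_source' matches 'utm'.
  touched.any? { |f| schema_fields.include?(f.split('.').first) }
end
```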

Help!!! How do I map an embedded document to Postgres using MoSQL? e.g. cus.utm.utm_source

  1. mongo's collection
     table: cus
     column: utm
     embedded column: utm_source
     e.g.:

db.cus.findOne();
{
        "status" : 5,
        "assign_status" : 1,
        "insure_for" : "s",
        "email" : "",
        "payment_scope" : null,
        "sex" : 1,
        "has_social_insure" : null,
        "updated_at" : ISODate("2013-05-14T08:27:06.400Z"),
        "created_at" : ISODate("2013-05-14T07:16:48.239Z"),
        "utm" : {
                "utm_source" : null,
                "utm_medium" : "organic",
                "utm_campaign" : "product",
                "utm_term" : null,
                "utm_content" : "floatlink_bottom(detail)",
                "ab_version" : "B Version",
                "_id" : 13659
        },
        "_id" : 13663
}

  2. collections.yml

mongodb:
  cus:
    :columns:
    - id:
      :source: _id
      :type: INTEGER
    - status:
      :source: status
      :type: INTEGER
    - insure_for:
      :source: insure_for
      :type: VARCHAR(20)
    - email:
      :source: email
      :type: VARCHAR(20)
    - nick:
      :source: nick
      :type: VARCHAR(100)
    - payment_scope: VARCHAR(10)
    - sex: INTEGER
    - has_social_insure: BOOLEAN
    - updated_at: DATE
    - created_at: DATE
    - utm: VARCHAR(4000)
    :meta:
      :table: cus
      :extra_props: false
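If dotted `:source:` paths work the way I believe they do in MoSQL (an assumption I haven't verified against this version), the embedded utm fields could also be mapped to their own columns, instead of or alongside the VARCHAR JSON blob. A sketch of extra column entries for the map above:

```yaml
# Untested sketch: pull fields out of the embedded utm subdocument.
    - utm_source:
      :source: utm.utm_source
      :type: VARCHAR(100)
    - utm_medium:
      :source: utm.utm_medium
      :type: VARCHAR(100)
```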

Querying a secondary server

I'm trying to query a secondary server and it does not let me; the error is

/var/lib/gems/2.0.0/gems/mongo-1.10.2/lib/mongo/cursor.rb:149:in `next': not master and slaveOk=false (Mongo::ConnectionFailure)
    from /var/lib/gems/2.0.0/gems/mongo-1.10.2/lib/mongo/cursor.rb:334:in `each'
    from /var/lib/gems/2.0.0/gems/mongo-1.10.2/lib/mongo/db.rb:259:in `collect'
    from /var/lib/gems/2.0.0/gems/mongo-1.10.2/lib/mongo/db.rb:259:in `collection_names'
    from /var/lib/gems/2.0.0/gems/mongo-1.10.2/lib/mongo/db.rb:268:in `collections'
    from /var/lib/gems/2.0.0/gems/mosql-0.3.2/lib/mosql/streamer.rb:115:in `block in initial_import'
    from /var/lib/gems/2.0.0/gems/mosql-0.3.2/lib/mosql/streamer.rb:105:in `each'
    from /var/lib/gems/2.0.0/gems/mosql-0.3.2/lib/mosql/streamer.rb:105:in `initial_import'
    from /var/lib/gems/2.0.0/gems/mosql-0.3.2/lib/mosql/streamer.rb:28:in `import'
    from /var/lib/gems/2.0.0/gems/mosql-0.3.2/lib/mosql/cli.rb:162:in `run'
    from /var/lib/gems/2.0.0/gems/mosql-0.3.2/lib/mosql/cli.rb:16:in `run'
    from /var/lib/gems/2.0.0/gems/mosql-0.3.2/bin/mosql:5:in `<top (required)>'
    from /usr/local/bin/mosql:23:in `load'
    from /usr/local/bin/mosql:23:in `<main>'

Is there a way to tell mosql to set that option? (It's just rs.slaveOk() from the mongo shell, really.)
Thanks

Can't connect to Amazon Redshift: `unrecognized configuration parameter "standard_conforming_strings"`

Not sure if this is a Sequel issue. Also tried https://github.com/remind101/sequel-redshift with same issue.

/var/lib/gems/1.9.1/gems/sequel-4.8.0/lib/sequel/adapters/postgres.rb:161:in `async_exec': PG::UndefinedObject: ERROR:  unrecognized configuration parameter "standard_conforming_strings" (Sequel::DatabaseConnectionError)
    from /var/lib/gems/1.9.1/gems/sequel-4.8.0/lib/sequel/adapters/postgres.rb:161:in `block in execute_query'
    from /var/lib/gems/1.9.1/gems/sequel-4.8.0/lib/sequel/database/logging.rb:33:in `log_yield'
    from /var/lib/gems/1.9.1/gems/sequel-4.8.0/lib/sequel/adapters/postgres.rb:161:in `execute_query'
    from /var/lib/gems/1.9.1/gems/sequel-4.8.0/lib/sequel/adapters/postgres.rb:148:in `block in execute'
    from /var/lib/gems/1.9.1/gems/sequel-4.8.0/lib/sequel/adapters/postgres.rb:124:in `check_disconnect_errors'
    from /var/lib/gems/1.9.1/gems/sequel-4.8.0/lib/sequel/adapters/postgres.rb:148:in `execute'
    from /var/lib/gems/1.9.1/gems/sequel-4.8.0/lib/sequel/adapters/postgres.rb:236:in `block in connect'
    from /var/lib/gems/1.9.1/gems/sequel-4.8.0/lib/sequel/adapters/postgres.rb:236:in `each'
    from /var/lib/gems/1.9.1/gems/sequel-4.8.0/lib/sequel/adapters/postgres.rb:236:in `connect'
    from /var/lib/gems/1.9.1/gems/sequel-4.8.0/lib/sequel/connection_pool.rb:96:in `make_new'
    from /var/lib/gems/1.9.1/gems/sequel-4.8.0/lib/sequel/connection_pool/threaded.rb:158:in `make_new'
    from /var/lib/gems/1.9.1/gems/sequel-4.8.0/lib/sequel/connection_pool/threaded.rb:137:in `available'
    from /var/lib/gems/1.9.1/gems/sequel-4.8.0/lib/sequel/connection_pool/threaded.rb:127:in `block in acquire'
    from /var/lib/gems/1.9.1/gems/sequel-4.8.0/lib/sequel/connection_pool/threaded.rb:194:in `block in sync'
    from <internal:prelude>:10:in `synchronize'
    from /var/lib/gems/1.9.1/gems/sequel-4.8.0/lib/sequel/connection_pool/threaded.rb:194:in `sync'
    from /var/lib/gems/1.9.1/gems/sequel-4.8.0/lib/sequel/connection_pool/threaded.rb:126:in `acquire'
    from /var/lib/gems/1.9.1/gems/sequel-4.8.0/lib/sequel/connection_pool/threaded.rb:94:in `hold'
    from /var/lib/gems/1.9.1/gems/sequel-4.8.0/lib/sequel/database/connecting.rb:229:in `synchronize'
    from /var/lib/gems/1.9.1/gems/sequel-4.8.0/lib/sequel/adapters/shared/postgres.rb:442:in `server_version'
    from /var/lib/gems/1.9.1/gems/sequel-4.8.0/lib/sequel/adapters/shared/postgres.rb:458:in `supports_create_table_if_not_exists?'
    from /var/lib/gems/1.9.1/gems/sequel-4.8.0/lib/sequel/database/schema_methods.rb:196:in `create_table?'
    from /var/lib/gems/1.9.1/gems/mosql-0.3.1/lib/mosql/tailer.rb:4:in `create_table'
    from /var/lib/gems/1.9.1/gems/mosql-0.3.1/lib/mosql/cli.rb:147:in `run'
    from /var/lib/gems/1.9.1/gems/mosql-0.3.1/lib/mosql/cli.rb:16:in `run'
    from /var/lib/gems/1.9.1/gems/mosql-0.3.1/bin/mosql:5:in `<top (required)>'
    from /usr/local/bin/mosql:23:in `load'
    from /usr/local/bin/mosql:23:in `<main>'

Can't express BSON::ObjectId() as a SQL literal

Getting this error on initial import. It seems related to previous issues posted here about extracting an ID out of MongoDB. I'm not sure what info you need about the MongoDB setup. It's on compose.io, and other collections' IDs import fine, so it may be something to do with this particular collection. I can get more info on how this collection differs from the others if you can tell me what to look for. On the surface this collection seems the same as all the others.

INFO MoSQL: Importing for api-uat.commercials...
/usr/local/lib64/ruby/gems/2.1.0/gems/sequel-4.13.0/lib/sequel/dataset/sql.rb:1230:in `literal_other_append': can't express BSON::ObjectId('53a93e7efd487b0c9eb267e2') as a SQL literal (Sequel::Error)
    from /usr/local/lib64/ruby/gems/2.1.0/gems/sequel-4.13.0/lib/sequel/dataset/sql.rb:107:in `literal_append'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/sequel-4.13.0/lib/sequel/extensions/pg_array.rb:517:in `block in _literal_append'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/sequel-4.13.0/lib/sequel/extensions/pg_array.rb:512:in `each'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/sequel-4.13.0/lib/sequel/extensions/pg_array.rb:512:in `_literal_append'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/sequel-4.13.0/lib/sequel/extensions/pg_array.rb:496:in `sql_literal_append'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/sequel-4.13.0/lib/sequel/dataset/sql.rb:1226:in `literal_other_append'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/sequel-4.13.0/lib/sequel/dataset/sql.rb:107:in `literal_append'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/sequel-4.13.0/lib/sequel/dataset/sql.rb:1543:in `block in update_set_sql'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/sequel-4.13.0/lib/sequel/dataset/sql.rb:1535:in `each'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/sequel-4.13.0/lib/sequel/dataset/sql.rb:1535:in `update_set_sql'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/sequel-4.13.0/lib/sequel/dataset/sql.rb:228:in `_update_sql'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/sequel-4.13.0/lib/sequel/dataset/sql.rb:174:in `update_sql'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/sequel-4.13.0/lib/sequel/dataset/actions.rb:769:in `update'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/mosql-0.3.2/lib/mosql/sql.rb:51:in `upsert!'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/mosql-0.3.2/lib/mosql/streamer.rb:61:in `block (2 levels) in bulk_upsert'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/mosql-0.3.2/lib/mosql/streamer.rb:39:in `unsafe_handle_exceptions'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/mosql-0.3.2/lib/mosql/streamer.rb:60:in `block in bulk_upsert'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/mosql-0.3.2/lib/mosql/streamer.rb:57:in `each'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/mosql-0.3.2/lib/mosql/streamer.rb:57:in `rescue in bulk_upsert'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/mosql-0.3.2/lib/mosql/streamer.rb:52:in `bulk_upsert'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/mosql-0.3.2/lib/mosql/streamer.rb:150:in `block (4 levels) in import_collection'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/mosql-0.3.2/lib/mosql/streamer.rb:85:in `track_time'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/mosql-0.3.2/lib/mosql/streamer.rb:149:in `block (3 levels) in import_collection'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/mongo-1.10.2/lib/mongo/cursor.rb:335:in `each'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/mosql-0.3.2/lib/mosql/streamer.rb:144:in `block (2 levels) in import_collection'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/mosql-0.3.2/lib/mosql/streamer.rb:70:in `block in with_retries'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/mosql-0.3.2/lib/mosql/streamer.rb:68:in `times'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/mosql-0.3.2/lib/mosql/streamer.rb:68:in `with_retries'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/mosql-0.3.2/lib/mosql/streamer.rb:143:in `block in import_collection'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/mongo-1.10.2/lib/mongo/collection.rb:291:in `find'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/mosql-0.3.2/lib/mosql/streamer.rb:142:in `import_collection'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/mosql-0.3.2/lib/mosql/streamer.rb:119:in `block (2 levels) in initial_import'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/mosql-0.3.2/lib/mosql/streamer.rb:117:in `each'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/mosql-0.3.2/lib/mosql/streamer.rb:117:in `block in initial_import'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/mosql-0.3.2/lib/mosql/streamer.rb:105:in `each'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/mosql-0.3.2/lib/mosql/streamer.rb:105:in `initial_import'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/mosql-0.3.2/lib/mosql/streamer.rb:28:in `import'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/mosql-0.3.2/lib/mosql/cli.rb:162:in `run'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/mosql-0.3.2/lib/mosql/cli.rb:16:in `run'
    from /usr/local/lib64/ruby/gems/2.1.0/gems/mosql-0.3.2/bin/mosql:5:in `<top (required)>'
    from /usr/local/bin/mosql:23:in `load'
    from /usr/local/bin/mosql:23:in `<main>'

Does it work on a standalone mongodb instance?

First, thank you for the great tool, I am just trying to make it work.

When started, I have the following warning:
WARN MoSQL: 'mongodb://localhost' is not a replset.
WARN MoSQL: Will run the initial import, then stop.
WARN MoSQL: Pass --skip-tail to suppress this warning.

Does this mean that it only works with a replica set? I am currently testing it on a preproduction server, so there is no replica.

Geospatial / PostGIS support

Do you know if anyone has used MoSQL to get geospatial data from Mongo into PostGIS? We just have geocoded data that is stored as arrays, i.e. lon/lat pairs. Presumably, it would have to use the ST_GeomFromText or ST_MakePoint PostGIS functions. I would be happy to work up a pull request; I just don't want to reinvent the wheel.

Support for TEXT ARRAY

Here's my collections.yml snippet:

  products:
    :columns:
    - id:
      :source: _id
      :type: TEXT
    - foodshed: TEXT
    - name: TEXT
    - price: INTEGER
    - tags: TEXT ARRAY
    :meta:
      :table: products

But I'm getting an error:

LINE 1: ...ini Pumpkin Spice Trio', "price" = 6.49, "tags" = '["Cookies...
PG::InvalidTextRepresentation: ERROR:  missing dimension value (Sequel::DatabaseError)

Full stack trace is here https://gist.github.com/adborden/de92b19a2bbe1feebbdc

Is this expected?

Very large import, losing changes?

I was looking at the code in streamer.rb and it made me wonder.
Suppose you're tailing and the initial import takes a couple of days. Considering the oplog is actually a capped collection, what happens if some of the changes get overwritten before MoSQL gets around to reading the oplog?

It would seem to me that:

  • starting the tailing in a different thread
  • starting the import in the main thread

would solve this issue. What do you think?

  @streamer.import

  unless options[:skip_tail]
    @streamer.optail
  end

Doesn't handle BSON::DBRef correctly

See #9 (comment)

@christinang89 -- looks like the problem is that your data has a BSON::ObjectRef, which we haven't implemented support for. I'm not sure what the right thing to do with those in general is, but I suspect that the right answer is to just extract out the referenced _id and store that in the same form as the normal _id fields, so that you can do JOINs on them in SQL. Would that work for your use case?

Getting Errors While Installing

I've tried this 3-4 times (uninstalling Ruby from scratch) and keep getting an error when I run "gem install mosql".

Installing on an AWS EC2 instance after following this guide: https://www.digitalocean.com/community/articles/how-to-install-ruby-on-rails-on-ubuntu-12-04-lts-precise-pangolin-with-rvm

Thanks in advance for the help!

ERROR:  Error installing mosql:
    ERROR: Failed to build gem native extension.

    /home/ubuntu/.rvm/rubies/ruby-2.1.1/bin/ruby extconf.rb
checking for pg_config... no
No pg_config... trying anyway. If building fails, please try again with
 --with-pg-config=/path/to/pg_config
checking for libpq-fe.h... no
Can't find the 'libpq-fe.h header
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of necessary
libraries and/or headers.  Check the mkmf.log file for more details.  You may
need configuration options.

Provided configuration options:
    --with-opt-dir
    --without-opt-dir
    --with-opt-include
    --without-opt-include=${opt-dir}/include
    --with-opt-lib
    --without-opt-lib=${opt-dir}/lib
    --with-make-prog
    --without-make-prog
    --srcdir=.
    --curdir
    --ruby=/home/ubuntu/.rvm/rubies/ruby-2.1.1/bin/ruby
    --with-pg
    --without-pg
    --with-pg-config
    --without-pg-config
    --with-pg_config
    --without-pg_config
    --with-pg-dir
    --without-pg-dir
    --with-pg-include
    --without-pg-include=${pg-dir}/include
    --with-pg-lib
    --without-pg-lib=${pg-dir}/lib

extconf failed, exit code 1

Gem files will remain installed in /home/ubuntu/.rvm/gems/ruby-2.1.1/gems/pg-0.17.1 for inspection.
Results logged to /home/ubuntu/.rvm/gems/ruby-2.1.1/extensions/x86_64-linux/2.1.0/pg-0.17.1/gem_make.out

Can't import BinData fields?

I have binary fields in my mongo db, BinData(3,"...") and BinData(0,"..."). Is there any way to import these with mosql? I see encoding issues in the COPY from STDIN command.

example record:

> db.entities.findOne();
{
        "_id" : BinData(3,"DHSvnWWKRQ6hydIk/B4RaQ=="),
        "owners" : [
                BinData(3,"Y1nPwQcVSMuiS1IS3IXCyA=="),
                BinData(3,"IUec/mYQSO6ctTpo2ilU2g==")
        ],
        "display_name" : "SportsOnEarth",
        "modified_time" : 1407781851.226008,
        "created_time" : 1406242278.356365,
        "handle" : "sportsonearth",
        "type" : "organization",
        "slug" : "DHSvnWWKRQ6hydIk_B4RaQ"
}

collections.yml

db2:
  entities:
    :columns:
    - slug: TEXT
    - id:
      :source: _id
      :type: BYTEA
    :meta:
      :table: entities
      :extra_props: false

snippet of error:

[vagrant@air64] 1105% mosql --skip-tail -v                                                                      :) (git)-[orm] (env) ~s/airtv/airtv/mosql
 INFO MoSQL: Creating table 'entities'...
 INFO MoSQL: Mongd DB 'admin' not found in config file. Skipping.
 INFO MoSQL: Importing for Mongo DB db2...
 INFO MoSQL: Importing for db2.entities...
DEBUG MoSQL: Transformed: ["DHSvnWWKRQ6hydIk_B4RaQ", "\ft\257\235e\212E\016\241\311\322$\374\036\021i"]
DEBUG MoSQL: Transformed: ["Fa8CsZ5yTd6GaC4WLajFpg", "\025\257\002\261\236rM\336\206h.\026-\250\305\246"]
DEBUG MoSQL: Transformed: ["IJxB5N_DRF-s6myLffWfcQ", " \234A\344\337\303D_\254\352l\213}\365\237q"]
DEBUG MoSQL: Transformed: ["IUec_mYQSO6ctTpo2ilU2g", "!G\234\376f\020H\356\234\265:h\332)T\332"]

...

DEBUG MoSQL: Transformed: ["9Gc5D8a5SU6BBw47V2jdgw", "\364g9\017\306\271IN\201\a\016;Wh\335\203"]
DEBUG MoSQL: Transformed: ["9ZIeR9H4SPGFbBnG1uYaSw", "\365\222\036G\321\370H\361\205l\031\306\326\346\032K"]
DEBUG MoSQL: Transformed: ["9n-Ljn2CTY21WIuMEpsUFQ", "\366\177\213\216}\202M\215\265X\213\214\022\233\024\025"]
DEBUG MoSQL: Bulk insert error (PG::CharacterNotInRepertoire: ERROR:  invalid byte sequence for encoding "UTF8": 0xaf
CONTEXT:  COPY entities, line 1
), attempting invidual upserts...
ERROR MoSQL: Error processing {"slug"=>"DHSvnWWKRQ6hydIk_B4RaQ", "id"=>"\ft\257\235e\212E\016\241\311\322$\374\036\021i"} for db2.entities.
/usr/lib/ruby/gems/1.8/gems/sequel-4.13.0/lib/sequel/adapters/postgres.rb:161:in `async_exec': PG::CharacterNotInRepertoire: ERROR:  invalid byte sequence
 for encoding "UTF8": 0xaf (Sequel::DatabaseError)

        from /usr/lib/ruby/gems/1.8/gems/sequel-4.13.0/lib/sequel/adapters/postgres.rb:161:in `execute_query'
        from /usr/lib/ruby/gems/1.8/gems/sequel-4.13.0/lib/sequel/database/logging.rb:33:in `log_yield'
        from /usr/lib/ruby/gems/1.8/gems/sequel-4.13.0/lib/sequel/adapters/postgres.rb:161:in `execute_query'
        from /usr/lib/ruby/gems/1.8/gems/sequel-4.13.0/lib/sequel/adapters/postgres.rb:148:in `execute'
        from /usr/lib/ruby/gems/1.8/gems/sequel-4.13.0/lib/sequel/adapters/postgres.rb:124:in `check_disconnect_errors'
        from /usr/lib/ruby/gems/1.8/gems/sequel-4.13.0/lib/sequel/adapters/postgres.rb:148:in `execute'
        from /usr/lib/ruby/gems/1.8/gems/sequel-4.13.0/lib/sequel/adapters/postgres.rb:492:in `_execute'
        from /usr/lib/ruby/gems/1.8/gems/sequel-4.13.0/lib/sequel/adapters/postgres.rb:316:in `execute'
        from /usr/lib/ruby/gems/1.8/gems/sequel-4.13.0/lib/sequel/adapters/postgres.rb:513:in `check_database_errors'
        from /usr/lib/ruby/gems/1.8/gems/sequel-4.13.0/lib/sequel/adapters/postgres.rb:316:in `execute'
        from /usr/lib/ruby/gems/1.8/gems/sequel-4.13.0/lib/sequel/database/connecting.rb:250:in `synchronize'
        from /usr/lib/ruby/gems/1.8/gems/sequel-4.13.0/lib/sequel/connection_pool/threaded.rb:104:in `hold'
        from /usr/lib/ruby/gems/1.8/gems/sequel-4.13.0/lib/sequel/database/connecting.rb:250:in `synchronize'
        from /usr/lib/ruby/gems/1.8/gems/sequel-4.13.0/lib/sequel/adapters/postgres.rb:316:in `execute'
        from /usr/lib/ruby/gems/1.8/gems/sequel-4.13.0/lib/sequel/database/query.rb:50:in `execute_dui'
        from /usr/lib/ruby/gems/1.8/gems/sequel-4.13.0/lib/sequel/dataset/actions.rb:917:in `execute_dui'
        from /usr/lib/ruby/gems/1.8/gems/sequel-4.13.0/lib/sequel/dataset/actions.rb:773:in `update'
        from /usr/lib/ruby/gems/1.8/gems/mosql-0.3.2/lib/mosql/sql.rb:51:in `upsert!'
        from /usr/lib/ruby/gems/1.8/gems/mosql-0.3.2/lib/mosql/streamer.rb:61:in `bulk_upsert'
        from /usr/lib/ruby/gems/1.8/gems/mosql-0.3.2/lib/mosql/streamer.rb:39:in `unsafe_handle_exceptions'
        from /usr/lib/ruby/gems/1.8/gems/mosql-0.3.2/lib/mosql/streamer.rb:60:in `bulk_upsert'
        from /usr/lib/ruby/gems/1.8/gems/mosql-0.3.2/lib/mosql/streamer.rb:57:in `each'
        from /usr/lib/ruby/gems/1.8/gems/mosql-0.3.2/lib/mosql/streamer.rb:57:in `bulk_upsert'
        from /usr/lib/ruby/gems/1.8/gems/mosql-0.3.2/lib/mosql/streamer.rb:162:in `import_collection'
        from /usr/lib/ruby/gems/1.8/gems/mosql-0.3.2/lib/mosql/streamer.rb:119:in `initial_import'
        from /usr/lib/ruby/gems/1.8/gems/mosql-0.3.2/lib/mosql/streamer.rb:117:in `each'
        from /usr/lib/ruby/gems/1.8/gems/mosql-0.3.2/lib/mosql/streamer.rb:117:in `initial_import'
        from /usr/lib/ruby/gems/1.8/gems/mosql-0.3.2/lib/mosql/streamer.rb:105:in `each'
        from /usr/lib/ruby/gems/1.8/gems/mosql-0.3.2/lib/mosql/streamer.rb:105:in `initial_import'
        from /usr/lib/ruby/gems/1.8/gems/mosql-0.3.2/lib/mosql/streamer.rb:28:in `import'
        from /usr/lib/ruby/gems/1.8/gems/mosql-0.3.2/lib/mosql/cli.rb:162:in `run'
        from /usr/lib/ruby/gems/1.8/gems/mosql-0.3.2/lib/mosql/cli.rb:16:in `run'
        from /usr/lib/ruby/gems/1.8/gems/mosql-0.3.2/bin/mosql:5
        from /usr/bin/mosql:19:in `load'
        from /usr/bin/mosql:19

Dropping milliseconds while reading

It looks like there is a small issue when importing documents with datetimes.

Basically what happens is that the milliseconds get dropped, these are the values in Mongo and Postgres:

datetime.datetime(2014, 8, 21, 17, 27, 9, 424000)
datetime.datetime(2014, 8, 21, 18, 27, 9)

I suspect the problem is in this code; is that possible?

      case v
      when BSON::Binary, BSON::ObjectId, Symbol
        v = v.to_s
      when BSON::DBRef
        v = v.object_id.to_s
      when Hash
        v = JSON.dump(v)
      when Array
        if col[:array_type]
          v = Sequel.pg_array(v, col[:array_type])
        else
          v = JSON.dump(v)
        end
      end

Thanks
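For what it's worth, one easy place for milliseconds to disappear in Ruby is plain string conversion: Time#to_s drops fractional seconds, while strftime with %L keeps them. A standalone check (plain Ruby, not MoSQL code; the sample timestamp is from the values above):

```ruby
# Time.utc takes microseconds as the 7th argument.
t = Time.utc(2014, 8, 21, 17, 27, 9, 424_000)

puts t.to_s                              # fractional seconds dropped
puts t.strftime('%Y-%m-%d %H:%M:%S.%L')  # milliseconds preserved
```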

Replicating from Mongo 1.8.2

I get "not master and slaveok=false" when I try to replicate from a secondary. I get "Invalid Mongo URI option readpreference" when I try to append "?readPreference=secondary" to the mongodb:// command line option.

Is there any way around this if I need to replicate from a secondary with 1.8.2?

Hanging on create table?

I think it's just because I haven't understood how to use it yet, but I didn't manage to get the sync working.

Basically what I do is this:

$ mosql -c collections.yaml --mongo mongodb://user:password@localhost:27017/ --sql postgres://user:password@localhost/garage --skip-tail
INFO MoSQL: Creating table 'engine_comment'...

And the configuration is very very simple:

alexandria:
  comments:
    :columns:
    - text: TEXT
    - status: TEXT

    :meta:
      :table: engine_comment
      :extra_props: true

So a couple of things I don't understand:

  • why is it creating the table? The table is already there, and it already has a schema I don't want changed; why can't it just insert rows?
  • why is it hanging there forever on creating the table?

Any hint welcome; I'm a bit lost. Thanks a lot.

Recommendation/support for creating Postgres indices

Is there support in the mapping YAML file for defining indices when creating the Postgres schema?

Since I assume many folks are using the Postgres-mapped data for reporting and integration with other tools, I would think that indices in the Postgres database would be a common need.

Would you suggest just creating the indices outside of MoSQL after the schema has been created or is there a better approach?

Thanks!

`block in bulk_upsert': undefined method `primary_sql_key_for_ns' for #<MoSQL::CLI:0x00000000e8e428> (NoMethodError)

 INFO MoSQL: Imported 29397000 rows (8254.963520826s, 4295.059250159007s SQL)...
 INFO MoSQL: Imported 29398000 rows (8255.24602006s, 4295.177102176007s SQL)...
somedir/.bundler/ruby/1.9.1/mosql-5945b627f05c/lib/mosql/cli.rb:167:in `block in bulk_upsert': undefined method `primary_sql_key_for_ns' for #<MoSQL::CLI:0x00000000e8e428> (NoMethodError)
        from somedir/.bundler/ruby/1.9.1/mosql-5945b627f05c/lib/mosql/cli.rb:164:in `each'
        from somedir/.bundler/ruby/1.9.1/mosql-5945b627f05c/lib/mosql/cli.rb:164:in `rescue in bulk_upsert'
        from somedir/.bundler/ruby/1.9.1/mosql-5945b627f05c/lib/mosql/cli.rb:159:in `bulk_upsert'
        from somedir/.bundler/ruby/1.9.1/mosql-5945b627f05c/lib/mosql/cli.rb:233:in `block (4 levels) in import_collection'
        from somedir/.bundler/ruby/1.9.1/mosql-5945b627f05c/lib/mosql/cli.rb:190:in `track_time'
        from somedir/.bundler/ruby/1.9.1/mosql-5945b627f05c/lib/mosql/cli.rb:232:in `block (3 levels) in import_collection'
        from /usr/local/rvm/gems/ruby-1.9.3-p125/gems/mongo-1.8.3/lib/mongo/cursor.rb:271:in `each'
        from somedir/.bundler/ruby/1.9.1/mosql-5945b627f05c/lib/mosql/cli.rb:227:in `block (2 levels) in import_collection'
        from somedir/.bundler/ruby/1.9.1/mosql-5945b627f05c/lib/mosql/cli.rb:175:in `block in with_retries'
        from somedir/.bundler/ruby/1.9.1/mosql-5945b627f05c/lib/mosql/cli.rb:173:in `times'
        from somedir/.bundler/ruby/1.9.1/mosql-5945b627f05c/lib/mosql/cli.rb:173:in `with_retries'
        from somedir/.bundler/ruby/1.9.1/mosql-5945b627f05c/lib/mosql/cli.rb:226:in `block in import_collection'
        from /usr/local/rvm/gems/ruby-1.9.3-p125/gems/mongo-1.8.3/lib/mongo/collection.rb:269:in `find'
        from somedir/.bundler/ruby/1.9.1/mosql-5945b627f05c/lib/mosql/cli.rb:225:in `import_collection'
        from somedir/.bundler/ruby/1.9.1/mosql-5945b627f05c/lib/mosql/cli.rb:208:in `block (2 levels) in initial_import'
        from somedir/.bundler/ruby/1.9.1/mosql-5945b627f05c/lib/mosql/cli.rb:206:in `each'
        from somedir/.bundler/ruby/1.9.1/mosql-5945b627f05c/lib/mosql/cli.rb:206:in `block in initial_import'
        from somedir/.bundler/ruby/1.9.1/mosql-5945b627f05c/lib/mosql/cli.rb:202:in `each'
        from somedir/.bundler/ruby/1.9.1/mosql-5945b627f05c/lib/mosql/cli.rb:202:in `initial_import'
        from somedir/.bundler/ruby/1.9.1/mosql-5945b627f05c/lib/mosql/cli.rb:143:in `run'
        from somedir/.bundler/ruby/1.9.1/mosql-5945b627f05c/lib/mosql/cli.rb:16:in `run'
        from somedir/.bundler/ruby/1.9.1/mosql-5945b627f05c/bin/mosql:5:in `<top (required)>'
        from /usr/local/rvm/gems/ruby-1.9.3-p125/bin/mosql:23:in `load'
        from /usr/local/rvm/gems/ruby-1.9.3-p125/bin/mosql:23:in `<main>'
worker_ui@app6:/home/mgordon/mosql$ bundle exec mosql --sql postgres://postgres@myhost/mydb --mongo mongodb://mongohost:27017?slaveOk=true
 INFO Mongoriver: Starting oplog stream from seconds: 1363616697, increment: 0
 INFO Mongoriver: Saved timestamp: seconds: 1363616867, increment: 70 (2013-03-18 14:27:47 +0000)
 INFO Mongoriver: Saved timestamp: seconds: 1363616928, increment: 1 (2013-03-18 14:28:48 +0000)
 INFO Mongoriver: Saved timestamp: seconds: 1363616989, increment: 1 (2013-03-18 14:29:49 +0000)
 INFO Mongoriver: Saved timestamp: seconds: 1363617050, increment: 1 (2013-03-18 14:30:50 +0000)

Saw this while trying to reimport data.

(Side question) Would you expect it to be safe to restart MoSQL if it stops running for a few days, or does that generally require a reimport? Is its ability to catch up constrained by whether that many days of oplog entries are still available?

mapping nested objects to other fields or tables

I tried to create a table to map the nested objects (e.g. utm.yml), but it failed to sync data to Postgres. I also tried creating a text column to store the subdocument as JSON, but when a Mongo update touches the record I see PG::Error: ERROR: invalid input syntax for integer: "_id" in the Postgres log, and the data doesn't get updated. Is there a good way to map nested objects?

mongo's collections

> db.cus.findOne();
{
        "status" : 5,
        "assign_status" : 1,
        "insure_for" : "s",
        "email" : "",
        "payment_scope" : null,
        "sex" : 1,
        "has_social_insure" : null,
        "updated_at" : ISODate("2013-05-14T08:27:06.400Z"),
        "created_at" : ISODate("2013-05-14T07:16:48.239Z"),
        "utm" : {
                "utm_source" : null,
                "utm_medium" : "organic",
                "utm_campaign" : "product",
                "utm_term" : null,
                "utm_content" : "floatlink_bottom(detail)",
                "ab_version" : "B Version",
                "_id" : 13659
        },
        "_id" : 13663
}

Inconsistent replication

Have you seen or heard any accounts of inconsistent data? Our records don't line up precisely between Mongo and PostgreSQL. Some records don't get replicated at all despite trying to run the sync from scratch. We also see duplicate inserts which are blocked by the unique constraint on the id column in the PostgreSQL logs.

Happy to provide any info that would be helpful.

mosql should support doing imports against non-replset nodes

If you try to run it against a node that isn't part of a replset, it will warn, try to continue anyway, and die with

 /home/mosql/.rvm/gems/ruby-1.9.3-p385/gems/mosql-0.1.2/lib/mosql/cli.rb:192:in `initial_import': undefined method `[]' for nil:NilClass (NoMethodError) 

Shardset support

Hi,

We've implemented a mosql setup at our company, and it would've worked great if it supported shardsets.

At the moment we're running a nightly job to migrate the whole DB over, but it sometimes fails. Our data is also growing at such a pace that we can't keep doing this, so we'll have to stop using mosql until it has shardset support.

I know the readme explicitly states that it's not supported, but I'm creating the issue nonetheless so that we can keep an eye on it in case somebody picks it up.

Rate limiting?

I tried to run a migration on production today, and it's really fast: I got 550k rows in 2 minutes, which is around 5k per second. However, that's too fast, because the app suffered while I was doing it! In verbose mode it did less than 3k per second, and that didn't really affect the db much.

Is there a way to do some rate limiting to make it run a bit slower (even something hacky that involves changing the code)? Even introducing a silly sleep here and there would do the trick for now.
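There's no built-in flag for this as far as I know, but a sleep-based throttle is easy to bolt on. A sketch (plain Ruby, not part of MoSQL; where to hook it in, e.g. after each bulk_upsert batch in streamer.rb, is an assumption):

```ruby
# Hacky rate limiter sketch: sleep after each batch so that, on
# average, at most max_per_sec rows are processed per second.
class Throttle
  def initialize(max_per_sec)
    @max_per_sec = max_per_sec.to_f
    @started = Time.now
    @count = 0
  end

  # Record n processed rows and sleep if we're ahead of budget.
  def throttle(n)
    @count += n
    earliest = @started + @count / @max_per_sec
    delay = earliest - Time.now
    sleep(delay) if delay > 0
  end
end
```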

null value in column "_id" violates not-null constraint

I synced over a Mongo collection without any issues and now have MoSQL tailing the oplog to keep the Postgres table up to date. I did not see any warnings or errors from MoSQL during import but now that we're tailing the oplog I see messages that say:

WARN MoSQL: Ignoring row (_id=): PG::Error: ERROR: null value in column "_id" violates not-null constraint

Is MoSQL trying to pull records from Mongo before the automatically generated _id field is populated? Any workaround?
