Perpetuity

Perpetuity is a simple Ruby object persistence layer that attempts to follow Martin Fowler's Data Mapper pattern, allowing you to use plain-old Ruby objects in your Ruby apps in order to decouple your domain logic from the database as well as speed up your tests. There is no need for your model classes to inherit from another class or even include a mix-in.

The goal is for your objects to be persistable into whichever database you like. Right now, there are only a MongoDB adapter and a PostgreSQL adapter; other persistence solutions will come later.

How it works

In the Data Mapper pattern, the objects you work with don't understand how to persist themselves. They interact with other objects just as in any other object-oriented application, leaving all persistence logic to mapper objects. This decouples them from the database and allows you to write your code without it in mind.

Installation

Add the following to your Gemfile and run bundle to install it.

gem 'perpetuity-mongodb', '~> 1.0.0.beta'  # if using MongoDB
gem 'perpetuity-postgres'                  # if using Postgres

Note that you do not need to explicitly declare the perpetuity gem as a dependency. The database adapter takes care of that for you. It works just like including rspec-rails into your Rails app.

Configuration

The only currently-1.0-quality adapter is MongoDB, but stay tuned for the Postgres adapter. The simplest configuration is a single line, one or the other depending on your adapter:

Perpetuity.data_source :mongodb, 'my_mongo_database'
Perpetuity.data_source :postgres, 'my_pg_database'

Note: You cannot use different databases in the same app like that. At least, not yet. :-) Possibly a 1.1 feature?

If your database is on another server/port or you need authentication, you can specify those as options:

Perpetuity.data_source :mongodb, 'my_database', host: 'mongo.example.com',
                                                port: 27017,
                                                username: 'mongo',
                                                password: 'password'

If you are using Perpetuity with a multithreaded application, you can specify a :pool_size parameter to set up a connection pool. If you omit this parameter, it will use the data source's default pool size.
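
For example, a minimal sketch (the exact default pool size depends on the data source):

Perpetuity.data_source :mongodb, 'my_database', pool_size: 5 # up to 5 concurrent connections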

Setting up object mappers

Object mappers are generated by the following:

Perpetuity.generate_mapper_for MyClass do
  attribute :my_attribute
  attribute :my_other_attribute

  index :my_attribute
end

Most of a mapper's configuration consists of declaring the attributes to be persisted. This is done using the attribute method. Calling attribute adds the specified attribute and its class to the mapper's attribute set, which is how the mapper knows what to store and how to store it. Here is an example of an Article class, its mapper and how it can be saved to the database.

Accessing mappers after they've been generated is done through the use of the subscript operator on the Perpetuity module. For example, if you generate a mapper for an Article class, you can access it by calling Perpetuity[Article].

class Article
  attr_accessor :title, :body
end

Perpetuity.generate_mapper_for Article do
  attribute :title
  attribute :body
end

article = Article.new
article.title = 'New Article'
article.body = 'This is an article.'

Perpetuity[Article].insert article

Loading Objects

You can load all persisted objects of a particular class by sending all to the mapper object. Example:

Perpetuity[Article].all

You can load specific objects by calling the find method with an ID param on the mapper and passing in the criteria. You may also specify more general criteria using the select method with a block similar to Enumerable#select.

article  = Perpetuity[Article].find params[:id]
users    = Perpetuity[User].select { |user| user.email == '[email protected]' }
articles = Perpetuity[Article].select { |article| article.published_at < Time.now }
comments = Perpetuity[Comment].select { |comment| comment.article_id.in articles.map(&:id) }

These methods return a Perpetuity::Retrieval object, which lazily retrieves the objects from the database. The DB isn't hit until you begin iterating over the objects, so you can continue chaining methods, similar to ActiveRecord.

article_mapper = Perpetuity[Article]
articles = article_mapper.select { |article| article.published_at < Time.now }
                         .sort(:published_at)
                         .reverse
                         .page(2)
                         .per_page(10) # built-in pagination

articles.each do |article| # This is when the DB gets hit
  # Display the pretty articles
end

Unfortunately, due to limitations in the Ruby language itself, we cannot get a true Enumerable-style select method. The limitation shows itself when needing to have multiple criteria for a query, as in this super-secure example:

user = Perpetuity[User].select { |user| (user.email == params[:email]) & (user.password == params[:password]) }

Notice that we have to use a single & and surround each criterion with parentheses. If we could override && and ||, we could put more Rubyesque code in here, but until then, we have to operate within the boundaries of the operators that can be overridden.
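
Since | can also be overridden, OR-style criteria should be expressible the same way. A sketch, assuming the query DSL overrides | just as it does &:

admins = Perpetuity[User].select { |user| (user.role == 'admin') | (user.role == 'moderator') }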

Associations with Other Objects

The database can natively serialize some objects. For example, MongoDB can serialize String, Numeric, Array, Hash, Time, nil, true, false, and a few others. For other objects, you must determine whether you want those attributes embedded within the same document in the database or attached as a reference. For example, a Post could have Comments, which would likely be embedded within the post object. But these comments could have an author attribute that references the Person that wrote the comment. Embedding the author in this case is not a good idea since it would be a duplicate of the Person that wrote it, which would then be out of sync if the original object is modified.

If an object references another type of object, the association is declared just as any other attribute. No special treatment is required. For embedded relationships, make sure you use the embedded: true option in the attribute.

Perpetuity.generate_mapper_for Article do
  attribute :title
  attribute :body
  attribute :author
  attribute :comments, embedded: true
  attribute :timestamp
end

Perpetuity.generate_mapper_for Comment do
  attribute :body
  attribute :author
  attribute :timestamp
end

In this case, the article has an array of Comment objects, which the serializer knows the data source cannot serialize natively. It therefore tells the Comment mapper to serialize each comment and stores the results within the array.

If some of the comments aren't objects of class Comment, it will adapt and serialize them according to their class. This works very well for objects that can have attributes of various types, such as a User having a profile attribute that can be either a UserProfile or AdminProfile object. You don't need to declare anything different for this case, just store the appropriate type of object into the User's profile attribute and the mapper will take care of the details.

If the associated object's class has a mapper defined, it will be used by the parent object's mapper for serialization. Otherwise, the object will be Marshal.dumped. If the object cannot be marshaled, the object cannot be serialized and an exception will be raised.

When you load an object that has embedded associations, the embedded attributes are loaded immediately. For referenced associations, though, only the object itself will be loaded. All referenced objects must be loaded with the load_association! mapper call.

user_mapper = Perpetuity[User]
user = user_mapper.find(params[:id])
user_mapper.load_association! user, :profile

This loads up the user's profile and injects it into the profile attribute. All loading of referenced objects is explicit so that we don't load an entire object graph unnecessarily. This encourages (forces, really) you to think about all of the objects you'll be loading.

If you want to load a 1:N, N:1 or M:N association, Perpetuity handles that for you.

article_mapper = Perpetuity[Article]
articles = article_mapper.all.to_a
article_mapper.load_association! articles.first, :tags # 1:N
article_mapper.load_association! articles, :author     # All author objects for these articles load in a single query - N:1
article_mapper.load_association! articles, :tags       # M:N

Each of these load_association! calls will only execute the number of queries necessary to retrieve all of the objects. For example, if the author attribute for the selected articles contains both User and Admin objects, it will execute two queries (one each for User and Admin). If the tags for all of the selected articles are all Tag objects, only one query will be executed even in the M:N case.

Customizing persistence

You can set the ID of a record to a custom value rather than using the DB default:

Perpetuity.generate_mapper_for Article do
  id { title.gsub(/\W+/, '-') } # use the article's parameterized title attribute as its ID
end

The block passed to the id macro is evaluated in the context of the object being persisted. This allows you to use the object's private methods and instance variables if you need to.
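
For example, here is a sketch in which the ID comes from a private method (the slug method is hypothetical):

class Article
  attr_accessor :title

  private

  def slug
    title.gsub(/\W+/, '-')
  end
end

Perpetuity.generate_mapper_for Article do
  attribute :title
  id { slug } # works because the block runs in the article's own context
end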

Indexing

Indexes are declared with the index method. The simplest way to create an index is just to pass the attribute to be indexed as a parameter:

Perpetuity.generate_mapper_for Article do
  index :title
end

The following will generate a unique index on an Article class so that two articles cannot be added to the database with the same title. This eliminates the need for uniqueness validations (like ActiveRecord has) that check for existence of that value. Uniqueness validations have race conditions and don't protect you at the database level. Using unique indexes is a superior way to do this.

Perpetuity.generate_mapper_for Article do
  index :title, unique: true
end

Also, some databases provide the ability to specify an order for the index. For example, if you want to query your blog with articles in descending order, you can specify a descending-order index on the timestamp for increased query performance.

Perpetuity.generate_mapper_for Article do
  index :timestamp, order: :descending
end

Applying indexes

It's very important to keep in mind that specifying an index does not create it on the database immediately. If indexes were applied as soon as they were declared, you could introduce downtime every time you specify a new index and deploy your application. Additionally, if a unique index failed to apply, you would not be able to start your app.

In order to apply indexes to the database, you must send reindex! to the mapper. For example:

class ArticleMapper < Perpetuity::Mapper
  map Article
  attribute :title
  index :title, unique: true
end

Perpetuity[Article].reindex!

You could put this in a rake task to be executed when you deploy your app.
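
A minimal sketch of such a task, assuming a Rails app (the task name is arbitrary):

namespace :perpetuity do
  desc 'Apply declared indexes to the database'
  task reindex: :environment do
    Perpetuity[Article].reindex!
  end
end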

Rails Integration

Let's face it, most Ruby apps run on Rails, so we need to be able to support it. Beginning with 0.7.0, Perpetuity automatically detects Rails when you configure it and will load Rails support at that point.

Dynamic mapper reloading

Previous versions of Perpetuity would break when Rails reloaded your models in development mode due to class objects being different. It now reloads mappers dynamically based on whether the class has been reloaded.

In order for this to work, your mapper files need to be named *_mapper.rb and be stored anywhere inside your project's app directory. Usually, this would be app/mappers, but this is not enforced.

ActiveModel-compliant API

Perpetuity deals with POROs just fine, but Rails does not. Rails expects the objects you pass to methods such as redirect_to, form_for and render to speak an ActiveModel-style API.

In Rails 4, including ActiveModel::Model (or the underlying modules in Rails 3) gives your models the API Rails expects, but that won't work with Perpetuity. For example, ActiveModel assumes an id method that your model may not provide. So instead of including ActiveModel, Perpetuity provides a RailsModel mixin:

class Person
  include Perpetuity::RailsModel
end

This will let Rails know how to talk to your models in the way that Perpetuity handles them.

Contributing

There are plenty of opportunities to improve what's here and possibly some design decisions that need some more refinement. You can help. If you have ideas to build on this, send some love in the form of pull requests, issues or tweets and I'll do what I can for them.

Please be sure that the tests run before submitting a pull request. Just run rspec.

The tests include integration with an adapter. By default, this is the MongoDB adapter, but you can change that to Postgres by setting the PERPETUITY_ADAPTER environment variable to postgres.

When testing with the MongoDB adapter, you'll need to have MongoDB running. On Mac OS X, you can install MongoDB via Homebrew and start it with mongod. No configuration is necessary.

When testing with the Postgres adapter, you'll need to have PostgreSQL running. On Mac OS X, you can install PostgreSQL via Homebrew and start it with pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start. No other configuration is necessary, as long as the user has rights to create a database. NOTE: The Postgres adapter is incomplete at this time, and the tests do not yet pass with this adapter.

Issues

Load partial associations

If a user has a mailbox which has thousands of messages in it, loading all of them is a huge waste when they are likely to only view the latest 2 or 3 messages. We need some kind of syntax that limits which associated objects we want. The only idea I've got:

# Load messages 20 through 40 (exclusive)
mapper.load_association! mailbox, :messages, 20...40

More Enumerable-like syntax

We have a Mapper#select, which is great, but I think we can do even better.

  • #find_all as an alias for #select
  • #find/#detect already exists, and I like it as a shorthand, but if we could make it optionally take a block like #select instead, that would be even better. To clarify, I don't want to get rid of Mapper#find(id). The good news is, this is trivial: find is just select(&block).limit(1).first (see the sketch after this list). Bam.
  • #map/#collect to only return the specified attribute. This would require implementing this behavior into the DB adapter, as well.
  • #reject to find non-matching criteria. This would also require a DB-adapter implementation. For MongoDB, I believe we can simply use { '$not' => query }. Need to check that, though.
  • #count should accept an optional block for criteria.
  • #min_by/#max_by: We may need to accept a symbol rather than a block here, or convert #sort to take a block, because we'll need to use #sort to determine min and max.
  • #take as an alias for #limit/#per_page
  • #drop(n) to skip the first n objects. This is similar to #page, except #drop won't depend on the value provided to #per_page/#limit.
  • #any?/#all?/#one?/#none? based on #count
  • #entries as an alias for #to_a

Most of the rest of Enumerable can be done on the resulting array.
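
As a sketch of that trivial change (find_by_id is a hypothetical stand-in for the existing ID lookup):

def find(id = nil, &block)
  return select(&block).limit(1).first if block
  find_by_id(id) # existing Mapper#find(id) behavior
end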

Both Mapper and Retrieval will need to implement this functionality. This raises a red flag in my mind and means we may need to merge the classes. Maybe merge the two classes so that using Enumerable methods just instantiates a new Mapper instead of a Retrieval?

The idea behind Retrieval was to represent a result set, but the result set is actually the array returned from Retrieval#to_a. The current implementation of Retrieval doesn't represent a query result at all; it's just query metadata.

I'll experiment with it a bit and see what I come up with.

Need an intermediate form to store serialized data

Going straight from PORO to database-serialized form is hard to generalize, so I think we need an intermediate serialization layer, which will allow the serializer to be pulled out of the adapters and back into the core perpetuity gem. This intermediate form will expose all of the attributes stored in the source object so the adapter serializers won't need to dig into the user's POROs to snag instance variables and it'll help centralize all of the "what data are we storing" logic so we can start on #15.

This intermediate form could just be a hash (preferably a wrapped hash so we can give extra functionality to it if necessary), mapping the column name to the value, and the database adapter would do its serialization/sanitization based on that.
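
A minimal sketch of that wrapped hash (the class name and methods are assumptions):

class SerializedData
  def initialize(attributes = {})
    @attributes = attributes # maps column name => serializable value
  end

  def [](column)
    @attributes[column]
  end

  def []=(column, value)
    @attributes[column] = value
  end

  def to_h
    @attributes.dup
  end
end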

I doubt this will make it into the 1.0 release (which I'd like to do very soon), but maybe in 1.1.

Attribute declarations

Currently, attribute declarations require specifying the class of the object expected to occupy the particular attribute. I don't personally see that this is necessary for MongoDB (though it would be for DBs with rigid structures like SQL), and maybe it could be extracted into the bag of options (where embedded: true currently lives).

Neither Ruby nor MongoDB require you to specify ahead of time what the class of a particular attribute is. If we remove it being mandatory, we can add some metadata to the BSON object we send to MongoDB that lets us know which particular class to instantiate and stick the data into.

This is especially weird because we can have, for example, an Article that contains an array of Comments. The mapper doesn't know that it's meant to have Comment objects inside the array, just that it has an array. I think for now, since MongoDB is all we support, we should remove the class specifier.

saving references to classes with attributes that have time datatypes

Hello there,

I have created a Project class and saved a couple of objects to the database. Those projects have attributes like delivery_deadline. Now I have created a Document class and instantiated it. I try to associate each document with a project and save it to the database. But this fails, as the reference is saved as JSON and it seems the single quotes are the issue. This SQL is generated by the ORM, so is this a bug that prevents me from saving references to objects that have a time-datatype attribute?
error_log

If I am doing something wrong here, please let me know. All the best.

Mapper#load_association! performance

When retrieving an object from the DB that references another object in the DB, we have to send load_association! to the mapper to convert the reference object into its actual value. When retrieving individual 1:1-associated objects, this is fine, but the current implementation results in N+1 queries. We need a way to load entire sets of objects.

Use cases

A forum topic with many posts.

Currently, we load the topic, then send queries for each post object. We should be able to fire off something like a mapper.select { id.in? posts.map(&:id) } call. Something like this:

mapper.load_association! topic, :posts

In this case, the posts object is an array, so we detect that and run with it.

List of users that each have a profile object.

Each user has a 1:1 association with a profile object.

mapper.load_association! users, :profile

In this case, the first parameter is the array. We could hold a hash-like object that uses the classes of the referenced objects as keys and the IDs as an array of values. This wouldn't necessarily be a single query, but would allow users to store objects that weren't all the same class. We would use the minimum number of queries necessary, which would always be the number of distinct classes of objects used for that particular attribute in that particular set of objects.

Example:

{ UserProfile => [1, 2, 3], AdminProfile => [4, 5, 6] }
{ Profile => [1, 2, 3, 4, 5, 6] }

In the first example, we must use multiple queries, but the second example would only be one. Here's some pseudocode:

references = references_by_class(users) # hypothetical helper returning a hash like the examples above
references.each do |klass, ids|
  Perpetuity[klass].select { id.in? ids }
end

Articles containing lists of tags

The ActiveRecord has_and_belongs_to_many association.

mapper.load_association! articles, :tags

This would maybe need to be done using some sort of identity map? I'm not sure. I'll plug away at it soon.

Objects pulled from the DB not marshalable

When objects are retrieved from the DB, an id method is added to every single one of them so we can easily get ahold of its id in the database. However, if we want to marshal any of these objects for any reason with Marshal.dump, we can't because we define that singleton method on them. This violates least surprise because objects might be able to be marshaled before being persisted, but they definitely cannot after.

So rather than adding id methods, maybe the mapper should just reach into the object and pull out its @id instance variable.
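
That could look something like this sketch (the id_for name mirrors the helper mentioned in another issue):

def id_for(object)
  object.instance_variable_get(:@id)
end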

Extract serialization/deserialization

Objects are currently serialized in Mapper but deserialized in Retrieval. I felt horrible about doing it this way, but it was necessary to get the functionality I needed. What I'd like to do is extract all of that behavior from both of those classes into Serializer objects.

Then in the Mapper, we can do something like Serializer.new(object).serialize and in Retrieval Serializer.new(serialized_data).unserialize. This will also make it easier to extract state injection from Mapper and Retrieval objects (where it doesn't belong).
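
A skeleton of that interface (only the two calls above come from this issue; the rest is assumption):

class Serializer
  def initialize(subject)
    @subject = subject # a domain object or its serialized data
  end

  def serialize
    # turn @subject's attributes into a DB-storable structure
  end

  def unserialize
    # rebuild a domain object from @subject
  end
end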

Hashes are not serialized properly

Hashes are natively serializable in MongoDB (most document databases, really), but at the moment are treated as serialized objects of the class for the mapper being used to talk to the database.

I worked for a little while today on fixing that, but ran into another issue: since MongoDB stores JSON (well, BSON) objects, all keys must be strings. At the database-driver level, if you store a hash with symbol keys, you get back a hash with string keys; if you store a hash with non-string/symbol keys, you get an exception.

So even if this bug didn't exist, you could only get back hashes with string keys. This could lead to some incredibly subtle bugs in apps using Perpetuity to store objects. I propose that a Hash mapper be created to store them to get around this, similar to #11. The trick would be keeping it queryable, but this may need to happen in the DB's DSL implementation.

Make mappers friendlier to Structs, etc

We assume that object state is stored in instance variables, so we don't go in via accessor methods — using accessor methods might modify state, trigger API calls, etc. Using instance variables means that your mappers have intimate knowledge of your models, but your models can remain completely oblivious to the mapper.

However, sometimes when you want to hammer out a quick proof of concept, you use a Struct instead of defining a class. Structs don't store their state into instance variables:

2.0.0p247 :004 > Boy = Struct.new(:name)
2.0.0p247 :007 > Boy.new('Sue').instance_variables
 => []

I'm not sure I want to make Structs a special case, but I'm open to ideas on how to use accessor methods instead of ivars.
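
One possible approach, purely a sketch: fall back to the Struct API when an object keeps no state in ivars.

def attributes_for(object)
  if object.is_a?(Struct)
    object.to_h # Struct#to_h (Ruby 2.0+): Boy.new('Sue').to_h => { name: 'Sue' }
  else
    Hash[object.instance_variables.map { |ivar| [ivar, object.instance_variable_get(ivar)] }]
  end
end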

Classes in Modules cause Problems

This does not work and causes NameError
perpetuity/serializer.rb:77:in `const_get': wrong constant name CRM::Person

I understand that this is caused because there is no constant "CRM::Person". Instead it should refer to the Class "Person" in module "CRM".

require 'perpetuity'

db = Perpetuity::MongoDB.new(db: 'iso')
Perpetuity.configure do
  data_source db
end

module CRM
  class Customer
    attr_accessor :name, :contacts
  end

  class Person
    attr_accessor :name
  end

  Perpetuity.generate_mapper_for Person do
    index :id

    attribute :name
  end

  Perpetuity.generate_mapper_for Customer do
    index :id

    attribute :name
    attribute :contacts
  end

  p = Person.new
  p.name = 'peter'


  c = Customer.new
  c.name = 'gmbh'
  c.contacts = [p]

  Perpetuity[Customer].insert c

  p c.contacts

  p Perpetuity[Customer].all.first
end

Fix IdentityMap to return the same instance of the passed-in object

Originally, this is how Perpetuity::IdentityMap worked, but then I began using it for dirty tracking, which was the wrong choice because it meant it had to duplicate the object.

An Identity Map is meant to return the same object instance to eliminate the effect of aliasing (updating two different instances of the same object).
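
A minimal sketch of that behavior (class layout assumed):

class IdentityMap
  def initialize
    @objects = {}
  end

  def <<(object)
    @objects[[object.class, object.instance_variable_get(:@id)]] = object
  end

  def [](klass, id)
    @objects[[klass, id]] # always the same instance for a given class/id pair
  end
end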

Pass attributes and other mappings to queries

If we pass attributes defined in the mapper to the queries, this will allow us to do several different things:

  • Convert from given attribute on the object to how it is stored in the database a la #15.
    • This is an important one for Data Mappers to implement. It's one of the primary reasons to use a Data Mapper over an Active Record.
    • This also makes database migrations a lot less of a necessity since if you need to rename something on your model, there isn't a corresponding database change.
  • Determine how to serialize different types of queries
    • Given the following query on a Postgres database: mapper.select { |article| article.author.name == 'Jamie' }
    • If the author attribute is referenced, we could do a JOIN and not only get that article, but side-load the author, as well.
    • If the author is embedded, we would know that it is serialized as JSON and use the appropriate operators to dig into it and retrieve its name attribute.
    • This could also extend to doing things like this to find articles with specific tags: mapper.select { |article| article.tags.any? { |tag| tag == 'perpetuity' } }
  • Passing the class->table mapping will become important once we support mapping to a different table than the class name, as well, so when we do JOIN queries we're able to join the right table.

Add indexing capability

Possible syntaxes:

Perpetuity.generate_mapper_for Article do
  index :title, unique: true # Unique index at the DB level, get rid of the need for uniqueness validations
  index :published_at, order: :descending # Mongo (as well as many other DBs) supports specifying index order
  index :views, background: true # Reindex in a background thread
end

I don't think it'd be a good idea to ensure these indexes exist each time the mapper is loaded into memory. This would force the DB to reindex on the next deploy if the indexes are changed, possibly shutting the DB down for a while, pissing off ops and business people. For this reason, I think that indexes should be put into effect explicitly through a reindex method that can be triggered by a rake task.

Mongo supports backgrounding of index creation, so we might be able to run indexes safely when the code is loaded. I'll have to test this, though. For example, there may be issues if this gets triggered when multiple instances of an app are deployed (say, a Heroku app on multiple dynos). I'm not sure how MongoDB reacts when an index is created multiple times on the same collection.

I might also look into how Mongoid does this for some inspiration on that front.

Embedded and referenced objects

Embedded-object weirdness

Currently, embedded objects are serialized in the DB as a hash that has a really weird structure. Off the top of my head, I think it's like this:

{
  type: :object,
  class: 'Widget',
  attributes: {
    name: 'Widget name',
    other_attribute: 123
  }
}

Clearly, this isn't optimal. In fact, there isn't any reason for it since the associated object's class is specified inside the mapper class. I would like to store embedded objects as plain hashes if possible (some DBs, like MongoDB, support hashes, but SQL DBs don't) and Marshal.dumped otherwise. Marshal.dump will serialize the object to a string to save to the DB that we can call Marshal.load on to get the object back to its original form when we retrieve it.

The reason we don't want to marshal embedded objects is because MongoDB (and probably other DBs that support data structures) supports querying on embedded objects, so we could eventually modify the query syntax to support that.

Referenced-object weirdness

Related objects are also stored in a weird way. I don't remember the structure of the value we actually store into the field, but I think it's just the value of the id field of the referenced object. :-) We could probably use the same Marshal.dump on some sort of Perpetuity::ObjectReference object that has a method that will retrieve that object from the DB.
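
A sketch of what that reference object could look like (only the class name comes from this issue; the rest is assumption):

class ObjectReference
  attr_reader :klass, :id

  def initialize(klass, id)
    @klass = klass
    @id = id
  end

  def fetch
    Perpetuity[klass].find(id)
  end
end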

ActiveModel wrapper

What do we need ActiveModel for?

Rails counts on a lot of ActiveModel methods in order to do things like form_for in views and redirect_to @foo in controllers.

How should it be done?

As always, several possibilities floating around in my head.

Using mappers as ActiveModel wrappers

Consider this gist. Unfortunately, this means that we need to make the mapper respond to all of the attributes (for form_for) and the id field (for redirect_to). This is trivial to do, but feels weird.

Include wrapper functionality into domain objects

Another possibility is to include the ActiveModel wrapper into the domain class (essentially just moving the include Perpetuity::ActiveModelWrapper call in the aforementioned gist). This would make the ActiveModel functionality expected inside those objects, but it means that the objects themselves aren't entirely decoupled from Rails. The primary idea behind Perpetuity is that there are no other dependencies other than those required for the domain objects to do their job.

Create an entirely different class specifically for wrapping domain objects to use ActiveModel

This idea maintains the purity of the domain objects, keeps the domain-object calls outside of the mappers and still allows us to use ActiveModel functionality. This seems like it might be the best solution, but I'm not 100% sold on it yet. Here's an example. If we could come up with a smoother way of calling it, it might be better.

Automatically persist all object state

Since document databases have no strict schema, there's no reason we can't persist every instance variable inside an object without having to specify them explicitly. If we do it this way, all objects would be in the exact same state as they were when we saved them, which makes more sense to me.

Originally, this was the idea behind Perpetuity, that all object state would be saved in MongoDB. I made it explicit because I wanted to be able to customize persistence of certain attributes, such as embedded vs referenced attributes for objects that must be mapped (i.e. the DB cannot innately serialize them) or mapping attributes to different keys (see #15).

This would probably change the mapper DSL somewhat, though. There might be some attributes a user doesn't want to persist for whatever reason and I'm not sure what to do about customizing attribute serialization. Maybe something like the following:

Perpetuity.generate_mapper_for Article do
  persist_all embed: [:comments], map: { title: 't', body: 'b', comments: 'c' }
end

The idea here would be that the :embed field would contain a list of attributes that would be embedded within the same document rather than referencing another document and the :map field specifies the mapping between attributes and document keys.

Obviously, if we end up creating an adapter for SQL databases (which I would like to do at some point), those would require explicit declaration of which attributes would be persisted, though maybe we could have a PostgreSQL extension to store additional state in hstore.

Intermediate query form

Looking at the query serialization for both the MongoDB and Postgres adapters, there is a lot of similarity (the QueryAttribute classes are nearly identical) and I think a lot of that could be extracted back into the core perpetuity gem and stored in some intermediate form that the database adapters would build their queries from.

The Perpetuity::Retrieval class already has something like this, and I think we should extract that and build on it to add support for things like inserts, updates and deletions. Along with #43, this would allow us to translate between the object-attribute name and the database-column name more easily.

Error undefined method `human' for Modelname:Class

Hi Jamie,
I am trying to implement your perpetuity gem in my app and seem to have some problems. Not sure if this is a bug or me doing something incorrectly.

I have created an Address class and included 'Perpetuity::RailsModel' in it. However, each time I render _form.html.erb I get the following error: undefined method `human'. The trace is pointing to <%= f.submit %>.

I thought that adding Perpetuity::RailsModel was supposed to take care of all the ActiveModel problems, but it seems it did not. Any thoughts on this? Can this be fixed or am I doing something wrong? By the way, I am using Rails version 3.2. Thanks and best regards.

Query syntax

Retrieval

When we're working with data sets, Ruby already has an outstanding interface for finding data: the Enumerable module.

  • #select — Retrieve a subset of the data
  • #detect or #find — Retrieve the first occurrence of matching data

Modification

Other operations such as insertion, deletion and in-place updating can be influenced by Ruby's Array class, since a database collection/table is pretty much a persisted array.

Insertion: #insert (minus the index argument), #<< (the "shovel" operator)
Deletion: #delete_if, #keep_if
Updating: #[]= (passing in the object or the id as the index)

The hard part

What I'd really like to do is to be able to call something like this: ArticleMapper.select { |article| article.published_at < Time.now }. Single-condition queries aren't too bad; just involves a little metaprogramming. This is not so with multiple conditions: ArticleMapper.select { |article| article.published_at < Time.now && article.author == current_user } since we can't override the && operator (syntax error when we try).

If we can get the AST for the block, we can figure it out that way (relatively simple since most calls will be this == that, but will require far more code), but I haven't been able to find a good way to do that in a VM-agnostic way.

Object serialization

Serializing objects into something we can store in the DB is a little tricky at the moment and not at all clean. I think we can clean it up by taking a more generic process:

  1. Inside Mapper#serialize, when iterating over the attributes to serialize, we ask the DB adapter (Perpetuity::MongoDB) if it can serialize the class of each one. If it can, we just add it to the hash of attributes.
  2. If the attribute is a collection (Array, Set, Hash, other Enumerable or anything responding to each), we need to run this same process over each of the elements.
  3. If we come to one that can't be serialized natively by the DB driver, we check to see if we have a mapper for it. If there is a mapper registered for that class, we run its serialize method on the object.
  4. If we get to this point, the object is unserializable and we can either Marshal.dump it or raise an error.

For example, let's say we have an Article class that contains some strings, an author (maybe a Person or User object) and an array of Comment objects:

class Article
  attr_accessor :title, :body, :comments, :author
  def initialize(title='', body='', author=nil, comments=[])
    @title, @body, @author, @comments = title, body, author, comments
  end
end

class Person
  attr_accessor :name

  # ...
end

Comment = Struct.new(:body, :author)

Perpetuity.generate_mapper_for Article do
  attribute :title
  attribute :body
  attribute :comments, embedded: true
  attribute :author
end

Perpetuity.generate_mapper_for Person do
  attribute :name
end

Perpetuity.generate_mapper_for Comment do
  attribute :body
  attribute :author
end

Let's say the author attribute is a reference to a document in another collection while the comments array embeds the Comment objects inside the document. When we serialize an article, we want the resulting data to look something like this:

{
  title: 'Perpetuity rocks!',
  body: 'Lorem ipsum dolor sit amet …',
  author: {
    __metadata__: {
      class: 'Person',
      id: 42
    }
  },
  comments: [
    {
      __metadata__: {
        class: 'Comment'
      },
      body: 'zomg lol',
      author: {
        __metadata__: {
          class: 'Person',
          id: 42
        }
      }
    }
  ]
}

The author gets serialized as a referenced attribute, so enough information is stored in some metadata hash to be able to recreate it.

The comments in this case would be configured to be an embedded attribute, so the comments are saved entirely within the article's serialized representation rather than within the Comment collection.

Both the author and comments attributes would need their respective mapper objects.

Make MapperRegistry an object

The MapperRegistry class is currently never instantiated and just uses class-level state. What I'd like to see is a MapperRegistry object created and stored within the Configuration object. This will allow us to be able to instantiate multiple Perpetuity configurations each with its own MapperRegistry.

Also, class-level state is a smell I'd rather not have.
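
A minimal sketch of the registry as an instance (names assumed):

class MapperRegistry
  def initialize
    @mappers = {}
  end

  def []=(klass, mapper)
    @mappers[klass] = mapper
  end

  def [](klass)
    @mappers.fetch(klass) { raise KeyError, "No mapper for #{klass}" }
  end
end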

Rails integration

We should automatically detect whether we're running under Rails and implement other classes and modules to work with it. Examples:

Mapper reloading

In the development environment, when code is reloaded between requests, the model class constants are reassigned to different class objects. This means that we get No mapper for Foo exceptions when trying to access mappers.

If we reload the mapper definitions, it will pick up the changes to the constants. I figure this could be done with Rack middleware.
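
A hypothetical middleware along those lines, assuming mappers follow the *_mapper.rb naming convention:

class MapperReloader
  def initialize(app)
    @app = app
  end

  def call(env)
    Dir[Rails.root.join('app', '**', '*_mapper.rb').to_s].each { |file| load file }
    @app.call(env)
  end
end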

ActiveModel headaches

Perpetuity can deal with POROs just fine, but Rails can't. It expects all model classes and objects to respond to ActiveModel methods, which means we can either provide a module to be included into the models or force users to define all this behavior on their models, which is unfortunate.

To get a PORO to perform like an ActiveModel object, we need to do the following:

  • include ActiveModel::Conversion
  • extend ActiveModel::Naming
  • def persisted? Determines whether the model is persisted.
  • def to_param The ID of the model or nil otherwise.
  • def to_key Defined in ActiveModel::Conversion, but it won't work if we remove the id method from the models.

If we provide that functionality in a Perpetuity::RailsModel to be included when the model class is loaded, Rails will know how to handle these objects when using redirect_to, form_for, content_tag_for, etc.
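
A sketch of such a module based on the list above (the method bodies are assumptions):

module Perpetuity
  module RailsModel
    def self.included(base)
      base.send :include, ActiveModel::Conversion
      base.extend ActiveModel::Naming
    end

    def persisted?
      !instance_variable_get(:@id).nil?
    end

    def to_param
      id = instance_variable_get(:@id)
      id && id.to_s
    end

    def to_key
      persisted? ? [instance_variable_get(:@id)] : nil
    end
  end
end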

Mapper#reindex! should delete unused indexes

Currently, if you specify an index in the mapper DSL and call reindex!, it creates any of those indexes that don't exist in the DB.

reindex! should synchronize the DB with what is currently specified in the mapper DSL. This way, if you redeploy with a removed index DSL call, running reindex! on the mapper would remove that index for real.
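
Roughly, the desired behavior (every method name in this sketch is an assumption):

def reindex!
  declared = indexes
  existing = data_source.active_indexes(collection_name)
  (declared - existing).each { |index| data_source.index collection_name, index }
  (existing - declared).each { |index| data_source.drop_index collection_name, index }
end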

Removing indexes (using Mapper#remove_index!) currently seems to be broken (it removes them from the set of indexes, but not from the DB) and I've been beating my head against my desk for hours trying to figure out a good way to fix it.

Extract mapper registry

The Mapper class currently keeps track of all mappers. Instead, we should extract that functionality into a MapperRegistry and keep Mapper as a base class for defining attributes to serialize to the DB.

This will make several things easier, as well. For example, in #10, I had circular dependencies due to mappers instantiating serializers which would then need to get attribute lists from Mapper. If we extract this, the serializers won't depend on Mapper to get information about other mappers, they'll both instead rely on MapperRegistry.

Date ranges do not work

I cannot implement the date range query:
start_date = '2014-03-30'
end_date = '2014-08-05'
documents = Perpetuity[Document].select { |document| document.revision_date.in(start_date.to_time..end_date.to_time) }
puts documents.to_a.size

This generates

2014-08-23 14:36:54 BST ERROR: syntax error at or near "2014" at character 49
2014-08-23 14:36:54 BST STATEMENT: SELECT * FROM "Document" WHERE revision_date IN 2014-03-30 00:00:00 UTC..2014-08-05 00:00:00 UTC

instead of a statement with quoted timestamps:

SELECT * FROM "Document" WHERE revision_date IN '2014-03-30 00:00:00 UTC'..'2014-08-05 00:00:00 UTC'

JOIN queries

All the major SQL and a few NoSQL databases support JOIN queries. Currently, there is nothing in here to support them. Everything is queried like we're doing an ActiveRecord includes to load associated records, but I'd like to support joins:

article_mapper = Perpetuity[Article]
article_mapper.select { |article| article.author.name == 'Jamie' }

The mapper would then look at the author attribute, see that it's a reference to another object, then set up that join. And along with #44, we need a way to indicate that in the intermediate query form.

Remove validations?

I'm on the fence about whether including validations with persistence is a good idea, leaning more toward removing them. Since we're only serializing object state, if you keep invalid state out of the objects you don't need to worry about persisting invalid data.

Since Perpetuity validations are extremely minimal — we only validate presence and string length at the moment, IIRC — we could probably gut this with minimal ramifications.

Validating data is extremely important in production apps and it isn't a simple task, so if there isn't a nice Ruby gem that validates POROs, we may need to keep — and drastically improve — validation support.

Creating objects in memory uses `Object.allocate`

Currently, objects are not instantiated with .new because constructors may require parameters, which would raise an exception. Using allocate isn't necessarily a bad thing, but it could cause a lot of trouble (and confusion) if a class' constructor initializes variables that aren't initialized anywhere else.

class Widget
  def initialize
    @subwidgets = []
  end
end

@subwidgets will not be initialized. If it's one of the attributes that gets persisted, this isn't a problem, but if it isn't it will be nil after being retrieved from the DB when the object expects it to be an Array.
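
A quick demonstration of the hazard:

widget = Widget.allocate                    # bypasses #initialize
widget.instance_variable_get(:@subwidgets)  # => nil, not []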

Customized mapping for column and table ?

Is there any way to map an object property to a table column with a different name? Same thing for tables?

Do you plan to have this feature?

Ex:

Perpetuity.generate_mapper_for MyClass do
table :my_odd_table_name
attribute :my_attribute, column: :my_column_with_odd_name
attribute :my_other_attribute
end

Validations

Every ORM needs validations of some sort, right?

I'm not too sure about this, to be honest, because determining if an object is valid is pretty specific to the object. However, there are several possibilities.

Collection-scoped versus object-scoped validations

Validations can be scoped at either the object or the collection level. In an object-scoped validation, the object's validity can be determined without looking at any other object in the collection. This refers to things like validating that a field isn't blank or that an e-mail address field is, in fact, formatted as an e-mail address.

On the other hand, collection-scoped validations have everything to do with the values of other objects in the collection. Uniqueness is an example of this sort of validation — and is, in fact, the only one I can think of off the top of my head.

Ideas for object-scoped validations

Object-scoped validations seem best to leave to the class of the domain object, I think, but it would be nice to provide an easy way for them to ensure their validity.

A simple interface would be that the domain object's class defines a method called valid? that determines whether or not the object's data is as expected. If the object does not respond to that method, we could assume that it has no object-level validations.

Ideas for collection-scoped validations

Collection-scoped validations could be done a few different ways. The first is listing the validations in an ActiveRecord-style validates_X_of :attribute fashion inside the mapper. This gets ugly pretty quickly if there are more than 2 or 3, though. But since I can't personally think of anything more than uniqueness at this scope, this is still a possibility.

If, however, we decide we need several validations at this scope, we can do something like:

class ArticleMapper
  validations do
    unique :title
    something_else :attr1
    something_entirely_different :attr2
  end
end

This both consolidates validations in the mapper (which we should be doing anyway) and gets rid of the ugly repetition of validates_X_of calls.

If anyone on the internets has any ideas, I'd love to hear them.

Allow mapping object attributes to different DB fields

In document databases like MongoDB, it's common practice to abbreviate keys in objects to reduce the size of its footprint. We can support that without forcing users to abbreviate attribute names by offering an option to map attributes to differently named database fields/keys. Example:

class ArticleMapper < Perpetuity::Mapper
  map Article

  attribute :title, maps_to: 't'
  attribute :body, maps_to: 'b'
  attribute :comments, maps_to: 'c'
end

Associations

I'm trying to decide how to deal with relationships between objects. Let's say we have two classes, Article and Comment. An Article should have a collection of Comments associated with it.

There are a couple ways we can do this. One way would be to explicitly call the CommentMapper, like so:

article = ArticleMapper.retrieve article_id
article.comments = CommentMapper.comments_for article

Or we could go with something like

article = ArticleMapper.retrieve article_id
comments = article.comments

The first example seems to be the most true to the Data Mapper pattern, but the second one is ridiculously convenient and can be done with a little metaprogramming. If we are storing the article_id field inside each Comment record, we could define a method on the retrieved article object that lazily fetches its comments, something like:

article.define_singleton_method :comments do
  @comments ||= CommentMapper.retrieve(article_id: self.id)
end

This method wouldn't exist before retrieving the article, which means it would fall back to its attr_reader or whatever method it was using as its comments attribute. I'm pretty sure this violates the Data Mapper pattern and just moves back toward the ActiveRecord pattern, unfortunately. I just like the convenience of it.

Another way of handling this that I've been thinking of is to get the ArticleMapper class to stuff the comments into the article:

article = ArticleMapper.first
ArticleMapper.load_comments_for article

There are advantages and disadvantages to each. If you have to load each association explicitly, you see how much you're actually hitting the database. This allows you, for example, to trim up a Rails controller by culling some of the calls if you don't actually need them. However, for pages that are expected to have a lot of data on them, this could become ugly very quickly.

Provide mappers for some stdlib classes

We should include mappers for some common core and standard-library classes so they can be persisted properly (i.e. without being Marshal.dumped). For example, an Article might have a list of tags, but they're in a Set rather than an Array.

Classes that should get mappers (will update if I come up with more):

  • Set
  • Bignum / BigDecimal
  • URI
  • Complex (maybe?)
  • Range

This could be a little tricky since mappers need to have intimate knowledge of mapped objects and some of these classes may be implemented differently on various Ruby implementations. In the case of the Set class, the data is kept in an internal hash called @hash in both MRI and Rubinius, but this isn't guaranteed to be true across other Ruby implementations that we might want to support.

Additionally, Set (and, presumably, other standard-library classes) doesn't provide direct access to that ivar, so we'll need to update serialization and deserialization to reach in via instance_variable_{set,get} if there is no attr_accessor. The good news is that this will remove the need for users to put attr_accessors on all their classes. I think it's okay for mappers to bypass an object's public API to have access to the state of an object since its sole purpose is to persist that state.

Rework Mapper implementation

The current implementation of the Mapper superclass is kind of ridiculous. When you generate a mapper for a class, what happens is that a new Mapper object is created and the block is evaluated in the context of that object.

It seemed like a good idea at the time, but this can cause weird problems. My primary concern is that mapper objects become global state. If the mapper is modified in any way for any reason, it stays like that for as long as the Ruby VM is running.

Here is my idea for how to get mappers working decently. This will allow users to use the factory form or the class form.

Perpetuity assumes that objects respond_to?(:id).

Perpetuity::Mapper#find assumes that it can call object.id for retrieved objects. I don't think we can/should assume that. Everywhere else, we use instance_variable_get via id_for. I think we should use that in #find as well.

Let me know if this makes sense and whether you'd like me to create a PR.

load_association! does not make sense

I feel like load_association! doesn't make sense.

Yeah, sometimes we want to retrieve an entity from the DB without querying all the references it has. But... we could do lazy initialization of those references.

When I run this (for example):

user = Perpetuity[User].first
user.friends #=> [<User:#objectid @name="John">]

it should say "oh, you are trying to retrieve the friends from user, let me query them for you".

Does this make sense? Maybe it's not that easy to implement.

RSpec compatibility

Hi there,
I'm trying out perpetuity and like it so far. I'm not sure how it goes in production or a bigger project, but I like separating persistence and data/logic concerns.

While testing I had two problems:

  1. I need to have a real MongoDB running. An in-memory DB would be great (I saw there is something in the issues here, so feel free to ignore this).
  2. I have to clear the DB after each test. I did not find a way to do this without going into the internals, like this:

spec/support/perpetuity

require 'perpetuity'

db = Perpetuity::MongoDB.new(db: 'test')
Perpetuity.configure do
  data_source db
end

# hack perpetuity to have each mapper delete_all
mapper_registry = Perpetuity.mapper_registry
def mapper_registry.mappers
  @mappers.keys.map { |c| self[c] }
end

RSpec.configure do |config|
  config.around do |example|
    example.run

    Perpetuity.mapper_registry.mappers.each do |mapper|
      mapper.delete_all
    end

  end
end

I guess the easiest way would be to have a public MapperRegistry#mappers method so that I don't have to rely on the internals of perpetuity. Or even better, support for transactions and rollbacks like AR has (which is probably a bit larger feature).

Anyway, do you, or someone else, use perpetuity in small to medium applications in production?

unable to run performance tests with benchmark

Hi ,
I am trying to run rake test:benchmark, which works with ActiveRecord, but when I use it with the Perpetuity gem I get uninitialized constant Perpetuity::RailsModel (NameError). I tried adding require 'perpetuity' to the top of the test, but it looks like it is not as simple. Is this something that can be fixed with minimal effort?
error

License missing from gemspec

RubyGems.org doesn't report a license for your gem. This is because it is not specified in the gemspec of your last release.

via e.g.

spec.license = 'MIT'
# or
spec.licenses = ['MIT', 'GPL-2']

Including a license in your gemspec is an easy way for rubygems.org and other tools to check how your gem is licensed. As you can imagine, scanning your repository for a LICENSE file or parsing the README, and then attempting to identify the license or licenses, is much more difficult and more error prone. So, even for projects that already specify a license, including a license in your gemspec is a good practice. See, for example, how rubygems.org uses the gemspec to display the rails gem license.

There is even a License Finder gem to help companies/individuals ensure all gems they use meet their licensing needs. This tool depends on license information being available in the gemspec. This is an important enough issue that even Bundler now generates gems with a default 'MIT' license.

I hope you'll consider specifying a license in your gemspec. If not, please just close the issue with a nice message. In either case, I'll follow up. Thanks for your time!

Appendix:

If you need help choosing a license (sorry, I haven't checked your readme or looked for a license file), GitHub has created a license picker tool. Code without a license specified defaults to 'All rights reserved', denying others all rights to use of the code.
Here's a list of the license names I've found and their frequencies

p.s. In case you're wondering how I found you and why I made this issue, it's because I'm collecting stats on gems (I was originally looking for download data) and decided to collect license metadata, too, and make issues for gemspecs not specifying a license as a public service :). See the previous link or my blog post about this project for more information.

Select DSL breaks when using ivars

The following query doesn't do as you might think:

Perpetuity[User].select { email == @email }

This is because the DSL evaluates in the context of a Perpetuity::MongoDB::Query to make use of method_missing to find out which attributes are being used. We might be able to do a different type of select that resembles Enumerable more closely (as in mapper.select { |user| user.email == @email }) by calling the block with a Query as an argument rather than using instance_exec on the block.

Will investigate.

Circular references cause infinite recursion

This code:

require 'perpetuity/postgres'

Perpetuity.data_source 'postgres://localhost/perpetuity_gem_test'

class Foo
  def initialize
    @bar = Bar.new(self)
  end
end

class Bar
  def initialize foo
    @foo = foo
  end
end

Perpetuity.generate_mapper_for Foo do
  attribute :bar, type: Bar
end

Perpetuity.generate_mapper_for Bar do
  attribute :foo, type: Foo
end

Perpetuity[Foo].insert Foo.new

… causes this error: stack level too deep (SystemStackError)

The problem is that when an object is serialized, referenced objects are persisted to get their ids so we can store those references. That is, when the Foo's bar attribute is serialized, it is inserted into the database to get its id to store in the bar column, which then repeats the process the other way around, causing infinite recursion.

I have no idea how to solve this.
