In this issue I note in a ad-hoc way things of relevance to selecting our database. It can be summarised as follows: I think our most suitable databases are RethinkDB or PostgreSQL (or some combination of those).
Requirements
We require a database for two main purposes:
- store results from observations from iOS client (interaction events)
- storing results from user evaluations
- storing a cache of tweets to work around the Twitter API rate limits
What ever our selection will be there must be good support for writing our DB interfacing code in Python.
Potential document DB candidates:
Potential relational DB candidates:
Notes
This issue tracks notes on what data stores might suit us best.
http://www.stackoverkill.com/ranking/sql-nosql
By mentions on Stack Overflow MongoDB dwarfs the other NoSQL DBs. Next up comes Redis and Cassandra.
List of document databases on Wikipedia: https://en.wikipedia.org/wiki/Document-oriented_database#Implementations
- Redis: 18,590
- RethinkDB: 14,419
- MongoDB: 9,558
- PouchDB: 6,462
- Titan: 3,791
- FlockDB: 2,875
- NEO4J: 2,749
- CouchDB: 2,572
and more
Public cloud service to host various databases:
- MongoDB
- ElasticSearch
- RethinkDB
- Redis
- ETCD
- RabbitMQ
Document Databases
CrateIO
Didn't dive deep into it. A share-nothing document database that seems to be designed for containerised environments (Docker).
https://en.wikipedia.org/wiki/CrateIO
https://crate.io/
ElasticSearch
Originally I thought this was only a search engine but apparently it's a document database as well. The official website contains only ambiguous and general marketing material.
This video gave me a pretty good quick start picture of what ElasticSearch is and how you do basic queries:
https://www.youtube.com/watch?v=60UsHHsKyN4
RethinkDB
A realtime document database that pushes changes to clients. Native driver languages: Ruby, Python, Java, and JavaScript. From what I've read I've only good impressions. Documentation is good, second most starred database on GitHub, after Redis.
MongoDB
Redis
Probably not what we are looking for. Redis is a fast in-memory key-value store, not a document database.
Relational Databases
On first glance we think that a relational database might be the best option to store the results from user evaluations as well as interaction observations.
Originally we aimed for a document DB to store the tweet cache (JSON). However on further investigation it seems relational databases have been adding support for JSON over the years. If the support is good enough it would probably be best to use one database instead of two.
PostgreSQL
PostgreSQL has had JSON support since version 9.2. In version 9.5 support was added to modify JSON in place.
JSON data type: https://www.postgresql.org/docs/9.5/static/datatype-json.html
JSON functions: https://www.postgresql.org/docs/9.5/static/functions-json.html
Resources
Rob Conery - Document Storage Techniques with PostgreSQL and JSONB
Link: https://www.youtube.com/watch?v=rg_GiOZ5Owk
Summary: With recent addition of JSON support to PostgreSQL it has now acquired one of the main benefit of NoSQL databases, get started fast. He takes an example that the appeal of NoSQL is that you don't have to worry about the schema'a right away. To get going fast, just add a column or a table with JSONB (JSON Binary representation) and store the information you need. As time goes on and you need to access the data you slowly normalise the data into tables over time.
Is PostgreSQL Your Next JSON Database?
Link: https://www.compose.io/articles/is-postgresql-your-next-json-database/
Summary: PostgreSQL is nice if we are not changing the JSON a lot. If we are, a document database would be more suitable. According to this blog you can't update JSON fields in place, you have to have some external (Python) code to take it out, add it, then dump it all in again.
MySQL
Todo.
References
Document databases, Wikipedia: https://en.wikipedia.org/wiki/Document-oriented_database