Comments (9)
I think this is a non-issue.
from moebius.
@robconery, there's another angle to the same issue: returning atomized keys. Check out my use case:
db(:cqrs_events) |> insert(%{event: "Foo", payload: %{"I come from user" => "yup"}}) |> Cqrs.Events.Db.run
[%{created_at: "2016-5-20 15:14:9", event: "Foo", id: 5,
payload: %{"I come from user": "yup"}, updated_at: "2016-5-20 15:14:9"}]
I am saving the payload exactly as the user submits it from JavaScript, so it can contain any keys. Please correct me if I am wrong, but whether it's a clever user with bad intentions or just the system accidentally generating those atoms from user input, we are screwed, because atoms are never garbage-collected by the Erlang VM.
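To make the concern concrete, here is a minimal sketch contrasting two ways of turning string JSON keys into atom keys. `KeyAtomizer` is a hypothetical module, not Moebius's actual decoding code. `String.to_atom/1` mints a new atom for every unseen key. `String.to_existing_atom/1` raises `ArgumentError` for unknown names, so the second helper falls back to keeping those keys as strings:

```elixir
defmodule KeyAtomizer do
  # Hypothetical helpers for illustration; not Moebius's own code.

  # Unsafe: String.to_atom/1 creates a new atom for every unseen key, and
  # atoms are never garbage-collected, so user-chosen keys grow the table.
  def atomize_unsafe(map) when is_map(map) do
    Map.new(map, fn {k, v} -> {String.to_atom(k), v} end)
  end

  # Safer: reuse atoms that already exist; unknown keys stay strings.
  def atomize_existing(map) when is_map(map) do
    Map.new(map, fn {k, v} -> {to_existing_or_string(k), v} end)
  end

  defp to_existing_or_string(key) when is_binary(key) do
    String.to_existing_atom(key)
  rescue
    ArgumentError -> key
  end
end
```

With `atomize_existing/1`, a key such as `"event"` becomes `:event` only if that atom already exists (for example, because your code mentions it). Arbitrary user-supplied keys stay strings and never touch the atom table.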
from moebius.
I am saving the payload exactly as the user submits it from JavaScript.
I believe that's your problem right there, isn't it?
from moebius.
Well, that is actually not entirely true in my case: I do use structs in between, so I am safe.
But one way or another, I would expect the document store to be safe to use as a backend for unstructured data, which is great for rapid prototyping. Why don't we change it so it returns string keys?
from moebius.
It is perfectly safe :). If you want to store ad hoc values, maybe JSON.stringify them? Is this any different in any other system? If you're using structs, you already have atom keys, don't you?
from moebius.
Yeah, well. In principle, the database is an external source. The system should be able to handle any values that come from an external source, whether it's user input, a database, a file upload, etc.
In the current setup, cleverly prepared database contents can bloat the virtual machine with atoms. No, it is not perfectly safe, and it should be.
As soon as serious businesses start using Moebius, this is going to be your problem, and I think it's easier to address it now than later.
from moebius.
The system should be able to handle any values that come from an external source, whether it's user input, a database, a file upload, etc.
Which system are you referring to?
In the current setup, cleverly prepared database contents can bloat the virtual machine with atoms. No, it is not perfectly safe, and it should be.
If by "cleverly prepared" you mean a database with 1,048,576 unique columns (the default size of the BEAM atom table) that you query against, then yes, you will bring your VM to its knees. At that point, friend, the atom table won't be your problem; losing your job for being a bad programmer will. Also, you'll likely run completely out of RAM if you bring over a million columns of data in...
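To see how much room a running node actually has, the VM exposes both counters through `:erlang.system_info/1`. The `:atom_count` and `:atom_limit` items exist in OTP 20 and later. A minimal sketch, with `AtomBudget` a hypothetical module name:

```elixir
defmodule AtomBudget do
  # Hypothetical helper; the counters come from the VM itself (OTP 20+).
  # :atom_limit defaults to 1_048_576 unless the node is started with +t.
  def limit, do: :erlang.system_info(:atom_limit)

  # Atoms currently in the table, including those the VM and libraries created.
  def count, do: :erlang.system_info(:atom_count)

  # Atoms that can still be created before the table is full and the VM aborts.
  def headroom, do: limit() - count()
end

IO.puts("atoms in use: #{AtomBudget.count()} of #{AtomBudget.limit()}")
```

Because atoms are never removed from the table, `count/0` only grows over the lifetime of the node, which makes it a reasonable metric to watch.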
I understand your point, and I've had this discussion quite a few times. I even wrote a post about it:
http://rob.conery.io/2016/02/20/red4-store-part-4/
I'm not trying to be dismissive, I promise. I have given this a whole lot of thought and every time I come away thinking that it's just not an issue.
The primary challenge I receive on this is exactly yours:
What happens when a bad user causes your system to create a bunch of atoms?
And my counter is "that's your fault, not mine". Do you see what I mean? Your argument amounts to "what if I let a user destroy my database and my VM?". That's not something I feel I have to protect you from; you're smart enough to avoid that problem, aren't you?
If you need ad-hoc, unstructured data, then use a system built for it. Postgres can do this for you, but if you're worried about "keys gone wild", then put a cap on it or, better yet, just store the data as a string. You'll blow up your indexing anyway if you allow users to ad-hoc your DB (which you should never do).
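A per-document cap of the kind suggested here could look like the following sketch. `KeyCap` and the `max` threshold are hypothetical. It counts every key in a decoded JSON document, including keys in nested maps and in maps inside lists, and rejects the document before it is stored:

```elixir
defmodule KeyCap do
  # Hypothetical guard: reject a decoded JSON document (plain maps/lists,
  # no structs) if it holds more than `max` keys in total.
  def check(doc, max) when is_map(doc) and is_integer(max) do
    if count_keys(doc) <= max, do: {:ok, doc}, else: {:error, :too_many_keys}
  end

  # Each key counts once, plus whatever keys its value contains.
  defp count_keys(map) when is_map(map) do
    Enum.reduce(map, 0, fn {_k, v}, acc -> acc + 1 + count_keys(v) end)
  end

  defp count_keys(list) when is_list(list) do
    Enum.reduce(list, 0, fn item, acc -> acc + count_keys(item) end)
  end

  defp count_keys(_scalar), do: 0
end
```

A cap like this bounds the keys per document, not the distinct keys across all documents, so it limits the damage rather than ruling it out.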
Even if you store the data as a string, what's stopping Bad User from sending in a 10G bombity bomb that blows your RAM? Would you do such a thing?
Never.
So, if you want to debate this, I'm open. As I said, I don't want to be dismissive, and while I've deemed this a "non-issue", I'm still very open to the discussion.
I'd like to ask you first, however, to:
- Offer a reasonable scenario that doesn't come down to "if I do something really dumb"
- Offer a position that doesn't start with "in principle"
- Offer a summary that doesn't condescendingly end with "when Moebius is taken seriously".
Deal?
from moebius.
@robconery I must say I do really enjoy your approach to debating stuff online.
1,048,576 unique columns that you query against
You don't have to query; it's enough to insert some number of documents into the database, and not a large number at all. Let's say you allow your users to send out "unstructured" JSON data that you serialize to the database as "notes". A note can have 500 keys in its JSON. That's 2,000 documents your user saves in the database to kill off your Erlang VM. Or, if it's an external database, it's 2,000 documents that you read that will kill your VM.
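A back-of-the-envelope check of this scenario, assuming the worst case where every key in every note is distinct and the VM runs with the default atom limit of 2^20:

```latex
\begin{aligned}
500\ \tfrac{\text{keys}}{\text{note}} \times 2000\ \text{notes} &= 1\,000\,000\ \text{distinct atoms} \\
\text{default atom limit} = 2^{20} &= 1\,048\,576 \\
1\,048\,576 - 1\,000\,000 &= 48\,576
\end{aligned}
```

So 2,000 such notes bring the table within 48,576 atoms of the default limit, before counting the atoms the VM and the application already hold. Whether the node actually crashes depends on that baseline and on how many key names repeat across notes.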
I would actually assume it's safe to take JSON from a user and dump it into the database. A lot of API backends for JavaScript web apps (or mobile apps) do just that. Quite frankly, I don't see a reason why we can't return string keys and feel slightly safer. I do not think this is the user doing something really dumb. Not at all.
I just checked ActiveRecord and Ecto. They both return string keys for JSONB documents. This is also how PostgreSQL stores JSONB keys internally. I expect the keys to be strings, not atoms. I would also expect that whatever I store in this JSONB column will come back to me from the database in a similar format. I don't see a reason why the keys of the returned map should change to atoms (I can easily justify the opposite: strings are the internal representation in Postgres).
Moreover, in Moebius this is only a problem with the DocumentQuery stuff. If I stick to using Query, I get the values from JSONB fields with string keys, just as the old gods intended.
db(:cqrs_events) |> Cqrs.Events.Db.run
[%{body: %{"event" => "PotDeactivated",
     "payload" => %{"pot_id" => "2e1198d8-162b-45b2-8107-7abd99d478b1"}},
   created_at: "2016-5-20 14:24:40", id: 47, ...},
 %{body: %{"event" => "PotDeactivated", ...}, created_at: "2016-5-20 14:24:50",
   ...}, %{body: %{...}, ...}, %{...}, ...]
So here we're getting atoms for database column names and strings for JSONB keys.
If you want it to be consistent for documents/DocumentQuery, I'd propose we switch to all string keys there.
from moebius.
I think there's some confusion here. An atom in Erlang/Elixir is a singular entity: there aren't multiple :email atoms, for instance; there's just one. So this:
Let's say you allow your users to send out "unstructured" JSON data that you serialize to the database as "notes". A note can have 500 keys in its JSON. That's 2,000 documents your user saves in the database to kill off your Erlang VM. Or, if it's an external database, it's 2,000 documents that you read that will kill your VM.
That depends on the number of unique keys your users are saving. I suppose it's possible that each of those 2,000 documents might have 500 unique keys apiece, but that would be rather extreme. It would also be extremely foolish of you to allow JSONB to handle such a thing.
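The point about interning can be checked with a small sketch (hypothetical `KeyStats` module, top-level keys only for brevity). Atomizing keys grows the atom table by the number of distinct key names across all documents, which is usually far smaller than the total number of keys stored:

```elixir
defmodule KeyStats do
  # Hypothetical helper. Atoms are interned, so the atom table grows by the
  # number of distinct key names, not by the total number of keys stored.
  def total_keys(docs), do: docs |> Enum.map(&map_size/1) |> Enum.sum()

  def unique_keys(docs) do
    docs
    |> Enum.flat_map(&Map.keys/1)
    |> MapSet.new()
    |> MapSet.size()
  end
end

docs = for i <- 1..2000, do: %{"title" => "note #{i}", "body" => "..."}
KeyStats.total_keys(docs)   # 4000
KeyStats.unique_keys(docs)  # 2
```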
I would actually assume it's safe to take JSON from a user and dump it into the database. A lot of API backends for JavaScript web apps (or mobile apps) do just that. Quite frankly, I don't see a reason why we can't return string keys and feel slightly safer. I do not think this is the user doing something really dumb. Not at all.
First: no, that's not a good assumption. You're letting your user store ad-hoc data in your system. If you're Firebase, sure, OK, go ahead. "A lot of API backends for JS web apps do just that" sounds kind of arm-wavy, so I'll counter that they don't, in fact, do that.
I don't see a reason why we can't return string keys and feel slightly safer.
For all the reasons I think you're outlining here. See your posts above for reference. Store the values as a JSON string and we're good. It's all strings anyway, isn't it?
I just checked ActiveRecord and Ecto. They both return string keys for JSONB documents. This is also how PostgreSQL stores JSONB keys internally. I expect the keys to be strings, not atoms.
The JSON spec says that keys should be strings; that might be why you expect what you do. JSONB is stored as a binary tree so, no, the keys are not strings. To prove this to yourself, try to store duplicate keys.
Ecto and AR return the data as a JSON dump. They don't have the notion of a document abstraction the way we do. If you want this, then don't use our document stuff. By the way, Ecto returns query results using atom keys.
I would also expect that whatever I store in this JSONB column will come back to me from the database in a similar format. I don't see a reason why the keys of the returned map should change to atoms (I can easily justify the opposite: strings are the internal representation in Postgres).
That's because Moebius translates a JSONB document into a map. Yes, I understand you "don't see a reason"; it's why we're having this discussion. PostgreSQL stores data in lots of ways, as does Elixir. It's one of the least fun parts of this effort: reconciling the two. As the creator of Moebius, I made a choice to go with atom keys for a document DB abstraction on top of Postgres. They're easier to use and, to me, simpler to reconcile with your code (structs, mappings, etc.).
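The struct argument can be illustrated with `Kernel.struct/2`, which takes atom-keyed input and discards keys the struct does not define. In this sketch, `Note` is a hypothetical struct standing in for an application model. String-keyed input has to be converted first, and passing it straight to `struct/2` silently leaves the fields at their defaults:

```elixir
defmodule Note do
  # Hypothetical application model used only for this illustration.
  defstruct [:title, :body]
end

atom_doc = %{title: "Hello", body: "World", extra: "ignored"}

# Atom keys map directly onto struct fields; :extra is dropped.
note = struct(Note, atom_doc)
# => %Note{title: "Hello", body: "World"}

string_doc = %{"title" => "Hello", "body" => "World"}

# String keys are not recognized as fields, so they stay nil.
empty_note = struct(Note, string_doc)
# => %Note{title: nil, body: nil}

# to_existing_atom reuses the struct's field atoms and raises
# ArgumentError instead of creating atoms for unknown keys.
note_from_strings =
  struct(Note, Map.new(string_doc, fn {k, v} -> {String.to_existing_atom(k), v} end))
# => %Note{title: "Hello", body: "World"}
```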
Moreover, in Moebius this is only a problem with the DocumentQuery stuff. If I stick to using Query, I get the values from JSONB fields with string keys, just as the old gods intended.
Right. You've discovered the difference between our Document abstraction and our Relational abstraction. Good work.
If you want it to be consistent for documents/DocumentQuery, I'd propose we switch to all string keys there.
If I wanted it to be "consistent", I wouldn't have created the entire document abstraction in the first place. Look, mate, I think we've exhausted this thread. You don't want atom keys, and there's a simple solution for you: don't use Moebius' document abstraction. That way, when your business gets "serious", you won't have a problem.
from moebius.