Comments (9)

robconery commented on July 16, 2024

I think this is a non-issue.

from moebius.

hubertlepicki commented on July 16, 2024

@robconery there's another angle of the same issue, i.e. returning atomized keys. Check out my use case:

```elixir
iex> db(:cqrs_events) |> insert(%{event: "Foo", payload: %{"I come from user" => "yup"}}) |> Cqrs.Events.Db.run
[%{created_at: "2016-5-20 15:14:9", event: "Foo", id: 5,
   payload: %{"I come from user": "yup"}, updated_at: "2016-5-20 15:14:9"}]
```

I am saving the payload exactly as the user submits it from JavaScript. There can be any keys. Please correct me if I am wrong, but given a clever user with bad intentions, or just an accidental case of the system generating those atoms from user input, we are screwed.
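
A minimal sketch of the concern, assuming keys are converted with `String.to_atom/1` (the `AtomDemo.atomize_keys/1` helper is hypothetical, not Moebius code). Every payload key the VM has not seen before adds a permanent entry to the atom table, and `:erlang.system_info(:atom_count)` (OTP 20+) makes that growth visible:

```elixir
defmodule AtomDemo do
  # Hypothetical helper mirroring the pattern under discussion: every string key
  # of a decoded JSON map is turned into an atom with String.to_atom/1.
  def atomize_keys(map) when is_map(map) do
    Map.new(map, fn {key, value} -> {String.to_atom(key), value} end)
  end
end

# 100 simulated user payloads, each with one key the VM has never seen.
payloads =
  for i <- 1..100 do
    %{"user_key_#{i}_#{System.unique_integer([:positive])}" => "yup"}
  end

atoms_before = :erlang.system_info(:atom_count)
Enum.each(payloads, &AtomDemo.atomize_keys/1)
atoms_after = :erlang.system_info(:atom_count)

# Each unseen key added a permanent atom table entry (atoms are not garbage-collected).
IO.puts("new atoms: #{atoms_after - atoms_before}")
```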

robconery commented on July 16, 2024

> I am saving the payload exactly as the user submits it from JavaScript.

I believe that's your problem right there, isn't it?

hubertlepicki commented on July 16, 2024

Well, that is actually not entirely true in my case: I do use structs in between, so I am safe.

But one way or another, I would expect the document storage to be safe to use as a back end for unstructured data, which is great for rapid prototyping, etc. Why don't we change it so it returns string keys?

robconery commented on July 16, 2024

It is perfectly safe :). If you want to store ad hoc values, maybe JSON.stringify them? Is this any different from any other system? If you're using structs, you have atom keys already, don't you?
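
One way to read the "structs in between" approach, as a hypothetical sketch (`MyApp.Event` and `from_params/1` are illustrative names, not from Moebius or the thread). The struct's field atoms exist at compile time, so user input is mapped onto them through an allowlist of string keys; unknown keys such as `"junk"` are ignored, and the nested payload keeps its string keys:

```elixir
defmodule MyApp.Event do
  # Hypothetical struct; the field names are illustrative.
  defstruct [:event, :payload]

  # Allowlist of string keys accepted from user input.
  @fields ~w(event payload)

  # Builds the struct from a string-keyed map (e.g. decoded JSON). Only allowlisted
  # keys are read, and String.to_existing_atom/1 cannot create atoms here because
  # defstruct already defined :event and :payload at compile time.
  def from_params(params) when is_map(params) do
    attrs =
      for key <- @fields, Map.has_key?(params, key) do
        {String.to_existing_atom(key), Map.fetch!(params, key)}
      end

    struct(__MODULE__, attrs)
  end
end

event =
  MyApp.Event.from_params(%{
    "event" => "Foo",
    "payload" => %{"I come from user" => "yup"},
    "junk" => 1
  })

IO.inspect(event)
```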

hubertlepicki commented on July 16, 2024

Yeah, well. In principle, the database is an external source. A system should be able to handle any values that come from an external source, whether it's user input, a database, a file upload, etc.

In the current setup, cleverly prepared database contents can bloat the virtual machine with atoms. No, it is not perfectly safe, and it should be.

As soon as serious businesses start using Moebius, this is going to be your problem, and I think it's easier to address it now than later.

robconery commented on July 16, 2024

> A system should be able to handle any values that come from an external source, whether it's user input, a database, a file upload, etc.

Which system are you referring to?

> In the current setup, cleverly prepared database contents can bloat the virtual machine with atoms. No, it is not perfectly safe, and it should be.

If by "cleverly prepared" you mean a database with 1,048,576 unique columns (the default size of the Erlang atom table) that you query against - then yes - you will bring your VM to its knees. At that point, friend, the atom table won't be your problem - losing your job for being a bad programmer will. Also, you'll likely run completely out of RAM if you bring over a million columns of data in.
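
The 1,048,576 figure is the BEAM's default atom table limit; it can be raised with the `+t` emulator flag (for example `elixir --erl "+t 2000000" script.exs`). A quick sketch for checking the limit and current usage of a running VM, assuming OTP 20 or later (where both `system_info` items exist):

```elixir
# Maximum number of atoms this VM can hold (default 1,048,576) and how many exist now.
limit = :erlang.system_info(:atom_limit)
used = :erlang.system_info(:atom_count)

IO.puts("atom table: #{used} of #{limit} entries used, #{limit - used} left")
```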

I understand your point, and I've had this discussion quite a few times. I even wrote a post about it:
http://rob.conery.io/2016/02/20/red4-store-part-4/

I'm not trying to be dismissive, I promise. I have given this a whole lot of thought and every time I come away thinking that it's just not an issue.

The primary challenge I receive on this is exactly yours:

> What happens when a bad user causes your system to create a bunch of atoms?

And my counter is "that's your fault, not mine". Do you see what I mean? Your argument amounts to "what if I let a user destroy my database and my VM?". That's not something I feel I have to protect you from - you're smart enough to avoid that problem aren't you?

If you need ad-hoc, unstructured data, then use a system built for it. Postgres can do this for you, but if you're worried about "keys gone wild", then put a cap on it or, better yet, just store the data as a string. You'll blow up your indexing anyway if you allow users to ad-hoc your DB (which you should never do).
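
A sketch of the "put a cap on it" option, assuming payloads are decoded JSON (plain maps, lists and scalars, no structs). `PayloadGuard.check/2` and the limit you pass are hypothetical, meant to run before a document is stored:

```elixir
defmodule PayloadGuard do
  # Hypothetical guard: rejects a decoded JSON payload (plain maps, lists and
  # scalars) whose total number of keys, counted through nested maps and lists,
  # exceeds max_keys.
  def check(payload, max_keys) when is_integer(max_keys) do
    case count_keys(payload) do
      count when count <= max_keys -> {:ok, payload}
      count -> {:error, {:too_many_keys, count}}
    end
  end

  # Each key counts once, plus whatever keys its value contains.
  defp count_keys(map) when is_map(map) do
    Enum.reduce(map, 0, fn {_key, value}, acc -> acc + 1 + count_keys(value) end)
  end

  # Lists contribute the keys of their elements.
  defp count_keys(list) when is_list(list) do
    Enum.reduce(list, 0, fn value, acc -> acc + count_keys(value) end)
  end

  defp count_keys(_scalar), do: 0
end

note = %{"title" => "hi", "meta" => %{"tags" => ["a", "b"]}, "items" => [%{"x" => 1}]}
IO.inspect(PayloadGuard.check(note, 100))
```

Counting nested keys, rather than only top-level ones, is the design choice here: a single top-level key can otherwise smuggle in an arbitrarily large nested object.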

Even if you store the data as a string, what's stopping Bad User from sending in a 10G bombity bomb that blows your RAM? Would you do such a thing?

Never.

So, if you want to debate this, I'm open. As I said, I don't want to be dismissive, and while I've deemed this to be a "non-issue", I'm still very open to the discussion.

I'd like to ask you first, however, to:

  • Offer a reasonable scenario that doesn't come down to "if I do something really dumb"
  • Offer a position that doesn't start with "in principle"
  • Offer a summary that doesn't condescendingly end with "when Moebius is taken seriously".

Deal?

hubertlepicki commented on July 16, 2024

@robconery I must say I do really enjoy your approach to debating stuff online.

> 1,048,576 unique columns that you query against

You don't have to query; it's enough to insert some number of documents into the database, and not a large number at all. Let's say you allow your users to send "unstructured" JSON data that you serialize to the database as "notes". A note can have 500 keys in JSON. That's 2,000 documents your user saves in the database to kill off your Erlang VM. Or, when it's an external database, it's 2,000 documents that you read that will kill your VM.
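
The arithmetic behind the 2,000-document figure, as a rough sketch: 500 keys × 2,000 documents = 1,000,000 atoms, which holds only if every key is unique across all documents. `AtomBudget.docs_that_fit/3` is a hypothetical helper; the first two lines of output are 1000000 and 2097 (documents that fit into an empty default-size table), and the last line uses the live VM's figures:

```elixir
defmodule AtomBudget do
  # Hypothetical helper: how many documents, each introducing unique_keys_per_doc
  # keys never seen before, fit into the atom table before it overflows.
  def docs_that_fit(unique_keys_per_doc, limit, already_used)
      when unique_keys_per_doc > 0 and limit >= already_used do
    div(limit - already_used, unique_keys_per_doc)
  end
end

# The scenario from the thread: 500 unique keys per note, 2,000 notes.
IO.puts(500 * 2_000)
IO.puts(AtomBudget.docs_that_fit(500, 1_048_576, 0))

# Same estimate against this VM's real limit and current usage (OTP 20+).
IO.puts(
  AtomBudget.docs_that_fit(
    500,
    :erlang.system_info(:atom_limit),
    :erlang.system_info(:atom_count)
  )
)
```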

I would actually assume it's safe to take JSON from a user and dump it into the database. A lot of API back ends for JavaScript web apps (or mobile apps) do just that. Quite frankly, I don't see a reason why we can't return string keys and feel slightly safer. I do not think this is the user doing something really dumb. Not at all.

I just checked ActiveRecord and Ecto. They both return String keys for JSONB documents. This is also how PostgreSQL stores JSONB keys internally. I expect it to be Strings, not atoms, for keys. I would also expect that whatever I store in this JSONB column comes back to me from the database in a similar format. I don't see a reason why it would change the type of keys in the returned map to atoms (I can easily justify the opposite: strings are the internal representation in Postgres).

Moreover, in Moebius, this is only a problem with the DocumentQuery stuff. If I stick to using Query, I get the values from JSONB fields with String keys, just as the old gods wanted it to be.

```elixir
iex> db(:cqrs_events) |> Cqrs.Events.Db.run
[%{body: %{"event" => "PotDeactivated",
     "payload" => %{"pot_id" => "2e1198d8-162b-45b2-8107-7abd99d478b1"}},
   created_at: "2016-5-20 14:24:40", id: 47, ...},
 %{body: %{"event" => "PotDeactivated", ...}, created_at: "2016-5-20 14:24:50",
   ...}, %{body: %{...}, ...}, %{...}, ...]
```

So here we're getting atoms for database column names and strings for JSONB keys.

If you want it to be consistent for documents/DocumentQuery I'd propose we switch to all String keys there.

robconery commented on July 16, 2024

I think there's some confusion here. An atom in Erlang/Elixir is a singular entity: there aren't multiple :email atoms, for instance; there's just one. So this:

> Let's say you allow your users to send "unstructured" JSON data that you serialize to the database as "notes". A note can have 500 keys in JSON. That's 2,000 documents your user saves in the database to kill off your Erlang VM. Or, when it's an external database, it's 2,000 documents that you read that will kill your VM.

That depends on the number of unique keys your users are saving. I suppose it's possible that each of those 2,000 documents might have 500 unique keys apiece, but that would be rather extreme. It would also be extremely foolish of you to allow JSONB to handle such a thing.
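
Because atoms are interned, the number of new atoms tracks the number of distinct keys, not the number of documents. A small sketch (the `KeyDemo.atomize/1` helper is hypothetical) converts 2,000 documents that share the same three keys; after a one-document warm-up, the atom count should stay flat:

```elixir
defmodule KeyDemo do
  # Hypothetical helper: converts the top-level string keys of a map to atoms.
  def atomize(doc), do: Map.new(doc, fn {key, value} -> {String.to_atom(key), value} end)
end

# 2,000 "notes" that all share the same three keys.
docs = for i <- 1..2_000, do: %{"title" => "note #{i}", "body" => "text", "tags" => []}

# Warm-up on one document: loads any modules on this code path and creates
# :title, :body and :tags at most once.
Enum.map([hd(docs)], &KeyDemo.atomize/1)

atoms_before = :erlang.system_info(:atom_count)
atomized = Enum.map(docs, &KeyDemo.atomize/1)
atoms_after = :erlang.system_info(:atom_count)

# Atoms are interned, so converting 2,000 documents should add (next to) nothing.
IO.puts("documents: #{length(atomized)}, new atoms: #{atoms_after - atoms_before}")
```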

> I would actually assume it's safe to take JSON from a user and dump it into the database. A lot of API back ends for JavaScript web apps (or mobile apps) do just that. Quite frankly, I don't see a reason why we can't return string keys and feel slightly safer. I do not think this is the user doing something really dumb. Not at all.

First: no, that's not a good assumption. You're letting your user store ad-hoc data in your system. If you're Firebase, sure, OK, go ahead. "A lot of API backends for JS web apps do just that" sounds kind of arm-wavy, so I'll counter that they don't, in fact, do that.

> I don't see a reason why we can't return string keys and feel slightly safer.

For all the reasons I think you're outlining here. See your posts above for reference. Store the values as a JSON string and we're good. It's all strings anyway isn't it?

> I just checked ActiveRecord and Ecto. They both return String keys for JSONB documents. This is also how PostgreSQL stores JSONB keys internally. I expect it to be Strings, not atoms, for keys.

The JSON spec says that keys should be strings; that might be why you expect what you do. JSONB is stored as a binary tree so, no, the keys are not strings. To prove this to yourself, try to store duplicate keys.

Ecto/AR returns the data as a JSON dump. They don't have the notion of document abstraction the way we do. If you want this, then don't use our document stuff. By the way: Ecto returns query results using atom keys.

> I would also expect that whatever I store in this JSONB column comes back to me from the database in a similar format. I don't see a reason why it would change the type of keys in the returned map to atoms (I can easily justify the opposite: strings are the internal representation in Postgres).

Moebius translates a JSONB document to a map, is why. Yes I understand you "don't see a reason" - it's why we're having this discussion. PostgreSQL stores data in lots of ways, as does Elixir. It's one of the least fun parts of this effort: reconciling the two. As the creator of Moebius I made a choice to go with atom keys for a document DB abstraction on top of Postgres. They're easier to use and, to me, simpler to reconcile with your code (structs, mappings, etc).
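
A possible middle ground, sketched as a hypothetical helper and not Moebius' actual behaviour: convert a key to an atom only when that atom already exists (`String.to_existing_atom/1` raises `ArgumentError` otherwise), and leave every other key as a string. Code that works with known fields (structs, pattern matches) still gets atom keys, while arbitrary keys from the database cannot grow the atom table:

```elixir
defmodule SafeKeys do
  # Hypothetical helper, not Moebius' implementation: a string key becomes an atom
  # only if that atom already exists; otherwise it stays a string, so data read
  # from the database can never add entries to the atom table.
  def atomize_existing(%_{} = struct), do: struct

  def atomize_existing(map) when is_map(map) do
    Map.new(map, fn {key, value} -> {to_key(key), atomize_existing(value)} end)
  end

  def atomize_existing(list) when is_list(list), do: Enum.map(list, &atomize_existing/1)
  def atomize_existing(other), do: other

  # Falls back to the original string when no such atom exists.
  defp to_key(key) when is_binary(key) do
    String.to_existing_atom(key)
  rescue
    ArgumentError -> key
  end

  defp to_key(key), do: key
end

doc = %{"event" => "Foo", "payload" => %{"I come from user" => "yup"}}
IO.inspect(SafeKeys.atomize_existing(doc))
```

The trade-off is that a returned map may mix atom and string keys, and which keys become atoms depends on which atoms happen to exist in the VM at that moment, so callers have to be prepared for both.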

> Moreover, in Moebius, this is only a problem with the DocumentQuery stuff. If I stick to using Query, I get the values from JSONB fields with String keys, just as the old gods wanted it to be.

Right. You've discovered the difference between our Document abstraction and our Relational abstraction. Good work.

> If you want it to be consistent for documents/DocumentQuery I'd propose we switch to all String keys there.

If I wanted it to be "consistent" I wouldn't have created the entire document abstraction in the first place. Look mate, I think we've exhausted this thread. You don't want atom keys, and there's a simple solution for you: don't use Moebius' document abstraction. That way, when your business gets "serious", you won't have a problem.
