Comments (10)
First of all, if you send a pid between nodes, it's converted to a partisan_remote_reference, because pids can't be sent between nodes that aren't connected with distributed Erlang; it has to be converted to a format that works without disterl. That's what that represents.
Second, without seeing the code that's inserting the values into the set, I can't tell whether this is a bug. It could be that you're using the awset incorrectly.
from lasp.
Thanks for your consideration.
Like you, my first thought was that the issue was in my own code.
This is why I freshly cloned Lasp from this repo, to have a clean version with no modifications at all and check whether the same behaviour shows up.
I simply followed the instructions from the README:
clone
make
open two terminals
rebar3 shell --name [email protected] in first terminal
rebar3 shell --name [email protected] in second terminal
From node a:
lasp_peer_service:join('[email protected]').
lasp:update({<<"test">>, state_awset}, {add, "5"}, self()).
(This simply puts a silly value inside a set so that the node will send its state. If I'm not wrong, update will declare the state_awset, so there is no need to declare it explicitly.)
erlang:process_info(whereis(lasp_ets_storage_backend), memory).
(Run manually from time to time to watch the process size.)
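Instead of re-running that call by hand, the sampling could be scripted from the shell. This is only a minimal sketch: the Sample fun, the sample count and the interval are hypothetical names and values chosen for illustration, not part of Lasp. It uses only whereis/1, erlang:process_info/2, io:format/2 and timer:sleep/1.

```erlang
%% Hypothetical helper, pasted into the rebar3 shell on the node under test.
%% Prints the memory of a registered process N times, Interval ms apart.
Sample = fun Loop(_Name, 0, _Interval) ->
                 ok;
             Loop(Name, N, Interval) ->
                 case whereis(Name) of
                     undefined ->
                         io:format("~p is not registered~n", [Name]);
                     Pid ->
                         %% Same call as in the manual step above.
                         Info = erlang:process_info(Pid, memory),
                         io:format("~p: ~p~n", [Name, Info])
                 end,
                 timer:sleep(Interval),
                 Loop(Name, N - 1, Interval)
         end.

%% Run in a separate process so the shell stays usable:
%% 60 samples, 10 seconds apart (about 10 minutes, as in the report).
spawn(fun() -> Sample(lasp_ets_storage_backend, 60, 10000) end).
```

Spawning the loop keeps the shell free for the lasp:update call while the samples print in the background.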
Result:
The growth is very slow, but over time the size gets bigger and bigger. As an example, it started around 140 kB and was around 500-600 kB after running for 10 minutes (with the exact 2-node cluster just described).
Previously (not on the clean version just cloned), I had made some small modifications that made the memory utilization increase much faster. Still, the fact that it grows even on the clean version seems like a potential issue to me.
As a remark:
If I create a bigger cluster (let's say 5 nodes instead of 2), the memory utilization grows faster.
If I decrease the state_interval (default was 10000ms), let's say to 1000ms, the memory utilization grows a lot faster.
By simply adding a little print in the lasp_ets_storage_backend to see what it was recording into the ets table (which makes the terminal output ugly but does not modify any behaviour), I see something similar to the screenshot above (that screenshot was on a cluster of 5 nodes), where some references are repeated many times. I don't know why it does this, but exact redundancy inside a single record seems strange to me.
I will post a screenshot of that little print from the clean, just-cloned version to see if it's the exact same thing as before (printing the Record inside the do_put function just before ets:insert).
I might be wrong in my approach; if you believe that's the case, please let me know.
from lasp.
Here is the output from the io:format printing what is put in the ets table.
This is from the exact 2-node local cluster described just above; node b is the source of the initial update (putting element "5" in the awset). With time, the size of the record increases with more and more references to node b, slowly increasing the process memory size.
As explained, this is from the clean, just-cloned version of this repo with no modification at all (only adding the little print in lasp_ets_storage_backend).
After waiting a little bit, it goes like this (clearly showing some strange redundancy):
from lasp.
I want you to try something else.
Instead of:
lasp:update({<<"test">>, state_awset}, {add, "5"}, self()).
try:
lasp:update({<<"test">>, state_awset}, {add, "5"}, term_to_binary(node())).
This should help narrow the problem.
Related: you really shouldn't be using pid() or self() as part of CRDT updates without converting it to a list or binary first. Process identifiers are special memory references that appear different (and are manipulated by Distributed Erlang on the wire, causing nodes to be connected/disconnected) depending on what node you look at them from. For example, <0.156.0> on node A is going to appear as something like <1.156.0> on node B. The leading zero indicates that a process is local and not remote, so process identifiers are relative references (with respect to the node) and not absolute ones.
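A minimal sketch of the difference, using only standard BIFs in the shell. The variable names are illustrative, and whether an actor is derived per node or per process is an assumption that depends on the application.

```erlang
%% A pid is a node-relative reference; its printed form depends on
%% which node is looking at it.
Pid = self().
true = is_pid(Pid).

%% A binary derived from the node name is plain data and reads the
%% same on every node.
Actor = term_to_binary(node()).
true = is_binary(Actor).
true = (node() =:= binary_to_term(Actor)).

%% If a per-process actor is really needed, convert the pid to a list
%% first, so that what gets stored is plain data rather than a pid.
PidActor = pid_to_list(Pid).
true = is_list(PidActor).
```

With an actor built this way, the update from earlier in the thread becomes lasp:update({<<"test">>, state_awset}, {add, "5"}, Actor).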
from lasp.
Oh. That surprises me, because using self() in the CRDT update line is precisely what is written in the README example I followed.
I will try with term_to_binary(node()) and check if it changes something.
from lasp.
Yes, well. The README hasn't been updated in like 4 years, whereas the software has, and we've learned a lot since then.
from lasp.
Wow. The process size is not increasing anymore and the Record content is much cleaner!
Well, I got tricked by the README... I spent hours not understanding what was causing this, haha.
Thanks for the easy resolution; it was not obvious to me.
from lasp.
I suppose using term_to_binary(node()) instead of self() in the README easily closes this :)
from lasp.
#312 addresses the README update.
from lasp.
#313 addresses prevention of non-iolist values as actors, to ensure proper node-independent serialization.
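To illustrate what such a check could look like (this is an assumption for illustration only, not the code from #313), a hypothetical IsValidActor fun can lean on iolist_size/1, which raises badarg for anything that is not an iolist or binary:

```erlang
%% Hypothetical validation sketch, NOT the actual change in #313.
%% Accepts binaries and iolists, rejects pids and other terms.
IsValidActor = fun(Actor) ->
                       try iolist_size(Actor) of
                           _Size -> true
                       catch
                           error:badarg -> false
                       end
               end.

%% The actor suggested earlier in the thread passes the check,
%% while a raw pid is rejected.
true = IsValidActor(term_to_binary(node())).
false = IsValidActor(self()).
```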
from lasp.