elem-azar-unis / crdt-redis Goto Github PK
View Code? Open in Web Editor NEWCRDTs implemented in Redis
License: BSD 3-Clause "New" or "Revised" License
CRDTs implemented in Redis
License: BSD 3-Clause "New" or "Revised" License
This bug is found using our tool MET. Yet we found that both Remove-win List and RWF List have this issue. The script for MET testing RWF List to trigger this bug is:
2
1 1 Add 0 1 10 ; 2 1 Add 2 2 10 ; 1 2 effect ; 3 2 Rmv 1 ; 4 2 Add 1 2 10 ; 3 1 effect
1 -1 0 null 0,1
The execution trace and the server metadata outputted by our MET testing are:
[op_table]:
- rwflinsert list1 null 1 e 10 0 0 0 ; rwflinsert list1 null 1 e 10 0 0 0 1,0,0 0,0|0
- rwflinsert list1 2 2 e 10 0 0 0 ;
- rwflrem list1 1 ; rwflrem list1 1 0,1|1
- rwflinsert list1 1 2 e 10 0 0 0 ; rwflinsert list1 1 2 e 10 0 0 0 2,1,0 0,0|1
[steps]:
- FULL 0 0
- FULL 1 0
- EFFECT 0 1
- FULL 2 1
- FULL 3 1
- EFFECT 2 0
[script state] -- 1 -2 0 null [0,1] ;
[server state] -- 1 -1 10 null [0,1] ;
1 -1 10 null [0,1] ; 2 1 10 null [] ;
The CRDT Lists allow the client to add a new element after the element that has been removed from the list. This is confusing and should not happen.
Server 2 accepted the operation of adding element 2 after element 1, while element 1 has been removed before. This should not happen. Such add requests from clients should have been rejected.
The bug is fixed in the commit a4f7d68. The check for such cases is added.
This bug is found using our tool MET. The script for MET testing RWF List to trigger this bug is:
2
1 1 Add 0 1 10 ; 2 1 Rmv 1 ; 2 2 effect ; 1 2 effect
1 -1 0 null 0,1
Add and remove the element on server 1. The remove is then synchronized to server 2 before the add operation. This causes server 2 not to synchronize the basic information of the element, such as the content of the element, from the add operation that gets synchronized later.
This is fixed in the commit b0601f3. We let the basic info of elements always get initialized while synchronizing add operations.
This bug is found using our tool of model-based exhaustive testing (MET). The script for MET testing RWF RPQ to trigger this bug is:
2
1 1 Add 10 ; 2 1 Add 10 ; 3 1 Inc -3 ; 3 2 effect ; 4 2 Add 20 ; 1 2 effect ;
1 10 -3 n ; 1 10 -3 n ;
We find that the elements in the list may have an order that is not consistent with the order that they are designated when inserting them.
The reason for the bug is that we didn't perfectly implement the "existence" semantics according to the RWF algorithm. It affects both the RWF RPQ and RWF List. Then the existence of the elements becomes odd under certain execution traces, like the trace found by the MET tool.
There is a chance that the element should have not existed at some point, but it accepted some client requests (it shouldn't). This causes the semantics of the CRDT to become undefined.
An example here.
Client_0 first adds element "a" and then changes the value of "a" (say, it changes the font of character "a" in a string in a collaborative editing scenario). Then client_1 queries the value of the string and obtains string "[a]". After reading the value of the string, client_1 inserts "b" after "a" into the string. However, in our buggy implementation, client_0 may read string "[b,a]".
This bug can be triggered by the following adversarial pattern of events. Server_0 first receives the operation "changing the font of a" (without adding a first). Then server_0 will receive the operation "adding b after a". At the design level, this "adding b after a" operation is quite safe, since it is protected by the precondition that "before adding b after a, a must exist". The design does not explicitly state what to do if the precondition does not hold. The precondition implicitly states that, if the element does not exist, nothing should be done. However, in our implementation, an illegal position is assigned to b, and then to a. In this way, client_0 gets "[b,a]".
The bug is fixed in the commit b1adb29.
We implemented the more accurate "existence" for elements in the CRDTs using RWF.
The bug is found by our manual MET approach:
The Rmv-Win List and Rmv-Win RPQ get stuck sometimes. They will not synchronize with other servers after executing certain synchronizations.
The trace to trigger the bug is shown here:
Server2 does not synchronize b and c correctly when it receives a.Then when server2 later receives d, it gets stuck since the required delivery of b and c are not available.
The bug lies in the part of our implementation of causal delivery for the remove-win CRDTs. The part where we should notify all waiting messages but we notified one.
The bug is fixed in 2b1eb9b. Change notify one to notify all.
This is found during our process of model checking the Remove-Win CRPQ.
With the help of the traces of model checking, we get the trace that trigger the violation:
The server0 here will record the old add operation of server1, together with the newest add operation initiated by client0. The two add operations are not concurrent. This violates the correctness property that the recorded operations should be concurrent. The old add record should have been removed.
We add the logic for the Remove-Win CRPQ that the add operations should clear the old add record. The bug is fixed.
The specification was fixed before it was implemented into the CRDT-Redis system.
This bug is found using our tool MET. Yet we found that both Remove-win List and RWF List have this issue. The script for MET testing RWF List to trigger this bug is:
1
1 1 Add 0 1 10 ; 2 1 Add 1 3 10 ; 3 1 Add 1 4 10 ; 4 1 Add 1 5 10
1 1 10 null 0 ; 5 1 10 null 0 ; 4 1 10 null 0 ; 3 1 10 null 0
For example, sometimes I add an element A after B, the list tells me it becomes "AB", while it should be "BA".
The order of elements is not as expected. The reason for the bug lies in our implementation of the position ID for the elements in the list. The CRDT list algorithms require a position ID generator that generate the IDs that have the following property:
The algorithms don't specify how to implement the generators. Our implementation of the ID generator works fine for most of the cases, yet sometimes fails property 3.
The bug is fixed in the commit cf44f6f. We fixed the bug in the position ID generator.
This is found during our process of model-based testing of RWF List. And it further affects the Remove-win List.
With the help of the traces of model checking, we analyzed our CRDT List designs and refined the readd specifications as follows.
Note that the insert operation should always insert a unique new element
with a designated position (i.e. *prev* element), or a previously added element
with "readd" being its *prev*. Otherwise, it is undefined behavior.
This is reasonable that for collaborative text editing, the newly inserted element (characters, words, ...)
should always be unique. And the "readd" semantics, which may be used for undo/redo scenarios,
means that you have seen the element. The element has been added at least from a global perspective.
The specification is fixed in the commit f6e7687.
The implementation is changed accordingly in the commit 99220aa.
The steps of the random testing that triggers this bug:
There are some circumstances where the servers won't converge to the same state. The time of triggering the bug is about 2 hours on average.
We first locate the root cause of the bug.
Through in-depth investigation, it was found that this bug was triggered by the readd operation. The readd operation is the reverse of the remove operation. It is meant to undo the remove or redo the add of some element in the list.
A minimized example of this bug is shown here.
According to our design of the Rwf-List, the location of list element ๐ is decided when it is first added on ๐ ๐๐๐ฃ๐๐0. If we remove and then re-add ๐ (such as undoing the removal of words in collaborative editing), its position will not change. The client may send the ๐๐-๐๐๐(๐) operation to ๐ ๐๐๐ฃ๐๐1, which has not yet gotten the previous ๐๐๐(๐) operation from ๐ ๐๐๐ฃ๐๐0. ๐๐๐๐ฃ๐๐1 should reject this ๐๐-๐๐๐ operation or buffer this operation for processing later. But in our buggy design, ๐ ๐๐๐ฃ๐๐1 accepts this request and assigns an illegal position to ๐. When ๐ ๐๐๐ฃ๐๐1 receives the delayed ๐๐๐(๐) operation, it will ignore this operation, which causes the divergence among replicas.
This is fixed in the commit 99e2fe6 by adding a check for this condition. The invalid readd is rejected now.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.