
dslabs's Introduction

DO NOT DISTRIBUTE OR PUBLICLY POST SOLUTIONS TO THESE LABS. MAKE ALL FORKS OF THIS REPOSITORY WITH SOLUTION CODE PRIVATE.

Distributed Systems Labs and Framework

Ellis Michael
University of Washington

DSLabs is a new framework for creating, testing, model checking, visualizing, and debugging distributed systems lab assignments.

The best way to understand distributed systems is by implementing them. And as the old saying goes, "practice doesn't make perfect; perfect practice makes perfect." That is, it's one thing to write code which usually works; it's another thing entirely to write code which works in all cases. The latter endeavor is far more useful for understanding the complexities of the distributed programming model and specific distributed protocols.

Testing distributed systems, however, is notoriously difficult. One thing we found in previous iterations of the distributed systems class at UW is that many students would write implementations which passed all of our automated tests but nevertheless were incorrect, often in non-trivial ways. Some of these bugs would only manifest themselves in later assignments, while others would go entirely unnoticed by our tests. We were able to manually inspect students' submissions and uncover some of these errors, but this approach to grading does not scale and does not provide the immediate feedback of automated tests.

The DSLabs framework and labs are engineered around the goal of helping students understand and correctly implement distributed systems. The framework provides a suite of tools for creating automated tests, including model checking tests which systematically explore the state space of students' implementations. These tests are far more likely to catch common distributed systems bugs, especially those that depend on precise orderings of messages. Moreover, when a bug is found, these search-based tests output a trace which leads to the error, making debugging dramatically simpler. Finally, DSLabs is integrated with a visual debugging tool, which allows students to graphically explore executions of their systems and visualize invariant-violating traces found by the model checker.

Programming Model

The DSLabs framework is built around message-passing state machines (also known as I/O automata or distributed actors), which we call nodes. These basic units of a distributed system consist of a set of message and timer handlers; these handlers define how the node updates its internal state, sends messages, and sets timers in response to an incoming message or timer. These nodes are run in single-threaded event loops, which take messages from the network and timers from the node's timer queue and call the node's handlers for those events.
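For example, a node is an ordinary class whose handler methods the event loop invokes by naming convention. The following is an illustrative sketch only, loosely modeled on the Lab 0 ping example; the Ping, Pong, and PingTimer types and the retry constant are invented here, not excerpts from the framework:

// Illustrative sketch of a node (message/timer types and constant invented).
class PingServer extends Node {
    static final int RETRY_MILLIS = 100; // hypothetical retry interval

    PingServer(Address address) {
        super(address);
    }

    @Override
    public void init() {
        // Runs once at startup; a node might set its initial timers here.
    }

    // The event loop calls this when a Ping message is delivered.
    private void handlePing(Ping m, Address sender) {
        send(new Pong(m.value()), sender);
    }

    // The event loop calls this when a PingTimer fires.
    private void onPingTimer(PingTimer t) {
        set(t, RETRY_MILLIS); // e.g., re-set the timer to retry later
    }
}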

This model of computation is typically the one we use when introducing distributed systems for the first time, and the one we use when we want to reason about distributed protocols and prove their correctness. The philosophy behind this framework is that by giving students a programming environment which mirrors the mathematical model in which distributed protocols are described, we put them on the best footing to reason about their own implementations.

Testing and Model Checking

The lab infrastructure has a suite of tools for creating automated test cases for distributed systems. These tools make it easy to express the scenarios the system should be tested against (e.g., varying client workloads, network conditions, failure patterns, etc.) and then run students' implementations on an emulated network (it is also possible to replace the emulated network interface with an interface to the actual network).

While executing specific scenarios is useful for uncovering bugs in students' implementations, it is difficult to test all possible scenarios that might occur. Moreover, once these tests uncover a problem, it is a challenge to discover its root cause. Because of its node-centric view of distributed computation, the DSLabs framework enables a more thorough form of testing: model checking.

Model checking a distributed system is conceptually simple. First, the initial state of the system is configured. Then, we say that one state of the system, s₂ (consisting of the internal state of all nodes, the state of their timer queues, and the state of the network), is the successor of another state s₁ if it can be obtained from s₁ by delivering a single message or timer that is pending in s₁. A state might have multiple successor states. Model checking is the systematic exploration of this state graph, the simplest approach being breadth-first search. The DSLabs model checker lets us define invariants that should be preserved (e.g., linearizability) and then search through all possible orderings of events to make sure those invariants hold in students' implementations. When an invariant violation is found, the model checker can produce a minimal trace which leads to the violation.
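At its core, this is an ordinary graph search. Below is a simplified, single-threaded sketch of the idea; the State interface and the invariant predicate are stand-ins for the framework's actual machinery, not its real API:

import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.function.Predicate;

class ModelCheckerSketch {
    interface State {
        List<State> successors(); // one per deliverable pending message/timer
    }

    // Returns an invariant-violating state if one is reachable, else null.
    static State bfs(State init, Predicate<State> invariant) {
        Queue<State> frontier = new ArrayDeque<>();
        Set<State> discovered = new HashSet<>();
        frontier.add(init);
        discovered.add(init);
        while (!frontier.isEmpty()) {
            State s = frontier.poll();
            if (!invariant.test(s)) {
                return s; // a trace is recoverable via parent pointers
            }
            for (State succ : s.successors()) {
                if (discovered.add(succ)) {
                    frontier.add(succ);
                }
            }
        }
        return null; // invariant holds in every reachable state
    }
}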

While model checking is useful and has been used extensively in industry and academia to find bugs in distributed systems, exploration of the state graph is still a fundamentally hard problem; the size of the graph is typically exponential in the depth of the search. To extend the usefulness of model checking even further, the test infrastructure lets us prune the portion of the state graph we explore for an individual test, guiding the search towards common problems while still exploring all possible executions in the remaining portion of the state space.
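As a hypothetical illustration, in the style of the settings calls that appear in the labs' tests (the addPrune method and the predicate names here are assumptions, not verbatim framework API):

// Hypothetical search configuration; method and predicate names are assumed.
searchSettings.maxTimeSecs(30);
searchSettings.addInvariant(RESULTS_OK);   // checked in every explored state
searchSettings.addPrune(CLIENTS_CRASHED);  // don't explore past matching states
searchSettings.addGoal(CLIENTS_DONE);      // a state the search hopes to reach
bfs(initSearchState);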

The DSLabs model checker is built to be usable by students and to be as transparent as is practical. Students are required to make certain accommodations for the model checker's sake, but we try to limit these and provide tools that help validate the model checker's assumptions and debug model checking performance issues. Moreover, the model checker itself is not designed with state-of-the-art performance as its only goal. Building a model checker that can test student implementations of runnable systems written in a general-purpose language such as Java requires striking a balance between usability and performance.

Visualization

This framework is integrated with a visual debugger. This tool allows students to interactively explore executions of the distributed systems they build. By exploring executions of their distributed system, students can very quickly test their own hypotheses about how their nodes should behave, helping them discover bugs in their protocols and gain a deeper understanding for the way their systems work. Additionally, the tool is used to visualize the invariant-violating traces produced by the model-checker.

Assignments

We currently have four individual assignments in this framework. In these projects, students incrementally build a distributed, fault-tolerant, sharded, transactional key/value store!

  • Lab 0 provides a simple ping protocol as an example.
  • Lab 1 has students implement an exactly-once RPC protocol on top of an asynchronous network. They re-use the pieces they build in lab 1 in later labs.
  • Lab 2 introduces students to fault-tolerance by having them implement a primary-backup protocol.
  • Lab 3 asks students to take the lessons learned in lab 2 and implement Paxos.
  • Lab 4 has students build a sharded key/value store out of multiple replica groups, each of which uses Paxos internally for replication. They finish by implementing a two-phase commit protocol to handle multi-key updates.

Parts of this sequence of assignments (especially labs 2 and 4) are adapted from the MIT 6.824 Labs. The finished product is a system whose core design is very similar to production storage systems like Google's Spanner.

We have used the DSLabs framework and assignments in distributed systems classes at the University of Washington.

Directory Overview

  • framework/src contains the interface students program against.
  • framework/tst contains the testing infrastructure.
  • framework/tst-self contains the tests for the interface and testing infrastructure.
  • labs contains a subdirectory for each lab. The lab directories each have a src directory initialized with skeleton code where students write their implementations, as well as a tst directory containing the tests for that lab.
  • handout-files contains files to be directly copied into the student handout, including the main README and run-tests.py.
  • grading contains scripts created by the course's previous TAs to batch-grade submissions.
  • www contains the DSLabs website which is built with Jekyll.

The master branch of this repository is not set up to be distributed to students as-is. The Makefile has targets to build the handout directory and handout.tar.gz, which contain a single JAR with the compiled framework, testing infrastructure, and all dependencies. The handout branch of this repository is an auto-built version of the handout.

Contributing

The main tools for development are the same as the students' dependencies: Java 17 and Python 3. You will also need a few utilities such as wget to build with the provided Makefile; MacOS users will need gtar and gcp, provided by the coreutils Homebrew package, and gsed, provided by the gnu-sed Homebrew package.

IntelliJ files are provided and include a code style used by this project. In order to provide IntelliJ with all of the necessary libraries, you must run make dependencies once after cloning the repository and whenever you add to or modify the project's dependencies. You will also need the Lombok IntelliJ plugin.

This project uses google-java-format to format Java files. You should run make format before committing changes. If you want IntelliJ to apply the same formatting, you will need the google-java-format IntelliJ plugin, and you will need to apply the necessary post-install settings in IntelliJ.

If you add fields to any student-visible classes (all classes in the framework package as well as SearchState and related classes), you should take care to ensure that toString prints the correct information and that the classes are cloned correctly. See dslabs.framework.testing.utils.Cloning for more details. Also see Lombok's @ToString annotation for more information about customizing its behavior. In particular, note that transient and static fields are ignored by default by all cloning, serialization, and toString methods.
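A minimal illustration of that last point (the class and field names are invented):

// Invented example: how field modifiers interact with the note above.
class ExampleServer {
    private int round;              // cloned, serialized, and printed by toString
    private transient int scratch;  // ignored by cloning, serialization, and toString
    private static int instances;   // likewise ignored
}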

Acknowledgements

The framework and labs have been improved thanks to valuable contributions from:

  • Alex Saveau
  • Andrew Wei
  • Arman Mohammed
  • Doug Woos
  • Guangda Sun
  • James Wilcox
  • John Depaszthory
  • Kaelin Laundry
  • Logan Gnanapragasam
  • Nick Anderson
  • Paul Yau
  • Sarang Joshi
  • Thomas Anderson

The lab assignments, especially labs 2 and 4, were adapted with permission from the MIT 6.824 labs developed by Robert Morris and colleagues.

Contact

Bug reports and feature requests should be submitted using the GitHub issues tool. Email Ellis Michael ([email protected]) with any other questions.

If you use these labs in a course you teach, I'd love to hear from you!


dslabs's Issues

Gradle build fails

Describe the bug
Opening the project in IntelliJ IDEA 2023.1 and running the Gradle build fails with the error below:

A problem occurred evaluating root project 'dslabs'.
Could not set unknown property 'classifier' for task ':scaffoldingSourcesJar' of type com.github.jengelman.gradle.plugins.shadow.tasks.ShadowJar.


Environment

  • OS: Windows 11
  • JDK (output of java --version): OpenJDK 14
  • DSLabs githash: e708af3

Use MacOS Menu Bar

On MacOS, we should use the system menu bar in the new viz tool instead of creating a menu bar in Swing and docking it at the top of the window.

Use failAndContinue liberally throughout tests

Most of the tests (with the exception of search tests' searches for goal states) exit when they fail. It would be better to run tests through to completion, reporting failures with the recently added failAndContinue method. (Of course, some tests must still fail outright when a precondition for continuing isn't met.)

Additional requirements:

  • Clean up BaseJUnitTest#failedSearchTest. It's no longer needed if failAndContinue exists.
  • Failures should be printed as they happen so that the errors appear in their proper context. They shouldn't be printed twice, though. This might have to be done with a static IdentityHashSet that tracks which exceptions have been printed (see the sketch after this list). This isn't the current behavior of failAndContinue.
  • Users should be able to disable this behavior and get all tests to fail fast with a flag to run_tests.py.
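A minimal sketch of the print-once idea from the list above (the class and method names are invented):

import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class FailureReporterSketch {
    // Identity-based set, so equal-but-distinct exceptions still print.
    private static final Set<Throwable> alreadyPrinted =
            Collections.newSetFromMap(new IdentityHashMap<>());

    static void reportFailure(Throwable t) {
        if (alreadyPrinted.add(t)) {
            t.printStackTrace(); // print in context, exactly once
        }
    }
}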

Confusion about the network of dslabs

I read Network.java, RunSettings.java, RunState.java, and TestSetting.java, but I did not see any code implementing delays, duplications, or reorderings of messages. Is the network in dslabs not fully asynchronous?

Add a demo of @Log to Lab 0

It would be nice to introduce students to proper logging infrastructure using @Log. I think the best place for this would be a short writeup as part of Lab 0, which we probably wouldn't cover explicitly in week 1 (because students' heads are already full) but could point back to later, while students are working on Lab 2 or so. Generally, student log messages should be at level FINE or above.

The writeup should also include a description of why logging is better than println. The main reason is that log messages are off by default (when properly leveled), and in particular they are off when run on Gradescope. (Every quarter we have a handful of students submit solutions that call println in every event, generating gigabytes of log data. Our Gradescope script parses the log, and it will choke if the log does not fit in memory, which on Gradescope is a couple of gigabytes at most.)
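A minimal sketch of the kind of usage the writeup might show (the class and method are invented; @Log is Lombok's java.util.logging annotation):

import lombok.extern.java.Log;

// Invented example demonstrating leveled logging in a student class.
@Log
class ExampleClient {
    void handleReply(String reply) {
        // FINE messages are off by default, and off on Gradescope, unlike println.
        log.fine(() -> "received reply: " + reply);
    }
}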

Don't count invariant computation time against clients' wait times

Tests like this one

public void test16SinglePartition() throws InterruptedException {
    final int nClients = 5, nServers = 5;
    setupStates(nServers);
    runSettings.addInvariant(RESULTS_OK);
    runState.start(runSettings);

    // Startup the clients
    for (int i = 1; i <= nClients; i++) {
        runState.addClientWorker(client(i), differentKeysInfiniteWorkload, false);
    }

    Thread.sleep(5000);
    assertRunInvariantsHold();

    // Partition off some servers and the clients
    List<Address> partition = Lists.newArrayList(server(1), server(2), server(3));
    for (int i = 1; i <= nClients; i++) {
        partition.add(client(i));
    }
    runSettings.partition(partition);
    Thread.sleep(1000);
    assertRunInvariantsHold();

    // Heal the partition
    runSettings.reconnect();
    Thread.sleep(5000);

    // Shut the clients down
    runState.stop();

    runSettings.addInvariant(LOGS_CONSISTENT);
    assertRunInvariantsHold(); // report invariant errors first
    assertMaxWaitTimeLessThan(3000);
}

evaluate invariants before calculating each client's maximum wait time. Because there is always an outstanding request at the end of the run, the time it takes to compute the invariants is added to the latency of that last request.

We still want invariant violations to take precedence over maxWaitTime violations, though. That is, if a test run violates an invariant and has a client max wait time greater than the limit, the invariant violation should always be reported. We need to decide on some mechanism for doing that and then fix all of the tests with this pattern.

Add Checker Framework nullness checker

  1. Add a GitHub action check that runs the nullness checker
  2. Standardize on a set of null/notnull annotations to use, ban all uses of others with an automatic check

Problems running make while installing dslabs on M1 ARM processors

While installing the dslabs framework on my system (M1 Pro MacBook), I came across a few problems.

  1. wget is not preinstalled on MacBooks, so the build fails because wget is missing.
  2. gcp -> This tool is a fancier version of cp but is also more complicated. When the gcp command ran, it was unable to find a directory. The change I made was to replace gcp with cp in the Makefile. That solved the problem, and I successfully built the framework.

I think this should be added to the wiki, since many students are using ARM machines. I don't think this is an ARM-specific error; maybe gcp misbehaves while cherry-picking files and transferring them.

Feature Request: add detail messages for more/all invariant predicates

Currently, some invariant predicates do not have detail messages. The most prominent is the APPENDS_LINEARIZABLE predicate. It could be useful to explain why the invariant was violated (e.g., for APPENDS_LINEARIZABLE, we could write something like "AppendResult(y) is not a valid result for Append(x)" or "AppendResult(y) for client 2 is inconsistent with AppendResult(x) for client 1"). The other invariant I see without a detail message is MULTI_GETS_MATCH.

These omissions might be intentional (e.g., maybe the goal is to get students to identify what's wrong in the sequence), so I'm not opening a pull request yet.

Lab 1: clarify what should be done when handling "old requests"

In this section, it is unclear what the desired behavior for old requests is. I was stuck on this for a while and figured it out by experimenting with different behaviors until the tests passed.

You will also need to deal with what happens when the server receives "old" Requests.

I'd write up suggestions or a PR for how it could be improved, but it's possible that the ambiguity is intended to be part of the challenge.

Print formatted results to file

All of the current grading infrastructure is based on the output from stderr/stdout. This is suboptimal for a couple of reasons. First, it means the output must be parsed by grading scripts, which is error-prone and subject to potential mischief. Second, stderr/stdout necessarily contain all of students' logging statements. Asking students to disable all logging before submitting has historically not had a high success rate, and sufficiently verbose logging can make the output of test runs very large.

There should be a JUnit test listener which, when a global flag is set, logs results as they happen and at the end of a test run outputs them to a file in a structured format (probably JSON). This output should optionally contain a copy of stdout/stderr, which can be obtained with the TeeStdOutErr utility. I'm not sure what the default should be.
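A rough sketch of such a listener, using JUnit 4's RunListener API (the file name and JSON schema here are invented):

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import org.junit.runner.Result;
import org.junit.runner.notification.Failure;
import org.junit.runner.notification.RunListener;

// Invented sketch: record failures as they happen, dump JSON at the end.
public class ResultsFileListener extends RunListener {
    private final List<String> failed = new ArrayList<>();

    @Override
    public void testFailure(Failure failure) {
        failed.add(failure.getDescription().getDisplayName());
    }

    @Override
    public void testRunFinished(Result result) throws Exception {
        StringBuilder json = new StringBuilder("{\"failed\":[");
        for (int i = 0; i < failed.size(); i++) {
            if (i > 0) json.append(',');
            json.append('"').append(failed.get(i)).append('"');
        }
        json.append("],\"runCount\":").append(result.getRunCount()).append('}');
        Files.writeString(Path.of("results.json"), json.toString());
    }
}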

Both of these configuration options should probably be accessible through run_tests.py. Students might want to use this test output themselves.

Ultimately, some sort of schema for the test output would be really useful. It would likely evolve over time, but at least we would have something that grading scripts (in this repo and developed by other instructors for their use cases) could reference.

Visual debugger display too small on Arch with Xmonad WM

Describe the bug
When running the visual debugger using either ./run-tests.py .. --debug ... or ./run-tests.py .. --start-viz ..., the display is very small and cannot be resized.


Environment

  • OS: Arch Linux
  • Window manager: XMonad
  • JDK (output of java --version): openjdk 17.0.6 2023-01-17

Error running make: `gcp` is not recognized on zsh shell

Describe the bug
The file copier tool gcp is not recognized by the zsh shell on MacOS 13.2.1 (the problem may be more general, but it at least covers this case). For Darwin, the preferred file copier ($(CP)) in the dslabs Makefile is gcp instead of cp. However, make clean all fails when gcp is invoked (in the line: $(CP) -r labs handout-files/. $(OTHER_FILES) $@). Changing gcp to cp fixes the problem. Of course, changing to cp might break the cases for which gcp was chosen in the first place, but the Makefile should at least check whether the current shell recognizes gcp. Users should be able to run make out of the box once they have the required/recommended tools set up (Python 3, Make, Java 14, IntelliJ).


Environment

  • OS: MacOS Ventura 13.2.1
  • JDK (output of java --version):
    java 17.0.4.1 2022-08-18 LTS
    Java(TM) SE Runtime Environment (build 17.0.4.1+1-LTS-2)
    Java HotSpot(TM) 64-Bit Server VM (build 17.0.4.1+1-LTS-2, mixed mode, sharing)
  • DSLabs githash: 35df0e9 (master branch)

Add search option to not exit after goal found

Throughout the search tests now, we have the pattern:

searchSettings.maxTimeSecs(30);
searchSettings.addGoal(CLIENTS_DONE);
searchSettings.addInvariant(RESULTS_OK);
bfs(initSearchState);
assertGoalFound();

searchSettings.clearGoals();
bfs(initSearchState);

If the first bfs is not instant, this wastes significant time. It would be nice to have a searchSettings.stopOnGoal() option instructing the search to continue even if it hits a goal-matching state (while still logging that state). This will require some re-architecture of SearchResults and BaseJUnitTest as well as Search.java. Additionally, trace minimization should probably be done on the main thread instead of worker threads.
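With such an option, the duplicated pattern above might collapse to a single search. A sketch of the proposed usage (stopOnGoal does not exist yet; its signature is assumed):

// Hypothetical usage of the proposed option: one search instead of two.
searchSettings.maxTimeSecs(30);
searchSettings.addGoal(CLIENTS_DONE);
searchSettings.addInvariant(RESULTS_OK);
searchSettings.stopOnGoal(false); // proposed: log goal states, keep searching
bfs(initSearchState);
assertGoalFound();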

Switch to Google Java formatter

The weirdnesses in IntelliJ's formatter keep cropping up, and there are changes with major version releases. We should standardize on the Google Java formatter. This means:

  1. Enforcing formatting with a GitHub action check.
  2. Removing the formatting files from the maintainer's and student's IntelliJ settings dirs.
  3. Creating a make format or similar target that applies the formatter.
  4. Setting up Google Java formatting in IntelliJ and adding the necessary settings to the maintainer's settings dir.

Show delivered messages when opened viz is started with a trace

Right now, "View delivered messages" is disabled by default on startup. If the new viz tool is started with a trace that delivers duplicate messages, then when you're stepping through the trace, it looks like those messages come out of nowhere. There are two options here; both have merit.

  1. Detect if the trace a DebuggerWindow is opened with uses duplicate messages. If so, enable "View delivered messages" on startup.
  2. Always display the "next" event in the linear history shown by the events panel, even if the event is a duplicate message and "View delivered messages" is disabled.

Visual Debugger

Hello, how long should it take until the list of servers appears and the visual debugger is ready to start? It stays pending for me. I ran python3 run-tests.py --lab 0 --debug 1 1 GET:foo and python3 run-tests.py --lab 1 --debug 1 1 GET:foo. Are we supposed to implement anything first? I have not modified the code yet.

Environment:
Python: 3.7.3
Java:
openjdk 14 2020-03-17
OpenJDK Runtime Environment (build 14+36-1461)
OpenJDK 64-Bit Server VM (build 14+36-1461, mixed mode, sharing)
OS: Debian GNU/Linux 10 (buster)
browser: Google Chrome 87.0.4280.88


Search Tests Hanging without Logging

Hi
I am a graduate student at Georgia Tech, and we are using these labs as part of our Distributed Computing course. I often see the search tests hang, but when I run them with FINEST logging, they terminate. The hang is probably due to some thread getting stuck.
An example follows:

vpb@vpb-Inspiron-7560:~/gatech/sem2/cs7210/assignments/7210-assignments$ ./run-tests.py --lab 3 --test 21

TEST 21: Single client, no progress in minority [SEARCH] (15pts)

Starting breadth-first search...
Explored: 0, Depth: 0 (0.01s, 0.00K states/s)
Explored: 24085, Depth: 7 (5.01s, 4.81K states/s)
Explored: 59821, Depth: 8 (10.01s, 5.98K states/s)
Explored: 89775, Depth: 9 (15.05s, 5.96K states/s)
Explored: 109218, Depth: 9 (20.74s, 5.27K states/s)
Explored: 147122, Depth: 9 (25.85s, 5.69K states/s)
Explored: 158166, Depth: 9 (30.00s, 5.27K states/s)
Search finished.

Starting breadth-first search...
Explored: 1, Depth: 0 (0.00s, 1.00K states/s)
Explored: 37368, Depth: 173 (6.74s, 5.54K states/s)
Explored: 82224, Depth: 256 (17.14s, 4.80K states/s)
Explored: 103777, Depth: 288 (28.56s, 3.63K states/s)
Explored: 113273, Depth: 301 (30.00s, 3.78K states/s)
Search finished.

...PASS (60.32s)

Tests passed: 1/1
Points: 15/15 (100.00%)
Total time: 60.331s

ALL PASS

vpb@vpb-Inspiron-7560:~/gatech/sem2/cs7210/assignments/7210-assignments$ ./run-tests.py --lab 3 --test 21

TEST 21: Single client, no progress in minority [SEARCH] (15pts)

Starting breadth-first search...
Explored: 0, Depth: 0 (0.01s, 0.00K states/s)
Explored: 21888, Depth: 7 (5.01s, 4.37K states/s)
Explored: 25142, Depth: 7 (10.01s, 2.51K states/s)
Explored: 25142, Depth: 7 (15.01s, 1.68K states/s)
Explored: 25142, Depth: 7 (20.01s, 1.26K states/s)
Explored: 25142, Depth: 7 (25.01s, 1.01K states/s)
^CTraceback (most recent call last):
  File "./run-tests.py", line 183, in <module>
    main()
  File "./run-tests.py", line 179, in main
    assertions=args.assertions)
  File "./run-tests.py", line 90, in run_tests
    subprocess.call(command)
  File "/usr/lib/python2.7/subprocess.py", line 172, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/usr/lib/python2.7/subprocess.py", line 1099, in wait
    pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0)
  File "/usr/lib/python2.7/subprocess.py", line 125, in _eintr_retry_call
    return func(*args)
KeyboardInterrupt
But when I run these with logging enabled, they consistently pass. Is there any framework issue related to this?

Don't make test numbers dependent on filters

Currently, if you filter out any tests (e.g., by choosing a part of the lab or by only running search tests etc.), all of the tests are renumbered. This is because test numbers are assigned by JUnit after all of the filtering takes place. It would be much nicer to have a consistent numbering scheme. Test numbers should probably have "fully qualified" names, assigned by annotation. For example, the "number" of test 4 in part 2 would be "2.4" (a String rather than an int).

Then, from ./run-tests.py, if --part is specified, individual tests can be selected without referring to the fully qualified name (i.e., you can specify --lab 1 --part 2 -n 4 or --lab 1 -n 2.4 but not --lab 1 -n 4).

A complication is labs with only one part. Their test numbers shouldn't be 1.1, 1.2 etc. but just 1, 2. And referring to tests this way in ./run-tests.py should be valid.

It would also be nice to sort tests based on the numbering annotation rather than method name. Test numbers could then be removed from method names. I'm not sure if this is possible with the version of JUnit we're using.

One question is whether the annotation should have the fully qualified name, or a simple integer and pick up the part number from the test class. There are arguments for both approaches, but having the fully qualified name (e.g., @TestNumber("2.4")) is probably best because it allows students to easily look at the method and know how to run it.
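A minimal sketch of the annotation under the fully-qualified approach (hypothetical; no such annotation exists yet):

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation carrying a fully qualified test number.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface TestNumber {
    String value(); // e.g., "2.4"; just "4" for single-part labs
}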

Lastly, it would be great if we could validate in an automated test that test numbers are sequential, there are no duplicates, every test has a number, etc. This should go in the tst-self directory.
https://github.com/emichael/dslabs/tree/master/framework/tst-self/

Possible deadlock in the framework with the search test

When running some students' solutions with the latest dslabs framework, I noticed that the search tests would get stuck; e.g., a test with a 30-second timeout would run forever. Using jstack to look into the threads, I found a deadlock between the search threads. It seems they are all blocked at Search.java:499, discovered.add(successor.wrapped()).

The following is printed by jstack

Found one Java-level deadlock:

"Thread-12":
waiting to lock monitor 0x00007fa138008f00 (object 0x0000000780b1c3f8, a java.util.Hashtable),
which is held by "Thread-17"

"Thread-17":
waiting to lock monitor 0x00007fa13800f400 (object 0x0000000780b2a3b0, a java.util.Hashtable),
which is held by "Thread-16"

"Thread-16":
waiting to lock monitor 0x00007fa138009400 (object 0x0000000781588248, a java.util.Hashtable),
which is held by "Thread-14"

"Thread-14":
waiting to lock monitor 0x00007fa13800f400 (object 0x0000000780b2a3b0, a java.util.Hashtable),
which is held by "Thread-16"

Java stack information for the threads listed above:

"Thread-12":
at java.util.Hashtable.hashCode([email protected]/Hashtable.java:864)
- waiting to lock <0x0000000780b1c3f8> (a java.util.Hashtable)
at dslabs.primarybackup.PBServer.hashCode(PBServer.java:19)
at java.util.Objects.hashCode([email protected]/Objects.java:117)
at java.util.HashMap$Node.hashCode([email protected]/HashMap.java:298)
at java.util.AbstractMap.hashCode([email protected]/AbstractMap.java:527)
at dslabs.framework.testing.AbstractState.hashCode(AbstractState.java:49)
at dslabs.framework.testing.search.SearchState.hashCode(SearchState.java:68)
at dslabs.framework.testing.search.SearchState$SearchEquivalenceWrappedSearchState.hashCode(SearchState.java:623)
at java.util.concurrent.ConcurrentHashMap.putVal([email protected]/ConcurrentHashMap.java:1012)
at java.util.concurrent.ConcurrentHashMap.put([email protected]/ConcurrentHashMap.java:1006)
at java.util.Collections$SetFromMap.add([email protected]/Collections.java:5654)
at dslabs.framework.testing.search.BFS.exploreNode(Search.java:499)
at dslabs.framework.testing.search.BFS.lambda$getWorker$0(Search.java:479)
at dslabs.framework.testing.search.BFS$$Lambda$133/0x0000000800c07440.run(Unknown Source)
at dslabs.framework.testing.search.Search.lambda$run$0(Search.java:275)
at dslabs.framework.testing.search.Search$$Lambda$132/0x0000000800c07040.run(Unknown Source)
at java.lang.Thread.run([email protected]/Thread.java:832)
"Thread-17":
at java.util.Hashtable.size([email protected]/Hashtable.java:248)
- waiting to lock <0x0000000780b2a3b0> (a java.util.Hashtable)
at java.util.Hashtable.equals([email protected]/Hashtable.java:822)
- locked <0x0000000780b1c3f8> (a java.util.Hashtable)
at dslabs.primarybackup.PBServer.equals(PBServer.java:19)
at java.util.AbstractMap.equals([email protected]/AbstractMap.java:493)
at dslabs.framework.testing.AbstractState.equals(AbstractState.java:49)
at dslabs.framework.testing.search.SearchState.equals(SearchState.java:68)
at java.util.Objects.equals([email protected]/Objects.java:78)
at dslabs.framework.testing.search.SearchState$SearchEquivalenceWrappedSearchState.equals(SearchState.java:606)
at java.util.concurrent.ConcurrentHashMap.putVal([email protected]/ConcurrentHashMap.java:1039)
- locked <0x00000007814a8c08> (a java.util.concurrent.ConcurrentHashMap$Node)
at java.util.concurrent.ConcurrentHashMap.put([email protected]/ConcurrentHashMap.java:1006)
at java.util.Collections$SetFromMap.add([email protected]/Collections.java:5654)
at dslabs.framework.testing.search.BFS.exploreNode(Search.java:499)
at dslabs.framework.testing.search.BFS.lambda$getWorker$0(Search.java:479)
at dslabs.framework.testing.search.BFS$$Lambda$133/0x0000000800c07440.run(Unknown Source)
at dslabs.framework.testing.search.Search.lambda$run$0(Search.java:275)
at dslabs.framework.testing.search.Search$$Lambda$132/0x0000000800c07040.run(Unknown Source)
at java.lang.Thread.run([email protected]/Thread.java:832)
"Thread-16":
at java.util.Hashtable.size([email protected]/Hashtable.java:248)
- waiting to lock <0x0000000781588248> (a java.util.Hashtable)
at java.util.Hashtable.equals([email protected]/Hashtable.java:822)
- locked <0x0000000780b2a3b0> (a java.util.Hashtable)
at dslabs.primarybackup.PBServer.equals(PBServer.java:19)
at java.util.AbstractMap.equals([email protected]/AbstractMap.java:493)
at dslabs.framework.testing.AbstractState.equals(AbstractState.java:49)
at dslabs.framework.testing.search.SearchState.equals(SearchState.java:68)
at java.util.Objects.equals([email protected]/Objects.java:78)
at dslabs.framework.testing.search.SearchState$SearchEquivalenceWrappedSearchState.equals(SearchState.java:606)
at java.util.concurrent.ConcurrentHashMap.putVal([email protected]/ConcurrentHashMap.java:1039)
- locked <0x0000000781565b00> (a java.util.concurrent.ConcurrentHashMap$Node)
at java.util.concurrent.ConcurrentHashMap.put([email protected]/ConcurrentHashMap.java:1006)
at java.util.Collections$SetFromMap.add([email protected]/Collections.java:5654)
at dslabs.framework.testing.search.BFS.exploreNode(Search.java:499)
at dslabs.framework.testing.search.BFS.lambda$getWorker$0(Search.java:479)
at dslabs.framework.testing.search.BFS$$Lambda$133/0x0000000800c07440.run(Unknown Source)
at dslabs.framework.testing.search.Search.lambda$run$0(Search.java:275)
at dslabs.framework.testing.search.Search$$Lambda$132/0x0000000800c07040.run(Unknown Source)
at java.lang.Thread.run([email protected]/Thread.java:832)
"Thread-14":
at java.util.Hashtable.size([email protected]/Hashtable.java:248)
- waiting to lock <0x0000000780b2a3b0> (a java.util.Hashtable)
at java.util.Hashtable.equals([email protected]/Hashtable.java:822)
- locked <0x0000000781588248> (a java.util.Hashtable)
at dslabs.primarybackup.PBServer.equals(PBServer.java:19)
at java.util.AbstractMap.equals([email protected]/AbstractMap.java:493)
at dslabs.framework.testing.AbstractState.equals(AbstractState.java:49)
at dslabs.framework.testing.search.SearchState.equals(SearchState.java:68)
at java.util.Objects.equals([email protected]/Objects.java:78)
at dslabs.framework.testing.search.SearchState$SearchEquivalenceWrappedSearchState.equals(SearchState.java:606)
at java.util.concurrent.ConcurrentHashMap.putVal([email protected]/ConcurrentHashMap.java:1039)
- locked <0x0000000780f7f258> (a java.util.concurrent.ConcurrentHashMap$Node)
at java.util.concurrent.ConcurrentHashMap.put([email protected]/ConcurrentHashMap.java:1006)
at java.util.Collections$SetFromMap.add([email protected]/Collections.java:5654)
at dslabs.framework.testing.search.BFS.exploreNode(Search.java:499)
at dslabs.framework.testing.search.BFS.lambda$getWorker$0(Search.java:479)
at dslabs.framework.testing.search.BFS$$Lambda$133/0x0000000800c07440.run(Unknown Source)
at dslabs.framework.testing.search.Search.lambda$run$0(Search.java:275)
at dslabs.framework.testing.search.Search$$Lambda$132/0x0000000800c07040.run(Unknown Source)
at java.lang.Thread.run([email protected]/Thread.java:832)

Found 1 deadlock.

Debuggability

Hi, I am currently a graduate student at Georgia Tech. We are using your system for our distributed computing course this semester, and I have finished labs 1-3. While doing the labs, I found it can be very hard to debug the search tests, especially the BFS searches. I found a way to log the event sequence for the final state across several BFS runs, and I used it by manually entering the events into the viz debugger. After I finish lab 4, I might try to add a feature that lets students input event sequences (copy and paste a series of events for a BFS state) into the viz debugger. Can you give me some suggestions about where to start? Thank you.

Allow for multiple rows of node states, reordering nodes

Right now, all nodes are laid out in a single row. On very wide screens, this is fine. On smaller screens, trying to view more than 4 nodes at once is difficult. Nodes usually don't need the entire vertical height of the screen. There is quite a lot of white space in the node state box, and scrolling through the message inbox/timer queue above a node is preferable to scrolling horizontally when space is tight.

It would be nice to be able to lay out SingleNodePanels in multiple rows. I attempted a quick and dirty version where there were either one or two rows, and when there were two rows, they were separated by a JSplitPane. This ran into some bizarre issues. I think the "right" way is to use multiple JXMultiSplitPanes, one in vertical mode to hold the rows, and others in horizontal mode to hold each row, and use this trick to display only one row.

// XXX: Disgusting hack to show only 1 leaf node
if (numShown == 1) {
    Leaf l = new Leaf("DUMMY-LEAF-1");
    l.setWeight(0.0);
    layoutNodes.add(new Divider());
    layoutNodes.add(l);
}
split.setChildren(layoutNodes);
splitPane.setModel(split);
for (Address a : addresses) {
    if (!nodesActive.get(a).isSelected()) {
        continue;
    }
    splitPane.add(statePanels.get(a), a.toString());
}
if (numShown == 1) {
    splitPane.add(new JPanel(), "DUMMY-LEAF-1");
    layout.displayNode("DUMMY-LEAF-1", false);
}

The other piece that would be nice is a way for users to reorder nodes. This could be as simple as replacing the "Show/hide nodes" panel in the sidebar with a reorderable list of checkboxes. This tutorial might be helpful: http://www.java2s.com/Tutorial/Java/0240__Swing/Usedraganddroptoreorderalist.htm The best version of that feature would be the ability to drag and drop SingleNodePanels by dragging the node name, but that seems very difficult.

Don't replace timers on update, highlight new timers

The new viz tool currently replaces all timers on any state change. It also does not highlight newly added timers like it does with messages. Part of the issue is that timers are trickier to deal with since queues are semi-ordered and can have duplicates.

See:

/*
TODO: do the same thing for timers...
This is tricky, though. Messages are unique because of the network
model. But for timers, there might be multiple copies of the same
one in the queue. We need to diff the lists intelligently to make
sure we only pickup the newly added timers. We know the new ones are
at the back of the list, but if the most recent event was a timer
delivery, that makes things nasty.
Just updating the list with the new one is easy, but giving timers
the correct TreeDisplayType is hard.
*/

Lab 1 part 2 question

Why is it necessary to use a sequence number in messages, given that I don't maintain state on the server and a request can be executed multiple times?

Or is it just used to determine which response the client expects in handleReply() and onClientTimer()?

Thanks.
