mixu / distsysbook
The book Distributed systems: for fun and profit
Home Page: http://book.mixu.net/distsys/
It seems to me, both naturally and from the HN post, that most people are going to be looking for practical decision-making help in understanding the available open source components and architectural paradigms for building real-world distributed systems.
Right now the intro reads: "In this text I've tried to provide a more accessible introduction to distributed systems. To me, that means two things: introducing the key concepts that you will need in order to have a good time reading more serious texts, and providing a narrative that covers things in enough detail that you get a gist of what's going on without getting stuck on details. It's 2013, you've got the Internet, and you can selectively read more about the topics you find most interesting."
While that sounds fun, it could be a little more explicit about the angle taken within the coverage. I didn't suggest a push to the intro since it's your book after all, and I am interested to know what your thoughts are.
I think the following phrase sounds strange:
"Other consistency models expose some internals of the replication are visible to the programmer"
I think you just need to get rid of the 'are visible', as it's implied with 'expose'?
Currently the reference for ZooKeeper points to http://research.yahoo.com/pub/3280, which no longer hosts the paper. How about this reference instead: http://www.usenix.org/event/usenix10/tech/full_papers/Hunt.pdf
(If I were sure of the appropriate replacement URL I'd just issue a PR.)
In "The FLP impossibility result", you use the terms "liveness" and "safety", but they are not defined at that point. I think you mean "termination" and "agreement" from the previous section, but I'm not entirely sure.
The intersection of distributed systems and devops-style processes is significant and interesting. Things like continuous integration / continuous deployment, test infrastructure, infrastructure failure modelling and platform / service segregation are areas that seem to be under significant development within industry. I am sure that it would be very interesting to many parties if the book could discuss these topics to connect the formal theoretical knowledge to its actual in-industry application. I would also be keen to write some of this content.
I love the part (chapter 3) where you talk about different types of weak consistency. It would be awesome if you could describe them in the context of the current NoSQL vendors out there.
There is no license included in the tree, as far as I can tell.
How about using the GNU Free Documentation License (FDL)?
https://github.com/mixu/distsysbook/blame/master/input/0_index.md#L14
Reads:
This text is focused on distributed programming and systems concepts you'll need to understand commercial systems in the data center. [...]
I'm not really sure about the meaning of this sentence. Could it be wrong?
Under: "Communication links in our system model"
"Many books which discuss distributed algorithms assume is that there.."
Remove the 'is'. I think the 'which' should also be 'that' as it is restrictive.
Thanks for writing and publishing this!
I've been reading the version at http://book.mixu.net/distsys/ebook.htm, and I've noticed a couple issues that I'm not sure how to fix:
It seems that many of the other issues I've opened to highlight missing content are focused on real-world implementation.
Perhaps it would be useful to split the overall structure of the text into two sections: a concepts/theory section containing much of the current content, and an implementation/practice section containing more of the sort of information / current-state-of-the-(open source)-art stuff I'm hinting at.
What do you think?
Open source cluster systems such as Pacemaker / Corosync should be covered. They provide go-to implementations of some of the concepts already explained within the text and are a good way for people to get to grips with deploying some of the concepts in practice. I have experience with these and would be happy to write lots of content.
The build process duplicates a lot of relationship information between chapters, so it's not easy to add new chapters or re-order them.
It might be worth considering revamping this to be more 'single point of truth' (e.g. driven by filesystem alphabetical ordering / naming) or switching to a different document markup/build system entirely.
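As a rough illustration of the 'single point of truth' idea, here is a minimal sketch that derives chapter order and previous/next links purely from file names. It assumes chapters are named with a numeric prefix and an underscore, as in input/0_index.md; the function name chapterLinks and all of the other file names are hypothetical and not taken from the actual build.

```javascript
// Hypothetical helper (not part of the book's build): derive chapter order
// and prev/next links from file names with a numeric prefix, e.g. "0_index.md".
function chapterLinks(fileNames) {
  var ordered = fileNames
    // Keep only files that start with "<number>_".
    .filter(function (name) { return /^\d+_/.test(name); })
    // Sort numerically so that "10_..." comes after "2_...".
    .sort(function (x, y) { return parseInt(x, 10) - parseInt(y, 10); });

  return ordered.map(function (name, i) {
    return {
      file: name,
      prev: i > 0 ? ordered[i - 1] : null,
      next: i < ordered.length - 1 ? ordered[i + 1] : null
    };
  });
}

// Example file names; everything except 0_index.md is made up for illustration.
var links = chapterLinks([
  '2_abstractions.md', '0_index.md', '10_appendix.md', '1_intro.md', 'README.md'
]);
console.log(links);
```

With something like this, adding or re-ordering a chapter would only require renaming files, since the navigation data is recomputed on every build.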
It might be useful to consider enhancing the build process to facilitate better quality diagrams (generated with mscgen or graphviz, for example) for non ASCIIfied output.
To let more people learn from this.
On page 9 of the epub, there is what I assume is supposed to be a table of availability, and what downtime per year that translates to. In the current epub, this is just a list:
Availability %
How much downtime is allowed per year?
90% ("one nine")
More than a month
99% ("two nines")
Less than 4 days
99.9% ("three nines")
Less than 9 hours
(...)
If this is by design, feel free to just close this.
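For reference, the downtime figures in the list above follow from simple arithmetic: the allowed downtime is the unavailable fraction multiplied by the length of a year. The sketch below assumes a 365-day year; the function name downtimeHoursPerYear is just for illustration.

```javascript
// Allowed downtime per year, in hours, for a given availability percentage.
// Assumption: a year has 365 days (8760 hours).
function downtimeHoursPerYear(availabilityPercent) {
  var hoursPerYear = 365 * 24;
  return hoursPerYear * (1 - availabilityPercent / 100);
}

[90, 99, 99.9].forEach(function (pct) {
  var hours = downtimeHoursPerYear(pct);
  console.log(pct + '% availability allows about ' + hours.toFixed(1) + ' hours of downtime per year');
});
```

This gives roughly 36.5 days for 90%, 3.65 days for 99%, and 8.76 hours for 99.9%, which matches "More than a month", "Less than 4 days", and "Less than 9 hours".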
There is no book cover image to put on Hacker Shelf.
A simple and plain version of the red, white, and black header of the site, with the author’s name written beneath it, and arranged in an image with a portrait layout, would be enough. Or you can add whatever graphics you think fit the book.
A comparison of the feature set and limitations of available open source distributed storage/filesystems wouldn't go astray. This would have to discuss the historical feature set and assumptions of single host based block devices/filesystems and the challenges of distributed access.
There are multiple classes of distributed storage, ranging from more open/namespacey stuff like URIs, magnet links and freenet, through to cluster filesystems with strong availability guarantees to DRBD-backed conventional filesystems (offering consistency guarantees plus availability).
Again, this would assist with linking theoretical knowledge to more pragmatic real world systems architecture / deployment concerns.
The last line I see is: 'Systems' in the seminal papers section.
The generated EPUB is invalid - probably because of the script/noscript-sections.
Converting the MOBI file with calibre (gui) produces a valid EPUB.
ubuntu-14.04 64bit, calibre-1.25
Message queues would be useful to cover, as they seem to be gaining popularity as a networking API in distributed systems programming.
I have begun a rambling introduction on the mq branch in my fork @ https://github.com/globalcitizen/distsysbook/blob/mqs/input/6_message-queues.html
Sorry for my bad English. I think there is something wrong with the code for merging two vector clocks:
VectorClock.prototype.merge = function(other) {
  var result = {}, last,
      a = this.value,
      b = other.value;
  // This filters out duplicate keys in the hash
  // But why join b and the key of a?
  (Object.keys(a)
    .concat(b))
    .sort()
    .filter(function(key) {
      var isDuplicate = (key == last);
      last = key;
      return !isDuplicate;
    }).forEach(function(key) {
      result[key] = Math.max(a[key] || 0, b[key] || 0);
    });
  this.value = result;
};
The code concatenates the keys of a with the object b itself, not with the keys of b.
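One possible fix, sketched here rather than taken from the book, is to concatenate Object.keys(b) instead of b. The small VectorClock constructor below is a stand-in assumed for the example so that it runs on its own. The sort and duplicate filter are also dropped in this sketch, because writing the same key twice just stores the same maximum again.

```javascript
// Stand-in constructor so the example is self-contained; the real class may differ.
function VectorClock(value) {
  this.value = value || {};
}

VectorClock.prototype.merge = function (other) {
  var result = {},
      a = this.value,
      b = other.value;
  // Use the keys of BOTH clocks, not the keys of a plus the object b.
  Object.keys(a)
    .concat(Object.keys(b))
    .forEach(function (key) {
      // A key present in both clocks is visited twice and yields the same maximum.
      result[key] = Math.max(a[key] || 0, b[key] || 0);
    });
  this.value = result;
};

// Usage: n2 takes the larger counter, and n3 (only in the other clock) is kept.
var left = new VectorClock({ n1: 2, n2: 1 });
var right = new VectorClock({ n2: 3, n3: 1 });
left.merge(right);
console.log(left.value); // { n1: 2, n2: 3, n3: 1 }
```

With the quoted version, as far as I can tell, a key that exists only in b (n3 here) would never be visited, so its counter would be lost in the merge.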
This link is broken:
"The diagram below, adapted from Ryan Barret at Google, describes some of the aspects of the different options:"
The link on "Google" is https://www.google.com/events/io/2009/sessions/TransactionsAcrossDatacenters.html
Also, great book!
Ch01 says
"In tasks that require only small amounts of communication between nodes, the performance advantage of high-end hardware is limited."
You've misread the graph. It's more: if your application requires large amounts of communication, the benefit of high end hardware is limited.
And: in larger clusters, large amounts of communication are nearly inevitable, eliminating nearly all performance advantages of high end hardware.
Chapter 2 contains the following phrase:
The difference seems immaterial, but it is worth noting that sequential consistency does not compose.
However, it's not clear what you mean when you say "sequential consistency does not compose".