data-design's Introduction

Quick links: Our License | Making Contributions | Atlas Builds

How We're Licensed

Data + Design is licensed under the terms of a Creative Commons BY-NC-SA license. The full legal code is at the link, but here's a quick human-readable breakdown for you as well. (Note: This is only a human-readable summary of, and not a substitute for or supplement to, the license.)

By Attribution

If you use this book yourself, credit us! Basically, you just need to include a little snippet saying:

Adapted from Data + Design by Trina Chiasson and Dyanna Gregory. Licensed under a Creative Commons BY-NC-SA 4.0 license. This work is not affiliated with or endorsed by the original authors.

About all you're saying is, "Hey, this was originally made by these peeps, go check it out. Any edits to this were made by me, not them." Pretty reasonable, right?

Non-Commercial

If you fork and make edits to this book, you can't do it for monetary gain. The purpose of this book is to help people make better-informed decisions in collecting, analyzing, and presenting their data. And to do that, free availability of this book is kind of a big deal. So don't sell it, mmkay?

Share Alike

Finally, your version has to be licensed under the same terms as this one is. No telling people they can sell their derivatives, no telling people they don't have to attribute back to the original. Basically, just keep everything free and open.

Well, What If...

Of course there are edge cases. There are times when you're just not going to be sure if doing X or Y will be kosher with the license or not. That's fine! Just open an issue on GitHub and let us know what's up. We'll talk it over and help you figure out if what you want to do is cool beans or if you should tweak your plans a little bit. It never hurts to ask!

Can I Make Edits?

Of course you can! That's part of the reason we're making this book totally open source. We love when the community takes the time and effort to contribute to things like this. There are three different routes we recommend, depending on what you want to do.

Fork Us!

This is the route to take if you want to go off in your own direction, want your own copy of the source for reference, or just aren't in the mood to add a new chapter. For most people, this is the best option.

What's even better? This is super easy to do. Just go to the repository and click Fork in the upper right corner. Voila! You're now the proud owner of a brand spanking new Data + Design repository.

Please note that we are keeping the project's CSS in a separate repository (link). You'll need this if you want to generate your own builds of the book. Likewise, if you see HTML elements and CSS classes that don't look familiar, that's likely because the book is maintained according to HTMLBook specifications.

Fork Us! (Redux)

Hey, wait a minute! What's up with this? It's just the same thing all over again!

Whoops. You got us there. As it turns out, the first step is the same whether or not you want to contribute back to our repo: you just gotta fork it.

But alright, let's say you've got a great idea for a new chapter. What does the process for getting it accepted and published look like? Well:

  1. Fork the main repository and keep it synced.
  2. Open an issue with us and label it as a question. Let everyone know what you're thinking. Even if you have a great chapter idea, we can't guarantee that there's a good spot in the book to put it and we don't want you spending hours of your time writing something that we'll end up rejecting. Get feedback on the idea and make sure that it'll be a chapter we can use.
  3. Start authoring! Here's where your fork comes in handy: you'll be making all chapter edits there.
  4. Get reviewers. Go back to your GitHub issue, check with the commenters there, and see who would be willing to review your chapter for content, grammar, and style. Add them as contributors on your fork, let them make edits, etc.
  5. Once you all think that the chapter's in a good place, create a pull request back to the main repository. That's when we'll go through and do an 'official' edit and suggest any changes we'd like to see before accepting your chapter.
  6. Publish! We'll accept your pull request and you'll officially have a chapter in Data + Design. Congratulate yourself on a job well done and make sure that we didn't forget to add your information to our page of contributors.
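
The "keep it synced" part of step 1 can be sketched roughly like this. It's only a sketch, and it makes a few assumptions that aren't spelled out in this README: that the main repository lives at the URL used in the issue links on this page, that its default branch is master, and that you run it from inside your local clone of your fork. The function names sync_commands and run are made up for illustration. By default it only prints the git commands (a dry run) so you can look them over first.

```python
import subprocess

# Assumption: main repository URL, inferred from the issue links on this page.
UPSTREAM_URL = "https://github.com/infoactive/data-design"


def sync_commands(upstream_url=UPSTREAM_URL, branch="master"):
    """Return the git commands that bring a fork's branch up to date with upstream.

    The branch name "master" is an assumption; pass a different one if needed.
    """
    return [
        # Register the main repository as a remote named "upstream".
        # This is only needed once; git reports an error if it already exists.
        ["git", "remote", "add", "upstream", upstream_url],
        # Download the latest commits from the main repository.
        ["git", "fetch", "upstream"],
        # Switch to the local branch you want to update.
        ["git", "checkout", branch],
        # Merge the upstream changes into that branch.
        ["git", "merge", f"upstream/{branch}"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: prints the commands without touching your repository.
    run(sync_commands())
```

If the printed commands look right for your setup, call run(sync_commands(), dry_run=False), or just type the same commands into a terminal yourself. After the first sync, skip the "git remote add" line since the remote will already be there.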

Use Google Docs

We realize that not everyone knows how to use GitHub or wants to learn how to. And that's fine! If you don't want to worry about HTMLBook specifications and forks and pull requests and all that jazz, we've got your back. The process is conceptually similar to Option #2 (the fork-and-contribute route above):

  1. Open an issue with us and label it as a question. Let everyone know what you're thinking. Even if you have a great chapter idea, we can't guarantee that there's a good spot in the book to put it and we don't want you spending hours of your time writing something that we'll end up rejecting. Get feedback on the idea and make sure that it'll be a chapter we can use.
  2. Make a Google Doc and start authoring! If you don't have a Google account, we're sorry, but you'll just have to bite the bullet and create one. We'd like to cover every major email provider in the world, but that's not terribly practical.
  3. Get reviewers. Share your document with [email protected] and we'll get you in contact with some editors who are currently available. Make sure that you give us full editing rights so that we can add editors for you!
  4. Once you all think that the chapter's in a good place, drop us a line at [email protected]. We'll go through and do an 'official' edit and suggest any changes we'd like to see before accepting your chapter.
  5. Publish! We'll handle all the behind-the-scenes stuff to convert your chapter to HTML and add it to the published book. Congratulate yourself on a job well done and make sure that we didn't forget to add your information to our page of contributors.

Rejecting Submissions

Let's say you go through the whole process of proposing, authoring, editing, and submitting a new chapter and we reject it. What happens next?

Well, that'll depend on your exact situation. It might be that we don't think the chapter is written to the standards of the rest of the book. If that's the case, revising and resubmitting is nearly always going to be an option. See if you can get input from someone whose chapter has been accepted; get another couple rounds of edits. And then edit it some more for good measure. Then go ahead and resubmit and see what happens!

Alternatively, it might end up that the chapter that was proposed and the chapter that was delivered are significantly different, to the point that your delivered chapter isn't a great fit for the book. If that's the case, don't despair: you've still written a great chapter and can publish it on your forked repo. We're sorry that it didn't work out, but ask for feedback and see what needs to be changed for it to be accepted as an official addition.

Or, for some other reason entirely that we can't predict, we might reject your chapter without the chance to revise and resubmit. If and when this happens, we're sorry. (Legitimately, we don't get any sick joy out of this!) We try to put quality checks in place to prevent this from happening at all, but we can't catch everything. Regardless, you're always encouraged to make whatever edits you would like to your forked repository: after all, that sort of personalization and community involvement is why we open sourced this in the first place.

That said, the best strategy is to just be proactive about your submission: get feedback early and get feedback often. This book was written by so many incredibly dedicated individuals that you shouldn't have any difficulty getting good, constructive feedback at every stage of the writing process. Take advantage of this. Even if we don't end up accepting your chapter, it'll still be something to be proud of having written.

Building with Atlas

[[In Progress]]

Although, yes, we already have an HTML structure for the book, we still use an external platform to publish it. O'Reilly's Atlas allows us to take our book and publish it to a finalized (and slightly prettied-up) HTML version as well as to PDF, MOBI, and EPUB formats.

Now, if you're making an official contribution to our repository, this probably isn't too important to you since we'll be handling the actual releases. However, if you're striking out on your own, you might want an easy way to publish your book to multiple print and web platforms. Once you've finalized the edits to your forked repo, sign up for a free trial at the link above and you'll have the option to import your repository. As long as you've kept everything to HTMLBook spec, Atlas should be able to process the book without a hitch when you go to build a final release.

Note: You can also author directly in Atlas using a nice, friendly GUI (much like authoring in Google Docs or using Microsoft Word). You'll still need to import the pre-existing book, but once that's done it's super easy to add your content using Atlas's visual editor.

data-design's People

Contributors

bonifacio2, dyannali, melissahilldees, nmpeterson, nychang, pwrignall, renemarcelo, shannonturner, suchow, trinachi, wetherc, zawarudo

data-design's Issues

Recent changes not deployed

Hi there, I noticed that your recent changes adding arrow navigation have not made it to the live site. Any idea of an ETA on this?

Image has truncated y-axis

Image(06/[email protected])[https://github.com/infoactive/data-design/blob/master/images/sections/06/21-02_truncated-x-axis.png] has a truncated y-axis when the text refers to it as not having a truncated y-axis.

Why no epub downloads?

Just wondering why only PDF is offered for download. The open-access-ness seems pretty broken without it.

Ch10 remove paragraph

Finally and importantly, normal data cleaning can’t tell you if data were made up.
There are statistical analyses you can run that will point to potential
fraudulent data patterns, but regular data checking can’t tell you for sure
if a record was falsified. However, if you have tracking and restricted
access permissions on the dataset, you can tell when records were added,
altered, or deleted. {I feel like we should say something more concrete
about this but I don’t know what}

MathJax support?

I know there aren't a ton of formulas throughout the book, but where there are (e.g., equation and eqn in ch12), are we supporting MathJax rendering? Thanks!

Dumb quotes -> Smart quotes

Dumb quotes -> smart quotes

Chapter Location
01-intro/foreword.html Early data visualizations were also used to answer questions pertaining to issues of public health. Epidemiologist John Snow ***LOOK HERE '***s 1854 London cholera map was created to record instances of cholera in a London neighborhood, pinpointing the cause of the outbreak to a single well. This knowledge gained from patterns in lists of names, numbers, and locations was then used to persuade London ***LOOK HERE '***s populace to install sewer systems to alleviate the proliferation and spread of disease. The human brain is particularly adept at recognizing patterns, and a good data visualization, like Snow ***LOOK HERE ’***s, optimizes displays of these patterns through effective use of Gestalt theory, design principles, and color. (Or lack of it, as this case may be.)
01-intro/foreword.html Early visual explorations of data focused mostly on small snippets of data gleaned to expand humanity ***LOOK HERE '***s understanding of the geographical world, mainly through maps. Starting with the first recognized world maps of the 13th century, scientists, mathematicians, philosophers, and sailors used math to visualize the invisible. Stars and suns were plotted, coastlines and shipping routes charted. Data visualization, in its native essence, drew the lines, points, and coordinates that gave form to the physical world and our place in it. It answered questions like *"Where am I?", "How do I get there?", and *"How far is it?"
01-intro/foreword.html Florence Nightingale, famous more for her nursing skills than her analytic prowess, was nonetheless also a master data scientist and storyteller. Through data presented via her signature Coxcomb diagram (also known as polar or rose charts), she convinced the British army to invest in sanitation measures after illustrating that the majority of deaths in the Crimean War were the result of preventable diseases caused by the horrible sanitary conditions in hospitals. "Why are we sick?" she asked, then answering the question herself by giving visual form to data.
01-intro/foreword.html Looking at this graph, it is readily apparent that preventable diseases outnumbered all other causes of death. The area in blue represents deaths by preventable diseases, measured from the center, with red representing deaths caused by injuries and black indicating all other causes. Design principles at play here include the addition of color theory to take advantage of more Gestalt principles: "Similarity" and "Continuity". Color makes it easy for us to tell which segments belong to which category. It also helps to draw the eye in a continuous path around the graphic, making it easier to read.
01-intro/foreword.html My favorite description of data visualization comes from the prolific blogger, Maria Popova, who said that data visualization is "at the intersection of art and algorithm." To learn about the history of data visualization is to become an armchair cartographer, explorer, and statistician.
01-intro/foreword.html Snow ***LOOK HERE '***s visualization, with its absence of color, optimizes Gestalt ***LOOK HERE '***s theories of visual perception, most notably "Proximity" and "Figure and Ground." The small black dots, each one representing a single case of cholera are small black figures standing out in contrast against the ground: in this graphic, the lines and white space representing streets. The proximity of these dots around the affected well are what enabled Snow to determine the exact source of the outbreak. Today, even with our advanced computing systems and sophisticated tools for creating data visualizations, there is little you could do to improve the effectiveness of this chart. It is simple, beautiful, and true: a data visualization that saved lives.
01-intro/foreword.html There is debate over the quality of this chart. Some claim it one of the best, most memorable visualizations ever created, not solely because of its visual communication strength, but in spite of it. It is remembered because of the change it inspired. Others deride it, claiming it ***LOOK HERE '***s just a glorified pie chart, suffering from the same misrepresentation of the information by distorting the data: in terms of visual perception, humans have a hard time accurately judging measures represented by differences in area. Despite their ubiquity, pie charts, for this very reason, are an incredibly poor way to visualize data. A simple stacked bar chart with reference lines, while not as beautiful or visually intriguing, would have communicated more effectively and on a quicker read.
01-intro/foreword.html Unique insight is the essence of data, both big and small, and the result of the tools that allow us to access, probe, poke, prod, dissect, visualize, and hopefully, make sense of it. Tools which, through the democratization of data visualization, allow us to change our lens on the world, creating pictures of humanity from different perspectives, bringing into focus stories about humanity and the world that were previously invisible, allowing us insight into ourselves like we ***LOOK HERE '***ve never seen before.
02-data-fundamentals/ch01-basic-data-types.html Record runners LOOK HERE ' marathon times instead of what place they finish
02-data-fundamentals/ch01-basic-data-types.html Seeing that the time is 11:30, you think to yourself, “I’ve been in line for fifteen minutes already…???” When you start thinking about the time this way, it’s considered ratio data. Ratio data is numeric and a lot like interval data, except it does have a meaningful zero point. In ratio data, a value of zero indicates an absence of whatever you ***LOOK HERE '***re measuring - zero minutes, zero people in line, zero dairy products in your basket. In all these cases, zero actually means you don ***LOOK HERE '***t have any of that thing, which differs from the data we discussed in the interval section. Some other frequently encountered variables that are often recorded as ratio data are height, weight, age, and money.
03-collecting-data/03-section-cover.html When you don ***LOOK HERE ’***t have time to go shopping, you can give someone else money and have them do it for you. You don ***LOOK HERE ’***t always get exactly what you want, but that ***LOOK HERE '***s the tradeoff for not going to the store yourself. Don ***LOOK HERE ’***t have enough cash or need something immediately? Most of us have borrowed a cup of sugar or an egg from a neighbor at some point!
03-collecting-data/03-section-cover.html You also need to decide what types of each ingredient you want to include. If a recipe calls for bell peppers but doesn ***LOOK HERE '***t say what kind, you have to pick which colors to use. Do some items need to be a specific brand or variety? If you ***LOOK HERE ’***re making a strawberry pie, you may have a favorite type of strawberry you must have, but you may not care what baking soda you get, as long as it works.
03-collecting-data/ch03-intro-to-survey-design.html One of the disadvantages of handouts is that people may be rushed to complete the survey if you are catching them in passing, which can affect the quality of your data. You will also be limited to the population that is physically present in the location where you are giving the survey. This may not be an issue if you are targeting a specific group, such as college students, shoppers at a particular store, or residents of a certain region. If you ***LOOK HERE '***re looking for a more general audience, however, you may consider handing the survey out in several different locations to reach a more diverse audience.
03-collecting-data/ch03-intro-to-survey-design.html So far, we ***LOOK HERE '***ve talked about self-administered and administered surveys, but one of the most frequently-encountered type of survey is actually a combination of these. Say you want to hand out paper surveys and have people complete and return them immediately. The survey itself is self-administered, but since you have a trained person there who is available to answer questions, it also has features of an administered survey.
03-collecting-data/ch03-intro-to-survey-design.html Unlike phone surveys, face-to-face surveys allow the interviewer and the respondent to see each other’s facial expressions and body language. This can be helpful because the additional physical cues can help the interviewer and the respondent understand better understand each other; however, it can also lead to the respondent being further influenced by the interviewer ***LOOK HERE '***s behavior and appearance.
03-collecting-data/ch04-types-of-survey-questions.html You should also avoid using the word “and” if it is connecting two different ideas within a single question. Remember, each question should focus on only one issue at a time: otherwise, you won ***LOOK HERE '***t be collecting the best data that you can. By compounding multiple thoughts into a single question, you reduce the accuracy of participants’ responses and thereby limit the claims that you can make from those data. Instead, consider using filter questions to obtain the desired information. For example:
03-collecting-data/ch05-additional-data-collection-methods.html If any of the sources are ones that you don ***LOOK HERE '***t own, make sure to properly cite them. It ***LOOK HERE '***s important to credit others LOOK HERE ' work, and it ***LOOK HERE '***s also important to be able to support your research if anyone challenges your information later on.
03-collecting-data/ch05-additional-data-collection-methods.html Sometimes the data you need to collect are a matter of observation. Let ***LOOK HERE '***s go back to Fictionals Ice Cream Parlour for a moment. You recently purchased new furniture for the store, and you ***LOOK HERE '***re considering a couple of different layouts for it. You want to see which layout seems to work best for customer flow, so you set up the furniture one way for a few days and record your personal observations about customer movement within the shop. Then you switch the furniture to the other layout and again record what you notice. These data can help you figure out what other questions you might want to ask or what other data you need before making your decision.
03-collecting-data/ch05-additional-data-collection-methods.html There are some variables that should be measured rather than surveyed if you ***LOOK HERE '***re trying to obtain an exact, correct statistic. Many medical variables, for example, are difficult if not impossible to gather accurately using a survey. Let ***LOOK HERE '***s say you need to collect data on participants LOOK HERE ' weight at the beginning of a study. There are a few common reasons someone might report an inaccurate number.
03-collecting-data/ch05-additional-data-collection-methods.html What you ***LOOK HERE '***re measuring is easily and publicly observable;
03-collecting-data/ch05-additional-data-collection-methods.html When you ***LOOK HERE '***re considering a project as a whole, it is possible that not all the research questions you ***LOOK HERE '***re trying to address can be answered using data collected from just one of the methods discussed so far. You may find that your survey will need to be supplemented with some direct measurements, or you may need to have your focus group participants complete diary forms.
03-collecting-data/ch05-additional-data-collection-methods.html When you ***LOOK HERE '***re using other documents as the main source of your data, you should first set up a data collection plan, much the way that you design a survey. The plan should detail what pieces of data you ***LOOK HERE '***re looking for, the level of measurement you want to capture them at, the time frame you need (e.g. do you only want data from the last 6 months? the last 12?), and how much data you need (e.g. do you want to look at all the receipts or just a sample of them?).
03-collecting-data/ch05-additional-data-collection-methods.html Whether your data need to be exact depends on how you ***LOOK HERE '***re using the the information. If you ***LOOK HERE '***re not concerned about having a precise measurement and an estimate will work, then a survey might be fine as long as it asks something people will be able to reasonably estimate. If you have a variable is likely to be incorrectly self-reported and it is important that these data are current and accurate, direct measurement should be used instead of a survey.
03-collecting-data/ch05-additional-data-collection-methods.html You can use observation in this way to gain insight into naturalistic behavior. This can be especially useful if your subjects of interest are not human and can ***LOOK HERE '***t answer survey questions: scientists rely on observation as a data collection technique all the time!
03-collecting-data/ch06-finding-external-data.html Assessing data quality means looking at all the details provided about the data (including metadata, or "data about the data," such as time and date of creation) and the context in which the data is presented. Good datasets will provide details about the dataset’s purpose, ownership, methods, scope, dates, and other notes. For online datasets, you can often find this information by navigating to the “About” or “More Information” web pages or by following a “Documentation” link.
03-collecting-data/ch06-finding-external-data.html Don ***LOOK HERE '***t forget to obtain variable specifications, external data dictionaries, and referenced works.
03-collecting-data/ch06-finding-external-data.html For example, let ***LOOK HERE '***s say you ***LOOK HERE '***re using the U.S. Census Annual Survey of Public Employment and Payroll.
03-collecting-data/ch06-finding-external-data.html If it has and you ***LOOK HERE '***re using the data for an analysis, make sure your analysis is adding new insights to what you know has been done with the data previously.
03-collecting-data/ch06-finding-external-data.html If yes, you ***LOOK HERE '***ll need to include your data dictionary and a list of your additional data sources.
03-collecting-data/ch06-finding-external-data.html May allow you to work with data that requires more resources to collect than you have, or data that you wouldn ***LOOK HERE '***t otherwise have access to at all
04-preparing-data-for-use/ch08-data-cleaning.html Once we ***LOOK HERE '***ve cleaned our data, we’re left with a brand new problem: how can we (and others!) verify that what we’ve done is correct and that we haven’t corrupted the data by making these changes? After all, the processed data may look vastly different from the raw data we started out with.
04-preparing-data-for-use/ch08-data-cleaning.html The simple answer is to document everything, particularly if you think you might want to share your data later on with a statistician or other researchers. When you’re cleaning your data, it ***LOOK HERE '***s always a good idea to save any changes as an entirely separate file: this way you’re always able to go back and look at what changed between the raw and processed data, what rows and columns were dropped, etc. It also ensures that you can go back to the unprocessed data if you ever want to slice things up a different way that might involve different cleaning procedures.
04-preparing-data-for-use/ch08-data-cleaning.html You should be careful to write a set of instructions as you go, documenting exactly what was done in each step to identify bad data and which data points were removed. It ***LOOK HERE '***s crucial to write this while you’re actually cleaning your data: it’s always easier to document as you go than it is to try and remember every step that you took after all is said and done. If you’re using point-and-click software to manage your data (like Excel), you should take special care to record exactly what steps were taken in cleaning the data since everything is done by hand, rather than by computer code that can be easily re-run later on. A good rule of thumb is that if you aren ***LOOK HERE '***t able to easily follow the instructions you wrote and end up with the same results a second time, you shouldn ***LOOK HERE '***t expect anyone else to be able to.
04-preparing-data-for-use/ch10-what-data-cleaning-can-and-cant-catch.html What Data Cleaning Can and Can ***LOOK HERE '***t Catch
04-preparing-data-for-use/ch11-data-transformations.html You may not see transforms every day, but when you do, it ***LOOK HERE '***s helpful to know why they were used and how they affect your data. It ***LOOK HERE '***s important to be able to see different parts of the picture when working with data, and transformations give you another tool to help you do just that!
05-visualizing-data/05-section-cover.html It ***LOOK HERE '***s also important to think about how to present the dish. Some choices are a matter of focus, like when you want to highlight a few star ingredients. Some choices are functional, like when you serve a cake with a slice removed so you can display all the layers inside. Some choices still are based on appropriateness or lack thereof; if you serve soup on a flat cake plate, it will run all over the table.
05-visualizing-data/ch13-graphing-the-results-of-checkbox-responses.html Depending on your question of interest, you can also group your data. Maybe you ***LOOK HERE '***re more interested in reporting how many devices people used rather than exactly what devices were. You could make a column chart like the one below.
05-visualizing-data/ch13-graphing-the-results-of-checkbox-responses.html Don ***LOOK HERE '***t use pie charts if you ***LOOK HERE '***re basing your percentages on the number of respondents that selected each answer choice! Pie charts are used to represent part-to-whole relationships and the total percentage of all the groups has to equal 100%. Since the possible sum of the percentages is greater than 100% when you base these calculations on the number of respondents, pie charts are an incorrect choice for displaying these results.
05-visualizing-data/ch13-graphing-the-results-of-checkbox-responses.html Keep in mind that this way of displaying data is based on the number of mentions of a device, not the number of consumers who use that device. While you may be tempted to say, "24% of consumers used a cellphone in the past six months," the bar chart above isn ***LOOK HERE '***t actually displaying that information.
05-visualizing-data/ch13-graphing-the-results-of-checkbox-responses.html Let ***LOOK HERE '***s say you ***LOOK HERE '***re doing a survey and you ***LOOK HERE '***re interested in what multimedia devices your respondents have used over the last six months. You would use a checkbox response question if you wanted to find out what all the devices were that people used over the six-month period. A radio button only allows respondents to select a single answer, so you could only use it to find out, for example, which one device a person used most often during the six-month period. Each type of question has merit, it just depends on what the purpose of your question is and how you are going to use the results.
05-visualizing-data/ch13-graphing-the-results-of-checkbox-responses.html Let ***LOOK HERE '***s take a look at a summary of possible responses to the checkbox question posed above.
05-visualizing-data/ch13-graphing-the-results-of-checkbox-responses.html So here ***LOOK HERE '***s the most important thing to know about checkbox questions, and it ***LOOK HERE '***s why you have to consider how you graph the results of checkbox questions differently than you do the results of other types of questions. Checkbox questions aren ***LOOK HERE '***t really their own question type! They ***LOOK HERE '***re actually just a shorthand way to write a series of yes/no questions. A respondent checks a box if an answer choice applies and leaves it blank if it doesn ***LOOK HERE '***t.
05-visualizing-data/ch13-graphing-the-results-of-checkbox-responses.html You may notice that the total number of responses (1,841) is greater than the number of people that did the survey (N=500)! Why? It ***LOOK HERE '***s because of the whole "a checkbox is really a bunch of yes/no questions rolled into one" thing. The total possible number of checked boxes in a checkbox question? It ***LOOK HERE '***s the (# of "real" answer options) X (# of respondents). (Here, a "real" answer option means one that isn ***LOOK HERE '***t "None," "N/A" or "Prefer not to Answer," since selecting one of those options would prevent a person from choosing any other answers in addition to that.) For this survey, there were 6 device options (aside from "None") that a person could select and there were 500 people that answered the survey. So the total number of boxes that had the potential to get checked during the survey was 3000, not just 500.
05-visualizing-data/ch14-anatomy-of-a-graphic.html Symbols can sometimes be a helpful alternative to words in text, but make sure to clearly state what a symbol represents. For example, you could use “$” instead of the word “dollar.” However, this symbol is used with the currency of more than 30 countries, so you should specify which country ***LOOK HERE '***s currency you mean (e.g. US$, A$, Mex$).
05-visualizing-data/ch14-anatomy-of-a-graphic.html There ***LOOK HERE '***s some debate over whether or not vertical axis labels should be aligned parallel to the axis or not. On the one hand, aligning the text vertically makes it very clear that it ***LOOK HERE '***s directly associated with the vertical axis. There ***LOOK HERE '***s often more room to write the axis title text if it ***LOOK HERE '***s rotated at a 90-degree angle. On the other hand, humans are not very good at reading vertical text, and readers may find themselves squinting and turning their heads to understand the chart. If you have enough room and the axis title text is short, consider keeping the text level instead of rotating it. We ***LOOK HERE '***ve also included two examples below that show what to avoid when placing your vertical axis titles.
05-visualizing-data/ch15-importance-of-color-font-and-icons.html The exact same data is being conveyed here, but it takes the viewer slightly longer to understand it. Why? Before the viewer can correctly interpret the data, they have to create a complex metaphor in their head. First, they’re being asked to imagine that this icon represents all twenty test subjects, but they ***LOOK HERE '***re then also being asked to divide that metaphorical person by another set of criteria. As simple as they may seem, icons are still asking the mind to pretend they represent something else. The more you ask symbols to represent, the less literal and less clear they become. When it comes to using symbols and icons of any sort, stick to one simple meaning each.
06-dont-be-shady/06-section-cover.html Don ***LOOK HERE '***t mix baking soda and vinegar.
06-dont-be-shady/06-section-cover.html Don ***LOOK HERE '***t serve pink chicken.
06-dont-be-shady/06-section-cover.html Don ***LOOK HERE '***t throw water on a grease fire.
06-dont-be-shady/06-section-cover.html Some cooking don ***LOOK HERE '***ts:
06-dont-be-shady/ch18-common-visualization-mistakes.html At first glance, there doesn’t appear to be a truncation issue here. The y-axis starts at zero, so that ***LOOK HERE '***s not a problem. The critical thing to understand is that it’s the x-axis that’s been truncated this time: we’re seeing sales from less than half the year. Truncating a time period like this can give the wrong impression, especially for things that go through cycles. And - you guessed it - the sale of allergy medicine goes through a seasonal cycle since allergy symptoms are typically higher in the spring and lower in the winter.
07-conclusion/glossary.html A heat map is a graph that uses colors to represent categorical data in which the saturation of the color reflects the category ***LOOK HERE '***s frequency in the dataset.
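
Side note on the ch13 rows quoted above: the checkbox arithmetic is easy to sanity-check in a few lines. This is only a sketch. The function names are made up for illustration, and it assumes all 1,841 responses are selections of one of the 6 "real" device options (the quoted text doesn't say whether "None" answers are included in that total).

```python
def max_checked_boxes(real_options, respondents):
    """Upper bound on checked boxes: one yes/no answer per option per respondent."""
    return real_options * respondents


def percent_sum_by_respondents(total_checked, respondents):
    """Sum of the per-option percentages when each is based on respondents."""
    return 100 * total_checked / respondents


# Figures quoted from the chapter text: 6 device options, N=500, 1,841 responses.
print(max_checked_boxes(6, 500))
print(percent_sum_by_respondents(1841, 500))
```

Under that assumption the respondent-based percentages add up to well over 100%, which is the reason the chapter gives for avoiding pie charts with this kind of data.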

02/number-order.png coloring

The "Strongly Agree" in the bottom right: is that color correct? That looks the same as the "Disagree" on the same row, but might just be my color blindness...

Ch4 Missing content

{ insert warning box with short set of info from data cleaning chapter about the Prefer not to answer choice }

TO-DO: Add chapter hyperlinks

Some mentions of other chapters link to them; others don't. Why not make it a party?

Chapter Location
02-data-fundamentals/ch01-basic-data-types.html To avoid confusion, we’ll be sticking with the level of measurement terms above throughout the rest of this book, except in our discussion of long-form qualitative data in the <a href="#">survey design chapter</a>. If you come across terms “categorical,” “qualitative data,” or “quantitative data” in other resources or in your work, make sure you know which definition is being used and don’t just assume!
03-collecting-data/ch05-additional-data-collection-methods.html Other existing documents that are frequently used to compile information include books, newspapers, web traffic logs, and webpages. There are also entire datasets that are available for use. These are covered in more detail in the chapter on <a href="#">Finding External Data</a>.
04-preparing-data-for-use/ch07-data-preparation.html Keep in mind that a missing value is not inherently the same thing as an intentional non-response! You don’t have the particular information that the question was asking about in either case, but when someone actively chooses not to answer, that in itself is a piece of data you wouldn’t have if the question were unintentionally skipped. Those data aren’t missing: you know exactly where they are, the respondent just doesn’t want you to have them. As discussed in the <a href="#">Survey Design</a> chapter (pg. YYY), it is good to include a “Prefer not to answer” option for questions that may be of a personal nature, such as race or ethnicity, income, political affiliation, etc. That way, you can designate a code for this type of response so when you are going through your dataset later on, you know the difference between the respondents that purposely chose not to provide you a given piece of information and the data that are just missing altogether.
04-preparing-data-for-use/ch07-data-preparation.html The best solutions are preventive. If you are the one creating the form for user input, do whatever you can to prevent receiving data that will require intensive handling during the data preparation stages. In the <a href="#">Types of Data Checking</a> chapter (pg. YYY), we’ll talk about different strategies for minimizing the number of data preparation tasks that need to be performed.
04-preparing-data-for-use/ch07-data-preparation.html If you’re not the one collecting the data but can speak with the people who are, try to work with them to identify and resolve any problematic data collection points using the strategies in the <a href="#">Types of Data Checking</a> chapter as a guide.
04-preparing-data-for-use/ch08-data-cleaning.html Spell Check is another basic check that you can use to find problems in your dataset. We suggest doing this field-by-field rather than trying to do it across the whole dataset at once. The reason for this is that a word that might be considered a misspelling in one variable could be a valid word for another variable. A good example of this is the first name field. If you have a dataset with a first name field, many of those entries could trigger a spell check alert even though they are legitimate names. If instead you focus on a single field at a time, you can more quickly work through the dataset. In the example from the <a href="#">data preparation</a> chapter where students were listing their major on a survey, say one of the students had just pulled an all-nighter and accidentally typed “Mtahmeitcs” instead of “Mathematics.” A spell check on the “Major” field in your dataset would quickly identify the misspelling and you could change it to “Math” or “Mathematics,” depending on which controlled vocabulary term you chose.
05-visualizing-data/ch12-deciding-which-and-how-much-data-to-illustrate.html Another way to do this would be to publish interactive versions of your visualizations that allow the viewers to dive in and explore the information themselves. If you’re able to share the raw datasets, that’s even better! That way, those who wish to dig deeper and understand the data in new ways will have the option to do so. We’ll talk more about static and interactive graphics later in the <a href="#">Print vs. Web</a> chapter.
06-dont-be-shady/ch17-perception-deception.html All of this brings us to the question of whether or not it’s a good idea to use icons or pictograms in our visualizations because the simplest icons are defined by their form, not color. Luckily for us, the chapter on <a href="#">The Importance of Color, Font, and Icons</a> offers some wisdom
03-collecting-data/ch06-finding-external-data.html Good citations give the reader enough information to find the data that you have accessed and used. Wondering what a good citation looks like? Try using an existing citation style manual from <a href="#"> APA</a>, <a href="#">MLA</a>, <a href="#">Chicago</a>, <a href="#">Turabian</a>, or <a href="#">Harvard</a>. Unlike citations for published items (like books), citations for a dataset vary a great deal from style to style.
04-preparing-data-for-use/ch11-data-transformations.html <p data-type="author">By <a href="#">Kiran PV</a>
05-visualizing-data/ch14-anatomy-of-a-graphic.html The contributors for this book have also set up a group to give you feedback on your visualizations. If you’re looking for honest feedback from an impartial party, our <a href="#">LinkedIn group</a> would be more than happy to help. We’re a supportive group of data and design nerds who want to help you learn, grow, and design amazing charts. Post your work-in-progress to the group forum and don’t be shy to use the Data. Design. contributors as a resource!

Typo in chapter 3

From reader's email - missing the word "from" (should be "collect from"):

Purpose of a Survey
The first step in designing a good survey is to identify its purpose before you create it. A good survey collects accurate and verifiable data that allow you to make concrete claims. This is easier to do if you have a clear purpose to guide you when deciding what information you want to collect your respondents.

Ch8 Add content and link for pattern matching resources

There are also pattern matching options in Excel and some advanced filter options that sometimes work even better. We have some resources of our own for this off the eBook site at {insert link here, I’ll record this so we can add the link for this before the 10th, it’s easy to show peeps how do but is just way too long and weird to explain in text.}
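
Until the real link is in place, here is a rough idea of what a pattern matching check on an email field could look like outside of Excel. It is a minimal sketch using Python's built-in `re` module. The function name and the pattern are assumptions for illustration, and the pattern is deliberately loose (something, an @, something, a dot, something), not a full email validator.

```python
import re

# Loose illustrative pattern: text, "@", text, ".", text, with no spaces or extra "@".
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def find_invalid_emails(values):
    """Return the entries that do not look like an email address."""
    return [v for v in values if not EMAIL_PATTERN.fullmatch(v)]


entries = ["a@b.com", "not-an-email", "x@y"]
print(find_invalid_emails(entries))
```

Anything the function returns is only a candidate for review, since a loose pattern like this can still let some bad addresses through.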

New chapters: Data munging; basic statistical analysis / lying with statistics

Hey, y'all,

Any thoughts on adding a couple chapters (or a section) on best practices for data munging and basic statistical analysis using common data analytic software? Right now the book largely assumes that the individual's data are immediately usable or that he or she knows how to get them to that point.

There's also not much discussion of how to determine when differences in data are meaningful. That can turn into a deep, dark rabbit hole, but restricting it to a basic discussion of some common analyses with linkouts for more detail should be manageable.

Thoughts on this?

Ch18: Wrong version of "truncated-x-axis" graph

Hey guys!

I just saw that the second graph of chapter 18, which should show a "correct" Y axis, in fact starts from 10:
Wrong Image

I've checked in the source code and it shows the correct version: https://github.com/infoactive/data-design/blob/master/images/sections/06/21-02_truncated-x-axis.png

But, for some reason, in the online version it shows another image.

It would be nice if someone could regenerate the site so that this wrong version gets replaced.

Thanks!

Provide an epub format

It would be awesome if the book could be downloaded as an .epub file, alongside the PDF format. Not sure how much work that means, though.

Ch 02 table overflow

The table on line 60 x-overflows. Not a problem for HTML, but for any print builds that's going to be truncated.

The French version

Genevieve and her superheroic team are done with the translations. And they have been superheroically patient with me.

I'm going to follow the folder structure of the English version and just replace the English html and image files with the French versions in the forked repo. If you want me to set it up a different way, let me know.

Open 'To-Do's in each chapter?

Is anyone keeping track of the inline 'to-do' comments that pop up from time to time in chapters? (E.g., "Add footnote"; "Insert warning box"; etc.)

I've run into more than a few of them while editing, but haven't seen too many of the issues closed out since then. Happy to run through and compile a list of them if there isn't already one!

TO-DO: Close out To-Dos

N.B. No promises that this is a complete list:
most of it came from our good ol' friend grep,
but can't promise that I caught everything. More
will be added as I find it.
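
For anyone who wants to re-run the search without grep, something along these lines should find the same kinds of markers. It's a sketch, not the exact search used for the list below. The function name is hypothetical, and the patterns are guesses based on the markers that show up in this list (`YYY`/`XXX` placeholders, `{insert ...}` notes, and `INSERTFOOTNOTE-...` tags).

```python
import re

# Marker styles seen in the chapters so far; extend as new ones turn up.
TODO_PATTERNS = [
    re.compile(r"\{\s*insert[^}]*\}", re.IGNORECASE),  # {insert link here}
    re.compile(r"\b(?:XXX|YYY)\b"),                    # pg. YYY, Chapter XXX
    re.compile(r"INSERTFOOTNOTE-[A-Z]+"),              # INSERTFOOTNOTE-CAIRO
]


def find_todo_markers(text):
    """Return (line_number, marker) pairs for every to-do marker in text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for pattern in TODO_PATTERNS:
            for match in pattern.finditer(line):
                hits.append((line_no, match.group(0)))
    return hits


sample = "See Appendix YYY.\nFine line.\n{INSERT LINK HERE}\nINSERTFOOTNOTE-CAIRO"
for line_no, marker in find_todo_markers(sample):
    print(line_no, marker)
```

To scan the book, read each chapter's HTML file into a string and pass it to the function. Free-form notes in braces that don't start with "insert" won't be caught, so the output is a starting point, not a complete list.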

Page numbers or links needed

Chapter Location
04-preparing-data-for-use/ch07-data-preparation.html There are so many techniques, we could write an entire book on this subject alone. You can break apart the data in Excel, or if you have programming skills, you can use Python, SQL, or any number of other languages. There’s too much for us to cover in this chapter, but for some good references, check out Appendix YYY. For right now, we’ll cover some basic starting strategies so you’ll be better equipped to take on those references when you’re ready.
04-preparing-data-for-use/ch07-data-preparation.html Keep in mind that a missing value is not inherently the same thing as an intentional non-response! You don’t have the particular information that the question was asking about in either case, but when someone actively chooses not to answer, that in itself is a piece of data you wouldn’t have if the question were unintentionally skipped. Those data aren’t missing: you know exactly where they are, the respondent just doesn’t want you to have them. As discussed in the Survey Design chapter (pg. YYY), it is good to include a “Prefer not to answer” option for questions that may be of a personal nature, such as race or ethnicity, income, political affiliation, etc. That way, you can designate a code for this type of response so when you are going through your dataset later on, you know the difference between the respondents that purposely chose not to provide you a given piece of information and the data that are just missing altogether.
04-preparing-data-for-use/ch07-data-preparation.html It is important to note that for purposes of basic descriptive visualization, missing values are described by including a non-responder category or by noting a change in the sample size. However, in inferential statistics, missing values may be dealt with in a variety of other ways, from exclusion to imputation to full analysis methods like use of EM algorithms. Check out pg. YYY for more resources on missing values in inferential statistics.
04-preparing-data-for-use/ch07-data-preparation.html The best solutions are preventive. If you are the one creating the form for user input, do whatever you can to prevent receiving data that will require intensive handling during the data preparation stages. In the Types of Data Checking chapter (pg. YYY), we’ll talk about different strategies for minimizing the number of data preparation tasks that need to be performed.
04-preparing-data-for-use/ch08-data-cleaning.html Let’s start with some of the most basic data cleaning procedures. We’re going to use Excel for many of these examples, but you can use any spreadsheet or data manipulation software to perform these procedures. A list of programs is available in Appendix YYY.
04-preparing-data-for-use/ch08-data-cleaning.html Another slightly more advanced type of data check involves pattern matching. This is the sort of check that you can use, for example, to make sure all the entries in a field are an email address. This involves something called regular expressions (often shortened to regex), which give you a way of telling the computer, “I only want things that look like {this} to be stored in that variable. Tell me if something in there doesn’t look like {this}.” The way that you indicate what {this} should be varies from program to program and can look a little complicated if you’ve never worked with it before. If you have ever used an asterisk (*) as a wildcard for searching, that’s actually part of a regex expression, so you already know a piece of it! There are some regular expressions resources on page YYY if you want to learn the syntax for it.
04-preparing-data-for-use/ch11-data-transformations.html This chapter covers more advanced statistical concepts than some of the others but we wanted to include a brief introduction to data transformations in case you encounter them. If you need to do your own transformation, check out the resources in Appendix YYY for additional tips.
05-visualizing-data/ch14-anatomy-of-a-graphic.html Because of the way that the human brain interprets angles and area (see Chapter XXX, section YYY), it can be tough to visually assess the values associated with slices of pie charts. For that reason, it’s a good idea to add data labels to pie charts for clarity.

Additional content or commentary needed

Chapter Location
03-collecting-data/ch03-intro-to-survey-design.html One of the easiest and quickest ways you can conduct a survey is online. You can either have people fill out questionnaires directly on a website or send them a questionnaire via e-mail. There are even survey sites that let you design questionnaire forms, collect data, and analyze responses in real-time at an affordable rate. You’ll find links to a few of these sites in the Resources Appendix. {INSERT LINK HERE}
03-collecting-data/ch04-types-of-survey-questions.html { insert warning box with short set of info from data cleaning chapter about the Prefer not to answer choice }
04-preparing-data-for-use/ch08-data-cleaning.html There are also pattern matching options in Excel and some advanced filter options that sometimes work even better. We have some resources of our own for this off the eBook site at {insert link here, I’ll record this so we can add the link for this before the 10th, it’s easy to show peeps how do but is just way too long and weird to explain in text.}
04-preparing-data-for-use/ch10-what-data-cleaning-can-and-cant-catch.html Finally and importantly, normal data cleaning can’t tell you if data were made up. There are statistical analyses you can run that will point to potential fraudulent data patterns, but regular data checking can’t tell you for sure if a record was falsified. However, if you have tracking and restricted access permissions on the dataset, you can tell when records were added, altered, or deleted. {I feel like we should say something more concrete about this but I don’t know what}

Other misc issues

Chapter Location
03-collecting-data/ch04-types-of-survey-questions.html Random references section - need to figure out how to handle these

Expanding on documentation/reproducibility in Ch09

Does anyone mind if I write a few paragraphs expanding on After Data Cleaning: Please Be Kind and Document! in chapter 9 (data cleaning)? Being able to faithfully reproduce cleaning procedures and understand exactly what was done to go from raw data to processed is kind of a big deal for a lot of statisticians, and it's a lot more difficult to do in GUI-heavy programs than many people realize.

Errors generating mobi build

MOBI Build Info

Debug
Error

  • ERROR Unable to copy file skip-instructions.png into EPUB: [Errno 2] No such file or directory: 'images/sections/03/skip-instructions.png'
  • ERROR Unable to copy file 07-conclusion into EPUB: [Errno 21] Is a directory: '07-conclusion'
  • ERROR "There is no item named 'OEBPS/assets/07-conclusion' in the archive"

Intra-chapter PDF links unresponsive

None of the intra-chapter links (e.g., to glossary items) seem to be working in the PDF version of the book.

Do we know if this is an Atlas quirk or just something we're doing wrong in building the book?

Ch17 add footnotes

INSERTFOOTNOTE-CAIRO at line 25
INSERTFOOTNOTE-ROCKWELL at line 29
INSERTFOOTNOTE-HEALEY at line 49
INSERTFOOTNOTE-CLEVELAND at line 99

  • proper attribution for Healey graphics
