mschermann / data_viz_reader

A Reader on Data Visualization

Home Page: https://mschermann.github.io/data_viz_reader/

License: Creative Commons Attribution Share Alike 4.0 International

Shell 0.08% TeX 97.23% CSS 2.69%

r bookdown class data-visualization student-project

data_viz_reader's Issues

Reorganize Overlapping Content in Chapter 5 (Patterns) and Chapter 4 (Case Studies)

For now, chapters 4 and 5 overlap heavily: some case-study content, good/bad visualization examples, and improvement tips sit under chapter 5 instead of chapter 4.

Would it be better to:

Move all material in chapter 5 that actually belongs to chapter 4, such as visualization examples and tips, to chapter 4;
Define a clearer structure for chapter 5 that states what content belongs there.

Some ideas for the Patterns chapter (Chapter 5)

What to include: Following our original plan, Chapter 5 should contain principles that are reproducible or that make charts easier to develop. Unlike the Fundamentals chapter, which should cover general ideas about visualization, the Patterns chapter should cover things that can be practiced. Based on what we already have, the following sections of Chapter 3 (Fundamentals) should move to Chapter 5:

(Pick the Right Chart Type 3.7
Visual Data Communication 3.14
Gestalt Principles for Data 3.15
Definitions of Data Deception and Graphic Integrity 3.5.1)

On the deception part, I disagree with the suggestion in issue #316. Deceptive charts are a useful tool for learning to develop charts and for sharpening ideas, so this part could go into the Patterns chapter.

Integration: Right now some sections in Chapter 5 overlap, and some are too detailed to stand as separate sections. I suggest combining sections that share an idea, such as 5.1 and 5.6; we could also merge all general tips into a single section.

References to figures

Please make sure that references to figures actually work. Run the project on your local machine and understand the log output.

How to Better Organize Chapter 4 (Case Studies)?

Problem:
Should we organize chapter 4 in the order in which articles were added, or by case, with the article information only in the references? Or should we group the cases by theme and reorganize the chapter?

In this chapter, several pieces of content come from a single article, but we seem to have split them up in an ambiguous order. Here is the class reader:

For example, 4.5/4.6/4.7/4.8/4.9/4.10 are all cases cited from the article in 4.2, but they are currently placed at the same level as 4.2, which leaves only two lines under 4.2 itself. Sections 4.1, 4.3, and 4.4 have the same problem.

Reference tags are missing from the book.bib file

Some tags in the R Markdown files are not in the book.bib file, which makes the reader show "???" marks. We should either create those tags in the book.bib file, correct the tags so that they link to the right references, or remove the tags that are no longer used from the R Markdown files.

For example, in the "case studies" chapter there are tags such as "15_mindblowing" and "population_pyramid". However, these tags can't be found in the book.bib file.
Another example is http://vudlab.com/simpsons/ . This link doesn't exist.
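
One way to find such tags before building is to compare the citation keys used in the Rmd files with the keys defined in book.bib. The following is a minimal sketch, not part of the project: the function names are hypothetical, and the regular expressions are simplifying assumptions about how citations (`[@key]`) and BibTeX entries (`@type{key,`) are written.

```python
import re

# A pandoc-style citation key such as [@Tufte_2001] or [@a; @b].
# The lookbehind skips e-mail addresses and bookdown's \@ref(...) syntax.
CITE_RE = re.compile(r"(?<![\w\\])@([A-Za-z0-9_][A-Za-z0-9_\-]*)")

# The key of a BibTeX entry such as "@misc{Tufte_2001,".
BIB_KEY_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def cited_keys(rmd_text):
    """Return the set of citation keys used in an Rmd source string."""
    return set(CITE_RE.findall(rmd_text))


def defined_keys(bib_text):
    """Return the set of entry keys defined in a .bib source string."""
    return set(BIB_KEY_RE.findall(bib_text))


def missing_keys(rmd_text, bib_text):
    """Return cited keys that have no entry in the bibliography, sorted."""
    return sorted(cited_keys(rmd_text) - defined_keys(bib_text))
```

Reading each chapter file and book.bib into strings and passing them to `missing_keys` would list the tags that currently render as "???".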

Organize Fundamentals Chapter by Related Topics?

There are reflections/references in fundamentals with similar themes, and it might make the chapter more cohesive to group them?

Some ideas for possible sections:
History
Design Principles
Best Practices
Tools
Special Features (e.g. interactivity)
Additional References

How to include datasets

Please include any dataset that you want to use in your contributions in a folder named resources. When you use a dataset, reference it correctly.

Integrate similar topics from different sections together

1.0 As I was going through the reader, one problem I noticed was that information on some topics is scattered across many sections of the reader. For example:

```
 Best Practices of Visualization:
```

In section 3(i.e. fundamentals) we have the following articles on Best Practices:
a. Practitioners Guide to Best Practices in Data Visualization(at 3.2)
b. Three Rules to Follow in order to Develop Intuitive Dashboards(3.2.1)
c. Fundamental Components of Design(3.3)
d. Visualization and Graphics Principles to Refocus and Guide You(3.3.1)
e. Best practices for visualization(3.10)
f. Story Telling with Data(3.11)
g. Tufte’s Design Principles of graphical excellence(3.13)
h. Visual Data Communication (3.14 )
i. Gestalt Principles for Data Viz(3.15)

Again in section 5(i.e. patterns), we have the following topics:
a. 5 Tips to improve Data Visualization(5.7)
b. An example to back some of our theories on ‘how to tell stories using data visualization’ / ‘exploratory data visualization’(5.11)

```
 Data Visualization tools:
```

a) Data Visualization Tools(at Section 3.4)
b) Interactive Data Visualization(at Section 3.12)
c) Tips for Tableau(at Section 5.9)
d) Reusable Data Visualization Code in R(5.12)

```
 Charts:
```

a) Pick the Right Chart Type(3.7)
b) Why pie chart is bad: a comparison with bar chart(at Section5.3)

 Data Visualization in Business/Corporate:

a) Data Visualization in Business(3.6)
b) Corporate Scorecards and Data Visualization(3.11.1)

```
 Resources:
```

a) Useful Links on Data Visualization Trends, Tutorials and Research Papers(at Section-2.4)
b) Resources for Aspiring Data Visualists(3.16)
c) Other sources of great visualization(4.24)
d) More ways to improve your visualization design(5.8)

```
 Time Series:
```

a) Avoiding Common Mistakes with Time Series(at Section 5.1)
b) Example Visualizations of Time Series Data(at Section5.6)

1.1 To solve this problem, I want to take the following steps:
1. Integrate similar topics.
2. If a topic has too little information, add more articles to it.
3. Finally, if some information is repetitive, delete it with the consent of the original contributor.

1.2 For this week, I would like to focus on the first two steps, and I invite you all to contribute. You can also extend the list in 1.0 with more topics whose information is scattered across the reader, and work on them.

Issue with Fundamental - Data Deception and Graphic Integrity section

title: "Viz R5"
author: "Bochen Wang"
date: "5/19/2018"
output: html_document

Original Rmd file with pictures: https://github.com/bochenw/Individua-project/edit/master/Reader%205.Rmd

ISSUE

Fundamental, Definitions of Data Deception and Graphic Integrity section

Link: https://github.com/mschermann/data_viz_reader/blob/master/02-fundamentals.Rmd#definions-of-data-deception-and-graphic-integrity

There are a thousand Hamlets in a thousand people's eyes: people will have different views about a graph, its content, its style, its colors, and so on, and any of these factors can lead a reader to misunderstand the data. I disagree with how the author defines a “deceptive data graph” in the Fundamentals section on Definitions of Data Deception and Graphic Integrity. According to the author, a deceptive data graph is “a graphical depiction of information, designed with or without an intent to deceive, that may create a belief about the message and its components, which varies from the actual message.” As I read it, this definition puts all the responsibility for a deceptive graph on its original author and ignores the possibility that sometimes a graph is not “misleading” but “misunderstood.” Readers grow up in different places, have different educational backgrounds, and read an article for different purposes, so it is easy for people to draw different conclusions from the same graph.

For example, the following graphs are from the reader. The article states: “it is definitely misleading. This is often called improper extraction or tactic omitting data when only a certain chunk of data is included. This is more common in graphs that have time as one of their axes.”

The first graph shows the data from 2015-3-9 to 2015-3-13, and the second graph shows the data from 2015-3-2 to 2015-3-21. The author considers these two graphs an excellent example of improper extraction or tactical omission of data. However, what if the purpose of the first graph is to show the audience the stabilization between 2015-3-9 and 2015-3-13? Before deciding whether this graph misleads its audience, we need to understand its purpose. Here is another example:

The purpose of the first graph is to prove that the US is an outlier on gun violence because it has way more guns than other developed nations. The author compared the United States with other countries about homicide by firearms per 1 million people, and the result shows that the United States has way more murders by firearms than other countries.

This second chart shows homicides by firearm per 1 million people for countries all over the world. The higher the rate, the redder the country's color. From this chart, we can easily see that the US stays in the safe “yellow” realm.

The two graphs above were produced from different perspectives. We cannot readily conclude whether the US is a safe country or not. Compared with other developed countries, the US has a higher rate, but compared with all other countries in the world, the US is well below the world’s average rate. If we picked just one graph and concluded whether or not the US is a safe country, we might change our conclusion once we saw the other graph. Under these circumstances, we cannot say which graph is a “deceptive graph.”

“Data visualization deception” is a heavy term, because once I discover one deceptive graph in an article, I will conclude that the author is trying to hide something or to mislead me by breaking graphic integrity. This could lead me to a conclusion opposite to the article's. I think the best way to detect visualization deception is, before drawing any conclusion, to understand every detail and piece of information the graph provides: legend, caption, numbers, etc.

Missing Pictures of the Reader

5.2 Outlier Detection
6.6.3 Segmentation

Wrong Source and Reference

There are many places in the reader that show Source: (???) referenced in (???). I list them below.

2.1 What is Data Visualization?
2.5.2 David McCandless
3.1 Storytelling
3.3 Three Rules to Follow in order to Develop Intuitive Dashboards
3.4.5 Three Rules to Follow in order to Develop Intuitive Dashboards
3.5.2 Interactive Data Visualization
3.6 Research Results & What’s Next
Chapter 4 Case Studies
4.1.2 Two Centuries of U.S. Immigration
4.4.3 From Pyramid to Pillar: A Century of Change, Population of the U.S.
4.5.8 A Guide to Who is Fighting Whom in Syria
4.5.9 Adding up the White Oscars Winners
4.5.17 Britain’s diet in data
5.1 Using Shapes as Filters in Tableau When Your Fields Are Measures
5.9.3 Conclusions
5.13 An example to back some of our theories on ‘how to tell stories using data visualization’ / ‘exploratory data visualization’
6.5 Definitions of Data Deception and Graphic Integrity
7.1 More ways to improve your visualization design

Missing images

I found some missing images on 5.2 Outlier Detection.

Summarize the reference contents into reader in a good manner

When reviewing the reader, I found that many contributions simply say "according to the article" or "as shown in the article", etc. I think it would be really helpful if we summarized the content and copied the useful figures directly into the reader. Also, try to avoid language like "I learned from this article that..." and word things so that the text reads more like a reader/book.

Outline for Case Studies

For the titles and headers in the case study chapter, I would like for us to decide on the levels for the headers. Since @lpyuan and @SaloniS95 have started to reorganize and remove non-case studies in #313 I think that if we rearrange the actual case studies, we can then have the second level headers (e.g. 4.1 or ##) as the groupings and the individual case studies as third level (e.g. 4.1.1 or ###). With the case studies currently being all second level, the Case Study section of the table of contents feels unnecessarily long.
Thoughts?

Observations

We have several 'Best Practices' sections. Perhaps it is a good idea to combine them and integrate the information?

Do not use local folder references

On line 531 in the patterns.Rmd, there is a reference to a local folder that will not work once it is on github. Please fix this issue.
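
References like this can be searched for before pushing. The sketch below is only an illustration under stated assumptions: the function name is hypothetical, and the list of prefixes that count as "local" (`/Users/`, `/home/`, `~/`, `file://`, Windows drive letters) is a guess, not a rule taken from the project.

```python
import re

# Prefixes that usually indicate a path on someone's own machine.
# This list is an assumption for illustration, not an exhaustive rule.
# The lookbehind avoids matching inside URLs such as https://site.org/home/.
LOCAL_PATH_RE = re.compile(
    r"(?<![\w./:-])(?:/Users/|/home/|~/|file://|[A-Za-z]:[\\/])"
)


def find_local_paths(rmd_text):
    """Return (line_number, line) pairs for lines that look like local paths."""
    hits = []
    for number, line in enumerate(rmd_text.splitlines(), start=1):
        if LOCAL_PATH_RE.search(line):
            hits.append((number, line.strip()))
    return hits
```

Paths relative to the repository, such as images/plot.png, are not reported, because they keep working once the files are on GitHub.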

case studies content

When I read the case studies, I felt that some of the content is repetitive. For example, "Best Data Visualization Projects of 2015" occurs in both the "Best Data Visualizations cases" and " Articles to Read..." sections. I understand that such content may include multiple graphs. My suggestions are:

Move the "Articles to read to check out some cool visualizations are mentioned below :" section to the reference section, with links so that people can open the articles directly.
Delete the content without a reproduced graph (1.1-1.4).

A lot of piled-up text with no clear emphasis or graph examples

When I read this reader, I found that many paragraphs are just blocks of text. Some topics have no clear structure, no emphasis, and no graph examples. This makes the reader harder to read and to understand.

Emphasis gives everyone a clear cue to the main point. So I suggest highlighting the main point in each topic.

As for graph examples, some graphs need to be explained with an image, not words. I suggest adding at least one image below the text for those topics.
For example:
In Chapter 5: Patterns: 5.1 Avoiding Common Mistakes with Time Series
The passage describes the graph in text only, without showing a graph as an example, which makes it hard to understand. It has no emphasized sentence either.

The format of references is not consistent

When I read the references chapter, I found that the format of many reference items is not consistent. For example, some items show the year as N/A (Author: Andrei Zinovyev, year: N/A) and some have no year information at all (4.3 Design Iron Fist Author: Jarrod Drysdale).

Some items show full date information, like February 28th, 2016, and some show only the year, like 2015. Some of those that show only the year could show the full date, such as 2.17 Next Steps for Data Visualization Research. And some are simply wrong, such as "“An Aging Population.” XXXX."

Some other entries look odd because they start with "———", such as ———. 2018. “Infogram- Data Visualization.”, or with XXX, such as "XXX, XX. XXXX. “Using Data Visualization to Find Insights in Data.”

Moreover, some reference links do not work. For example, "visualizer, A great. 1982. “A ficticious web page title.” http://great{\_}viz{\_}org." and "———. 2018b. “Tableau Groups.” https://www.linkedin.com/search/results/groups/?keywords=tableau{\&}origin=SWITCH{\_}SEARCH{\_}VERTICAL."

Best Data Visualization - Case Study - Consolidation

In Case Studies there are multiple bullet points which cover similar information. For example:-
10 Best Data Visualization Projects of 2015
16 Captivating Data Visualization Examples
15 Data Visualizations That Will Blow Your Mind
All of the above can be consolidated and summarized together. I was also thinking that we could add, at the end, the insights we can draw from the best data visualizations, similar to what we discussed in class (Minard).

Issue with Merge Conflict

During the cleanup last week and this week, I noticed that there are repeated paragraphs/sentences in the reader. The underlying problem is that people simply delete conflict markers such as <<<<<<<, =======, >>>>>>> without looking at why there is a conflict; often the cause is several people working on the same passage. Please read the content around the conflict markers before resolving a conflict, to avoid duplication.
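
Both symptoms can be checked mechanically before committing. The following is a rough sketch with hypothetical function names; the 40-character threshold for what counts as a paragraph worth comparing is an arbitrary assumption, and a line of exactly seven equals signs could also be a legitimate Markdown heading underline, so hits need a manual look.

```python
import re
from collections import Counter

# Git writes conflict markers as seven <, = or > characters at line start.
MARKER_RE = re.compile(r"^(<{7}|={7}|>{7})( |$)")


def leftover_markers(text):
    """Return the 1-based line numbers that still hold a conflict marker."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if MARKER_RE.match(line)
    ]


def duplicated_paragraphs(text, min_chars=40):
    """Return paragraphs (whitespace-normalized) that occur more than once."""
    paragraphs = [" ".join(p.split()) for p in re.split(r"\n\s*\n", text)]
    counts = Counter(p for p in paragraphs if len(p) >= min_chars)
    return [p for p, n in counts.items() if n > 1]
```

Running `duplicated_paragraphs` on a chapter file after resolving a conflict shows whether both sides of the conflict were kept by accident.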

Reorganizing Fundamentals

When reading chapter2-fundamentals, I found that the logic of this chapter is not very clear. Topics seem to be put together at random, without a clear order. Some parts clearly cover similar things and could therefore be grouped together as subtopics. Here are some possible subtitles.

Methodology. This part could include topics such as "Practitioners Guide to Best Practices in Data Visualization", "Pick the Right Chart Type", "Visual Data Communication"
Tools. For example, "Data Visualization Tools", "Story Telling with Data"
Principles, including "Tufte's Design Principles of graphical excellence" and "Gestalt Principles for Data Viz".

Use of we, you and I in the description text - content

This is not a very important matter, but I sometimes see sentences using we, you, or I, for instance "We understand from this graph that...", sometimes "I infer from the article that...", and in a few places "This is what you need for a good visualization...". I have tried to make them follow one style.
Other than that, there are places that say "we learnt this in the class". Since we are making a reader on visualization, is it correct to mention the class (our Data Viz class) or content like that?

References missing

The following references are mentioned in the text but were either incomplete or missing in the book.bib file. Please add them. Make sure that you stick to the formatting of the book.bib file.

pandoc-citeproc: reference viz not found
pandoc-citeproc: reference info_viz not found
pandoc-citeproc: reference history_viz not found
pandoc-citeproc: reference charts_viz not found
pandoc-citeproc: reference eagereyes_viz not found
pandoc-citeproc: reference research_viz not found
pandoc-citeproc: reference twitter_Kosara not found
pandoc-citeproc: reference flowingdata not found
pandoc-citeproc: reference infogram not found
pandoc-citeproc: reference data_viz_history not found
pandoc-citeproc: reference intuitive-dash not found
pandoc-citeproc: reference design_principles not found
pandoc-citeproc: reference decept_study not found
pandoc-citeproc: reference rose_tint not found
pandoc-citeproc: reference evil_axes not found
pandoc-citeproc: reference mislead_graph_ex not found
pandoc-citeproc: reference charts_viz not found
pandoc-citeproc: reference next_steps not found
pandoc-citeproc: reference next_steps not found
pandoc-citeproc: reference data_journ not found
pandoc-citeproc: reference DataVizBestPrac not found
pandoc-citeproc: reference benefits_interactive_viz not found
pandoc-citeproc: reference Tufte_2001 not found
pandoc-citeproc: reference Tufte_2001 not found
pandoc-citeproc: reference Tufte_2001 not found
pandoc-citeproc: reference Tufte_2001 not found
pandoc-citeproc: reference Tufte_2001 not found
pandoc-citeproc: reference Tufte_2001 not found
pandoc-citeproc: reference Tufte_2001 not found
pandoc-citeproc: reference int_viz_2 not found
pandoc-citeproc: reference Immigration not found
pandoc-citeproc: reference population_pyramid not found
pandoc-citeproc: reference syria_chart not found
pandoc-citeproc: reference oscars_sowhite_chart not found
pandoc-citeproc: reference int_viz_2 not found
pandoc-citeproc: reference oscars_sowhite_chart not found
pandoc-citeproc: reference oscars_sowhite_chart not found
pandoc-citeproc: reference int_viz_2 not found
pandoc-citeproc: reference case_thesis not found
pandoc-citeproc: reference vizwiz_malaria not found
pandoc-citeproc: reference info_beautiful not found
pandoc-citeproc: reference country_chart not found
pandoc-citeproc: reference country_original not found
pandoc-citeproc: reference hans not found
pandoc-citeproc: reference interworks not found
pandoc-citeproc: reference KDD99 not found
pandoc-citeproc: reference britain_diet_2016 not found
pandoc-citeproc: reference britain-diet-data-trends not found
pandoc-citeproc: reference britain_diet_2016 not found
pandoc-citeproc: reference britain-diet-data-typical_diet not found
pandoc-citeproc: reference britain_diet_2016 not found
pandoc-citeproc: reference TabPy not found
pandoc-citeproc: reference data_meaning not found
pandoc-citeproc: reference design_ebooks not found
pandoc-citeproc: reference Calendar_Layout not found
pandoc-citeproc: reference CalendarView not found
pandoc-citeproc: reference DataUSA not found
pandoc-citeproc: reference CensusDataViz not found
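
The log repeats a key once per citation, so it helps to collapse it into a list of unique keys with counts before editing book.bib. A minimal sketch, assuming the log lines have exactly the form shown above; the function name is hypothetical.

```python
import re
from collections import Counter

# One log line looks like: "pandoc-citeproc: reference Tufte_2001 not found".
LOG_RE = re.compile(r"pandoc-citeproc: reference (\S+) not found")


def missing_reference_counts(log_text):
    """Map each missing citation key to the number of times it was reported."""
    return Counter(LOG_RE.findall(log_text))
```

Sorting the keys of the returned counter gives a checklist of entries to add to book.bib.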

Clean up references

References should be added to the file book.bib and then referenced using [@...]. This way the bookdown software will render all references correctly.

Intro Page - Resources at the bottom, move to another section?

This is my suggestion to clean up the bottom of the intro page, but I welcome your input.

I think with the intro, short and clean is best to grab attention and make a smooth transition to the rest of the book. That said, there are four sections at the bottom (not all committed to the master yet) that I think should be moved elsewhere but I'm not sure where is best.

"Useful Links on Data Visualization Trends, Tutorials and Research Papers"
"Resources for Aspiring Data Visualists"
"Other sources of great visualization"
"More ways to improve your visualization design"

It's all solid content, but they seem more like conclusion topics and will send readers elsewhere, when the goal of the first chapter should be enticing them to dive into chapter 2.

How do you think we can best integrate these sections into the reader?

How to include images and figures

Please include figures and images in the folder images. Make sure that you cite images correctly! I will remove any images without reference.

Merge History Parts of Intro and Fundamentals

In the intro, there is a topic named "Key Figures in the History of Data Visualization", and the opening of Fundamentals covers A Brief History of Data Visualization. I think these two could be merged to make the reader more logical.

Clean up week

It is clean-up week, so please pay close attention to the following issues:

Use markdown correctly. Format headings appropriately and make sure that your text is consistent. Here is the rmarkdown cheat sheet.
Remove all links in the text and include them as references.
Clean up the references (no duplicates, all references MUST have an author and a year)
Install bookdown and make sure that your contributions do not result in any errors. If in doubt, talk to me.

Clear references.Rmd

References.Rmd should be empty. It will be automatically filled. All references should be added appropriately to the book.bib.

use of reference

I noticed that the references also contain some "non-reference" content, such as website recommendations and case study examples, which makes the references messy.
I think we should use the references only for listing reference sources and move the website recommendations somewhere more appropriate.

Labels wrong

The following labels do not work. Please fix:

1: The label(s) fig:hickey-before, fig:hickey-after not found
2: The label(s) fig:hickey-breakdown not found
3: The label(s) fig:hickey-3D not found
4: The label(s) fig:henry-quarter, fig:henry-half not found
5: The label(s) fig:hickey-before, fig:hickey-after not found
6: The label(s) fig:hickey-breakdown not found
7: The label(s) fig:hickey-3D not found
8: The label(s) fig:henry-quarter, fig:henry-half not found
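
These warnings can be reproduced without a full build by comparing the figure references in an Rmd file with its chunk labels. The sketch below assumes that a figure referenced as `\@ref(fig:name)` comes from a code chunk whose header starts with `{r name, ...}`; figures labeled in any other way are not recognized, and the function name is hypothetical.

```python
import re

# A bookdown figure reference such as \@ref(fig:hickey-before).
REF_RE = re.compile(r"\\@ref\(fig:([^)\s]+)\)")

# The label of a knitr chunk header such as ```{r hickey-before, fig.cap="..."}.
# Headers without a label, e.g. {r, echo=FALSE} or {r echo=FALSE}, do not match.
CHUNK_RE = re.compile(r"^```\{r\s+([^,\s}=]+)\s*[,}]", re.MULTILINE)


def undefined_figure_labels(rmd_text):
    """Return referenced figure labels that no chunk in the text defines."""
    referenced = set(REF_RE.findall(rmd_text))
    defined = set(CHUNK_RE.findall(rmd_text))
    return sorted(referenced - defined)
```

A label reported here is either misspelled in the reference or missing from the chunk that draws the figure.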

Links in the text

Sometimes it is beneficial to include links in the text. Put them in the text correctly:

Example:

This is a very cool [link](http://www.verycoollink.com)

produces

This is a very cool link

Multiple design principles in fundamentals

In fundamentals, there are many design principles under different topics:

Nine principles of design from graphic designer Melissa Anderson
Gestalt Principles of design
Tufte's Design Principles
Conclusion: composition of design principles (this looks like a continuation of Tufte's principles)
Some of the principles overlap and some don't. We should either merge the principles or, if we keep all of them, group them together, since they are in different subsections now.

References

Use the following procedure to add references:

Create an entry in the book.bib file
Reference the entry in the Rmd file using the format [@entry_name]

Code

If you include code, make sure it runs!

Preface_References.

The references in the Chapter 1 Preface should go to the References chapter. However, we can't change that.

Commented text and code

I have commented out every line that causes trouble while building the reader:

Markdown that produces errors has been commented out.
R code that produces problems has been set to `eval = FALSE`.

Please fix this.

Chapter 4 , 5, 6

What exactly does the "Patterns" portion represent? Currently it is very heterogeneous and not very specific.

In my view, the following topics from chapters 4, 5, and 6 need to go to different sections:

Section : suggested section

5.9 : fundamentals
5.6 : case studies
5.10 : case studies
5.13 : fundamentals
4.27 : fundamentals
6.3 : fundamentals / patterns
6.4 : fundamentals / patterns

The "contributions" branch is deleted?

Why is the "contributions" branch missing? Should I create a PR to the "master" branch instead?

Case Studies

Do not just mention the case studies. Describe them, include figures, and REPLICATE them.

Missing reference/link in 04 Patterns

As I read 04 patterns, I found that some of the content lacks references. Please check the following:

Genetic Network Reconstruction

any reference?

Building Advanced Analytics Application with TabPy

[@TabPy]
...Imagine a scenario where we can just enter some x values in a dashboard form, and the visualization would predict the y variable!!! Here is a link that shows how to integrate and visualize data from Python in Tableau.
Where is the link???

Pick the Right Chart Type

reference?

Issues with proof reading

When I proofread parts of a chapter, especially when the text is longer than a few paragraphs, my pull request always ends up with conflicts, sometimes more than 10. I find it hard and scary to resolve them, because I have to read the parts again and make sure I don't delete anything useful by mistake.
So I'm wondering: is this only my concern, and is this the right way to do it?

Image references in Patterns

In Patterns file, under 'Using design patterns to find greater meaning in your data' section, there are three local image references.
@YingjieYu Can you change your image references to the GitHub image references?

Section 5.10.1 (Calendar View) contains raw HTML code in Markdown

This embedded HTML code has created a giant calendar on this chapter's page. It should be fenced by lines starting with ```

Deceptive viz

We have quite a lot of material on deceptive visualizations. It would be good to integrate and consolidate it.

Create a new chapter to contain tips for using visualization software/packages

The Patterns chapter contains some tips for making better use of data visualization software and libraries. While they can be useful, these tips do not quite fit into this chapter, because they are implementation details of a pattern, not the pattern itself. I suggest that we create a new chapter to contain these tips:

5.2 Building advanced analytics application with TabPy
5.9 Tips for Tableau
5.10.1 Calendar View
5.12 Reusable Data Visualization Code in R