mschermann / data_viz_reader
A Reader on Data Visualization
Home Page: https://mschermann.github.io/data_viz_reader/
License: Creative Commons Attribution Share Alike 4.0 International
Chapters 4 and 5 currently overlap a lot: some case-study content, bad/good visualization examples, and improvement tips are organized under chapter 5 instead of chapter 4.
Would it be better to:
(Pick the Right Chart Type 3.7
Visual Data Communication 3.14
Gestalt Principles for Data 3.15
Definitions of Data Deception and Graphic Integrity 3.5.1)
On the deception part, I disagree with the suggestion in issue #316. Deceptive charts are a useful tool for developing charts and refining ideas, so this part could go into the Patterns chapter.
Please make sure that references to figures actually work. Run the project on your local machine and understand the log output.
Problem:
Should we organize chapter 4 in the order in which the articles were added, or by case, with the article information only in the references? Or should we group the cases by theme and reorganize the chapter?
In this chapter, some of the content comes from a single article, but we seem to have split it up in an ambiguous order. Here is the class reader:
For example, 4.5/4.6/4.7/4.8/4.9/4.10 are all cases cited from the article in 4.2, but they are currently ordered in parallel with 4.2, which leaves only two lines under 4.2. Sections 4.1, 4.3, and 4.4 have the same problem.
Some tags in the R Markdown file are not in the book.bib file, which makes the reader show "???" marks. We should either create those tags in the book.bib file, correct the tags so that the reference links resolve, or remove tags that are no longer used from the R Markdown file.
For example, the "case studies" chapter has tags such as "15_mindblowing" and "population_pyramid". However, these tags can't be found in the book.bib file.
Another example is http://vudlab.com/simpsons/. This link doesn't exist.
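The missing-tag check described above could be run before building the reader. Here is a minimal sketch in Python, assuming citations are written in brackets as `[@key]` (or `[@a; @b]`) and that book.bib uses standard BibTeX entry headers such as `@misc{key,`. The function names are hypothetical and not part of the project.

```python
import re

# Bracketed citation groups such as [@TabPy] or [@a; @b].
BRACKET_RE = re.compile(r"\[([^\[\]]*@[^\[\]]*)\]")
# A single key inside a group. Assumption: keys use letters, digits, "_", "-", ":" and ".".
KEY_RE = re.compile(r"(?<![\w\\])@([A-Za-z0-9_][A-Za-z0-9_:.\-]*)")
# Start of a BibTeX entry, e.g. "@misc{TabPy,".
BIB_ENTRY_RE = re.compile(r"^\s*@\w+\s*\{\s*([^,\s]+)\s*,", re.MULTILINE)


def cited_keys(rmd_text):
    """Collect every citation key used in bracketed citations."""
    keys = set()
    for group in BRACKET_RE.findall(rmd_text):
        for key in KEY_RE.findall(group):
            # Drop punctuation that belongs to the sentence, not the key.
            keys.add(key.rstrip(".:-"))
    return keys


def defined_keys(bib_text):
    """Collect every entry key defined in the .bib text."""
    return set(BIB_ENTRY_RE.findall(bib_text))


def missing_keys(rmd_text, bib_text):
    """Return the cited keys that have no entry in the .bib text, sorted."""
    return sorted(cited_keys(rmd_text) - defined_keys(bib_text))


if __name__ == "__main__":
    sample_rmd = "See [@TabPy] and [@population_pyramid]."
    sample_bib = "@misc{TabPy,\n  title = {TabPy}\n}\n"
    print(missing_keys(sample_rmd, sample_bib))  # prints ['population_pyramid']
```

To use it on the real files, read each .Rmd file and book.bib as text and pass the contents to `missing_keys`. Citations written without brackets (for example `@key says ...`) are not covered by this sketch.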
There are reflections/references in fundamentals with similar themes, and it might make the chapter more cohesive to group them?
Some ideas for possible sections:
History
Design Principles
Best Practices
Tools
Special Features (e.g. interactivity)
Additional References
Please include any dataset that you want to use in your contributions in the folder `resources`. When you use a dataset, reference it correctly.
1.0 As I was going through the reader, one problem I saw was that information on some topics is scattered across many sections of the reader. For example:
Best Practices of Visualization:
In section 3(i.e. fundamentals) we have the following articles on Best Practices:
a. Practitioners Guide to Best Practices in Data Visualization(at 3.2)
b. Three Rules to Follow in order to Develop Intuitive Dashboards(3.2.1)
c. Fundamental Components of Design(3.3)
d. Visualization and Graphics Principles to Refocus and Guide You(3.3.1)
e. Best practices for visualization(3.10)
f. Story Telling with Data(3.11)
g. Tufte’s Design Principles of graphical excellence(3.13)
h. Visual Data Communication (3.14 )
i. Gestalt Principles for Data Viz(3.15)
Again in section 5(i.e. patterns), we have the following topics:
a. 5 Tips to improve Data Visualization(5.7)
b. An example to back some of our theories on ‘how to tell stories using data visualization’ / ‘exploratory data visualization’(5.11)
Data Visualization tools:
a) Data Visualization Tools(at Section 3.4)
b) Interactive Data Visualization(at Section 3.12)
c) Tips for Tableau(at Section 5.9)
d) Reusable Data Visualization Code in R(5.12)
Charts:
a) Pick the Right Chart Type(3.7)
b) Why pie chart is bad: a comparison with bar chart(at Section5.3)
Data Visualization in Business/Corporate:
a) Data Visualization in Business(3.6)
b) Corporate Scorecards and Data Visualization(3.11.1)
Resources:
a) Useful Links on Data Visualization Trends, Tutorials and Research Papers(at Section-2.4)
b) Resources for Aspiring Data Visualists(3.16)
c) Other sources of great visualization(4.24)
d) More ways to improve your visualization design(5.8)
Time Series:
a) Avoiding Common Mistakes with Time Series(at Section 5.1)
b) Example Visualizations of Time Series Data(at Section5.6)
1.1 To solve this problem, I want to take the following steps:
1. Integrate similar topics.
2. If some topics have too little information, add more articles to them.
3. Finally, if some information is repetitive, delete it with the consent of the original contributor.
1.2 This week, I would like to focus on the first two steps, and I invite you all to contribute. You can also add to the list in 1.0 above more topics whose information is scattered through the reader and work on them.
There are a thousand Hamlets in a thousand people's eyes: people will have different views about a graph, its content, its style, its color, and so on, and any of these can be a reason for the reader to misunderstand the data. I disagree with how the author defines a “deceptive data graph” in the Fundamentals section Definitions of Data Deception and Graphic Integrity. According to the author, a deceptive data graph is “a graphical depiction of information, designed with or without an intent to deceive, that may create a belief about the message and its components, which varies from the actual message.” Reading this definition, I feel it places all the responsibility for a deceptive data graph on the original author and ignores the possibility that sometimes the problem is not “misleading” but “misunderstanding.” Readers grow up in different places, have different educational backgrounds, and read the article for different purposes, so it is easy for people to reach different conclusions from the same graph.
For example, the following graphs are from the reader. The article states: “it is definitely misleading. This is often called improper extraction or tactic omitting data when only a certain chunk of data is included. This is more common in graphs that have time as one of their axes.”
The first graph shows the data from 2015-3-9 to 2015-3-13, and the second graph shows the data from 2015-3-2 to 2015-3-21. The author thinks these two graphs are an excellent example of improper extraction or tactic omitting data. However, what if the purpose of the first graph is to show the audience the stabilization between 2015-3-9 and 2015-3-13? Before deciding whether this graph misleads its audience, we need to understand its purpose. Here is another example:
The purpose of the first graph is to show that the US is an outlier on gun violence because it has far more guns than other developed nations. The author compares homicides by firearm per 1 million people in the United States and in other countries, and the result shows that the United States has far more murders by firearm than the other countries.
The second chart shows homicides by firearm per 1 million people for countries all over the world. The higher the rate, the redder the country's color. From this chart, we can easily see that the US stays in the safe “yellow” realm.
The two graphs above were generated from different perspectives. We cannot readily conclude whether the US is a safe country or not. Compared with other developed countries, the US has a higher rate, but compared with all other countries in the world, the US is well below the world’s average rate. If we pick just one graph and conclude whether or not the US is a safe country, we may change our conclusion when the other graph is shown. Under these circumstances, we cannot say which graph is a “deceptive graph.”
“Data visualization deception” is a heavy term, because once I discover one deceptive graph in an article, I will conclude that the author is trying to hide something or to mislead me by breaking graphic integrity. This could lead me to a conclusion opposite to the article's. I think the best way to detect visualization deception is, before drawing any conclusion, to understand every detail the graph provides: legend, caption, numbers, etc.
5.2 Outlier Detection
6.6.3 Segmentation
Many places in the reader show Source: (???) referenced in (???). I list them below.
2.1 What is Data Visualization?
2.5.2 David McCandless
3.1 Storytelling
3.3 Three Rules to Follow in order to Develop Intuitive Dashboards
3.4.5 Three Rules to Follow in order to Develop Intuitive Dashboards
3.5.2 Interactive Data Visualization
3.6 Research Results & What’s Next
Chapter 4 Case Studies
4.1.2 Two Centuries of U.S. Immigration
4.4.3 From Pyramid to Pillar: A Century of Change, Population of the U.S.
4.5.8 A Guide to Who is Fighting Whom in Syria
4.5.9 Adding up the White Oscars Winners
4.5.17 Britain’s diet in data
5.1 Using Shapes as Filters in Tableau When Your Fields Are Measures
5.9.3 Conclusions
5.13 An example to back some of our theories on ‘how to tell stories using data visualization’ / ‘exploratory data visualization’
6.5 Definitions of Data Deception and Graphic Integrity
7.1 More ways to improve your visualization design
I found some missing images in 5.2 Outlier Detection.
When reviewing the reader contents, I found that many contributions simply say "according to the article" or "as shown in the article", etc. I think it would be really helpful if we summarized the content and copied the useful figures directly into the reader. Also, try to avoid language like "I learned from this article that..." and organize the wording so that it reads more like a reader/book.
For the titles and headers in the case study chapter, I would like for us to decide on the levels for the headers. Since @lpyuan and @SaloniS95 have started to reorganize and remove non-case studies in #313 I think that if we rearrange the actual case studies, we can then have the second level headers (e.g. 4.1 or ##) as the groupings and the individual case studies as third level (e.g. 4.1.1 or ###). With the case studies currently being all second level, the Case Study section of the table of contents feels unnecessarily long.
Thoughts?
We have several 'Best Practices' sections. Perhaps it is a good idea to combine them and integrate the information?
On line 531 in the patterns.Rmd, there is a reference to a local folder that will not work once it is on github. Please fix this issue.
When I read the case studies, I feel that some content is repetitive. For example, "Best Data Visualization Projects of 2015" occurs in both the "Best Data Visualizations cases" and " Articles to Read..." sections. I understand such content may include multiple graphs. My suggestions are
When I read this reader, I found that many paragraphs are just text piled up. Some topics have no clear structure, emphasis, or graph examples. This makes our reader less readable and harder to understand.
Emphasis gives everyone a clear cue to get the main point quickly, so I suggest highlighting the main point in each topic.
For graph examples, some graphs need to be explained with an image, not words. I suggest that those topics add at least one image below the text.
For example:
In Chapter 5: Patterns: 5.1 Avoiding Common Mistakes with Time Series
The passage uses only text to describe the graph and shows no graph as an example, which makes it hard to understand. No sentence is emphasized either.
When I read the reference chapter, I found that the format of many reference items is inconsistent. For example, some items show the year as N/A (Author: Andrei Zinovyev, year: N/A) and some have no year information (4.3 Design Iron Fist Author: Jarrod Drysdale).
Some items show the full date, like February 28th, 2016, and some show just the year, like 2015. Those that show just the year could show the full date, such as 2.17 Next Steps for Data Visualization Research. And some are simply wrong, such as "“An Aging Population.” XXXX."
Some other titles are odd and start with "---", such as ———. 2018. “Infogram- Data Visualization.”, or with XXX, such as "XXX, XX. XXXX. “Using Data Visualization to Find Insights in Data.”"
Moreover, some links in the references can't be used. For example, "visualizer, A great. 1982. “A ficticious web page title.” http://great{\_}viz{\_}org." and "———. 2018b. “Tableau Groups.” https://www.linkedin.com/search/results/groups/?keywords=tableau{\&}origin=SWITCH{\_}SEARCH{\_}VERTICAL."
In Case Studies there are multiple bullet points which cover similar information. For example:
10 Best Data Visualization Projects of 2015
16 Captivating Data Visualization Examples
15 Data Visualizations That Will Blow Your Mind
All of the above can be consolidated and summarized together. I also think we can add, at the end, insights that we can draw from the best data visualizations, similar to what we discussed in class (Minard).
During the cleanup last week and this week, I noticed that there are repetitive paragraphs/sentences in the reader. The essential problem is that people simply delete conflict markers such as <<<<<<<, =======, >>>>>>> without looking at why there is a conflict; often it is because multiple people are working on the same thing. Please read the content around the conflict markers to avoid duplication.
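To review the context around a conflict instead of just deleting the markers, it helps to list where the markers are first. Here is a minimal sketch in Python, assuming the default Git marker style (seven `<`, `=`, or `>` characters at the start of a line). The function name is hypothetical.

```python
import re

# Git writes conflict markers at the start of a line: seven "<", "=", or ">"
# characters, followed either by whitespace and a label or by the end of the line.
MARKER_RE = re.compile(r"^(<{7}|={7}|>{7})(\s|$)")


def find_conflict_markers(text):
    """Return (line_number, line) pairs for lines that look like conflict markers."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if MARKER_RE.match(line):
            hits.append((number, line))
    return hits


if __name__ == "__main__":
    sample = "intro\n<<<<<<< HEAD\nmy text\n=======\ntheir text\n>>>>>>> other-branch\n"
    for number, line in find_conflict_markers(sample):
        print(number, line)
```

A line of exactly seven `=` characters can also be a legitimate Markdown heading underline, so each hit still needs a human look at the surrounding paragraphs.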
When reading chapter2-fundamentals, I found that the logic of this chapter is not very clear. Topics are put together without an apparent order. Some parts clearly cover similar things and can therefore be grouped together as subtopics. Here are some possible subtitles.
This is not a very important matter, but sometimes I see sentences using we, you, or I, for instance "We understand from this graph that...", sometimes "I infer from the article that...", and in a few places "This is what you need for a good visualization...". I have tried to make them follow one convention.
Other than that, there are places that say "we learnt this in the class". Since we are making a reader on visualization, is it correct to mention "in the class", referring to our Data Viz class, or content like that?
The following references are mentioned in the text but were either incomplete or missing in the `book.bib` file. Please add them. Make sure that you stick to the formatting of the `book.bib` file.
pandoc-citeproc: reference viz not found
pandoc-citeproc: reference info_viz not found
pandoc-citeproc: reference history_viz not found
pandoc-citeproc: reference charts_viz not found
pandoc-citeproc: reference eagereyes_viz not found
pandoc-citeproc: reference research_viz not found
pandoc-citeproc: reference twitter_Kosara not found
pandoc-citeproc: reference flowingdata not found
pandoc-citeproc: reference infogram not found
pandoc-citeproc: reference data_viz_history not found
pandoc-citeproc: reference intuitive-dash not found
pandoc-citeproc: reference design_principles not found
pandoc-citeproc: reference decept_study not found
pandoc-citeproc: reference rose_tint not found
pandoc-citeproc: reference evil_axes not found
pandoc-citeproc: reference mislead_graph_ex not found
pandoc-citeproc: reference charts_viz not found
pandoc-citeproc: reference next_steps not found
pandoc-citeproc: reference next_steps not found
pandoc-citeproc: reference data_journ not found
pandoc-citeproc: reference DataVizBestPrac not found
pandoc-citeproc: reference benefits_interactive_viz not found
pandoc-citeproc: reference Tufte_2001 not found
pandoc-citeproc: reference Tufte_2001 not found
pandoc-citeproc: reference Tufte_2001 not found
pandoc-citeproc: reference Tufte_2001 not found
pandoc-citeproc: reference Tufte_2001 not found
pandoc-citeproc: reference Tufte_2001 not found
pandoc-citeproc: reference Tufte_2001 not found
pandoc-citeproc: reference int_viz_2 not found
pandoc-citeproc: reference Immigration not found
pandoc-citeproc: reference population_pyramid not found
pandoc-citeproc: reference syria_chart not found
pandoc-citeproc: reference oscars_sowhite_chart not found
pandoc-citeproc: reference int_viz_2 not found
pandoc-citeproc: reference oscars_sowhite_chart not found
pandoc-citeproc: reference oscars_sowhite_chart not found
pandoc-citeproc: reference int_viz_2 not found
pandoc-citeproc: reference case_thesis not found
pandoc-citeproc: reference vizwiz_malaria not found
pandoc-citeproc: reference info_beautiful not found
pandoc-citeproc: reference country_chart not found
pandoc-citeproc: reference country_original not found
pandoc-citeproc: reference hans not found
pandoc-citeproc: reference interworks not found
pandoc-citeproc: reference KDD99 not found
pandoc-citeproc: reference britain_diet_2016 not found
pandoc-citeproc: reference britain-diet-data-trends not found
pandoc-citeproc: reference britain_diet_2016 not found
pandoc-citeproc: reference britain-diet-data-typical_diet not found
pandoc-citeproc: reference britain_diet_2016 not found
pandoc-citeproc: reference TabPy not found
pandoc-citeproc: reference data_meaning not found
pandoc-citeproc: reference design_ebooks not found
pandoc-citeproc: reference Calendar_Layout not found
pandoc-citeproc: reference CalendarView not found
pandoc-citeproc: reference DataUSA not found
pandoc-citeproc: reference CensusDataViz not found
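Because the log repeats a key once per citation, it can help to collapse it into one line per missing key. Here is a minimal sketch in Python that relies only on the message format shown in the log above; the function name is hypothetical.

```python
import re
from collections import Counter

# Message format copied from the build log: "pandoc-citeproc: reference KEY not found".
NOT_FOUND_RE = re.compile(r"pandoc-citeproc: reference (\S+) not found")


def count_missing_references(log_text):
    """Map each missing citation key to the number of times the log reports it."""
    return Counter(NOT_FOUND_RE.findall(log_text))


if __name__ == "__main__":
    sample_log = (
        "pandoc-citeproc: reference Tufte_2001 not found\n"
        "pandoc-citeproc: reference Tufte_2001 not found\n"
        "pandoc-citeproc: reference TabPy not found\n"
    )
    for key, count in sorted(count_missing_references(sample_log).items()):
        print(key, count)
```

Each key in the result is one entry to add to book.bib; the count shows how many citations in the text depend on it.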
References should be added to the file `book.bib` and then referenced using `[@...]`. This way the bookdown software will render all references correctly.
This is my suggestion to clean up the bottom of the intro page, but I welcome your input.
I think with the intro, short and clean is best to grab attention and make a smooth transition to the rest of the book. That said, there are four sections at the bottom (not all committed to the master yet) that I think should be moved elsewhere but I'm not sure where is best.
"Useful Links on Data Visualization Trends, Tutorials and Research Papers"
"Resources for Aspiring Data Visualists"
"Other sources of great visualization"
"More ways to improve your visualization design"
It's all solid content, but they seem more like conclusion topics and will send readers elsewhere, when the goal of the first chapter should be enticing them to dive into chapter 2.
How do you think we can best integrate these sections into the reader?
Please include figures and images in the folder `images`. Make sure that you cite images correctly! I will remove any images without reference.
In the intro, there is a topic named "Key Figures in the History of Data Visualization", and the opening of Fundamentals covers A Brief History of Data Visualization. I think these two could be merged to make the reader more logical.
It is clean-up week, so please pay close attention to the following issues:
References.Rmd should be empty. It will be automatically filled. All references should be added appropriately to the book.bib.
I noticed that the references also contain some "non-reference" content, such as website recommendations and case study examples, which makes the reference list messy.
I think we should use the references only to list the reference sources and move the website recommendations somewhere more appropriate.
The following labels do not work. Please fix:
1: The label(s) fig:hickey-before, fig:hickey-after not found
2: The label(s) fig:hickey-breakdown not found
3: The label(s) fig:hickey-3D not found
4: The label(s) fig:henry-quarter, fig:henry-half not found
5: The label(s) fig:hickey-before, fig:hickey-after not found
6: The label(s) fig:hickey-breakdown not found
7: The label(s) fig:hickey-3D not found
8: The label(s) fig:henry-quarter, fig:henry-half not found
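Warnings like these usually mean that a `\@ref(fig:...)` reference has no matching code chunk label. Here is a minimal sketch in Python for finding such references before building, assuming the chunk label is the first item in the chunk header (for example `{r hickey-before, fig.cap='...'}`) and is not given via `label=`. The function name is hypothetical.

```python
import re

# References as bookdown writes them in running text: \@ref(fig:some-label).
REF_RE = re.compile(r"\\@ref\(fig:([^)\s]+)\)")
# Chunk headers: three backticks, then "{r some-label, ...}" or "{r some-label}".
# Assumption: the label comes first and is written without "label=".
CHUNK_RE = re.compile(r"^`{3}\s*\{r[ ,]+([^,\s}=]+)\s*[,}]", re.MULTILINE)


def missing_figure_labels(rmd_text):
    """Return figure labels that are referenced but not defined as chunk labels."""
    referenced = set(REF_RE.findall(rmd_text))
    defined = set(CHUNK_RE.findall(rmd_text))
    return sorted(referenced - defined)


if __name__ == "__main__":
    fence = "`" * 3  # built at runtime so the sample does not contain a literal fence
    sample = (
        fence + "{r hickey-before, fig.cap='Before'}\n"
        "plot(1)\n" + fence + "\n"
        "Compare \\@ref(fig:hickey-before) with \\@ref(fig:hickey-after).\n"
    )
    print(missing_figure_labels(sample))  # prints ['hickey-after']
```

The sketch does not check whether a chunk has a `fig.cap`; a chunk without a caption is not numbered as a figure, so its label can still trigger the same warning.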
Sometimes it is beneficial to include links in the text. Put them in the text correctly:
Example:
This is a very cool [link](http://www.verycoollink.com)
produces
This is a very cool link
In fundamentals, there are many design principles under different topics:
Use the following procedure to add references:
book.bib file
If you include code, make sure it runs!
The references at the chapter 1 Preface should go to Chapter References. However, we can't change that.
I have commented out every line that causes trouble while building the reader:
Markdown that produces errors has been commented with <!-- Problematic Markdown -->.
R code that produces problems has been set to `eval = FALSE`.
Please fix this.
What exactly does the "Patterns" portion represent? Currently it is very heterogeneous and not very specific.
In my opinion, the following topics from chapters 4, 5, and 6 need to go to different sections:
Section : suggested section
5.9 : fundamentals
5.6 : case studies
5.10 : case studies
5.13 : fundamentals
4.27 : fundamentals
6.3 : fundamentals / patterns
6.4 : fundamentals / patterns
Why is the "contributions" branch missing? Should I create a PR to the "master" branch instead?
Do not just mention the case studies. Describe them, include figures, and REPLICATE them.
As I read 04 patterns, I found that some of the content lacks references. Please check the following:
any reference?
[@TabPy]
...Imagine a scenario where we can just enter some x values in a dashboard form, and the visualization would predict the y variable!!! Here is a link that shows how to integrate and visualize data from Python in Tableau.
Where is the link???
reference?
When I proofread parts of a chapter, especially when the text is longer than a few paragraphs, there are always conflicts after I open a pull request, and sometimes more than 10... I find it hard and scary to resolve the conflicts, because I have to read the parts again and make sure I don't delete anything useful by mistake.
So I'm wondering: is this only my concern, or is this the right way to do it?
In Patterns file, under 'Using design patterns to find greater meaning in your data' section, there are three local image references.
@YingjieYu Can you change your image references to the GitHub image references?
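References like these can be found mechanically before a pull request. Here is a minimal sketch in Python, assuming images are included either with Markdown syntax `![alt](path)` or with `knitr::include_graphics("path")`, and that a "local" reference means an absolute path on someone's machine (starting with `/`, `~`, `file://`, or a drive letter). The function name is hypothetical.

```python
import re

# Markdown images: ![alt](path) or ![alt](path "title").
IMAGE_RE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")
# R chunks that call include_graphics("path") or include_graphics('path').
INCLUDE_RE = re.compile(r"include_graphics\(\s*[\"']([^\"']+)[\"']")
# Assumption: these prefixes mark a path that only exists on one machine.
LOCAL_RE = re.compile(r"^(/|~|file://|[A-Za-z]:[\\/])")


def local_image_paths(rmd_text):
    """Return image paths that look like absolute paths on a local machine."""
    paths = IMAGE_RE.findall(rmd_text) + INCLUDE_RE.findall(rmd_text)
    return [path for path in paths if LOCAL_RE.match(path)]


if __name__ == "__main__":
    sample = (
        "![before](/Users/me/Desktop/chart.png)\n"
        "![after](images/chart.png)\n"
    )
    print(local_image_paths(sample))  # prints ['/Users/me/Desktop/chart.png']
```

Relative paths such as `images/chart.png` pass the check because they resolve inside the repository.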
This embedded HTML code has created a giant calendar on this chapter's page. It should be fenced by lines starting with ```
We have quite a lot of material on deceptive visualizations. It would be good to integrate and consolidate it.
The Patterns chapter contains some tips for better use of data visualization software and libraries. While they can be useful, these tips do not quite fit into this chapter, because they are implementation details of a pattern, not the pattern itself. I suggest that we create a new chapter to contain these tips:
5.2 Building advanced analytics application with TabPy
5.9 Tips for Tableau
5.10.1 Calendar View
5.12 Reusable Data Visualization Code in R
After viewing the ethics chapter, I am wondering whether we should cover data ethics or data visualization ethics. If we just focus on data visualization, then this chapter already covers some topics very well, such as using data visualization for deception.
However, if we want to cover data ethics as a whole, then this chapter needs to include more material, such as data privacy, data collection, data sovereignty, and data ownership.
Some of the references were preventing bookdown from compiling. Thus, I deleted them. You need to add them again in the correct format.
If you include simple line charts/pie charts/etc. do not copy them but reproduce them using R.