datamining-speeddating's Issues
Feedback on Task 8
Our comments on the provisional version of Task 8.
1 - Karina provides a reference script for descriptive analysis
https://www-eio.upc.edu/~karina/datamining/refmaterial/labos/1Descriptiva/7TotalDescriptiveClean.R
It uses a for loop that produces the appropriate plots for each variable, instead of hard-coding and copy-pasting the same code for every variable.
There is also an R Markdown version of the same script (like the one you uploaded to Overleaf). It is currently configured to output Word, but it could surely be configured to output LaTeX or another format.
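To illustrate the loop-based approach, here is a minimal sketch. It is not Karina's script: the data frame `dd`, its columns, and the helper `describe_variable` are hypothetical stand-ins, and the choice of plots per variable type is our assumption.

```r
# Hypothetical stand-in data; replace with the project's own data frame.
dd <- data.frame(
  age    = c(21, 25, 30, 27, 24, 35),
  income = c(30, 45, 52, 41, 38, 60),
  gender = factor(c("F", "M", "F", "M", "F", "M"))
)

# Plot and summarise one variable according to its type.
# Returns a label saying which branch was used.
describe_variable <- function(x, name) {
  if (is.numeric(x)) {
    hist(x, main = paste("Histogram of", name), xlab = name)
    boxplot(x, horizontal = TRUE, main = paste("Boxplot of", name), xlab = name)
    print(summary(x))
    "numeric"
  } else {
    counts <- table(x, useNA = "ifany")
    barplot(counts, main = paste("Barplot of", name), xlab = name, ylab = "Count")
    print(counts)
    "categorical"
  }
}

# One loop covers every variable, so nothing is copy-pasted.
kinds <- character(0)
for (name in names(dd)) {
  kinds[name] <- describe_variable(dd[[name]], name)
}
```

Adding a new variable to the data then requires no change to the plotting code.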
2 - "Hist function not found"
When we run the script, RStudio throws an error: "Hist function not found". We tried installing some packages, but that didn't help. hist, in lowercase, works for us, but some of its parameters are different. To make the results reproducible (for instance, by the teacher), you should declare the dependencies (the required packages) in the script. Or is it part of a default package and our R runtime is broken?
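A possible way to declare dependencies at the top of the script is sketched below. Base R only provides `hist` in lowercase. The `RcmdrMisc` package exports a function called `Hist`, but whether that is the package your script relies on is our assumption. The wrapper `plot_histogram` is a hypothetical helper.

```r
# Assumption: Hist() comes from RcmdrMisc; adjust this list if it does not.
required_packages <- c("RcmdrMisc")

# Report missing packages up front instead of failing halfway through.
is_installed <- vapply(required_packages, requireNamespace,
                       logical(1), quietly = TRUE)
missing_packages <- required_packages[!is_installed]

if (length(missing_packages) > 0) {
  message("Missing packages: ", paste(missing_packages, collapse = ", "),
          ". Install them with install.packages().")
}

# Use Hist() when available, otherwise fall back to base hist().
plot_histogram <- function(x, ...) {
  if (requireNamespace("RcmdrMisc", quietly = TRUE)) {
    RcmdrMisc::Hist(x, ...)
  } else {
    hist(x, ...)
  }
}

plot_histogram(c(1, 2, 2, 3, 3, 3, 4), main = "Example", xlab = "value")
```

Calling the function with the `package::` prefix also makes it obvious to a reader where it comes from.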
3 - Why no boxplots?
They appear in the reference script.
4 - Insufficient bivariate analysis
In our opinion, this is the most important thing that is lacking. You only plot a few pairs of variables. Many more pairs could be analyzed, and some of them would probably be very relevant.
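The bivariate part could be automated in the same spirit as the univariate loop. The sketch below is only an assumption about how it might look: `dd` and its columns are made-up stand-ins, `plot_pair` is a hypothetical helper, and the plot chosen for each type combination is our suggestion. Every plot gets a title and both axis labels.

```r
# Hypothetical stand-in data; replace with the project's own data frame.
dd <- data.frame(
  age    = c(21, 25, 30, 27, 24, 35),
  income = c(30, 45, 52, 41, 38, 60),
  gender = factor(c("F", "M", "F", "M", "F", "M")),
  match  = factor(c("yes", "no", "no", "yes", "no", "yes"))
)

# Draw one labelled plot for a pair of variables, chosen by their types.
plot_pair <- function(data, x_name, y_name) {
  x <- data[[x_name]]
  y <- data[[y_name]]
  if (is.numeric(x) && is.numeric(y)) {
    plot(x, y, main = paste(y_name, "vs", x_name),
         xlab = x_name, ylab = y_name)
    "scatterplot"
  } else if (is.numeric(x) || is.numeric(y)) {
    # Exactly one variable is numeric: put it on the Y axis.
    num_name <- if (is.numeric(x)) x_name else y_name
    cat_name <- if (is.numeric(x)) y_name else x_name
    boxplot(data[[num_name]] ~ data[[cat_name]],
            main = paste(num_name, "by", cat_name),
            xlab = cat_name, ylab = num_name)
    "boxplot"
  } else {
    barplot(table(x, y), beside = TRUE, legend.text = TRUE,
            args.legend = list(title = x_name),
            main = paste(x_name, "by", y_name),
            xlab = y_name, ylab = "Count")
    "barplot"
  }
}

# Every unordered pair of variables is visited once.
pairs_to_plot <- combn(names(dd), 2, simplify = FALSE)
plot_types <- vapply(pairs_to_plot,
                     function(p) plot_pair(dd, p[1], p[2]),
                     character(1))
```

With many variables the number of pairs grows quickly, so in practice you would probably restrict `pairs_to_plot` to the pairs you consider relevant.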
5 - Labeling/legends of plots
For the bivariate analysis, the Y/vertical variables are not labeled. In general it would be better if all plots had a title.
6 - Descriptives before and after pre-processing
Point c) of Task 8: "c. When required, please include descriptives before and after preprocessing"
We think this should be done at least for some of the variables that were affected by the pre-processing. Karina's script mentions this in a comment inside the loop.
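A before/after comparison for one variable could look like the sketch below. The data, the median imputation, and the helpers `key_stats` and `compare_variable` are hypothetical and only stand in for whatever pre-processing was actually applied.

```r
# Hypothetical example: 'raw' has missing ages; 'clean' imputes the median.
raw <- data.frame(age = c(21, NA, 30, 27, NA, 35))
clean <- raw
clean$age[is.na(clean$age)] <- median(raw$age, na.rm = TRUE)

# A fixed set of statistics, so both rows of the table have the same columns.
key_stats <- function(x) {
  c(n_missing = sum(is.na(x)),
    mean      = mean(x, na.rm = TRUE),
    median    = median(x, na.rm = TRUE),
    sd        = sd(x, na.rm = TRUE))
}

# Side-by-side histograms plus a small summary table.
compare_variable <- function(before, after, name) {
  old_par <- par(mfrow = c(1, 2))
  on.exit(par(old_par))
  hist(before, main = paste(name, "(before)"), xlab = name)
  hist(after,  main = paste(name, "(after)"),  xlab = name)
  rbind(before = key_stats(before), after = key_stats(after))
}

comparison <- compare_variable(raw$age, clean$age, "age")
print(comparison)
```

Calling `compare_variable` inside the univariate loop, only for the variables that changed, keeps the report short.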
Conclusion: What we think you should do now
As far as the univariate analysis is concerned, you could simply copy Karina's script and modify a few things, such as adding a table with the summary. You can configure it to output LaTeX. You could also easily add the plots and the comparison with the pre-processed version, at least for some of the variables. This should take only a few minutes, since this part can be largely automated.
Where we think you should focus your efforts: the bivariate analysis. There are many more pairs of variables that would be worth analyzing, and perhaps you could include more than one plot per pair.