ctrosset's Issues
Collect list of followers for the 130 companies
Dedicated server problem
I have some issues with my dedicated server.
You said you could provide one for me. Is that still possible?
Thank you.
Collect sample of 1000 followers for each company
Print out top weights for each label
- What are the highest-weighted accounts for the male/female labels? See clf.coef_.
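One way to list the top weights, sketched under assumptions: the coefficient matrix has one row per label and one column per followed account (as clf.coef_.tolist() would for a two-output linear model), and the account names and weights below are made up for illustration.

```python
def top_weights(coef, feature_names, label_names, k=10):
    """Return {label: [(feature, weight), ...]} with the k largest weights per label."""
    result = {}
    for label, row in zip(label_names, coef):
        # Pair each account with its weight and sort from largest to smallest.
        ranked = sorted(zip(feature_names, row), key=lambda pair: pair[1], reverse=True)
        result[label] = ranked[:k]
    return result


# Hypothetical values standing in for clf.coef_.tolist() and the real account names.
coef = [[0.04, -0.10, 0.02],
        [-0.04, 0.10, -0.02]]
accounts = ["account_a", "account_b", "account_c"]

top = top_weights(coef, accounts, ["male", "female"], k=2)
for label, pairs in top.items():
    for name, weight in pairs:
        print(f"{label}\t{name}\t{weight:+.4f}")
```

Sorting by signed weight shows the accounts that push a prediction up; sorting by absolute value instead would also surface strongly negative accounts.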
Remove sensitive data from github
- See here
Read over linear regression background
Print a scatter plot of truth versus predicted % male
Matplotlib is good for this.
Compare against a baseline
- Average of y values
Hopefully, we're doing better than this.
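A minimal sketch of the baseline comparison, assuming a single target column (% male). The truth and prediction values are hypothetical; the baseline predicts the average of y for every company.

```python
from statistics import mean


def mse(truth, predicted):
    """Mean squared error between two equal-length sequences."""
    return mean((t - p) ** 2 for t, p in zip(truth, predicted))


def baseline_mse(truth):
    """MSE obtained by always predicting the average of the y values."""
    avg = mean(truth)
    return mse(truth, [avg] * len(truth))


y_true = [45.4, 46.7, 60.0, 38.0]  # hypothetical % male per company
y_pred = [47.0, 45.0, 57.5, 41.0]  # hypothetical model predictions

model_error = mse(y_true, y_pred)
baseline_error = baseline_mse(y_true)
print(f"model MSE = {model_error:.3f}, baseline MSE = {baseline_error:.3f}")
```

When the model is scored by cross-validation, the fairer comparison is probably to compute the baseline average on each training fold only, so that both numbers are held-out errors.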
Create feature matrix X and target matrix Y
X = [company x follower proportions]
Y = [company x demographic breakdown]
E.g.
Here's a matrix of two companies, with three follower proportions each.
X = [[0.2, 0.1, 0.7],
     [0.5, 0.4, 0.1]]
Here's a matrix of their demographic breakdown (%male, %female)
Y = [[45.4, 54.6],
     [46.7, 53.3]]
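A small sanity check for the two matrices, as a sketch. It assumes plain nested lists and that each Y row is a complete percentage breakdown summing to 100, as in the example; check_matrices is a hypothetical helper name.

```python
def check_matrices(X, Y, tol=1e-6):
    """Validate X and Y and return (n_companies, n_features, n_targets)."""
    if len(X) != len(Y):
        raise ValueError("X and Y must have one row per company")
    n_features, n_targets = len(X[0]), len(Y[0])
    for x_row, y_row in zip(X, Y):
        if len(x_row) != n_features or len(y_row) != n_targets:
            raise ValueError("rows must all have the same length")
        # Assumption: each Y row is a full percentage breakdown (sums to 100).
        if abs(sum(y_row) - 100.0) > tol:
            raise ValueError("demographic percentages should sum to 100")
    return len(X), n_features, n_targets


X = [[0.2, 0.1, 0.7],
     [0.5, 0.4, 0.1]]
Y = [[45.4, 54.6],
     [46.7, 53.3]]

shape = check_matrices(X, Y)
print(shape)
```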
Error analysis
- Sort all predictions by error (truth - predicted)
- For top 10 errors:
  - report the company name
  - report top 5 features weighted by (abs(feature value * model coefficient))
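The error analysis could look roughly like this. It is a sketch that assumes a single output variable with one coefficient per feature, and it ranks companies by the absolute value of (truth - predicted); the company names, features, and numbers are hypothetical.

```python
def error_analysis(names, X, y_true, y_pred, coef, feature_names,
                   n_errors=10, n_features=5):
    """Return the worst predictions with the features that contributed most."""
    rows = []
    for i, name in enumerate(names):
        error = y_true[i] - y_pred[i]
        # Rank features by abs(feature value * model coefficient) for this company.
        ranked = sorted(zip(X[i], coef, feature_names),
                        key=lambda t: abs(t[0] * t[1]), reverse=True)
        rows.append((name, error, [feature for _, _, feature in ranked[:n_features]]))
    # Assumption: "top errors" means largest absolute error.
    rows.sort(key=lambda row: abs(row[1]), reverse=True)
    return rows[:n_errors]


names = ["A", "B", "C"]                     # hypothetical company names
X = [[0.2, 0.1, 0.7],
     [0.5, 0.4, 0.1],
     [0.3, 0.3, 0.4]]
coef = [10.0, -20.0, 5.0]                   # hypothetical model coefficients
features = ["f1", "f2", "f3"]
y_true = [45.4, 46.7, 50.0]
y_pred = [46.0, 40.0, 51.0]

report = error_analysis(names, X, y_true, y_pred, coef, features, n_features=2)
for name, error, top_features in report:
    print(f"{name}\terror={error:+.2f}\t{', '.join(top_features)}")
```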
Compute cross-validation mean-squared error for gender prediction
Try different values for alpha
.1, .3, .5, .7, 1
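A self-contained NumPy sketch of cross-validated MSE over these alpha values. It uses closed-form ridge regression with an unpenalized intercept and a hand-rolled k-fold split; the data are synthetic stand-ins, and the fold count and seeds are arbitrary choices. scikit-learn's Ridge together with cross_val_score would be a natural alternative if that is what the project already uses.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression; the intercept is handled by centering."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def cv_mse(X, y, alpha, n_folds=5, seed=0):
    """Average held-out mean squared error over n_folds shuffled folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        w, b = ridge_fit(X[train_idx], y[train_idx], alpha)
        pred = X[test_idx] @ w + b
        errors.append(np.mean((pred - y[test_idx]) ** 2))
    return float(np.mean(errors))


# Synthetic stand-in data: 130 companies, 20 features, one target column.
rng = np.random.default_rng(42)
X = rng.random((130, 20))
y = X @ rng.normal(size=20) + rng.normal(scale=0.1, size=130)

scores = {alpha: cv_mse(X, y, alpha) for alpha in (0.1, 0.3, 0.5, 0.7, 1.0)}
best_alpha = min(scores, key=scores.get)
for alpha, score in scores.items():
    print(f"alpha={alpha}: CV MSE={score:.4f}")
print("best alpha:", best_alpha)
```

Using the same seed for every alpha keeps the folds identical, so differences in the scores come from alpha rather than from the split.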
Add documentation to README.md
- Please keep track of each stage of processing and document it in README.md.
That is, enumerate step-by-step how to go from the raw data to the output of your analysis.
Filter X matrix by company frequency
I.e., remove columns that are non-zero for fewer than 3 rows.
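A sketch of the column filter, assuming X is a dense list of rows (one per company). The example matrix is made up, and filter_columns is a hypothetical helper name.

```python
def filter_columns(X, min_rows=3):
    """Drop columns that are non-zero in fewer than min_rows rows.

    Returns the filtered matrix and the indices of the columns that were kept.
    """
    n_cols = len(X[0])
    keep = [j for j in range(n_cols)
            if sum(1 for row in X if row[j] != 0) >= min_rows]
    return [[row[j] for j in keep] for row in X], keep


X = [[0.2, 0.0, 0.8],
     [0.5, 0.5, 0.0],
     [0.1, 0.0, 0.9],
     [0.3, 0.0, 0.7]]

X_filtered, kept = filter_columns(X, min_rows=3)
print(kept)
print(X_filtered)
```

For a dense NumPy array the same filter can be written as X[:, (X != 0).sum(axis=0) >= 3]. Keeping the list of retained column indices matters later, since feature names need to be filtered the same way.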
Collect list of people each follower follows
E.g., person X follows company Y. Collect all Z's that X follows.
Plot MSE and number of columns as threshold decreases
Try thresholds of 50, 25, 10, 5, 3 and compute cross-validation MSE and number of remaining columns for each.
We expect a threshold of 50 to leave far fewer columns (~10,000?).
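The column-count half of this sweep can be sketched as follows, on a synthetic matrix built so that the counts are easy to verify. The cross-validation MSE for each threshold would be computed on the filtered matrix with whatever model is being evaluated; that part, and the plotting itself, are left out here.

```python
def columns_remaining(X, threshold):
    """Count columns that are non-zero in at least `threshold` rows."""
    return sum(1 for j in range(len(X[0]))
               if sum(1 for row in X if row[j] != 0) >= threshold)


def sweep(X, thresholds=(50, 25, 10, 5, 3)):
    """Return [(threshold, number of remaining columns), ...]."""
    return [(t, columns_remaining(X, t)) for t in thresholds]


# Synthetic matrix: 60 rows; column j is non-zero in its first nonzero_rows[j] rows.
nonzero_rows = [60, 30, 12, 6, 4, 2]
X = [[1.0 if i < n else 0.0 for n in nonzero_rows] for i in range(60)]

counts = sweep(X)
for threshold, n_columns in counts:
    print(f"threshold={threshold}: {n_columns} columns remain")
```

The resulting pairs can be passed to Matplotlib directly, with threshold on the x-axis and one line each for MSE and column count.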
Repeat experiments with Lasso
Try predicting other variables
- income
- age
Write research report
- Introduction
  - what is the problem
  - why is it important
  - roughly what is our solution
  - how have previous people done this?
- Data collection
  - what data did we collect and how
  - rough statistics about the data
- Methods
  - how did we compute X matrix
  - what models did we use (ridge regression)
  - how did we pick best alpha
- Results
  - cross-validation MSE for each task
  - top features for each output variable (e.g., male, female, old, young) (~10)
- Conclusions
  - how well did this work
  - what are next steps
Linear regression computation issue
I get the following error when running my loadMatrices.py script.
('Coefficients: \n', array([[ 0. , 0.03644562, 0.09987471, ..., 0.00901544,
0.02207366, -0.0079762 ],
[ 0. , -0.03586746, -0.09994756, ..., -0.00893835,
-0.02191355, 0.00792444]]))
Traceback (most recent call last):
File "loadMatrices.py", line 88, in <module>
% np.mean((regr.predict(X) - Y) ** 2))
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/matrixlib/defmatrix.py", line 343, in __pow__
return matrix_power(self, other)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/matrixlib/defmatrix.py", line 160, in matrix_power
raise ValueError("input must be a square array")
ValueError: input must be a square array
I don't know what is causing it.
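The traceback goes through numpy/matrixlib/defmatrix.py, which suggests that regr.predict(X) - Y is a np.matrix rather than a plain array. For np.matrix, ** 2 means matrix power (the matrix multiplied by itself), which only works for square matrices; it is not an element-wise square. A likely fix, sketched below with made-up numbers, is to convert both operands to ordinary arrays before squaring. Which of X or Y is the matrix in loadMatrices.py is a guess; the helper name is hypothetical.

```python
import numpy as np


def mean_squared_error(predicted, truth):
    """Element-wise MSE that also works when an input is a np.matrix."""
    # np.asarray turns np.matrix into a plain ndarray, so ** squares each entry.
    diff = np.asarray(predicted, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.mean(diff ** 2))


# Hypothetical 3 x 2 target stored as np.matrix (not square, so Y ** 2 would fail).
Y = np.matrix([[45.4, 54.6],
               [46.7, 53.3],
               [50.0, 50.0]])
pred = np.array([[45.0, 55.0],
                 [47.0, 53.0],
                 [49.0, 51.0]])

result = mean_squared_error(pred, Y)
print("Mean squared error: %.4f" % result)
```

In the script itself this would amount to wrapping the difference, for example np.mean(np.asarray(regr.predict(X) - Y) ** 2), or converting X and Y to arrays once right after loading them.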