
Hanhan_Data_Science_Resources2

More data science resources. Since my original Data Science Resources repo can no longer be updated, I created this new one to hold more resources.


SUMMARIZED RESOURCES


TREE BASED MODELS & ENSEMBLING


DATA PREPROCESSING

  • For more data preprocessing, check DATA PREPROCESSING section: https://github.com/hanhanwu/Hanhan_Data_Science_Resources

  • Entity Resolution

  • Dimension Reduction

    • t-SNE, non-linear dimensional reduction
      • Reference (a pretty good one!): https://www.analyticsvidhya.com/blog/2017/01/t-sne-implementation-r-python/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+AnalyticsVidhya+%28Analytics+Vidhya%29
      • (t-SNE) t-Distributed Stochastic Neighbor Embedding is a non-linear dimensionality reduction algorithm used for exploring high-dimensional data; it considers nearest neighbours when reducing the data. It is a non-parametric mapping.
      • The problem with linear dimensionality reduction is that it concentrates on placing dissimilar data points far apart in the lower-dimensional representation. However, it is also important to place similar data points close together, which linear dimensionality reduction does not do.
      • In t-SNE, there are local approaches and global approaches. Local approaches seek to map nearby points on the manifold to nearby points in the low-dimensional representation. Global approaches, on the other hand, attempt to preserve geometry at all scales, i.e. mapping nearby points to nearby points and far away points to far away points.
      • It is important to know that most of the nonlinear techniques other than t-SNE are not capable of retaining both the local and global structure of the data at the same time.
      • The algorithm computes pairwise conditional probabilities and tries to minimize the sum of the differences of the probabilities in the higher and lower dimensions. This involves a lot of calculation, so the algorithm is quite heavy on system resources: t-SNE has quadratic O(n²) time and space complexity in the number of data points, which makes it particularly slow and resource-draining when applied to data sets comprising more than 10,000 observations. Another drawback is that it doesn't always produce similar output on successive runs (a usage sketch follows the practice-code links below).
      • How it works: it groups similar data records together, but it is not clustering, because once the data has been mapped to the lower dimension, the original features are no longer recognizable.
      • NOTE: t-SNE can also place semantically similar words close to each other, which can help with text summarization and text comparison.
      • R practice code: https://github.com/hanhanwu/Hanhan_Data_Science_Practice/blob/master/t-SNE_practice.R
      • Python practice code: https://github.com/hanhanwu/Hanhan_Data_Science_Practice/blob/master/t-SNE_Practice.ipynb
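A minimal t-SNE usage sketch in Python with scikit-learn; the digits dataset and the parameter values here are illustrative assumptions, not taken from the practice code above.

```python
# Minimal t-SNE sketch (assumes scikit-learn is installed); values are illustrative.
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)

# Perplexity balances local vs. global structure; the O(n^2) cost noted above
# makes t-SNE slow on datasets with more than ~10,000 observations.
X_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(X_2d.shape)  # (1797, 2) -- the original features are no longer recognizable
```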
    • For more dimensionality reduction, check the "DATA PREPROCESSING" section
    • Some dimensionality reduction methods
    • A little more about Factor Analysis
      • Factor Analysis is a variable reduction technique. It is used to determine the factor structure or model, and it explains the maximum amount of variance in the model
      • EFA (Exploratory Factor Analysis) – Identifies and summarizes the underlying correlation structure in a data set
      • CFA (Confirmatory Factor Analysis) – Attempts to confirm a hypothesis using the correlation structure and rates 'goodness of fit'.
    • Dimension Reduction Must Know
      • Reference: https://www.analyticsvidhya.com/blog/2017/03/questions-dimensionality-reduction-data-scientist/?utm_content=bufferc792d&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer
      • Besides using algorithms that reduce the number of features, we can also combine existing features into fewer features as a dimensionality reduction method. For example, given features A, B, C, D, we can form E = 2A + B and F = 3C - D, then keep only E and F for analysis
      • The cost function of SNE is asymmetric in nature, which makes it difficult to converge using gradient descent. A symmetric cost function is one of the major differences between SNE and t-SNE.
      • For a perfect representation of the higher dimension in the lower dimension, the conditional probabilities for the similarity of two points must remain unchanged in both the higher and lower dimension, which means the similarity is unchanged
      • LDA aims to maximize the between-class distance and minimize the within-class distance. If the discriminatory information is not in the mean but in the variance of the data, LDA will fail.
      • Both LDA and PCA are linear transformation techniques. LDA is supervised whereas PCA is unsupervised. PCA maximize the variance of the data, whereas LDA maximize the separation between different classes.
      • When eigenvalues are roughly equal, PCA will perform badly: all principal components become roughly equal, so you won't be able to choose among them. When using PCA, it is better to scale the data to the same unit
      • When using PCA, features lose interpretability, and they may not carry all the information in the data. You don't need to initialize parameters in PCA, and PCA can't be trapped in a local-minima problem. PCA is a deterministic algorithm which has no parameters to initialize. PCA can be used for lossy image compression, and it is not invariant to shadows.
      • A deterministic algorithm has no param to initialize, and it gives the same result if we run again.
      • Logistic Regression vs LDA: if the classes are well separated, the parameter estimates for logistic regression can be unstable. If the sample size is small and the distribution of features is normal for each class, linear discriminant analysis (LDA) is more stable than logistic regression (a PCA-vs-LDA sketch follows this list).
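To make the PCA-vs-LDA contrast above concrete, here is a minimal scikit-learn sketch; the iris dataset and n_components = 2 are illustrative assumptions.

```python
# PCA is unsupervised (maximizes variance); LDA is supervised (maximizes class separation).
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)  # scale to the same unit before PCA, as noted above

X_pca = PCA(n_components=2).fit_transform(X)                            # ignores the labels y
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)  # uses the labels y
print(X_pca.shape, X_lda.shape)
```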

MODEL EVALUATION

  • 7 important model evaluation metrics and cross validation: https://www.analyticsvidhya.com/blog/2016/02/7-important-model-evaluation-error-metrics/

    • Confusion Matrix
    • Lift / Gain charts are widely used in campaign targeting problems. They tell us up to which decile we can target customers for a specific campaign, and how much response to expect from the new target base.
    • The Kolmogorov-Smirnov (K-S) chart is a measure of the degree of separation between the positive and negative distributions. The K-S is 100 if the scores partition the population perfectly; the higher the value, the better the model is at separating the positive from negative cases.
    • The ROC curve is the plot of sensitivity against (1 - specificity). (1 - specificity) is also known as the false positive rate, and sensitivity is also known as the true positive rate. To bring the ROC curve down to a single number we use AUC, the ratio of the area under the curve to the total area. .90-1 = excellent (A); .80-.90 = good (B); .70-.80 = fair (C); .60-.70 = poor (D); .50-.60 = fail (F). But a very high value might simply be over-fitting; in such cases it becomes very important to do in-time and out-of-time validations. A model which outputs a class label is represented as a single point in the ROC plot. For a probabilistic model we are fortunate enough to get a single number, AUC-ROC, but we still need to look at the entire curve to make conclusive decisions.
    • Lift is dependent on total response rate of the population. ROC curve on the other hand is almost independent of the response rate, because the numerator and denominator of both x and y axis will change on similar scale in case of response rate shift.
    • Gini = 2*AUC - 1. The Gini coefficient is nothing but the ratio between the area between the ROC curve and the diagonal line, and the area of the upper triangle
    • A concordant pair is one where the predicted probability for the responder is higher than for the non-responder, whereas a discordant pair is the reverse. A concordant ratio of more than 60% is considered a good model. It is primarily used to assess the model's predictive power; decisions like how many to target are again taken from K-S / Lift charts.
    • RMSE: the 'square root' lets this metric show large deviations, and its 'squared' nature helps deliver more robust results by preventing positive and negative error values from cancelling out. When we have more samples, reconstructing the error distribution using RMSE is considered more reliable. RMSE is highly affected by outlier values, so make sure you've removed outliers from your data set before using this metric. Compared to mean absolute error, RMSE gives higher weight to, and thus punishes, large errors.
    • k-fold cross validation is widely used to check whether a model is an overfit: the model is considered robust if the performance metrics across the k rounds of modelling are close to each other and their mean is high. For a small k, we have higher selection bias but low variance in the performances; for a large k, small selection bias but high variance. Generally a value of k = 10 is recommended for most purposes (see the sketch after this list).
    • Tolerance (1 / VIF) is used as an indicator of multicollinearity: it indicates the percent of variance in a predictor which cannot be accounted for by the other predictors. Large values of tolerance are desirable.
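A hedged sketch of a few of the metrics above (confusion matrix, AUC-ROC, Gini = 2*AUC - 1, 10-fold cross validation); the synthetic dataset and the logistic regression model are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

print(confusion_matrix(y, model.predict(X)))
auc = roc_auc_score(y, model.predict_proba(X)[:, 1])
print("AUC:", auc, "Gini:", 2 * auc - 1)

# If the 10 fold scores are close to each other with a high mean,
# the model is less likely to be an overfit.
print(cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10, scoring="roc_auc"))
```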
  • To measure linear regression, we could use Adjusted R² or F value.

  • To measure logistic regression:

    • AUC-ROC curve along with confusion matrix to determine its performance.
    • The analogous metric to adjusted R² in logistic regression is AIC. AIC is a measure of fit which penalizes the model for the number of model coefficients; therefore, we always prefer the model with the minimum AIC value.
    • Null deviance indicates the response predicted by a model with nothing but an intercept; the lower the value, the better the model.
    • Residual deviance indicates the response predicted by a model after adding independent variables; the lower the value, the better the model.
  • Regularization becomes necessary when the model begins to overfit or underfit. This technique adds a cost term to the objective function for bringing in more features, so it tries to push the coefficients of many variables toward zero to reduce that cost term. This reduces model complexity so that the model becomes better at predicting (generalizing). A minimal Ridge/Lasso sketch follows.
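A minimal sketch of the shrinkage idea on synthetic data: L1 regularization (Lasso) pushes some coefficients exactly to zero, while L2 (Ridge) only shrinks them. The alpha values are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(0)
X = rng.randn(100, 10)
y = 3 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.randn(100)  # only 2 informative features

print(Lasso(alpha=0.1).fit(X, y).coef_)  # most coefficients become exactly 0
print(Ridge(alpha=1.0).fit(X, y).coef_)  # all coefficients kept, just shrunk
```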


Applied Data Science in Python/R/Java


Statistics in Data Science

  • Find All Calculators here, this one is easier to understand and better to use

  • Terminology glossary for statistics in machine learning: https://www.analyticsvidhya.com/glossary-of-common-statistics-and-machine-learning-terms/

  • Statistics behind Boruta feature selection: https://github.com/hanhanwu/Hanhan_Data_Science_Resources2/blob/master/boruta_statistics.pdf

  • How the laws of group theory provide a useful codification of the practical lessons of building efficient distributed and real-time aggregation systems (from 22:00, he started to talk about HyperLogLog and other approximation data structures): https://www.infoq.com/presentations/abstract-algebra-analytics

  • Confusing Concepts

    • Errors and Residuals: https://en.wikipedia.org/wiki/Errors_and_residuals
    • Heteroskedasticity: non-constant variance in the error terms. Usually, non-constant variance is caused by outliers or extreme values
    • Coefficient and p-value/t-statistic: the coefficient measures the strength of the relationship between 2 variables, while the p-value/t-statistic measures how strong the evidence is of a non-zero association
    • Anscombe's quartet comprises four datasets that have nearly identical simple statistical properties, yet appear very different when graphed: https://en.wikipedia.org/wiki/Anscombe's_quartet
    • Difference between gradient descent and stochastic gradient descent: https://www.quora.com/Whats-the-difference-between-gradient-descent-and-stochastic-gradient-descent
    • Correlation & Covariance: In probability theory and statistics, correlation and covariance are two similar measures for assessing how much two attributes change together. The mean values of A and B are also known as the expected values of A and B, E(A), E(B). Covariance: Cov(A,B) = E(A·B) - E(A)·E(B) (see the numeric check after this list)
    • Rate vs Proportion: A rate differs from a proportion in that the numerator and the denominator need not be of the same kind and that the numerator may exceed the denominator. For example, the rate of pressure ulcers may be expressed as the number of pressure ulcers per 1000 patient days.
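A quick numeric check of the covariance formula above; the sample values are arbitrary.

```python
import numpy as np

A = np.array([1.0, 2.0, 3.0, 4.0])
B = np.array([2.0, 4.0, 5.0, 9.0])

# Cov(A,B) = E(A*B) - E(A)*E(B)
cov_formula = (A * B).mean() - A.mean() * B.mean()
print(cov_formula, np.cov(A, B, bias=True)[0, 1])  # both give the population covariance
```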
  • Bias quantifies how much, on average, the predicted values differ from the actual values. A high bias error means we have an under-performing model which keeps missing important trends. Variance, on the other side, quantifies how different the predictions made on the same observation are from each other. A high variance model will over-fit on your training population and perform badly on any observation beyond training.

  • OLS and Maximum Likelihood are the methods the respective regression methods use to approximate the unknown parameter (coefficient) values: OLS is to linear regression as Maximum Likelihood is to logistic regression. Ordinary least squares (OLS) is a method used in linear regression which approximates the parameters so as to minimize the distance between actual and predicted values. Maximum Likelihood helps in choosing the values of the parameters which maximize the likelihood of producing the observed data.

  • Standard Deviation – the amount of variation in the population data, given by σ. Standard Error – the amount of variation in the sample data. It is related to standard deviation as σ/√n, where n is the sample size and σ is the standard deviation of the population

  • 95% confidence interval does not mean the probability of a population mean to lie in an interval is 95%. Instead, 95% C.I means that 95% of the Interval estimates will contain the population statistic.

  • If a sample mean lies in the margin of error range then, it might be possible that its actual value is equal to the population mean and the difference is occurring by chance.

  • The difference between z-scores and t-values is that t-values depend on the degrees of freedom of a sample, and t-values use the sample standard deviation while z-scores use the population standard deviation.

  • Degrees of Freedom – the number of variables that are free to take more than one arbitrary value. For example, in a sample of size 10 with mean 10, 9 values can be arbitrary but the 10th value is forced by the sample mean (see the sketch below).
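A small sketch tying the last few points together: the standard error σ/√n and a t-based 95% confidence interval (t rather than z, since the population σ is unknown). The sample values are made up.

```python
import numpy as np
from scipy import stats

sample = np.array([9.1, 10.3, 9.8, 10.6, 9.5, 10.2, 9.9, 10.4, 10.0, 9.7])
n = len(sample)
se = sample.std(ddof=1) / np.sqrt(n)  # standard error; sample std dev stands in for sigma

# Degrees of freedom = n - 1, as described above
low, high = stats.t.interval(0.95, df=n - 1, loc=sample.mean(), scale=se)
print(low, high)  # 95% of intervals built this way contain the population mean
```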

  • Residual Sum of Squares (RSS) – can be interpreted as the amount by which the predicted values deviate from the actual values. A large deviation indicates that the model failed at predicting the correct values for the dependent variable. Regression (Explained) Sum of Squares (ESS) – can be interpreted as the amount by which the predicted values deviate from the mean of the actual values.

  • Residuals are also known as prediction errors; they are the vertical distances of the points from the regression line

  • Coefficient of Determination = ESS/(ESS + RSS). It represents the strength of correlation between two variables. Correlation Coefficient = sqrt(Coefficient of Determination); it also represents the strength of correlation between two variables and ranges over [-1, 1]: 0 means no correlation, 1 means strong positive correlation, -1 means strong negative correlation (a worked sketch follows).
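A worked sketch of RSS, ESS, and the coefficient of determination for a simple least-squares fit; the data points are synthetic.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

slope, intercept = np.polyfit(x, y, 1)  # ordinary least squares fit
y_pred = slope * x + intercept

rss = np.sum((y - y_pred) ** 2)          # residual sum of squares
ess = np.sum((y_pred - y.mean()) ** 2)   # explained sum of squares
r2 = ess / (ess + rss)                   # coefficient of determination
print(r2, np.corrcoef(x, y)[0, 1] ** 2)  # the two agree for OLS with an intercept
```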

  • About Data Sampling: http://psc.dss.ucdavis.edu/sommerb/sommerdemo/sampling/types.htm

    • Probability sampling can be representative, non-probability sampling may not
    • Probability Sampling
      • Random sample (I guess R's sample() is random sampling by default, so that each observation has the same weight)
      • Stratified sample
    • Nonprobability Sampling
      • Quota sample
      • Purposive sample
      • Convenience sample
  • Comprehensive and Practical Statistics Guide for Data Science - a really good one!

    • Sample Distribution and Population Distribution, Central Limit Theorem, Confidence Interval
    • Hypothesis Testing
    • t-test calculator
    • ANOVA (Analysis of Variance), continuous and categorical variables, ANOVA also requires data from approximately normally distributed populations with equal variances between factor levels.
    • F-ratio calculator
    • Chi-square test, categorical variables
    • chi-square calculator
    • Regression and ANOVA: it is important to know the degree to which your model is successful in explaining the trend (variance) in the dependent variable. ANOVA helps find the effectiveness of regression models.
    • An example of hypothesis test with chi-square:
      • chi-square tests the hypothesis that A and B are independent, that is, there is no correlation between them. Chi-square is used to measure the correlation between categorical variables
      • In this example, you have already calculated chi-square value as 507.93
      • B feature has 2 levels, "science-fiction", "history"; A feature has 2 levels, "female", "male". So we can form a 2x2 table. The degree of freedom = (2-1)*(2-1) = 1
      • Use the calculator here to find the critical value: enter degrees of freedom 1 and probability 0.001 (you can choose a probability you'd like). The calculated critical value is 10.82756617
      • The chi-square value 507.93 is much larger than the critical value, so we reject the hypothesis that A and B are independent and uncorrelated (see the scipy check below)
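The same worked example can be checked with scipy; only the chi-square value 507.93 and the df = 1, p = 0.001 setup come from the example above.

```python
from scipy import stats

critical = stats.chi2.ppf(1 - 0.001, df=1)
print(critical)  # ~10.8276, matching the calculator value quoted above

chi_square = 507.93
print(chi_square > critical)  # True -> reject the independence of A and B
```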
  • Probability cheat sheet: http://www.cs.elte.hu/~mesti/valszam/kepletek

  • Probability basics with examples

    • binomial distribution: a binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent Bernoulli trials (each having only yes/no or true/false outcomes).
    • The normal distribution is perfectly symmetrical about the mean. The probabilities move similarly in both directions around the mean. The total area under the curve is 1, since summing up all the possible probabilities would give 1.
    • Area Under the Normal Distribution
    • Z score: the distance of the observed value from the mean, in number of standard deviations, is the standard score or Z score. Observed value = µ + zσ [µ is the mean and σ is the standard deviation] (see the sketch after this list)
    • Find Z Table here
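A small sketch of these probability basics with scipy; the parameter values (n = 10, p = 0.5, µ = 100, σ = 15) are illustrative assumptions.

```python
from scipy import stats

# Binomial: P(exactly 3 successes in 10 independent Bernoulli trials with p = 0.5)
print(stats.binom.pmf(k=3, n=10, p=0.5))

# Observed value = mu + z*sigma; the area to the left of z is the Z-table lookup
mu, sigma, z = 100, 15, 1.96
print(mu + z * sigma)     # 129.4
print(stats.norm.cdf(z))  # ~0.975, the area under the normal curve left of z
```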
  • Very Basic Conditional Probability and Bayes Theorem

    • Independent, Exclusive, Exhaustive events
    • Each time, when it's something about statistics or probability, I will still read all the content to guarantee that I won't miss anything useful. This one is basic, but I like the way it starts from simple concepts, uses real-life examples, and finally leads to how Bayes Theorem works. Although, there is an error in the formula P(no cancer and +) = P(no cancer) * P(+) = 0.99852*0.99; it should be 0.99852*0.01
    • There are some major formulas here that are important to Bayes Theorem:
      • P(A|B) = P(A AND B)/P(B)
      • P(A|B) = P(B|A)*P(A)/P(B)
      • P(A AND B) = P(B|A)*P(A) = P(A|B)*P(B)
      • P(A|b1)*P(b1) + P(A|b2)*P(b2) + .... + P(A|bn)*P(bn) = P(A), when b1..bn partition the sample space (the law of total probability, used in the sketch below)
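A sketch applying these formulas to the cancer-test numbers above; only P(no cancer) = 0.99852 and the 0.01 false positive rate come from the text, while the 0.8 sensitivity is a made-up value for illustration.

```python
p_cancer = 1 - 0.99852        # P(cancer), implied by P(no cancer) = 0.99852
p_pos_given_cancer = 0.8      # assumed sensitivity -- hypothetical value
p_pos_given_no_cancer = 0.01  # false positive rate, per the corrected formula above

# Law of total probability: P(+) = P(+|cancer)P(cancer) + P(+|no cancer)P(no cancer)
p_pos = p_pos_given_cancer * p_cancer + p_pos_given_no_cancer * (1 - p_cancer)

# Bayes Theorem: P(cancer|+) = P(+|cancer) * P(cancer) / P(+)
print(p_pos_given_cancer * p_cancer / p_pos)  # ~0.106, small despite a positive test
```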
  • Dispersion - In statistics, dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed. Common examples of measures of statistical dispersion are the variance, standard deviation, and interquartile range.

  • Common Formulas

  • Linear regression line attempts to minimize the squared distance between the points and the regression line. By definition the ordinary least squares (OLS) regression tries to have the minimum sum of squared errors. This means that the sum of squared residuals should be minimized. This may or may not be achieved by passing through the maximum points in the data. The most common case of not passing through all points and reducing the error is when the data has a lot of outliers or is not very strongly linear.

  • Pearson vs Spearman: Pearson correlation evaluates the linear relationship between two continuous variables. A relationship is linear when a change in one variable is associated with a proportional change in the other variable. Spearman evaluates a monotonic relationship: one where the variables change together, but not necessarily at a constant rate (see the sketch below).
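A minimal sketch of the contrast: on a monotonic but non-linear relationship, Spearman is 1 while Pearson is below 1. The data is illustrative.

```python
import numpy as np
from scipy import stats

x = np.arange(1, 11, dtype=float)
y = x ** 3  # monotonic, but not linear in x

print(stats.pearsonr(x, y)[0])   # < 1: the relationship is not linear
print(stats.spearmanr(x, y)[0])  # 1.0: the relationship is perfectly monotonic
```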

  • Linear Algebra with Python calculations


Machine Learning Algorithms

  • KNN with R example: https://www.analyticsvidhya.com/blog/2015/08/learning-concept-knn-algorithms-programming/

    • KNN is unbiased, makes no prior assumptions about the data, and is fast
    • It needs good data preprocessing, such as missing data imputation and converting categorical features to numerical
    • k is normally chosen as the square root of the total number of observations (see the sketch after this list)
    • It is also known as lazy learner because it involves minimal training of model. Hence, it doesn’t use training data to make generalization on unseen data set.
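A minimal KNN sketch following these notes (normalize first, choose k near the square root of n); the dataset and train/test split are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler().fit(X_train)  # KNN is distance-based, so normalize
k = int(np.sqrt(len(X_train)))          # rule of thumb: k ~ sqrt(number of observations)
knn = KNeighborsClassifier(n_neighbors=k).fit(scaler.transform(X_train), y_train)
print(knn.score(scaler.transform(X_test), y_test))
```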
  • SVM with Python example: https://www.analyticsvidhya.com/blog/2015/10/understaing-support-vector-machine-example-code/?utm_content=buffer02b8d&utm_medium=social&utm_source=facebook.com&utm_campaign=buffer

  • Basic Essentials of Some Popular Machine Learning Algorithms with R & Python Examples: https://www.analyticsvidhya.com/blog/2015/08/common-machine-learning-algorithms/?utm_content=buffer00918&utm_medium=social&utm_source=facebook.com&utm_campaign=buffer

    • Linear Regression: Y = aX + b, where a is the slope and b is the intercept. The intercept term shows the model prediction without any independent variable. When there is only 1 independent variable it is Simple Linear Regression; when there are multiple independent variables it is Multiple Linear Regression. For Multiple Linear Regression, we can fit a Polynomial Curvilinear Regression.
    • When to use Ridge or Lasso: in the presence of few variables with medium / large sized effects, use lasso regression; in the presence of many variables with small / medium sized effects, use ridge regression. Lasso regression does both variable selection and parameter shrinkage, whereas Ridge regression only does parameter shrinkage and ends up including all the coefficients in the model. In the presence of correlated variables, ridge regression might be the preferred choice. Also, ridge regression works best in situations where the least square estimates have higher variance.
    • Logistic Regression: it is classification, predicting the probability of discrete values. It chooses parameters that maximize the likelihood of observing the sample values, rather than minimizing the sum of squared errors (as in ordinary regression).
    • Decision Tree: serves both categorical and numerical data. It splits on the most significant variable each time to make the groups as distinct as possible, using techniques like Gini, Information Gain (1 - entropy), and Chi-square. A decision tree algorithm is known to work best at detecting non-linear interactions; the reason a decision tree can fail to provide robust predictions is that it can't map a linear relationship as well as a regression model does.
    • SVM: separates groups with a line and maximizes the margin distance. Good for small datasets, especially those with a large number of features
    • Naive Bayes: assumes equal importance of, and independence between, predictors. Very simple and good for large datasets, and majorly used in text classification and multi-class classification. Likelihood is the probability of classifying a given observation as 1 in the presence of some other variable; for example, the probability that the word 'FREE' is used in a previous spam message is a likelihood. Marginal likelihood is the probability that the word 'FREE' is used in any message.
    • KNN: can be used for both classification and regression. Computationally expensive, since it stores all the cases. Variables should be normalized, else higher-range variables can bias it. Do data preprocessing before using KNN, such as dealing with outliers, missing data, and noise
    • K-Means
    • Random Forest: bagging, which means that if the number of cases in the training set is N, a sample of N cases is taken at random but with replacement; this sample is the training set for growing a tree. If there are M input variables, a number m << M is specified such that at each node, m variables are selected at random out of the M and the best split on these m is used to split the node. The value of m is held constant while the forest grows. Each tree is grown to the largest extent possible; there is no pruning. Random Forest has to go with cross validation, otherwise overfitting could happen.
    • PCA: dimensionality reduction; it selects fewer components (than features) which can explain the maximum variance in the data set, using Rotation. Personally, I like Boruta Feature Selection, and Filter Methods for feature selection are my second choice. Remove highly correlated variables before using PCA
    • GBM (try C50, XgBoost at the same time in practice)
    • Difference between Random Forest and GBM: Random Forest is bagging while GBM is boosting. In the bagging technique, a data set is divided into n samples using randomized sampling; then, using a single learning algorithm, models are built on these samples, and the resultant predictions are combined using voting or averaging. Bagging is done in parallel. In boosting, after the first round of predictions, the algorithm weighs misclassified predictions higher so that they can be corrected in the succeeding round; this sequential process of giving higher weights to misclassified predictions continues until a stopping criterion is reached. Random Forest improves model accuracy by reducing variance (mainly); the trees grown are uncorrelated to maximize the decrease in variance. GBM, on the other hand, improves accuracy by reducing both bias and variance in a model (see the sketch after this list).
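A sketch contrasting the two ensembles with scikit-learn; the synthetic dataset and hyper-parameters are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0)       # bagging: mainly variance reduction
gbm = GradientBoostingClassifier(n_estimators=200, random_state=0)  # boosting: reduces bias and variance

# Cross validation guards against the overfitting noted above for Random Forest
print(cross_val_score(rf, X, y, cv=5).mean())
print(cross_val_score(gbm, X, y, cv=5).mean())
```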
  • Online Learning vs Batch Learning: https://www.analyticsvidhya.com/blog/2015/01/introduction-online-machine-learning-simplified-2/

  • Optimization - Genetic Algorithm

  • Survey of Optimization

  • Optimization - Gradient Descent

    • Reference: https://www.analyticsvidhya.com/blog/2017/03/introduction-to-gradient-descent-algorithm-along-its-variants/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+AnalyticsVidhya+%28Analytics+Vidhya%29
    • Challenges for gradient descent
      • data challenge: it cannot be used on non-convex optimization problems; it may end up at a local optimum instead of the global optimum; a point where the gradient is 0 may not even be an optimum (saddle point)
      • gradient challenge: when the gradient is too small or too large, vanishing or exploding gradients can happen
      • implementation challenge: memory, hardware/software limitations
    • Type 1 - Vanilla Gradient Descent
      • "Vanilla" means pure here
      • update = learning_rate * gradient_of_parameters
      • parameters = parameters - update
    • Type 2 - Gradient Descent with Momentum
      • update = learning_rate * gradient
      • velocity = previous_update * momentum
      • parameter = parameter + velocity - update
      • With velocity, it considers the previous update
    • Type 3 - ADAGRAD
      • ADAGRAD uses an adaptive technique for updating the learning rate.
      • grad_component = previous_grad_component + (gradient * gradient)
      • rate_change = square_root(grad_component) + epsilon
      • adapted_learning_rate = learning_rate / rate_change
      • update = adapted_learning_rate * gradient
      • parameter = parameter - update
      • epsilon is a small constant that keeps the denominator from becoming zero; dividing by the accumulated gradient magnitude means frequently updated parameters take smaller steps
    • Type 4 - ADAM
      • ADAM is one more adaptive technique which builds on ADAGRAD and further reduces its downside. You can roughly consider it as momentum + ADAGRAD: it keeps exponential moving averages of the gradient (first moment) and of the squared gradient (second moment)
      • first_moment = (beta1 * previous_first_moment) + ((1 - beta1) * gradient)
      • second_moment = (beta2 * previous_second_moment) + ((1 - beta2) * (gradient * gradient))
      • update = learning_rate * first_moment / (square_root(second_moment) + epsilon)
      • parameter = parameter - update
      • beta1 and beta2 are the decay rates of the two moving averages; the standard algorithm also bias-corrects both moments before the update
    • Tips for choose models
      • For rapid prototyping, use adaptive techniques like ADAM/ADAGRAD. These help in getting quicker results with much less effort, as you don't require much hyper-parameter tuning.
      • To get the best results, you should use vanilla gradient descent or momentum. Gradient descent is slower to reach the desired results, but those results are mostly better than those of adaptive techniques.
      • If your data is small and can fit in a single iteration, you can use 2nd order techniques like L-BFGS. 2nd order techniques are extremely fast and accurate, but only feasible when the data is small enough (a runnable sketch of the first two update rules follows)
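A runnable sketch of the first two update rules above, minimizing f(x) = (x - 3)²; the learning rate and momentum values are illustrative, and the momentum form is one common formulation rather than the article's exact pseudocode.

```python
def gradient(x):
    return 2.0 * (x - 3.0)  # f'(x) for f(x) = (x - 3)^2

# Type 1 - vanilla gradient descent: parameters = parameters - update
x, learning_rate = 0.0, 0.1
for _ in range(100):
    x = x - learning_rate * gradient(x)
print(x)  # -> ~3.0

# Type 2 - gradient descent with momentum: velocity remembers previous updates
x, velocity, momentum = 0.0, 0.0, 0.9
for _ in range(100):
    velocity = momentum * velocity - learning_rate * gradient(x)
    x = x + velocity
print(x)  # -> ~3.0
```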

Data Visualization


Big Data


Cloud


TEXT ANALYSIS


Non-Machine Learning Data Analysis Examples


AI


Experiences/Suggestions from Others


Data Science Skillset Tests


Interview Tips


TRAIN YOUR BRAIN


OTHER
