pachadotdev / economiccomplexity



A wrapper of different indices and networks commonly used in Economic Complexity

Home Page: https://pacha.dev/economiccomplexity/

License: GNU General Public License v3.0

R 2.28% TeX 0.19% HTML 97.53%

r matrix eigenvalues eigenvectors recursive-algorithm networks graphs international-trade economic-complexity

economiccomplexity's Introduction

economiccomplexity

A wrapper of different methods from Linear Algebra for the equations introduced in The Atlas of Economic Complexity and related literature. This package provides standard matrix and graph output that can be used seamlessly with other packages. See doi:10.21105/joss.01866 for a summary of these methods and their evolution in the literature.

The references for this work are Mariani, et al. (2015) doi:10.1140/epjb/e2015-60298-7, Hausmann, et al. (2014) doi:10.7551/mitpress/9647.001.0001, and Hausmann, et al. (2005) doi:10.3386/w11905.

Installation

# Install stable version from CRAN
install.packages("economiccomplexity")

# Install development version from GitHub
devtools::install_github("pachadotdev/economiccomplexity")

Community Guidelines

If you want to contribute to the software or report issues or problems with it, please fork the repo and send me a pull request, or open an issue. I’m happy to receive ideas, and I will do my best to coordinate efforts and improve this package without reinventing the wheel.

If you seek support or have questions, you can start a thread in the issues section or email me, but I prefer open issues because other users probably have the same questions as you.

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

economiccomplexity's People


diegokoz anhnguyendepocen math-avila amrofi vishalbelsare altaq3 leonardo-dias-up lucaasteixeira zauster sanguovbobo econmaett

economiccomplexity's Issues

tense is tricky

As a native English speaker, I struggle with tense, so take my suggestions with a grain of salt. However, I think this would flow better:

The application widespread of economic complexity shall be looked with a bit of caution.

if it read:

The application widespread of economic complexity should be considered with caution.

balassa_index: size of matrix differs from the original data

Dear community,

I have a dataset with 5018 unique product codes and 238 countries, and when using balassa_index it returns a matrix of only 221 (countries) x 4902 (products).
I have checked that there is at least one export value greater than zero for each of the 5018 products, and I have also transformed all NAs in the value column into zeros. Hence I do not understand why the command filters out some products. Do you have an idea what the problem could be?
I left all the other arguments (discrete and cutoff) at their defaults.
Thank you.
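One way to narrow this down is to build the country x product matrix independently and compare its dimensions and dimnames with the output of balassa_index(). The sketch below is plain base R, not the package's implementation; the function name rca_matrix and its default column names are illustrative assumptions. It applies no filtering of its own, so zero-total rows or columns show up as NaN instead of disappearing.

```r
# Hypothetical helper, not part of economiccomplexity:
# Balassa index computed in base R from long-format trade data.
rca_matrix <- function(df, country = "country", product = "product", value = "value") {
  # Aggregate to a country x product matrix; missing combinations become NA
  x <- tapply(df[[value]], list(df[[country]], df[[product]]), sum)
  x[is.na(x)] <- 0

  # Share of each product in the country's total exports: x_cp / sum_p x_cp
  country_share <- x / rowSums(x)
  # Share of each product in world exports: sum_c x_cp / sum_cp x_cp
  world_share <- colSums(x) / sum(x)

  # RCA_cp = country_share / world_share (NaN where a row or column sums to zero)
  sweep(country_share, 2, world_share, "/")
}
```

Assuming the default column names, setdiff(unique(df$product), colnames(balassa_index(df))) should then list the product codes missing from the package output, which can be inspected one by one.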

Calculate distance values based on Proximity-output from this package

Dear all,
in addition to the metrics provided by this package (ECI, PCI, outlook gain and outlook index), I would like to calculate the distance value for each product-year-country instance. Those familiar with economic complexity know the definition:
Distance (value between 0 and 1, for product p to country c): the sum of the proximities connecting a new good p to all the products that country c is not currently exporting. We normalize distance by dividing it by the sum of proximities between all products and product p. In other words, distance is the weighted proportion of products connected to good p that country c is not exporting.

I am able to transform the product proximity matrix returned by the package into a 3-column data frame (HS code p1, HS code p2, proximity), and I have also created a 0/1 dummy variable based on whether the country has an RCA in each product. But I am now struggling to write a command/function to calculate the distance.
I attached my data again (it's for the year 1995, but the logic is the same) along with the code I started for this calculation:
How to calculate distance value.zip
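The definition quoted above can be written as distance_cp = sum_p' (1 - M_cp') phi_pp' / sum_p' phi_pp', where M is the binary RCA matrix and phi the product proximity matrix. A base R sketch, assuming M is a country x product 0/1 matrix and phi the square product x product proximity matrix with products in the same order as the columns of M; the name distance_matrix is illustrative, not a package function.

```r
# Hypothetical helper: distance of each country to each product.
# M: country x product 0/1 matrix (1 = country has RCA in the product)
# phi: product x product proximity matrix, same product order as colnames(M)
distance_matrix <- function(M, phi) {
  M <- as.matrix(M)
  phi <- as.matrix(phi)
  stopifnot(ncol(M) == nrow(phi), nrow(phi) == ncol(phi))
  # W[p, p'] = phi[p, p'] / sum_p'' phi[p, p''], so each row of W sums to 1
  W <- phi / rowSums(phi)
  # distance[c, p] = sum_p' (1 - M[c, p']) * W[p, p']
  tcrossprod(1 - M, W)
}
```

Working on the proximity matrix itself avoids reshaping it into a three-column data frame; only the product order has to match. Whether the diagonal of phi enters the sums depends on what the proximity matrix contains.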

testthat results

══ testthat results  ════════════════════════════════════
OK: 193 SKIPPED: 0 WARNINGS: 0 FAILED: 1
1. Failure: complexity measures are aligned with the expected output (@test-complexity_measures.R#76)

correct prody calculation

This is the equation from The Atlas of Economic Complexity:

It also appears here https://doi.org/10.3386/w11905 on page 8, but without the "c tilde" and "c prime" notation, which I find quite unclear.

Here's a first version for PRODY in R: https://github.com/pachamaltese/economiccomplexity/blob/master/R/productivity_levels.R#L112
I used the formulation from the NBER paper (DOI link) and I wanted to triple-check that it is the same as the "c tilde" equation.
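For cross-checking on a toy matrix, the NBER formulation can be read as PRODY_p = sum_c [(x_cp / X_c) / sum_c' (x_c'p / X_c')] Y_c, where x_cp is country c's exports of product p, X_c its total exports and Y_c its GDP per capita. A minimal base R sketch of that reading; the function name and inputs are assumptions, and it follows the NBER paper, not necessarily the Atlas notation.

```r
# Hypothetical helper: PRODY following the NBER (Hausmann et al., 2005) formulation.
# X: country x product export matrix; gdp_pc: GDP per capita in the same row order.
prody <- function(X, gdp_pc) {
  X <- as.matrix(X)
  stopifnot(length(gdp_pc) == nrow(X))
  # x_cp / X_c: each country's export basket as shares of its total exports
  shares <- X / rowSums(X)
  # Normalize each product column so the weights over countries sum to 1
  weights <- sweep(shares, 2, colSums(shares), "/")
  # PRODY_p = sum_c weight_cp * Y_c
  out <- as.vector(crossprod(weights, gdp_pc))
  names(out) <- colnames(X)
  out
}
```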

Once again, thanks a lot @mpadge

Variables cannot be named "product", "value" or "country" in balassa_index

I just wanted to make you aware of an issue I recently faced with the function balassa_index:
R gives the error message "Error in `[.data.frame`(x, r, vars, drop = drop) : undefined columns selected" whenever the variables/columns used are named product, value or country. This probably arises because the three arguments the function uses to select these variables have exactly the same names.
Once you rename the variables, the function works normally. Maybe this is already stated somewhere in the help file, but I just wanted to make you aware of it.
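The workaround described above, renaming the columns and passing the new names through the country, product and value arguments, can be done in base R. The helper name rename_columns and the example names are illustrative assumptions.

```r
# Hypothetical helper: rename columns with a named vector (old name = new name)
rename_columns <- function(df, mapping) {
  # Fail early if an old name is not a column of df
  stopifnot(all(names(mapping) %in% names(df)))
  idx <- match(names(mapping), names(df))
  names(df)[idx] <- unname(mapping)
  df
}
```

For example, trade <- rename_columns(trade, c(country = "reporter", product = "hs_code", value = "trade_value")) followed by balassa_index(trade, country = "reporter", product = "hs_code", value = "trade_value").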

Balassa index for more years

Hi Mauricio,
To keep things organized, I am posting my question here: calculating the Balassa index by year.
I am using your package economiccomplexity in R and I wonder whether another parameter could be added for the year (next to country and product).
Thank you very much for your help! Cristina
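Until something like a year argument exists, one workaround is to compute the index within each year in long format. This base R sketch assumes columns named year, country, product and value (illustrative names) and returns the input with an added rca column; it is not how the package computes the index.

```r
# Hypothetical helper: Balassa index computed separately for each year.
rca_by_year <- function(df) {
  # Total exports of each country in each year
  country_total <- ave(df$value, df$year, df$country, FUN = sum)
  # World exports of each product in each year
  product_total <- ave(df$value, df$year, df$product, FUN = sum)
  # World exports of all products in each year
  world_total <- ave(df$value, df$year, FUN = sum)
  # RCA = (country's share in product) / (product's share in world trade)
  df$rca <- (df$value / country_total) / (product_total / world_total)
  df
}
```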

Trade data of chapter world_trade_avg_1998_to_2000

Hi, world_trade_avg_1998_to_2000 has no trade data for chapter 10, i.e. 1001-1008.

library(economiccomplexity)
library(tidyverse)

world_trade_avg_1998_to_2000 |> count(product) |> pull(product)

Function Description

Functions are missing a @description that explains in more detail what each function does, e.g. the formula used to compute revealed comparative advantage.
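As an example of what such a description could state, the Balassa index (revealed comparative advantage) is usually written as follows, with x_cp the exports of product p by country c; balassa_index() can then turn it into a 0/1 matrix through its discrete and cutoff arguments.

```latex
\mathrm{RCA}_{cp} = \frac{x_{cp} \big/ \sum_{p'} x_{cp'}}{\sum_{c'} x_{c'p} \big/ \sum_{c'} \sum_{p'} x_{c'p'}}
```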

Data Source

This probably stems from my lack of knowledge of econometrics. I was wondering where one might find real-world data that could be used with the package.

I am asking because I see that all the built-in datasets are sparse matrices, and, as far as I know, very few real-world data sources return data in that format.

class(world_trade_avg_1998_to_2000)
#> [1] "dgCMatrix"
#> attr(,"package")
#> [1] "Matrix"

If I am mistaken, could you please point me in the direction of such a data source? If I am right and such data is rarely available in that format, I believe the package should include:

A demo of how one might go about using it with real-world data, e.g. tradestatistics, which I believe you also built.
If turning a "usual" real-world dataset into a sparse matrix is not easy, perhaps the package should also include one or more convenience functions to do so.
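Converting a long country-product-value table, the layout most trade data sources use, into a sparse matrix takes only a few lines with the Matrix package. The helper name long_to_sparse and its default column names are assumptions for illustration, not a function of the package.

```r
library(Matrix)

# Hypothetical helper: long (country, product, value) data -> country x product dgCMatrix
long_to_sparse <- function(df, country = "country", product = "product", value = "value") {
  rows <- factor(df[[country]])
  cols <- factor(df[[product]])
  # sparseMatrix() sums the values of duplicated (i, j) pairs
  sparseMatrix(
    i = as.integer(rows),
    j = as.integer(cols),
    x = df[[value]],
    dims = c(nlevels(rows), nlevels(cols)),
    dimnames = list(levels(rows), levels(cols))
  )
}
```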

fixes for v0.1.3

networks with sparse matrix input
rca with matrix output and values column with any name

ECI Eigenvalue calculation + Fitness iteration

Hi Pacha,
congrats on this nice package, I believe it's very useful!
However, I have noticed two things in the function complexity_measures, which I also encountered in my work and wanted to bring to your attention:

When using the eigenvalues method, R can return the flipped eigenvector (this happened in my application with patent data). I believe this can simply happen mathematically (https://stats.stackexchange.com/questions/154716/pca-eigenvectors-of-opposite-sign-and-not-being-able-to-compute-eigenvectors-wi). What I do, and what PA Balland does in his package, is check the correlation between the obtained eigenvector and the result of the method of reflections (MoR), because MoR doesn't suffer from this problem. If the correlation is negative, I just multiply the eigenvector by -1. I hope you understand what I mean and that this was helpful.
When calculating the fitness, you don't need to use the (iteration - 1)th element when calculating the normalised fitness values; this is only needed with MoR :)
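The sign check in the first point can be sketched in base R. The issue compares the eigenvector with the MoR result; to keep the sketch short it uses diversity (the row sums of M) as the reference vector instead, which is an assumption, and any reference with the expected orientation can be passed to align_sign(). M is assumed to be a binary country x product matrix with no all-zero rows or columns, and eci_eigen is a hypothetical name.

```r
# Flip v if it is negatively correlated with a reference vector
align_sign <- function(v, reference) {
  if (cor(v, reference) < 0) -v else v
}

# Hypothetical helper: ECI from the eigenvector of the second-largest
# eigenvalue of M_tilde[c, c'] = sum_p M[c, p] * M[c', p] / (k_c * k_p)
eci_eigen <- function(M) {
  M <- as.matrix(M)
  kc <- rowSums(M)  # diversity
  kp <- colSums(M)  # ubiquity
  M_tilde <- sweep(M, 1, kc, "/") %*% t(sweep(M, 2, kp, "/"))
  ev <- eigen(M_tilde)
  # M_tilde is not symmetric, so order by the real part to be safe
  second <- order(Re(ev$values), decreasing = TRUE)[2]
  v <- Re(ev$vectors[, second])
  eci <- (v - mean(v)) / sd(v)
  # Eigenvectors are only defined up to sign; orient ECI so it rises with diversity
  align_sign(eci, reference = kc)
}
```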

Cheers
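For the fitness point, a minimal sketch of the fitness-complexity iteration: the un-normalized updates use the values from step n - 1, and each step's values are then normalized by their own mean, with no further reference to step n - 1. The function name and the fixed number of iterations are assumptions; M is a binary country x product matrix with no all-zero rows or columns.

```r
# Hypothetical helper: fitness-complexity iteration on a binary matrix M
fitness_complexity <- function(M, iterations = 20) {
  M <- as.matrix(M)
  fit <- rep(1, nrow(M))
  cpx <- rep(1, ncol(M))
  for (n in seq_len(iterations)) {
    # Un-normalized updates, both based on the values from step n - 1
    fit_tilde <- as.vector(M %*% cpx)                  # sum_p M_cp Q_p
    cpx_tilde <- 1 / as.vector(crossprod(M, 1 / fit))  # 1 / sum_c (M_cp / F_c)
    # Normalize by the mean of the current step only
    fit <- fit_tilde / mean(fit_tilde)
    cpx <- cpx_tilde / mean(cpx_tilde)
  }
  list(fitness = fit, complexity = cpx)
}
```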

rca

The CRAN version has a function named rca; the current development version on GitHub does not. To the best of my knowledge, rca should first be deprecated before being removed completely, giving users of the package time to adjust their code.
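A soft deprecation of that kind can be done with base R's .Deprecated(), which warns on every call but keeps old code working; .Defunct() could replace it in a later release. The stand-in balassa_index() below only exists so the sketch runs on its own and is not the package's function.

```r
# Stand-in for the new function, only so this sketch is self-contained
balassa_index <- function(data, ...) {
  data
}

# Old name kept as a thin wrapper that emits a deprecation warning
rca <- function(...) {
  .Deprecated("balassa_index")
  balassa_index(...)
}
```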

Balassa Index

Hello. My name is David Casoratti, from Argentina, and I really like your package. It saves me a lot of work time. But I have an issue with the Balassa index. By default, when I run the code, the output is a matrix of 1s. But when I use balassa_index with discrete = FALSE, there is no problem.

Code:

RCA <- balassa_index(
Dataset_From_Stata,
discrete = FALSE,
country = "location_code",
product = "hs_product_code",
value = "export_value")

(Has no problem)

MM <- balassa_index(
Dataset_From_Stata,
country = "location_code",
product = "hs_product_code",
value = "export_value")

(Has Problem)

I have to fix it with the following code:

threshold = 1
MM@x[MM@x < threshold] <- 0
MM@x[MM@x > threshold] <- 1

I don't know whether this is a problem on my end or a problem in the code, because I tried it on other devices and the problem persists. Have a nice weekend!
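A possible explanation, offered tentatively: with the default discrete = TRUE, balassa_index() returns a binary matrix based on its cutoff argument, and a sparse Matrix prints only its non-zero entries (zeros appear as dots), so a result showing only 1s may be the expected output rather than a bug. The workaround above amounts to thresholding the continuous index; in base R, assuming values equal to the cutoff count as 1 (the package's exact comparison is not confirmed here), with binarize_rca as a hypothetical name:

```r
# Hypothetical helper: 1 where the Balassa index reaches the cutoff, 0 elsewhere
binarize_rca <- function(rca, cutoff = 1) {
  # The comparison gives a logical matrix; multiplying by 1 converts it to 0/1
  (rca >= cutoff) * 1
}
```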

add ifelse statements for mixed tibble/matrix/numeric input in...

productivity_levels

Data comes grouped

Hi, world_trade_avg_1998_to_2000 comes grouped and gives an error in some operations.

Compare these two versions of the code:

library(economiccomplexity)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

data(world_trade_avg_1998_to_2000)

glimpse(world_trade_avg_1998_to_2000)
#> Rows: 124,336
#> Columns: 3
#> Groups: reporter_iso [226]
#> $ country <chr> "afg", "afg", "afg", "afg", "afg", "afg", "afg", "afg", "afg",~
#> $ product <chr> "0011", "0012", "0111", "0112", "0113", "0116", "0223", "0224"~
#> $ value   <dbl> 30068, 16366, 19273, 893, 350, 1561, 851, 12884, 114673, 67796~

world_trade_avg_1998_to_2000 <- world_trade_avg_1998_to_2000 %>% 
  filter(!(country %in% c("ant", "rom", "scg", "fsm", "umi")))
#> Error in if (!is.data.frame(groups) || tail(names(groups), 1L) != ".rows") {: missing value where TRUE/FALSE needed

Created on 2021-05-18 by the reprex package (v2.0.0)

and

library(economiccomplexity)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

data(world_trade_avg_1998_to_2000)

glimpse(world_trade_avg_1998_to_2000)
#> Rows: 124,336
#> Columns: 3
#> Groups: reporter_iso [226]
#> $ country <chr> "afg", "afg", "afg", "afg", "afg", "afg", "afg", "afg", "afg",~
#> $ product <chr> "0011", "0012", "0111", "0112", "0113", "0116", "0223", "0224"~
#> $ value   <dbl> 30068, 16366, 19273, 893, 350, 1561, 851, 12884, 114673, 67796~

world_trade_avg_1998_to_2000 <- world_trade_avg_1998_to_2000 %>% 
  ungroup() %>% 
  filter(!(country %in% c("ant", "rom", "scg", "fsm", "umi")))

Created on 2021-05-18 by the reprex package (v2.0.0)

allow arbitrary colnames in complexity function

Reference Paper

Hi Mauricio,

I read in the paper that the package depends on the tidyverse, but I see no tidyverse packages in the DESCRIPTION file.

I could be missing something.

Function to calculate product density

Hello!

A function to calculate product density could easily be added to the package by mimicking the distance function:

density <- function(balassa_index, proximity_product) {
  # sanity checks ----
  if (!inherits(balassa_index, "dgCMatrix")) {
    stop("'balassa_index' must be a dgCMatrix")
  }

  if (!inherits(proximity_product, "dsCMatrix")) {
    stop("'proximity_product' must be a dsCMatrix")
  }

  # density[c, p] = sum_p' M[c, p'] * phi[p, p'] / sum_p' phi[p, p']
  return(
    Matrix::tcrossprod(
      balassa_index,
      proximity_product / Matrix::rowSums(proximity_product)
    )
  )
}

The only difference is that, instead of 1-balassa_index, we use balassa_index.

Density measures are commonly used in the literature. See for example Balland et al. (2019).

Note: I tried it locally, and it only worked after I specified Matrix::tcrossprod() and Matrix::rowSums().

Projections - Docs

The man page of ?projections reads "proximity, see ref".