
featuretoolsr's People

Contributors

atusy, grayskripko, praktiskt

featuretoolsr's Issues

Is there a way to use es$plot()?

I'd like to plot an entity set, but even after installing graphviz the plot function only returns digraph metadata, not an actual plot.

Give more instructions

Hi, I wonder if you could provide more instructions on how to use FeaturetoolsR? I found this as a very interesting package but just can't figure out how to use it properly. Thank you!

Existing primitives are not found

A number of primitives listed in list_primitives() throw an error:

Invalid transform primitive(s): mean. Use list_primitives() to find valid primitives.
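
A possible explanation, not confirmed in this thread: in featuretools, mean is an aggregation primitive, so it is rejected when passed as a transform primitive even though it appears in the list. The sketch below sorts requested names by type before they would be handed to dfs(). The primitives table is made-up illustration data shaped like the name and type columns of list_primitives(), and split_primitives() is a hypothetical helper, not part of featuretoolsR.

```r
# Made-up sample shaped like the output of list_primitives() (assumption)
primitives <- data.frame(
  name = c("mean", "sum", "and", "divide_numeric"),
  type = c("aggregation", "aggregation", "transform", "transform"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sort requested primitive names by their listed type
split_primitives <- function(requested, primitives) {
  known <- primitives$type[match(requested, primitives$name)]
  list(
    agg     = requested[!is.na(known) & known == "aggregation"],
    trans   = requested[!is.na(known) & known == "transform"],
    unknown = requested[is.na(known)]
  )
}

res <- split_primitives(c("mean", "and", "divide"), primitives)
str(res)
```

Names that land in unknown are the ones worth checking against the installed featuretools version.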

Add tidy support for colnames

When calling tidy_feature_matrix, variable names become very non-R-like.

Clean variable names using regexes, something like:

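tidynames <- function(df) {
  # Lower-case the names so only a-z and 0-9 need to be kept
  n <- tolower(names(df))
  # Replace every other character with "_"; the range "A-z" would also
  # let through [ \ ] ^ and `, which sit between "Z" and "a"
  tn <- gsub("[^a-z0-9]", "_", n)
  # Drop trailing underscores and doubled underscores
  tn <- gsub("(_+?$)|(__+?)", "", tn)
  names(df) <- tn
  return(df)
}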
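
To illustrate the kind of cleaning proposed here, a minimal self-contained sketch follows. clean_feature_names() is a hypothetical helper working on a character vector, and the input names are only examples of the style dfs() tends to produce; this is one possible variant, not the package's implementation.

```r
# Hypothetical helper: turn feature names into lower-case snake_case
clean_feature_names <- function(x) {
  x <- tolower(x)
  # Replace each run of characters other than a-z and 0-9 with one underscore
  x <- gsub("[^a-z0-9]+", "_", x)
  # Trim underscores left at either end
  gsub("^_+|_+$", "", x)
}

cleaned <- clean_feature_names(c("MODE(set_2.value)", "COUNT(set_2)", "key"))
print(cleaned)
```

Collapsing runs into a single underscore keeps word boundaries visible, which differs slightly from removing doubled underscores outright.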

tidy_feature_matrix error: column `value` must be...

Running the demo example, I got an error, so I ran debug(tidy_feature_matrix) to locate the problem:

to_r <- tibble::as.tibble(reticulate::py_to_r(.data[[1]]))
> Error: Column `value` must be a 1d atomic vector or a list

I think the clue is here:

reticulate::py_to_r(.data[[1]]) %>% str

> 'data.frame':	100 obs. of  1 variable:
 $ value:[y, z, z, o, x, ..., n, r, q, z, q]
Length: 100
Categories (25, object): [a, b, c, d, ..., w, x, y, z]
 - attr(*, "pandas.index")=Int64Index([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
             14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,
             27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,
             40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,  52,
             53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,  65,
             66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,  78,
             79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,  91,
             92,  93,  94,  95,  96,  97,  98,  99, 100],
           dtype='int64', name='key')

It seems .data[[1]] isn't a plain pandas DataFrame.

-- UPDATE --
Full code to reproduce the error

library(featuretoolsR)
library(magrittr)

set_1 <- data.frame(key = 1:100, value = sample(letters, 100, T))

as_entityset(set_1, index = "key", entity_id = "set_1", id = "demo") %>% 
dfs(
    target_entity = "set_1", 
    trans_primitives = c("and", "divide")) %>% 
    tidy_feature_matrix(remove_nzv = T, nan_is_na = T)
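
The str() output above shows value arriving as a pandas Categorical, which is what an R factor becomes on the Python side. One workaround that might be worth trying (an assumption, not verified against featuretoolsR) is to convert factor columns to character before building the entity set. factors_to_character() is a hypothetical helper.

```r
# Hypothetical helper: convert every factor column of a data.frame to character
factors_to_character <- function(df) {
  is_fct <- vapply(df, is.factor, logical(1))
  if (any(is_fct)) {
    df[is_fct] <- lapply(df[is_fct], as.character)
  }
  df
}

set_1 <- data.frame(key = 1:5, value = factor(c("a", "b", "a", "c", "b")))
set_1_chr <- factors_to_character(set_1)
str(set_1_chr)
```

Passing stringsAsFactors = FALSE to data.frame() has the same effect at creation time.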

Ensure pip and virtualenv exist

install_featuretools() lacks checks to reliably inform the user if pip or virtualenv is missing.

If a user is missing virtualenv, the default message is good enough.

If a user has virtualenv but not pip, the created virtualenv gets bricked. This can (non-intuitively) be fixed by setting custom_virtualenv to true after installing pip.

Add checks to zzz.R

Add support to install_featuretools() to not create a virtualenv until pip exists.
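
A minimal sketch of such a check, assuming it is enough to look for the executables on the PATH with base R's Sys.which(). missing_python_tools() is a hypothetical helper; it would not detect a pip that is only reachable as a Python module.

```r
# Hypothetical helper: return the names of required tools not found on the PATH
missing_python_tools <- function(tools = c("pip", "virtualenv")) {
  found <- vapply(tools, function(tool) nzchar(Sys.which(tool)[[1]]), logical(1))
  missing <- tools[!found]
  if (length(missing) > 0) {
    message("Not found on PATH: ", paste(missing, collapse = ", "))
  }
  invisible(missing)
}

# A name that should not exist anywhere, to show the "missing" branch
not_found <- missing_python_tools("surely-not-an-installed-tool-xyz")
```

An installer could call this first and refuse to create the virtualenv while the returned vector is non-empty.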

library("featuretoolsR") error

library("featuretoolsR")
╔═════════════════════════╗
║ featuretoolsR 0.4.4 ║
╚═════════════════════════╝
Error: package or namespace load failed for ‘featuretoolsR’:
.onAttach failed in attachNamespace() for 'featuretoolsR', details:
call: py_get_attr_impl(x, name, silent)
error: AttributeError: module 'featuretools' has no attribute 'version'

add_relationship signature

I ran into a problem trying to execute add_relationship() and solved it only after reading the Python featuretools documentation: https://docs.featuretools.com/generated/featuretools.Relationship.html#featuretools.Relationship

  1. It's hard to tell where to put the parent and child sets when the arguments are called "set1" and "set2". I suggest naming them as in the original version: parent_variable and child_variable, or maybe parent_set and child_set.

  2. Is it possible to have 2 separate arguments for parent_idx and child_idx in order to avoid aligning your entity column names?

Column True is not a string

as_entityset(data.frame(a = 1:3))

> 2018-12-24 01:26:37,591 featuretools.entityset - WARNING    index True not found in dataframe, creating new integer column
Error in py_call_impl(callable, dots$args, dots$keywords) : 
  ValueError: All column names must be strings (Column True is not a string)
In addition: Warning message:
In as_entityset(data.frame(a = 1:3)) :
 
 Error in py_call_impl(callable, dots$args, dots$keywords) : 
  ValueError: All column names must be strings (Column True is not a string) 

Problem with dates after the last reticulate update

"Reticulate now always converts R Date objects into Python datetime objects. Note that these conversions can be inefficient -- if you would prefer conversion to NumPy datetime64 objects / arrays, you should convert your date to POSIXct first."
https://github.com/rstudio/reticulate/blob/master/NEWS.md

The following line takes a couple of seconds on my machine, even with a good CPU:

rep(as_date("2019-01-01"), 500) %>% reticulate::r_to_py()

I fixed it in my project with

r_tibble %>% mutate_if(is.Date, as.POSIXct) %>% reticulate::r_to_py()

You should add this kind of mutation for every incoming R data.frame
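
The same conversion can be sketched without dplyr, in case a dependency-free version is useful. dates_to_posixct() is a hypothetical helper, not an existing featuretoolsR function; it only touches columns of class Date.

```r
# Hypothetical helper: convert Date columns to POSIXct, leave the rest untouched
dates_to_posixct <- function(df) {
  is_date <- vapply(df, inherits, logical(1), what = "Date")
  if (any(is_date)) {
    df[is_date] <- lapply(df[is_date], as.POSIXct)
  }
  df
}

df <- data.frame(d = as.Date("2019-01-01") + 0:2, n = 1:3)
converted <- dates_to_posixct(df)
str(converted)
```

Applying it to each data.frame just before it crosses into Python would keep the conversion in one place.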

Readme demo no longer works? Unable to add relationship because child variable is also its index

Hi,

I'm trying to run the demo and it stops at add_relationship.

library(magrittr)
set_1 <- data.frame(key = 1:100, value = sample(letters, 100, TRUE), stringsAsFactors = TRUE)
set_2 <- data.frame(key = 1:100, value = sample(LETTERS, 100, TRUE), stringsAsFactors = TRUE)

as_entityset(set_1, index = "key", entity_id = "set_1", id = "demo") %>%
  add_entity(entity_id = "set_2", df = set_2, index = "key") %>%
  add_relationship(
    parent_set = "set_1",
    child_set = "set_2",
    parent_idx = "key",
    child_idx = "key"
  )

Error:

 Error in py_call_impl(callable, dots$args, dots$keywords) : 
  ValueError: Unable to add relationship because child variable 'key' in 'set_2' is also its index

I think it might be related to a new error message introduced by a June 2020 issue on featuretools.

I've played around with the code but have a hard time understanding how to fix this.
Is there a quick fix?

Thanks for your help!

Diagnostic info:

sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363) 
reticulate_1.19      featuretoolsR_0.4.4  magrittr_2.0.1
reticulate::py_discover_config() 
python:         C:/Anaconda3/envs/r-reticulate/python.exe
libpython:      C:/Anaconda3/envs/r-reticulate/python36.dll
pythonhome:     C:/Anaconda3/envs/r-reticulate
version:        3.6.13 (default, Feb 19 2021, 05:17:09) [MSC v.1916 64 bit (AMD64)]
Architecture:   64bit
numpy:          C:/Anaconda3/envs/r-reticulate/Lib/site-packages/numpy
numpy_version:  1.19.5

python versions found: 
 C:/Anaconda3/envs/r-reticulate/python.exe
 C:/Anaconda3/python.exe
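
The error message says the child column used in the relationship may not also be the child's index. One way to satisfy that, sketched here as an untested assumption, is to give set_2 its own index column (called id here, a made-up name) and keep key purely as the reference to set_1. Only the data preparation is executed; the featuretoolsR calls are left as comments and reuse the argument names from the snippet above.

```r
set.seed(1)

# Parent: one row per key
set_1 <- data.frame(key = 1:100, value = sample(letters, 100, TRUE))

# Child: its own unique index `id`, plus `key` pointing at a row of set_1
set_2 <- data.frame(
  id    = 1:100,
  key   = sample(set_1$key, 100, TRUE),
  value = sample(LETTERS, 100, TRUE)
)

# Untested sketch of the calls that would follow:
# as_entityset(set_1, index = "key", entity_id = "set_1", id = "demo") %>%
#   add_entity(entity_id = "set_2", df = set_2, index = "id") %>%
#   add_relationship(
#     parent_set = "set_1", child_set = "set_2",
#     parent_idx = "key", child_idx = "key"
#   )
```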

distributed.core annoying messages when n_jobs > 1

I get dozens of these messages:

distributed.core - INFO - Event loop was unresponsive in Nanny for 1276.34s.  
This is often caused by long-running GIL-holding functions or moving large chunks of data. 
This can cause timeouts and instability.

Have you encountered these warnings? Do you know how to deal with them?
I have spent many hours trying different approaches: Python logging settings, featuretools settings, capturing the printed messages through reticulate, and capture.output() and sink() in R.

Error in py_call_impl(callable, dots$args, dots$keywords)

Hi there,

I tried this package as instructed in README.md but got an error message after executing:

ft_matrix <- es %>%
  dfs(
    target_entity = "set_1", 
    trans_primitives = c("and", 'divide')
  )

Error message:

' Error in py_call_impl(callable, dots$args, dots$keywords) : 
  ValueError: ('Unknown transform primitive divide. ', 'Call ft.primitives.list_primitives() to get', ' a list of available primitives') '

What went wrong?

Here's my sessionInfo():

R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Traditional)_Taiwan.950  LC_CTYPE=Chinese (Traditional)_Taiwan.950   
[3] LC_MONETARY=Chinese (Traditional)_Taiwan.950 LC_NUMERIC=C                                
[5] LC_TIME=Chinese (Traditional)_Taiwan.950    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bindrcpp_0.2.2       magrittr_1.5         featuretoolsR_0.1.0  RevoUtils_11.0.1     RevoUtilsMath_11.0.0

loaded via a namespace (and not attached):
 [1] reticulate_1.12    tidyselect_0.2.4   reshape2_1.4.3     purrr_0.2.5        splines_3.5.1     
 [6] lattice_0.20-38    colorspace_1.3-2   generics_0.0.2     stats4_3.5.1       yaml_2.2.0        
[11] survival_2.44-1.1  prodlim_2018.04.18 rlang_0.3.4        ModelMetrics_1.1.0 pillar_1.4.1      
[16] glue_1.3.0         withr_2.1.2        foreach_1.5.0      bindr_0.1.1        plyr_1.8.4        
[21] lava_1.6.5         stringr_1.4.0      timeDate_3043.102  munsell_0.5.0      gtable_0.3.0      
[26] recipes_0.1.5      devtools_1.13.6    codetools_0.2-16   memoise_1.1.0      caret_6.0-80      
[31] class_7.3-14       Rcpp_0.12.18       scales_0.5.0       ipred_0.9-6        jsonlite_1.5      
[36] ggplot2_3.0.0      digest_0.6.19      stringi_1.1.7      dplyr_0.7.6        grid_3.5.1        
[41] tools_3.5.1        lazyeval_0.2.1     tibble_1.4.2       crayon_1.3.4       pkgconfig_2.0.2   
[46] MASS_7.3-50        Matrix_1.2-17      lubridate_1.7.4    gower_0.1.2        assertthat_0.2.1  
[51] rstudioapi_0.10    iterators_1.0.10   R6_2.4.0           rpart_4.1-13       nnet_7.3-12       
[56] nlme_3.1-137       compiler_3.5.1    

Readme code does not work

I have just gone through the code as given in the README. The last line,
tidy <- tidy_feature_matrix(ft_matrix, remove_nzv = T, nan_is_na = T)

gives the following:

Removing near zero variance variables
Error in as.vector(x, mode) :
cannot coerce type 'environment' to vector of type 'any'

Any help please?

An error in the demo example

library(featuretoolsR)
library(magrittr)

set_1 <- data.frame(key = 1:100, value = sample(letters, 100, T))

as_entityset(set_1, index = "key", entity_id = "set_1", id = "demo") %>% 
dfs(
    target_entity = "set_1", 
    trans_primitives = c("and", "divide")) %>% 
    tidy_feature_matrix(remove_nzv = T, nan_is_na = T)

> Removing near zero variance variables
C:\Users\srskr\ANACON~1\lib\site-packages\pandas\core\arrays\categorical.py:486: 
FutureWarning: Index.itemsize is deprecated and will be removed in a future version
  return self.categories.itemsize
Error in `[.python.builtin.object`(nondupe, , colname) : 
  unused argument (colname)

featuretoolsR::list_primitives() in console

> featuretoolsR::list_primitives()
                                name                              type
1  <environment: 0x000000002bfe8ef8> <environment: 0x000000002bcdad38>
2                               <NA>                              <NA>
3                               <NA>                              <NA>
...                             ...                               ...
62                              <NA>                              <NA>

Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs

Interestingly, everything works when I build this example with reprex::reprex(),
but it fails in my RStudio console.

Reload R-session after featuretools installation

Currently the library can't be used until the R session is restarted after running install_featuretools().

Upon successful Featuretools installation, unload the package and restart the R session. Perhaps something like:

cat("Reloading featuretoolsR\n")
unloadNamespace("featuretoolsR")
.rs.restartR() -> .; rm(.)
library(featuretoolsR)

(Not sure if this is allowed by CRAN, should be checked first)

out of bounds when executing tidy_feature_matrix

Hi there!

I just stumbled upon your package and I am incredibly happy someone made the effort to implement this. Thanks a lot!

I started out with your example and unfortunately encountered an error when creating a tidy_feature_matrix (the idea of which I absolutely love!).

# pacman::p_install_gh("magnusfurugard/featuretoolsR")

pacman::p_load(tidyverse, featuretoolsR)

# Create some mock data
set_1 <- data.frame(key = 1:100, value = sample(letters, 100, T))
set_2 <- data.frame(key = 1:100, value = sample(LETTERS, 100, T))

# Create entityset
es <- as_entityset(set_1, index = "key", entity_id = "set_1", id = "demo")

es <- es %>%
  add_entity(
    df = set_2, 
    entity_id = "set_2", 
    index = "key"
  )

es <- es %>%
  add_relationship(
    set1 = "set_1", 
    set2 = "set_2", 
    idx = "key"
  )

ft_matrix <- es %>%
  dfs(
    target_entity = "set_1", 
    trans_primitives = c("and", "divide")
  )

tidy <- tidy_feature_matrix(ft_matrix)

tidy

Error:

# Error in py_call_impl(callable, dots$args, dots$keywords) : IndexError: index 100 is out of bounds for axis 0 with size 100

Through a traceback, I was able to narrow down the problem to the py_to_r function, which seems to have a problem with Python's zero-based indexing.

See here:

reticulate::py_to_r(ft_matrix[[1]])

Error:

# Error in py_call_impl(callable, dots$args, dots$keywords) : IndexError: index 100 is out of bounds for axis 0 with size 100

Again, thank you for making this available and I totally understand that a lot of this is probably work in progress. I am just glad someone did this :)

Best,

Fabio

Update

I tried with different data and this seems to work fine. So it seems it has something to do with the example data, maybe?

ft <- reticulate::import("featuretools")

es = ft$demo$load_mock_customer(return_entityset=T)

ft_matrix <- es %>%
  dfs(
    target_entity = "customers", 
    trans_primitives = c("and", "divide")
  )

tidy <- tidy_feature_matrix(ft_matrix)

tidy

Works just fine!

Can't create an entity set: AttributeError: 'EntitySet' object has no attribute 'entity_from_dataframe'

When following the instructions in the README, under the 'Creating an EntitySet' heading, the following code results in an error:

library(featuretoolsR)
library(magrittr)

set_1 <- data.frame(key = 1:100, value = sample(letters, 100, T), a = rep(Sys.Date(), 100))
set_2 <- data.frame(key = 1:100, value = sample(LETTERS, 100, T), b = rep(Sys.time(), 100))

es <- as_entityset(
  set_1, 
  index = "key", 
  entity_id = "set_1", 
  id = "demo", 
  time_index = "a"
)

The error states:

Error in py_get_attr_impl(x, name, silent) : 
  AttributeError: 'EntitySet' object has no attribute 'entity_from_dataframe'

Session Info:

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] magrittr_2.0.2      dplyr_1.0.8         foreign_0.8-81      featuretoolsR_0.4.4

loaded via a namespace (and not attached):
 [1] reticulate_1.24      tidyselect_1.1.2     purrr_0.3.4          reshape2_1.4.4       listenv_0.8.0       
 [6] splines_4.1.2        lattice_0.20-45      colorspace_2.0-3     vctrs_0.3.8          generics_0.1.2      
[11] stats4_4.1.2         utf8_1.2.2           survival_3.2-13      prodlim_2019.11.13   rlang_1.0.1         
[16] ModelMetrics_1.2.2.2 pillar_1.7.0         glue_1.6.2           withr_2.4.3          rappdirs_0.3.3      
[21] foreach_1.5.2        lifecycle_1.0.1      plyr_1.8.6           lava_1.6.10          stringr_1.4.0       
[26] timeDate_3043.102    munsell_0.5.0        gtable_0.3.0         future_1.24.0        recipes_0.2.0       
[31] codetools_0.2-18     caret_6.0-90         parallel_4.1.2       class_7.3-19         fansi_1.0.2         
[36] Rcpp_1.0.8           scales_1.1.1         ipred_0.9-12         jsonlite_1.8.0       parallelly_1.30.0   
[41] png_0.1-7            ggplot2_3.3.5        digest_0.6.29        stringi_1.7.6        rprojroot_2.0.2     
[46] grid_4.1.2           here_1.0.1           hardhat_0.2.0        cli_3.2.0            tools_4.1.2         
[51] tibble_3.1.6         crayon_1.5.0         future.apply_1.8.1   pkgconfig_2.0.3      ellipsis_0.3.2      
[56] MASS_7.3-54          Matrix_1.3-4         data.table_1.14.2    pROC_1.18.0          lubridate_1.8.0     
[61] gower_1.0.0          rstudioapi_0.13      iterators_1.0.14     R6_2.5.1             globals_0.14.0      
[66] rpart_4.1-15         nnet_7.3-16          nlme_3.1-153         compiler_4.1.2 
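
For context, and as far as I know: featuretools 1.0 replaced EntitySet.entity_from_dataframe with add_dataframe, so a wrapper written against the older API would fail this way on newer installs. A small version check along these lines could make the mismatch explicit. has_entity_from_dataframe() is a hypothetical helper, and the version strings below are examples, not values taken from this report.

```r
# Hypothetical helper: TRUE when a featuretools version string predates 1.0,
# i.e. when entity_from_dataframe is expected to exist (assumption)
has_entity_from_dataframe <- function(version) {
  numeric_version(version) < "1.0.0"
}

checks <- vapply(c("0.27.1", "1.0.0", "1.6.0"), has_entity_from_dataframe, logical(1))
print(checks)
```

If the check fails, installing a pre-1.0 featuretools release into the environment used by reticulate is one option to try.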

featuretoolsR::add_relationship new signature

The new signature of featuretoolsR::add_relationship(entityset, parent_set, child_set, parent_idx, child_idx) breaks my previous code. I suggest changing it to featuretoolsR::add_relationship(entityset, parent_set, child_set, parent_idx, child_idx = parent_idx)
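
The suggested default works in R because default arguments are evaluated lazily and may refer to other arguments. A minimal sketch with a stand-in function follows; relationship_args() is hypothetical and only returns its arguments, it does not call featuretools.

```r
# Stand-in for the proposed signature: child_idx falls back to parent_idx
relationship_args <- function(parent_set, child_set, parent_idx, child_idx = parent_idx) {
  list(
    parent_set = parent_set,
    child_set  = child_set,
    parent_idx = parent_idx,
    child_idx  = child_idx
  )
}

# Old-style call with a single shared key column
shared <- relationship_args("set_1", "set_2", "key")

# New-style call with different column names on each side
split <- relationship_args("set_1", "set_2", "key", "parent_key")
```

Existing calls that pass only one index keep working, while callers with differently named columns can still supply both.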

Update README

The package is now available on CRAN. Update readme to reflect that.
