azure / azure-tdsp-utilities

Utilities and scripts developed as part of Microsoft's Team Data Science Process for productive data science

License: Creative Commons Attribution 4.0 International

R 0.02% HTML 77.10% Python 0.46% Jupyter Notebook 22.41%

azure-tdsp-utilities's Introduction

Data Science Utilities from Microsoft

Interactive Data Exploration, Analysis, and Reporting | Automated Modeling and Reporting | TDSP

This repository contains a set of Data Science Utilities developed for use in the context of the Team Data Science Process (TDSP).

Release Notes

This is an early preview release of the Data Science Utilities for TDSP. We are continuously improving our data science utilities based on our further accumulated experience and customer requests. Stay tuned for future releases.

Currently, the Data Science Utilities released in this repository include:

You can easily run these utilities on sample data in the Data/Common directory. If you are using Azure Data Science Virtual Machine, all three utilities are instantly ready to run.

Ask Questions.

We would love to hear from you. If you have any questions or suggestions, or want to report a bug, please create an issue at TDSP/Issues.

Help to Enrich the Set of Utilities

We believe that, with the help of the data science community, this set of utilities can be significantly enriched, become more powerful, and benefit more enterprises and data scientists. We warmly welcome your contributions to the data science utilities for TDSP.

What Is TDSP

To learn more about TDSP, see the TDSP documentation here.

Additional notes

In the documentation, screenshots of RStudio are from the Open Source Edition.


azure-tdsp-utilities's Issues

IDEAR's Server param

Sorry if this is a silly issue... I'm trying to apply IDEAR on some data I have in a local SQL Server 2016 instance (everything is running on the same box, an Azure VM). I followed your directions for creating a .yaml but I must be messing up on the Server parameter. When I run IDEAR and select the .yaml, IDEAR keeps returning "Error: first argument is not an open RODBC channel".

So, what exactly should my argument be for the default SQL Server on the local box? My various attempts at using localhost didn't work.

I can query the DB just fine from SSMS, so the DB is alive and functional. I can also run IDEAR on your example datasets (para-adult.yaml and para-bike-rental.yaml), so the R/IDEAR side is also functional.

Any help would be appreciated, thanks.

On Linux it doesn't generate the report

As shown in http://imgur.com/zlMQF0Ql.png, it doesn't generate the report, although the rest of the process runs fine. This is the YAML file that I'm using:
DataFilePath:
    hour.csv
HasHeader:
    Yes
Separator:
    ','
Target:

ColumnsToExclude:
    instant
RLogFilePath:
    bikesharing-hour.log.r

(for data exploration task)

thx in advance!

Something Broke...

I had students using the IDEAR tool two weeks ago without any real problems, but this week I had a different student download the whole package and had issues getting the code to run even on the sample data.

One: on line 31 of IDEAR.rmd there's a floating "css: style.css" that seems to stop the R Markdown from actually running.

After commenting it out we seemed to be able to get it running, though we still had R and pandoc version issues (we had to install the newest version for it to work).

Warning: pandas.core.datetools deprecated

I've been using the Python IDEAR notebook and when loading the modules, I came across the following warning:

FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.

from pandas.core import datetools
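
If the warning is only noise while the notebook's dependencies catch up, one option is to filter it before loading the modules. This is a minimal sketch using only the standard `warnings` module; the helper name `silence_datetools_warning` is made up for illustration, and the pattern assumes the message text quoted above:

```python
import warnings

# Regex matched against the start of the warning message (case-insensitive).
DATETOOLS_PATTERN = r".*pandas\.core\.datetools module is deprecated.*"


def silence_datetools_warning():
    """Install a filter that hides only the datetools FutureWarning."""
    warnings.filterwarnings(
        "ignore",
        message=DATETOOLS_PATTERN,
        category=FutureWarning,
    )
```

Call it in the first notebook cell, before the imports that trigger the warning. It only hides the message; it does not change which pandas module is actually imported.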

reuse existing environment? scalability plans?

Hi,

IDEAR seems like a very cool tool...thanks for sharing. A couple questions...

I already have a couple versions of R installed on my machine (MS R Server, In-SQL & Stand-Alone), as well as R Tools for VS. Is it possible to use your IDEAR tool within that environment, or am I going to have to install yet more?

Also, do you have plans to make the IDEAR tool not limited by in-memory data frames? I've been computing with RevoScaleR just for this purpose, and it would be great to not have to backtrack. For some things sampling will work (e.g., first-order stats), but for others sampling is undesirable (e.g., outliers).

Thanks.

-Rob

Unable to see the generate final report button and also unable to find the intermediate .ipynb files generated

After running the cells in IDEAR.ipynb on my local machine, I was unable to see the generate final report button, and the tmp export directory is empty.

Can anyone suggest how to solve the issues I am facing?

Thanks in advance :)

By the way, I am using ipywidgets 4.0. Does that have something to do with this?
I am unable to see most of the buttons in the scripts...

AzureML Studio Dataset as source

Hi,

In the current release, only flat files and SQL Server query result sets can be used as a source for IDEAR. Is there any plan to allow IDEAR to connect to a transient data set (after initial cleansing and transforming) in AML Studio?

Best regards,

Carlos Abramo

Create Custom Visualization

Hello,

Besides the great built-in visualizations available, is it possible (or planned) to create a custom dashboard with my preferred visualizations on a single page? It would also be great to be able to correlate the attributes in the dashboard, as we do in Power BI.

Best regards,

Carlos Abramo

Error in IDEAR.ipynb ('int' object has no attribute 'children')

Platform: Windows 10
Environment : Anaconda Python3.6 (also checked with Python 3.5 and Python2.7)

Error in IDEAR.ipynb in Cell : Explore the target variable and onwards

Error:
`---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in ()
----> 1 i = interactive(TargetAnalytics.custom_barplot, df=fixed(df), filename=fixed(filename), col1=w1, Export=w_export)

C:\Anaconda3\envs\python36\lib\site-packages\ipywidgets\widgets\interaction.py in __init__(self, _interactive__interact_f, _interactive__options, **kwargs)
160 getcallargs(f, **{n:v for n,v, in new_kwargs})
161 # Now build the widgets from the abbreviations.
--> 162 self.kwargs_widgets = self.widgets_from_abbreviations(new_kwargs)
163
164 # This has to be done as an assignment, not using self.children.append,

C:\Anaconda3\envs\python36\lib\site-packages\ipywidgets\widgets\interaction.py in widgets_from_abbreviations(self, seq)
258 if not (isinstance(widget, ValueWidget) or isinstance(widget, fixed)):
259 if widget is None:
--> 260 raise ValueError("{!r} cannot be transformed to a widget".format(abbrev))
261 else:
262 raise TypeError("{!r} is not a ValueWidget".format(widget))

ValueError: <ipywidgets.widgets.widget_button.Button object at 0x000001FA6840E780> cannot be transformed to a widget

AttributeError Traceback (most recent call last)
in ()
21 get_ipython().magic('reset_report')
22 get_ipython().magic('add_interaction_code_to_report i = interactive(TargetAnalytics.custom_barplot, df=fixed(df), filename=fixed(filename), col1=w1, Export=w_export)')
---> 23 hbox = widgets.HBox(i.children)
24
25 display(hbox)

AttributeError: 'int' object has no attribute 'children'`

Error in sys.frame(1) : not that many frames on the stack

Hello there,

I'm trying to run this script on a Mac in RStudio and I get this error message when running this line:

script.dir <- dirname(sys.frame(1)$ofile)

Error in sys.frame(1) : not that many frames on the stack

Here is the output from Sys.getenv if that helps:

Sys.getenv()
__CF_USER_TEXT_ENCODING
0x1F5:0x0:0x0
Apple_PubSub_Socket_Render
/private/tmp/com.apple.launchd.j6KVlXR0RP/Render
DISPLAY /private/tmp/com.apple.launchd.PM723wEEf3/org.macosforge.xquartz:0
DYLD_FALLBACK_LIBRARY_PATH
/Library/Frameworks/R.framework/Resources/lib:/Users/jas/lib:/usr/local/lib:/usr/lib::
EDITOR vi
GIT_ASKPASS rpostback-askpass
HOME /Users/jas
LANG en_US.UTF-8
LC_CTYPE en_US.UTF-8
LN_S ln -s
LOGNAME jas
MAKE make
PAGER /usr/bin/less
PATH /usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/opt/X11/bin:/usr/local/git/bin:/Library/TeX/texbin
R_BROWSER /usr/bin/open
R_BZIPCMD /usr/bin/bzip2
R_DOC_DIR /Library/Frameworks/R.framework/Resources/doc
R_GZIPCMD /usr/bin/gzip
R_HOME /Library/Frameworks/R.framework/Resources
R_INCLUDE_DIR /Library/Frameworks/R.framework/Resources/include
R_LIBS_SITE
R_LIBS_USER ~/Library/R/3.3/library
R_PAPERSIZE a4
R_PDFVIEWER /usr/bin/open
R_PLATFORM x86_64-apple-darwin13.4.0
R_PRINTCMD lpr
R_QPDF /Library/Frameworks/R.framework/Resources/bin/qpdf
R_RD4PDF times,inconsolata,hyper
R_SESSION_TMPDIR
/var/folders/_4/n0yjps8x5dv1qdkn7s2t2zr40000gn/T//RtmpjIKkAt
R_SHARE_DIR /Library/Frameworks/R.framework/Resources/share
R_SYSTEM_ABI osx,gcc,gxx,gfortran,?
R_TEXI2DVICMD /usr/local/bin/texi2dvi
R_UNZIPCMD /usr/bin/unzip
R_ZIPCMD /usr/bin/zip
RMARKDOWN_MATHJAX_PATH
/Applications/RStudio.app/Contents/Resources/resources/mathjax-26
RS_RPOSTBACK_PATH
/Applications/RStudio.app/Contents/MacOS/rpostback
RS_SHARED_SECRET
51355adb-a590-4f62-9b6a-2882f6ca0e12
RSTUDIO 1
RSTUDIO_PANDOC /Applications/RStudio.app/Contents/MacOS/pandoc
RSTUDIO_SESSION_PORT
29342
RSTUDIO_USER_IDENTITY
jas
RSTUDIO_WINUTILS
bin/winutils
SED /usr/bin/sed
SHELL /bin/bash
SSH_AUTH_SOCK /private/tmp/com.apple.launchd.Xd0PSFedE6/Listeners
TAR /usr/bin/tar
TMPDIR /var/folders/_4/n0yjps8x5dv1qdkn7s2t2zr40000gn/T/
USER jas
XPC_FLAGS 0x0
XPC_SERVICE_NAME
0
YOUR_VAR abc123

Warning: Error in winDialog: winDialog() cannot be used non-interactively

Hi,
I am trying to run IDEAR.rmd and get the following error: "Warning: Error in winDialog: winDialog() cannot be used non-interactively". Any ideas?

Regards,
Amit

session Info:

sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rmarkdown_1.0

loaded via a namespace (and not attached):
[1] Rcpp_0.12.6 digest_0.6.10 mime_0.5 R6_2.1.3 xtable_1.8-2 magrittr_1.5 evaluate_0.9
[8] stringi_1.1.1 miniUI_0.1.1 shinyjs_0.7 tools_3.3.1 stringr_1.0.0 shiny_0.13.2 httpuv_1.3.3
[15] yaml_2.1.13 rsconnect_0.4.3 htmltools_0.3.5

the rmarkdown log is listed below:

Loading required package: shiny

Listening on http://127.0.0.1:3994

|.. | 3%
inline R code fragments

|.... | 6%
label: unnamed-chunk-1 (with options)
List of 3
$ echo : logi FALSE
$ message: logi FALSE
$ warning: logi FALSE

processing file: IDEAR.rmd
Quitting from lines 23-240 (IDEAR.rmd)

Warning: Error in winDialog: winDialog() cannot be used non-interactively
Stack trace (innermost first):
105: winDialog
104: eval [#29]
103: eval
102: withVisible
101: withCallingHandlers
100: handle
99: evaluate_call
98: evaluate
97: in_dir
96: block_exec
95: call_block
94: process_group.block
93: process_group
92: withCallingHandlers
91: process_file
90: knitr::knit
89: <Anonymous>
88: do.call
87: contextFunc
86: .getReactiveEnvironment()$runWith
85: shiny::maskReactiveContext
84: reactive reactive({
out <- rmd_cached_output(file, encoding)
output_dest <- out$dest
if (out$cached) {
if (nchar(out$resource_folder) > 0) {
shiny::addResourcePath(basename(out$resource_folder),
out$resource_folder)
}
return(out$shiny_html)
}
if (!file.exists(dirname(output_dest))) {
dir.create(dirname(output_dest), recursive = TRUE, mode = "0700")
}
resource_folder <- knitr_files_dir(output_dest)
perf_timer_reset_all()
dependencies <- list()
shiny_dependency_resolver <- function(deps) {
dependencies <<- deps
list()
}
output_opts <- list(self_contained = FALSE, copy_resources = TRUE,
dependency_resolver = shiny_dependency_resolver)
message("\f")
args <- merge_lists(list(input = reactive_file(), output_file = output_dest,
output_dir = dirname(output_dest), output_options = output_opts,
intermediates_dir = dirname(output_dest), runtime = "shiny",
envir = new.env()), render_args)
result_path <- shiny::maskReactiveContext(do.call(render,
args))
if (!dir_exists(resource_folder))
dir.create(resource_folder, recursive = TRUE)
shiny::addResourcePath(basename(resource_folder), resource_folder)
dependencies <- append(dependencies, list(create_performance_dependency(resource_folder)))
write_deps <- base::file(file.path(resource_folder, "shiny.dep"),
open = "wb")
on.exit(close(write_deps), add = TRUE)
serialize(dependencies, write_deps, ascii = FALSE)
if (!isTRUE(out$cacheable)) {
shiny::onReactiveDomainEnded(shiny::getDefaultReactiveDomain(),
function() {
unlink(result_path)
unlink(resource_folder, recursive = TRUE)
})
}
shinyHTML_with_deps(result_path, dependencies)
})
73: doc
72: shiny::renderUI
71: func
70: output$__reactivedoc__
3: <Anonymous>
2: do.call
1: rmarkdown::run

An error message when running the script Run-IDEAR.R

My environment:

R V3.3.1
RStudio V1.0.136

To start IDEAR, I ran the script "Run-IDEAR.R" in RStudio, but there is an error message in the Shiny browser:
"Error: missing value where TRUE/FALSE needed"

Any help would be appreciated, thanks.

add_conf_code_to_report() got an unexpected keyword argument 'local_ns'

Hello,

Thanks for sharing these tools.

I'm trying to generate an IDEAR report using Jupyter notebook, but I get the errors below regarding the add_conf_code_to_report() function. I have never used magic functions, so could anyone please help me get past this error?

TypeError Traceback (most recent call last)
in
----> 1 get_ipython().run_cell_magic('add_conf_code_to_report', '', "import os\nworkingDir = 'C:\\GitRepos\\DGADSCommon\\Utilities\\DataScienceUtilities\\DataReport-Utils\\Python'\nos.chdir(workingDir)\n\nconf_file = '.\\para-adult.yaml'\nSample_Size = 10000\n\nexport_dir = '.\\tmp\\'\n")

~/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
2360 with self.builtin_trap:
2361 args = (magic_arg_s, cell)
-> 2362 result = fn(*args, **kwargs)
2363 return result
2364

TypeError: add_conf_code_to_report() got an unexpected keyword argument 'local_ns'
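
The traceback shows IPython calling the magic as `fn(*args, **kwargs)` with a `local_ns` keyword that the function does not accept. A possible workaround, assuming the magic is a plain function in the notebook's helper code that you can edit, is to let its signature absorb that keyword. The body below is a hypothetical stand-in, since the real implementation is not shown in this issue:

```python
def add_conf_code_to_report(line, cell, local_ns=None, **extra):
    """Hypothetical stand-in for the notebook's cell magic.

    `local_ns` and `**extra` absorb keyword arguments the caller may pass,
    so the call seen in the traceback no longer raises TypeError.
    """
    # Placeholder for the real report logic: keep the cell source line by line.
    report_lines = cell.splitlines()
    return report_lines


# Mimic the call from the traceback: fn(*args, **kwargs)
args = ("", "conf_file = 'para-adult.yaml'\nSample_Size = 10000\n")
kwargs = {"local_ns": {}}
result = add_conf_code_to_report(*args, **kwargs)
```

Whether this is the right fix depends on how the magic is registered in the helper code; pinning the IPython version the notebook was written against may be another route.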

IDEAR Updates?

Is Microsoft going to continue to improve the TDSP Utilities?

Doesn't Replace Old Plots

Most of the "interactive" plots available in IDEAR, when you change the variables, make a smaller plot next to the drop-down rather than replacing the larger figures generated.

This has two problems:

It's unclear what (ought) to get exported.
It's hard to zoom/see other interactives.

Bug in extracting new date/time variables in the IDEAR.rmd file

I found a bug in the chunk of code that creates new variables from a date/time one.
Here are my suggestions:

autogen_datetime_columns <- character()
if(!is.null(config$DateTimeColumns[[1]])){
  for (i in 1:length(config$DateTimeColumns) ) {  #<-- new
  #for (dt in names(config$DateTimeColumns)) {    #<-- old
    dt <- names(config$DateTimeColumns[[i]])
    data[[dt]] <- as.POSIXct(data[[dt]], format = config$DateTimeColumns[[i]][[dt]])  #<-- new
    #data[[dt]] <- as.POSIXct(data[[dt]], format = config$DateTimeColumns[[dt]])      #<-- old
...

HTH

Feature request: update python notebook to work with current versions of relevant packages

Would like to see the python version of notebooks get updated to work with current versions of toolkits, particularly ipywidgets and numpy

Error package 'knitr' not found

After selecting the YAML file, I got the following error:

Error package 'knitr' not found

This is the Console Output:

Warning: Error in utils::packageVersion: package ‘knitr’ not found
Stack trace (innermost first):
109: utils::packageVersion
108: knit_meta_reset
107: render
106: discover_rmd_resources
105: find_external_resources
104: copy_render_intermediates
103: output_format$intermediates_generator
102: <Anonymous>
101: do.call
100: contextFunc
99: .getReactiveEnvironment()$runWith
98: shiny::maskReactiveContext
97:
86: doc
85: shiny::renderUI
84: func
83: origRenderFunc
82: output$__reactivedoc__
7: <Anonymous>
6: do.call
5: rmarkdown::run
4: eval [~/Team Data Science Process/Azure-TDSP-Utilities-master/DataScienceUtilities/DataReport-Utils/Run-IDEAR.R#14]
3: eval
2: withVisible
1: source

What is wrong with it?

Lifecycle in Data Science

Hi,

I wanted to know where testing fits into the data science lifecycle, because a robust system that has to deal with several possibilities requires us to test our model.

Compared to the traditional software lifecycle, how does the data science lifecycle correlate?

versions not aligned in knitr

And also the IDEAR new is missing

Challenges with Error Messages in Binary-Classification Modeling

I've been trying to use the binary classification module on classic Titanic data-set and running into quite a few challenges...

Some of this could be cleared up by making sure there's better error-code catching/reporting...

For example:

I was running with missing values still in the Age feature (from the basic training dataset from Kaggle.com). -- The error I was getting was that I was trying to sort something that was a list. After a lot of head-scratching, I realized it was trying to sort something that still had non-numeric values in it... which was probably Age...
After removing/dealing with them, I was able to get AMR to run (farther).
After cleaning the data up some more (and limiting it to a lot less columns to reduce issues-- Specifically: Age and Fare) ... I'm getting an error when trying to run the glmnet model, that "train()'s use of ROC codes requires class probabilities. See the classProbs option of trainControl()"
After some investigation, the controlObject for glmnet IS properly set to have classProbs=TRUE... but obviously there's an error somewhere in actually computing those class probabilities which did NOT properly raise an error/exception. I'm still trying to trace back and figure out where that might be... but there's clearly some info missing in these error messages...

It's also possible there's a lot more errors in my "input" files...

Perhaps an alternative would be to have a more comprehensive "check dataset" tool that made sure the input data-sets (as specified by the yaml with exclusions/inclusions) met the expected formats to be able to run on models. Then, if not, give a report of errors.

In some sense, this seems to be missing between the IDEAR and AMR tools... while IDEAR lets you see what state the data is in, there's not (or at least, I seem to have missed it) clear specifications on the condition data/data-frames need to be in for running AMR.
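
A rough sketch of what such a "check dataset" step might look like, written in Python for brevity even though AMR itself is R. The function name `check_dataset`, the rules it applies, and the sample rows are all made up for illustration; they are not part of IDEAR or AMR:

```python
def check_dataset(rows, numeric_columns, target=None):
    """Return a list of human-readable problems found in `rows` (a list of dicts)."""
    problems = []
    for col in numeric_columns:
        for i, row in enumerate(rows):
            value = row.get(col)
            if value is None or value == "":
                problems.append(f"row {i}: missing value in numeric column '{col}'")
                continue
            try:
                float(value)
            except (TypeError, ValueError):
                problems.append(f"row {i}: non-numeric value {value!r} in column '{col}'")
    if target is not None:
        # Binary classification needs exactly two classes in the target.
        classes = {row.get(target) for row in rows}
        if len(classes) != 2:
            problems.append(
                f"target '{target}' has {len(classes)} distinct values, expected 2"
            )
    return problems


# Invented rows in the spirit of the Titanic columns mentioned above.
sample = [
    {"Age": "22", "Fare": "7.25", "Survived": "0"},
    {"Age": "", "Fare": "71.28", "Survived": "1"},
    {"Age": "n/a", "Fare": "8.05", "Survived": "1"},
]
report = check_dataset(sample, ["Age", "Fare"], target="Survived")
```

Here `report` lists the empty and the non-numeric `Age` entries, which is the kind of message that would have pointed at the missing values directly instead of a sorting error.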

Enhancement: Separate Plotting

We ran into an issue recently where we wanted to increase the font size on some of the plots, specifically the pie plots.

I am not sure I see the extra value in nesting and hiding the combined plot call. I would suggest just pulling the plot calls directly into the notebook. This would make it easier for a user to pull the notebook code chunk out and reproduce or modify the plot.

Another thing to consider would be a move to bokeh for the visualizations to allow more interaction and adjustment of the plots before exporting them.

BinaryClassification_UCI_Income.yaml incorrectly ordered.

The example BinaryModelSelection.rmd file is throwing an error when executing on the BinaryClassification_UCI_Income.yaml. The issue occurs because the parameter grid for runXgBoost is created with switched values of xgBoostsubsample and min_child_weight. The indexing should be switched to match the order of those parameters in the yaml (or the order of the parameters in the yaml should be switched).
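
One way to avoid this class of bug is to look parameters up by name instead of by position. A minimal sketch in Python (the utilities themselves are R); the config layout and the values are assumptions for illustration, and only the names `runXgBoost`, `xgBoostsubsample`, and `min_child_weight` come from the issue:

```python
def build_xgboost_grid(config):
    """Build the tuning grid by key, so the order of entries in the YAML is irrelevant."""
    params = config["runXgBoost"]  # assumed layout: one mapping per model
    return {
        "subsample": params["xgBoostsubsample"],
        "min_child_weight": params["min_child_weight"],
    }


# Two hypothetical configs that differ only in key order.
config_a = {"runXgBoost": {"xgBoostsubsample": [0.5, 1.0], "min_child_weight": [1, 5]}}
config_b = {"runXgBoost": {"min_child_weight": [1, 5], "xgBoostsubsample": [0.5, 1.0]}}

grid_a = build_xgboost_grid(config_a)
grid_b = build_xgboost_grid(config_b)
```

Both configs produce the same grid, so reordering the YAML can no longer swap the two values.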

[IDEAR] Error in Rank Variables

Hi all,

I'm trying to get info from the train.csv file you can find here:

https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data

I'm using this yaml file:

DataFilePath:
    'Z:\<your_path>\train.csv'
HasHeader:
    Yes
Separator:
    ','
CategoricalColumns:
    - MSSubClass
    - MSZoning
    - Street
    - Alley
    - LotShape
    - LandContour
    - Utilities
    - LotConfig
    - LandSlope
    - Neighborhood
    - Condition1
    - Condition2
    - BldgType
    - HouseStyle
    - OverallQual
    - OverallCond
    - RoofStyle
    - RoofMatl
    - Exterior1st
    - Exterior2nd
    - MasVnrType
    - ExterQual
    - ExterCond
    - Foundation
    - BsmtQual
    - BsmtCond
    - BsmtExposure
    - BsmtFinType1
    - BsmtFinType2
    - Heating
    - HeatingQC
    - CentralAir
    - Electrical
    - KitchenQual
    - Functional
    - FireplaceQu
    - GarageType
    - GarageFinish
    - GarageQual
    - GarageCond
    - PavedDrive
    - PoolQC
    - Fence
    - MiscFeature
    - SaleType
    - SaleCondition
NumericalColumns:
    - LotFrontage
    - LotArea
    - YearBuilt
    - YearRemodAdd
    - MasVnrArea
    - BsmtFinSF1
    - BsmtFinSF2
    - BsmtUnfSF
    - TotalBsmtSF
    - YearRemodAdd
    - 1stFlrSF
    - LowQualFinSF
    - GrLivArea
    - BsmtFullBath
    - BsmtHalfBath
    - FullBath
    - HalfBath
    - Bedroom
    - Kitchen
    - TotRmsAbvGrd
    - Fireplaces
    - GarageYrBlt
    - GarageCars
    - WoodDeckSF
    - OpenPorchSF
    - EnclosedPorch
    - 3SsnPorch
    - ScreenPorch
    - PoolArea
    - MiscVal
    - MoSold
    - YrSold
    - SalePrice
ColumnsToExclude:
    - Id
Target:
    SalePrice
RLogFilePath:
    'Z:\<your_path>\house_prices.log.r'

When I try to rank the variables versus the SalePrice variable, I get the following error:

task 4 failed - "models were not all fitted to the same size of dataset"

What's wrong?

Thank you

Using Windows: Error in sys.frame(1) : not that many frames on the stack

Hi,
I ran "Run-IDEAR.R" and get the following error "Error in sys.frame(1) : not that many frames on the stack"

I am running R using RStudio on Windows.

Regards,
Amit

session info:

sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rmarkdown_1.0

loaded via a namespace (and not attached):
[1] magrittr_1.5 rsconnect_0.4.3 htmltools_0.3.5 tools_3.3.1 Rcpp_0.12.6 stringi_1.1.1 stringr_1.0.0
[8] digest_0.6.10 evaluate_0.9

error:
if (!"rmarkdown" %in% installed_packages){

install.packages("rmarkdown")

}
if (!"shiny" %in% installed_packages){

install.packages("shiny")

}

library(rmarkdown)
script.dir <- dirname(sys.frame(1)$ofile)
Error in sys.frame(1) : not that many frames on the stack
setwd(script.dir)
Error in setwd(script.dir) : object 'script.dir' not found

Issue running IDEAR.ipynb

hi folks, trying to run the IDEAR.ipynb in DSVM using jupyter notebook. I keep getting this error, any ideas?

%%add_conf_code_to_report
import os
workingDir = 'C:\Users\AzureUser\Downloads\IDEAR\DataScienceUtilities\DataReport-Utils\Python'
os.chdir(workingDir)

conf_file = '.\para-adult.yaml'
Sample_Size = 10000

export_dir = '.\tmp\'

TypeError Traceback (most recent call last)
in
----> 1 get_ipython().run_cell_magic('add_conf_code_to_report', '', "import os\nworkingDir = 'C:\\Users\\AzureUser\\Downloads\\IDEAR\\DataScienceUtilities\\DataReport-Utils\\Python'\nos.chdir(workingDir)\n\nconf_file = '.\\para-adult.yaml'\nSample_Size = 10000\n\nexport_dir = '.\\tmp\\'\n")

C:\Miniconda\envs\py37_default\lib\site-packages\IPython\core\interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
2360 with self.builtin_trap:
2361 args = (magic_arg_s, cell)
-> 2362 result = fn(*args, **kwargs)
2363 return result
2364

TypeError: add_conf_code_to_report() got an unexpected keyword argument 'local_ns'

[IDEAR] "subscript out of bounds" error

Hi all,

I tried to load the data available on this page:

http://lib.stat.cmu.edu/datasets/irish.ed

Here is the csv file I'm using.
irish_educational_transitions_data.zip

The yaml file I'm using is this one:

DataFilePath:
    'Z:/<my_path>/irish_educational_transitions_data.csv'
HasHeader:
    Yes
Separator:
    ','
CategoricalColumns:
    - sex
    - educational_level
    - leaving_certificate
    - type_school
NumericalColumns:
    - dvrt
    - father_occupation_score
Target:
    leaving_certificate
RLogFilePath:
    irish_educational_transitions_data.log.r

I'm getting this error:

Quitting from lines 1130-1251 (IDEAR.rmd)

Warning: Error in [: subscript out of bounds
Stack trace (innermost first):
118: eval [#28]
117: eval
116: withVisible
115: withCallingHandlers
114: handle
113: timing_fn
112: evaluate_call
111: evaluate::evaluate
110: in_dir
109: block_exec
108: call_block
107: process_group.block
106: process_group
105: withCallingHandlers
104: process_file
103: knitr::knit
102: <Anonymous>
101: do.call
100: contextFunc
99: .getReactiveEnvironment()$runWith
98: shiny::maskReactiveContext
97:
86: doc
85: shiny::renderUI
84: func
83: origRenderFunc
82: output$__reactivedoc__
7: <Anonymous>
6: do.call
5: rmarkdown::run
4: eval [Z:/SolidQ/Tools/Azure-TDSP-Utilities-master/DataScienceUtilities/DataReport-Utils/Run-IDEAR.R#13]
3: eval
2: withVisible
1: source

What's happened?

Thank you.

Creating the "Export" Button

Running the IDEAR notebook in Python3 --- I did a conversion of all code files with 2to3, and am manually converting the prints in the notebook.

When trying to generate the widgets in "In[17]" ... the export button does not generate properly.
I can still run that cell if I take the export button out of the code... but it means exporting any analysis is missing!

Error: object 'cat_columns' not found

Hi all,

I'm using version 0.12. I'm importing the csv file through the attached yaml file.
I'm getting the "Error: object 'cat_columns' not found".

Is there something wrong in the yaml file?

Thank you.
Price Elasticity.zip

Error in convert: failed to copy rendered pandoc artefact

Hello!

I'm giving this tool a try. When trying to run the example, I got this error after selecting the YAML file (with both of them):

Warning: Error in convert: failed to copy rendered pandoc artefact to 'C:/Users/pdelb_000/AppData/Local/Temp/RtmpEFeRzj/file15c86d2872d5.html'
Stack trace (innermost first):
    94: convert
    93: <Anonymous>
    92: do.call
    91: contextFunc
    90: .getReactiveEnvironment()$runWith
    89: shiny::maskReactiveContext
    88: reactive reactive({
    out <- rmd_cached_output(file, encoding)
    output_dest <- out$dest
    if (out$cached) {
        if (nchar(out$resource_folder) > 0) {
            shiny::addResourcePath(basename(out$resource_folder), 
                out$resource_folder)
        }
        return(out$shiny_html)
    }
    if (!file.exists(dirname(output_dest))) {
        dir.create(dirname(output_dest), recursive = TRUE, mode = "0700")
    }
    resource_folder <- knitr_files_dir(output_dest)
    perf_timer_reset_all()
    dependencies <- list()
    shiny_dependency_resolver <- function(deps) {
        dependencies <<- deps
        list()
    }
    output_opts <- list(self_contained = FALSE, copy_resources = TRUE, 
        dependency_resolver = shiny_dependency_resolver)
    message("\f")
    args <- merge_lists(list(input = reactive_file(), output_file = output_dest, 
        output_dir = dirname(output_dest), output_options = output_opts, 
        intermediates_dir = dirname(output_dest), runtime = "shiny", 
        envir = new.env()), render_args)
    result_path <- shiny::maskReactiveContext(do.call(render, 
        args))
    if (!dir_exists(resource_folder)) 
        dir.create(resource_folder, recursive = TRUE)
    shiny::addResourcePath(basename(resource_folder), resource_folder)
    dependencies <- append(dependencies, list(create_performance_dependency(resource_folder)))
    write_deps <- base::file(file.path(resource_folder, "shiny.dep"), 
        open = "wb")
    on.exit(close(write_deps), add = TRUE)
    serialize(dependencies, write_deps, ascii = FALSE)
    if (!isTRUE(out$cacheable)) {
        shiny::onReactiveDomainEnded(shiny::getDefaultReactiveDomain(), 
            function() {
                unlink(result_path)
                unlink(resource_folder, recursive = TRUE)
            })
    }
    shinyHTML_with_deps(result_path, dependencies)
})
    77: doc
    76: shiny::renderUI
    75: func
    74: output$__reactivedoc__
     7: <Anonymous>
     6: do.call
     5: rmarkdown::run
     4: eval [C:/Users/pdelb_000/Projects/Azure-TDSP-Utilities/DataScienceUtilities/DataReport-Utils/Run-IDEAR.R#13]
     3: eval
     2: withVisible
     1: source

I'm using Windows 10, Latest RStudio Version 1.0.44

sessionInfo()

R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] shinyjs_0.8          scatterplot3d_0.3-37 vcd_1.4-3            foreach_1.4.3       
[5] RODBC_1.3-14         yaml_2.1.13          shiny_0.13.2         rmarkdown_0.9.6.14  

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.0      knitr_1.12       magrittr_1.5     MASS_7.3-43      lattice_0.20-33 
 [6] colorspace_1.2-6 xtable_1.8-2     R6_2.1.1         stringr_1.0.0    tools_3.2.2     
[11] miniUI_0.1.1     htmltools_0.3.5  iterators_1.0.8  digest_0.6.8     lmtest_0.9-34   
[16] formatR_1.4      codetools_0.2-14 evaluate_0.9     mime_0.4         stringi_0.5-5   
[21] jsonlite_0.9.21  httpuv_1.3.3     zoo_1.7-12

Thanks!

RUN- IDEAR.R Warning: Error in if: argument is of length zero

I'm getting the following error; below the error is my YAML file.

label: unnamed-chunk-18 (with options)
List of 3
$ echo : logi FALSE
$ message: logi FALSE
$ warning: logi FALSE

Quitting from lines 1391-1535 (IDEAR.rmd)

Warning: Error in if: argument is of length zero

YAML
DataFilePath:
    ptpprogramdata.csv
HasHeader:
    Yes
Separator:
    ','
CategoricalColumns:
    - ProgramName
    - TeamName
    - TypeType
    - LOC
    - LOCDesc
    - ReferralSourceType
    - TargetedReferral
    - DischargeReason
    - discharge_revocation
    - DeathInOurCare
    - LifeExpectancy
    - FinancialClass
    - ProgramStatus
NumericalColumns:
    - LOS
    - STC
ColumnsToExclude:
    - pt_program_id
    - client_id
RLogFilePath:
    ptprogram2.log.r

Data Exploration without Target key in YAML file

If you don't have a Target field in your YAML file, the following line (1392) in IDEAR.rmd fails because the if condition evaluates to neither TRUE nor FALSE:

if(((config$Target %in% config$CategoricalColumns) & length(config$CategoricalColumns) > 1) | (!(config$Target %in% config$CategoricalColumns) & length(config$CategoricalColumns)>=1))
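
A minimal sketch of one way to guard that condition, assuming `config` is the list read from the YAML file, so that `config$Target` is `NULL` when the key is absent. The helper name `has_categorical_to_plot` and the toy `config` are hypothetical and not part of IDEAR; treating a missing Target like a non-categorical target is also an assumption about the intended behaviour.

```r
# Hypothetical config mimicking a YAML file that has no Target key
config <- list(CategoricalColumns = c("ProgramName", "TeamName"))

# The original condition collapses to logical(0): NULL %in% x has length zero,
# and if() cannot branch on a zero-length value
cond <- ((config$Target %in% config$CategoricalColumns) &
           length(config$CategoricalColumns) > 1) |
  (!(config$Target %in% config$CategoricalColumns) &
     length(config$CategoricalColumns) >= 1)
cond_length <- length(cond)

# Hypothetical guarded version that always returns a single TRUE or FALSE
has_categorical_to_plot <- function(config) {
  n_cat <- length(config$CategoricalColumns)
  if (is.null(config$Target)) {
    # Assumption: with no target, any categorical column qualifies
    return(n_cat >= 1)
  }
  target_is_cat <- isTRUE(config$Target %in% config$CategoricalColumns)
  (target_is_cat && n_cat > 1) || (!target_is_cat && n_cat >= 1)
}

no_target <- has_categorical_to_plot(config)
```

Using `&&` and `||` with `isTRUE()` keeps the result a length-one logical, which is what `if()` requires.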

Feature Request: Create a version that is native to spark / databricks environments.

BinaryClassification RMD doesn't properly create factors

The current code in BinaryClassification.rmd doesn't correctly use R syntax to create factor columns.
This is a giant problem for using the "auto" factor feature in the YAML files.

The culprit is line 118 in the B-C.rmd.
Currently it reads:
if (!is.null(factorCols)) {for (i in 1:length(factorCols)) { trainDF[, factorCols[i]] <- make.names(as.factor(trainDF[, factorCols[i]])) }}

Change that line to:
if (!is.null(factorCols)) {for (i in 1:length(factorCols)) { trainDF$factorCols[i] <-as.factor(trainDF$factorCols[i]) }}

The key difference is that R doesn't know how to handle lists when converting to factors (it generates some sort of error), and this avoids that entirely.

With this change (and the other YAML-file fix) I was able to run the BinaryClassification rmd file.
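
One caveat for anyone applying the change above: in R, `trainDF$factorCols[i]` looks up a column literally named `factorCols` rather than the column whose name is stored in `factorCols[i]`, and `make.names()` returns a character vector, so the original line does not leave a factor behind either. The sketch below uses `[[` indexing instead. The toy `trainDF` and the column names are hypothetical stand-ins, not the data used in BinaryClassification.rmd.

```r
# Hypothetical toy data standing in for trainDF
trainDF <- data.frame(
  Sex = c("M", "F", "M"),
  Income = c("<=50K", ">50K", "<=50K"),
  Age = c(30, 41, 25),
  stringsAsFactors = FALSE
)
factorCols <- c("Sex", "Income")

# make.names() returns a character vector, so the factor type is lost
original_class <- class(make.names(as.factor(trainDF[, "Sex"])))

# Convert each named column in place; iterating over the names directly
# also does nothing, safely, when factorCols is NULL or empty
for (col in factorCols) {
  trainDF[[col]] <- as.factor(trainDF[[col]])
}

converted_classes <- sapply(trainDF, class)
```

If the sanitized level names that `make.names()` produces are still wanted, they could be applied to `levels(trainDF[[col]])` after the conversion, which keeps the column a factor.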

ValueError: Button(description=u'Export', style=ButtonStyle()) cannot be transformed to a widget

Platform: MacOsX 10.12.2 (Sierra)
Environment : Python 2.7.14 :: Anaconda, Inc.
Error in IDEAR.ipynb, from the cell "Explore the target variable" onwards


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-17-4ec3cfdce941> in <module>()
     26     get_ipython().magic(u'reset_report')
     27     get_ipython().magic(u'add_interaction_code_to_report')
---> 28     i = interactive(TargetAnalytics.custom_barplot, df=fixed(df), filename=fixed(filename), col1=w1, Export=w_export)
     29     hbox = widgets.HBox(i.children)
     30     display(hbox)

/Users/user/anaconda2/lib/python2.7/site-packages/ipywidgets/widgets/interaction.pyc in __init__(self, _interactive__interact_f, _interactive__options, **kwargs)
    192             getcallargs(f, **{n:v for n,v,_ in new_kwargs})
    193         # Now build the widgets from the abbreviations.
--> 194         self.kwargs_widgets = self.widgets_from_abbreviations(new_kwargs)
    195 
    196         # This has to be done as an assignment, not using self.children.append,

/Users/user/anaconda2/lib/python2.7/site-packages/ipywidgets/widgets/interaction.pyc in widgets_from_abbreviations(self, seq)
    292             if not (isinstance(widget, ValueWidget) or isinstance(widget, fixed)):
    293                 if widget is None:
--> 294                     raise ValueError("{!r} cannot be transformed to a widget".format(abbrev))
    295                 else:
    296                     raise TypeError("{!r} is not a ValueWidget".format(widget))

ValueError: Button(description=u'Export', style=ButtonStyle()) cannot be transformed to a widget

Not resolved (see issue #34).

When I update ipywidgets, I get the error from issue #26 (Error in IDEAR.ipynb ('int' object has no attribute 'children')).

When Launching IDEAR_MRS from VS2017 -> Error in eval: could not find function "tk_choose.files"

Hi Team,

I do not know how widespread this issue is (I have hit it twice, with both R versions of IDEAR, when running from VS with RTVS), but on sourcing 'Run-IDEAR-MRS.R' the following error appears:

Quitting from lines 25-581 (IDEAR-MRS.rmd) 
Warning:
 Error in eval: could not find function "tk_choose.files"

I fixed the error by loading the tcltk library at the beginning of IDEAR-MRS.rmd. This is a non-issue, but I am posting it in case anyone else runs into it too.

Regards,
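
A minimal sketch of the workaround described above, assuming it is placed in a setup chunk near the top of IDEAR-MRS.rmd. The variable name `tcltk_ok` and the fallback message are hypothetical; the check with `capabilities("tcltk")` is an addition for R builds that ship without Tcl/Tk support.

```r
# Attach tcltk only when this R build supports it; require() returns FALSE
# (with a warning, suppressed here) if the package cannot be loaded
tcltk_ok <- capabilities("tcltk") &&
  suppressWarnings(require("tcltk", quietly = TRUE))

if (tcltk_ok) {
  # tk_choose.files() is now on the search path for later chunks
  chooser_found <- exists("tk_choose.files")
} else {
  chooser_found <- FALSE
  message("tcltk is not available; a file path would have to be supplied some other way.")
}
```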

Generate Report Error - Quitting from lines 10-100 (adult.log.spin.Rmd)

Hi Team!

While generating the Final Report, the following occurs. Has anyone else experienced this and found a fix?

installing source package 'knitr' ...
** package 'knitr' successfully unpacked and MD5 sums checked
** R
** demo
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
DONE (knitr)
Quitting from lines 10-100 (adult.log.spin.Rmd)

Warning: Error in file: cannot open the connection
Stack trace (innermost first):
94: file
93: read.table
92: read.csv
91: eval [#31]
90: eval
89: withVisible
88: withCallingHandlers
87: handle
86: timing_fn
85: evaluate_call
84: evaluate
83: in_dir
82: block_exec
81: call_block
80: process_group.block
79: process_group
78: withCallingHandlers
77: process_file
76: knitr::knit
75: rmarkdown::render
74: eval [#17]
73: eval
72: withProgress
71: observeEventHandler [#16]
7:
6: do.call
5: rmarkdown::run
4: eval [~/GitHub/MS_TDSP/IDEAR/Run-IDEAR.R#13]
3: eval
2: withVisible
1: source
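
The innermost frames of the trace (`file`, `read.table`, `read.csv`) show that R could not open the CSV file while knitting. One possible cause, not confirmed in this report, is a data path that does not resolve from the working directory in use during rendering. The sketch below is a hypothetical diagnostic helper, not part of IDEAR: `read_csv_checked` fails early with a message that names both the path and the working directory.

```r
# Hypothetical helper: report the unresolved path instead of the generic
# "cannot open the connection" error
read_csv_checked <- function(path, ...) {
  if (!file.exists(path)) {
    stop(sprintf("Data file not found: '%s' (working directory: %s)",
                 path, getwd()))
  }
  read.csv(path, ...)
}

# Usage with a temporary file that exists
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:2, b = c("x", "y")), tmp, row.names = FALSE)
df <- read_csv_checked(tmp)

# Usage with a path that does not exist: capture the message for inspection
missing_result <- tryCatch(
  read_csv_checked(file.path(tempdir(), "no_such_file.csv")),
  error = function(e) conditionMessage(e)
)
```

Running the same check on the DataFilePath value from the YAML file, from the directory where the report is generated, would show whether the path is the problem.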
