
nimbusml-samples's Introduction

Samples for NimbusML

NimbusML is a package of Python bindings for the ML.NET framework. ML.NET is a machine learning framework designed for .NET developers to build and use great machine learning models in their applications. It provides battle-tested state-of-the-art ML algorithms, transforms and components, aiming to make them useful for all developers. However, we also know that often people work in multiple programming languages or work in teams where people use multiple programming languages.

In machine learning, Python has become very popular. We want to enable as many people to benefit from the ML.NET machine learning framework as possible and enable teams to work together, so we've created this project as Python bindings for ML.NET.

This is an open source project located at https://github.com/Microsoft/NimbusML. We'd love for you to try it out and/or contribute! For a full list of the samples/notebooks, please refer to our documentation.

Try it today with Azure Notebooks: free Jupyter-based notebooks in the Azure cloud

  1. Import sample notebooks into Azure Notebooks.

  2. Open one of the sample notebooks.

    Make sure the Azure Notebook kernel is set to Python 3.6 when you open a notebook.


Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

nimbusml-samples's People

Contributors

ganik, microsoftopensource, montebhoover, msftgits, mstfbl, najeeb-kazmi, pieths, zyw400


nimbusml-samples's Issues

Missing tab in code example for 1.1 Classification with Synthetic Data

As a user trying to run the example, I often copy and paste the code snippets. When doing this with the code snippets in 1.1, I hit an error when copying the plot_result function.

>>> def plot_result(model, data, label):
...     xx, yy = np.meshgrid(np.arange(-2,1,0.01),np.arange(-2,2,0.01))
...     xx1 = np.array(xx).flatten()
...     yy1 = np.array(yy).flatten()
...     _predict = np.array(model.predict(pd.DataFrame({"X": xx1, "Y": yy1})))
... 
>>>     plt.figure(figsize=(8,8))
  File "<stdin>", line 1
    plt.figure(figsize=(8,8))
    ^
IndentationError: unexpected indent

It looks like plot_result contains a blank line with no indentation, which ends the function definition early in the interactive interpreter. Other snippets that contain blank lines work fine because those lines keep their leading tab.

Please see this page for reference:
https://github.com/Microsoft/NimbusML-Samples/blob/master/samples/1.1%20%5BNumeric%5D%20Classification%20with%20Synthetic%20Data.ipynb
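
The behavior described in this issue can be reproduced without the notebook. The sketch below uses the standard-library codeop module, which applies the same "is this statement complete?" rules as the interactive prompt, to show that an empty line closes an indented block and that the next indented line is then rejected. The two-line function is a shortened stand-in for plot_result, not the notebook's actual code.

```python
import codeop

# Shortened stand-in for the start of plot_result (not the full notebook code).
block = "def plot_result(model, data, label):\n    xx1 = [0, 1]"

# Without a terminating empty line the interpreter still expects more input.
still_open = codeop.compile_command(block) is None

# An empty line (no tab, no spaces) completes the definition.
closed_by_blank_line = codeop.compile_command(block + "\n") is not None

# The next pasted line is indented but no longer inside any block.
try:
    codeop.compile_command("    plt.figure(figsize=(8,8))")
    leftover_error = None
except SyntaxError as exc:
    leftover_error = type(exc).__name__

print(still_open, closed_by_blank_line, leftover_error)
```

Removing the empty line from the snippet, or giving it the same indentation as the lines around it, should keep the definition open when pasting into the plain interpreter.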

Samples 2.1 and 2.2 use wrong parameter names for the AveragedPerceptronBinaryClassifier

NameError: Parameters ['l2_regularizer_weight', 'num_iterations'] are not allowed for class 'AveragedPerceptronBinaryClassifier'.
Allowed: averaged, averaged_tolerance, caching, decrease_learning_rate,
feature, initial_weights, initial_weights_diameter, l2_regularization,
label, lazy_update, learning_rate, loss, normalize,
number_of_iterations, params, recency_gain,
recency_gain_multiplicative, reset_weights_after_x_examples, shuffle
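
Judging by the names in the Allowed list, the two rejected parameters look as if they were renamed rather than removed. The sketch below shows one way to translate old keyword arguments before constructing the learner. The mapping, the rename_params helper, and the example values are assumptions made for illustration and should be checked against the NimbusML API reference.

```python
# Assumed old -> new names, inferred from the "Allowed" list in the error above.
RENAMED_PARAMS = {
    "l2_regularizer_weight": "l2_regularization",
    "num_iterations": "number_of_iterations",
}


def rename_params(kwargs, renamed=RENAMED_PARAMS):
    """Return a copy of kwargs with old parameter names replaced by new ones."""
    return {renamed.get(name, name): value for name, value in kwargs.items()}


# Illustrative values only; they are not taken from samples 2.1 or 2.2.
old_style = {"l2_regularizer_weight": 0.01, "num_iterations": 10, "shuffle": True}
new_style = rename_params(old_style)
print(new_style)
```

The translated dictionary could then be passed on as AveragedPerceptronBinaryClassifier(**new_style).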

Matplotlib new implementation of color parameter breaks Tutorial 1.1.

In tutorial 1.1 in cell 7 when we call plot_result(model, dataTest, labelTest) we get the following KeyError:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-9-f15b92a14334> in <module>
----> 1 plot_result(model, dataTest, labelTest);

<ipython-input-2-31c7838de467> in plot_result(model, data, label)
     29     plt.figure(figsize=(8,8))
     30     plt.contourf(xx,yy,_predict.reshape((400,300)), alpha = 0.2)
---> 31     plt.scatter(data["X"], data["Y"], c = label)
     32     plt.show()

c:\users\mohoov\source\repos\nimbusml-samples\tests\dependencies\python3.6\lib\site-packages\matplotlib\pyplot.py in scatter(x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, data, **kwargs)
   2862         vmin=vmin, vmax=vmax, alpha=alpha, linewidths=linewidths,
   2863         verts=verts, edgecolors=edgecolors, **({"data": data} if data
-> 2864         is not None else {}), **kwargs)
   2865     sci(__ret)
   2866     return __ret

c:\users\mohoov\source\repos\nimbusml-samples\tests\dependencies\python3.6\lib\site-packages\matplotlib\__init__.py in inner(ax, data, *args, **kwargs)
   1803                         "the Matplotlib list!)" % (label_namer, func.__name__),
   1804                         RuntimeWarning, stacklevel=2)
-> 1805             return func(ax, *args, **kwargs)
   1806 
   1807         inner.__doc__ = _add_data_doc(inner.__doc__,

c:\users\mohoov\source\repos\nimbusml-samples\tests\dependencies\python3.6\lib\site-packages\matplotlib\axes\_axes.py in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, **kwargs)
   4193                 isinstance(c, str) or
   4194                 (isinstance(c, collections.Iterable) and
-> 4195                     isinstance(c[0], str))):
   4196             c_array = None
   4197         else:

c:\users\mohoov\source\repos\nimbusml-samples\tests\dependencies\python3.6\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    765         key = com._apply_if_callable(key, self)
    766         try:
--> 767             result = self.index.get_value(self, key)
    768 
    769             if not is_scalar(result):

c:\users\mohoov\source\repos\nimbusml-samples\tests\dependencies\python3.6\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
   3116         try:
   3117             return self._engine.get_value(s, k,
-> 3118                                           tz=getattr(series.dtype, 'tz', None))
   3119         except KeyError as e1:
   3120             if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine._get_loc_duplicates()

pandas\_libs\index_class_helper.pxi in pandas._libs.index.Int64Engine._maybe_get_bool_indexer()

KeyError: 0

This appears to be caused by a change between Matplotlib v3.0.0 and v3.0.1: scatter now checks c[0] on the color parameter. We pass c=labelTest, a pandas Series indexed by data point IDs, and that index happens not to contain 0.
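
The failing lookup can be reproduced with pandas alone. The sketch below builds a small Series whose index does not contain 0 (the values and index are made up) and shows that the underlying NumPy array supports the positional c[0] access that Matplotlib performs. Passing label.values to plt.scatter is one possible workaround, not necessarily the fix the maintainers chose.

```python
import pandas as pd

# Made-up labels indexed by data point IDs that do not include 0.
label = pd.Series([1, 0, 1], index=[5, 7, 9])

try:
    label[0]  # label-based lookup: there is no row labelled 0
    lookup_failed = False
except KeyError:
    lookup_failed = True

# The plain NumPy array is indexed by position, so element 0 always exists.
first_color = label.values[0]

print(lookup_failed, first_color)
```

In plot_result this would correspond to calling plt.scatter(data["X"], data["Y"], c=label.values).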

Make Medium article for NimbusML

A Medium article on NimbusML would be a great way to showcase its capabilities in a succinct and appealing way. We should write a NimbusML Medium article with an easy-to-follow example. This Medium article should be authored by an official Microsoft account, or co-authored by Microsoft software engineers.

Add time series sample

Now that there is time series support in NimbusML, should a time series sample be added?

Documentation Link points to a 404

Describe the bug
The link to documentation points to: https://docs.microsoft.com/en-us/nimbusml/tutorials, which gives a 404

To Reproduce
Steps to reproduce the behavior:

  1. Go to the GitHub page and scroll to the paragraph that reads:
This is an open source project located at https://github.com/Microsoft/NimbusML. We'd love for you to try it out and/or contribute! For a full list of the samples/notebooks, please refer to our documentation.
  2. Click on 'documentation'
  3. See that there is a 404

Expected behavior
This should link to the live documentation site.

Sample 2.3 uses wrong parameter name for NGramFeaturizer

NameError: Parameter 'output_tokens' is not allowed for class 'NGramFeaturizer'.
Allowed: char_feature_extractor, columns, dictionary, keep_diacritics,
keep_numbers, keep_punctuations, language, output_tokens_column_name,
params, stop_words_remover, text_case, vector_normalizer,
word_feature_extractor

Image Classification example is producing different results at each run

Describe the bug
The Image Classification example is producing different results at each run. This is due to the fact that there are non-deterministic factors involved in producing the output, such as the random initialization of the k-means centroids, seeds given to the pipeline (MLContext) itself, to the PCATransformer, etc.

To Reproduce
Run the Image Classification example (on Jupyter or local machine) multiple times, and the user will see that tables with different scores and values are displayed, which also results in the final PCA graph of the image cluster displaying different plots.

Expected behavior
Every single run of the Image Classification example should produce the exact same output. It should be deterministic.
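
As a generic illustration of the seeding idea (this is not NimbusML or ML.NET API), the sketch below picks initial k-means centroids with Python's random module. The pick_initial_centroids helper and the sample points are hypothetical; the point is only that a fixed seed makes a random initialization repeatable.

```python
import random


def pick_initial_centroids(points, k, seed=None):
    """Choose k distinct points as starting centroids (hypothetical helper)."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    return rng.sample(points, k)


# Made-up 2-D points standing in for image feature vectors.
points = [(0.0, 0.1), (1.0, 1.2), (5.0, 5.1), (6.0, 5.9), (9.0, 0.3)]

# Same seed, same centroids on every run.
run_a = pick_initial_centroids(points, k=2, seed=42)
run_b = pick_initial_centroids(points, k=2, seed=42)
print(run_a == run_b)
```

Every component that draws random numbers would need the same treatment for the whole example to become deterministic.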

Additional context
This was first noticed while looking at Issue 23.

Commit #1186afe is failing in prediction step on Azure DevOps build

Describe the bug
In the build of commit #1186afe on Azure DevOps, the job fails during the execution of tests\cmd.exe.

However, when run locally, this build does not always fail. In fact, when run locally in Jupyter Notebook, it builds successfully but produces results that differ from the expected results in the GitHub file for the clustering sample.

To Reproduce
Build and Log file: https://dev.azure.com/aifx/public/_build/results?buildId=1486

The most important lines in the log file are lines 496-581. Specifically, the failed build originates there, when the trained pipeline is called to make a prediction and generate clustering results.

[Screenshot: errorLog]

Screenshots

Expected results (https://github.com/mstfbl/NimbusML-Samples/blob/master/samples/2.6%20%5BImage%5D%20Image%20Processing%20-%20Clustering.ipynb, seems to be running Python 3.7.2):

[Screenshots: expected1, expected2]

Actual results 1 (from local Jupyter Notebook, Python 3.6.6):

[Screenshots: actual11, actual12]

Actual results 2 (from local .py file, converted from .ipynb, Python 3.6.6):

[Screenshots: actual21, actual22, actual23, actual24]

Not able to merge the commit due to version issue

Describe the bug
I was trying to merge the commit that fixes the wrong parameter names (#15). However, I got an ImportError while merging the branch, so I changed the package installation line from "pip install https://pythonpkgdeps.blob.core.windows.net/pytlc/nimbusml-0.6.0-cp36-none-win_amd64.whl" to "pip install https://pythonpkgdeps.blob.core.windows.net/pytlc/nimbusml-1.0.0-cp37-none-win_amd64.whl", but it still doesn't work. I also tried to see which versions are available through the URL, but the page did not show anything.
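
One thing worth checking here is whether the wheel matches the interpreter: the cp36 and cp37 parts of the two file names are Python tags for CPython 3.6 and 3.7. Whether a mismatch explains the ImportError is not clear from the report. The sketch below compares the tag in a wheel file name with the running Python version; wheel_python_tag is a hypothetical helper, and the comparison only makes sense for simple cpXY tags.

```python
import sys


def wheel_python_tag(wheel_name):
    """Return the Python tag from a wheel file name (hypothetical helper).

    Wheel names end in {python tag}-{abi tag}-{platform tag}.whl,
    so the Python tag is the third field from the end.
    """
    stem = wheel_name[: -len(".whl")]
    return stem.split("-")[-3]


url = "https://pythonpkgdeps.blob.core.windows.net/pytlc/nimbusml-1.0.0-cp37-none-win_amd64.whl"
tag = wheel_python_tag(url.rsplit("/", 1)[-1])

# Tag of the interpreter running this script, e.g. "cp36" for CPython 3.6.
current = "cp{}{}".format(*sys.version_info[:2])

print(tag, current, tag == current)
```

If the last value printed is False, the wheel was built for a different Python version than the one running the script.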
