gedeck / dmba
Utility functions for "Data Mining for Business Analytics: Concepts, Techniques, and Applications in Python"
License: MIT License
Where does the rest of the code for the book live?
The pydotplus package leads to additional nodes in the rendering. Switching to graphviz solves the problem.
Consider resolving #4 at the same time.
import dmba
data = dmba.load_data('Universities.csv')
print(data.shape)
It seems that if pydotplus is installed, but graphviz is missing, the import of pydotplus fails. As the code hides the error message, check if we can provide a better error message in this case.
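One way to surface the hidden failure is to keep the original exception text when an optional import fails, instead of discarding it. The following is only a sketch: the helper name `optional_import` is hypothetical and not part of dmba, and it uses just the standard library.

```python
import importlib


def optional_import(name):
    """Try to import an optional module (hypothetical helper, not dmba API).

    Returns (module, None) on success, or (None, message) on failure, where
    the message keeps the text of the original ImportError.
    """
    try:
        return importlib.import_module(name), None
    except ImportError as exc:
        return None, f"Could not import {name!r}: {exc}"


# Usage: report the stored reason only when the optional feature is requested.
pydotplus, pydotplus_error = optional_import("pydotplus")
if pydotplus is None:
    print(pydotplus_error)
```

Keeping the message around lets the plotting function raise an error that names the real cause (for example a missing graphviz installation) at the point where the feature is used.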
With the latest version, plotDecisionTree doesn't seem to be working. The error message is quite long. I'm pasting it below in case it is useful:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/usr/local/lib/python3.9/dist-packages/IPython/core/display.py in _data_and_metadata(self, always_both)
1299 try:
-> 1300 b64_data = b2a_base64(self.data).decode('ascii')
1301 except TypeError:
TypeError: a bytes-like object is required, not 'Source'
During handling of the above exception, another exception occurred:
FileNotFoundError Traceback (most recent call last)
2 frames
/usr/local/lib/python3.9/dist-packages/IPython/core/display.py in _data_and_metadata(self, always_both)
1300 b64_data = b2a_base64(self.data).decode('ascii')
1301 except TypeError:
-> 1302 raise FileNotFoundError(
1303 "No such file or directory: '%s'" % (self.data))
1304 md = {}
FileNotFoundError: No such file or directory: 'digraph Tree {
node [shape=box, style="filled, rounded", color="black", fontname="helvetica"] ;
edge [fontname="helvetica"] ;
0 [label=<attachment ≤ 0.5<br/>samples = 20<br/>value = [8, 12]>, fillcolor="#bddef6"] ;
1 [label=<5<br/>[5, 0]>, fillcolor="#e58139"] ;
0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ;
2 [label=<char_length ≤ 292.0<br/>15<br/>[3, 12]>, fillcolor="#6ab6ec"] ;
0 -> 2 [labeldistance=2.5, labelangle=-45, headlabel="False"] ;
3 [label=<char_length ≤ 113.0<br/>14<br/>[2, 12]>, fillcolor="#5aade9"] ;
2 -> 3 ;
4 [label=<4<br/>[0, 4]>, fillcolor="#399de5"] ;
3 -> 4 ;
5 [label=<char_length ≤ 118.0<br/>10<br/>[2, 8]>, fillcolor="#6ab6ec"] ;
3 -> 5 ;
6 [label=<1<br/>[1, 0]>, fillcolor="#e58139"] ;
5 -> 6 ;
7 [label=<char_length ≤ 140.5<br/>9<br/>[1, 8]>, fillcolor="#52a9e8"] ;
5 -> 7 ;
8 [label=<4<br/>[0, 4]>, fillcolor="#399de5"] ;
7 -> 8 ;
9 [label=<char_length ≤ 186.0<br/>5<br/>[1, 4]>, fillcolor="#6ab6ec"] ;
7 -> 9 ;
10 [label=<1<br/>[1, 0]>, fillcolor="#e58139"] ;
9 -> 10 ;
11 [label=<4<br/>[0, 4]>, fillcolor="#399de5"] ;
9 -> 11 ;
12 [label=<1<br/>[1, 0]>, fillcolor="#e58139"] ;
2 -> 12 ;
}
'
<IPython.core.display.Image object>
The current representation is not scaled.
Relevant information:
if os.environ.get('DISPLAY', '') == '':
    print('no display found. Using non-interactive Agg backend')
    mpl.use('Agg')
This throws an error and won't display plots in Windows 10 using Jupyter (due to 'Agg' being non-GUI) unless you order your imports correctly.
This does not work in windows command line. When you run python from cmd, plt.show() opens a window with the plot. However, the DISPLAY key is not defined in the os.environ dict. ..... - Jindra Helcl, May 28 '15 at 17:25
So this import order does not display plots in Jupyter on Windows:
%matplotlib inline
import matplotlib.pyplot as plt
from dmba import regressionSummary
But this import order displays plots properly (while still showing the error above):
from dmba import regressionSummary
%matplotlib inline
import matplotlib.pyplot as plt
regressionSummary and classificationSummary expect 1D arrays. If the arrays are 2D (n by 1), the methods fail with
unsupported format string passed to numpy.ndarray.__format__
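Until the functions accept column vectors, a caller can flatten the inputs first. The sketch below assumes NumPy is available; the helper name `as_1d` is hypothetical and not part of dmba.

```python
import numpy as np


def as_1d(values):
    """Flatten an (n, 1) or (1, n) array-like to shape (n,).

    Hypothetical helper: 1D inputs pass through unchanged, and inputs that
    are genuinely multi-column raise instead of being flattened silently.
    """
    arr = np.asarray(values)
    if arr.ndim == 2 and 1 in arr.shape:
        return arr.ravel()
    if arr.ndim != 1:
        raise ValueError(f"expected 1D or single-column data, got shape {arr.shape}")
    return arr


# Usage: a column vector becomes a flat array before it is summarized.
y_pred = np.array([[1.5], [2.0], [3.25]])
print(as_1d(y_pred).shape)
```

The flattened arrays can then be passed to regressionSummary or classificationSummary in place of the original 2D inputs.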
I use Google Colab (fantastic for students, most packages loaded, most everyone has a Google account).
The way your __init__.py file checks for DISPLAY forces mpl.use('Agg') in a Colab environment. The 'Agg' backend is non-interactive and does not display figures directly, which creates a problem: you have to save the figure and use Image to display it.
I forked your repo and modified the code (see below). For the time being I pip install and import from my fork in the notebooks, and it works well, but it would be cleaner if at some point you could release the package with this small change. There may be a similar situation on other cloud-based notebook platforms (e.g. Kaggle), but I have not checked.
Add the following code to dmba/__init__.py:
if 'google.colab' in sys.modules:
    print('Colab environment detected.')
else:
    if os.environ.get('DISPLAY', '') == '' and os.name != 'nt':
        print('No display found. Using non-interactive Agg backend')
        mpl.use('Agg')
Error:
import: 'dmba'
Traceback (most recent call last):
File "/home/conda/feedstock_root/build_artifacts/dmba_1682476494141/test_tmp/run_test.py", line 2, in <module>
import dmba
File "/home/conda/feedstock_root/build_artifacts/dmba_1682476494141/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.11/site-packages/dmba/__init__.py", line 13, in <module>
from .graphs import gainsChart, liftChart, plotDecisionTree, textDecisionTree
File "/home/conda/feedstock_root/build_artifacts/dmba_1682476494141/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.11/site-packages/dmba/graphs.py", line 90, in <module>
pdfFile: Optional[os.PathLike] = None) -> Union[Image, str]:
^^^^^
NameError: name 'Image' is not defined
The actual error happens above in:
try:
    from IPython.display import Image
    hasImage = True
except ImportError:
    pass
If IPython is not available, then Image is not defined. In that case the type Image in Union[Image, str] is undefined, which results in the error above. The failing line is
https://github.com/gedeck/dmba/blob/master/src/dmba/graphs.py#L90
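One possible way to avoid the NameError is to quote the return annotation so it is not evaluated when the function is defined. This is a sketch under assumptions, not the fix that was applied in dmba: the function `describe_output` is a hypothetical stand-in for the real plotting function.

```python
import os
from typing import Optional, Union

try:
    from IPython.display import Image
    hasImage = True
except ImportError:
    # Image stays undefined here; the quoted annotation below never looks it up.
    hasImage = False


def describe_output(pdfFile: Optional[os.PathLike] = None) -> "Union[Image, str]":
    """Hypothetical stand-in for a function that returns an Image or text."""
    if hasImage:
        return "IPython available: an Image could be returned"
    return "IPython missing: falling back to text"


print(describe_output())
```

Because the annotation is a string, importing the module succeeds whether or not IPython is installed, and `hasImage` is always defined for later checks.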
I may be mistaken, but the gainsChart() line for a random draw should be based on the total of the actual values for prediction (or the total number of actual occurrences for classification) and not on the total of the predicted values:
nActual = gains.sum() # number of desired records
"gains" is the list of predicted values.
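To make the point concrete, here is a minimal sketch of a cumulative gains computation in which the random-draw baseline ends at the total of the actual values. It assumes NumPy; the function `cumulative_gains` and its arguments are hypothetical and do not reproduce dmba's implementation.

```python
import numpy as np


def cumulative_gains(actual, predicted):
    """Return (model_curve, baseline), each of length n + 1.

    Records are ranked by predicted score in descending order. The model
    curve accumulates the ACTUAL values in that order, and the baseline
    rises linearly from 0 to the total of the actual values.
    """
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    order = np.argsort(-predicted, kind="stable")
    model_curve = np.concatenate([[0.0], np.cumsum(actual[order])])
    baseline = np.linspace(0.0, actual.sum(), len(actual) + 1)
    return model_curve, baseline


# Usage: both curves end at the same total, the sum of the actual values.
curve, baseline = cumulative_gains([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
print(curve[-1], baseline[-1])
```

With this construction the model curve and the random-draw line always meet at the last record, which is the property the issue asks for.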