autodeployai / pypmml

Python PMML scoring library

License: Apache License 2.0

Python 100.00%

pmml deployment ai ml pmml-scoring-library

pypmml's Issues

features not included in pmml file could be used

I have two PMML files, one built with LightGBM (lgbm) and the other with XGBoost.

data for lgb
lgb_test1112.zip
related pmml file
model_lgb_end.zip

The file lgb_test1112 is a normal test dataset for model_lgb_end.pmml, while lgb_test1112_1 contains only a single unrelated feature that does not appear in model_lgb_end.pmml.

The strange part is that I still get a result with lgb_test1112_1.
I can even pass the xgb data to model_lgb_end.pmml as input,
which is how I noticed the problem.

data for xgb
xgb_test1112.zip
related pmml file
model_xgb_end.zip

code:

import pandas
from pypmml import Model

df1 = pandas.read_csv("lgb_test1112_1")
df2 = pandas.read_csv("xgb_test1112")

model1 = Model.fromFile("model_lgb_end_11.pmml")
model2 = Model.fromFile("model_xgb_end(1).pmml")

model1.predict(df1)
model2.predict(df2)

model1.predict(df2)
model2.predict(df1)

Does this mean that XGBClassifier and LGBMClassifier models have no protection against mismatched input data?
I have also used them from Java and they still work there (the features are not checked against the PMML; a result is simply returned).
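
If a strict check is wanted before scoring, one option is to compare the data columns with the model's declared inputs. The sketch below is only an illustration: `check_input_columns` and `require_inputs` are hypothetical helpers (not part of pypmml), and the only pypmml attribute assumed is `model.inputNames`, which appears elsewhere in these issues.

```python
def check_input_columns(input_names, columns):
    """Return (missing, unexpected) column names relative to the model inputs."""
    expected = set(input_names)
    present = set(columns)
    missing = sorted(expected - present)
    unexpected = sorted(present - expected)
    return missing, unexpected


def require_inputs(input_names, columns):
    """Raise ValueError if any expected input field is absent from the data.

    Returns the list of columns the model does not use.
    """
    missing, unexpected = check_input_columns(input_names, columns)
    if missing:
        raise ValueError("missing model inputs: {}".format(missing))
    return unexpected


# Hypothetical usage with pypmml (not executed here):
#   require_inputs(model1.inputNames, df1.columns)
#   model1.predict(df1)
```

With a check like this, passing the xgb data to the lgb model would fail early instead of silently returning a result.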

FileNotFoundError: [WinError 2] The system cannot find the file specified

I loaded the model using the code below and it shows this error, even though I am sure the PMML file path is correct and the file exists. Does pypmml not support this PMML file? The model file was exported from KNIME.

knime_model = Model.load("knime_decision_tree.pmml")

Traceback (most recent call last):
  File "c:/Users/user/Desktop/ANSON/Python Scripts/SHRDC-projects/knime-examples/testing.py", line 9, in <module>
    knime_model = Model.load("knime_decision_tree.pmml")
  File "C:\Users\user\anaconda3\envs\pmml\lib\site-packages\pypmml\model.py", line 234, in load
    model = cls.fromFile(model_content)
  File "C:\Users\user\anaconda3\envs\pmml\lib\site-packages\pypmml\model.py", line 190, in fromFile
    pc = PMMLContext.getOrCreate()
  File "C:\Users\user\anaconda3\envs\pmml\lib\site-packages\pypmml\base.py", line 77, in getOrCreate
    PMMLContext()
  File "C:\Users\user\anaconda3\envs\pmml\lib\site-packages\pypmml\base.py", line 51, in __init__
    PMMLContext._ensure_initialized(self, gateway=gateway)
  File "C:\Users\user\anaconda3\envs\pmml\lib\site-packages\pypmml\base.py", line 60, in _ensure_initialized
    PMMLContext._gateway = gateway or cls.launch_gateway()
  File "C:\Users\user\anaconda3\envs\pmml\lib\site-packages\pypmml\base.py", line 98, in launch_gateway
    _port = launch_gateway(classpath=launch_classpath, javaopts=javaopts, java_path=java_path, die_on_exit=True)
  File "C:\Users\user\anaconda3\envs\pmml\lib\site-packages\py4j\java_gateway.py", line 331, in launch_gateway
    proc = Popen(
  File "C:\Users\user\anaconda3\envs\pmml\lib\subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\user\anaconda3\envs\pmml\lib\subprocess.py", line 1311, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified
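
The traceback ends inside py4j's `launch_gateway`, where `Popen` fails to start a process, so the file that cannot be found is probably the `java` executable rather than the PMML file. A minimal sketch for checking this, assuming the gateway looks for Java under `JAVA_HOME` or on `PATH` (`find_java` is a hypothetical helper, not a pypmml function):

```python
import os
import shutil


def find_java(java_home=None):
    """Return the path of a java executable, or None if none is found.

    Looks in JAVA_HOME/bin first (if set), then on PATH.
    """
    if java_home is None:
        java_home = os.environ.get("JAVA_HOME")
    if java_home:
        candidate = shutil.which("java", path=os.path.join(java_home, "bin"))
        if candidate:
            return candidate
    return shutil.which("java")


if __name__ == "__main__":
    path = find_java()
    print(path or "java not found: install a JRE/JDK or add it to PATH")
```

If this prints that Java was not found, installing a Java runtime or adding it to `PATH` would be the first thing to try.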

can not use Model.load(pmml_file) in python2 successfully

from pypmml import Model
pmml_model = Model.load(pmml_file)

When I use Model.load in Python 2.7.5 I get an exception. I then checked the code in pypmml/model.py at line 221 and found that isinstance(model_content, str) may always be False in Python 2. Is that right?
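
For context, on Python 2 a `unicode` value is not an instance of `str`, so a plain `isinstance(x, str)` check can reject a valid path. A common way to write a check that works on both versions is sketched below; this is a generic pattern, not the actual code in model.py, and `is_text` is a name chosen here for illustration.

```python
import sys

# On Python 2 both `str` and `unicode` derive from `basestring`;
# on Python 3 there is only `str`.
if sys.version_info[0] >= 3:
    string_types = (str,)
else:
    string_types = (basestring,)  # noqa: F821 (name exists on Python 2 only)


def is_text(value):
    """True for text values on either major Python version."""
    return isinstance(value, string_types)
```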

Not possible to score PMML model exported from SAS

Hi,
I have a decision tree model exported from SAS as PMML model.

Trying to score it as

model = Model.load('iris_Decision_Tree_PMML.xml')
result = model.predict({'sepal_length': 5.1, 'sepal_width': 3.5, 'petal_length': 1.4, 'petal_width': 0.2})

gives the following exception:

py4j.protocol.Py4JJavaError: An error occurred while calling o0.predict.
: java.lang.IllegalArgumentException: Field "class" does not exist.
at org.pmml4s.common.StructType$$anonfun$fieldIndex$1.apply(DataType.scala:229)
at org.pmml4s.common.StructType$$anonfun$fieldIndex$1.apply(DataType.scala:229)
at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
at scala.collection.AbstractMap.getOrElse(Map.scala:59)
at org.pmml4s.common.StructType.fieldIndex(DataType.scala:228)
at org.pmml4s.data.GenericSeriesWithSchema.fieldIndex(Series.scala:360)
at org.pmml4s.model.Model.prepare(Model.scala:303)
at org.pmml4s.model.TreeModel.predict(TreeModel.scala:51)
at org.pmml4s.model.Model.predict(Model.scala:188)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:834)

Pypmml predicts Null value of LGB-pmml

Hi there,

I ran into a problem when using model.predict: for some rows (records) the prediction is null (a missing value) in Python 3.8, while other rows are fine (valid probabilities).

I have a LightGBM model (v3.2.0), which I converted to a PMML file (v4.3) using jpmml-lightgbm-1.2.14 (JDK 1.8.0_202).
The pypmml version is the latest, v0.9.16.

I have processed many model files in the same way, and this is the first one that produces null values.

If you need the .pmml file and data, please PM me and I can send them by email (the data is private, so I cannot share it here).

Thank you!

To include Covariate in pypmml

Which tag should I use to add a covariate in pypmml?

Remediate CVE-2022-42889 Text4Shell Vulnerability

commons-text needs to be upgraded to version 1.10.0 to remediate the CVE-2022-42889 vulnerability.

I have submitted a PR for this: #51

Please review and let me know next steps.

when i was using Model.predict(data) i got a None with pyspark Pipeline pmml which has a GBTClassifier step

`from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.linalg import SparseVector,DenseVector
from pyspark.ml.classification import GBTClassifier
import xgboost
spark = SparkSession.builder.appName("test").enableHiveSupport().getOrCreate()
data1 = spark.createDataFrame([(1, SparseVector(10, (0,1,7),(1.0,1.6,0.3)), DenseVector([1.0,0.1]))
,(0, SparseVector(10, (0,2,9),(0.2,2.1,0.7)), DenseVector([1.0,0.1]))
,(1, SparseVector(10, (1,3,4),(2.0,1.6,-1.1)), DenseVector([1.0,0.1]))
,(0, SparseVector(10, (2,5,7),(-0.2,-1.3,2.0)), DenseVector([1.0,0.1]))
,(0, SparseVector(10, (3,7,8),(-0.4,2.4,-0.2)), DenseVector([1.0,0.1]))
,(1, SparseVector(10, (1,2,3),(1.1,0.6,0.4)), DenseVector([1.0,0.1]))
,(0, SparseVector(10, (0,2,3),(-0.7,0.3,2.2)), DenseVector([1.0,0.1]))
,(1, SparseVector(10, (1,2,4),(-0.2,0.1,1.3)), DenseVector([1.0,0.1]))
,(0, SparseVector(10, (5,7,9),(1.1,-1.1,-0.3)), DenseVector([1.0,0.1]))
,(1, SparseVector(10, (1,3,5),(-0.6,0.7,0.2)), DenseVector([1.0,0.1]))
,(1, SparseVector(10, (1,3,4),(2.0,1.6,-1.1)), DenseVector([1.0,0.1]))
,(0, SparseVector(10, (2,5,7),(-0.2,-1.3,2.0)), DenseVector([1.0,0.1]))
,(0, SparseVector(10, (3,7,8),(-0.4,2.4,-0.2)), DenseVector([1.0,0.1]))
,(1, SparseVector(10, (1,2,3),(1.1,0.6,0.4)), DenseVector([1.0,0.1]))
,(1, SparseVector(10, (1,3,4),(2.0,1.6,-1.1)), DenseVector([1.0,0.1]))
,(0, SparseVector(10, (2,5,7),(-0.2,-1.3,2.0)), DenseVector([1.0,0.1]))
,(0, SparseVector(10, (3,7,8),(-0.4,2.4,-0.2)), DenseVector([1.0,0.1]))
,(1, SparseVector(10, (1,2,3),(1.1,0.6,0.4)), DenseVector([1.0,0.1]))
], ["label", "feat_1", "feat_2"])
from pyspark.sql.functions import udf, col
from pyspark.sql.types import ArrayType, DoubleType

def to_array(col):
    def to_array_(v):
        return v.toArray().tolist()
    return udf(to_array_, ArrayType(DoubleType()))(col)

data = data1.withColumn("xs", to_array(col("feat_1"))).select(["label"] + [col("xs")[i] for i in range(10)])

# +-------+-----+-----+-----+
# |   word|xs[0]|xs[1]|xs[2]|
# +-------+-----+-----+-----+
# | assert|  1.0|  2.0|  3.0|
# |require|  0.0|  2.0|  0.0|
# +-------+-----+-----+-----+

from pyspark.ml.feature import VectorAssembler,StandardScaler,HashingTF
assembler = VectorAssembler(inputCols=["xs[{}]".format(i) for i in range(10)], outputCol="feat_assembler")
data.show()
model_params = {"maxDepth":3
,"maxBins":32
,"lossType":"logistic"
,"maxIter": 10
,"stepSize": 0.01
,"featuresCol": "feat_assembler"
,"subsamplingRate": 0.5
}
gbdt = GBTClassifier(**model_params)
pipeline = Pipeline(stages=[assembler, gbdt])
model = pipeline.fit(data)
from pyspark2pmml import PMMLBuilder
PMMLBuilder.buildFile("{model_path}")
from pypmml import Model
model = Model.load("{model_path}")
model.inputNames
model.predict(data)`

Then I got None.

Batch prediction on multiple rows

I can't help but notice that in model.py, predictions on 2D arrays, dataframes, etc. are submitted to the Java model independently and are not batched, possibly resulting in unnecessary overhead and long compute times compared to an equivalent python-native model (such as a scikit-learn model):
[self.call('predict', record) for record in data]

Is this inherent to the nature of the underlying Java model?
Does the Java model support batch predictions?

Thank you.
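
To put a number on the per-record overhead described above, one could time the same loop pattern in isolation. The sketch below mirrors the quoted list comprehension with a stand-in callable; `time_per_record` and `fake_predict` are hypothetical names, and no pypmml call is made.

```python
import time


def time_per_record(predict_one, records):
    """Score records one call at a time and return (results, seconds per record)."""
    start = time.perf_counter()
    results = [predict_one(record) for record in records]
    elapsed = time.perf_counter() - start
    return results, elapsed / max(len(records), 1)


def fake_predict(record):
    """Stand-in for a single call into the Java model."""
    return {"prediction": record["x"] * 2}


if __name__ == "__main__":
    rows = [{"x": i} for i in range(1000)]
    _, seconds = time_per_record(fake_predict, rows)
    print("average seconds per record:", seconds)
```

Replacing `fake_predict` with a function that calls the real model on one record would show how much of the total time is per-call cost.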

File doesn't load

When I try to call Model.load I get a file-not-found error. I am not sure why, as I can open the file. Any ideas? I enclose the file.

FileNotFoundError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_11020/3468497621.py in <module>
----> 1 model=Model.load("GBAW12ml-1.xml")

~\anaconda3\lib\site-packages\pypmml\model.py in load(cls, f)
232 # Check if a file path
233 if os.path.exists(model_content):
--> 234 model = cls.fromFile(model_content)
235 else:
236 model = cls.fromString(model_content)

~\anaconda3\lib\site-packages\pypmml\model.py in fromFile(cls, name)
188 def fromFile(cls, name):
189 """Load a model from PMML file with given pathname"""
--> 190 pc = PMMLContext.getOrCreate()
191 try:
192 java_model = pc._jvm.org.pmml4s.model.Model.fromFile(name)

~\anaconda3\lib\site-packages\pypmml\base.py in getOrCreate(cls)
75 with PMMLContext._lock:
76 if PMMLContext._active_pmml_context is None:
---> 77 PMMLContext()
78 return PMMLContext._active_pmml_context
79

~\anaconda3\lib\site-packages\pypmml\base.py in __init__(self, gateway)
49
50 def __init__(self, gateway=None):
---> 51 PMMLContext._ensure_initialized(self, gateway=gateway)
52
53 @classmethod

~\anaconda3\lib\site-packages\pypmml\base.py in _ensure_initialized(cls, instance, gateway)
58 with PMMLContext._lock:
59 if not PMMLContext._gateway:
---> 60 PMMLContext._gateway = gateway or cls.launch_gateway()
61 PMMLContext._jvm = PMMLContext._gateway.jvm
62

~\anaconda3\lib\site-packages\pypmml\base.py in launch_gateway(cls, javaopts, java_path)
96 javaopts = java_opts.split()
97
---> 98 _port = launch_gateway(classpath=launch_classpath, javaopts=javaopts, java_path=java_path, die_on_exit=True)
99 gateway = JavaGateway(
100 gateway_parameters=GatewayParameters(port=_port,

~\anaconda3\lib\site-packages\py4j\java_gateway.py in launch_gateway(port, jarpath, classpath, javaopts, die_on_exit, redirect_stdout, redirect_stderr, daemonize_redirect, java_path, create_new_process_group, enable_auth, cwd, return_proc, use_shell)
329
330 popen_kwargs["shell"] = use_shell
--> 331 proc = Popen(
332 command, stdout=PIPE, stdin=PIPE, stderr=stderr, cwd=cwd,
333 **popen_kwargs)

~\anaconda3\lib\subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, user, group, extra_groups, encoding, errors, text, umask)
949 encoding=encoding, errors=errors)
950
--> 951 self._execute_child(args, executable, preexec_fn, close_fds,
952 pass_fds, cwd, env,
953 startupinfo, creationflags, shell,

~\anaconda3\lib\subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_gid, unused_gids, unused_uid, unused_umask, unused_start_new_session)
1418 # Start the process
1419 try:
-> 1420 hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
1421 # no special security
1422 None, None,

FileNotFoundError: [WinError 2] The system cannot find the file specified

This is the file
GBAW12ml-1.zip

Model.fromFile is failing for 'single_iris_dectree.xml' PMML files

Hi,
I am facing a problem loading a PMML file.
1.) I downloaded the pypmml library (pypmml-0.9.6.tar.gz) and py4j-0.10.9-py2.py3-none-any.whl, and installed them with 'pip install'.
2.) I downloaded http://dmg.org/pmml/pmml_examples/KNIME_PMML_4.1_Examples/single_iris_dectree.xml
3.) My java -version details:
openjdk version "11.0.3" 2019-04-16
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.3+7)
OpenJDK 64-Bit Server VM (build 11.0.3+7, mixed mode)

##########Python Code ##########################
from pypmml import Model
if __name__ == '__main__':
    model = Model.load('single_iris_dectree.xml')

####### Error Messages ############
File "c:\xx\scikit3.py", line 24, in <module>
model = Model.load('single_iris_dectree.xml')
File "D:\xx\model.py", line 231, in load
model = cls.fromString(model_content)
File "D:\xx\model.py", line 196, in fromString
pc = PMMLContext.getOrCreate()
File "D:\xx\base.py", line 77, in getOrCreate
PMMLContext()
File "D:\xx\base.py", line 51, in __init__
PMMLContext._ensure_initialized(self, gateway=gateway)
File "D:\xx\base.py", line 60, in _ensure_initialized
PMMLContext._gateway = gateway or cls.launch_gateway()
File "D:\xx\base.py", line 86, in launch_gateway
_port = launch_gateway(classpath=launch_classpath, die_on_exit=True)
File "D:\xx\Python\lib\site-packages\py4j\java_gateway.py", line 332, in launch_gateway
_port = int(proc.stdout.readline())
ValueError: invalid literal for int() with base 10: b''
############ End of Error################################################

Would appreciate help.

Model.fromFile is failing for 3 DMG example PMML files

PMML load is failing for the following PMML files:

Regards

how to get the probability result of a model ?

I saved a GBTClassificationModel to a PMML file with PySpark and then loaded it with pypmml, but the prediction I get is an integer. How can I get the probability? Thanks.

Kohonen clustering model throws java.lang.ArrayIndexOutOfBoundsException error

I'm trying to run data through a kohonen clustering model (ClusteringModel algorithmName="Kohonen" functionName="clustering" modelClass="centerBased"), and get this error. What is wrong here?

Traceback (most recent call last):
  File "a.py", line 15, in <module>
    df = model.predict(data)
  File "C:\Program Files\Python38\lib\site-packages\pypmml\model.py", line 177, in predict
    result = [self.call('predict', record) for record in records]
  File "C:\Program Files\Python38\lib\site-packages\pypmml\model.py", line 177, in <listcomp>
    result = [self.call('predict', record) for record in records]
  File "C:\Program Files\Python38\lib\site-packages\pypmml\base.py", line 134, in call
    return call_java_func(getattr(self._java_model, name), *a)
  File "C:\Program Files\Python38\lib\site-packages\pypmml\base.py", line 41, in call_java_func
    return _java2py(func(*args))
  File "C:\Program Files\Python38\lib\site-packages\py4j\java_gateway.py", line 1309, in __call__
    return_value = get_return_value(
  File "C:\Program Files\Python38\lib\site-packages\py4j\protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o0.predict.
: java.lang.ArrayIndexOutOfBoundsException: 157
at org.pmml4s.common.squaredEuclidean$$anonfun$distance$2.apply$mcVI$sp(ComparisonMeasure.scala:103)
at org.pmml4s.common.squaredEuclidean$$anonfun$distance$2.apply(ComparisonMeasure.scala:102)
at org.pmml4s.common.squaredEuclidean$$anonfun$distance$2.apply(ComparisonMeasure.scala:102)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofInt.foreach(ArrayOps.scala:234)
at org.pmml4s.common.squaredEuclidean$.distance(ComparisonMeasure.scala:102)
at org.pmml4s.model.ClusteringModel.predict(ClusteringModel.scala:109)
at org.pmml4s.model.Model.predict(Model.scala:193)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Unknown Source)

pmml and pkl predictions are inconsistent

I use a LightGBM regression model. When the feature data contain many identical values, the pkl and PMML predictions are inconsistent, with large differences between them. Does anyone have the same problem?

Gradient boosted tree model giving incorrect predictions

I'm trying to use the attached PMML model, predict-y-gbt.zip, which contains a gradient boosted tree.

Calling from PyPMML results in an output prediction value that is always zero, i.e.:

from pypmml import Model
model_gbt = Model.load('predict-y-gbt.pmml')
model_gbt.predict({"x1": 0.4, "x2": 0.7})
# {'prediction': 0.0}

If I call the same model using JPMML-Evaluator then I get the correct output against test.csv, i.e.

java -cp pmml-evaluator-example-executable-1.5.16.jar \
  org.jpmml.evaluator.example.EvaluationExample \
  --model predict-y-gbt.pmml --input test.csv --output test.out.csv

which will output -0.210 as the prediction.

What could be going wrong here?

Is InvalidValueReplacement working?

Hi,

in the following model, I would assume that InvalidValueReplacement should yield a result different from the one currently produced by PyPmml. Could you have a look at it?

How to reproduce:

import pandas as pd
import pypmml

model = pypmml.Model.fromFile('./out.txt')
pred_data = model.predict(pd.DataFrame([['c'], ['d']], columns=['C']))
assert (pred_data['predicted_T'] == pd.Series([1, 1])).all()  # Value 'd' should be invalid, thus being replaced by 'c'

out.txt

System:

Python 3.8.6
PyPmml 0.9.13 (with fix for #32)

Thank you!
Wolfgang
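
For readers unfamiliar with the behaviour being asked about, the sketch below illustrates how the PMML MiningField attributes `invalidValueTreatment` and `invalidValueReplacement` are generally understood to interact. It is a plain-Python illustration, not pypmml code; `treat_invalid` is a hypothetical name, and since the attached out.txt is not reproduced here, the `asValue` treatment is an assumption about that model.

```python
def treat_invalid(value, valid_values, treatment="returnInvalid", replacement=None):
    """Apply a PMML-style invalid value treatment to a single input value."""
    if value in valid_values:
        return value                # valid values pass through unchanged
    if treatment == "asIs":
        return value                # keep the invalid value as given
    if treatment == "asMissing":
        return None                 # treat the value as missing
    if treatment == "asValue":
        return replacement          # substitute invalidValueReplacement
    # "returnInvalid": no usable input, so scoring yields an invalid result
    raise ValueError("invalid value: {!r}".format(value))


# Mirrors the report: 'd' is not a valid category and is replaced by 'c'.
print(treat_invalid("d", {"c"}, treatment="asValue", replacement="c"))
```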

Why the predict method always returns an empty list?

The predict method always returns an empty list, but the other functions work normally.

Py4JJavaError: An error occurred while calling z:org.pmml4s.model.Model.fromFile

Py4JJavaError: An error occurred while calling z:org.pmml4s.model.Model.fromFile.
: org.pmml4s.AttributeNotFoundException: Required attribute 'name' is missing.
at org.pmml4s.xml.XmlAttrs$$anonfun$apply$1.apply(XmlAttrs.scala:27)
at org.pmml4s.xml.XmlAttrs$$anonfun$apply$1.apply(XmlAttrs.scala:27)
at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
at scala.collection.AbstractMap.getOrElse(Map.scala:59)
at org.pmml4s.xml.XmlAttrs.apply(XmlAttrs.scala:27)
at org.pmml4s.xml.ModelBuilder$$anon$1$$anon$2.build(ModelBuilder.scala:94)
at org.pmml4s.xml.ModelBuilder$$anon$1$$anon$2.build(ModelBuilder.scala:92)
at org.pmml4s.xml.XmlUtils$class.makeElem(XmlUtils.scala:81)
at org.pmml4s.xml.ModelBuilder.makeElem(ModelBuilder.scala:33)
at org.pmml4s.xml.ModelBuilder$$anon$1.build(ModelBuilder.scala:92)
at org.pmml4s.xml.ModelBuilder$$anon$1.build(ModelBuilder.scala:89)
at org.pmml4s.xml.XmlUtils$class.makeElem(XmlUtils.scala:71)
at org.pmml4s.xml.ModelBuilder.makeElem(ModelBuilder.scala:33)
at org.pmml4s.xml.ModelBuilder.makeHeader(ModelBuilder.scala:89)
at org.pmml4s.xml.ModelBuilder.makeModel(ModelBuilder.scala:50)
at org.pmml4s.xml.ModelBuilder.build(ModelBuilder.scala:43)
at org.pmml4s.xml.ModelBuilder.build(ModelBuilder.scala:33)
at org.pmml4s.xml.XmlUtils$class.makeElem(XmlUtils.scala:71)
at org.pmml4s.xml.ModelBuilder$.makeElem(ModelBuilder.scala:138)
at org.pmml4s.xml.XmlUtils$class.makeElem(XmlUtils.scala:73)
at org.pmml4s.xml.ModelBuilder$.makeElem(ModelBuilder.scala:138)
at org.pmml4s.xml.ModelBuilder$.fromXml(ModelBuilder.scala:149)
at org.pmml4s.model.Model$.apply(Model.scala:670)
at org.pmml4s.model.Model$.fromFile(Model.scala:657)
at org.pmml4s.model.Model.fromFile(Model.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Unknown Source)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File "", line 1, in
model = Model.fromFile('E:/model.xml')

File "D:\ProgramFiles\anaconda3\envs\python36\lib\site-packages\pypmml\model.py", line 157, in fromFile
raise PmmlError(je.getClass().getSimpleName(), je.getMessage())

PmmlError: ('AttributeNotFoundException', "Required attribute 'name' is missing.")
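
The stack trace points at `ModelBuilder.makeHeader`, so the element missing its `name` attribute is probably inside the PMML `Header`; `Application` is a likely candidate, since the PMML schema requires a name there. A minimal sketch for locating such elements with the standard library follows; `find_missing_attributes` is a hypothetical helper and the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace such as '{http://www.dmg.org/PMML-4_1}'."""
    return tag.rsplit("}", 1)[-1]


def find_missing_attributes(pmml_text, required):
    """List (element, attribute) pairs for elements lacking a required attribute.

    `required` maps an element's local name to the attribute it must carry.
    """
    problems = []
    for elem in ET.fromstring(pmml_text).iter():
        name = local_name(elem.tag)
        attr = required.get(name)
        if attr is not None and attr not in elem.attrib:
            problems.append((name, attr))
    return problems


SAMPLE = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <Header>
    <Application version="1.0"/>
  </Header>
</PMML>"""

print(find_missing_attributes(SAMPLE, {"Application": "name"}))
# prints [('Application', 'name')]
```

Other element and attribute pairs can be added to the `required` mapping to check different parts of a file.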

Py4JError: Could not find py4j jar at

Hi,
I am executing the following command after importing pypmml in Databricks:
model = Model.load('single_iris_dectree.xml')

But it gives the following error:
`Py4JError Traceback (most recent call last)
<command-1251288781954273> in <module>
----> 1 model = Model.load('single_iris_dectree.xml')

/databricks/python/lib/python3.8/site-packages/pypmml/model.py in load(cls, f)
234 model = cls.fromFile(model_content)
235 else:
--> 236 model = cls.fromString(model_content)
237 return model
238 else:

/databricks/python/lib/python3.8/site-packages/pypmml/model.py in fromString(cls, s)
199 def fromString(cls, s):
200 """Load a model from PMML in a string"""
--> 201 pc = PMMLContext.getOrCreate()
202 try:
203 java_model = pc._jvm.org.pmml4s.model.Model.fromString(s)

/databricks/python/lib/python3.8/site-packages/pypmml/base.py in getOrCreate(cls)
75 with PMMLContext._lock:
76 if PMMLContext._active_pmml_context is None:
---> 77 PMMLContext()
78 return PMMLContext._active_pmml_context
79

/databricks/python/lib/python3.8/site-packages/pypmml/base.py in __init__(self, gateway)
49
50 def __init__(self, gateway=None):
---> 51 PMMLContext._ensure_initialized(self, gateway=gateway)
52
53 @classmethod

/databricks/python/lib/python3.8/site-packages/pypmml/base.py in _ensure_initialized(cls, instance, gateway)
58 with PMMLContext._lock:
59 if not PMMLContext._gateway:
---> 60 PMMLContext._gateway = gateway or cls.launch_gateway()
61 PMMLContext._jvm = PMMLContext._gateway.jvm
62

/databricks/python/lib/python3.8/site-packages/pypmml/base.py in launch_gateway(cls, javaopts, java_path)
96 javaopts = java_opts.split()
97
---> 98 _port = launch_gateway(classpath=launch_classpath, javaopts=javaopts, java_path=java_path, die_on_exit=True)
99 gateway = JavaGateway(
100 gateway_parameters=GatewayParameters(port=_port,

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in launch_gateway(port, jarpath, classpath, javaopts, die_on_exit, redirect_stdout, redirect_stderr, daemonize_redirect, java_path, create_new_process_group, enable_auth, cwd, return_proc)
292 # Fail if the jar does not exist.
293 if not os.path.exists(jarpath):
--> 294 raise Py4JError("Could not find py4j jar at {0}".format(jarpath))
295
296 # Launch the server in a subprocess.

Py4JError: Could not find py4j jar at`

I have tried the solution mentioned in https://docs.microsoft.com/en-us/azure/databricks/kb/libraries/pypmml-fail-find-py4j-jar but it does not work.
Could you please tell me how to solve it?

Issue predicting from an imported pmml file

I am trying to run the code below, but model.predict() gives me an error after I load the model from a PMML file.

from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline
from pypmml import Model

test_model =chosen_model['Model 3 pmml'][0]
print("TEST MODEL INFO:")
print(test_model)
print("RESULTS FROM sklearn2pmml PIPELINE:")
print(test_model.predict_proba(model3_X)[:,1][0])
# dd.verify(X[columns3])
sklearn2pmml(test_model,
             'test.pmml')


model_3 = Model.fromFile('test.pmml')
model_3.predict(model3_X)

Here is the output of the code:

TEST MODEL INFO:
PMMLPipeline(steps=[('classifier_comb_80_3', GradientBoostingClassifier(ccp_alpha=0.0, criterion='friedman_mse', init=None,
                           learning_rate=0.01, loss='deviance', max_depth=3,
                           max_features=None, max_leaf_nodes=None,
                           min_impurity_decrease=0.0, min_impurity_split=None,
                           min_samples_leaf=1, min_samples_split=5,
                           min_weight_fraction_leaf=0.0, n_estimators=200,
                           n_iter_no_change=None, presort='deprecated',
                           random_state=None, subsample=1.0, tol=0.0001,
                           validation_fraction=0.1, verbose=0,
                           warm_start=False))])
RESULTS FROM sklearn2pmml PIPELINE:
0.615243309486158

Here is the error I am getting:

---------------------------------------------------------------------------

Py4JJavaError                             Traceback (most recent call last)
/opt/anaconda3/lib/python3.7/site-packages/pypmml/model.py in predict(self, data)
    135                     records = data.to_dict('records')
--> 136                     results = [self.call('predict', record) for record in records]
    137                     return pd.DataFrame.from_records(results)

/opt/anaconda3/lib/python3.7/site-packages/pypmml/model.py in <listcomp>(.0)
    135                     records = data.to_dict('records')
--> 136                     results = [self.call('predict', record) for record in records]
    137                     return pd.DataFrame.from_records(results)

/opt/anaconda3/lib/python3.7/site-packages/pypmml/base.py in call(self, name, *a)
    121     def call(self, name, *a):
--> 122         return call_java_func(getattr(self._java_model, name), *a)
    123 

/opt/anaconda3/lib/python3.7/site-packages/pypmml/base.py in call_java_func(func, *args)
     40     """ Call Java Function """
---> 41     return _java2py(func(*args))
     42 

/opt/anaconda3/lib/python3.7/site-packages/py4j/java_gateway.py in __call__(self, *args)
   1256         return_value = get_return_value(
-> 1257             answer, self.gateway_client, self.target_id, self.name)
   1258 

/opt/anaconda3/lib/python3.7/site-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    327                     "An error occurred while calling {0}{1}{2}.\n".
--> 328                     format(target_id, ".", name), value)
    329             else:

Py4JJavaError: An error occurred while calling o16038.predict.
: java.lang.ArrayIndexOutOfBoundsException


During handling of the above exception, another exception occurred:

PmmlError                                 Traceback (most recent call last)
~/Documents/GEICO Attrition Production Movement/final deliverables/code/geico_attrn_model_inference.py in <module>
     13 
     14 model_3 = Model.fromFile('test.pmml')
---> 15 model_3.predict(model3_X)

/opt/anaconda3/lib/python3.7/site-packages/pypmml/model.py in predict(self, data)
    145                 raise PmmlError('Data type "{type}" not supported'.format(type=type(data).__name__))
    146             except Exception as e:
--> 147                 raise PmmlError('An error occurred caused by {message}'.format(message=str(e)))
    148 
    149     @classmethod

PmmlError: An error occurred caused by An error occurred while calling o16038.predict.
: java.lang.ArrayIndexOutOfBoundsException

feature request: possibility to configure port of py4j gateway / problem with multiple processes

Hi,

I think it would be helpful to be able to configure the port of the py4j gateway.
Sometimes the default port may be locked or used by another service, so it should be configurable.

Regards,
Piotr
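
As background for this request: py4j's `launch_gateway` does take a `port` argument (it appears in the signature shown in the tracebacks in other issues here), but whether pypmml exposes it is not confirmed. If a specific free port is needed, one way to obtain a candidate with the standard library is sketched below; `find_free_port` is a hypothetical helper, and another process could still claim the port between this call and its actual use.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port on `host`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS pick an ephemeral port
        return sock.getsockname()[1]


if __name__ == "__main__":
    print("candidate port:", find_free_port())
```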

Error parsing random forest regression model

Py4JJavaError: An error occurred while calling z:org.pmml4s.model.Model.fromFile.
: org.pmml4s.AttributeNotFoundException: Required attribute 'value' is missing.
at org.pmml4s.xml.XmlAttrs$$anonfun$apply$1.apply(XmlAttrs.scala:27)
at org.pmml4s.xml.XmlAttrs$$anonfun$apply$1.apply(XmlAttrs.scala:27)
at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
at scala.collection.AbstractMap.getOrElse(Map.scala:59)
at org.pmml4s.xml.XmlAttrs.apply(XmlAttrs.scala:27)
at org.pmml4s.xml.CommonBuilder$class.makeValue(CommonBuilder.scala:47)
at org.pmml4s.xml.ModelBuilder.makeValue(ModelBuilder.scala:33)
at org.pmml4s.xml.ModelBuilder$$anon$3$$anonfun$build$1.applyOrElse(ModelBuilder.scala:113)
at org.pmml4s.xml.ModelBuilder$$anon$3$$anonfun$build$1.applyOrElse(ModelBuilder.scala:111)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.pmml4s.xml.XmlUtils$class.traverseElems(XmlUtils.scala:61)
at org.pmml4s.xml.ModelBuilder.traverseElems(ModelBuilder.scala:33)
at org.pmml4s.xml.ModelBuilder$$anon$3.build(ModelBuilder.scala:111)
at org.pmml4s.xml.ModelBuilder$$anon$3.build(ModelBuilder.scala:102)
at org.pmml4s.xml.XmlUtils$class.makeElems(XmlUtils.scala:123)
at org.pmml4s.xml.ModelBuilder.makeElems(ModelBuilder.scala:33)
at org.pmml4s.xml.ModelBuilder.makeDataDictionary(ModelBuilder.scala:102)
at org.pmml4s.xml.ModelBuilder.makeModel(ModelBuilder.scala:56)
at org.pmml4s.xml.ModelBuilder.build(ModelBuilder.scala:43)
at org.pmml4s.xml.ModelBuilder.build(ModelBuilder.scala:33)
at org.pmml4s.xml.XmlUtils$class.makeElem(XmlUtils.scala:71)
at org.pmml4s.xml.ModelBuilder$.makeElem(ModelBuilder.scala:138)
at org.pmml4s.xml.XmlUtils$class.makeElem(XmlUtils.scala:73)
at org.pmml4s.xml.ModelBuilder$.makeElem(ModelBuilder.scala:138)
at org.pmml4s.xml.ModelBuilder$.fromXml(ModelBuilder.scala:149)
at org.pmml4s.model.Model$.apply(Model.scala:670)
at org.pmml4s.model.Model$.fromFile(Model.scala:657)
at org.pmml4s.model.Model.fromFile(Model.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)

During handling of the above exception, another exception occurred:

PmmlError Traceback (most recent call last)
in
2
3 # The model is from http://dmg.org/pmml/pmml_examples/KNIME_PMML_4.1_Examples/single_iris_dectree.xml
----> 4 model = Model.fromFile('myModel.xml')

~/anaconda3/lib/python3.7/site-packages/pypmml/model.py in fromFile(cls, name)
155 except Py4JJavaError as e:
156 je = e.java_exception
--> 157 raise PmmlError(je.getClass().getSimpleName(), je.getMessage())
158
159 @classmethod

PmmlError: ('AttributeNotFoundException', "Required attribute 'value' is missing.")

Missing value handling of Lightgbm

Hi,
I found that the predictions produced by the Python LightGBM model and by the exported PMML file are different.
This happens when the training data contains no missing values but the data being predicted does.
I don't know whether this is a bug in pypmml.
Here is an example that shows the problem.

import lightgbm as lgb
import pandas as pd
import numpy as np
import joblib
from sklearn2pmml.pipeline import PMMLPipeline
from sklearn_pandas import DataFrameMapper
from sklearn2pmml import sklearn2pmml
from pypmml import Model

np.random.seed(1)
n_feature = 20
fea_name = ['Fea'+str(i+1) for i in range(n_feature)]

# Train without missing values
X = 10 * np.random.randn(1000, n_feature)
X = X.astype(np.float32)
Y = np.random.randint(0, 2, 1000)  # random_integers is deprecated; randint's upper bound is exclusive

my_model = lgb.LGBMClassifier(n_estimators=100)
my_model.fit(X,Y,feature_name=fea_name)

mapper = DataFrameMapper([([i], None) for i in fea_name])  

pipeline = PMMLPipeline([
    ('mapper', mapper), 
    ("classifier", my_model)
])

sklearn2pmml(pipeline, "lgb.pmml")
#load
pmml_model=Model.load('./lgb.pmml')

Example 1: predict on a dataset without missing values

##predict given dataset without missing value
np.random.seed(10)
test_X =10*np.random.randn(1000,n_feature)
test_X =test_X.astype(np.float32)
#test_X[test_X<0]=np.nan

my_model_pred = my_model.predict_proba(test_X)[:,1]

pmml_model_pred = pmml_model.predict(pd.DataFrame(test_X,columns=fea_name)).to_numpy()
pmml_model_pred = np.array([list(re) for re in pmml_model_pred])[:,1]

res_df = pd.DataFrame({
    'pmml_model_pred':pmml_model_pred,
    'my_model_pred':my_model_pred,
})
res_df['pred_diff'] = abs(res_df['my_model_pred'] -res_df['pmml_model_pred'] )

print(res_df.sort_values('pred_diff',ascending=False).head(10))

     pmml_model_pred  my_model_pred     pred_diff
644         0.252020       0.252020  5.551115e-17
0           0.831942       0.831942  0.000000e+00
671         0.518838       0.518838  0.000000e+00
659         0.844573       0.844573  0.000000e+00
660         0.874415       0.874415  0.000000e+00
661         0.249900       0.249900  0.000000e+00
662         0.089023       0.089023  0.000000e+00
663         0.378300       0.378300  0.000000e+00
664         0.498033       0.498033  0.000000e+00
665         0.245736       0.245736  0.000000e+00

Example 2: predict on a dataset containing missing values

##predict given dataset containing missing value
np.random.seed(10)
test_X =10*np.random.randn(1000,n_feature)
test_X =test_X.astype(np.float32)
test_X[test_X<0]=np.nan

my_model_pred = my_model.predict_proba(test_X)[:,1]

pmml_model_pred = pmml_model.predict(pd.DataFrame(test_X,columns=fea_name)).to_numpy()
pmml_model_pred = np.array([list(re) for re in pmml_model_pred])[:,1]


res_df = pd.DataFrame({
    'pmml_model_pred':pmml_model_pred,
    'my_model_pred':my_model_pred,
})
res_df['pred_diff'] = abs(res_df['my_model_pred'] -res_df['pmml_model_pred'] )

print(res_df.sort_values('pred_diff',ascending=False).head(10))

     pmml_model_pred  my_model_pred  pred_diff
387         0.051331       0.907809   0.856478
452         0.952144       0.124708   0.827436
77          0.030062       0.848526   0.818465
568         0.855977       0.060451   0.795526
311         0.107558       0.900605   0.793047
146         0.953153       0.168106   0.785047
243         0.128113       0.903057   0.774945
348         0.141077       0.911108   0.770030
385         0.859853       0.108619   0.751234
440         0.910722       0.180750   0.729972
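One possible explanation, offered as an assumption to verify rather than a confirmed diagnosis: when a feature had no missing values during training, LightGBM's native predictor treats a NaN at prediction time as 0.0, whereas a PMML scorer treats NaN as a missing value and follows its own missing-value path. If that is the cause, replacing NaN with 0.0 before calling the PMML model should bring the two outputs back in line. A minimal sketch of that preprocessing step (`nan_to_zero` is a hypothetical helper, not part of pypmml):

```python
import math


def nan_to_zero(rows):
    """Return a copy of rows (lists of Python floats) with every NaN replaced by 0.0."""
    return [
        [0.0 if isinstance(v, float) and math.isnan(v) else v for v in row]
        for row in rows
    ]


rows = [[1.5, float("nan")], [float("nan"), -2.0]]
cleaned = nan_to_zero(rows)
print(cleaned)
```

With the DataFrame from the example, `pd.DataFrame(test_X, columns=fea_name).fillna(0.0)` would apply the same replacement before `pmml_model.predict`.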

Error when importing SVM model to Python: The attribute alternateTargetCategory is required.

Hello,

I'm using the pypmml library to import PMML models into Python. When I read SVM models created with KNIME using Model.fromFile(), I always get an error that I have not been able to solve: The attribute alternateTargetCategory is required in case of binary classification models with only one SupportVectorMachine element. It is also required in case of multi-class classification models implementing the one-against-one method.
However, the SVM model I'm reading is one-against-all.

In the .xml file of the PMML model, the attribute classificationMethod is not written explicitly, but I read that this attribute is optional and that the default method is one-against-all when it is absent. Even if I add the attribute as shown below, the error still appears:
<SupportVectorMachineModel modelName="SVM" functionName="classification" algorithmName="Sequential Minimal Optimization (SMO)" svmRepresentation="SupportVectors" classificationMethod="OneAgainstAll">

I would like to ask for any help or advice to solve this issue. Thank you very much!

Incorrect defaultScore for RuleSet

Description

When evaluating a RuleSet model, an incorrect defaultScore is returned when evaluating a combination that is not contained in the RuleSet.

How to reproduce

import pandas as pd
import pypmml

model = pypmml.Model.fromFile('./out.txt')
pred_data = model.predict(pd.DataFrame([[1, 3, 'B'], [2, 4, 'B']], columns=['A', 'B', 'C']))  # First row as contained in model, second some other row
assert (pred_data['predicted_T'] == pd.Series([1, 0])).all()  # 1 is score as defined by rule; 0 would be defaultScore, instead -9223372036854775808 is returned!

Environment

Python 3.8.6
PyPmml 0.9.11

out.txt

Problems loading models

I am having problems loading the attached PMML model: it fails with an error saying the activation function was not found, even though it is in the file.

On a few occasions the model did load, but then it stopped working again, which is strange.

GBNH23-1.zip

model = Model.load('GBNH23-1.xml')

The output in a Jupyter notebook is

Py4JJavaError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pypmml\model.py in fromFile(cls, name)
191 try:
--> 192 java_model = pc._jvm.org.pmml4s.model.Model.fromFile(name)
193 return cls(java_model)

~\anaconda3\lib\site-packages\py4j\java_gateway.py in __call__(self, *args)
1308 answer = self.gateway_client.send_command(command)
-> 1309 return_value = get_return_value(
1310 answer, self.gateway_client, self.target_id, self.name)

~\anaconda3\lib\site-packages\py4j\protocol.py in get_return_value(answer, gateway_client, target_id, name)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".

Py4JJavaError: An error occurred while calling z:org.pmml4s.model.Model.fromFile.
: org.pmml4s.AttributeNotFoundException: Required attribute 'activationFunction' is missing.
at org.pmml4s.xml.XmlAttrs$$anonfun$apply$1.apply(XmlAttrs.scala:27)
at org.pmml4s.xml.XmlAttrs$$anonfun$apply$1.apply(XmlAttrs.scala:27)
at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
at scala.collection.AbstractMap.getOrElse(Map.scala:59)
at org.pmml4s.xml.XmlAttrs.apply(XmlAttrs.scala:27)
at org.pmml4s.xml.NeuralNetworkBuilder.makeAttributes(NeuralNetworkBuilder.scala:124)
at org.pmml4s.xml.NeuralNetworkBuilder.build(NeuralNetworkBuilder.scala:37)
at org.pmml4s.xml.NeuralNetworkBuilder.build(NeuralNetworkBuilder.scala:28)
at org.pmml4s.xml.ModelBuilder.makeModel(ModelBuilder.scala:67)
at org.pmml4s.xml.ModelBuilder.build(ModelBuilder.scala:44)
at org.pmml4s.xml.ModelBuilder.build(ModelBuilder.scala:34)
at org.pmml4s.xml.XmlUtils$class.makeElem(XmlUtils.scala:71)
at org.pmml4s.xml.ModelBuilder$.makeElem(ModelBuilder.scala:143)
at org.pmml4s.xml.XmlUtils$class.makeElem(XmlUtils.scala:73)
at org.pmml4s.xml.ModelBuilder$.makeElem(ModelBuilder.scala:143)
at org.pmml4s.xml.ModelBuilder$.fromXml(ModelBuilder.scala:154)
at org.pmml4s.model.Model$.apply(Model.scala:718)
at org.pmml4s.model.Model$.fromFile(Model.scala:702)
at org.pmml4s.model.Model.fromFile(Model.scala)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Unknown Source)

During handling of the above exception, another exception occurred:

PmmlError Traceback (most recent call last)
in
----> 1 model = Model.load('GBNH23-1.xml')

~\anaconda3\lib\site-packages\pypmml\model.py in fromFile(cls, name)
194 except Py4JJavaError as e:
195 je = e.java_exception
--> 196 raise PmmlError(je.getClass().getSimpleName(), je.getMessage())
197
198 @classmethod

PmmlError: ('AttributeNotFoundException', "Required attribute 'activationFunction' is missing.")

Trigonometric functions not working

I have a pmml file generated with sklearn2pmml with the following DerivedField:

<DerivedField name="hour_x" optype="continuous" dataType="double">
	<Apply function="x-sin">
		<Apply function="*">
			<Constant dataType="double">0.2617993877991494</Constant>
			<FieldRef field="hour"/>
		</Apply>
	</Apply>
</DerivedField>

When I try to read it into Python with pypmml.Model, I get the following error: ('FunctionNotFoundException', "Function 'x-sin' is not defined.")

Is there a possibility to add the trigonometric functions to your package?
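Until the function is supported, the same value can be computed outside the model. The sketch below mirrors the DerivedField above in plain Python; it only helps if the model is re-exported to accept `hour_x` as an input, which is an assumption about the workflow and not something pypmml provides. The names `HOUR_TO_RADIANS` and `hour_x` are illustrative.

```python
import math

# Constant copied from the DerivedField above; it equals pi / 12 (one hour of a 24-hour cycle).
HOUR_TO_RADIANS = 0.2617993877991494


def hour_x(hour):
    """Sine encoding of the hour, equivalent to x-sin(constant * hour)."""
    return math.sin(HOUR_TO_RADIANS * hour)


print(hour_x(6))
```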

Could not find py4j jar

I'm trying to use pypmml using

from pypmml import Model
model = Model.fromFile('DecisionTreeIris.pmml')

but encountering py4j not found (I've pip-installed it already, and JAVA_HOME is set).

How do I debug this? Thanks.

(edit)
To add some further context, my environment:

predict retinanet_with_coco_1.pmml

pmml predict:
from pypmml import Model
import cv2
import pandas as pd

model = Model.load('retinanet_with_coco_1.pmml')
img = cv2.imread("20190506053526137.jpg", 0)
data = pd.DataFrame(img)
result = model.predict(data)
print (result.shape)

Why is the result shape (0, 0)? Or is the prediction method wrong?

avg() function of pmml 4.3 not averaging if one value is missing

Hi,

I am running a PMML 4.3 model on pypmml 0.9.15. It contains a transformation that averages several variables, and if any one of those variables is missing or has an invalid value, the entire average comes out as "none". I might be wrong, but I believe avg() should simply skip "none" values, compute the average over the values that are available, and return "none" only if all of them are missing.

reference: https://dmg.org/pmml/v4-3/BuiltinFunctions.html#min

example:

   <DerivedField name="avg_result" dataType="double" optype="continuous">
    <Apply function="avg">
     <FieldRef field="variable1"/>
     <FieldRef field="variable2"/>
     <FieldRef field="variable3"/>
     <FieldRef field="variable4"/>
     <FieldRef field="variable5"/>
     <FieldRef field="variable6"/>
     <FieldRef field="variable7"/>
     <FieldRef field="variable8"/>
     <FieldRef field="variable9"/>
     <FieldRef field="variable10"/>
     <FieldRef field="variable11"/>
     <FieldRef field="variable12"/>
     <FieldRef field="variable13"/>
    </Apply>
   </DerivedField>
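The behaviour the report expects can be written down as a short sketch. This illustrates the expected semantics only (skip missing inputs, return missing only when nothing is left); it is not pypmml's implementation, and `avg_ignoring_missing` is a hypothetical name.

```python
import math


def _is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def avg_ignoring_missing(values):
    """Average the non-missing values; return None if every value is missing."""
    present = [v for v in values if not _is_missing(v)]
    if not present:
        return None
    return sum(present) / len(present)


print(avg_ignoring_missing([1.0, None, 3.0]))
```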

MissingValueReplacement not working as expected

Hi,

sorry to bother you (again :)):

I think missingValueReplacement of the mining schema is not working as expected (see https://dmg.org/pmml/v4-3/MiningSchema.html):

If this attribute is specified then a missing input value is automatically replaced by the given value. That is, the model itself works as if the given value was found in the original input.

Example (on Python 3.8.6, PyPMML 0.9.17):

import pypmml
import pandas as pd
from pandas._testing import assert_frame_equal
import numpy as np

df = pd.DataFrame([["test"], ["MISSING"], [np.nan], ["NA"], ["X"]], columns=["TEST"])
model_1 = pypmml.Model.fromString("""<PMML xmlns="https://www.dmg.org/PMML-4_3" version="4.3">
    <Header copyright="dmg.org"/>
    <DataDictionary>
        <DataField name="TEST" optype="categorical" dataType="string">
            <Value property="valid" value="MISSING"/>
            <Value property="valid" value="test"/>
        </DataField>
        <DataField name="SCORE" optype="continuous" dataType="double"/>
    </DataDictionary>
    <RegressionModel functionName="regression" modelName="" normalizationMethod="softmax">
        <MiningSchema>
            <MiningField invalidValueTreatment="returnInvalid" missingValueReplacement="MISSING" name="TEST" usageType="active"/>
            <MiningField name="SCORE" usageType="target"/>
        </MiningSchema>
        <RegressionTable targetCategory="1" intercept="0.5">
            <CategoricalPredictor name="TEST" value="MISSING" coefficient="0.3"/>
            <CategoricalPredictor name="TEST" value="test" coefficient="-0.2"/>
        </RegressionTable>
        <RegressionTable targetCategory="0" intercept="0"/>
    </RegressionModel>
</PMML>""")
model_2 = pypmml.Model.fromString("""<PMML xmlns="https://www.dmg.org/PMML-4_3" version="4.3">
    <Header copyright="dmg.org"/>
    <DataDictionary>
        <DataField name="TEST" optype="categorical" dataType="string">
            <Value property="valid" value="MISSING"/>
            <Value property="valid" value="test"/>
        </DataField>
        <DataField name="SCORE" optype="continuous" dataType="double"/>
    </DataDictionary>
    <RegressionModel functionName="regression" modelName="" normalizationMethod="softmax">
        <MiningSchema>
            <MiningField invalidValueTreatment="asMissing" missingValueReplacement="MISSING" name="TEST" usageType="active"/>
            <MiningField name="SCORE" usageType="target"/>
        </MiningSchema>
        <RegressionTable targetCategory="1" intercept="0.5">
            <CategoricalPredictor name="TEST" value="MISSING" coefficient="0.3"/>
            <CategoricalPredictor name="TEST" value="test" coefficient="-0.2"/>
        </RegressionTable>
        <RegressionTable targetCategory="0" intercept="0"/>
    </RegressionModel>
</PMML>""")
print(model_1.predict(df))  # Missing values are not replaced, invalid is returned instead.
print(model_2.predict(df))  # Invalid values should be replaced by "MISSING", however results indicate that no variable is used in regression

Best
Wolfgang
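For reference, the input preparation this report expects can be sketched as follows. It encodes the reporter's reading of the specification (invalid values are handled first, and a value that ends up missing is then replaced); it is not taken from pypmml, and `prepare_input` and `InvalidValue` are hypothetical names.

```python
class InvalidValue(Exception):
    """Raised when invalidValueTreatment is returnInvalid and the value is invalid."""


def prepare_input(value, valid_values, invalid_treatment, missing_replacement=None):
    """Apply invalid-value treatment, then missing-value replacement."""
    is_missing = value is None
    if not is_missing and valid_values and value not in valid_values:
        if invalid_treatment == "asMissing":
            is_missing = True          # invalid value is handled like a missing one
        elif invalid_treatment == "returnInvalid":
            raise InvalidValue(value)  # scoring yields an invalid result
        # "asIs": keep the value unchanged
    return missing_replacement if is_missing else value


valid = {"MISSING", "test"}
print(prepare_input(None, valid, "returnInvalid", "MISSING"))  # model_1, missing input
print(prepare_input("X", valid, "asMissing", "MISSING"))       # model_2, invalid input
```

Under this reading, both calls yield the replacement value, so the regression would use the coefficient declared for "MISSING".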

Error while loading pmml file

Traceback (most recent call last):
File "C:/Users/lakshmana.selvam/PycharmProjects/LakshmanWorkspace/venv/CheckPMML.py", line 2, in <module>
model = Model.fromFile(r'C:\Users\lakshmana.selvam\PycharmProjects\LakshmanWorkspace\venv\DecisionTreeIris.pmml')
File "C:\Users\lakshmana.selvam\PycharmProjects\LakshmanWorkspace\venv\lib\site-packages\pypmml\model.py", line 151, in fromFile
pc = PMMLContext.getOrCreate()
File "C:\Users\lakshmana.selvam\PycharmProjects\LakshmanWorkspace\venv\lib\site-packages\pypmml\base.py", line 77, in getOrCreate
PMMLContext()
File "C:\Users\lakshmana.selvam\PycharmProjects\LakshmanWorkspace\venv\lib\site-packages\pypmml\base.py", line 51, in __init__
PMMLContext._ensure_initialized(self, gateway=gateway)
File "C:\Users\lakshmana.selvam\PycharmProjects\LakshmanWorkspace\venv\lib\site-packages\pypmml\base.py", line 60, in _ensure_initialized
PMMLContext._gateway = gateway or cls.launch_gateway()
File "C:\Users\lakshmana.selvam\PycharmProjects\LakshmanWorkspace\venv\lib\site-packages\pypmml\base.py", line 86, in launch_gateway
_port = launch_gateway(classpath=launch_classpath, die_on_exit=True)
File "C:\Users\lakshmana.selvam\PycharmProjects\LakshmanWorkspace\venv\lib\site-packages\py4j\java_gateway.py", line 328, in launch_gateway
cwd=cwd, **popen_kwargs)
File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 756, in __init__
restore_signals, start_new_session)
File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 1155, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

missing value imputation not working on decision tree once imported into python

Hello,

I'm having an issue with importing a PMML decision tree into python using pypmml. The tree is designed for the titanic dataset, the relevant fields are sex, pclass, age and embarked (don't ask why the last one). I attach the file here as a txt file only because github won't let me upload pmml files.

Of particular interest is the snippet

<MiningSchema>
    <MiningField name="outcome" usageType="target"/>
    <MiningField name="pclass" missingValueReplacement="3.0"/>
    <MiningField name="age" missingValueReplacement="28.0"/>
    <MiningField name="sex" missingValueReplacement="male"/>
    <MiningField name="embarked" missingValueReplacement="C"/>
</MiningSchema>

included to guarantee that a leaf will always be reached, as if any value is not input, there is a specified default for imputation.

However, when I import the model into python, it doesn't work like that.

>>> from pypmml import Model
>>> clf = Model.fromFile("titanic_modified.pmml")
>>> clf.predict({'sex':'male','pclass':3,'age':5,'embarked':'S'})
{'Pr(survived)': 0.6176470588235294, 'nodeId': 5, 'Pr(died)': 0.38235294117647056}
>>> clf.predict({'pclass':3,'age':5,'embarked':'S'})
{'Pr(survived)': nan, 'nodeId': -9223372036854775808, 'Pr(died)': nan}

The only difference between the two inputs is that in the first, the sex was male, and in the second, the sex value was missing. This shouldn't make a difference as the default missing value replacement for sex is male, and yet when it was left out, I didn't get to a leaf.

Am I doing something wrong here or is there a bug?

Thanks so much for your help,

Philip
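While the behaviour is being investigated, one workaround is to apply the same defaults on the Python side before calling `predict`. The sketch below copies the replacement values from the MiningSchema snippet above; `with_defaults` and `DEFAULTS` are hypothetical names, and the approach is a client-side substitute, not a fix in pypmml.

```python
import math

# Replacement values copied from the MiningSchema above.
DEFAULTS = {"pclass": 3.0, "age": 28.0, "sex": "male", "embarked": "C"}


def _is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def with_defaults(record, defaults=DEFAULTS):
    """Return a new record in which absent or missing fields take their default."""
    filled = dict(defaults)
    filled.update({k: v for k, v in record.items() if not _is_missing(v)})
    return filled


record = with_defaults({"pclass": 3, "age": 5, "embarked": "S"})
print(record)
```

The filled record can then be passed on, for example `clf.predict(with_defaults({...}))`.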

Release a new version 0.9.17

Hello,

commons-text was upgraded to version 1.10.0 to remediate the GHSA-599f-7c49-w659 vulnerability. The PR for this, #51, is already merged.

Could you please release this version so that we can install the remediated library from pypi.org? Please let me know if I have to follow any steps.

Regards,
Vamsi.

Transformation with pyPMML?

Has anyone been able to use a transformation PMML (normalization to be specific) in pyPMML?

I can get the model to load into python just fine and I can print out the input fields, but if I try to "predict" any values or even print out the output fields I get the following error:

"py4j.protocol.Py4JJavaError: An error occurred while calling o0.outputFields.
: java.lang.IllegalArgumentException: requirement failed: For the transformedValue result feature, OutputField must contain an EXPRESSION ..."

It seems like python can load the model but then cannot tell what the output is supposed to be.

Are boosted decision trees supported?

It seems to me that boosted decision trees are not supported.
I have a C5.0 boosted decision tree in PMML, and when I run it through pypmml, only the first match in the first <TreeModel> is used; the rest is ignored.
Is support for boosted decision trees on the roadmap?

Property missing

Hi,

In the data dictionary I have declared some values as missing (they come from an SQL database, where they are stored as -99). I use this declaration:

<DataField name="variable1" optype="continuous" dataType="double">
<Interval closure="closedOpen" leftMargin="0"/>
<Value value="-99" property="missing"/>
</DataField>

But when I use the field in transformations, the value is not treated as missing: -99 is passed through and used in the computation as if it were valid.

Whether to support non-English

Hello, I got this error when reading a PMML file that contains Chinese characters.

"PmmlError: ('MalformedInputException', 'Input length = 1')"

I looked the problem up, and some people said that the file cannot contain Chinese. Is there a solution? And if that is the case, why can't Chinese be included?

Thanks.
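A MalformedInputException of this kind usually means the file's bytes could not be decoded in the character set the reader expected. Assuming that is the cause here, one workaround is to decode the file in Python and hand the text to `Model.fromString`. The sketch below tries a list of candidate encodings; `read_pmml_text` and the default encoding list are illustrative choices, not part of pypmml.

```python
def read_pmml_text(path, encodings=("utf-8", "gb18030")):
    """Read a PMML file and decode it with the first encoding that fits."""
    with open(path, "rb") as handle:
        raw = handle.read()
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    raise ValueError("none of the encodings %r could decode %s" % (encodings, path))


# Hypothetical usage with pypmml:
#     model = Model.fromString(read_pmml_text("model.pmml"))
```

Re-saving the file as UTF-8 once may be simpler if the file is under your control.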

pypmml.base.PmmlError: ('FileNotFoundException',

When I load a PMML file with pypmml from a script in a different folder, the error below always occurs while loading the model. I tried changing the Java version (from 1.7 to 1.8), but the same error still occurs.

I am sure the path I pass is correct.

I hope this can be fixed.

Traceback (most recent call last):
File "E:\python_virtual_env\com_work\lib\site-packages\pypmml\model.py", line 192, in fromFile
java_model = pc._jvm.org.pmml4s.model.Model.fromFile(name)
File "E:\python_virtual_env\com_work\lib\site-packages\py4j\java_gateway.py", line 1322, in call
return_value = get_return_value(
File "E:\python_virtual_env\com_work\lib\site-packages\py4j\protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.pmml4s.model.Model.fromFile.
: java.io.FileNotFoundException: E:\python_virtual_env\com_work\SchedulerSystem\algorithmZoo\pmml_file\DecisionTreeIris.pmml (The system cannot find the path specified.)
at java.base/java.io.FileInputStream.open0(Native Method)
at java.base/java.io.FileInputStream.open(FileInputStream.java:216)
at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
at scala.io.Source$.fromFile(Source.scala:91)
at scala.io.Source$.fromFile(Source.scala:76)
at scala.io.Source$.fromFile(Source.scala:54)
at org.pmml4s.model.Model$.fromFile(Model.scala:701)
at org.pmml4s.model.Model.fromFile(Model.scala)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
at java.base/java.lang.reflect.Method.invoke(Method.java:577)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:833)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "E:/python_virtual_env/com_work/SchedulerSystem/algorithmZoo/iris_desicion_tree.py", line 7, in <module>
model = Model.fromFile(os.path.join(os.getcwd(), 'pmml_file/DecisionTreeIris.pmml'))
File "E:\python_virtual_env\com_work\lib\site-packages\pypmml\model.py", line 196, in fromFile
raise PmmlError(je.getClass().getSimpleName(), je.getMessage())
pypmml.base.PmmlError: ('FileNotFoundException', 'E:\python_virtual_env\com_work\SchedulerSystem\algorithmZoo\pmml_file\DecisionTreeIris.pmml (The system cannot find the path specified.)')
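The path in the traceback is built from `os.getcwd()`, so it changes with the directory the script is launched from. A sketch of a more predictable approach, assuming the PMML file sits at a fixed location relative to some known base directory: resolve against that base and check the file exists in Python before passing the path to Java. `resolve_model_path` is a hypothetical helper.

```python
from pathlib import Path


def resolve_model_path(relative, base_dir):
    """Resolve a path against base_dir and fail early if the file is absent."""
    path = (Path(base_dir) / relative).resolve()
    if not path.is_file():
        raise FileNotFoundError("PMML file not found: %s" % path)
    return str(path)


# Hypothetical usage inside a script, anchoring on the script's own folder:
#     base = Path(__file__).parent
#     model = Model.fromFile(resolve_model_path("pmml_file/DecisionTreeIris.pmml", base))
```

Failing in Python first also shows the fully resolved path, which makes a wrong base directory easier to spot.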

Model.fromFile never ends

I am trying to load a PMML file that I generated with sklearn2pmml.make_pmml_pipeline, using Model.fromFile("file.pmml"), but the call never returns.

I would be thankful for some support.
file.zip

How to make model.predict() return a given id?

How can I make model.predict() include in its result a record id that is present in the source DataFrame?
I'd like to know which prediction belongs to which record, and the DataFrame index does not seem to be reliable for this.

Categorical data field with all values valid

Hi,

according to Link, if a categorical field contains at least one value with a valid property, these values completely define the set of valid values. Otherwise any value should be valid by default.

Is the second part respected in PyPmml?

If I test the following model in PyPmml 0.9.16 on Python 3.8.6, it seems that "val1" and "val2" are considered valid, however "val3" is not.

import pypmml
import pandas as pd
from pandas._testing import assert_frame_equal
import numpy as np

model = pypmml.Model.fromString("""<PMML xmlns="https://www.dmg.org/PMML-4_3" version="4.3">
    <Header copyright="dmg.org"/>
    <DataDictionary>
        <DataField name="TEST" optype="categorical" dataType="string">
        </DataField>
        <DataField name="SCORE" optype="continuous" dataType="double"/>
    </DataDictionary>
    <TreeModel modelName="test" functionName="classification" missingValueStrategy="none">
        <MiningSchema>
            <MiningField name="TEST" usageType="active"/>
            <MiningField name="SCORE" usageType="target"/>
        </MiningSchema>
        <Node>
            <True/>
            <Node score="1.0">
                <SimplePredicate field="TEST" operator="equal" value="val1"/>
            </Node>
            <Node score="2.0">
                <SimplePredicate field="TEST" operator="equal" value="val2"/>
            </Node>
            <Node score="3.0">
                <True/>
            </Node>
        </Node>
    </TreeModel>
</PMML>""")
df = pd.DataFrame([["val1"], ["val2"], ["val3"]], columns=["TEST"])
assert_frame_equal(model.predict(df), pd.DataFrame([[1.0], [2.0], [3.0]], columns=["predicted_SCORE"]))  # does indeed return 1.0, 2.0, np.NaN

Note that adding invalidValueTreatment="asIs" to the MiningField fixes this.

Best
Wolfgang

Can not get any model files to be read - File Not Found issue

Hello, I have not been able to use pypmml to import any models because I always get a 'File Not Found' error. It does not matter where I place the model file; I even placed it in the working directory. The full error message is shown below, and a screenshot is included.

Should also mention that this is a Windows 10 environment, fresh Anaconda and pypmml installations, all latest versions.

The two commands below:

from pypmml import Model

model = Model.fromFile('DTree.xml')

Result in this error: (DTree.xml is in the same working directory as illustrated by the image)

FileNotFoundError Traceback (most recent call last)
in
2
3
----> 4 model = Model.fromFile('DTree.xml')

C:\ProgramData\Anaconda3\lib\site-packages\pypmml\model.py in fromFile(cls, name)
149 def fromFile(cls, name):
150 """Load a model from PMML file with given pathname"""
--> 151 pc = PMMLContext.getOrCreate()
152 try:
153 java_model = pc._jvm.org.pmml4s.model.Model.fromFile(name)

C:\ProgramData\Anaconda3\lib\site-packages\pypmml\base.py in getOrCreate(cls)
75 with PMMLContext._lock:
76 if PMMLContext._active_pmml_context is None:
---> 77 PMMLContext()
78 return PMMLContext._active_pmml_context
79

C:\ProgramData\Anaconda3\lib\site-packages\pypmml\base.py in __init__(self, gateway)
49
50 def __init__(self, gateway=None):
---> 51 PMMLContext._ensure_initialized(self, gateway=gateway)
52
53 @classmethod

C:\ProgramData\Anaconda3\lib\site-packages\pypmml\base.py in _ensure_initialized(cls, instance, gateway)
58 with PMMLContext._lock:
59 if not PMMLContext._gateway:
---> 60 PMMLContext._gateway = gateway or cls.launch_gateway()
61 PMMLContext._jvm = PMMLContext._gateway.jvm
62

C:\ProgramData\Anaconda3\lib\site-packages\pypmml\base.py in launch_gateway(cls)
84 launch_classpath = path.join(jars_dir, "*")
85
---> 86 _port = launch_gateway(classpath=launch_classpath, die_on_exit=True)
87 gateway = JavaGateway(
88 gateway_parameters=GatewayParameters(port=_port,

C:\ProgramData\Anaconda3\lib\site-packages\py4j\java_gateway.py in launch_gateway(port, jarpath, classpath, javaopts, die_on_exit, redirect_stdout, redirect_stderr, daemonize_redirect, java_path, create_new_process_group, enable_auth)
321
322 proc = Popen(command, stdout=PIPE, stdin=PIPE, stderr=stderr,
--> 323 **popen_kwargs)
324
325 # Determine which port the server started on (needed to support

C:\ProgramData\Anaconda3\lib\subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors, text)
773 c2pread, c2pwrite,
774 errread, errwrite,
--> 775 restore_signals, start_new_session)
776 except:
777 # Cleanup if the child failed starting.

C:\ProgramData\Anaconda3\lib\subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_start_new_session)
1176 env,
1177 os.fspath(cwd) if cwd is not None else None,
-> 1178 startupinfo)
1179 finally:
1180 # Child is launched. Close the parent's copy of those pipe

FileNotFoundError: [WinError 2] The system cannot find the file specified
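The traceback ends inside `subprocess` while py4j's `launch_gateway` is starting a child process, so the failure happens before the PMML file is read at all. A likely cause, stated here as an assumption, is that the `java` executable cannot be found on PATH. A quick check (`find_java` is a hypothetical helper):

```python
import os
import shutil


def find_java():
    """Return the path of a java executable from PATH or JAVA_HOME, or None."""
    found = shutil.which("java")
    if found:
        return found
    java_home = os.environ.get("JAVA_HOME")
    if java_home:
        exe = "java.exe" if os.name == "nt" else "java"
        candidate = os.path.join(java_home, "bin", exe)
        if os.path.isfile(candidate):
            return candidate
    return None


print(find_java() or "no java executable found")
```

If this finds Java only through JAVA_HOME, adding that `bin` directory to PATH before importing pypmml may be enough, since the gateway is launched by command name.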

Spelling mistake

pypmml/pypmml/model.py

Line 142 in 6a92760

 raise PmmlError('Data type "{type}" not supported'.foramt(type=type(data).__name__)) 

pypmml/pypmml/model.py

Line 144 in 6a92760

 raise PmmlError('Data type "{type}" not supported'.foramt(type=type(data).__name__)) 

I think you mean "format" instead of "foramt".

PmmlError: ('OutOfMemoryError', 'GC overhead limit exceeded')

I tried to load a random forest built in R, but I believe more memory needs to be allocated; I also had to allocate more memory in R just to save the random forest as PMML. In Python, however, I don't know which command to use to allocate more memory.
Below is the code I used:

from pypmml import Model
model = Model.fromFile('C:/Users/eugen/Desktop/ranger rf.pmml')
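One option that does not depend on any pypmml setting is the `JAVA_TOOL_OPTIONS` environment variable, which the JVM reads at startup. The sketch below assumes the JVM started by pypmml inherits the environment of the Python process, and it must run before the first model is loaded because the gateway is created only once. `add_jvm_option` is a hypothetical helper, and `-Xmx4g` is an illustrative heap size.

```python
import os


def add_jvm_option(option, env=os.environ):
    """Append an option to JAVA_TOOL_OPTIONS unless it is already present."""
    current = env.get("JAVA_TOOL_OPTIONS", "")
    if option not in current.split():
        env["JAVA_TOOL_OPTIONS"] = (current + " " + option).strip()
    return env["JAVA_TOOL_OPTIONS"]


# Call this before the first Model.fromFile / Model.load in the process.
add_jvm_option("-Xmx4g")
```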

Can not run two pypmml processes with py4j==0.10.9

Hi.
I have the following snippet, which I try to run as two separate programs.

from pypmml import Model
import pandas as pd
import time

model = Model.load(artifact_path)
data = pd.read_csv(data_path)
predictions = model.predict(data)

while True:
    time.sleep(5)

One starts, the other hangs.

Investigation showed that it works with py4j==0.10.7 and doesn't work with py4j==0.10.9. That's because launch_gateway() returns a random port in 0.10.7, while in 0.10.9 it always binds to the default port, so the second bind fails.

I opened an issue in the py4j repo:
py4j/py4j#406

A temporary (or permanent) solution for pypmml would probably be to find a free port and pass it to launch_gateway.
What do you think?
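The free-port idea can be sketched with the standard library: binding to port 0 asks the operating system to pick an unused port. Whether passing that port to `launch_gateway(port=...)` fully resolves the hang with py4j 0.10.9 has not been verified here, and `find_free_port` is a hypothetical helper.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for an unused TCP port on host and return its number."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS choose
        return sock.getsockname()[1]


port = find_free_port()
print(port)
```

The port is released when the socket closes, so another process could claim it before the gateway binds; for two long-running scoring processes that window is small but not zero.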

Py4JError with model.predict on Amazon EC2

Hi, I encountered an error when using an imported PMML model to predict in an Amazon EC2 RHEL environment.
Exactly the same code works well in my Windows environment. I tried using the same versions of pypmml, py4j and pandas on EC2 as on Windows, but I still get the same error.

The error is

Py4JError Traceback (most recent call last)
~/x_validation/venv/lib64/python3.7/site-packages/pypmml/model.py in predict(self, data)
135 results = [self.call('predict', record) for record in records]
--> 136 return pd.DataFrame.from_records(results)
137 elif isinstance(data, pd.Series):

~/x_validation/venv/lib64/python3.7/site-packages/pandas/core/frame.py in from_records(cls, data, index, exclude, columns, coerce_float, nrows)
2072 else:
-> 2073 arrays, arr_columns = to_arrays(data, columns)
2074 if coerce_float:

~/x_validation/venv/lib64/python3.7/site-packages/pandas/core/internals/construction.py in to_arrays(data, columns, dtype)
798 elif isinstance(data[0], abc.Mapping):
--> 799 arr, columns = _list_of_dict_to_arrays(data, columns)
800 elif isinstance(data[0], ABCSeries):

~/x_validation/venv/lib64/python3.7/site-packages/pandas/core/internals/construction.py in _list_of_dict_to_arrays(data, columns)
883 sort = not any(isinstance(d, dict) for d in data)
--> 884 pre_cols = lib.fast_unique_multiple_list_gen(gen, sort=sort)
885 columns = ensure_index(pre_cols)

~/x_validation/venv/lib64/python3.7/site-packages/pandas/_libs/lib.pyx in pandas._libs.lib.fast_unique_multiple_list_gen()

~/x_validation/venv/lib64/python3.7/site-packages/pandas/core/internals/construction.py in (.0)
881 if columns is None:
--> 882 gen = (list(x.keys()) for x in data)
883 sort = not any(isinstance(d, dict) for d in data)

/usr/lib64/python3.7/_collections_abc.py in __iter__(self)
719 def __iter__(self):
--> 720 yield from self._mapping
721

~/x_validation/venv/lib64/python3.7/site-packages/py4j/java_collections.py in __iter__(self)
80 def __iter__(self):
---> 81 return self.keySet().iterator()
82

~/x_validation/venv/lib64/python3.7/site-packages/py4j/java_gateway.py in __call__(self, *args)
1285 return_value = get_return_value(
-> 1286 answer, self.gateway_client, self.target_id, self.name)
1287

~/x_validation/venv/lib64/python3.7/site-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
331 "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
--> 332 format(target_id, ".", name, value))
333 else:

Py4JError: An error occurred while calling o304.iterator. Trace:
java.lang.reflect.InaccessibleObjectException: Unable to make public final java.util.Iterator java.util.HashMap$KeySet.iterator() accessible: module java.base does not "opens java.util" to unnamed module @4f9ec8f6
at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:354)
at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
at java.base/java.lang.reflect.Method.checkCanSetAccessible(Method.java:199)
at java.base/java.lang.reflect.Method.setAccessible(Method.java:193)
at py4j.reflection.MethodInvoker$1.run(MethodInvoker.java:240)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:318)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:238)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:833)

During handling of the above exception, another exception occurred:

PmmlError Traceback (most recent call last)
/tmp/ipykernel_23054/3103260193.py in <module>
----> 1 spendmodel.predict(test)

~/x_validation/venv/lib64/python3.7/site-packages/pypmml/model.py in predict(self, data)
144 raise PmmlError('Data type "{type}" not supported'.foramt(type=type(data).__name__))
145 except Exception as e:
--> 146 raise PmmlError('An error occurred caused by {message}'.format(message=str(e)))
147
148 @classmethod

PmmlError: An error occurred caused by An error occurred while calling o304.iterator. Trace:
java.lang.reflect.InaccessibleObjectException: Unable to make public final java.util.Iterator java.util.HashMap$KeySet.iterator() accessible: module java.base does not "opens java.util" to unnamed module @4f9ec8f6
at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:354)
at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
at java.base/java.lang.reflect.Method.checkCanSetAccessible(Method.java:199)
at java.base/java.lang.reflect.Method.setAccessible(Method.java:193)
at py4j.reflection.MethodInvoker$1.run(MethodInvoker.java:240)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:318)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:238)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:833)

I can't share my PMML file, but the following code produces a similar error:

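from sklearn.datasets import load_iris
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline
from xgboost import XGBClassifier
from pypmml import Model

# Train an XGBoost classifier on iris and export it to PMML
iris = load_iris(as_frame=True)
x_train, y_train = iris['data'], iris['target']
pmml_pipeline = PMMLPipeline([
  ("classifier", XGBClassifier())
])

pmml_pipeline.fit(x_train, y_train)
sklearn2pmml(pmml_pipeline, "test.pmml")

# Load the exported model with PyPMML; predict() raises the error above
model = Model.fromFile('test.pmml')
model.predict(x_train)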
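
The `InaccessibleObjectException` in the trace above comes from the Java module system: frames such as `java.base/java.lang.Thread.run` show a modular JDK (Java 9 or later), where reflective access to `java.util` internals is closed by default. One possible workaround, a sketch and not a confirmed fix, is to ask the JVM to open that package through the standard `JAVA_TOOL_OPTIONS` environment variable before PyPMML starts its JVM. The helper name `with_add_opens` is made up for illustration, and whether this resolves the error for a given PyPMML version is an assumption.

```python
import os

# JVM flag that opens java.util to reflective access from unnamed modules
# (the package named in the error message above).
ADD_OPENS = "--add-opens=java.base/java.util=ALL-UNNAMED"


def with_add_opens(env):
    """Append ADD_OPENS to JAVA_TOOL_OPTIONS in `env` unless already present."""
    options = env.get("JAVA_TOOL_OPTIONS", "").split()
    if ADD_OPENS not in options:
        options.append(ADD_OPENS)
    env["JAVA_TOOL_OPTIONS"] = " ".join(options)
    return env


# Must run before the JVM is launched, i.e. before the first pypmml call
# that starts the Py4J gateway (assumption: the JVM is a child process
# that inherits this environment).
with_add_opens(os.environ)
```

Running on a JDK without the module system, such as Java 8, avoids this particular exception as well, since the check that raises it does not exist there.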

Logistic Regression Issue

Hello, thanks for making this amazing library.

I tested a logistic regression PMML file with the iris dataset.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np
#from sklearn_pmml_model.ensemble import PMMLForestClassifier

# Prepare data
iris = load_iris()
X = pd.DataFrame(iris.data)
X.columns = np.array(iris.feature_names)
y = pd.Series(np.array(iris.target_names)[iris.target])
y.name = "Class"
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=123)

from sklearn.linear_model import LogisticRegression

lrf = LogisticRegression().fit(X_train, y_train)
predicted = lrf.predict(X_test)
accuracy = accuracy_score(y_test, predicted)

from sklearn2pmml import sklearn2pmml
from sklearn2pmml import PMMLPipeline

pipeline = PMMLPipeline([("estimator", lrf)])
sklearn2pmml(pipeline, "Logistic.pmml")

from pypmml import Model

model = Model.fromFile('Logistic.pmml')
result = model.predict({'sepal_length': 5.1, 'sepal_width': 3.5, 'petal_length': 1.4, 'petal_width': 0.2})
#data = pd.read_csv('Iris.csv')
#result = model.predict(data)
print(result)

That is my code.

The prediction comes back as NaN.
How can I solve this problem? Please help me. Thank you.
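
A NaN result like this can occur when the keys of the input record do not match the input field names stored in the PMML file, so the model sees its fields as missing. That is a possible cause, not a confirmed diagnosis: the record above uses keys such as `sepal_length`, while `load_iris().feature_names` uses names such as `sepal length (cm)`. The sketch below compares a record against a list of expected names. `unmatched_fields` is a hypothetical helper, and with PyPMML the expected names would come from `model.inputNames`, since the names actually written to the file may differ from both spellings.

```python
def unmatched_fields(record, input_names):
    """Return (missing, unexpected): names the model expects but the record
    lacks, and record keys the model does not know about."""
    expected = set(input_names)
    provided = set(record)
    missing = sorted(expected - provided)
    unexpected = sorted(provided - expected)
    return missing, unexpected


# Column names as produced by sklearn's load_iris().feature_names.
# With PyPMML, pass model.inputNames instead (the names actually in the file).
iris_names = ["sepal length (cm)", "sepal width (cm)",
              "petal length (cm)", "petal width (cm)"]

record = {"sepal_length": 5.1, "sepal_width": 3.5,
          "petal_length": 1.4, "petal_width": 0.2}

missing, unexpected = unmatched_fields(record, iris_names)
print("missing:", missing)
print("unexpected:", unexpected)
```

If both lists come back empty for the real `model.inputNames`, the field names are not the problem and the cause lies elsewhere.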

Multiprocess model predict issue

I have been using PyPMML in a personal project for several days, and it has been great.

However, I wrote a multiprocess prediction program to improve performance, and it does not seem to work correctly.

I tried to find a solution by reading this project's code, but without success.

Am I doing something wrong? Do you have any idea how to solve this problem? Thanks.

The error log from the multiprocess program follows.

    HTTPServerRequest(protocol='http', host='pmml-demo.default.example.com', method='POST', uri='/v1/models/pmml-demo:predict', version='HTTP/1.1', remote_ip='::1')
    Traceback (most recent call last):
      File "/Users/anyisalin/codes/kfserving/python/pmmlserver/pmmlserver/model.py", line 51, in predict
        result = [list(_) for _ in self._model.predict(instances) if isinstance(_, JavaList)]
      File "/usr/local/anaconda3/lib/python3.8/site-packages/pypmml/model.py", line 161, in predict
        return [self.call('predict', record) for record in data]
      File "/usr/local/anaconda3/lib/python3.8/site-packages/pypmml/model.py", line 161, in <listcomp>
        return [self.call('predict', record) for record in data]
      File "/usr/local/anaconda3/lib/python3.8/site-packages/pypmml/base.py", line 122, in call
        return call_java_func(getattr(self._java_model, name), *a)
      File "/usr/local/anaconda3/lib/python3.8/site-packages/pypmml/base.py", line 41, in call_java_func
        return _java2py(func(*args))
      File "/usr/local/anaconda3/lib/python3.8/site-packages/py4j/java_gateway.py", line 1296, in __call__
        args_command, temp_args = self._build_args(*args)
      File "/usr/local/anaconda3/lib/python3.8/site-packages/py4j/java_gateway.py", line 1260, in _build_args
        (new_args, temp_args) = self._get_args(args)
      File "/usr/local/anaconda3/lib/python3.8/site-packages/py4j/java_gateway.py", line 1247, in _get_args
        temp_arg = converter.convert(arg, self.gateway_client)
      File "/usr/local/anaconda3/lib/python3.8/site-packages/py4j/java_collections.py", line 511, in convert
        java_list.add(element)
    AttributeError: 'bool' object has no attribute 'add'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/local/anaconda3/lib/python3.8/site-packages/tornado/web.py", line 1703, in _execute
        result = await result
      File "/usr/local/anaconda3/lib/python3.8/site-packages/kfserving-0.5.0.1-py3.8.egg/kfserving/handlers/http.py", line 78, in post
        response = (await model.predict(request)) if inspect.iscoroutinefunction(model.predict) else model.predict(request)
      File "/Users/anyisalin/codes/kfserving/python/pmmlserver/pmmlserver/model.py", line 55, in predict
        raise Exception("Failed to predict %s" % e)
    Exception: Failed to predict 'bool' object has no attribute 'add'
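
PyPMML talks to a JVM through a Py4J gateway (visible in the `py4j/java_gateway.py` frames above), and a gateway connection created in a parent process is not necessarily safe to share with child processes. One pattern to try, offered as an assumption and not as a confirmed fix for this error, is to load the model separately inside each worker process. In the sketch below, `init_worker`, `predict_one`, and `score_in_processes` are hypothetical names, and `loader` stands for any top-level function that returns a model, for example one that calls `Model.fromFile` on a PMML path.

```python
import multiprocessing as mp

# One model per worker process, set by init_worker.
_model = None


def init_worker(loader):
    """Pool initializer: build this process's own model via `loader`."""
    global _model
    _model = loader()


def predict_one(record):
    """Score one record with the model owned by the current process."""
    return _model.predict(record)


def score_in_processes(records, loader, processes=2):
    """Score `records` in a pool whose workers each load their own model.

    `loader` must be a top-level function so that it can be pickled.
    """
    with mp.Pool(processes, initializer=init_worker, initargs=(loader,)) as pool:
        return pool.map(predict_one, records)
```

Call `score_in_processes` from under an `if __name__ == "__main__":` guard on platforms that spawn worker processes. Results travel back to the parent by pickling, so any Py4J collection objects in a prediction would need converting to plain Python lists or dicts inside `predict_one` first.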
