lolei / spmf-py
Python SPMF Wrapper
License: GNU General Public License v3.0
Thank you for writing this wrapper.
I have an issue when using the to_pandas_dataframe() method with 'Apriori_with_hash_tree'.
The following error appears: ValueError: invalid literal for int() with base 10: '#SUP:'
from spmf import Spmf
spmf = Spmf("Apriori_with_hash_tree",
input_filename="contextPasquier99_name.txt",
output_filename="output.txt",
arguments=[0.40, 30, 2])
spmf.run()
print(spmf.to_pandas_dataframe())
spmf.to_csv("output.csv")
Regards.
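A possible workaround, sketched under an assumption I have not confirmed against the wrapper's source: the error suggests that to_pandas_dataframe() calls int() on every whitespace-separated token, including the '#SUP:' marker. Reading output.txt yourself and splitting each line on '#SUP:' avoids that. The function name parse_spmf_itemsets is hypothetical, not part of spmf-py. Items are kept as strings because contextPasquier99_name.txt appears to use named items.

```python
def parse_spmf_itemsets(lines):
    """Parse SPMF itemset output lines like '1 2 5 #SUP: 2' into (items, support) pairs.

    Hypothetical helper, not part of spmf-py. Items are kept as strings so that
    both integer and named (text) items are accepted.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line or "#SUP:" not in line:
            continue  # skip blank lines and anything that is not an itemset line
        pattern, _, rest = line.partition("#SUP:")
        items = pattern.split()
        support = int(rest.split()[0])  # first token after the marker is the support count
        rows.append((items, support))
    return rows


rows = parse_spmf_itemsets(["1 2 5 #SUP: 2", "2 5 #SUP: 4"])
print(rows)  # [(['1', '2', '5'], 2), (['2', '5'], 4)]
```

To read the real file, pass an open file object, e.g. `with open("output.txt") as f: rows = parse_spmf_itemsets(f)`. With pandas installed, `pandas.DataFrame(rows, columns=["itemset", "support"])` turns the result into a data frame.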
Hello Sir,
Thank you very much for writing this wrapper.
I have an issue running the program and would appreciate your help fixing it.
The problem occurs in the run() function. Here is the log:
File "C:\ProgramData\Miniconda3\envs\vaenv\lib\site-packages\spmf\__init__.py", line 102, in run
proc = subprocess.check_output(subprocess_arguments)
File "C:\ProgramData\Miniconda3\envs\vaenv\lib\subprocess.py", line 395, in check_output
**kwargs).stdout
File "C:\ProgramData\Miniconda3\envs\vaenv\lib\subprocess.py", line 472, in run
with Popen(*popenargs, **kwargs) as process:
File "C:\ProgramData\Miniconda3\envs\vaenv\lib\subprocess.py", line 775, in __init__
restore_signals, start_new_session)
File "C:\ProgramData\Miniconda3\envs\vaenv\lib\subprocess.py", line 1178, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
Here is how I run the program from example.py (spmf.jar and the input file are both in the same directory as example.py):
import pathlib
from spmf import Spmf

spmf_jar_dir = pathlib.Path(__file__).parent.absolute()  # directory containing spmf.jar
spmf = Spmf("PrefixSpan", input_filename="contextPrefixSpan.txt",
            spmf_bin_location_dir=spmf_jar_dir,
            output_filename="output.txt", arguments=[1, "", True])
spmf.run()
I also inspected the state of the variables at line 102 of __init__.py.
I've also tried using the absolute path of the input file but got the same error.
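On Windows, a FileNotFoundError raised from subprocess.Popen usually means that the executable named in the first argument could not be found, rather than the input file; that would also explain why an absolute input path made no difference. Assuming the wrapper launches spmf.jar through a `java` executable (I have not verified the exact command it builds), a quick check that Java is reachable on PATH might look like this. The helper name java_available is hypothetical.

```python
import shutil
import subprocess


def java_available() -> bool:
    """Return True if a `java` executable can be found on the PATH."""
    return shutil.which("java") is not None


if java_available():
    # `java -version` prints its version information to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    print(result.stderr.strip())
else:
    print("java was not found on PATH; install a JRE/JDK or add its bin directory to PATH")
```

If this prints the not-found message, fixing PATH (and restarting the shell or IDE so it picks up the change) is worth trying before changing the paths passed to Spmf.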
I'm looking for clarity on the output of the FP-Growth algorithm.
I am doing frequent itemset mining, and I often see negative values in the output itemsets even though my dataset doesn't contain any negative values.
I'm curious how to interpret this negative value.
Below is an example:
from spmf import Spmf
input_example_list = [
"1, 3, 4",
"2, 3, 5",
"1, 2, 3, 5",
"2, 5",
"1, 2, 4, 5"
]
spmf = Spmf("FPGrowth_itemsets",
input_direct=input_example_list,
input_type="text",
output_filename="C:\\spaces\\igt_eye\\trials\\itemset\\output.txt",
arguments=[0.4, 3, 3],
spmf_bin_location_dir="\\site-packages\\spmf\\")
spmf.run()
print(spmf.parse_output())
This produces the following output:
============= FP-GROWTH 2.42 - STATS =============
Transactions count from database : 5
Max memory usage: 8.0 mb
Frequent itemsets count : 9
Total time ~ 4 ms
===================================================
Post-processing to show result in terms of string values.
Post-processing completed.
[
['-2 1 4 #SUP: 2'],
['-2 3 5 #SUP: 2'],
['3 2 5 #SUP: 2'],
['-2 3 2 #SUP: 2'],
['-2 1 3 #SUP: 2'],
['-2 1 2 #SUP: 2'],
['-2 1 5 #SUP: 2'],
['1 2 5 #SUP: 2'],
['-2 2 5 #SUP: 4']
]
In the output above, I am not sure how to interpret the negative value (-2) in the itemsets.
Any pointers/hints from the community?
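A hedged pointer rather than a confirmed diagnosis: in SPMF's sequence format, -1 marks the end of an itemset and -2 marks the end of a sequence, so a -2 showing up as an item suggests that a sequence-format marker ended up in the transaction data that FPGrowth_itemsets read as an ordinary item. One way to sidestep the wrapper's conversion of input_direct is to write the transactions yourself in SPMF's plain transaction format (one transaction per line, positive integer items separated by single spaces) and pass the file through input_filename. The helper name write_spmf_transactions and the file location are hypothetical.

```python
import os
import tempfile


def write_spmf_transactions(transactions, path):
    """Write transactions in SPMF's transaction format.

    Hypothetical helper: one transaction per line, distinct positive
    integer items sorted in ascending order and separated by single spaces.
    """
    with open(path, "w") as f:
        for transaction in transactions:
            f.write(" ".join(str(item) for item in sorted(set(transaction))) + "\n")


transactions = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5], [1, 2, 4, 5]]
path = os.path.join(tempfile.gettempdir(), "transactions.txt")
write_spmf_transactions(transactions, path)
with open(path) as f:
    print(f.read())
```

The resulting path can then be passed as input_filename=path to Spmf("FPGrowth_itemsets", ...) in place of input_direct and input_type.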
Thank you very much for this repo; it is very useful.
However, when I use it with TopKClassRules, the output I get is always 2, for any value of K (the number of rules to generate). When I run the jar file directly with the same input parameters, I get the output I expect. Is there any way to fix this?
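To compare like with like, it may help to run the jar from Python with exactly the argument strings passed to the wrapper. SPMF's command-line syntax is `java -jar spmf.jar run <algorithm> <input file> <output file> <parameters...>`; the sketch below only builds that command. The helper name spmf_command, the file names, and the parameter values [10, 0.5] are hypothetical placeholders, not TopKClassRules' documented parameters.

```python
def spmf_command(jar_path, algorithm, input_file, output_file, params):
    """Build an SPMF command line: java -jar <jar> run <algorithm> <input> <output> <params...>.

    Hypothetical helper; every parameter is converted to a string, as it would be on a shell.
    """
    return (["java", "-jar", str(jar_path), "run", algorithm,
             str(input_file), str(output_file)]
            + [str(p) for p in params])


cmd = spmf_command("spmf.jar", "TopKClassRules", "input.txt", "output.txt", [10, 0.5])
print(" ".join(cmd))
# To execute it (requires Java and spmf.jar): subprocess.run(cmd, check=True)
```

If this run produces the expected rules while the wrapper does not, the difference probably lies in how the wrapper converts or orders the arguments.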
Hello,
I appear to be encountering an issue when running algorithms on a Windows system with direct input (input_direct). I've tested this with several inputs of different sizes, all of them nested lists.
Example input:
test = [ [[127], [128], [129], [130]], [[178], [179], [180], [181], [182], [183], [184], [185]], [[251], [252], [253], [254], [255], [256], [257]] ]
Example call:
spmf = Spmf('GSP', input_direct=test, arguments=[0.5])
spmf.run()
(I've also tried the 'PrefixSpan' algorithm.)
Error text (with my actual username replaced):
File "C:\Python39\lib\site-packages\spmf\__init__.py", line 46, in __init__
self.input_ = self.handle_input(
File "C:\Python39\lib\site-packages\spmf\__init__.py", line 73, in handle_input
return self.write_temp_input_file(seq_spmf, ".txt")
File "C:\Python39\lib\site-packages\spmf\__init__.py", line 87, in write_temp_input_file
os.rename(name, name + file_ending)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\username\AppData\Local\Temp\tmp1qxu06i7' -> 'C:\Users\username\AppData\Local\Temp\tmp1qxu06i7.txt'
It appears to be an issue with renaming the file (to change its extension) before the file stream has been properly closed?
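As the reporter suspects, on Windows a file typically cannot be renamed while a handle to it is still open. I have not checked the wrapper's write_temp_input_file; assuming it renames a temporary file that is still open, one fix is to create the file with the right extension from the start and close it before returning. The sketch below is a hypothetical replacement, not the wrapper's code.

```python
import os
import tempfile


def write_temp_input_file(lines, suffix=".txt"):
    """Write lines to a new temporary file that already carries `suffix`.

    Hypothetical replacement for the wrapper's helper: mkstemp creates the file
    with its final extension, so no os.rename is needed, and the handle is closed
    before the path is returned, so the file is not held open while Java reads it.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "w") as f:  # closes the descriptor when the block ends
        f.write("\n".join(lines) + "\n")
    return path


path = write_temp_input_file(["1 -1 2 -1 -2", "3 -1 -2"])
with open(path) as f:
    print(path, f.read())
os.remove(path)  # clean up once the file is no longer needed
```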
I'm trying to run HirateYamana with time constraints. In what format should I encode the dataset to include the timestamp of each event?
For example:
dataset = [
# sequence: list of events
[(1, ['a']), (2, ['a', 'b', 'c']), (3, ['a', 'c']), (4, ['c'])], # event: (timestamp, [list of items])
[(1, ['a']), (2, ['c']), (3, ['b', 'c'])],
[(1, ['a', 'b']), (2, ['d']), (3, ['c']), (4, ['b']), (5, ['c'])],
[(1, ['a']), (2, ['c']), (3, ['b']), (4, ['c'])]
]
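A sketch, assuming SPMF's time-extended sequence format as I understand it from the SPMF documentation for its time-constrained algorithms (worth double-checking there): each itemset is written as `<timestamp>` followed by its items and -1, each sequence ends with -2, and items are positive integers. The letter-to-integer mapping and the helper name to_time_extended_spmf are hypothetical.

```python
def to_time_extended_spmf(dataset, item_ids):
    """Encode (timestamp, [items]) sequences as SPMF time-extended lines.

    Assumed format: each itemset is '<t> item item ... -1' and a sequence ends with '-2'.
    """
    lines = []
    for sequence in dataset:
        parts = []
        for timestamp, items in sequence:
            ids = sorted(item_ids[i] for i in items)  # SPMF expects sorted integer items
            parts.append(f"<{timestamp}> " + " ".join(map(str, ids)) + " -1")
        lines.append(" ".join(parts) + " -2")
    return lines


item_ids = {"a": 1, "b": 2, "c": 3, "d": 4}  # hypothetical letter -> integer mapping
dataset = [
    [(1, ["a"]), (2, ["a", "b", "c"]), (3, ["a", "c"]), (4, ["c"])],
    [(1, ["a"]), (2, ["c"]), (3, ["b", "c"])],
]
for line in to_time_extended_spmf(dataset, item_ids):
    print(line)
```

The resulting lines can be written to a file, one sequence per line, and passed to Spmf through input_filename.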