lolei / spmf-py

Python SPMF Wrapper 🐍 🎁

License: GNU General Public License v3.0

Python 100.00%

spmf python wrapper data-mining pattern-mining frequent-patterns sequential-patterns hacktoberfest

spmf-py's Issues

ValueError: invalid literal for int() with base 10: '#SUP:'

Thank you for writing this wrapper.
I have an issue when using the to_pandas_dataframe() method with 'Apriori_with_hash_tree'.
The following error appears: ValueError: invalid literal for int() with base 10: '#SUP:'

from spmf import Spmf
spmf = Spmf("Apriori_with_hash_tree",
            input_filename="contextPasquier99_name.txt",
            output_filename="output.txt",
            arguments=[0.40, 30, 2])

spmf.run()
print(spmf.to_pandas_dataframe())
spmf.to_csv("output.csv")

contextPasquier99_name.txt

Regards.
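
One possible workaround, offered as a sketch rather than the wrapper's own fix: the error message suggests the '#SUP:' token is being passed to int(). Itemset output lines look like '1 2 5 #SUP: 2', so splitting on that marker before converting avoids the ValueError. parse_itemset_line and parse_output_file are hypothetical helpers, not part of spmf-py, and they assume space-separated items followed by a single '#SUP:' field.

```python
def parse_itemset_line(line):
    """Split one SPMF output line such as '1 2 5 #SUP: 2' into (items, support).

    Hypothetical helper, not part of spmf-py. Assumes space-separated items
    followed by a single '#SUP:' field and nothing after the support count.
    """
    items_part, _, support_part = line.partition("#SUP:")
    items = items_part.split()
    support = int(support_part.strip())
    return items, support


def parse_output_file(path):
    """Parse every non-empty line of an SPMF output file."""
    with open(path, encoding="utf-8") as handle:
        return [parse_itemset_line(line) for line in handle if line.strip()]
```

The resulting list of (items, support) pairs can be handed to pandas.DataFrame directly if a dataframe is needed.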

FileNotFoundError at subprocess

Hello Sir,
Thank you very much for writing this wrapper.
I have an issue running the program, I would appreciate it if you helped me fix it.
I face a problem at the run function. Here is the log:

File "C:\ProgramData\Miniconda3\envs\vaenv\lib\site-packages\spmf\__init__.py", line 102, in run
proc = subprocess.check_output(subprocess_arguments)
File "C:\ProgramData\Miniconda3\envs\vaenv\lib\subprocess.py", line 395, in check_output
**kwargs).stdout
File "C:\ProgramData\Miniconda3\envs\vaenv\lib\subprocess.py", line 472, in run
with Popen(*popenargs, **kwargs) as process:
File "C:\ProgramData\Miniconda3\envs\vaenv\lib\subprocess.py", line 775, in __init__
restore_signals, start_new_session)
File "C:\ProgramData\Miniconda3\envs\vaenv\lib\subprocess.py", line 1178, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

Here is an example of how I run the program from example.py (spmf.jar and the input file are both in the same directory as example.py):

spmf_jar_dir = pathlib.Path(__file__).parent.absolute()

spmf = Spmf("PrefixSpan", input_filename="contextPrefixSpan.txt",spmf_bin_location_dir=spmf_jar_dir,
            output_filename="output.txt", arguments=[1, "", True])
spmf.run()

Here is the state of the variables at line 102 of __init__.py:

I've also tried with the absolute path of the input file, but got the same error.
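
A hedged suggestion: when subprocess raises FileNotFoundError with WinError 2, the file that cannot be found is usually the executable being launched, not the input file. The sketch below assumes the wrapper starts a `java -jar spmf.jar ...` command from the configured directory; that is an assumption about spmf-py, and find_launch_problems is a hypothetical helper, not part of the library.

```python
import pathlib
import shutil


def find_launch_problems(jar_dir, java_command="java"):
    """Return a list of human-readable problems; an empty list means none found.

    Assumption: the wrapper launches `java -jar spmf.jar ...` using jar_dir.
    """
    problems = []
    # shutil.which returns None when the command is not on PATH.
    if shutil.which(java_command) is None:
        problems.append(f"'{java_command}' was not found on PATH")
    jar_path = pathlib.Path(jar_dir) / "spmf.jar"
    if not jar_path.is_file():
        problems.append(f"spmf.jar was not found at {jar_path}")
    return problems
```

Calling find_launch_problems(spmf_jar_dir) before spmf.run() should show whether Java is missing from PATH or the jar file is misnamed or misplaced.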

[help] [frequent itemset mining] Understanding output with negative value

Looking for clarity on the output of FP Growth Algorithm.
I am doing frequent itemset mining, and I often see negative values in the output itemsets even though my data set contains no negative values.
Curious as to how to interpret this negative value.

Below is an example:

from spmf import Spmf
input_example_list = [
    "1, 3, 4",
    "2, 3, 5",
    "1, 2, 3, 5",
    "2, 5",
    "1, 2, 4, 5"
]

spmf = Spmf("FPGrowth_itemsets",
            input_direct=input_example_list,
            input_type="text",
            output_filename="C:\\spaces\\igt_eye\\trials\\itemset\\output.txt",
            arguments=[0.4, 3, 3],
            spmf_bin_location_dir="\\site-packages\\spmf\\")
spmf.run()
print(spmf.parse_output())

This produces the following output:

=============  FP-GROWTH 2.42 - STATS =============
 Transactions count from database : 5
 Max memory usage: 8.0 mb 
 Frequent itemsets count : 9
 Total time ~ 4 ms
===================================================
Post-processing to show result in terms of string values.
Post-processing completed.

[
['-2 1 4 #SUP: 2'], 
['-2 3 5 #SUP: 2'], 
['3 2 5 #SUP: 2'], 
['-2 3 2 #SUP: 2'], 
['-2 1 3 #SUP: 2'], 
['-2 1 2 #SUP: 2'], 
['-2 1 5 #SUP: 2'], 
['1 2 5 #SUP: 2'], 
['-2 2 5 #SUP: 4']
]

In the above output, I am not sure how to interpret this negative value (-2) in the itemset.
Any pointers/hints from the community?
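
A note that may help, with the caveat that the cause here is a guess: in SPMF's sequence database format, -1 marks the end of an itemset and -2 marks the end of a sequence, so a stray -2 may mean the direct input was written out in sequence format rather than as plain transactions. SPMF's transaction format is one transaction per line with items separated by single spaces. The sketch below converts the comma-separated strings to that format and writes them to a file that could then be passed via input_filename. to_transaction_lines and write_transaction_file are hypothetical helpers, not part of spmf-py.

```python
def to_transaction_lines(transactions):
    """Turn strings like '1, 3, 4' into SPMF transaction lines like '1 3 4'."""
    lines = []
    for transaction in transactions:
        items = [item.strip() for item in transaction.split(",") if item.strip()]
        lines.append(" ".join(items))
    return lines


def write_transaction_file(transactions, path):
    """Write one space-separated transaction per line to path."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(to_transaction_lines(transactions)) + "\n")
```

If the negative values disappear when the algorithm reads this file, the input conversion was the likely source.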

permission error spmf

When I ran into this problem, I could not find a suitable solution. Could you please take a look at it? If you can solve it, I would be very grateful. Thank you.

Gives different output when tested on other algorithm

Thank you very much for this repo. It is very useful and helpful.
However, when I use it with the ToPKClass rule algorithm, I always get 2 as the output, whatever value of K (the bound on the number of rules to generate) I pass. When I run the jar file directly with the same input parameters, I get the output I expect. Is there a way to fix this?

Windows PermissionError in temp file

Hello,

I am encountering an issue when running algorithms on a Windows system with direct input. I've tested this with several nested-list inputs of different sizes.

Example input:
test = [ [[127], [128], [129], [130]], [[178], [179], [180], [181], [182], [183], [184], [185]], [[251], [252], [253], [254], [255], [256], [257]] ]

Example call:
spmf = Spmf('GPS', input_direct=test, arguments=[0.5])
spmf.run()
(I've also tried the 'PrefixSpan' algorithm.)

Error text (with my actual username replaced):

File "C:\Python39\lib\site-packages\spmf\__init__.py", line 46, in __init__
self.input_ = self.handle_input(
File "C:\Python39\lib\site-packages\spmf\__init__.py", line 73, in handle_input
return self.write_temp_input_file(seq_spmf, ".txt")
File "C:\Python39\lib\site-packages\spmf\__init__.py", line 87, in write_temp_input_file
os.rename(name, name + file_ending)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\username\AppData\Local\Temp\tmp1qxu06i7' -> 'C:\Users\username\AppData\Local\Temp\tmp1qxu06i7.txt'

This appears to be caused by renaming the temp file to change its extension before its file stream has been properly closed.
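
A minimal sketch of one way around this, not the wrapper's actual code: create the temp file with the wanted ending from the start and close it before anything else touches it, so no rename is needed. write_temp_input_file here is a standalone illustration that only borrows the name from the traceback.

```python
import tempfile


def write_temp_input_file(text, file_ending=".txt"):
    """Write text to a new temporary file that already has the wanted ending.

    Sketch only. delete=False keeps the file on disk after the with-block
    closes the handle, so another process can open it on Windows.
    The caller is responsible for removing the file afterwards.
    """
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=file_ending, delete=False, encoding="utf-8"
    ) as handle:
        handle.write(text)
        name = handle.name
    # The handle is closed here, so the path can be handed to a subprocess.
    return name
```

Because the suffix is set at creation time, os.rename is never called on an open file.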

How to handle datasets with timestamps for algorithms that involve time constraints?

I'm trying to run the HirateYamana algorithm with time constraints. In which format should I encode the dataset so that each itemset in a sequence carries its timestamp?

e.g.

dataset = [
    # sequence: list of events
    [(1, ['a']), (2, ['a', 'b', 'c']), (3, ['a', 'c']), (4, ['c'])],  # event: (timestamp, [list of items])
    [(1, ['a']), (2, ['c']), (3, ['b', 'c'])],
    [(1, ['a', 'b']), (2, ['d']), (3, ['c']), (4, ['b']), (5, ['c'])],
    [(1, ['a']), (2, ['c']), (3, ['b']), (4, ['c'])]
]
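
As far as I recall from the SPMF documentation for time-extended sequence databases, each itemset is prefixed with its timestamp in angle brackets, each itemset ends with -1, and each sequence ends with -2, for example `<0> 1 -1 <1> 1 2 3 -1 -2`. Please verify this against the SPMF documentation before relying on it. Under that assumption, the sketch below converts a dataset shaped like the one above into such lines, mapping item names to integers. encode_timed_sequences is a hypothetical helper, not part of spmf-py, and I cannot confirm that input_direct accepts this format, so the lines would be written to a file and passed via input_filename.

```python
def encode_timed_sequences(dataset):
    """Encode [(timestamp, [items]), ...] sequences as time-extended lines.

    Returns (lines, item_ids), where item_ids maps each item name to the
    positive integer used in the output. The line layout is an assumption
    about SPMF's time-extended format and should be checked.
    """
    item_ids = {}
    lines = []
    for sequence in dataset:
        parts = []
        for timestamp, items in sequence:
            # Assign integer ids in order of first appearance, starting at 1.
            ids = sorted(item_ids.setdefault(item, len(item_ids) + 1) for item in items)
            parts.append(f"<{timestamp}> " + " ".join(str(i) for i in ids) + " -1")
        lines.append(" ".join(parts) + " -2")
    return lines, item_ids
```

Keeping item_ids around makes it possible to translate the integers in the mined patterns back to the original item names.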
