keedi / rf-ace

Automatically exported from code.google.com/p/rf-ace

Languages: C++ 86.46%, MATLAB 7.02%, R 3.88%, Python 0.99%, Shell 0.91%, Makefile 0.56%, Batchfile 0.18%

rf-ace's Issues

Seg fault on example ARFF file

What steps will reproduce the problem?
1. svn update
2. compile
3. run - see below

What is the expected output? What do you see instead?

billwhite@isaac~/src/rf-ace$ bin/rf_ace -I test_5by10_numeric_matrix.arff -i 5 -O foo

 ------------------------------------------------------- 
|  RF-ACE version:  0.9.7, December 29th, 2011          |
|    Project page:  http://code.google.com/p/rf-ace     |
|     Report bugs:  [email protected]               |
 ------------------------------------------------------- 

Reading file 'test_5by10_numeric_matrix.arff', please wait... Segmentation fault
billwhite@isaac~/src/rf-ace$ 

What version of the product are you using? On what operating system?

Mac OS X 10.6

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 10 Jan 2012 at 7:03

failed to compile on previously successful system

compiling with g++:

[rkreisbe@breve ~/rf-ace]$ g++ -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man 
--infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla 
--enable-bootstrap --enable-shared --enable-threads=posix 
--enable-checking=release --with-system-zlib --enable-__cxa_atexit 
--disable-libunwind-exceptions --enable-gnu-unique-object 
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk 
--disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre 
--enable-libgcj-multifile --enable-java-maintainer-mode 
--with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib 
--with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 
--build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC) 


compile fails with:
g++ -O3 -std=c++0x -Wall -Wextra -pedantic -Isrc/ -lz src/rf_ace.cpp 
src/murmurhash3.cpp src/datadefs.cpp src/progress.cpp src/statistics.cpp 
src/math.cpp src/stochasticforest.cpp src/rootnode.cpp src/node.cpp 
src/treedata.cpp src/utils.cpp src/distributions.cpp src/reader.cpp 
src/feature.cpp -pthread -o bin/rf-ace
In file included from src/feature.hpp:10,
                 from src/feature.cpp:1:
src/datadefs.hpp: In function ‘bool datadefs::isNAN(const T&) [with T = 
std::unordered_set<unsigned int, std::hash<unsigned int>, 
std::equal_to<unsigned int>, std::allocator<unsigned int> >]’:
src/feature.cpp:144:   instantiated from here
src/datadefs.hpp:159: error: no match for ‘operator!=’ in ‘value != 
value’
make: *** [rf-ace] Error 1
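The error arises because the generic isNAN template in datadefs.hpp applies value != value to every type it is instantiated with. Here that type is a std::unordered_set, for which this compiler finds no matching operator!=. The sketch below shows one possible fix. It assumes non-floating-point types can simply be treated as never NaN, and restricts the self-comparison to floating-point types. The namespace datadefs_sketch is hypothetical and is not RF-ACE's actual code:

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>
#include <unordered_set>

namespace datadefs_sketch {  // hypothetical namespace, for illustration only

// Floating-point values: NaN is the only value that compares unequal to itself.
template <typename T>
typename std::enable_if<std::is_floating_point<T>::value, bool>::type
isNAN(const T& value) {
  return value != value;
}

// Every other type (integers, strings, containers such as std::unordered_set)
// has no NaN representation, so never touch operator!= and report false.
template <typename T>
typename std::enable_if<!std::is_floating_point<T>::value, bool>::type
isNAN(const T& /*value*/) {
  return false;
}

}  // namespace datadefs_sketch
```

With this split, isNAN on a container compiles and returns false. The floating-point overload keeps the original self-comparison trick.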

Original issue reported on code.google.com by [email protected] on 21 Mar 2013 at 8:33

Add confidence interval to numerical predictions

For categorical data, predictions already contain confidence intervals, but for 
numerical data this feature is missing.
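The sketch below shows one simple way to attach an interval to a numerical prediction. It reports the central quantile range of the individual tree predictions for a sample. It assumes the per-tree predictions are available as a vector, and the function names are hypothetical. This measures the spread of the ensemble, a common proxy, rather than a formal confidence interval for the forest's mean prediction:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Linear-interpolation quantile of a sorted, non-empty vector, q in [0, 1].
double sortedQuantile(const std::vector<double>& sorted, double q) {
  assert(!sorted.empty() && q >= 0.0 && q <= 1.0);
  const double pos = q * static_cast<double>(sorted.size() - 1);
  const std::size_t lo = static_cast<std::size_t>(std::floor(pos));
  const std::size_t hi = static_cast<std::size_t>(std::ceil(pos));
  const double frac = pos - static_cast<double>(lo);
  return sorted[lo] + frac * (sorted[hi] - sorted[lo]);
}

// Hypothetical helper: the central `level` fraction (e.g. 0.95) of the
// individual tree predictions for one sample, as {lower, upper}.
std::pair<double, double> treePredictionInterval(std::vector<double> treePredictions,
                                                 double level) {
  std::sort(treePredictions.begin(), treePredictions.end());
  const double tail = (1.0 - level) / 2.0;
  return {sortedQuantile(treePredictions, tail),
          sortedQuantile(treePredictions, 1.0 - tail)};
}
```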

Original issue reported on code.google.com by [email protected] on 25 Mar 2012 at 4:51

seed parameter should be available for reproducibility and comparability

What version of the product are you using? On what operating system?
rf_ace_v1.0.3_*
all operating systems

Please provide any additional information below.
The seed parameter for Random Forests should be exposed so that results can be 
reproduced when the trained model is no longer available. This is often the 
case when a huge number of models is trained (I'm talking tens of thousands) 
and hard drive space is saved by keeping only the results.
Furthermore, it would be helpful for comparing the performance of RF-ACE to 
other Random Forests like the ones in Mahout or Weka.
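The sketch below shows what a user-supplied seed buys. It assumes a hypothetical helper that draws a tree's bootstrap sample from a std::mt19937 seeded with that value. The same seed then always reproduces the same sample indices, so the forest can be regrown instead of stored:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: draw a bootstrap sample (with replacement) of sample
// indices from a seeded engine. Same seed, same standard library => same draws.
// Note: std::mt19937's output is fully specified by the C++ standard, but
// std::uniform_int_distribution's algorithm is not, so identical draws are
// only guaranteed within one standard library implementation.
std::vector<std::size_t> bootstrapIndices(std::size_t nSamples, std::uint32_t seed) {
  assert(nSamples > 0);
  std::mt19937 rng(seed);
  std::uniform_int_distribution<std::size_t> pick(0, nSamples - 1);
  std::vector<std::size_t> indices(nSamples);
  for (auto& idx : indices) idx = pick(rng);
  return indices;
}
```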

best regards,
Berni

Original issue reported on code.google.com by [email protected] on 22 Mar 2012 at 10:51

Segmentation fault reading .arff file

line 40 of treedata.cpp causes segmentation fault
sampleHeaders_.resize(rawMatrix[0].size(),"NO_SAMPLE_ID");
......................^^^^^^^^^^^^^^^^^^^
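One likely explanation is that rawMatrix is empty when the file yields no data rows, so rawMatrix[0] reads past the end of the vector. The sketch below guards the index before use. It is a hypothetical stand-in for the line in treedata.cpp, and it assumes rawMatrix is a vector of string rows:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the sample-header initialisation in treedata.cpp:
// rawMatrix[0] is only valid when the parser produced at least one row, so
// check first and fail with a readable message instead of a segfault.
std::vector<std::string> defaultSampleHeaders(
    const std::vector<std::vector<std::string> >& rawMatrix) {
  if (rawMatrix.empty()) {
    throw std::runtime_error("input file contains no data rows");
  }
  return std::vector<std::string>(rawMatrix[0].size(), "NO_SAMPLE_ID");
}
```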

Original issue reported on code.google.com by [email protected] on 2 Jan 2012 at 3:10

OOP: Move performance evaluation responsibility from the Forest into the Tree

This would allow the program to become easier to manage on the tree level. 
Major implication: parallelizing building of trees across many CPUs/machines 
will become easier.

Original issue reported on code.google.com by [email protected] on 30 Mar 2012 at 3:27

Support for creating output directories

Currently, if one specifies a nonexistent directory in the output string, a 
segmentation fault occurs. I'll implement platform-independent support for 
creating directories.
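The sketch below gives one portable way to do this. It assumes a C++17 compiler and uses std::filesystem::create_directories, which creates every missing parent of the output file's path on Linux, macOS and Windows alike. The helper name ensureParentDirectory is hypothetical:

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical helper: make sure the directory part of an output path exists
// before the file is opened. Returns false instead of crashing on failure.
bool ensureParentDirectory(const std::string& outputPath) {
  const std::filesystem::path parent = std::filesystem::path(outputPath).parent_path();
  if (parent.empty()) return true;  // plain file name: current directory
  std::error_code ec;
  std::filesystem::create_directories(parent, ec);  // no-op if it already exists
  return !ec && std::filesystem::is_directory(parent);
}
```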

Original issue reported on code.google.com by [email protected] on 1 Jul 2011 at 11:36

terminate called after throwing an instance of 'int'

Calling 

rf-ace --filter -I vector -i 0 -T vector.sub.arff -n 1000 -m 10 -o predictions

on the file attached gives 

terminate called after throwing an instance of 'int'

Does rf-ace support the sparse arff format?
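In Weka's sparse ARFF format, each data row lists only its non-zero entries as "{index value, ...}". Indices are 0-based, and omitted attributes are 0. The sketch below reads such a row into a dense vector under that convention. It handles numeric values only, and its name is hypothetical. It also reports problems with std::invalid_argument and a message, rather than a bare int, which is what produces the uninformative "terminate called after throwing an instance of 'int'":

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Parse one numeric sparse ARFF row, e.g. "{0 1.5, 3 2}", into a dense row.
// Quoted or nominal values are not handled in this sketch.
std::vector<double> parseSparseArffRow(const std::string& line, std::size_t nAttributes) {
  const std::size_t open = line.find('{');
  const std::size_t close = line.rfind('}');
  if (open == std::string::npos || close == std::string::npos || close < open) {
    throw std::invalid_argument("not a sparse ARFF row: " + line);
  }
  std::vector<double> row(nAttributes, 0.0);  // omitted entries stay 0
  std::stringstream body(line.substr(open + 1, close - open - 1));
  std::string entry;
  while (std::getline(body, entry, ',')) {
    if (entry.find_first_not_of(" \t") == std::string::npos) continue;  // blank entry
    std::istringstream fields(entry);
    std::size_t index = 0;
    double value = 0.0;
    std::string extra;
    // Expect exactly "<index> <value>" with an in-range index.
    if (!(fields >> index >> value) || (fields >> extra) || index >= nAttributes) {
      throw std::invalid_argument("bad sparse entry: '" + entry + "'");
    }
    row[index] = value;
  }
  return row;
}
```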


What version of the product are you using? On what operating system?

|  RF-ACE version:  1.0.6, Aug 17 2012      |
|    Compile date:  Aug 23 2012, 17:04:14   |

uname -a : Linux lucid-vostro 2.6.35-32-generic-pae #68-Ubuntu SMP Tue Mar 27 
18:04:42 UTC 2012 i686 GNU/Linux

Original issue reported on code.google.com by digitalpebble on 30 Aug 2012 at 2:25

Attachments:

vector.sub.arff.zip

better description for the prediction results

It is not clear whether the predictions on the training data are out of bag 
(OOB) predictions, or not.

Original issue reported on code.google.com by [email protected] on 15 Jan 2012 at 7:37

Develop a format for reading/writing forests

Currently RF-ACE outputs a GBT predictor when parameter -F / --forest is used. 
However, the format lacks some crucial information and is not very 
machine-readable. I think there should be only one format, which would be easy 
for both computers and humans to read and interpret.

Original issue reported on code.google.com by [email protected] on 7 Jan 2012 at 11:50

error in test prediction

Based on the model built on the attached training data, test predictions are 
all wrong. There must be a problem.

Original issue reported on code.google.com by [email protected] on 15 Jan 2012 at 7:17

Attachments:

Improve train prediction caching

Currently prediction caching is done per request, after the trees are grown, but 
it can be done in a single pass during tree growing. StochasticForest will be 
responsible for storing the cached predictions and returning them upon request. This will 
speed things up since importance score calculations rely heavily on train data 
predictions.

Original issue reported on code.google.com by [email protected] on 17 Aug 2012 at 10:12

Seed parameter not needed for prediction

What version of the product are you using? On what operating system?
rf-ace-predict-*.exe
Every OS

Please provide any additional information below.
The seed parameter should only play a role for building a classifier, but not 
for prediction. Hence, to avoid confusion it should be removed from both the 
interface as well as the command line feedback.

best regards,
Berni

Original issue reported on code.google.com by [email protected] on 30 Mar 2012 at 7:28

datadefs::mode(...) incorrectly specified interface; implementation causes erroneous behavior

The method interface and implementation of datadefs::mode(...) incorrectly 
handle multiple values that occur with the same top frequency. In this 
case, the underlying implementation selects the first element in the 
natural key ordering of an STL std::map, effectively selecting the lowest of two 
or more values.

This interface should be updated to return a set of values, and its underlying 
implementation refactored to not rely on the idiosyncrasies of max_element and 
related functions.
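The proposed interface can be sketched as a function that returns the set of all values sharing the top frequency. The function name modes is hypothetical, not RF-ACE's current signature. It counts with a std::map and then keeps every key whose count equals the maximum, without relying on max_element:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Return every value that attains the highest frequency, instead of silently
// picking the smallest one. Empty input yields an empty set.
template <typename T>
std::set<T> modes(const std::vector<T>& data) {
  std::map<T, std::size_t> counts;
  std::size_t best = 0;
  for (const T& x : data) {
    const std::size_t c = ++counts[x];
    if (c > best) best = c;
  }
  std::set<T> result;
  for (const auto& kv : counts) {
    if (kv.second == best) result.insert(kv.first);
  }
  return result;
}
```

Callers that need a single value can then decide explicitly how to break ties, for example at random, instead of inheriting the map's ordering.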

Original issue reported on code.google.com by [email protected] on 17 Aug 2011 at 8:50

Move parameter validation logic from options namespace to the main program

At the moment the logic is spread between the options namespace and the main 
program, which adds confusion. Thus, all logic will be lifted over to the main 
program eventually.

Original issue reported on code.google.com by [email protected] on 30 May 2012 at 10:45

Implement weighted feature sampling

It has been reported that feature selection problems with a gigantic number of 
features and only a tiny fraction of relevant features may prove problematic 
for RFs. This can be remedied by biasing feature sampling towards more 
informative features, which makes base learners more accurate while retaining 
the diversity of learners in the ensemble. See e.g.

http://bioinformatics.oxfordjournals.org/content/24/18/2010.abstract

http://clopinet.com/fextract-book/

for further information.
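The idea can be sketched as weighted sampling without replacement. Each of the mTry draws picks a feature with probability proportional to its remaining weight, and the drawn feature's weight is then zeroed. How the weights are derived, for example from earlier importance scores, is left open here. The function name is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Draw up to mTry distinct feature indices, each draw proportional to the
// remaining (non-negative) weights. Zero-weight features are never chosen.
std::vector<std::size_t> sampleFeaturesWeighted(std::vector<double> weights,
                                                std::size_t mTry, std::mt19937& rng) {
  std::vector<std::size_t> chosen;
  for (std::size_t draw = 0; draw < mTry; ++draw) {
    double remaining = 0.0;
    for (double w : weights) {
      assert(w >= 0.0);
      remaining += w;
    }
    if (remaining <= 0.0) break;  // fewer positive-weight features than mTry
    std::discrete_distribution<std::size_t> pick(weights.begin(), weights.end());
    const std::size_t idx = pick(rng);
    chosen.push_back(idx);
    weights[idx] = 0.0;  // remove it so it cannot be drawn again
  }
  return chosen;
}
```

With uniform weights this reduces to ordinary mTry sampling. Diversity among trees is preserved as long as no single feature dominates the weights.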

Original issue reported on code.google.com by [email protected] on 7 Jan 2012 at 11:37

Segmentation fault when using black list feature

What steps will reproduce the problem?
1. Executing bin/rf-ace --filter with -B option
2.
3.

What is the expected output? 
A successful execution
What do you see instead?
The program exits (139) with Segmentation fault. 

What version of the product are you using? 1.0.7
On what operating system?
CentOS release 6.3 (Final)
Linux 2.6.32-220.7.1.el6.x86_64 x86_64

Please provide any additional information below.
blacklist text file tested were: 
1. A list of feature names (one row per each)
2. Tab delimited line of feature names
3. 1 and 2 but with integers (index) of the features.

All tests resulted in the same error.

The exact execution without the -B option completed successfully and
produced the requested outputs.

Original issue reported on code.google.com by [email protected] on 4 Oct 2012 at 6:33

segmentation fault in generating associations

What steps will reproduce the problem?
1. Run the following command on the attached training file:
rf_ace_win64 --traindata train.arff --target clas -O yaz.txt

What version of the product are you using? On what operating system?
Version 0.9.8, 64 bit version, on Windows 7 Home Premium (64 bit)

Original issue reported on code.google.com by [email protected] on 14 Jan 2012 at 12:55

Attachments:

train.arff

Making construction of Nodes in the trees dynamic

In order to make RandomForest and GBT lighter, a new dynamic construction 
process for trees will be introduced. This will also include the introduction of 
a RootNode that has control over the child Nodes.

Original issue reported on code.google.com by [email protected] on 28 Jun 2011 at 10:01

Implement t-test for unequal population variances

The assumption of equal population variances may be one reason why p-values in 
some cases are behaving oddly.
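Welch's t-test drops the equal-variance assumption. It uses the statistic t = (mean_a - mean_b) / sqrt(s_a^2/n_a + s_b^2/n_b), with degrees of freedom from the Welch-Satterthwaite approximation. The sketch below computes both. Turning t and df into a p-value needs the Student t CDF, which the C++ standard library does not provide, so that step is omitted. The names are hypothetical:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct WelchResult {
  double t;   // test statistic
  double df;  // Welch-Satterthwaite degrees of freedom
};

// Sample mean and unbiased sample variance (n - 1 denominator).
static void meanVar(const std::vector<double>& x, double& mean, double& var) {
  mean = 0.0;
  for (double v : x) mean += v;
  mean /= static_cast<double>(x.size());
  var = 0.0;
  for (double v : x) var += (v - mean) * (v - mean);
  var /= static_cast<double>(x.size() - 1);
}

// Welch's t-test: does not assume the two populations share a variance.
WelchResult welchTTest(const std::vector<double>& a, const std::vector<double>& b) {
  assert(a.size() >= 2 && b.size() >= 2);
  double ma, va, mb, vb;
  meanVar(a, ma, va);
  meanVar(b, mb, vb);
  const double sa = va / static_cast<double>(a.size());  // squared standard error of mean a
  const double sb = vb / static_cast<double>(b.size());
  WelchResult r;
  r.t = (ma - mb) / std::sqrt(sa + sb);
  r.df = (sa + sb) * (sa + sb) /
         (sa * sa / static_cast<double>(a.size() - 1) +
          sb * sb / static_cast<double>(b.size() - 1));
  return r;
}
```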

Original issue reported on code.google.com by [email protected] on 30 Mar 2012 at 3:29

Moving the responsibility of node splitting away from Treedata

This is to make the class responsibilities clearer, and to make Treedata 
lighter, as it is now responsible not just for storing data but also for 
splitting it. Splitting will be the Node's responsibility in the future.

Original issue reported on code.google.com by [email protected] on 28 Jun 2011 at 9:58

different rf-ace behaviors between versions

Hi Timo,

I tried running the biovis feature on the new rf-ace release and got 
"No features match the specified target identifier '0'"
while an older release, r169, ran okay. 
I will try it with an older TCGA dataset too.

Thanks,
Jake

What steps will reproduce the problem?
1. feature matrix 
/proj/ilyalab/Patrick/bioviscontest_dataset_2011_v2/data/rf.input.tsv 7577x500 
run rf-ace_r227 (latest as of 07/12)

/proj/ilyalab/TCGA/rf-ace_r227/bin/rf_ace -I 
/proj/ilyalab/Patrick/bioviscontest_dataset_2011_v2/data/rf.input.tsv -i 0 -n 
500 -m 1000 -p 20 -t 1 -O associations_0.out

 --------------------------------------------------------------- 
| RF-ACE -- efficient feature selection with heterogeneous data |
|                                                               |
|  Version:      RF-ACE v0.5.8, July 8th, 2011                  |
|  Project page: http://code.google.com/p/rf-ace                |
|  Contact:      [email protected]                          |
|                [email protected]                        |
|                                                               |
|              DEVELOPMENT VERSION, BUGS EXIST!                 |
 --------------------------------------------------------------- 

Reading file 
'/proj/ilyalab/Patrick/bioviscontest_dataset_2011_v2/data/rf.input.tsv'
File type is unknown -- defaulting to Annotated Feature Matrix (AFM)
AFM orientation: features as rows
No features match the specified target identifier '0'


run older rf-ace version:
 rf-ace_r1169 (symlink)
/proj/ilyalab/TCGA/rf-ace/bin/rf_ace -I 
/proj/ilyalab/Patrick/bioviscontest_dataset_2011_v2/data/rf.input.tsv -i 0 -n 
500 -m 1000 -p 20 -t 1 -O associations_0.out

 --------------------------------------------------------------- 
| RF-ACE -- efficient feature selection with heterogeneous data |
|                                                               |
|  Version:      RF-ACE v0.3.1, June 24th, 2011                 |
|  Project page: http://code.google.com/p/rf-ace                |
|  Contact:      [email protected]                          |
|                                                               |
|              DEVELOPMENT VERSION, BUGS EXIST!                 |
 --------------------------------------------------------------- 

Reading file 
'/proj/ilyalab/Patrick/bioviscontest_dataset_2011_v2/data/rf.input.tsv'
File type is unknown -- defaulting to Annotated Feature Matrix (AFM)
AFM orientation: features as rows

RF-ACE parameter configuration:
  --input      = /proj/ilyalab/Patrick/bioviscontest_dataset_2011_v2/data/rf.input.tsv
  --nsamples   = 500 / 500 (0% missing)
  --nfeatures  = 7576
  --targetidx  = 0, header 'C:GENO:chr16:67319257:chr16:67319257:67319257::'
  --ntrees     = 500
  --mtry       = 1000
  --nodesize   = 25
  --nperms     = 20
  --pthresold  = 1
  --output     = associations_0.out

Growing 20 Random Forests (RFs), please wait...
  RF 1: 500 nodes (avg. 1 nodes / tree)
  RF 2: 500 nodes (avg. 1 nodes / tree)
  RF 3: 500 nodes (avg. 1 nodes / tree)
  RF 4: 500 nodes (avg. 1 nodes / tree)
  RF 5: 500 nodes (avg. 1 nodes / tree)
  RF 6: 500 nodes (avg. 1 nodes / tree)
  RF 7: 500 nodes (avg. 1 nodes / tree)
  RF 8: 500 nodes (avg. 1 nodes / tree)
  RF 9: 500 nodes (avg. 1 nodes / tree)
  RF 10: 500 nodes (avg. 1 nodes / tree)
  RF 11: 500 nodes (avg. 1 nodes / tree)
  RF 12: 500 nodes (avg. 1 nodes / tree)
  RF 13: 500 nodes (avg. 1 nodes / tree)
  RF 14: 500 nodes (avg. 1 nodes / tree)
  RF 15: 500 nodes (avg. 1 nodes / tree)
  RF 16: 500 nodes (avg. 1 nodes / tree)
  RF 17: 500 nodes (avg. 1 nodes / tree)
  RF 18: 500 nodes (avg. 1 nodes / tree)
  RF 19: 500 nodes (avg. 1 nodes / tree)
  RF 20: 500 nodes (avg. 1 nodes / tree)
20 RFs, 10000 trees, and 10000 nodes generated in 94.17 seconds (106.191 nodes 
per second)

Association file created. Format:
TARGET   PREDICTOR   P-VALUE   IMPORTANCE   CORRELATION

Done.


What is the expected output? What do you see instead?


What version of the product are you using? On what operating system?


Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 12 Jul 2011 at 4:37

Gray Code change broke rf_ace on arff

What steps will reproduce the problem?
1. Point to a weka arff file as input.
2. Running rf_ace will fail.


What is the expected output? What do you see instead?
===> Uncovering associations...   0%rf_ace: src/partitionsequence.cpp:8: 
PartitionSequence::PartitionSequence(size_t): Assertion `nMaxLength < 
sizeof(graycode_t)' failed.


What version of the product are you using? On what operating system?
Latest trunk version. Reverting to last released version works.

Please provide any additional information below.

Original issue reported on code.google.com by sshivaji on 19 Oct 2011 at 1:50

make test in the package fails

What steps will reproduce the problem?
1. Unpack the package (tar.gz)
2. run make, then make test

What is the expected output? What do you see instead?
- Expect some test to run and verify that the build is working
- Instead seeing multiple failures (see output below)
- Makefile seems to depend on stuff outside the package, like 
-I/home/erkkila2/include


What version of the product are you using? On what operating system?

Latest downloaded package: rf_ace_v1.0.4_src.tar.gz

Please provide any additional information below.

hostname 623 ~/src/rf-ace> make test
rm -f bin/test; g++ -L/home/erkkila2/lib -lcppunit -ldl -pedantic 
-I/home/erkkila2/include -I/usr/lib64/glib-2.12/include 
-I/usr/include/glib-2.12 -I/usr/ -Isrc/ test/run_tests.cpp src/progress.cpp 
src/statistics.cpp src/math.cpp src/gamma.cpp src/stochasticforest.cpp 
src/rootnode.cpp src/node.cpp src/splitter.cpp src/treedata.cpp src/mtrand.cpp 
src/datadefs.cpp src/utils.cpp -o bin/test -ggdb; ./bin/test
In file included from test/run_tests.cpp:6:0:
test/argparse_test.hpp:6:45: fatal error: cppunit/extensions/HelperMacros.h: No 
such file or directory
compilation terminated.
/bin/sh: ./bin/test: not found
make: *** [test] Error 127

Thanks.

Original issue reported on code.google.com by [email protected] on 10 Apr 2012 at 9:26

Implement sparse array

Two dimensional sparse array representation will be useful in some areas in 
RF-ACE, but also elsewhere.

Original issue reported on code.google.com by [email protected] on 22 Aug 2012 at 9:29

Extended output

What is the expected output? What do you see instead?
Currently, rf-ace-predict-win64.exe provides only 
TARGET   SAMPLE_ID     PREDICTION    CONFIDENCE
in the output file.
To better understand the outcome of the test run, additional information would be 
useful, such as accuracy, F-measure, ...
This summary could either be added to the console output or written to a 
separate file. Please find attached the output of WEKA for inspiration.

What version of the product are you using? On what operating system?
rf_ace_v1.0.4_win7_x64
WIN7

Original issue reported on code.google.com by [email protected] on 27 Mar 2012 at 7:28

Attachments:

testset_all.arff_trainset_all.arff.RF.200.10.model_result.txt

Corrected p-values

Transform p-values to corrected p-values / FDRs.
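One standard choice for FDR correction is the Benjamini-Hochberg procedure. The i-th smallest of m p-values is scaled by m / i, and a running minimum is then taken from the largest p-value down so the adjusted values stay monotone. They are capped at 1. The sketch below is illustrative; whether RF-ACE should use BH or another correction is not settled by this issue:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Benjamini-Hochberg adjusted p-values (FDR q-values), in input order.
std::vector<double> benjaminiHochberg(const std::vector<double>& p) {
  const std::size_t m = p.size();
  std::vector<std::size_t> order(m);
  std::iota(order.begin(), order.end(), 0);
  // Sort indices by decreasing p-value.
  std::sort(order.begin(), order.end(),
            [&p](std::size_t i, std::size_t j) { return p[i] > p[j]; });
  std::vector<double> adjusted(m);
  double runningMin = 1.0;  // also caps adjusted values at 1
  for (std::size_t k = 0; k < m; ++k) {
    const std::size_t i = order[k];
    const double rank = static_cast<double>(m - k);  // 1-based rank in ascending order
    runningMin = std::min(runningMin, p[i] * static_cast<double>(m) / rank);
    adjusted[i] = runningMin;
  }
  return adjusted;
}
```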

Original issue reported on code.google.com by [email protected] on 19 Jun 2012 at 1:08

OOP: turn RootNode into Tree

Currently RootNode is a somewhat poor abstraction layer, considering that what it 
really does is grow a tree.

Original issue reported on code.google.com by [email protected] on 30 Mar 2012 at 3:24

error returned when all features are "pruned"

When using the prune_features option, if all features are removed, RF-ACE 
appears to return an error.  I think it would be better not to call it an 
error, but simply normal behavior. Maybe a warning could be written to the log 
file instead; otherwise, when a few jobs out of tens of thousands return with 
an error, it's hard to be sure whether it's a problem that needs to be tracked 
down or not.

Original issue reported on code.google.com by [email protected] on 22 Mar 2012 at 4:10

Implement prediction with novel data

This is one of the crucial features RF-ACE should have implemented ASAP. I've 
made considerable effort to implement this efficiently, and I think we're 
only missing one piece of the puzzle.

Implementation proved more challenging than initially thought, mostly because I 
want to make prediction both fast and generic.

Original issue reported on code.google.com by [email protected] on 7 Jan 2012 at 11:33

unknown feature in blacklist causes 'Segmentation fault'

What steps will reproduce the problem?
1. add a non-existent feature to the black list
2. start rf-ace

What is the expected output? What do you see instead?
expected: an error message with the non-existent feature name
instead: Segmentation fault

What version of the product are you using? On what operating system?
1.0.7, Aug 28 2012, windows 7

Please provide any additional information below.
here is the command line and output:

pollux(src/2012_09_11_output)% 
/titan/cancerregulome9/workspaces/rf-ace/bin/rf-ace --filter --nThreads 1 -I 
2012_09_11_1704_preterm_cons.fm -i N:CLIN:TermCategory:NB:::: -O 
../2012_09_11_analysis/2012_09_11_1704_preterm_cons_22_bl_554_100_256.rf-ace.out
 -B bl.txt -S 22 -n 554 -m 100 -p 256

-----------------------------------------------------------
|  RF-ACE version:  1.0.7, Aug 28 2012                    |
|    Compile date:  Aug 28 2012, 00:14:10                 |
|   Report issues:  code.google.com/p/rf-ace/issues/list  |
-----------------------------------------------------------

===> Reading file '2012_09_11_1704_preterm_cons.fm', please wait... DONE
===> Reading blacklist 'bl.txt', please wait... DONE
===> Applying blacklist, keeping 557 / 603 features, please wait... 
Segmentation fault
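The expected behaviour described in this issue can be sketched as follows. Each blacklisted name is looked up in a hash map of feature names, names that are not in the data are reported and skipped, and only the found indices are dropped. Whether RF-ACE's crash actually comes from using an unresolved name as an index is an assumption. The function applyBlacklist is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: remove blacklisted features by name, warning about
// names that do not occur in the data instead of dereferencing a bad index.
std::vector<std::string> applyBlacklist(const std::vector<std::string>& featureNames,
                                        const std::vector<std::string>& blacklist,
                                        std::ostream& log) {
  std::unordered_map<std::string, std::size_t> indexOf;
  for (std::size_t i = 0; i < featureNames.size(); ++i) indexOf[featureNames[i]] = i;

  std::set<std::size_t> drop;
  for (const std::string& name : blacklist) {
    auto it = indexOf.find(name);
    if (it == indexOf.end()) {
      log << "WARNING: blacklisted feature '" << name << "' not found in data\n";
      continue;
    }
    drop.insert(it->second);
  }

  std::vector<std::string> kept;
  for (std::size_t i = 0; i < featureNames.size(); ++i) {
    if (drop.count(i) == 0) kept.push_back(featureNames[i]);
  }
  return kept;
}
```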

Original issue reported on code.google.com by [email protected] on 13 Sep 2012 at 6:22

Make it possible to explicitly specify which features are to be tested per split

With this feature it would be easy to extend the StochasticForest class to grow 
CARTs with arbitrary feature set restrictions. The GBT implementation would also 
be simplified.

Original issue reported on code.google.com by [email protected] on 25 Mar 2012 at 10:30

some .arff files can't be parsed

What steps will reproduce the problem?
rf-ace-build-predictor-win64.exe -I oe1.train.arff -i class -O all.test.model -R

What is the expected output? What do you see instead?
Reading file 'oe1.train.arff', please wait... datadefs::str2num: ERROR: paramete
1513' could not be read properly. Quitting...
Assertion failed: false, file src\datadefs.cpp, line 168
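The token that failed is truncated in this log, so the cause cannot be confirmed here. A common culprit in files edited by other tools is stray trailing characters, such as a Windows carriage return. The sketch below shows a stricter parser that tolerates trailing whitespace, rejects anything else, and names the bad token instead of aborting through assert(false). The name str2numChecked is hypothetical:

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical stricter str2num: the whole token (apart from surrounding
// whitespace, including '\r') must be a number; overflow/underflow is rejected.
double str2numChecked(const std::string& token) {
  const char* begin = token.c_str();
  char* end = nullptr;
  errno = 0;
  const double value = std::strtod(begin, &end);  // skips leading whitespace
  const bool converted = (end != begin) && errno != ERANGE;
  while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n') ++end;
  if (!converted || *end != '\0') {
    throw std::invalid_argument("could not parse '" + token + "' as a number");
  }
  return value;
}
```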

What version of the product are you using? On what operating system?
WIN7, v1.0.3_win7_x64

Please provide any additional information below.
I attached a very similar file (oe1.test.arff), which works fine. I have already 
reduced both files to make it easier to track down the problem. The only 
difference I'm aware of is that the file that fails to load was generated 
through append operations in WEKA.
greetings,
Berni

Original issue reported on code.google.com by [email protected] on 22 Mar 2012 at 6:52

Attachments:

minor: uninitialized warning during make

What steps will reproduce the problem?
1. tar xzf rf_ace_v1.0.4_src.tar.gz
2. make

What is the expected output? What do you see instead?

- Expecting no warning during make

- Seeing:
src/node.cpp: In member function 'bool Node::regularSplitterSeek(Treedata*, 
size_t, const std::vector<long unsigned int>&, const std::vector<long unsigned 
int>&, const Node::GrowInstructions&, size_t&, std::vector<long unsigned int>&, 
std::vector<long unsigned int>&, datadefs::num_t&)':
src/node.cpp:377:92: warning: 'splitValue' may be used uninitialized in this 
function [-Wuninitialized]

In each step where node.cpp is compiled.


What version of the product are you using? On what operating system?
- v1.0.4 on Ubuntu Linux 11.10

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 9 Apr 2012 at 10:18

Large forests fail to be built

What steps will reproduce the problem?
1. Use a large trainset (48 features, 270000 instances)
2. Build forest with ntrees=300, mtry=12 features
3. Fails to be built, stacktrace is attached, but will not always be dumped


What version of the product are you using? On what operating system?
version r513
rf-ace-build-predictor.exe (built with gcc and make)
rf-ace-build-predictor-win64.exe (built with Visual Studio Express 10)

Please provide any additional information below.
I also tried 100 and 200 trees, which worked just fine, but with 300 there 
seems to be an issue. I don't think it's RAM-related, since about 6 GB of RAM 
was still free.

Original issue reported on code.google.com by [email protected] on 6 Apr 2012 at 7:10

Attachments:

rf-ace-build-predictor.exe.stackdump

Implement out-of-bag prediction error output

Currently there are no built-in metrics with which to assess the fitness of 
the generated models. The out-of-bag (OOB) prediction error would give valuable 
information about how well the forest fits the training data.
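For regression, the OOB error can be sketched as follows. Each training sample is predicted by averaging only the trees whose bootstrap sample did not contain it, and the squared errors are averaged over samples that have at least one such tree. The data layout (per-tree prediction and in-bag matrices) and the function name are assumptions for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// treePred[t][i]: tree t's prediction for training sample i.
// inBag[t][i]:    whether sample i was in tree t's bootstrap sample.
// Returns NaN if no sample has any out-of-bag tree.
double oobMeanSquaredError(const std::vector<std::vector<double> >& treePred,
                           const std::vector<std::vector<bool> >& inBag,
                           const std::vector<double>& target) {
  assert(treePred.size() == inBag.size());
  double sumSq = 0.0;
  std::size_t nScored = 0;
  for (std::size_t i = 0; i < target.size(); ++i) {
    double sum = 0.0;
    std::size_t nOob = 0;
    for (std::size_t t = 0; t < treePred.size(); ++t) {
      if (!inBag[t][i]) { sum += treePred[t][i]; ++nOob; }
    }
    if (nOob == 0) continue;  // sample was in every bootstrap: no OOB estimate
    const double err = sum / static_cast<double>(nOob) - target[i];
    sumSq += err * err;
    ++nScored;
  }
  return nScored > 0 ? sumSq / static_cast<double>(nScored)
                     : std::numeric_limits<double>::quiet_NaN();
}
```

For classification, the same loop would count OOB majority-vote mistakes instead of squared errors.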

Original issue reported on code.google.com by [email protected] on 25 Feb 2012 at 8:15

Hide irrelevant parameters from the print-outs

Depending on whether RF, GBT, or CART is selected, some parameters become 
irrelevant. Say, with CART selected only one tree is grown and all features are 
tested with each split (so nTrees == 1 and mTry is irrelevant). Etc.

Original issue reported on code.google.com by [email protected] on 29 May 2012 at 8:04

Compilation error

What steps will reproduce the problem?
1. $ make

What is the expected output? What do you see instead?

The software is expected to build correctly, but gcc 4.7.1 raises many errors.


What version of the product are you using? On what operating system?

Arch Linux (current packages) with rf-ace 1.0.7.


Please provide any additional information below.

The errors are attached.

Original issue reported on code.google.com by [email protected] on 11 Sep 2012 at 6:14

Attachments:

Add proper NaN-checking for GBT

At the moment there may be some NaN-issues in the way GBT splits the nodes. 
This must be investigated.

Original issue reported on code.google.com by [email protected] on 4 Jul 2011 at 1:42

Implement new data type: ordinal

In most RF implementations, "ordinal" isn't supported as a feature type, yet in 
many cases it is the most natural data type. The good news is that the ordinal 
feature type can be accounted for with very few modifications:

1. If an ordinal feature splits, it is treated as a numerical feature
   - IF encoded in a certain way
2. If an ordinal feature is split, it is treated as a categorical feature
   - no need to pay attention to internal formatting

Thus, an ordinal feature has the dual property of being both numerical and 
categorical at the same time. The proposed annotation for ordinal feature is 
naturally "O", e.g.

O:ordinal_feature

as per AFM notation. One problem arises: should the ARFF standard be extended 
to account for ordinal features?

Original issue reported on code.google.com by [email protected] on 27 Aug 2012 at 2:10

Implement grep for identifying target feature based on user input

Sometimes identifying the proper index of a desired target in the input file 
is cumbersome, but if one knows its name, the name could be used instead.
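The sketch below shows one way to resolve the target. A purely numeric query is taken as an index; otherwise an exact name match wins, then a unique substring ("grep") match, and ambiguous or missing matches are reported. The function resolveTarget and its precedence rules are assumptions, not RF-ACE's behaviour:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical: resolve "-i" given as an index or as (part of) a feature name.
std::size_t resolveTarget(const std::vector<std::string>& names, const std::string& query) {
  if (!query.empty() && query.find_first_not_of("0123456789") == std::string::npos) {
    const std::size_t idx = std::stoul(query);
    if (idx < names.size()) return idx;
    throw std::out_of_range("target index " + query + " out of range");
  }
  std::vector<std::size_t> partial;
  for (std::size_t i = 0; i < names.size(); ++i) {
    if (names[i] == query) return i;  // exact match always wins
    if (names[i].find(query) != std::string::npos) partial.push_back(i);
  }
  if (partial.size() == 1) return partial[0];
  throw std::invalid_argument(partial.empty() ? "no feature matches '" + query + "'"
                                              : "'" + query + "' matches several features");
}
```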

Original issue reported on code.google.com by [email protected] on 5 Jul 2011 at 2:17

Add observables for feature frequency

It will be beneficial both for development and end use to be able to assess the 
frequency at which a particular feature shows up in the trees. Also, comparing 
how often contrast features show up with how often real features do should 
help assess the quality of the data.

Original issue reported on code.google.com by [email protected] on 3 Jul 2011 at 11:59

Confusing help regarding nodesize parameter

What steps will reproduce the problem?
1. Just call rf-ace-build-predictor-win64.exe without any parameter

What is the expected output? What do you see instead?
...
-s / --nodesize            Minimum number of train samples per node, affects 
tree depth
...

What version of the product are you using? On what operating system?
rf_ace_v1.0.4_*
Every OS

Please provide any additional information below.
The comment is a little bit confusing, since it could be interpreted as how 
many samples are at least used to determine the best split for the node.

Original issue reported on code.google.com by [email protected] on 31 Mar 2012 at 8:50

Working with whitelist

The mtry parameter is set to its default based on the full set of features, even 
though only a few features are provided in a whitelist.

It should be updated based on the number of features in the whitelisted 
subset.

Original issue reported on code.google.com by [email protected] on 4 Apr 2012 at 5:43

Parsing ARFF files

What steps will reproduce the problem?
Running the following command after compiling the source code from SVN
bin/rf-ace -F test_5by10_numeric_matrix.arff -i 4 -n 100 -m 5 -A associations.tsv

What is the expected output? What do you see instead?
The expected output is a run of the feature selection process. Instead, it is 
reported that the target is missing from all samples.
Verbatim:
-----------------------------------------------------------
|  RF-ACE version:  1.1.0, Dec 5th 2012                   |
|    Compile date:  Feb 18 2013, 01:07:51                 |
|   Report issues:  code.google.com/p/rf-ace/issues/list  |
-----------------------------------------------------------

Random Forest (RF) configuration:
 -n / --nTrees         = 100
 -m / --mTry           = 5
 -s / --nodeSize       = 3
 -a / --nMaxLeaves     = 2147483646
 -q / --quantiles      = NOT SET
 -N / --noNABranching  = NOT SET

Filter options:
 -p / --nPerms         = 20
 -t / --pValueTh       = 0.05

-Reading file 'test_5by10_numeric_matrix.arff' for filtering

Feature 'y' chosen as target with 10 / 0 samples ( -inf % missing ) among 5 
features
Not enough samples (0) to perform a single split

What version of the product are you using? On what operating system?
RF-ACE version as in verbatim output above.
Operating system is Ubuntu precise (12.04.2 LTS)

Please provide any additional information below.
Same behaviour is observed with all ARFF files.

Original issue reported on code.google.com by [email protected] on 17 Feb 2013 at 7:46

minor: tar file contains many junk backup files

What steps will reproduce the problem?
1. tar tzvf rf_ace_v1.0.4_src.tar.gz | grep '~$' | wc -l

What is the expected output? What do you see instead?
- Expect to see 0 (zero)
- Seeing 47 (number of files ending with '~' in the tarball)

What version of the product are you using? On what operating system?
- rf-ace v1.0.4

Please provide any additional information below.

May want to add *~ to the 'clean' target in the Makefile too.  Thanks.

Original issue reported on code.google.com by [email protected] on 9 Apr 2012 at 10:46

ArgParse fails for select cases, due to dependency on GNU getopt_long

As documented in the source code, certain cases appear to cause ArgParse to 
fail after its rewrite to rely upon GNU C's getopt_long:

* When long arguments are specified in form "--longoption value" 
* When short arguments are packed together, such as "-abcd valueForD"

Wrapping this code in a standard try-catch fails since the code throws across 
linking barriers. Other attempts to catch the error are equally ineffectual.


Given the problems inherent to use of getopt_long as a drop-in rewrite of the 
previous iteration of ArgParse, while maintaining its inefficient time 
complexity, it's advised that this construct be rewritten to use a hashmap with 
a very limited, well-defined number of input cases. Such a framework is trivial 
once all of the supported input cases are defined.
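A table-driven parser of the kind suggested here can be sketched with an unordered_map of declared options. It covers the two failing cases: "--longoption value", and packed short flags such as "-abcd valueForD", where only the last letter may take a value. OptionSpec, parseArgs and the storage of flags as "1" are hypothetical design choices:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

struct OptionSpec {
  char shortName;
  bool takesValue;
};

// Returns long option name -> value ("1" for flags). Throws on unknown options,
// missing values, or a value-taking letter that is not last in a packed group.
std::unordered_map<std::string, std::string> parseArgs(
    const std::vector<std::string>& args,
    const std::unordered_map<std::string, OptionSpec>& specs) {
  std::unordered_map<char, std::string> longOf;  // short letter -> long name
  for (const auto& kv : specs) longOf[kv.second.shortName] = kv.first;

  std::unordered_map<std::string, std::string> parsed;
  for (std::size_t i = 0; i < args.size(); ++i) {
    const std::string& a = args[i];
    std::vector<std::string> names;
    if (a.rfind("--", 0) == 0) {
      names.push_back(a.substr(2));                 // "--longoption"
    } else if (a.size() > 1 && a[0] == '-') {
      for (std::size_t k = 1; k < a.size(); ++k) {  // "-abcd"
        auto it = longOf.find(a[k]);
        if (it == longOf.end())
          throw std::invalid_argument("unknown option -" + std::string(1, a[k]));
        names.push_back(it->second);
      }
    } else {
      throw std::invalid_argument("unexpected argument '" + a + "'");
    }
    for (std::size_t k = 0; k < names.size(); ++k) {
      auto spec = specs.find(names[k]);
      if (spec == specs.end()) throw std::invalid_argument("unknown option --" + names[k]);
      if (!spec->second.takesValue) { parsed[names[k]] = "1"; continue; }
      if (k + 1 != names.size() || i + 1 >= args.size())
        throw std::invalid_argument("option --" + names[k] + " needs a value");
      parsed[names[k]] = args[++i];  // consume the next argument as the value
    }
  }
  return parsed;
}
```

Each argument is inspected once and each lookup is a hash-map access, so parsing stays linear in the number of arguments.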

Original issue reported on code.google.com by [email protected] on 15 Aug 2011 at 10:20

Segmentation fault

What steps will reproduce the problem?
1. valgrind --track-origins=yes  $RF/rf-ace-build-predictor -I $DATA/adult.test.arff -O tree -i class 2>err.log >std.log

What version of the product are you using? On what operating system?
0.9.9, February 2nd, 2012

64 bit Linux: Linux 3.0.0-13-generic #22-Ubuntu SMP Wed Nov 2 13:27:26 UTC 2011 
x86_64 x86_64 x86_64 GNU/Linux

Original issue reported on code.google.com by [email protected] on 9 Feb 2012 at 11:40

Attachments:

files.tar.gz

parameter mtry should accept positive integers

What version of the product are you using? On what operating system?
rf_ace_v1.0.3_*
all operating systems


Please provide any additional information below.
The parameter mTry should not be relative to the total number of features, but 
should instead accept a positive integer giving the absolute number of features 
to be selected.
As a default value there are two suggestions which performed quite well for me 
in the past:
1 - mTry = sqrt(M)...where M is the total number of features, as suggested by 
Breiman
2 - mTry = log2(M)+1...as implemented in WEKA
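The two suggested defaults can be written as follows, for M candidate features. When a whitelist is given, M would be the size of the whitelisted subset. Rounding down, and the function names, are assumptions of this sketch:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Suggestion 1 (Breiman): floor(sqrt(M)).
std::size_t mTrySqrt(std::size_t M) {
  assert(M > 0);
  return static_cast<std::size_t>(std::floor(std::sqrt(static_cast<double>(M))));
}

// Suggestion 2 (WEKA-style): floor(log2(M)) + 1.
std::size_t mTryLog2(std::size_t M) {
  assert(M > 0);
  return static_cast<std::size_t>(std::floor(std::log2(static_cast<double>(M)))) + 1;
}
```

For example, with 48 features these give 6 and 6, and with 7577 features they give 87 and 13.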

best regards,
Berni

Original issue reported on code.google.com by [email protected] on 22 Mar 2012 at 10:42

Default values of nmaxleaves and nodesize

What steps will reproduce the problem?
1. Train a predictor with default nmaxleaves and nodesize parameter:
rf-ace-build-predictor-win64.exe -I trainset_all.arff -i class -R -n 24 -m 12 -O trainset_all.arff.model -S 1

What is the expected output? What do you see instead?
The default parameters aren't performing well in terms of OOB error.

What version of the product are you using? On what operating system?
rf_ace_v1.0.4_*
Every OS

Please provide any additional information below.
Breiman suggests building unpruned trees for a Random Forest,
so I would like to propose setting the default values of nmaxleaves and nodesize 
so that unpruned trees are generated.

best regards,
Berni

Original issue reported on code.google.com by [email protected] on 31 Mar 2012 at 8:21
