
Comments (13)

tristanpenman commented on August 28, 2024

I'm going to jump in here and suggest that once we understand how the data was generated, this could serve as the basis for some good unit tests.

tristanpenman commented on August 28, 2024

I have been able to reproduce this issue on the DSSTNE AMI running on a g2.2xlarge EC2 instance, with the dataset provided. What I found is that while the predict utility is correctly loading all 65075 lines of the feature_input file, some of those lines contain duplicate IDs.

Line 64175, for example, is malformed. With hidden/control characters enabled in vi, you can see the formatting error (a second tab character):

4549498^I4549491^I4549528,10.0:4549526,10.0:4549498,10.0:4549501,10.0$

Both the generateNetCDF and predict applications should be able to detect this kind of error, and I will raise a separate issue to track that work. In the meantime, this should help you to fix the dataset itself.
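If it helps in the meantime, a small standalone checker along these lines can be used to flag lines with an unexpected number of tab characters. This is just a sketch (the name check_tabs is made up, and it is not part of DSSTNE); it assumes the expected format is a single ID, one tab, and then the colon-separated feature list:

    // check_tabs.cpp - flag lines whose tab count is not exactly one.
    // Assumes each line should look like: <id>\t<feature,value:feature,value:...>
    #include <algorithm>
    #include <fstream>
    #include <iostream>
    #include <string>

    int main(int argc, char** argv)
    {
        if (argc != 2)
        {
            std::cerr << "usage: check_tabs <file>" << std::endl;
            return 1;
        }
        std::ifstream in(argv[1]);
        std::string line;
        size_t lineNo = 0;
        while (std::getline(in, line))
        {
            ++lineNo;
            size_t tabs = std::count(line.begin(), line.end(), '\t');
            if (tabs != 1)
                std::cout << "Line " << lineNo << " has " << tabs << " tab(s)" << std::endl;
        }
        return 0;
    }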

tristanpenman commented on August 28, 2024

A quick glance at the code for the predict application suggests a couple of possible causes - we'll need to narrow this down.

We can dig into this further by rebuilding DSSTNE with the DEBUG flag enabled - the flag can be found in Makefile.inc under /src/amazon/dsstne, near the beginning of that file. If you can reproduce the issue on a debug build, the seg fault output should contain line numbers that will help narrow down the potential causes.

Be sure to run make clean before running make again.

Any other information you can provide (e.g. OS/distro, GPU used) would also be helpful.
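For reference, the rebuild steps look roughly like this (just a sketch; substitute the path of your own checkout):

    cd <your checkout>/src/amazon/dsstne
    # near the top of Makefile.inc, set DEBUG = 1
    make clean
    make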

oyotong commented on August 28, 2024

I enabled the debug flag as shown below, but I could not get more detailed debug information.
Env info:
OS: Ubuntu 14.04
CUDA: release 7.5, V7.5.17, NVIDIA-SMI 352.39
GPU: GeForce GTX 970

===== Makefile.inc [start] =====
....
CPPFLAGS = -traditional -P -std=c++0x -DMEMTRACKING -gdwarf-3
....
DEBUG = 1
ifeq ($(DEBUG), 1)
$(info ************ DEBUG mode ************)
CFLAGS = -DOMPI_SKIP_MPICXX -std=c++0x -g -O0 -DMEMTRACKING -gdwarf-3
else
....
===== Makefile.inc [end] =====

===== Make Info [start] =====
************ DEBUG mode ************
make[1]: Entering directory `/home/dsstne/amazon-dsstne/src/amazon/dsstne/utils'
===== Make Info [end] =====

===== Exception Message [start] =====
GpuContext::Startup: Process 0 out of 1 initialized.
Allocating 8 bytes of GPU memory
Mem++: 8 8
GpuContext::Startup: Single node flag on GPU for process 0 is 1
GpuContext::Startup: P2P support flags on GPU for process 0 are 1 1
GpuContext::Startup: GPU for process 0 initialized.
GpuContext::SetRandomSeed: Random seed set to 12134.
Loaded input feature index with 65064 entries.
Indexing 1 files
Indexing file: dss_sku_sku
Progress Parsing10000Time 1.0682
Progress Parsing20000Time 1.0648
Progress Parsing30000Time 0.959654
Progress Parsing40000Time 0.987968
Progress Parsing50000Time 0.783489
Progress Parsing60000Time 0.800305
Exported gl_input_predict.samplesIndex with 65075 entries.
Raw max index is: 65064
Rounded up max index to: 65152
Created NetCDF file gl_input_predict.nc for dataset gl_input
Number of network input nodes: 65064
Number of entries to generate predictions for: 65075
LoadNetCDF: Loading UInt data set
NNDataSet::NNDataSet: Name of data set: gl_input
NNDataSet::NNDataSet: Attributes: Sparse Boolean
NNDataSet::NNDataSet: 1-dimensional data comprised of (65152, 1, 1) datapoints.
NNDataSet::NNDataSet: 3778407 total datapoints.
NNDataSet::NNDataSet: 65075 examples.
[snx-dsstne:04470] *** Process received signal ***
[snx-dsstne:04470] Signal: Segmentation fault (11)
[snx-dsstne:04470] Signal code: Address not mapped (1)
[snx-dsstne:04470] Failing at address: 0xc5f77f0
[snx-dsstne:04470] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7fd1a2766330]
[snx-dsstne:04470] [ 1] predict[0x447eb7]
[snx-dsstne:04470] [ 2] predict[0x43714c]
[snx-dsstne:04470] [ 3] predict[0x431088]
[snx-dsstne:04470] [ 4] predict[0x42e1f8]
[snx-dsstne:04470] [ 5] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fd1a23b2f45]
[snx-dsstne:04470] [ 6] predict[0x407d31]
[snx-dsstne:04470] *** End of error message ***
Segmentation fault (core dumped)
===== Exception Message [end] =====

scottlegrand commented on August 28, 2024

Never mind what I wrote, could you run this from gdb?

It looks to me like the dataset has been corrupted somehow.
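Something along these lines should do it (a sketch only; pass the same arguments you normally pass to predict):

    gdb --args predict <your usual predict arguments>
    (gdb) run
    ... wait for the crash ...
    (gdb) backtrace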

oyotong commented on August 28, 2024

I ran this from gdb and got the info below:

Starting program: /home/dsstne/amazon-dsstne/src/amazon/dsstne/bin/predict -b 256 -d gl -i features_input -o features_output -k 10 -n gl.nc -f dss_sku_sku -s recs -r dss_sku_sku
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffee132700 (LWP 4552)]
GpuContext::Startup: Process 0 out of 1 initialized.
[New Thread 0x7fffe598a700 (LWP 4553)]
[New Thread 0x7fffdcfff700 (LWP 4554)]
Allocating 8 bytes of GPU memory
Mem++: 8 8
GpuContext::Startup: Single node flag on GPU for process 0 is 1
GpuContext::Startup: P2P support flags on GPU for process 0 are 1 1
GpuContext::Startup: GPU for process 0 initialized.
GpuContext::SetRandomSeed: Random seed set to 12134.
Loaded input feature index with 65064 entries.
Indexing 1 files
Indexing file: dss_sku_sku
Progress Parsing10000Time 1.07443
Progress Parsing20000Time 1.07139
Progress Parsing30000Time 0.968824
Progress Parsing40000Time 0.994079
Progress Parsing50000Time 0.787785
Progress Parsing60000Time 0.80526
Exported gl_input_predict.samplesIndex with 65075 entries.
Raw max index is: 65064
Rounded up max index to: 65152
Created NetCDF file gl_input_predict.nc for dataset gl_input
Number of network input nodes: 65064
Number of entries to generate predictions for: 65075
LoadNetCDF: Loading UInt data set
NNDataSet::NNDataSet: Name of data set: gl_input
NNDataSet::NNDataSet: Attributes: Sparse Boolean
NNDataSet::NNDataSet: 1-dimensional data comprised of (65152, 1, 1) datapoints.
NNDataSet::NNDataSet: 3778407 total datapoints.
NNDataSet::NNDataSet: 65075 examples.

Program received signal SIGSEGV, Segmentation fault.
0x0000000000447eb7 in NNDataSet::CalculateSparseDatapointCounts (this=0x8b69a40) at NNTypes.cpp:868

868 _vSparseDatapointCount[x]++;

scottlegrand commented on August 28, 2024

Awesome, so looking at that section:

    // Calculate individual counts for each datapoint
    uint64_t N = _width * _height * _length;
    _vSparseDatapointCount.resize(N);
    std::fill(_vSparseDatapointCount.begin(), _vSparseDatapointCount.end(), 0);
    for (auto x : _vSparseIndex)
    {
        _vSparseDatapointCount[x]++;
    }

you have a sparse index that is out of range. Can you check that all of the indices in

vector<uint32_t> _vSparseIndex

are < 65152? I'm betting that they're not... or, in this case, just test x inside the loop.
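A quick way to check is to look at the largest index before the counting loop runs. This is just a sketch written against the member names shown above, assuming <algorithm> and <iostream> are available in NNTypes.cpp:

    // Find the largest sparse index and compare it against N (= _width * _height * _length).
    auto maxIt = std::max_element(_vSparseIndex.begin(), _vSparseIndex.end());
    if (maxIt != _vSparseIndex.end() && *maxIt >= N)
    {
        std::cout << "Largest sparse index is " << *maxIt
                  << ", which is >= " << N << std::endl;
    }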

oyotong commented on August 28, 2024

Is this an issue?
How can I fix or bypass it?

rgeorgej commented on August 28, 2024

Can you send us the steps you followed, along with a sample of the data?

scottlegrand commented on August 28, 2024

Yes, the dataset appears to be corrupted with out-of-range indices. How exactly was the dataset generated?

Also, we should add guard code to detect this situation, but the dataset itself will still need to be fixed.
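For example, the loop in NNDataSet::CalculateSparseDatapointCounts could be hardened along these lines (a sketch of the idea only, not what was actually committed to the repo):

    for (auto x : _vSparseIndex)
    {
        // Bail out with a readable message instead of writing past the end of the vector.
        if (x >= N)
        {
            std::cout << "NNDataSet::CalculateSparseDatapointCounts: sparse index " << x
                      << " is out of range (limit " << N << ")" << std::endl;
            exit(-1); // or report the error back to the caller
        }
        _vSparseDatapointCount[x]++;
    }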

oyotong commented on August 28, 2024

You can get the dataset from here -- could you help to test it?
https://s3.amazonaws.com/andy.tang.test/dataset.zip

generateNetCDF -d gl_input -i dss_sku_sku -o gl_input.nc -f features_input -s samples_input -c
generateNetCDF -d gl_output -i dss_sku_sku -o gl_output.nc -f features_output -s samples_input -c
train -c config.json -i gl_input.nc -o gl_output.nc -n gl.nc -b 256 -e 10
predict -b 256 -d gl -i features_input -o features_output -k 10 -n gl.nc -f dss_sku_sku -s recs -r dss_sku_sku

scottlegrand commented on August 28, 2024

Interesting, I get a differently sized dataset.
./generateNetCDF -d gl_input -i dss_sku_sku -o gl_input.nc -f features_input -s samples_input -c
Flag -c is set. Will create a new feature file and overwrite: features_input
Generating dataset of type: indicator
Will create a new samples index file: samples_input
Will create a new features index file: features_input
Indexing 1 files
Indexing file: dss_sku_sku
Progress Parsing10000Time 0.827208
Progress Parsing20000Time 0.749772
Progress Parsing30000Time 0.670679
Progress Parsing40000Time 0.685743
Progress Parsing50000Time 0.54209
Progress Parsing60000Time 0.556289
Exported features_input with 65217 entries.
Exported samples_input with 65075 entries.
Raw max index is: 65217
Rounded up max index to: 65280
Created NetCDF file gl_input.nc for dataset gl_input
Total time for generating NetCDF: 4.54689 secs.

Can you pull ToT (tip of tree), rebuild, and try again?

oyotong commented on August 28, 2024

Thank you for your help!

I fixed the malformed data, and it works fine now.
