Comments (13)
I'm going to jump in here and suggest that once we understand how the data was generated, this could serve as the basis for some good unit tests.
from amazon-dsstne.
I have been able to reproduce this issue on the DSSTNE AMI running on a g2.2xlarge EC2 instance, with the dataset provided. What I found is that while the predict utility is correctly loading all 65075 lines of the feature_input file, some of those lines contain duplicate IDs.
Line 64175, for example, is malformed. With hidden/control characters enabled in vi, you can see the formatting error (a second tab character):
4549498^I4549491^I4549528,10.0:4549526,10.0:4549498,10.0:4549501,10.0$
Both the generateNetCDF and predict applications should be able to detect this kind of error, and I will raise a separate issue to track that work. In the meantime, this should help you to fix the dataset itself.
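A minimal validity check along these lines might look like the sketch below. This is not DSSTNE code, just an illustration: it only enforces the one-tab-per-line rule that the malformed sample above violates (the function name is hypothetical).

```cpp
#include <algorithm>
#include <string>

// Returns true when a feature_input-style line is well formed, i.e. it
// contains exactly one tab separating the sample ID from its feature list.
// The malformed line shown above contains a second, stray tab.
bool LineIsWellFormed(const std::string& line)
{
    return std::count(line.begin(), line.end(), '\t') == 1;
}
```

Running a check like this over all 65075 lines before conversion would catch the error at line 64175 up front.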
A quick glance at the code for the predict application suggests a couple of possible causes - we'll need to narrow this down.
We can dig into this further by rebuilding DSSTNE with the DEBUG flag enabled - the flag can be found in Makefile.inc under /src/amazon/dsstne, near the beginning of that file. If you can reproduce the issue on a debug build, the seg fault output should contain line numbers that will help narrow down the potential causes.
Be sure to run make clean before running make again.
Any other information you can provide (e.g. OS/distro, GPU used) would also be helpful.
I enabled the debug flag as shown below, but I could not get more detailed debug information.
Env info:
OS: Ubuntu 14.04
CUDA: release 7.5, V7.5.17, NVIDIA-SMI 352.39
GPU: GeForce GTX 970
===== Makefile.inc [start] =====
....
CPPFLAGS = -traditional -P -std=c++0x -DMEMTRACKING -gdwarf-3
....
DEBUG = 1
ifeq ($(DEBUG), 1)
$(info ************ DEBUG mode ************)
CFLAGS = -DOMPI_SKIP_MPICXX -std=c++0x -g -O0 -DMEMTRACKING -gdwarf-3
else
....
===== Makefile.inc [end] =====
===== Make Info [start] =====
************ DEBUG mode ************
make[1]: Entering directory `/home/dsstne/amazon-dsstne/src/amazon/dsstne/utils'
===== Make Info [end] =====
===== Exception Message [start] =====
GpuContext::Startup: Process 0 out of 1 initialized.
Allocating 8 bytes of GPU memory
Mem++: 8 8
GpuContext::Startup: Single node flag on GPU for process 0 is 1
GpuContext::Startup: P2P support flags on GPU for process 0 are 1 1
GpuContext::Startup: GPU for process 0 initialized.
GpuContext::SetRandomSeed: Random seed set to 12134.
Loaded input feature index with 65064 entries.
Indexing 1 files
Indexing file: dss_sku_sku
Progress Parsing10000Time 1.0682
Progress Parsing20000Time 1.0648
Progress Parsing30000Time 0.959654
Progress Parsing40000Time 0.987968
Progress Parsing50000Time 0.783489
Progress Parsing60000Time 0.800305
Exported gl_input_predict.samplesIndex with 65075 entries.
Raw max index is: 65064
Rounded up max index to: 65152
Created NetCDF file gl_input_predict.nc for dataset gl_input
Number of network input nodes: 65064
Number of entries to generate predictions for: 65075
LoadNetCDF: Loading UInt data set
NNDataSet::NNDataSet: Name of data set: gl_input
NNDataSet::NNDataSet: Attributes: Sparse Boolean
NNDataSet::NNDataSet: 1-dimensional data comprised of (65152, 1, 1) datapoints.
NNDataSet::NNDataSet: 3778407 total datapoints.
NNDataSet::NNDataSet: 65075 examples.
[snx-dsstne:04470] *** Process received signal ***
[snx-dsstne:04470] Signal: Segmentation fault (11)
[snx-dsstne:04470] Signal code: Address not mapped (1)
[snx-dsstne:04470] Failing at address: 0xc5f77f0
[snx-dsstne:04470] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7fd1a2766330]
[snx-dsstne:04470] [ 1] predict[0x447eb7]
[snx-dsstne:04470] [ 2] predict[0x43714c]
[snx-dsstne:04470] [ 3] predict[0x431088]
[snx-dsstne:04470] [ 4] predict[0x42e1f8]
[snx-dsstne:04470] [ 5] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fd1a23b2f45]
[snx-dsstne:04470] [ 6] predict[0x407d31]
[snx-dsstne:04470] *** End of error message ***
Segmentation fault (core dumped)
===== Exception Message [end] =====
Never mind what I wrote, could you run this from gdb?
It looks to me like the dataset has somehow been corrupted.
I ran this from gdb and got the following info:
Starting program: /home/dsstne/amazon-dsstne/src/amazon/dsstne/bin/predict -b 256 -d gl -i features_input -o features_output -k 10 -n gl.nc -f dss_sku_sku -s recs -r dss_sku_sku
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffee132700 (LWP 4552)]
GpuContext::Startup: Process 0 out of 1 initialized.
[New Thread 0x7fffe598a700 (LWP 4553)]
[New Thread 0x7fffdcfff700 (LWP 4554)]
Allocating 8 bytes of GPU memory
Mem++: 8 8
GpuContext::Startup: Single node flag on GPU for process 0 is 1
GpuContext::Startup: P2P support flags on GPU for process 0 are 1 1
GpuContext::Startup: GPU for process 0 initialized.
GpuContext::SetRandomSeed: Random seed set to 12134.
Loaded input feature index with 65064 entries.
Indexing 1 files
Indexing file: dss_sku_sku
Progress Parsing10000Time 1.07443
Progress Parsing20000Time 1.07139
Progress Parsing30000Time 0.968824
Progress Parsing40000Time 0.994079
Progress Parsing50000Time 0.787785
Progress Parsing60000Time 0.80526
Exported gl_input_predict.samplesIndex with 65075 entries.
Raw max index is: 65064
Rounded up max index to: 65152
Created NetCDF file gl_input_predict.nc for dataset gl_input
Number of network input nodes: 65064
Number of entries to generate predictions for: 65075
LoadNetCDF: Loading UInt data set
NNDataSet::NNDataSet: Name of data set: gl_input
NNDataSet::NNDataSet: Attributes: Sparse Boolean
NNDataSet::NNDataSet: 1-dimensional data comprised of (65152, 1, 1) datapoints.
NNDataSet::NNDataSet: 3778407 total datapoints.
NNDataSet::NNDataSet: 65075 examples.
Program received signal SIGSEGV, Segmentation fault.
0x0000000000447eb7 in NNDataSet::CalculateSparseDatapointCounts (this=0x8b69a40) at NNTypes.cpp:868
868 _vSparseDatapointCount[x]++;
Awesome, so looking at that section:
// Calculate individual counts for each datapoint
uint64_t N = _width * _height * _length;
_vSparseDatapointCount.resize(N);
std::fill(_vSparseDatapointCount.begin(), _vSparseDatapointCount.end(), 0);
for (auto x : _vSparseIndex)
{
_vSparseDatapointCount[x]++;
}
You have a sparse index that is out of range. Can you check that all of the indices in
vector<uint32_t> _vSparseIndex
are < 65152? I'm betting that they're not... Or in this case, just test x.
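That check can be done with a short standalone helper, sketched below. The function name and the free-standing form are illustrative; in practice the vector is the NNDataSet member _vSparseIndex and N is _width * _height * _length (65152 in the run above).

```cpp
#include <cstdint>
#include <vector>

// Counts sparse indices that fall outside [0, N). A nonzero result means
// the loop in CalculateSparseDatapointCounts would write out of bounds,
// which matches the segfault observed above.
uint64_t CountOutOfRangeIndices(const std::vector<uint32_t>& vSparseIndex,
                                uint64_t N)
{
    uint64_t bad = 0;
    for (auto x : vSparseIndex)
    {
        if (x >= N)
            ++bad;
    }
    return bad;
}
```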
Is this a bug?
How can I fix or work around it?
Can you send us the steps you ran and also a sample of the data?
Yes, the dataset appears to be corrupted with out-of-range indices. How exactly was the dataset generated?
Also, we should add guard code to detect this situation, but we still have to fix the dataset.
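One possible shape for that guard is sketched below. This is only an illustration of the idea, not the actual fix: the real change would live inside NNDataSet::CalculateSparseDatapointCounts and use DSSTNE's own error reporting rather than an exception.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Guarded version of the counting loop quoted from NNTypes.cpp above:
// instead of incrementing counts[x] blindly, report an out-of-range
// index with a descriptive error rather than segfaulting.
std::vector<uint64_t> CountSparseDatapoints(
    const std::vector<uint32_t>& vSparseIndex, uint64_t N)
{
    std::vector<uint64_t> counts(N, 0);
    for (auto x : vSparseIndex)
    {
        if (x >= N)
            throw std::out_of_range("sparse index " + std::to_string(x) +
                                    " >= dataset width " + std::to_string(N));
        counts[x]++;
    }
    return counts;
}
```

With a guard like this, a corrupted dataset fails with a clear message pointing at the bad index instead of an unmapped-address crash.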
You can get the dataset from here -- could you help to test it?
https://s3.amazonaws.com/andy.tang.test/dataset.zip
generateNetCDF -d gl_input -i dss_sku_sku -o gl_input.nc -f features_input -s samples_input -c
generateNetCDF -d gl_output -i dss_sku_sku -o gl_output.nc -f features_output -s samples_input -c
train -c config.json -i gl_input.nc -o gl_output.nc -n gl.nc -b 256 -e 10
predict -b 256 -d gl -i features_input -o features_output -k 10 -n gl.nc -f dss_sku_sku -s recs -r dss_sku_sku
Interesting, I get a differently sized dataset.
./generateNetCDF -d gl_input -i dss_sku_sku -o gl_input.nc -f features_input -s samples_input -c
Flag -c is set. Will create a new feature file and overwrite: features_input
Generating dataset of type: indicator
Will create a new samples index file: samples_input
Will create a new features index file: features_input
Indexing 1 files
Indexing file: dss_sku_sku
Progress Parsing10000Time 0.827208
Progress Parsing20000Time 0.749772
Progress Parsing30000Time 0.670679
Progress Parsing40000Time 0.685743
Progress Parsing50000Time 0.54209
Progress Parsing60000Time 0.556289
Exported features_input with 65217 entries.
Exported samples_input with 65075 entries.
Raw max index is: 65217
Rounded up max index to: 65280
Created NetCDF file gl_input.nc for dataset gl_input
Total time for generating NetCDF: 4.54689 secs.
Can you pull ToT, rebuild, and try again?
Thank you for your help!!
I fixed the malformed data. It works fine now.