Comments (9)
It looks like it is trying to allocate 655 MB of memory, which is not available. Can you rerun this test with the following environment variable set:
MIOPEN_LOG_LEVEL=6
It will help us see what the configuration looks like. The message above is not enough for us to understand what is going on. What is your system environment, and what are the total allocations for the model you're running?
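Besides running `export MIOPEN_LOG_LEVEL=6` in the shell, the variable can be scoped to a single run when the job is launched from Python. The following is a minimal sketch. The helper names `debug_env` and `run_with_env` are hypothetical. Only the variable name `MIOPEN_LOG_LEVEL`, the level 6, and the script name come from this thread.

```python
import os
import subprocess
import sys


def debug_env(overrides, base=None):
    """Return a copy of `base` (default: the current environment) with
    `overrides` applied. Environment values must be strings, so each
    override is converted with str(); `base` itself is not modified."""
    env = dict(os.environ if base is None else base)
    env.update({key: str(value) for key, value in overrides.items()})
    return env


def run_with_env(script, overrides):
    """Run `script` with the current Python interpreter and the extra
    environment variables; the parent's environment is left untouched."""
    return subprocess.run([sys.executable, script], env=debug_env(overrides))
```

For example, `run_with_env("train_mspeech.py", {"MIOPEN_LOG_LEVEL": 6})` behaves roughly like `export MIOPEN_LOG_LEVEL=6` followed by `python3 train_mspeech.py`. The difference is that the variable is not left set in the shell afterward.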
from miopen.
export MIOPEN_LOG_LEVEL=6
(asrtspeechenv) ken@ken-B250M-D3H:/media/ken/3b9999c7-6235-4b04-b006-0ca0b26ded281/data1/ai/ASRT_SpeechRecognition$ python3 train_mspeech.py
Using TensorFlow backend.
2018-07-18 14:38:49.807364: W tensorflow/stream_executor/rocm/rocm_driver.cc:405] creating context when one is currently active; existing: 0x7ff672f144f0
2018-07-18 14:38:49.807438: I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] Found device 0 with properties:
name: Device 687f
AMDGPU ISA: gfx900
memoryClockRate (GHz) 1.63
pciBusID 0000:03:00.0
Total memory: 7.98GiB
Free memory: 7.73GiB
2018-07-18 14:38:49.807450: I tensorflow/core/common_runtime/gpu/gpu_device.cc:928] DMA: 0
2018-07-18 14:38:49.807455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:938] 0: Y
2018-07-18 14:38:49.807460: I tensorflow/core/common_runtime/gpu/gpu_device.cc:996] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Device 687f, pci bus id: 0000:03:00.0)
[*Notice] Model created successfully, model compiled successfully
[running] train epoch 0 .
[message] epoch 0 . Have train datas 0+
Epoch 1/1
2018-07-18 14:38:52.752720: I tensorflow/core/kernels/conv_ops.cc:670] running auto-tune for Convolve
MIOpen Error: /data/repo/MIOpen/src/hip/handlehip.cpp:70: Memory not available to allocate buffer: 655360000
2018-07-18 14:38:52.753049: F tensorflow/stream_executor/rocm/rocm_dnn.cc:1603] Check failed: status == miopenStatusSuccess (7 vs. 0)Unable to find a suitable algorithm for doing forward convolution
Aborted (core dumped)
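The size in the MIOpen error is given in bytes. "655MB" is the decimal reading of that number. In binary units it is exactly 625 MiB. The sketch below extracts the number from a log and compares it with the 7.73 GiB that TensorFlow reported as free when it created the device. The regex is written against the exact message text above, and the helper names are hypothetical.

```python
import re

# Matches the wording of the MIOpen error line quoted in this thread.
ALLOC_FAILURE = re.compile(r"Memory not available to allocate buffer: (\d+)")


def failed_allocations(log_text):
    """Return the size in bytes of every failed MIOpen buffer allocation."""
    return [int(match.group(1)) for match in ALLOC_FAILURE.finditer(log_text)]


def describe(nbytes, free_gib):
    """Express a byte count in MB (10**6), MiB (2**20) and as a share of free memory."""
    share = nbytes / (free_gib * 2**30)
    return "{:,} bytes = {:.2f} MB = {:.2f} MiB ({:.1%} of {} GiB free)".format(
        nbytes, nbytes / 10**6, nbytes / 2**20, share, free_gib)


log = ("MIOpen Error: /data/repo/MIOpen/src/hip/handlehip.cpp:70: "
       "Memory not available to allocate buffer: 655360000")
for size in failed_allocations(log):
    # 7.73 is the "Free memory" value TensorFlow printed at startup
    print(describe(size, 7.73))
```

This prints `655,360,000 bytes = 655.36 MB = 625.00 MiB (7.9% of 7.73 GiB free)`. The request is therefore well under a tenth of the memory reported free at startup. That suggests most of the device memory was already in use by the time the auto-tune ran, but the log alone does not say what was holding it.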
It looks like there are no more details when MIOPEN_LOG_LEVEL=6 is exported.
hipconfig info:
HIP version : 1.5.18151
== hipconfig
HIP_PATH : /opt/rocm/hip
HIP_PLATFORM : hcc
CPP_CONFIG : -D__HIP_PLATFORM_HCC__= -I/opt/rocm/hip/include -I/opt/rocm/hcc/include
== hcc
HSA_PATH : /opt/rocm/hsa
HCC_HOME : /opt/rocm/hcc
HCC clang version 7.0.0 (ssh://gerritgit/compute/ec/hcc-tot/clang 86791fc4961dc8ffde77bde20d7dfa5e5cbeff5e) (ssh://gerritgit/compute/ec/hcc-tot/llvm 0ccef158132e1222d549edf2da33d4bc0be6c2d1) (based on HCC 1.2.18184-74f5fa9-86791fc-0ccef15 )
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/hcc/bin
LLVM (http://llvm.org/):
LLVM version 7.0.0svn
Optimized build.
Default target: x86_64-unknown-linux-gnu
Host CPU: skylake
Registered Targets:
amdgcn - AMD GCN GPUs
r600 - AMD GPUs HD2XXX-HD6XXX
x86 - 32-bit X86: Pentium-Pro and above
x86-64 - 64-bit X86: EM64T and AMD64
HCC-cxxflags : -hc -std=c++amp -I/opt/rocm/hcc/include
HCC-ldflags : -hc -std=c++amp -L/opt/rocm/hcc/lib -Wl,--rpath=/opt/rocm/hcc/lib -ldl -lm -lpthread -lhc_am -Wl,--whole-archive -lmcwamp -Wl,--no-whole-archive
=== Environment Variables
PATH=/media/ken/3b9999c7-6235-4b04-b006-0ca0b26ded281/data1/asrtspeechenv/bin:/opt/rocm/hcc/bin:/opt/rocm/hip/bin:/home/ken/bin:/home/ken/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/rocm/bin/
LD_LIBRARY_PATH=/opt/rocm/lib/
HIP_PATH=/opt/rocm/hip
HCC_HOME=/opt/rocm/hcc
== Linux Kernel
Hostname : ken-B250M-D3H
Linux ken-B250M-D3H 4.13.0-45-generic #50~16.04.1-Ubuntu SMP Wed May 30 11:18:27 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.4 LTS
Release: 16.04
Codename: xenial
rocminfo
HSA System Attributes
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (number of timestamp)
Machine Model: LARGE
System Endianness: LITTLE
==========
HSA Agents
Agent 1
Name: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0
Queue Min Size: 0
Queue Max Size: 0
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768KB
Chip ID: 0
Cacheline Size: 64
Max Clock Frequency (MHz):3800
BDFID: 0
Compute Unit: 4
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 32899292KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Acessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 32899292KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Acessible by all: TRUE
ISA Info:
N/A
Agent 2
Name: gfx900
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128
Queue Min Size: 4096
Queue Max Size: 131072
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16KB
Chip ID: 26751
Cacheline Size: 64
Max Clock Frequency (MHz):1630
BDFID: 768
Compute Unit: 64
Features: KERNEL_DISPATCH
Fast F16 Operation: FALSE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size Per Dimension:
Dim[0]: 67109888
Dim[1]: 50332672
Dim[2]: 604110848
Grid Max Size: 4294967295
Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size per Dimension:
Dim[0]: 4294967295
Dim[1]: 4294967295
Dim[2]: 4294967295
Max number Of fbarriers Per Workgroup:32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 8372224KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Acessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Acessible by all: FALSE
ISA Info:
ISA 1
Name: AMD:AMDGPU:9:0:0
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Dimension:
Dim[0]: 67109888
Dim[1]: 1024
Dim[2]: 16777217
Workgroup Max Size: 1024
Grid Max Dimension:
x 4294967295
y 4294967295
z 4294967295
Grid Max Size: 4294967295
FBarrier Max Size: 32
*** Done ***
rocm_bandwidth_test
......
....
RocmBandwidthTest Version: 1.0.0
Device: 0, Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
Device: 1, Device 687f
Device Access
D/D 0 1
0 1 1
1 1 1
Device Numa Distance
D/D 0 1
0 0 N/A
1 0 0
Unidirectional peak bandwidth GB/s
D/D 0 1
0 N/A 13.915766
1 14.088893 394.403061
Bdirectional peak bandwidth GB/s
D/D 0 1
0 N/A 15.290195
1 15.624503 N/A
Epoch 1/1
2018-07-18 14:38:52.752720: I tensorflow/core/kernels/conv_ops.cc:670] running auto-tune for Convolve
MIOpen Error: /data/repo/MIOpen/src/hip/handlehip.cpp:70: Memory not available to allocate buffer: 655360000
2018-07-18 14:38:52.753049: F tensorflow/stream_executor/rocm/rocm_dnn.cc:1603] Check failed: status == miopenStatusSuccess (7 vs. 0)Unable to find a suitable algorithm for doing forward convolution
It seems you're simply running out of memory. However, let's try one more thing: can you rerun it with this environment variable set:
MIOPEN_ENABLE_LOGGING=1
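With `MIOPEN_ENABLE_LOGGING=1`, MIOpen echoes each API call in a fixed layout: a `MIOpen(HIP): <signature>{` line, one `name = value` line per argument, and a closing `}`. That is the format visible in the log posted in reply. The sketch below is a rough parser that assumes exactly that layout and ignores every line outside a block. `parse_miopen_calls`, `find_call` and `dims` are hypothetical helpers, not part of MIOpen.

```python
import re

# A logged call opens with "MIOpen(HIP): <C signature>{" and closes with "}".
HEADER = re.compile(r"^MIOpen\(HIP\): (?P<signature>.+)\{\s*$")
# Arguments are logged one per line as "name = value" (the value may be empty).
FIELD = re.compile(r"^\s*(?P<name>\w+)\s*=\s*(?P<value>.*?)\s*$")


def parse_miopen_calls(log_text):
    """Return a list of {"signature": str, "args": {name: value}} dicts,
    one per logged MIOpen call, skipping all other lines."""
    calls, current = [], None
    for line in log_text.splitlines():
        if current is None:
            match = HEADER.match(line)
            if match:
                current = {"signature": match.group("signature").strip(), "args": {}}
        elif line.strip() == "}":
            calls.append(current)
            current = None
        else:
            match = FIELD.match(line)
            if match:
                current["args"][match.group("name")] = match.group("value")
    return calls


def find_call(calls, function_name):
    """Return the last logged call to `function_name`, or None."""
    for call in reversed(calls):
        if function_name + "(" in call["signature"]:
            return call
    return None


def dims(value):
    """Turn a logged shape such as "16, 32, 1600, 200" into a tuple of ints."""
    return tuple(int(part) for part in value.replace(",", " ").split())
```

For example, `dims(find_call(parse_miopen_calls(text), "miopenFindConvolutionForwardAlgorithm")["args"]["yDesc"])` recovers the output shape passed to the failing Find call.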
Thanks for your help, @daniellowell!
export MIOPEN_ENABLE_LOGGING=1
(asrtspeechenv) ken@ken-B250M-D3H:/media/ken/3b9999c7-6235-4b04-b006-0ca0b26ded281/data1/ai/ASRT_SpeechRecognition$ python3 train_mspeech.py
Using TensorFlow backend.
2018-07-19 14:48:25.069862: W tensorflow/stream_executor/rocm/rocm_driver.cc:405] creating context when one is currently active; existing: 0x7f12e54dfa70
2018-07-19 14:48:25.069964: I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] Found device 0 with properties:
name: Device 687f
AMDGPU ISA: gfx900
memoryClockRate (GHz) 1.63
pciBusID 0000:03:00.0
Total memory: 7.98GiB
Free memory: 7.73GiB
2018-07-19 14:48:25.069976: I tensorflow/core/common_runtime/gpu/gpu_device.cc:928] DMA: 0
2018-07-19 14:48:25.069981: I tensorflow/core/common_runtime/gpu/gpu_device.cc:938] 0: Y
2018-07-19 14:48:25.069987: I tensorflow/core/common_runtime/gpu/gpu_device.cc:996] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Device 687f, pci bus id: 0000:03:00.0)
[*Notice] Model created successfully, model compiled successfully
[running] train epoch 0 .
[message] epoch 0 . Have train datas 0+
Epoch 1/1
2018-07-19 14:48:27.635339: I tensorflow/core/kernels/conv_ops.cc:670] running auto-tune for Convolve
MIOpen(HIP): miopenStatus_t miopenCreateTensorDescriptor(miopenTensorDescriptor_t *){
tensorDesc = 0
}
MIOpen(HIP): miopenStatus_t miopenSet4dTensorDescriptor(miopenTensorDescriptor_t, miopenDataType_t, int, int, int, int){
tensorDesc =
dataType = 1
n = 16
c = 1
h = 1600
w = 200
}
MIOpen(HIP): miopenStatus_t miopenCreateTensorDescriptor(miopenTensorDescriptor_t *){
tensorDesc = 0
}
MIOpen(HIP): miopenStatus_t miopenSet4dTensorDescriptor(miopenTensorDescriptor_t, miopenDataType_t, int, int, int, int){
tensorDesc =
dataType = 1
n = 16
c = 32
h = 1600
w = 200
}
MIOpen(HIP): miopenStatus_t miopenCreateTensorDescriptor(miopenTensorDescriptor_t *){
tensorDesc = 0
}
MIOpen(HIP): miopenStatus_t miopenSet4dTensorDescriptor(miopenTensorDescriptor_t, miopenDataType_t, int, int, int, int){
tensorDesc =
dataType = 1
n = 32
c = 1
h = 3
w = 3
}
MIOpen(HIP): miopenStatus_t miopenCreateConvolutionDescriptor(miopenConvolutionDescriptor_t *){
convDesc = 0
}
MIOpen(HIP): miopenStatus_t miopenInitConvolutionDescriptor(miopenConvolutionDescriptor_t, miopenConvolutionMode_t, int, int, int, int, int, int){
convDesc = 0, 0, 1, 1, 1, 1,
c_mode = 0
pad_h = 1
pad_w = 1
u = 1
v = 1
dilation_h = 1
dilation_w = 1
}
MIOpen(HIP): miopenStatus_t miopenConvolutionForwardGetWorkSpaceSize(miopenHandle_t, const miopenTensorDescriptor_t, const miopenTensorDescriptor_t, const miopenConvolutionDescriptor_t, const miopenTensorDescriptor_t, size_t *){
wDesc = 32, 1, 3, 3
yDesc = 16, 32, 1600, 200
convDesc = 1, 1, 1, 1, 1, 1,
workSpaceSize = 14471916849344069120
}
MIOpen(HIP): miopenStatus_t miopenFindConvolutionForwardAlgorithm(miopenHandle_t, const miopenTensorDescriptor_t, const void *, const miopenTensorDescriptor_t, const void *, const miopenConvolutionDescriptor_t, const miopenTensorDescriptor_t, void *, const int, int *, miopenConvAlgoPerf_t *, void *, size_t, bool){
xDesc = 16, 1, 1600, 200
x = 0x909575200
wDesc = 32, 1, 3, 3
w = 0x908573600
convDesc = 1, 1, 1, 1, 1, 1,
yDesc = 16, 32, 1600, 200
y = 0x932542600
requestAlgoCount = 1
returnedAlgoCount = -4176939
perfResults =
workSpace = 0x959642600
workSpaceSize = 11520000
exhaustiveSearch = 0
}
MIOpen Error: /data/repo/MIOpen/src/hip/handlehip.cpp:70: Memory not available to allocate buffer: 655360000
2018-07-19 14:48:27.636525: F tensorflow/stream_executor/rocm/rocm_dnn.cc:1603] Check failed: status == miopenStatusSuccess (7 vs. 0)Unable to find a suitable algorithm for doing forward convolution
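The shapes in this log account for the failed request exactly. The worked sketch below uses the values from the `miopenFindConvolutionForwardAlgorithm` call above. It rests on three assumptions: `dataType = 1` denotes 32-bit float (`miopenFloat`), x and y are in NCHW layout, and w is in KCRS layout. `conv_out` is the standard output-size formula for a convolution with padding, stride and dilation.

```python
def numel(shape):
    """Number of elements in a tensor of the given shape."""
    count = 1
    for extent in shape:
        count *= extent
    return count


def conv_out(size, kernel, pad, stride, dilation):
    """Output extent of a convolution along one spatial dimension."""
    return (size + 2 * pad - dilation * (kernel - 1) - 1) // stride + 1


FLOAT_BYTES = 4  # assumption: dataType = 1 in the log is 32-bit float

x = (16, 1, 1600, 200)   # xDesc: N, C, H, W
w = (32, 1, 3, 3)        # wDesc: K, C, R, S (32 filters of 3x3 over 1 channel)
y = (16, 32, 1600, 200)  # yDesc: N, K, H_out, W_out

# pad_h = pad_w = 1, u = v = 1 (stride) and dilation 1, as logged for convDesc
h_out = conv_out(x[2], w[2], pad=1, stride=1, dilation=1)
w_out = conv_out(x[3], w[3], pad=1, stride=1, dilation=1)

# One image's im2col matrix: (C*R*S) rows by (H_out*W_out) columns
im2col_one_image = x[1] * w[2] * w[3] * h_out * w_out

sizes = {
    "x": numel(x) * FLOAT_BYTES,
    "w": numel(w) * FLOAT_BYTES,
    "y": numel(y) * FLOAT_BYTES,
    "im2col (1 image)": im2col_one_image * FLOAT_BYTES,
}
for name, nbytes in sizes.items():
    print("{:<17} {:>13,} bytes".format(name, nbytes))
```

By this arithmetic, the 655,360,000-byte buffer MIOpen could not allocate is exactly the size of the float32 output tensor y. The 11,520,000-byte workspace passed to Find equals one image's im2col buffer. Both matches are inferred from the numbers; the log does not name the internal allocation that failed. Because x, w and y already had device addresses when Find was called, the failing request appears to be an additional buffer the size of y.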
@greatken999 Can you try this on the current software stack?
@daniellowell, sorry, my Vega 64 has a hang-up problem right now.