MIOpen Error: /data/repo/MIOpen/src/hip/handlehip.cpp:70: Memory not available to allocate buffer: 655360000 about miopen HOT 9 CLOSED

rocm commented on September 28, 2024

MIOpen Error: /data/repo/MIOpen/src/hip/handlehip.cpp:70: Memory not available to allocate buffer: 655360000

from miopen.

Comments (9)

daniellowell commented on September 28, 2024

It looks like it is trying to allocate 655 MB of memory, which is not available. Can you run this test with the following environment variable set:

MIOPEN_LOG_LEVEL=6

It will help us see what the configuration looks like. Also, the above message is not enough for us to understand what is going on. What is your system environment and total allocations for the model you're running?
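A minimal sketch of one way to apply that setting from Python instead of the shell, assuming the library reads the variable from the process environment. `MIOPEN_LOG_LEVEL=6` comes from the comment above; the helper names `miopen_env` and `run_with_miopen_logging`, and the script name in the usage comment, are hypothetical.

```python
import os
import subprocess


def miopen_env(log_level=6, base=None):
    """Return a copy of the environment with MIOPEN_LOG_LEVEL set."""
    env = dict(os.environ if base is None else base)
    env["MIOPEN_LOG_LEVEL"] = str(log_level)
    return env


def run_with_miopen_logging(command, log_level=6):
    """Run `command` (a list of arguments) with the logging variable set.

    The variable is set only for the child process, so the parent shell
    and other programs are unaffected.
    """
    return subprocess.run(command, env=miopen_env(log_level), check=False)


# Example (not executed here; the script name is a placeholder):
# run_with_miopen_logging(["python3", "train_script.py"])
```

Setting the variable per process keeps the verbose output limited to the run being debugged.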


greatken999 commented on September 28, 2024

export MIOPEN_LOG_LEVEL=6
(asrtspeechenv) ken@ken-B250M-D3H:/media/ken/3b9999c7-6235-4b04-b006-0ca0b26ded281/data1/ai/ASRT_SpeechRecognition$ python3 train_mspeech.py
Using TensorFlow backend.
2018-07-18 14:38:49.807364: W tensorflow/stream_executor/rocm/rocm_driver.cc:405] creating context when one is currently active; existing: 0x7ff672f144f0
2018-07-18 14:38:49.807438: I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] Found device 0 with properties:
name: Device 687f
AMDGPU ISA: gfx900
memoryClockRate (GHz) 1.63
pciBusID 0000:03:00.0
Total memory: 7.98GiB
Free memory: 7.73GiB
2018-07-18 14:38:49.807450: I tensorflow/core/common_runtime/gpu/gpu_device.cc:928] DMA: 0
2018-07-18 14:38:49.807455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:938] 0: Y
2018-07-18 14:38:49.807460: I tensorflow/core/common_runtime/gpu/gpu_device.cc:996] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Device 687f, pci bus id: 0000:03:00.0)
[*Notice] Model created successfully, model compiled successfully
[running] train epoch 0 .
[message] epoch 0 . Have train datas 0+
Epoch 1/1
2018-07-18 14:38:52.752720: I tensorflow/core/kernels/conv_ops.cc:670] running auto-tune for Convolve
MIOpen Error: /data/repo/MIOpen/src/hip/handlehip.cpp:70: Memory not available to allocate buffer: 655360000
2018-07-18 14:38:52.753049: F tensorflow/stream_executor/rocm/rocm_dnn.cc:1603] Check failed: status == miopenStatusSuccess (7 vs. 0)Unable to find a suitable algorithm for doing forward convolution
Aborted (core dumped)
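For scale, the failed request can be compared with the free memory TensorFlow reported at device discovery. The sketch below parses the two log lines quoted above; the function names are hypothetical. The free-memory figure was printed before training started, so it may not reflect what was still unallocated when MIOpen made the request. That caveat is an assumption about timing, not something the log shows.

```python
import re

GIB = 2 ** 30


def parse_requested_bytes(line):
    """Extract the byte count from a 'Memory not available' error line."""
    match = re.search(r"allocate buffer:\s*(\d+)", line)
    return int(match.group(1)) if match else None


def parse_free_gib(line):
    """Extract the value from a 'Free memory: X.XXGiB' line."""
    match = re.search(r"Free memory:\s*([\d.]+)GiB", line)
    return float(match.group(1)) if match else None


error_line = ("MIOpen Error: /data/repo/MIOpen/src/hip/handlehip.cpp:70: "
              "Memory not available to allocate buffer: 655360000")
free_line = "Free memory: 7.73GiB"

requested = parse_requested_bytes(error_line)
free_gib = parse_free_gib(free_line)
requested_gib = requested / GIB

print(f"requested: {requested} bytes = {requested_gib:.2f} GiB")
print(f"share of reported free memory: {requested_gib / free_gib:.1%}")
```

The request comes to roughly 0.61 GiB, a small fraction of the 7.73 GiB reported as free at startup.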


greatken999 commented on September 28, 2024

It looks like there are no more details after exporting MIOPEN_LOG_LEVEL=6.
hipconfig info:
HIP version : 1.5.18151

== hipconfig
HIP_PATH : /opt/rocm/hip
HIP_PLATFORM : hcc
CPP_CONFIG : -D__HIP_PLATFORM_HCC__= -I/opt/rocm/hip/include -I/opt/rocm/hcc/include

== hcc
HSA_PATH : /opt/rocm/hsa
HCC_HOME : /opt/rocm/hcc
HCC clang version 7.0.0 (ssh://gerritgit/compute/ec/hcc-tot/clang 86791fc4961dc8ffde77bde20d7dfa5e5cbeff5e) (ssh://gerritgit/compute/ec/hcc-tot/llvm 0ccef158132e1222d549edf2da33d4bc0be6c2d1) (based on HCC 1.2.18184-74f5fa9-86791fc-0ccef15 )
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/hcc/bin
LLVM (http://llvm.org/):
LLVM version 7.0.0svn
Optimized build.
Default target: x86_64-unknown-linux-gnu
Host CPU: skylake

Registered Targets:
amdgcn - AMD GCN GPUs
r600 - AMD GPUs HD2XXX-HD6XXX
x86 - 32-bit X86: Pentium-Pro and above
x86-64 - 64-bit X86: EM64T and AMD64
HCC-cxxflags : -hc -std=c++amp -I/opt/rocm/hcc/include
HCC-ldflags : -hc -std=c++amp -L/opt/rocm/hcc/lib -Wl,--rpath=/opt/rocm/hcc/lib -ldl -lm -lpthread -lhc_am -Wl,--whole-archive -lmcwamp -Wl,--no-whole-archive

=== Environment Variables
PATH=/media/ken/3b9999c7-6235-4b04-b006-0ca0b26ded281/data1/asrtspeechenv/bin:/opt/rocm/hcc/bin:/opt/rocm/hip/bin:/home/ken/bin:/home/ken/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/rocm/bin/
LD_LIBRARY_PATH=/opt/rocm/lib/
HIP_PATH=/opt/rocm/hip
HCC_HOME=/opt/rocm/hcc

== Linux Kernel
Hostname : ken-B250M-D3H
Linux ken-B250M-D3H 4.13.0-45-generic #50~16.04.1-Ubuntu SMP Wed May 30 11:18:27 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.4 LTS
Release: 16.04
Codename: xenial


greatken999 commented on September 28, 2024

rocminfo

HSA System Attributes

Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (number of timestamp)
Machine Model: LARGE
System Endianness: LITTLE

==========
HSA Agents

Agent 1

Name: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0
Queue Min Size: 0
Queue Max Size: 0
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768KB
Chip ID: 0
Cacheline Size: 64
Max Clock Frequency (MHz):3800
BDFID: 0
Compute Unit: 4
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 32899292KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Acessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 32899292KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Acessible by all: TRUE
ISA Info:
N/A

Agent 2

Name: gfx900
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128
Queue Min Size: 4096
Queue Max Size: 131072
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16KB
Chip ID: 26751
Cacheline Size: 64
Max Clock Frequency (MHz):1630
BDFID: 768
Compute Unit: 64
Features: KERNEL_DISPATCH
Fast F16 Operation: FALSE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size Per Dimension:
Dim[0]: 67109888
Dim[1]: 50332672
Dim[2]: 604110848
Grid Max Size: 4294967295
Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size per Dimension:
Dim[0]: 4294967295
Dim[1]: 4294967295
Dim[2]: 4294967295
Max number Of fbarriers Per Workgroup:32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 8372224KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Acessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Acessible by all: FALSE
ISA Info:
ISA 1
Name: AMD:AMDGPU:9:0:0
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Dimension:
Dim[0]: 67109888
Dim[1]: 1024
Dim[2]: 16777217
Workgroup Max Size: 1024
Grid Max Dimension:
x 4294967295
y 4294967295
z 4294967295
Grid Max Size: 4294967295
FBarrier Max Size: 32
*** Done ***
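As a cross-check, the GPU's global pool size from the rocminfo output above can be converted to GiB and compared with the "Total memory: 7.98GiB" line in the TensorFlow log. This is a small sketch that assumes rocminfo's "KB" means 1024 bytes; `kb_to_gib` is a hypothetical helper.

```python
def kb_to_gib(size_kb):
    """Convert a size in KB (assumed to be 1024-byte units) to GiB."""
    return size_kb * 1024 / 2 ** 30


gpu_pool_kb = 8372224  # "Size: 8372224KB" under Agent 2, Pool 1

print(f"GPU global pool: {kb_to_gib(gpu_pool_kb):.2f} GiB")
```

Under that assumption the two tools agree on the total device memory, so the discrepancy lies in how much of it was available at the time of the failing allocation.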


greatken999 commented on September 28, 2024

rocm_bandwidth_test
......
....

      RocmBandwidthTest Version: 1.0.0

      Device: 0,  Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
      Device: 1,  Device 687f

      Device Access

      D/D       0         1         

      0         1         1         

      1         1         1         


      Device Numa Distance

      D/D       0         1         

      0         0         N/A       

      1         0         0         


      Unidirectional peak bandwidth GB/s

      D/D       0           1           

      0         N/A         13.915766   

      1         14.088893   394.403061  


      Bdirectional peak bandwidth GB/s

      D/D       0           1           

      0         N/A         15.290195   

      1         15.624503   N/A


daniellowell commented on September 28, 2024

Epoch 1/1
2018-07-18 14:38:52.752720: I tensorflow/core/kernels/conv_ops.cc:670] running auto-tune for Convolve
MIOpen Error: /data/repo/MIOpen/src/hip/handlehip.cpp:70: Memory not available to allocate buffer: 655360000
2018-07-18 14:38:52.753049: F tensorflow/stream_executor/rocm/rocm_dnn.cc:1603] Check failed: status == miopenStatusSuccess (7 vs. 0)Unable to find a suitable algorithm for doing forward convolution

It seems like you're simply running out of memory. However, let's try one more thing: can you rerun it with this environment variable set?
MIOPEN_ENABLE_LOGGING=1


greatken999 commented on September 28, 2024

Thanks for your help, @daniellowell!
export MIOPEN_ENABLE_LOGGING=1
(asrtspeechenv) ken@ken-B250M-D3H:/media/ken/3b9999c7-6235-4b04-b006-0ca0b26ded281/data1/ai/ASRT_SpeechRecognition$ python3 train_mspeech.py
Using TensorFlow backend.
2018-07-19 14:48:25.069862: W tensorflow/stream_executor/rocm/rocm_driver.cc:405] creating context when one is currently active; existing: 0x7f12e54dfa70
2018-07-19 14:48:25.069964: I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] Found device 0 with properties:
name: Device 687f
AMDGPU ISA: gfx900
memoryClockRate (GHz) 1.63
pciBusID 0000:03:00.0
Total memory: 7.98GiB
Free memory: 7.73GiB
2018-07-19 14:48:25.069976: I tensorflow/core/common_runtime/gpu/gpu_device.cc:928] DMA: 0
2018-07-19 14:48:25.069981: I tensorflow/core/common_runtime/gpu/gpu_device.cc:938] 0: Y
2018-07-19 14:48:25.069987: I tensorflow/core/common_runtime/gpu/gpu_device.cc:996] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Device 687f, pci bus id: 0000:03:00.0)
[*Notice] Model created successfully, model compiled successfully
[running] train epoch 0 .
[message] epoch 0 . Have train datas 0+
Epoch 1/1
2018-07-19 14:48:27.635339: I tensorflow/core/kernels/conv_ops.cc:670] running auto-tune for Convolve
MIOpen(HIP): miopenStatus_t miopenCreateTensorDescriptor(miopenTensorDescriptor_t *){
tensorDesc = 0
}
MIOpen(HIP): miopenStatus_t miopenSet4dTensorDescriptor(miopenTensorDescriptor_t, miopenDataType_t, int, int, int, int){
tensorDesc =
dataType = 1
n = 16
c = 1
h = 1600
w = 200
}
MIOpen(HIP): miopenStatus_t miopenCreateTensorDescriptor(miopenTensorDescriptor_t *){
tensorDesc = 0
}
MIOpen(HIP): miopenStatus_t miopenSet4dTensorDescriptor(miopenTensorDescriptor_t, miopenDataType_t, int, int, int, int){
tensorDesc =
dataType = 1
n = 16
c = 32
h = 1600
w = 200
}
MIOpen(HIP): miopenStatus_t miopenCreateTensorDescriptor(miopenTensorDescriptor_t *){
tensorDesc = 0
}
MIOpen(HIP): miopenStatus_t miopenSet4dTensorDescriptor(miopenTensorDescriptor_t, miopenDataType_t, int, int, int, int){
tensorDesc =
dataType = 1
n = 32
c = 1
h = 3
w = 3
}
MIOpen(HIP): miopenStatus_t miopenCreateConvolutionDescriptor(miopenConvolutionDescriptor_t *){
convDesc = 0
}
MIOpen(HIP): miopenStatus_t miopenInitConvolutionDescriptor(miopenConvolutionDescriptor_t, miopenConvolutionMode_t, int, int, int, int, int, int){
convDesc = 0, 0, 1, 1, 1, 1,
c_mode = 0
pad_h = 1
pad_w = 1
u = 1
v = 1
dilation_h = 1
dilation_w = 1
}
MIOpen(HIP): miopenStatus_t miopenConvolutionForwardGetWorkSpaceSize(miopenHandle_t, const miopenTensorDescriptor_t, const miopenTensorDescriptor_t, const miopenConvolutionDescriptor_t, const miopenTensorDescriptor_t, size_t *){
wDesc = 32, 1, 3, 3
yDesc = 16, 32, 1600, 200
convDesc = 1, 1, 1, 1, 1, 1,
workSpaceSize = 14471916849344069120
}
MIOpen(HIP): miopenStatus_t miopenFindConvolutionForwardAlgorithm(miopenHandle_t, const miopenTensorDescriptor_t, const void *, const miopenTensorDescriptor_t, const void *, const miopenConvolutionDescriptor_t, const miopenTensorDescriptor_t, void *, const int, int *, miopenConvAlgoPerf_t *, void *, size_t, bool){
xDesc = 16, 1, 1600, 200
x = 0x909575200
wDesc = 32, 1, 3, 3
w = 0x908573600
convDesc = 1, 1, 1, 1, 1, 1,
yDesc = 16, 32, 1600, 200
y = 0x932542600
requestAlgoCount = 1
returnedAlgoCount = -4176939
perfResults =
workSpace = 0x959642600
workSpaceSize = 11520000
exhaustiveSearch = 0
}
MIOpen Error: /data/repo/MIOpen/src/hip/handlehip.cpp:70: Memory not available to allocate buffer: 655360000
2018-07-19 14:48:27.636525: F tensorflow/stream_executor/rocm/rocm_dnn.cc:1603] Check failed: status == miopenStatusSuccess (7 vs. 0)Unable to find a suitable algorithm for doing forward convolution
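The tensor descriptors in this log are enough to work out buffer sizes by hand. The sketch below assumes `dataType = 1` means 32-bit float, i.e. 4 bytes per element (the log does not say so), and uses a hypothetical helper `tensor_bytes`. The last figure is only an observation that the logged `workSpaceSize` equals c * 3 * 3 * h * w * 4; reading that as an im2col-style buffer for one image is an inference, not something the log states.

```python
def tensor_bytes(n, c, h, w, bytes_per_element=4):
    """Size in bytes of a dense NCHW tensor (4 bytes/element assumed)."""
    return n * c * h * w * bytes_per_element


x_bytes = tensor_bytes(16, 1, 1600, 200)    # xDesc = 16, 1, 1600, 200
w_bytes = tensor_bytes(32, 1, 3, 3)         # wDesc = 32, 1, 3, 3
y_bytes = tensor_bytes(16, 32, 1600, 200)   # yDesc = 16, 32, 1600, 200

failed_request = 655360000   # from the "Memory not available" line
logged_workspace = 11520000  # workSpaceSize passed to the Find call

# Product of input channels, the 3x3 filter, and the output plane, in bytes.
per_image_columns = 1 * 3 * 3 * 1600 * 200 * 4

print(f"input  x: {x_bytes} bytes")
print(f"filter w: {w_bytes} bytes")
print(f"output y: {y_bytes} bytes")
print(f"output size equals failed request: {y_bytes == failed_request}")
print(f"workspace matches c*3*3*h*w*4: {per_image_columns == logged_workspace}")
```

Under the 4-byte assumption, the buffer that could not be allocated is exactly the size of the output tensor `y` for a batch of 16, so a smaller batch size would shrink that request proportionally.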


daniellowell commented on September 28, 2024

@greatken999 Can you try this on the current software stack?


greatken999 commented on September 28, 2024

@daniellowell, sorry, my Vega 64 has a hang problem now.

