cmu-safari / ramulator-pim

A fast and flexible simulation infrastructure for exploring general-purpose processing-in-memory (PIM) architectures. Ramulator-PIM combines a widely-used simulator for out-of-order and in-order processors (ZSim) with Ramulator, a DRAM simulator with memory models for DDRx, LPDDRx, GDDRx, WIOx, HBMx, and HMCx. Ramulator is described in the IEEE CAL 2015 paper by Kim et al. (https://people.inf.ethz.ch/omutlu/pub/ramulator_dram_simulator-ieee-cal15.pdf). Ramulator-PIM is used in the DAC 2019 paper by Singh et al. (https://people.inf.ethz.ch/omutlu/pub/NAPEL-near-memory-computing-performance-prediction-via-ML_dac19.pdf).


ramulator-pim's Introduction

ZSim+Ramulator - A Processing-in-Memory Simulation Framework

ZSim+Ramulator is a framework for design space exploration of general-purpose Processing-in-Memory (PIM) architectures. The framework is based on two widely-known simulators: ZSim [1] and Ramulator [2][3].

We consider a computing system that includes host CPU cores and general-purpose PIM cores. The PIM cores are placed in the logic layer of a 3D-stacked memory (Ramulator's HMC model). With this simulation framework, we can simulate host CPU cores and general-purpose PIM cores with the aim of comparing the performance of both for an application or parts of it. The simulation framework does not currently support concurrent execution on host and PIM cores.

We use ZSim to generate memory traces that are fed to Ramulator. We modified ZSim to generate two types of traces: 1) filtered traces for the simulation of the host CPU cores, and 2) unfiltered traces for the simulation of the PIM cores.

  1. Filtered traces: We obtain them by gathering memory requests at the memory controller. This way, the cache hierarchy of the host (including the coherence protocol) is simulated in ZSim. ZSim can also simulate hardware prefetchers.

  2. Unfiltered traces: We obtain them by gathering all memory requests as soon as they are issued by the core pipeline.
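
The difference between the two trace types can be illustrated with a toy model. The sketch below is not ZSim's cache hierarchy: it assumes a hypothetical single-level, direct-mapped cache (the function name, `num_lines`, and `line_size` are invented for illustration). The unfiltered trace is every access the core issues; the filtered trace keeps only the accesses that miss in the cache and therefore reach memory.

```python
def filter_through_cache(addresses, num_lines=4, line_size=64):
    """Return only the accesses that miss in a toy direct-mapped cache.

    Hypothetical stand-in for a real cache hierarchy: one level,
    no coherence, no prefetching, no write-backs.
    """
    tags = [None] * num_lines            # block currently held by each cache line
    misses = []
    for addr in addresses:
        block = addr // line_size        # cache-line-sized block number
        index = block % num_lines        # direct-mapped placement
        if tags[index] != block:         # miss: the request goes to memory
            tags[index] = block
            misses.append(addr)
    return misses


# Every access issued by the core (what an unfiltered trace would contain).
unfiltered = [0, 8, 64, 0, 4096, 8]
# Only the accesses that reach memory (what a filtered trace would contain).
filtered = filter_through_cache(unfiltered)
print(len(unfiltered), len(filtered))
```

In this toy run, the second access to address 8 misses because address 4096 evicted the block that held it, which is the kind of cache effect that only the filtered trace reflects.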

Ramulator simulates the memory accesses of the host cores and the PIM cores by using, respectively, the filtered and unfiltered traces produced with ZSim. Ramulator contains simple models of out-of-order and in-order cores that can be used for simulation of host and PIM. For the simulation of the PIM cores in the logic layer of 3D-stacked memory, we modified Ramulator to avoid the overheads of the off-chip link.

Citation

Please cite the following papers if you find this simulation infrastructure useful:

Yoongu Kim, Weikun Yang, and Onur Mutlu, "Ramulator: A Fast and Extensible DRAM Simulator". IEEE Computer Architecture Letters (CAL), March 2015.

Gagandeep Singh, Juan Gomez-Luna, Giovanni Mariani, Geraldo Francisco de Oliveira, Stefano Corda, Sander Stuijk, Onur Mutlu, and Henk Corporaal, "NAPEL: Near-Memory Computing Application Performance Prediction via Ensemble Learning". Proceedings of the 56th Design Automation Conference (DAC), Las Vegas, NV, USA, June 2019.

Repository Structure and Installation

The repository structure, with some important folders and files, is shown below.

.
+-- README.md
+-- common/
|   +-- DRAMPower/
+-- ramulator/
|   +-- Configs/
|   +-- sample_traces/
|   |   +-- host/
|   |   +-- pim/
|   +-- src/
+-- zsim-ramulator/
|   +-- benchmarks/
|   |   +-- rodinia/
|   +-- clean.sh
|   +-- compile.sh
|   +-- setup.sh
|   +-- tests/
|   |   +-- host.cfg
|   |   +-- host_prefetch.cfg
|   |   +-- pim.cfg

Prerequisites

Our framework requires both ZSim and Ramulator dependencies.

  • Ramulator requires a C++11 compiler (e.g., clang++, g++-5).
  • ZSim requires gcc >=4.6, pin, scons, libconfig, libhdf5, libelfg0. We provide two scripts, setup.sh and compile.sh, under zsim-ramulator to facilitate ZSim's installation. The first one installs all of ZSim's dependencies. The second one compiles ZSim.

Installing

To install ZSim:

cd zsim-ramulator
sudo sh setup.sh
sh compile.sh

To install Ramulator:

cd ramulator
make -j

Alternatively, to resolve all dependencies on the common libraries, you can use the following command:

sh compileramulator.sh

Generating Traces with ZSim

There are three steps to generate traces with ZSim:

  1. Instrument the code with the hooks provided in zsim-ramulator/misc/hooks/zsim_hooks.h.
  2. Create configuration files for ZSim.
  3. Run.

Next, we describe the three steps in detail:

  1. First, we identify the application's hotspot. We refer to it as offload region, i.e., the region of code that will run in the PIM cores. We instrument the application by including the following code:
#include "zsim-ramulator/misc/hooks/zsim_hooks.h"

void foo() {
    /*
     * zsim_roi_begin() marks the beginning of the region of interest (ROI).
     * It must be included in a serial part of the code.
     */
    zsim_roi_begin();
    zsim_PIM_function_begin(); // Indicates the beginning of the code to simulate (hotspot).
    ...
    zsim_PIM_function_end(); // Indicates the end of the code to simulate.
    /*
     * zsim_roi_end() marks the end of the ROI.
     * It must be included in a serial part of the code.
     */
    zsim_roi_end();
}
  2. Second, we create the configuration files to execute the application using ZSim. Sample configuration files for filtered (host.cfg, host_prefetch.cfg) and unfiltered (pim.cfg) traces are provided under zsim-ramulator/tests/. Please check those files to understand how to configure the number of cores, the number of caches and their sizes, and the number of prefetchers. Next, we describe other important knobs that can be changed in the configuration files:
  • only_offload=true|false: When set to true, ZSim will generate traces only between zsim_PIM_function_begin and zsim_PIM_function_end. When set to false, it will generate traces for the whole ROI.
  • pim_traces=true|false: When set to true, ZSim will generate unfiltered traces. When set to false, it will generate filtered traces.
  • instr_traces=true|false: When set to true, ZSim will also get traces for Instruction Cache Misses.
  • outFile=string: Name of the output file.
  • max_offload_instrs: Maximum number of offload instructions to instrument.
  • merge_hostTraces = true|false: When set to true, ZSim will generate a single file with the whole trace for N cores. When set to false, it will generate one trace file per core. Setting this flag to true slows down the trace collection significantly due to synchronization.
  3. Third, we run ZSim:
./build/opt/zsim configuration_file.cfg

Trace format

ZSim generates traces with the following format:

THREAD_ID PROCESSOR_ID INSTR_NUM TYPE ADDRESS SIZE

where TYPE can be:

  • L: Memory trace of a load (read) request
  • S: Memory trace of a store (write) request
  • P: Memory trace of a prefetching request
  • I: Memory trace of an instruction request

The other fields in the trace file are:

  • THREAD_ID: Which thread generated the request.
  • PROCESSOR_ID: Which core generated the request.
  • INSTR_NUM: The number of non-memory instructions that were executed before the memory request was generated.
  • ADDRESS: The virtual memory address of the memory request.
  • SIZE: The size of the memory request in bytes.
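
As an illustration of the format above, the following sketch parses one trace line into named fields. It assumes that fields are separated by whitespace and that ADDRESS is written in decimal unless it carries a `0x` prefix; neither detail is stated above, so check a real trace file first. `TraceRecord`, `parse_trace_line`, and the sample line are hypothetical and not part of ZSim or Ramulator.

```python
from typing import NamedTuple


class TraceRecord(NamedTuple):
    thread_id: int
    processor_id: int
    instr_num: int      # non-memory instructions executed before this request
    req_type: str       # one of L, S, P, I
    address: int        # virtual address of the request
    size: int           # request size in bytes


VALID_TYPES = {"L", "S", "P", "I"}


def _to_int(token):
    # Assumption: decimal unless the token has a 0x prefix.
    return int(token, 16) if token.lower().startswith("0x") else int(token)


def parse_trace_line(line):
    """Parse 'THREAD_ID PROCESSOR_ID INSTR_NUM TYPE ADDRESS SIZE'."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    thread_id, processor_id, instr_num, req_type, address, size = fields
    if req_type not in VALID_TYPES:
        raise ValueError(f"unknown request type {req_type!r}")
    return TraceRecord(int(thread_id), int(processor_id), int(instr_num),
                       req_type, _to_int(address), int(size))


# Made-up line for illustration only.
record = parse_trace_line("0 1 12 L 140737488347136 8")
print(record)
```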

Example Trace Generation

Under zsim-ramulator/benchmarks we provide a sample instrumented code for the BFS implementation included in the Rodinia benchmark suite. The code is under zsim-ramulator/benchmarks/rodinia. We instrument the function BFSGraph. To compile the application:

cd zsim-ramulator/benchmarks/rodinia/
make

The sample ZSim configuration files under tests (host.cfg, host_prefetch.cfg, pim.cfg) are configured to read the binary generated by the previous step and run the BFS application. Therefore, to generate the trace files, from zsim-ramulator:

./build/opt/zsim tests/host.cfg
./build/opt/zsim tests/host_prefetch.cfg
./build/opt/zsim tests/pim.cfg

These will generate the trace files under zsim-ramulator (rodiniaBFS.out.*, rodiniaBFS-prefetcher.out.*, pim-rodiniaBFS.out.*, respectively).
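
Once the traces exist, a quick sanity check is to count the request types per core. The sketch below is one possible way to do that, assuming whitespace-separated fields in the trace format described earlier. The `summarize` helper and the sample lines are hypothetical; the lines are made up and are not taken from a real trace.

```python
from collections import Counter, defaultdict


def summarize(lines):
    """Count request types and non-memory instructions per PROCESSOR_ID."""
    per_core = defaultdict(Counter)
    for line in lines:
        fields = line.split()
        if len(fields) != 6:            # skip blank or malformed lines
            continue
        _thread, core, instr_num, req_type, _addr, _size = fields
        stats = per_core[int(core)]
        stats[req_type] += 1                    # L, S, P, or I
        stats["non_mem_instrs"] += int(instr_num)
    return per_core


# Made-up lines for illustration only.
sample = [
    "0 0 5 L 4096 8",
    "0 0 2 S 4104 8",
    "1 1 7 L 8192 4",
]
summary = summarize(sample)
for core in sorted(summary):
    print(core, dict(summary[core]))
```

Because `summarize` accepts any iterable of lines, an open file object for a per-core trace file can be passed to it directly.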

Running Ramulator

This is a modified version of Ramulator for the simulation of PIM architectures. This version of Ramulator can simulate both host and PIM cores. To configure the mode, include one of the following two options in the configuration file (.cfg):

  • pim_mode = 0 (for host)
  • pim_mode = 1 (for pim)

Sample configuration files are provided under ramulator/Configs/.

Some other knobs that can be configured are:

  • --config: Ramulator's configuration file.
  • --stats: The file to which Ramulator will write the results of the simulator.
  • --trace: The trace file that should be loaded.
  • --disable-perf-scheduling true|false: When set to true, it enables perfect memory scheduling, where each memory request is placed in the respective vault based on HMC interleaving. When set to false, the requests are scheduled based on the PROCESSOR_ID in each memory trace of the trace file.
  • --core-org=outOrder|inOrder: For simulation of out-of-order or in-order cores.
  • --number-cores=: Number of cores to simulate.
  • --trace-format=zsim|pin|pisa: Defines the source of the memory traces. Use zsim for filtered or unfiltered traces generated with our modified ZSim. Other options are traces generated with a Pin tool or with PISA.
  • --split-trace=true|false: When set to true, Ramulator will open a single trace file, store it in memory, and split the trace file among the number-cores cores according to the core-id in the trace. When set to false, Ramulator will open one trace file per core and read it line-by-line during the simulation (it expects each trace file name to end with .core_id, from 0 to number-cores-1). We strongly suggest using --split-trace=true because large trace files might lead to memory overload.
  • --mode=cpu: Ramulator can simulate either a system with cpu and DRAM, or only the DRAM by itself. Ramulator must operate in CPU trace mode for this framework.
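
The two --split-trace behaviors can be pictured with a short sketch. This is an illustration of the description above, not Ramulator's implementation: `per_core_trace_names` lists the file names expected when one trace file per core is used, and `split_by_core` groups the lines of a single trace by PROCESSOR_ID (the second field). Both function names and the sample lines are hypothetical.

```python
from collections import defaultdict


def per_core_trace_names(trace_path, number_cores):
    """File names expected with one trace per core: <trace>.0 ... <trace>.(N-1)."""
    return [f"{trace_path}.{core_id}" for core_id in range(number_cores)]


def split_by_core(lines):
    """Group the lines of a single merged trace by PROCESSOR_ID."""
    buckets = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) == 6:                     # skip blank or malformed lines
            buckets[int(fields[1])].append(line)
    return dict(buckets)


names = per_core_trace_names("pim-rodiniaBFS.out", 4)

# Made-up lines for illustration only.
merged = ["0 0 5 L 4096 8", "1 1 7 L 8192 4", "0 0 2 S 4104 8"]
buckets = split_by_core(merged)
print(names)
print(buckets)
```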

Sample ZSim trace files are provided under sample_traces/. Before using them, decompress each trace file.

To run the host simulation:

./ramulator --config Configs/host.cfg --disable-perf-scheduling true --mode=cpu --stats host.stats --trace sample_traces/host/rodiniaBFS.out --core-org=outOrder --number-cores=4 --trace-format=zsim --split-trace=true

To run the PIM simulation:

./ramulator --config Configs/pim.cfg --disable-perf-scheduling true --mode=cpu --stats pim.stats --trace sample_traces/pim/pim-rodiniaBFS.out --core-org=outOrder --number-cores=4 --trace-format=zsim --split-trace=true

Acknowledgments

The development of this simulation framework was partially supported by the Semiconductor Research Corporation. We also thank our industrial partners, especially Alibaba, Facebook, Google, Huawei, Intel, Microsoft, and VMware, for their generous donations.

ramulator-pim's People

Contributors

el1goluj, geraldofojunior

ramulator-pim's Issues

ZSim Cache Stats don't match Trace-files

Hello,
in issue #6 my advisor, Veronia Iskandar, asked about comparing the IPC of host and PIM simulation runs.
The response by @gearaldojunior stated that the ZSim simulation stats should be used to complete the total cycle count.

I have tried to do this for my evaluation, but for most of my simulations the sum of L1 hits and misses does not match the total number of lines in the unfiltered trace files. In fact, ZSim reports 100% more cache accesses than it prints in the trace.

For a simulation of the BFS search algo with two offloading regions I got the following results:

L1d_filtered+L1i_filtered+l1d_hits+L1i_hits+L1d_misses+L1i_misses = 8,005,603

cat pim-rodiniaBFS.out.[0-3] | wc -l > 4,238,695

All files necessary to reproduce these numbers are included here:
bfs_stats.zip

There are also other simulations I have done where the discrepancies are even higher. I wasn't able to identify any single stat or category of trace instruction (L/S/I) that is to blame; none of them match.

For the host run, the number of L3 hits and misses matches the number of lines in the filtered trace very closely, so this mismatch between the PIM stats and output has made me uncertain about how to calculate correct runtime results.

Looking at the program flow starting from the ooo_core.cpp where the tracefile is written to, it seems like exactly one of the aggregate stats in coherence_ctrl.cpp should be incremented. So I don't understand how this mismatch emerges.

I am hoping to utilize ramulator-pim for the evaluation of my master-thesis experiments, any advice on how to resolve this issue would be greatly appreciated.

Thanks,
Johannes Kath

Changing number of TSVs on HMC

I am wondering whether there is a way to modify the number of TSV channels in the HMC model. Line 201 of ramulator/src/HMC.h appears to set the number of TSV channels per vault (variable: channel_width). I have tried increasing the default number from 32 to 64, and it made no difference in the final results. Does changing the channel_width parameter make a difference? I am very new to this software and still learning the ropes.

Trace file is not being generated, the cycle count shows zero

ramulator.active_cycles_0 0 # Total active cycles for level _0
ramulator.busy_cycles_0 0 # (All-bank refresh only. busy cycles only include refresh time in rank level) The sum of cycles that the DRAM part is active or under refresh for level _0
ramulator.serving_requests_0 0 # The sum of read and write requests that are served in this DRAM element per memory cycle for level _0
ramulator.average_serving_requests_0 -nan # The average of read and write requests that are served in this DRAM element per memory cycle for level _0
ramulator.active_cycles_0_0 0 # Total active cycles for level _0_0
ramulator.busy_cycles_0_0 0 # (All-bank refresh only. busy cycles only include refresh time in rank level) The sum of cycles that the DRAM part is active or under refresh for level _0_0
ramulator.serving_requests_0_0 0 # The sum of read and write requests that are served in this DRAM element per memory cycle for level _0_0
ramulator.average_serving_requests_0_0 -nan # The average of read and write requests that are served in this DRAM element per memory cycle for level _0_0
ramulator.active_cycles_0_0_0 0 # Total active cycles for level _0_0_0
ramulator.busy_cycles_0_0_0 0 # (All-bank refresh only. busy cycles only include refresh time in rank level) The sum of cycles that the DRAM part is active or under refresh for level _0_0_0
ramulator.serving_requests_0_0_0 0 # The sum of read and write requests that are served in this DRAM element per memory cycle for level _0_0_0
ramulator.average_serving_requests_0_0_0 -nan # The average of read and write requests that are served in this DRAM element per memory cycle for level _0_0_0
ramulator.active_cycles_0_0_0_0 0 # Total active cycles for level _0_0_0_0
ramulator.busy_cycles_0_0_0_0 0 # (All-bank refresh only. busy cycles only include refresh time in rank level) The sum of cycles that the DRAM part is active or under refresh for level _0_0_0_0
ramulator.serving_requests_0_0_0_0 0 # The sum of read and write requests that are served in this DRAM element per memory cycle for level _0_0_0_0
ramulator.average_serving_requests_0_0_0_0 -nan # The average of read and write requests that are served in this DRAM element per memory cycle for level _0_0_0_0
ramulator.active_cycles_0_0_0_1 0 # Total active cycles for level _0_0_0_1
ramulator.busy_cycles_0_0_0_1 0 # (All-bank refresh only. busy cycles only include refresh time in rank level) The sum of cycles that the DRAM part is active or under refresh for level _0_0_0_1
ramulator.serving_requests_0_0_0_1 0 # The sum of read and write requests that are served in this DRAM element per memory cycle for level _0_0_0_1
ramulator.average_serving_requests_0_0_0_1 -nan # The average of read and write requests that are served in this DRAM element per memory cycle for level _0_0_0_1
ramulator.active_cycles_0_0_0_2 0 # Total active cycles for level _0_0_0_2
ramulator.busy_cycles_0_0_0_2 0 # (All-bank refresh only. busy cycles only include refresh time in rank level) The sum of cycles that the DRAM part is active or under refresh for level _0_0_0_2
ramulator.serving_requests_0_0_0_2 0 # The sum of read and write requests that are served in this DRAM element per memory cycle for level _0_0_0_2
ramulator.average_serving_requests_0_0_0_2 -nan # The average of read and write requests that are served in this DRAM element per memory cycle for level _0_0_0_2
ramulator.active_cycles_0_0_0_3 0 # Total active cycles for level _0_0_0_3

Can anyone help me solve this issue? With both the PIM and the host traces, the reported cycle count is 0.

Pin Runtime Error: ExecuteSysArchPrct1: 664: Unknown sub-function for SYS_arch_prctl

It seems pin is incompatible with kernel 5.4.0-128-generic (Ubuntu 20.04.5 LTS) when running tests/pim.cfg

Any leads will be much appreciated.

$ ./build/opt/zsim tests/pim.cfg
[H] Starting zsim, built Mon 17 Oct 2022 10:48:20 AM UTC (rev no git repo)
[H] Creating global segment, 1024 MBs
[H] Global segment shmid = 1703992
[H] Deadlock detection ON
[S 0] Started instance
[S 0] Started RR scheduler, quantum=10000 phases
prefix: sys.caches.l3.
****Prefetcher Stream
prefix: sys.caches.l2.
****Prefetcher Stream
prefix: sys.caches.l1i.
****Prefetcher Stream
prefix: sys.caches.l1d.
****Prefetcher Stream
[S 0] Hierarchy: [ l1i-0 l1d-0 ] -> l2-0
[S 0] Hierarchy: [ l1i-1 l1d-1 ] -> l2-1
[S 0] Hierarchy: [ l1i-2 l1d-2 ] -> l2-2
[S 0] Hierarchy: [ l1i-3 l1d-3 ] -> l2-3
[S 0] Hierarchy: [ l2-0 l2-1 l2-2 l2-3 ] -> l3-0b0..l3-0b3
--> Generating traces in the core
--> Conf File: pim-rodiniaBFS.out.0
--> Generating traces in the core
--> Conf File: pim-rodiniaBFS.out.1
--> Generating traces in the core
--> Conf File: pim-rodiniaBFS.out.2
--> Generating traces in the core
--> Conf File: pim-rodiniaBFS.out.3
[S 0] Initialized system
[S 0] HDF5 backend: Opening pim-rodiniaBFS.out.zsim.h5
[S 0] HDF5 backend: Created table, 3720 bytes/record, 282 records/write
[S 0] HDF5 backend: Opening pim-rodiniaBFS.out.zsim-ev.h5
[S 0] HDF5 backend: Created table, 3720 bytes/record, 36 records/write
[S 0] HDF5 backend: Opening pim-rodiniaBFS.out.zsim-cmp.h5
[S 0] HDF5 backend: Created table, 2304 bytes/record, 1 records/write
[S 0] Initialization complete
[S 0] Started process, PID 853898
[S 0] procMask: 0x0
[S 0] [0] Adjusting clocks, domain 0, de-ffwd 0
[S 0] vDSO info initialized
[H] Attached to global heap
[S 0] FF thread 0 starting
[S 0] Started contention simulation thread 0
[S 0] Started scheduler watchdog thread
[S 0] FF control Thread TID 853903
XSAVE
A: Source/pin/vm_ia32_l/emu_ia32_linux.cpp: ExecuteSysArchPrct1: 664: Unknown sub-function for SYS_arch_prctl

################################################################################
## STACK TRACE
################################################################################
addr2line -C -f -e "/home/staff/shaojiemike/github/ramulator-pim/zsim-ramulator/pin/intel64/bin/pinbin" 0x3041c07f9 0x3041c15ce 0x3041c18a0 0x3043c2122 0x3043c48e6 0x30432df2b 0x304317874 0x3042f95db 0x304380868
LEVEL_BASE::MESSAGE_TYPE::DumpTrace()
??:?
LEVEL_BASE::MESSAGE_TYPE::MessageInternal(std::string const&, bool, PIN_ERRTYPE, __va_list_tag*, int)
??:?
LEVEL_BASE::MESSAGE_TYPE::MessageNoReturn(std::string const&, bool, PIN_ERRTYPE, int, ...)
??:?
LEVEL_VM::EMULATOR::ExecuteSysArchPrct1()
??:?
LEVEL_VM::EMULATOR::ExecuteSysCall(LEVEL_VM::INS_EMU_INFO const*)
??:?
LEVEL_VM::EMULATOR_IA32::EmulateOneInstruction(LEVEL_VM::INS_EMU_INFO const*)
??:?
LEVEL_VM::VMSVC_Emu(LEVEL_VM::SCT_ATTRIBUTES const*, LEVEL_VM::PCTXT*, LEVEL_VM::VMSVC_EMU_ARGS const*)
??:?
LEVEL_VM::VM::Dispatch(LEVEL_VM::VMSVC_ARGS const*, LEVEL_VM::PCTXT*)
??:?
VmLeave
??:?
Detach Service Count: 110
Pin 2.14
Copyright (c) 2003-2015, Intel Corporation. All rights reserved.
@CHARM-VERSION: $Rev: 71293 $
@CHARM-BUILDER: BUILDER
@CHARM-COMPILER: gcc 4.4.7
@CHARM-TARGET: ia32e
@CHARM-CFLAGS:  __OPTIMIZE__=1  __NO_INLINE__=__NO_INLINE__
[H] Child 853898 done
[H] Panic on build/opt/zsim_harness.cpp:123: Child 853898 (idx 0) exit was anomalous, killing simulation

error while downloading both zsim and ramulator

While installing ZSim and Ramulator with the given commands, I get errors saying that g++-6 is not found, even though I do have g++ installed. The Makefile is also not working. Maybe there is a version mismatch.
These are the errors for each command:
// while installing Ramulator //

┌──(kali㉿kali)-[~/Downloads/ramulator-pim-master]
└─$ cd ramulator

┌──(kali㉿kali)-[~/Downloads/ramulator-pim-master/ramulator]
└─$ make -j
/bin/sh: 1: g++-6: not found
/bin/sh: 1: g++-6: not found
/bin/sh: 1: g++-6: not found
/bin/sh: 1: g++-6: not found
/bin/sh: 1: g++-6: not found
/bin/sh: 1: g++-6: not found
/bin/sh: 1: g++-6: not found
/bin/sh: 1: g++-6: not found
/bin/sh: 1: g++-6: not found
/bin/sh: 1: g++-6: not found
/bin/sh: 1: g++-6: not found
/bin/sh: 1: g++-6: not found
/bin/sh: 1: g++-6: not found
/bin/sh: 1: g++-6: not found
/bin/sh: 1: g++-6: not found
/bin/sh: 1: g++-6: not found
/bin/sh: 1: g++-6: not found
/bin/sh: 1: g++-6: not found
/bin/sh: 1: g++-6: not found
/bin/sh: 1: g++-6: not found
/bin/sh: 1: g++-6: not found
/bin/sh: 1: g++-6: not found
make: *** No rule to make target 'obj/.depend', needed by 'depend'. Stop.

// while installing ZSim //

┌──(kali㉿kali)-[~/Downloads/ramulator-pim-master/zsim-ramulator]
└─$ sh compile.sh
Compiling all ...
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking target system type... x86_64-unknown-linux-gnu
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether make supports nested variables... (cached) yes
configure: libconfig - made with pride in Colorado
checking for style of include used by make... GNU
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether gcc understands -c and -o together... yes
checking dependency style of gcc... gcc3
checking how to run the C preprocessor... gcc -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking minix/config.h usability... no
checking minix/config.h presence... no
checking for minix/config.h... no
checking whether it is safe to define EXTENSIONS... yes
checking how to print strings... printf
checking for a sed that does not truncate output... /usr/bin/sed
checking for fgrep... /usr/bin/grep -F
checking for ld used by gcc... /usr/bin/ld
checking if the linker (/usr/bin/ld) is GNU ld... yes
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
checking the name lister (/usr/bin/nm -B) interface... BSD nm
checking whether ln -s works... yes
checking the maximum length of command line arguments... 1572864
checking whether the shell understands some XSI constructs... yes
checking whether the shell understands "+="... yes
checking how to convert x86_64-unknown-linux-gnu file names to x86_64-unknown-linux-gnu format... func_convert_file_noop
checking how to convert x86_64-unknown-linux-gnu file names to toolchain format... func_convert_file_noop
checking for /usr/bin/ld option to reload object files... -r
checking for objdump... objdump
checking how to recognize dependent libraries... pass_all
checking for dlltool... no
checking how to associate runtime and link libraries... printf %s\n
checking for ar... ar
checking for archiver @file support... @
checking for strip... strip
checking for ranlib... ranlib
checking command to parse /usr/bin/nm -B output from gcc object... ok
checking for sysroot... no
checking for mt... mt
checking if mt is a manifest tool... no
checking for dlfcn.h... yes
checking for objdir... .libs
checking if gcc supports -fno-rtti -fno-exceptions... no
checking for gcc option to produce PIC... -fPIC -DPIC
checking if gcc PIC flag -fPIC -DPIC works... yes
checking if gcc static flag -static works... yes
checking if gcc supports -c -o file.o... yes
checking if gcc supports -c -o file.o... (cached) yes
checking whether the gcc linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking whether -lc should be explicitly linked in... no
checking dynamic linker characteristics... GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... yes
checking for gcc... (cached) gcc
checking whether we are using the GNU C compiler... (cached) yes
checking whether gcc accepts -g... (cached) yes
checking for gcc option to accept ISO C89... (cached) none needed
checking whether gcc understands -c and -o together... (cached) yes
checking dependency style of gcc... (cached) gcc3
checking for g++... g++
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking dependency style of g++... gcc3
checking how to run the C++ preprocessor... g++ -E
checking for ld used by g++... /usr/bin/ld -m elf_x86_64
checking if the linker (/usr/bin/ld -m elf_x86_64) is GNU ld... yes
checking whether the g++ linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking for g++ option to produce PIC... -fPIC -DPIC
checking if g++ PIC flag -fPIC -DPIC works... yes
checking if g++ static flag -static works... yes
checking if g++ supports -c -o file.o... yes
checking if g++ supports -c -o file.o... (cached) yes
checking whether the g++ linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking dynamic linker characteristics... (cached) GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking for flex... flex
checking lex output file root... lex.yy
checking lex library... -lfl
checking whether yytext is a pointer... yes
checking for bison... bison -y
checking for compiler switch to enable full C/C++ warnings... -Wall -Wshadow -Wextra -Wdeclaration-after-statement -Wno-unused-parameter, -Wall -Wshadow -Wextra -Wno-unused-parameter
checking for ANSI C header files... (cached) yes
checking for unistd.h... (cached) yes
checking for stdint.h... (cached) yes
checking xlocale.h usability... no
checking xlocale.h presence... no
checking for xlocale.h... no
checking for an ANSI C-conforming const... yes
checking for newlocale... yes
checking for uselocale... yes
checking for freelocale... yes
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: creating lib/Makefile
config.status: creating lib/libconfig.pc
config.status: creating lib/libconfig++.pc
config.status: creating lib/libconfigConfig.cmake
config.status: creating lib/libconfig++Config.cmake
config.status: creating doc/Makefile
config.status: creating examples/Makefile
config.status: creating examples/c/Makefile
config.status: creating examples/c++/Makefile
config.status: creating tinytest/Makefile
config.status: creating tests/Makefile
config.status: creating libconfig.spec
config.status: creating ac_config.h
config.status: executing depfiles commands
config.status: executing libtool commands
Making install in lib
make[1]: Entering directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib'
make install-am
make[2]: Entering directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib'
CC libconfig_la-grammar.lo
CC libconfig_la-libconfig.lo
CC libconfig_la-scanctx.lo
CC libconfig_la-scanner.lo
scanner.c: In function 'yy_get_next_buffer':
scanner.c:751:32: warning: comparison of integer expressions of different signedness: 'size_t' {aka 'long unsigned int'} and 'int' [-Wsign-compare]
751 | for ( n = 0; n < max_size &&
| ^
scanner.c:1467:17: note: in expansion of macro 'YY_INPUT'
1467 | YY_INPUT( (&YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[number_to_move]),
| ^~~~~~~~
CC libconfig_la-strbuf.lo
CC libconfig_la-strvec.lo
CC libconfig_la-util.lo
CC libconfig_la-wincompat.lo
CCLD libconfig.la
ar: `u' modifier ignored since `D' is the default (see `U')
CC libconfig___la-grammar.lo
CC libconfig___la-libconfig.lo
CC libconfig___la-scanctx.lo
CC libconfig___la-scanner.lo
scanner.c: In function 'yy_get_next_buffer':
scanner.c:751:32: warning: comparison of integer expressions of different signedness: 'size_t' {aka 'long unsigned int'} and 'int' [-Wsign-compare]
751 | for ( n = 0; n < max_size && \
| ^
scanner.c:1467:17: note: in expansion of macro 'YY_INPUT'
1467 | YY_INPUT( (&YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[number_to_move]),
| ^~~~~~~~
CC libconfig___la-strbuf.lo
CC libconfig___la-strvec.lo
CC libconfig___la-util.lo
CC libconfig___la-wincompat.lo
CXX libconfig___la-libconfigcpp.lo
CXXLD libconfig++.la
ar: `u' modifier ignored since `D' is the default (see `U')
make[3]: Entering directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib'
/usr/bin/mkdir -p '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib'
/bin/bash ../libtool --mode=install /usr/bin/install -c libconfig.la libconfig++.la '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib'
libtool: install: /usr/bin/install -c .libs/libconfig.so.11.0.2 /home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib/libconfig.so.11.0.2
libtool: install: (cd /home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib && { ln -s -f libconfig.so.11.0.2 libconfig.so.11 || { rm -f libconfig.so.11 && ln -s libconfig.so.11.0.2 libconfig.so.11; }; })
libtool: install: (cd /home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib && { ln -s -f libconfig.so.11.0.2 libconfig.so || { rm -f libconfig.so && ln -s libconfig.so.11.0.2 libconfig.so; }; })
libtool: install: /usr/bin/install -c .libs/libconfig.lai /home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib/libconfig.la
libtool: install: /usr/bin/install -c .libs/libconfig++.so.11.0.2 /home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib/libconfig++.so.11.0.2
libtool: install: (cd /home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib && { ln -s -f libconfig++.so.11.0.2 libconfig++.so.11 || { rm -f libconfig++.so.11 && ln -s libconfig++.so.11.0.2 libconfig++.so.11; }; })
libtool: install: (cd /home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib && { ln -s -f libconfig++.so.11.0.2 libconfig++.so || { rm -f libconfig++.so && ln -s libconfig++.so.11.0.2 libconfig++.so; }; })
libtool: install: /usr/bin/install -c .libs/libconfig++.lai /home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib/libconfig++.la
libtool: install: /usr/bin/install -c .libs/libconfig.a /home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib/libconfig.a
libtool: install: chmod 644 /home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib/libconfig.a
libtool: install: ranlib /home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib/libconfig.a
libtool: install: /usr/bin/install -c .libs/libconfig++.a /home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib/libconfig++.a
libtool: install: chmod 644 /home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib/libconfig++.a
libtool: install: ranlib /home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib/libconfig++.a
libtool: finish: PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/games:/usr/games:/sbin" ldconfig -n /home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib

Libraries have been installed in:
/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR'
flag during linking and do at least one of the following:

  • add LIBDIR to the `LD_LIBRARY_PATH' environment variable
    during execution
  • add LIBDIR to the `LD_RUN_PATH' environment variable
    during linking
  • use the `-Wl,-rpath -Wl,LIBDIR' linker flag
  • have your system administrator add LIBDIR to `/etc/ld.so.conf'

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.

/usr/bin/mkdir -p '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib/cmake/libconfig'
/usr/bin/install -c -m 644 libconfigConfig.cmake '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib/cmake/libconfig'
/usr/bin/mkdir -p '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib/cmake/libconfig++'
/usr/bin/install -c -m 644 libconfig++Config.cmake '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib/cmake/libconfig++'
/usr/bin/mkdir -p '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/include'
/usr/bin/install -c -m 644 libconfig.h libconfig.h++ '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/include'
/usr/bin/mkdir -p '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib/pkgconfig'
/usr/bin/install -c -m 644 libconfig.pc libconfig++.pc '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib/pkgconfig'
make[3]: Leaving directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib'
make[2]: Leaving directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib'
make[1]: Leaving directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/lib'
Making install in doc
make[1]: Entering directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/doc'
make[2]: Entering directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/doc'
make[2]: Nothing to be done for 'install-exec-am'.
/usr/bin/mkdir -p '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/share/info'
/usr/bin/install -c -m 644 ./libconfig.info '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/share/info'
make[2]: Leaving directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/doc'
make[1]: Leaving directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/doc'
Making install in tinytest
make[1]: Entering directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/tinytest'
CC tinytest.o
AR libtinytest.a
ar: `u' modifier ignored since `D' is the default (see `U')
make[2]: Entering directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/tinytest'
make[2]: Nothing to be done for 'install-exec-am'.
make[2]: Nothing to be done for 'install-data-am'.
make[2]: Leaving directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/tinytest'
make[1]: Leaving directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/tinytest'
Making install in tests
make[1]: Entering directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/tests'
CC libconfig_tests-tests.o
tests.c: In function ‘ParseInvalidStrings’:
tests.c:97:60: warning: ‘%s’ directive output may be truncated writing up to 127 bytes into a region of size 121 [-Wformat-truncation=]
97 | snprintf(expected_error, sizeof(expected_error), "(null):%s", parse_error);
| ^~
......
209 | parse_string_and_compare_error(input_text, error_text);
| ~~~~~~~~~~
tests.c:97:3: note: ‘snprintf’ output between 8 and 135 bytes into a destination of size 128
97 | snprintf(expected_error, sizeof(expected_error), "(null):%s", parse_error);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tests.c: In function ‘ParseInvalidFiles’:
tests.c:73:56: warning: ‘%s’ directive output may be truncated writing up to 127 bytes into a region of size between 0 and 127 [-Wformat-truncation=]
73 | snprintf(expected_error, sizeof(expected_error), "%s:%s",
| ^~
......
179 | parse_file_and_compare_error(input_file, error_text);
| ~~~~~~~~~~
tests.c:73:3: note: ‘snprintf’ output between 2 and 256 bytes into a destination of size 128
73 | snprintf(expected_error, sizeof(expected_error), "%s:%s",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
74 | input_file, parse_error);
| ~~~~~~~~~~~~~~~~~~~~~~~~
CCLD libconfig_tests
make[2]: Entering directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/tests'
make[2]: Nothing to be done for 'install-exec-am'.
make[2]: Nothing to be done for 'install-data-am'.
make[2]: Leaving directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/tests'
make[1]: Leaving directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/tests'
Making install in examples
make[1]: Entering directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/examples'
Making install in c
make[2]: Entering directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/examples/c'
CC example1.o
CCLD example1
CC example2.o
CCLD example2
CC example3.o
CCLD example3
CC example4.o
CCLD example4
make[3]: Entering directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/examples/c'
make[3]: Nothing to be done for 'install-exec-am'.
make[3]: Nothing to be done for 'install-data-am'.
make[3]: Leaving directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/examples/c'
make[2]: Leaving directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/examples/c'
Making install in c++
make[2]: Entering directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/examples/c++'
CXX example1.o
CXXLD example1
CXX example2.o
CXXLD example2
CXX example3.o
CXXLD example3
CXX example4.o
CXXLD example4
make[3]: Entering directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/examples/c++'
make[3]: Nothing to be done for 'install-exec-am'.
make[3]: Nothing to be done for 'install-data-am'.
make[3]: Leaving directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/examples/c++'
make[2]: Leaving directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/examples/c++'
make[2]: Entering directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/examples'
make[3]: Entering directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/examples'
make[3]: Nothing to be done for 'install-exec-am'.
make[3]: Nothing to be done for 'install-data-am'.
make[3]: Leaving directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/examples'
make[2]: Leaving directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/examples'
make[1]: Leaving directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig/examples'
make[1]: Entering directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig'
make[2]: Entering directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig'
make[2]: Nothing to be done for 'install-exec-am'.
make[2]: Nothing to be done for 'install-data-am'.
make[2]: Leaving directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig'
make[1]: Leaving directory '/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/libconfig'
scons: Reading SConscript files ...
File "/home/kali/Downloads/ramulator-pim-master/zsim-ramulator/SConstruct", line 11

print "Building " + type + " zsim at " + buildDir

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

SyntaxError: Missing parentheses in call to 'print'. Did you mean print(...)?

While running the example:
(kali㉿kali)-[~/…/ramulator-pim-master/zsim-ramulator/benchmarks/rodinia]
└─$ make
gcc -g -fopenmp -O2 -DOMP_OFFLOAD bfs.cpp -o bfs_offload
/usr/bin/ld: /tmp/cceZJ48C.o:(.data.rel.local.DW.ref.__gxx_personality_v0[DW.ref.__gxx_personality_v0]+0x0): undefined reference to `__gxx_personality_v0'
collect2: error: ld returned 1 exit status
make: *** [Makefile:12: bfs_offload] Error 1

These are the errors I am facing. Please help me find a solution as soon as possible.
Thank you,
Rithal.
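
The SyntaxError in the SCons output above is what Python 3 reports for a Python 2 `print` statement, which suggests that SCons is running under Python 3 while SConstruct is written for Python 2. A minimal sketch of the Python 3 form of that line follows; `build_banner` and its argument values are hypothetical names used only for illustration, and other Python 2 constructs may remain elsewhere in the file.

```python
def build_banner(build_type, build_dir):
    """Return the message that SConstruct line 11 tries to print."""
    return "Building " + build_type + " zsim at " + build_dir


# In Python 3, print is a function and needs parentheses.
# "opt" and "build/opt" are placeholder values, not taken from SConstruct.
print(build_banner("opt", "build/opt"))
```

Running SCons with a Python 2 interpreter, where one is still available, is the other way to avoid the error without touching the file.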

PiM core with L1 cache dumps totally wrong output trace.

Hi

Why do you generate an unfiltered memory trace for PIM?
In your NAPEL paper, you say you simulated NMP where each core has an L1 cache.
However, since the memory trace is generated unfiltered, the cache size does not affect the ZSim output at all.
I have verified that in pim_trace mode the cache configuration does not affect the trace output.

How did you simulate your NMP with L1 cache using this framework?
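
To make the distinction in this question concrete, here is a toy sketch of the difference between a filtered and an unfiltered trace. It is not ZSim's code: `filter_trace` is a hypothetical helper that models a single direct-mapped cache and keeps only the accesses that miss, whereas the unfiltered trace is simply every access and therefore cannot depend on the cache size.

```python
def filter_trace(addresses, num_lines, line_size=64):
    """Return only the accesses that miss in a toy direct-mapped cache."""
    tags = [None] * num_lines          # one stored line number per cache set
    misses = []
    for addr in addresses:
        line = addr // line_size       # cache line the address belongs to
        index = line % num_lines       # direct-mapped set index
        if tags[index] != line:        # miss: fill the line and record it
            tags[index] = line
            misses.append(addr)
    return misses


trace = [0, 8, 64, 0, 4096, 0]         # made-up byte addresses
unfiltered = list(trace)               # every access, independent of the cache
filtered_small = filter_trace(trace, num_lines=1)
filtered_large = filter_trace(trace, num_lines=128)
print(len(unfiltered), len(filtered_small), len(filtered_large))
```

In this toy model, changing `num_lines` changes the filtered trace but never the unfiltered one, which matches the behaviour reported above.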

make Error in the folder benchmarks/rodinia

~/Ramulator-pim/zsim-ramulator/benchmarks/rodinia
$ make
g++-9 -g -fopenmp -O2 bfs.cpp -o bfs
bfs.cpp: In function ‘void BFSGraph(int, char**)’:
bfs.cpp:67:8: warning: ignoring return value of ‘int fscanf(FILE*, const char*, ...)’, declared with attribute warn_unused_result [-Wunused-result]
67 | fscanf(fp,"%d",&no_of_nodes);
| ~~~~~~^~~~~~~~~~~~~~~~~~~~~~
bfs.cpp:79:9: warning: ignoring return value of ‘int fscanf(FILE*, const char*, ...)’, declared with attribute warn_unused_result [-Wunused-result]
79 | fscanf(fp,"%d %d",&start,&edgeno);
| ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
bfs.cpp:88:8: warning: ignoring return value of ‘int fscanf(FILE*, const char*, ...)’, declared with attribute warn_unused_result [-Wunused-result]
88 | fscanf(fp,"%d",&source);
| ~~~~~~^~~~~~~~~~~~~~~~~
bfs.cpp:95:8: warning: ignoring return value of ‘int fscanf(FILE*, const char*, ...)’, declared with attribute warn_unused_result [-Wunused-result]
95 | fscanf(fp,"%d",&edge_list_size);
| ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
bfs.cpp:101:9: warning: ignoring return value of ‘int fscanf(FILE*, const char*, ...)’, declared with attribute warn_unused_result [-Wunused-result]
101 | fscanf(fp,"%d",&id);
| ~~~~~~^~~~~~~~~~~~~
bfs.cpp:102:9: warning: ignoring return value of ‘int fscanf(FILE*, const char*, ...)’, declared with attribute warn_unused_result [-Wunused-result]
102 | fscanf(fp,"%d",&cost);
| ~~~~~~^~~~~~~~~~~~~~~
bfs.cpp: In function ‘BFSGraph(int, char**) [clone ._omp_fn.3] [clone .hsa.0]’:
cc1plus: warning: could not emit HSAIL for the function [-Whsa]
In file included from bfs.cpp:7:
../../misc/hooks/zsim_hooks.h:8:67: note: support for HSA does not implement gimple statement gimple_asm
8 | #define COMPILER_BARRIER() { asm volatile("" ::: "memory");}
| ^
bfs.cpp: In function ‘BFSGraph(int, char**) [clone ._omp_fn.2] [clone .hsa.0]’:
cc1plus: warning: could not emit HSAIL for the function [-Whsa]
cc1plus: note: support for HSA does not implement non-gridified OpenMP parallel constructs.
bfs.cpp: In function ‘BFSGraph(int, char**) [clone ._omp_fn.1] [clone .hsa.0]’:
cc1plus: warning: could not emit HSAIL for the function [-Whsa]
../../misc/hooks/zsim_hooks.h:8:67: note: support for HSA does not implement gimple statement gimple_asm
8 | #define COMPILER_BARRIER() { asm volatile("" ::: "memory");}
| ^
bfs.cpp: In function ‘BFSGraph(int, char**) [clone ._omp_fn.0] [clone .hsa.0]’:
cc1plus: warning: could not emit HSAIL for the function [-Whsa]
cc1plus: note: support for HSA does not implement non-gridified OpenMP parallel constructs.
lto-wrapper: fatal error: could not find accel/nvptx-none/mkoffload in /usr/lib/gcc/x86_64-linux-gnu/9/:/usr/lib/gcc/x86_64-linux-gnu/9/:/usr/lib/gcc/x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/9/:/usr/lib/gcc/x86_64-linux-gnu/ (consider using ‘-B’)

compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
make: *** [Makefile:10: bfs] Error 1

When I run make in the zsim-ramulator/benchmarks/rodinia/ directory, I get the error above. How can I resolve this issue? Thank you very much for your help.

ramulator make errors

I ran make -j to compile Ramulator and got these errors:
/usr/bin/ld: cannot find -ldrampowerxml
/usr/bin/ld: cannot find -ldrampower

inOrder Simulation Fails

Hi,
I noticed that inOrder simulation fails silently: no error is reported, but the final simulation result is incorrect.
Thanks

Zsim running in PIM.cfg does not terminate

Hi,

I'm trying to replicate the NAPEL paper's test cases for Polybench and am facing the following issues:

  1. ZSim with hooks for PIM offloading does not terminate (or is force-terminated once max_instructions is reached). I am adding one sample case for cholesky, but the issue occurs for all benchmarks. Could you share a working example with a Polybench test case? It would help with replication.
  2. The shared Rodinia bfs example also does not terminate for the suggested test case of 1.0M nodes.
    Error:
[S 0] Thread 15 starting
[S 0] WARN: Futex wake matching failed (0/31) (external/ff waiters?)
[S 0] WARN: Stalled for 20s 

I have made the changes suggested in the paper, as follows.
Configuration file for 32 PIM cores (PIM.cfg):

// This system is similar to a 6-core, 2.4GHz Westmere with 10 Niagara-like cores attached to the L3
sys = {
    lineSize = 64;
    frequency = 2400;

    cores = {
        core = {
            type = "OOO";
            cores = 32;
            icache = "l1i";
            dcache = "l1d";
        };
    };

  
    caches = {
        l1d = {
            array = {
                type = "SetAssoc";
                ways = 8;
            };
            caches = 32;
            latency = 4;
            size = 32768;
        };
        l1i = {
            array = {
                type = "SetAssoc";
                ways = 4;
            };
            caches = 32;
            latency = 3;
            size = 32768;
        };
        l2 = {
            array = {
                type = "SetAssoc";
                ways = 8;
            };
	    //type = "Timing";
	    //mshrs = 10;
            caches = 32;
            latency = 7;
            children = "l1i|l1d";
            size = 262144;
        };
        l3 = {
            array = {
                hash = "H3";
                type = "SetAssoc";
                ways = 16;
            };
	    //type = "Timing";
	    //mshrs = 16;
            banks = 32;
            caches = 1;
            latency = 27;
            children = "l2";
	    size = 67108864;
        };


    };
    
    mem = {
        type = "Traces";
        instr_traces = true;
	      only_offload = true;
	      pim_traces = true;
 
        outFile = "pim-poly_cholesky_32.out"
    };

};

sim = {
    phaseLength = 10000;
    maxTotalInstrs = 10000000000L;
    statsPhaseInterval = 1000;
    printHierarchy = true;
    // attachDebugger = True;
};

process0 = {
    command = "benchmarks/PolyBench-ACC-master/OpenMP/linear-algebra/kernels/cholesky/cholesky" ;
    startFastForwarded = True;
//    command = "ls -la";
//    command = "unzip tracesLois.out.gz";
};

Polybench example
In cholesky.h, I added a dataset for a test case of dimension 2000. cholesky.c is modified as follows:

/* POLYBENCH/GPU-OPENMP
 *
 * This file is a part of the Polybench/GPU-OpenMP suite
 *
 * Contact:
 * William Killian <[email protected]>
 * 
 * Copyright 2013, The University of Delaware
 */
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <math.h>

/* Include polybench common header. */
#include <polybench.h>

/* Include benchmark-specific header. */
/* Default data type is double, default size is 4000. */
#include "cholesky.h"
#include "../../../../../../misc/hooks/zsim_hooks.h"

/* Array initialization. */
static
void init_array(int n,
		DATA_TYPE POLYBENCH_1D(p,N,n),
		DATA_TYPE POLYBENCH_2D(A,N,N,n,n))
{
  int i, j;

  for (i = 0; i < n; i++)
    {
      p[i] = 1.0 / n;
      for (j = 0; j < n; j++)
	A[i][j] = 1.0 / n;
    }
}


/* DCE code. Must scan the entire live-out data.
   Can be used also to check the correctness of the output. */
static
void print_array(int n,
		 DATA_TYPE POLYBENCH_2D(A,N,N,n,n))

{
  int i, j;

  for (i = 0; i < n; i++)
    for (j = 0; j < n; j++) {
    fprintf (stderr, DATA_PRINTF_MODIFIER, A[i][j]);
    if ((i * N + j) % 20 == 0) fprintf (stderr, "\n");
  }
}


/* Main computational kernel. The whole function will be timed,
   including the call and return. */
static
void kernel_cholesky(int n,
		     DATA_TYPE POLYBENCH_1D(p,N,n),
		     DATA_TYPE POLYBENCH_2D(A,N,N,n,n))
{
  
  int i, j, k;
  int	 num_omp_threads;
	num_omp_threads = 32;
  DATA_TYPE x;
  #pragma scop
  #pragma omp parallel
  {  
    
    #pragma omp for private (j,k)
    for (i = 0; i < _PB_N; ++i)
      { 
        zsim_PIM_function_begin();
	      x = A[i][i];
	      for (j = 0; j <= i - 1; ++j)
             
	          x = x - A[i][j] * A[i][j];
            p[i] = 1.0 /sqrt(x);
            
	      for (j = i + 1; j < _PB_N; ++j)
	        {
	          x = A[i][j];
	          for (k = 0; k <= i - 1; ++k)
	            x = x - A[j][k] * A[i][k];
	            A[j][i] = x * p[i];
	        }
          zsim_PIM_function_end(); 
      }
   
  }
  #pragma endscop
  
}


int main(int argc, char** argv)
{
  /* Retrieve problem size. */
  int n = N;

  /* Variable declaration/allocation. */
  POLYBENCH_2D_ARRAY_DECL(A, DATA_TYPE, N, N, n, n);
  POLYBENCH_1D_ARRAY_DECL(p, DATA_TYPE, N, n);


  /* Initialize array(s). */
  init_array (n, POLYBENCH_ARRAY(p), POLYBENCH_ARRAY(A));

  /* Start timer. */
  polybench_start_instruments;

  /* Run kernel. */
  zsim_roi_begin();
  kernel_cholesky (n, POLYBENCH_ARRAY(p), POLYBENCH_ARRAY(A));
  zsim_roi_end();
  /* Stop and print timer. */
  polybench_stop_instruments;
  polybench_print_instruments;

  /* Prevent dead-code elimination. All live-out data must be printed
     by the function call in argument. */
  polybench_prevent_dce(print_array(n, POLYBENCH_ARRAY(A)));

  /* Be clean. */
  POLYBENCH_FREE_ARRAY(A);
  POLYBENCH_FREE_ARRAY(p);

  return 0;
}

How to use the "Trace-Driven" feature?

Hi.
First, thanks for making this program; it is helpful for our research.

I want to use the trace-driven feature. What do I have to do? When I first looked at the code,

the relevant section in init.cpp looked like this:

zinfo->traceDriven = config.get<bool>("sim.traceDriven", false);

So I changed the default to true and rebuilt, but it didn't work.

I use a configuration file written like this:

sys = {
  frequency = 2500;
  cores = {
    westmere = {
      type = "OOO";
      cores = 1;
      icache = "l1i";
      dcache = "l1d";

      properties = {
        bp_nb = 11;
        bp_hb = 18;
        bp_lb = 14;
      }
    };
  };

  caches = {
    l1i = {
      caches = 1;
      size = 32768;
      array = {
        type = "SetAssoc";
        ways = 8;
      };
      latency = 3;
      #Next line prefetcher at L1 access
      numLinesNLP = 1;
      #Perfect memory, all memory accesses (instructions) have L1 latency
      zeroLatencyCache = false;
    };

    l1d = {
      caches = 1;
      size = 32768;
      array = {
        type = "SetAssoc";
        ways = 8;
      };
      latency = 4;
      #Next line prefetcher at L1 access
      numLinesNLP = 1;
      #Perfect memory, all memory accesses (data) have L1 latency
      zeroLatencyCache = false;
    };

    l2 = {
      caches = 1;
      size = 1048576;
      array = {
        type = "SetAssoc";
        ways = 16;
      };
      latency = 7;
      children = "l1i|l1d";
    };

    l3 = {
      caches = 1;
      banks = 1;
      size = 10485760;
      #size = 10485760;
      #size = 47185920;
      latency = 27;
      array = {
        type = "SetAssoc";
        hash = "H3";
        ways = 20;
      };
      children = "l2";
    };
  };

  mem = {
    latency = 225;
    type = "WeaveMD1";
    boundLatency = 225;
    bandwidth = 120000;
    #latency = 1;
    #type = "DDR";
    #controllers = 6;
    #tech = "DDR3-1066-CL8";
  };
};
sim = {
    phaseLength = 10000;
    max_offload_instrs = 1000000000L;
    statsPhaseInterval = 1000;
    printHierarchy = true;
    traceDriven = true;
};

process0 = {
    command = "benchmarks/rodinia/bfs 4 benchmarks/rodinia/data/bfs/graph65536.txt "
    startFastForwarded = True;
};

#intel processor trace:
trace0 = "/home/kimsungju/문서/ramulator-pim-master/zsim-ramulator/rodiniaBFS.out.0";
#type PT:
trace_type = "PT";

The trace-driven part is only briefly documented, so I wanted to ask about it.

Thanks. :)

How RowClone is implemented in Ramulator?

The authors mentioned that RowClone is implemented in Ramulator.
There is a function in processor.cpp named "get_rowclone_request" but I am not sure how it's used. Please comment!

Ramulator make -j errors

Hi,
I was trying out the simulator, but got errors when running make -j for Ramulator. Following are the errors:

$ make -j
g++-6 -O3 -std=c++11 -g -w -Wall -I../../common/DRAMPower/src -L../../common/DRAMPower/src -DRAMULATOR -c -o obj/Controller.o   -lboost_program_options -ldrampowerxml -ldrampower -lxerces-c src/Controller.cpp
g++-6 -O3 -std=c++11 -g -w -Wall -I../../common/DRAMPower/src -L../../common/DRAMPower/src -DRAMULATOR -c -o obj/LogicLayer.o   -lboost_program_options -ldrampowerxml -ldrampower -lxerces-c src/LogicLayer.cpp
g++-6 -O3 -std=c++11 -g -w -Wall -I../../common/DRAMPower/src -L../../common/DRAMPower/src -DRAMULATOR -c -o obj/Processor.o   -lboost_program_options -ldrampowerxml -ldrampower -lxerces-c src/Processor.cpp
g++-6 -O3 -std=c++11 -g -w -Wall -I../../common/DRAMPower/src -L../../common/DRAMPower/src -DRAMULATOR -c -o obj/MemoryFactory.o   -lboost_program_options -ldrampowerxml -ldrampower -lxerces-c src/MemoryFactory.cpp
g++-6 -O3 -std=c++11 -g -w -Wall -I../../common/DRAMPower/src -L../../common/DRAMPower/src -DRAMULATOR -c -o obj/Refresh.o   -lboost_program_options -ldrampowerxml -ldrampower -lxerces-c src/Refresh.cpp
In file included from src/Memory.h:7:0,
                 from src/MemoryFactory.h:9,
                 from src/MemoryFactory.cpp:1:
src/Controller.h:30:39: fatal error: libdrampower/LibDRAMPower.h: No such file or directory
 #include "libdrampower/LibDRAMPower.h"
                                       ^
compilation terminated.
In file included from src/Controller.cpp:1:0:
src/Controller.h:30:39: fatal error: libdrampower/LibDRAMPower.h: No such file or directory
 #include "libdrampower/LibDRAMPower.h"
                                       ^
compilation terminated.
In file included from src/Refresh.cpp:13:0:
src/Controller.h:30:39: fatal error: libdrampower/LibDRAMPower.h: No such file or directory
 #include "libdrampower/LibDRAMPower.h"
                                       ^
compilation terminated.
Makefile:55: recipe for target 'obj/MemoryFactory.o' failed
make: *** [obj/MemoryFactory.o] Error 1
make: *** Waiting for unfinished jobs....
Makefile:55: recipe for target 'obj/Controller.o' failed
make: *** [obj/Controller.o] Error 1
Makefile:55: recipe for target 'obj/Refresh.o' failed
make: *** [obj/Refresh.o] Error 1
In file included from src/HMC_Controller.h:12:0,
                 from src/LogicLayer.h:6,
                 from src/LogicLayer.cpp:4:
src/Controller.h:30:39: fatal error: libdrampower/LibDRAMPower.h: No such file or directory
 #include "libdrampower/LibDRAMPower.h"
                                       ^
compilation terminated.
In file included from src/Memory.h:7:0,
                 from src/Processor.h:6,
                 from src/Processor.cpp:1:
src/Controller.h:30:39: fatal error: libdrampower/LibDRAMPower.h: No such file or directory
 #include "libdrampower/LibDRAMPower.h"
                                       ^
compilation terminated.
Makefile:55: recipe for target 'obj/LogicLayer.o' failed
make: *** [obj/LogicLayer.o] Error 1
Makefile:55: recipe for target 'obj/Processor.o' failed
make: *** [obj/Processor.o] Error 1

FYI: I have a standalone Ramulator in a separate folder that I have used for some time, and it still runs fine.
For this PIM simulator, I use the integrated package as given, but I wonder why the Ramulator that comes with it gives these errors when I run make -j.
Thanks in advance.
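
The compile lines above pass `-I../../common/DRAMPower/src` and `-L../../common/DRAMPower/src`, so the missing header and the `-ldrampower` / `-ldrampowerxml` libraries are both expected under that directory. This suggests the DRAMPower sources are absent or not yet built in this checkout. The following is a minimal sketch for checking this before running make; `check_drampower` is a hypothetical helper, and the paths are copied from the log above rather than from any documented layout.

```python
from pathlib import Path

# Names taken from the compiler output above; adjust to your checkout.
HEADER = "libdrampower/LibDRAMPower.h"
LIB_NAMES = ("drampower", "drampowerxml")


def check_drampower(src_dir):
    """Report which DRAMPower pieces the Ramulator build would not find."""
    src_dir = Path(src_dir)
    problems = []
    if not (src_dir / HEADER).is_file():
        problems.append("missing header: " + HEADER)
    for name in LIB_NAMES:
        # -l<name> makes the linker look for lib<name>.so or lib<name>.a
        if not any((src_dir / f"lib{name}{ext}").is_file() for ext in (".so", ".a")):
            problems.append("missing library: lib" + name)
    return problems


if __name__ == "__main__":
    report = check_drampower("../../common/DRAMPower/src")
    for line in report or ["DRAMPower looks complete"]:
        print(line)
```

If the header is reported missing, the DRAMPower directory is probably empty or incomplete; if only the libraries are missing, DRAMPower probably still needs to be built.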

Speedup due to PIM execution

Hello,
I'm new to ramulator-pim. When I run the sample traces, host execution is better than PIM in terms of both IPC and time. Could you point me to an example that shows the benefit of using PIM (for example, PIM speeding up part of a program)?
Thanks!

host code

Hi,
When I run ZSim with host.cfg (./build/opt/zsim tests/host.cfg), it does not generate multiple traces for multiple cores. pim.cfg seems to work fine, though. Can someone please let me know what I might be missing here?

can't simulate many cores

Hi,
When I set many cores (>6), ZSim does not simulate all of them; it only simulates 6 cores. For example, when I set cores = 8, the stats file reports cycles, instrs, and IPC as 0 for [core-6] and [core-7]. 😢
Can someone please let me know what I might be missing?

Ramulator runs forever with 3 levels of caches!

When I run Ramulator with three levels of caches (cache = all or cache = L3), it never finishes.
In the cache.h file, the cache level was hardcoded to L1. I changed it to accept more cache levels (L2 and L3). It works fine when the cache parameter is L1L2 (in the cpu parameters of Ramulator's config file), but when I change it to cache = L3, Ramulator runs forever!

From my experience, Ramulator doesn't support shared LLC.

Any help or suggestion to fix this issue?

Thanks,

Energy measurements

I am trying to use ramulator-pim to simulate a PIM device similar to the one you used in the NAPEL paper. The paper mentions that Ramulator simulates the time and energy of the HMC-PIM device. While I can get the execution time, I do not understand which memspec file from DRAMPower I should use. The only 3D-stacked memory among the memspec files is WIDE I/O, which is not HMC.
I also noticed that, starting from line 218 of HMC_Controller.h, there is some code that computes the energy using the DRAMPower library, but it does not seem to be active. When I enabled this code and added drampower_memspecs = <path_to_memspecfile> to the config file, Ramulator crashed after a while. Is it accurate to just run DRAMPower on the cmd-traces, using the WIDE I/O memspec for HMC?

To replicate the error: in the file src/HMC_Controller.h, uncomment the lines below, which start at line 218, and add the following to the config file for PIM: drampower_memspecs = ../common/DRAMPower/memspecs/JEDEC_256Mb_WIDEIO_SDR-266_128bit.xml

/*if (configs["drampower_memspecs"] != "") {
    with_drampower = true;
    drampower = new libDRAMPower(
        Data::MemSpecParser::getMemSpecFromXML(
            configs["drampower_memspecs"]),
        true);
}*/
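
Because the memspec path above is relative, it resolves against the directory from which Ramulator is started. One thing that can be ruled out quickly is a missing or malformed memspec file. The sketch below is a Python check with a hypothetical helper name, `check_memspec`. It only verifies that the file exists and is well-formed XML. It says nothing about whether WIDE I/O parameters are appropriate for HMC, and it may not explain the crash described above.

```python
import os
import xml.etree.ElementTree as ET

def check_memspec(path, base_dir=None):
    """Resolve `path` against `base_dir` (default: current directory) and
    check that it is an existing, well-formed XML file.

    Returns a tuple (ok, message).
    """
    base = base_dir if base_dir is not None else os.getcwd()
    full = os.path.normpath(os.path.join(base, path))
    if not os.path.isfile(full):
        return False, "not found: " + full
    try:
        ET.parse(full)
    except ET.ParseError as err:
        return False, "malformed XML in %s: %s" % (full, err)
    return True, "ok: " + full
```

For example, `check_memspec("../common/DRAMPower/memspecs/JEDEC_256Mb_WIDEIO_SDR-266_128bit.xml")` can be called from the directory in which the simulation is launched.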
