Comments (19)
Hi,
Thanks for trying the gem-forge framework. Trace-based simulation is still buried in the code, but unfortunately it was de-prioritized after we shifted to execution-based simulation. You can try running the commands one by one and see whether the driver generates the trace. I can help if you run into specific problems (assuming you are using the vec-add example).
The first thing you want to do is add these to gem5/util/m5/m5op_empty.cpp:
__attribute__((noinline)) extern void
m5_stream_nuca_region(const char *regionName, const void *buffer,
                      uint64_t elementSize, uint64_t dim1, uint64_t dim2,
                      uint64_t dim3) {
  volatile int x = dummy;
  (void)x;
}
__attribute__((noinline)) extern void
m5_stream_nuca_align(const void *A, const void *B, int64_t elementOffset) {
  volatile int x = dummy;
  (void)x;
}
__attribute__((noinline)) extern void m5_stream_nuca_remap() {
  volatile int x = dummy;
  (void)x;
}
__attribute__((noinline)) extern uint64_t
m5_stream_nuca_get_cached_bytes(void *buffer) {
  return 0;
}
from gem-forge-framework.
Hi,
This is very likely due to some changes in gem5. I have investigated it a little. You need to add these lines to LLVMTraceCPU.py:
from m5.objects.X86CPU import X86CPU
from m5.objects.X86MMU import X86MMU
# We need to inherit X86CPU to fake the ArchMMU.
class LLVMTraceCPU(BaseCPU, X86CPU):
    type = 'LLVMTraceCPU'
    cxx_header = 'cpu/gem_forge/llvm_trace_cpu.hh'
    cxx_class = 'gem5::LLVMTraceCPU'
    # Same: fake the mmu.
    mmu = X86MMU()
After doing this, rebuild gem5 with make gem5. Then you will encounter a new error, which can be fixed by editing gem_forge/run.py:
if not args.llvm_standalone:
    system.workload = SEWorkload.init_compatible(
        system.cpu[0].workload[0].executable
    )
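The "inherit X86CPU to fake the ArchMMU" step relies on ordinary Python attribute lookup through a second base class. Here is a minimal plain-Python sketch of that idea. Every class below is a hypothetical stand-in and not a real gem5 SimObject; the only assumption taken from this thread is that the base CPU class looks up an ArchMMU attribute that an ISA-specific class is expected to supply.

```python
# Stand-in classes only; these are NOT the real gem5 SimObjects.
class FakeX86MMU:
    """Placeholder for the MMU type an ISA-specific CPU advertises."""


class FakeBaseCPU:
    def __init__(self):
        # Assumed pattern: the base class expects a subclass or mixin
        # to define ArchMMU, and fails with AttributeError otherwise.
        self.mmu = self.ArchMMU()


class FakeX86CPU:
    # ISA mixin that supplies the attribute the base class looks up.
    ArchMMU = FakeX86MMU


class BrokenTraceCPU(FakeBaseCPU):
    """Only inherits the base class, so ArchMMU is missing."""


class PatchedTraceCPU(FakeBaseCPU, FakeX86CPU):
    """Also inherits the ISA mixin, so ArchMMU resolves."""


def try_build(cls):
    """Return an instance, or the AttributeError raised while building."""
    try:
        return cls()
    except AttributeError as err:
        return err


broken = try_build(BrokenTraceCPU)
patched = try_build(PatchedTraceCPU)
print(type(broken).__name__)       # AttributeError
print(type(patched.mmu).__name__)  # FakeX86MMU
```

The sketch only illustrates the lookup; the real LLVMTraceCPU change is the one shown in the snippet above.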
However, I then hit a new error complaining that the X86 interrupts object has no ThreadContext:
warn: The `get_runtime_isa` function is deprecated. Please migrate away from using this function.
warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
warn: The `get_runtime_isa` function is deprecated. Please migrate away from using this function.
IntlvLow 6 IntlvBits 1 0 134217728 XORHighBit 20
IntlvLow 6 IntlvBits 1 0 134217728 XORHighBit 20
Global frequency set at 1000000000000 ticks per second
warn: No dot file generated. Please install pydot to generate the dot file and pdf.
build/X86/mem/dram_interface.cc:690: warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (64 Mbytes)
build/X86/mem/dram_interface.cc:690: warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (64 Mbytes)
build/X86/base/statistics.hh:285: warn: One of the stats is a legacy stat. Legacy stat is a stat that does not belong to any statistics::Group. Legacy stat is deprecated.
build/X86/base/statistics.hh:285: warn: One of the stats is a legacy stat. Legacy stat is a stat that does not belong to any statistics::Group. Legacy stat is deprecated.
build/X86/base/statistics.hh:285: warn: One of the stats is a legacy stat. Legacy stat is a stat that does not belong to any statistics::Group. Legacy stat is deprecated.
build/X86/base/statistics.hh:285: warn: One of the stats is a legacy stat. Legacy stat is a stat that does not belong to any statistics::Group. Legacy stat is deprecated.
gem5.opt: build/X86/arch/x86/interrupts.cc:385: gem5::AddrRangeList gem5::X86ISA::Interrupts::getAddrRanges() const: Assertion `tc' failed.
Program aborted at tick 0
This is because my LLVMTraceCPU does not create a ThreadContext, but gem5 now requires one (since I am borrowing the X86Interrupts). I can try to fix this next week, or you can try it yourself. Sorry about that.
As for your question: this is not a replication of Tony's original paper. Our workflow is to collect the trace, and you can then transform the trace to add or remove microarchitecture constraints (e.g. vectorization, dataflow, etc.). The transformed trace can then be simulated. Tony's work seems to mix these two phases (transformation and simulation), while we separate them, since I don't want to implement the heavy transformation work inside gem5. But yes, some microarchitecture details are exposed to the transformation. I hope this answers your questions.
from gem-forge-framework.
No worries. I just fixed the LLVMTraceCPU, and I can at least simulate the trace with the o8 configuration. Here is the git diff patch; feel free to try it:
diff --git a/configs/example/gem_forge/GemForgeCPUConfig.py b/configs/example/gem_forge/GemForgeCPUConfig.py
index 091087c3ad..16cab37df6 100644
--- a/configs/example/gem_forge/GemForgeCPUConfig.py
+++ b/configs/example/gem_forge/GemForgeCPUConfig.py
@@ -152,7 +152,7 @@ def createCPUNonStandalone(args, CPUClass, multiprocesses, numThreads):
# For each process, add a LLVMTraceCPU for simulation.
llvm_trace_cpu = \
GemForgeLLVMTraceCPUConfig.initializeLLVMTraceCPU(
- options, len(cpus))
+ args, len(cpus))
llvm_trace_cpu.cpu_id = len(cpus)
llvm_trace_cpu.traceFile = tdg_fn
@@ -194,7 +194,7 @@ def createCPUStandalone(args):
# For each process, add a LLVMTraceCPU for simulation.
llvm_trace_cpu = \
GemForgeLLVMTraceCPUConfig.initializeLLVMTraceCPU(
- options, len(cpus))
+ args, len(cpus))
# A dummy null driver to make the python script happy.
llvm_trace_cpu.cpu_id = len(cpus)
diff --git a/configs/example/gem_forge/run.py b/configs/example/gem_forge/run.py
index 0addd4eeb8..71c80bb02a 100644
--- a/configs/example/gem_forge/run.py
+++ b/configs/example/gem_forge/run.py
@@ -492,9 +492,10 @@ system = System(cpu=initial_cpus,
if future_cpus:
system.future_cpus = future_cpus
-system.workload = SEWorkload.init_compatible(
- system.cpu[0].workload[0].executable
-)
+if not args.llvm_standalone:
+ system.workload = SEWorkload.init_compatible(
+ system.cpu[0].workload[0].executable
+ )
# Set the work count options.
Simulation.setWorkCountOptions(system, args)
diff --git a/src/arch/x86/interrupts.cc b/src/arch/x86/interrupts.cc
index bfea600535..29eb75f068 100644
--- a/src/arch/x86/interrupts.cc
+++ b/src/arch/x86/interrupts.cc
@@ -382,8 +382,20 @@ X86ISA::Interrupts::completeIPI(PacketPtr pkt)
AddrRangeList
X86ISA::Interrupts::getAddrRanges() const
{
- assert(tc);
+ /**
+ * Originally we assert here for tc.
+ * However, for LLVMTraceCPU we have no TC, and no interrupts at all.
+ * Therefore, here we return empty range when we have no tc.
+ * Then we should receive no interrupts at all.
+ * TODO: Check if this is really LLVMTraceCPU.
+ */
+ // assert(tc);
AddrRangeList ranges;
+ if (!tc) {
+ warn("Miss ThreadContext. Return empty addr range for interrupt.\n"
+ " Make sure this is LLVMTraceCPU!\n");
+ return ranges;
+ }
ranges.push_back(RangeSize(pioAddr, PageBytes));
return ranges;
}
diff --git a/src/cpu/gem_forge/LLVMTraceCPU.py b/src/cpu/gem_forge/LLVMTraceCPU.py
index 548714fcc5..f39d4ff616 100644
--- a/src/cpu/gem_forge/LLVMTraceCPU.py
+++ b/src/cpu/gem_forge/LLVMTraceCPU.py
@@ -7,6 +7,8 @@ from m5.objects.FuncUnitConfig import *
from m5.objects.FUPool import FUPool
from m5.objects.BranchPredictor import *
from m5.objects.Process import EmulatedDriver
+from m5.objects.X86CPU import X86CPU
+from m5.objects.X86MMU import X86MMU
class LLVMAccel(FUDesc):
@@ -34,11 +36,15 @@ class DefaultFUPool(FUPool):
SIMD_Unit(), WritePort(), RdWrPort(), IprPort(), LLVMAccel()]
-class LLVMTraceCPU(BaseCPU):
+# We need to inherit X86CPU to fake the ArchMMU.
+class LLVMTraceCPU(BaseCPU, X86CPU):
type = 'LLVMTraceCPU'
cxx_header = 'cpu/gem_forge/llvm_trace_cpu.hh'
cxx_class = 'gem5::LLVMTraceCPU'
+ # Same: fake the mmu.
+ mmu = X86MMU()
+
traceFile = Param.String('', 'The input llvm trace file.')
# Hack information of total active cpus that executes a trace.
diff --git a/src/cpu/gem_forge/llvm_trace_cpu.cc b/src/cpu/gem_forge/llvm_trace_cpu.cc
index b7e82f8bed..1c87868b9b 100644
--- a/src/cpu/gem_forge/llvm_trace_cpu.cc
+++ b/src/cpu/gem_forge/llvm_trace_cpu.cc
@@ -19,6 +19,7 @@ LLVMTraceCPU::LLVMTraceCPU(const Params &params)
: BaseCPU(params), cpuParams(&params),
pageTable(params.name + ".page_table", 0, params.system,
params.isa[0]->getPageBytes()),
+ memPools(log2i(params.isa[0]->getPageBytes())),
instPort(params.name + ".inst_port", this),
dataPort(params.name + ".data_port", this),
traceFileName(params.traceFile), totalActiveCPUs(params.totalActiveCPUs),
@@ -126,6 +127,15 @@ void LLVMTraceCPU::init() {
// Create the delegator and handshake with the accelerator manager.
this->accelManager->handshake(this->cpuDelegator.get());
}
+
+ AddrRangeList memories = this->system->getPhysMem().getConfAddrRanges();
+ const auto &m5op_range = this->system->m5opRange();
+
+ if (m5op_range.valid()) {
+ memories -= m5op_range;
+ }
+
+ memPools.populate(memories);
}
void LLVMTraceCPU::tick() {
@@ -553,9 +563,7 @@ Addr LLVMTraceCPU::translateAndAllocatePhysMem(Addr vaddr) {
// Handle the page fault.
Addr pageBytes = this->pageTable.pageSize();
Addr startVaddr = this->pageTable.pageAlign(vaddr);
- assert(this->process);
- assert(this->process->seWorkload);
- auto startPaddr = this->process->seWorkload->allocPhysPages(1);
+ auto startPaddr = this->memPools.allocPhysPages(1);
this->pageTable.map(startVaddr, startPaddr, pageBytes);
DPRINTF(LLVMTraceCPU, "Map vaddr 0x%x to paddr 0x%x\n", startVaddr,
startPaddr);
diff --git a/src/cpu/gem_forge/llvm_trace_cpu.hh b/src/cpu/gem_forge/llvm_trace_cpu.hh
index 6f0a42fc65..a72a8ce132 100644
--- a/src/cpu/gem_forge/llvm_trace_cpu.hh
+++ b/src/cpu/gem_forge/llvm_trace_cpu.hh
@@ -24,6 +24,7 @@
#include "cpu/gem_forge/thread_context.hh"
#include "cpu/o3/fu_pool.hh"
#include "mem/page_table.hh"
+#include "sim/mem_pool.hh"
#include "params/LLVMTraceCPU.hh"
namespace gem5 {
@@ -131,6 +132,8 @@ private:
public:
const LLVMTraceCPUParams *cpuParams;
EmulationPageTable pageTable;
+ // MemPool to allocate paddr.
+ MemPools memPools;
CPUPort instPort;
CPUPort dataPort;
from gem-forge-framework.
Thanks sincerely for the rapid response. I'm running the commands in gfm.sh. After adding the function implementations to gem5/util/m5/m5op_empty.cpp, the python Driver.py $Benchmark --trace command fails with the following error if I remove the --fake-trace option:
Traceback (most recent call last):
File "/home/uvxiao/repos/gem-forge-framework/driver/JobScheduler.py", line 36, in __call__
out = result.get(self.__timeout)
File "/usr/lib/python3.8/multiprocessing/pool.py", line 771, in get
raise self._value
File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "Driver.py", line 68, in trace
benchmark.trace()
File "/home/uvxiao/repos/gem-forge-framework/driver/BenchmarkDrivers/GemForgeMicroSuite.py", line 770, in trace
self.run_trace()
File "/home/uvxiao/repos/gem-forge-framework/driver/BenchmarkDrivers/Benchmark.py", line 695, in run_trace
if self.get_args() is not None:
TypeError: get_args() missing 1 required positional argument: 'input_name'
It looks like the name of the input required for tracing is not passed to the get_args() method. So I modified Driver.py, Benchmark.py, and GemForgeMicroBenchmark.py to pass sim_inputs into the trace() method to provide the argument.
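The failure and the fix can be illustrated with a small self-contained sketch. SketchBenchmark is a hypothetical stand-in for the driver's benchmark class, and the mapping from the input name "large" to the argument list is an assumption made up for illustration; only the get_args(input_name) signature and the ./trace.exe command line come from this thread.

```python
class SketchBenchmark:
    """Hypothetical stand-in for the driver's benchmark class."""

    def __init__(self, args_by_input):
        # Assumed layout: input name -> list of command-line arguments.
        self.args_by_input = args_by_input

    def get_args(self, input_name):
        return self.args_by_input.get(input_name)

    def run_trace_old(self):
        # Reproduces the failing call: get_args() without input_name.
        cmd = ["./trace.exe"]
        if self.get_args() is not None:  # raises TypeError
            cmd += self.get_args()
        return cmd

    def run_trace_fixed(self, input_name):
        # Threads the input name through to get_args().
        cmd = ["./trace.exe"]
        args = self.get_args(input_name)
        if args is not None:
            cmd += args
        return cmd


bench = SketchBenchmark({"large": ["1", "4194304", "0", "1"]})

try:
    bench.run_trace_old()
    old_error = None
except TypeError as err:
    old_error = err

fixed_cmd = bench.run_trace_fixed("large")
print(type(old_error).__name__)  # TypeError
print(fixed_cmd)                 # ['./trace.exe', '1', '4194304', '0', '1']
```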
After this fix, I encountered the following error:
Error when executing ./trace.exe 1 4194304 0 1
Traceback (most recent call last):
File "/home/uvxiao/repos/gem-forge-framework/driver/JobScheduler.py", line 36, in __call__
out = result.get(self.__timeout)
File "/usr/lib/python3.8/multiprocessing/pool.py", line 771, in get
raise self._value
File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "Driver.py", line 68, in trace
benchmark.trace(trace_input)
File "/home/uvxiao/repos/gem-forge-framework/driver/BenchmarkDrivers/GemForgeMicroSuite.py", line 770, in trace
self.run_trace(input_name)
File "/home/uvxiao/repos/gem-forge-framework/driver/BenchmarkDrivers/Benchmark.py", line 707, in run_trace
Util.call_helper(run_cmd, env=env)
File "/home/uvxiao/repos/gem-forge-framework/driver/Util.py", line 19, in call_helper
raise e
File "/home/uvxiao/repos/gem-forge-framework/driver/Util.py", line 16, in call_helper
subprocess.check_call(cmd, stdout=stdout, stderr=stderr, env=env)
File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['./trace.exe', '1', '4194304', '0', '1']' died with <Signals.SIGILL: 4>.
If I manually run the command, the error message is:
[1] 1308153 illegal hardware instruction (core dumped)
I used gdb to trace the error, and the message is:
Program received signal SIGILL, Illegal instruction.
0x0000000000411fbd in foo () at ./../omp_vec_add_avx/omp_vec_add_avx.c:38
38 ValueAVX valA = ValueAVXLoad(a + i);
I wonder how I should enable tracing in the presence of stream instructions. Could you please provide some suggestions?
I appreciate your help very much.
from gem-forge-framework.
I've tried another benchmark (transform/benchmark/GemForgeMicroSuite/cond_array_sum/cond_array_sum.c) that includes no streaming instructions, using similar commands. However, the same error occurs:
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
initialize hardExitCount to 10000000000.
initializing tracer...
Initializing traceMode to 0.
Initializing traceFolder to /home/uvxiao/repos/gem-forge-framework/example-trace.
Initializing traceROI to 0.
Initializing SKIP_INST to 0...
Initializing MAX_INST to 1...
Initializing START_INST to 0...
Initializing END_INST to 0...
Initializing PRINT_INTERVAL to 10000000...
Initializing isDebug to 0...
[0]1: bb 0 alloca
[0]2: bb 1 call
[0]3: bb 2 br
Program received signal SIGILL, Illegal instruction.
0x000000000040acb4 in main () at /home/uvxiao/repos/gem-forge-framework/transform/benchmark/GemForgeMicroSuite/cond_array_sum/cond_array_sum.c:35
35 for (long long i = 0; i < N; i++) {
Is there any information that trace.exe could provide for debugging purposes? I'm kind of stuck on these errors.
from gem-forge-framework.
This is because your CPU does not support AVX-512 instructions. Can you modify the benchmark to get rid of the AVX-512 instructions and try again? I would also suggest running the tracing binary under gdb to locate which instruction is not supported. I hope this is helpful. Let me know if you run into new problems!
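One quick way to confirm this before rebuilding is to look for the avx512f feature flag on the host. The sketch below assumes a Linux x86 machine, where /proc/cpuinfo lists CPU features on lines starting with "flags"; the helper names are made up for illustration. On other platforms the file is absent and the check simply reports False.

```python
def cpu_flags(cpuinfo_text):
    """Collect feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def supports_avx512f(cpuinfo_text):
    """True if the avx512f feature flag is listed."""
    return "avx512f" in cpu_flags(cpuinfo_text)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Return the file contents, or an empty string if unreadable."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


print("avx512f supported:", supports_avx512f(read_cpuinfo()))
```

If this prints False, a binary compiled with -mavx512f can be expected to die with SIGILL on that host, as in the gdb output above.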
from gem-forge-framework.
Thanks! I've removed -mavx512f from the compilation flags, and trace.exe now produces 0.bbtrace, 0.profile, and 0.profile.txt. I wonder how I should read the 0.bbtrace file, which seems to have an unsupported text encoding. Additionally, how can I run the trace-based simulation given the trace files? I failed to find corresponding scripts or gem5 configuration files that work well. Thanks again for your help!
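The "unsupported text encoding" suggests the file is binary rather than text. Its exact format is not described in this thread, so the sketch below does not try to decode it; it only prints a generic hex view of raw bytes, which is usually enough to tell binary data from text. The function names are made up for illustration.

```python
def hexdump(data, width=16):
    """Format bytes as offset, hex columns, and printable ASCII."""
    pad = width * 3 - 1
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk).ljust(pad)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part}  {text_part}")
    return lines


def peek(path, count=64):
    """Hex-dump the first `count` bytes of a file."""
    with open(path, "rb") as handle:
        return hexdump(handle.read(count))


# Demo on an in-memory buffer; no file is needed to try it out.
print("\n".join(hexdump(b"gem-forge\x00\x01")))
```

To inspect a real trace, call peek("0.bbtrace") from the trace folder.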
from gem-forge-framework.
That's good news. For trace-based simulation, the workflow is actually similar to execution-based simulation, but you need different LLVM transform passes to process the trace instead of generating a binary.
For the baseline, can you try replacing the valid.ex transform with replay? See if that enables trace processing. Then you need to simulate the processed trace in gem5; can you try driver/Configurations/Simulations/o8.json? It seems to me this will simulate the trace file using the LLVMTraceCPU in gem5. Let me know if you run into more problems.
from gem-forge-framework.
I've created a specialized script, gfm-trace.sh, for trace-based simulation as follows:
#!/bin/bash
Benchmark='-b '
Benchmark+='gfm.vec_add,'
SimInput=small-cold
Threads=64
python3 Driver.py $Benchmark --build
python3 Driver.py $Benchmark $SimTrace --trace --sim-input-size $SimInput
BaseTrans=replay
Parallel=100
sim_replay=o8
python3 Driver.py $Benchmark $SimTrace -t $BaseTrans --sim-input-size $SimInput --sim-configs $sim_replay --input-threads $Threads -s -j $Parallel --gem5-debug DRAMsim3 --gem5-debug-start 15502083420 | tee gfm.log
As stated in my previous comments, python3 Driver.py $Benchmark $SimTrace --trace --sim-input-size $SimInput produces 0.bbtrace and 0.profile under the trace folder with the following output, which looks good:
initialize hardExitCount to 10000000000.
initializing tracer...
Initializing traceMode to 0.
Initializing traceFolder to /home/uvxiao/repos/gem-forge-framework/result/stream/gfm/vec_add/trace.
Initializing traceROI to 0.
Initializing SKIP_INST to 0...
Initializing MAX_INST to 1...
Initializing START_INST to 0...
Initializing END_INST to 0...
Initializing PRINT_INTERVAL to 10000000...
Initializing isDebug to 0...
[0]1: bb 0 alloca
[0]2: bb 1 icmp
[0]3: bb 2 br
[0]4: bb3 0 getelementptr
Number of Threads: 1.
Data size 1024kB Offset 0kB Random 0.
Profiled Inst #789306
Done!
However, the last command in the script fails to schedule any simulations. The prints that I've added show that the driver failed to find legal traces at Driver.py, which initializes traces at Benchmark.py from *.trace files.
I found that the code at Benchmark.simulate seems to invoke trace-based simulation. Maybe I could type the required commands manually instead of running Driver.py. Or how should I pass the trace files to the driver? What do you think?
Thanks!
from gem-forge-framework.
Looking at your script, it seems you missed the command that processes the trace. There should be four commands:
- Build the LLVM IR bytecode (bc file).
- Build the tracing binary and trace it.
- Process the trace (the one you missed).
- Simulate it in gem5.
In the original script this command processes the trace:
python Driver.py $Benchmark $SimTrace -t $BaseTrans -d
Can you check if this fixes the problem?
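The four steps can be laid out as a small command table. The sketch below only assembles and prints the command lines; it does not run Driver.py. The flags are the ones that appear in the scripts in this thread, the function name and step labels are made up for illustration, and the exact flag combination for a given setup is an assumption to be checked against the driver.

```python
import shlex


def workflow_commands(benchmark, transform="replay", input_size="large",
                      sim_config="o8", python="python"):
    """Return (label, argv) pairs for the four-step trace workflow."""
    base = [python, "Driver.py", "-b", benchmark]
    return [
        ("build bitcode", base + ["--build"]),
        ("trace binary", base + ["--input-size", input_size, "--trace"]),
        ("process trace", base + ["-t", transform, "-d"]),
        ("simulate", base + ["-t", transform,
                             "--sim-input-size", input_size,
                             "--sim-configs", sim_config, "-s"]),
    ]


# The scripts in this thread pass the benchmark list with a trailing comma.
for label, argv in workflow_commands("gfm.vec_add,"):
    print(f"{label:14s} {shlex.join(argv)}")
```

Printing the commands first makes it easy to spot a missing step, such as the -d processing command, before anything is executed.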
from gem-forge-framework.
Thanks for the rapid reply!
I've added the -d command for trace processing. However, the "trace missing" problem still occurs. I looked into the code: Driver.schedule_transform retrieves tdgs by calling Benchmark.get_tdgs, which gets the tdgs from the traces. However, since the "trace the binary" command didn't produce *.trace files, the tdgs list is empty and no processing or simulation is executed. Should I modify the "trace the binary" step so that it produces the required files?
from gem-forge-framework.
Can you check the replay folder in the result directory? I tried on my side and successfully generated the tdg files to be simulated. This is the structure of my result folder:
➜ vec_add git:(main) ✗ ll
total 120K
-rw-r--r-- 1 gf gf 28K Dec 8 20:13 gfm.vec_add.replay.bc
-rw-r--r-- 1 gf gf 28K Dec 8 19:59 raw.bc
-rw-r--r-- 1 gf gf 64K Dec 8 19:59 raw.ll
drwxr-xr-x 3 gf gf 100 Dec 8 20:13 replay
drwxr-xr-x 2 gf gf 142 Dec 8 20:06 trace
➜ vec_add git:(main) ✗ ll trace
total 123M
-rw-r--r-- 1 gf gf 123M Dec 8 20:06 0.0.trace
-rw-r--r-- 1 gf gf 525K Dec 8 20:06 0.bbtrace
-rw-r--r-- 1 gf gf 208 Dec 8 20:06 0.profile
-rw-r--r-- 1 gf gf 19 Dec 8 20:06 0.profile.txt
-rw-r--r-- 1 gf gf 36K Dec 8 20:05 inst.uid
-rw-r--r-- 1 gf gf 34K Dec 8 20:05 inst.uid.txt
➜ vec_add git:(main) ✗ ll replay
total 393M
-rw-r--r-- 1 gf gf 354M Dec 8 20:13 0.tdg
-rw-r--r-- 1 gf gf 40M Dec 8 20:13 0.tdg.cache
drwxr-xr-x 2 gf gf 81 Dec 8 20:13 0.tdg.extra
-rw-r--r-- 1 gf gf 364 Dec 8 20:13 0.tdg.stats.txt
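A listing like this can be turned into a quick check for which stage failed to produce its output. The sketch below is a hypothetical helper, not part of the driver; the glob patterns are taken from the file names in the listing above, and the temporary directory only simulates a result folder for the demo.

```python
import tempfile
from pathlib import Path

# Patterns based on the listing above: trace/0.0.trace, trace/0.bbtrace,
# and replay/0.tdg.
EXPECTED = ("trace/*.trace", "trace/*.bbtrace", "replay/*.tdg")


def missing_artifacts(result_dir, patterns=EXPECTED):
    """Return the patterns that match no file under result_dir."""
    root = Path(result_dir)
    return [p for p in patterns if not any(root.glob(p))]


# Demo on a throwaway directory that imitates a result folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "trace").mkdir()
    (root / "replay").mkdir()
    (root / "trace" / "0.bbtrace").touch()
    before = missing_artifacts(root)
    (root / "trace" / "0.0.trace").touch()
    (root / "replay" / "0.tdg").touch()
    after = missing_artifacts(root)

print(before)  # ['trace/*.trace', 'replay/*.tdg']
print(after)   # []
```

Note that "*.trace" matches 0.0.trace but not 0.bbtrace, so a folder holding only the bbtrace and profile files is still reported as missing the trace.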
Once you have the tdg files, you should be able to simulate (I hope). Just in case, here is my modified script. Notice that I pass input-size instead of sim-input-size, but that should not matter.
#!/bin/bash
# rm -f /tmp/job_scheduler.*
# Specify the benchmark. The source file is in
# transform/benchmark/GemForgeMicroSuite/stream/vec_add/omp_vec_add_avx
Benchmark='-b '
Benchmark+='gfm.vec_add,'
# Specify the input size. Check driver/BenchmarkDrivers/GemForgeMicroSuite.py
# for the details of how the benchmark is built and simulated.
SimInput=large
# Specify the number of threads. The workload is parallelized with OpenMP.
Threads=64
# The following two commands build the LLVM bytecode of the workload in
# /gem-forge-stack/result/stream/gfm/vec_add
# You can check the IR in raw.ll (text format).
# python Driver.py $Benchmark --build
# This command traces the binary and generates the trace in
# /gem-forge-stack/result/stream/gfm/vec_add/trace
# python Driver.py $Benchmark --input-size $SimInput --trace
# This command processes the LLVM IR trace for simulation.
# Baseline means there is no stream specialization.
# The processed trace is located in:
# /gem-forge-stack/result/stream/gfm/vec_add/replay/0.tdg
BaseTrans=replay
python Driver.py $Benchmark -t $BaseTrans -d
from gem-forge-framework.
It seems that our scripts are almost the same except for the input-size parameter.
After running the script, where the last -d command has no effect, here's the structure of my result folder:
[10:36:28] [~/repos/gem-forge-framework/result/stream/gfm/vec_add] [main ✖] ❱❱❱ ll
total 100K
-rw-rw-r-- 1 uvxiao uvxiao 27K 12月 9 10:27 raw.bc
-rw-rw-r-- 1 uvxiao uvxiao 61K 12月 9 10:27 raw.ll
drwxrwxr-x 2 uvxiao uvxiao 4.0K 12月 9 10:27 replay
drwxrwxr-x 2 uvxiao uvxiao 4.0K 12月 9 10:27 trace
[10:36:29] [cost 0.100s] ll
[10:41:40] [~/repos/gem-forge-framework/result/stream/gfm/vec_add] [main ✖] ❱❱❱ ll trace
total 252K
-rw-rw-r-- 1 uvxiao uvxiao 168K 12月 9 10:27 0.bbtrace
-rw-rw-r-- 1 uvxiao uvxiao 448 12月 9 10:27 0.profile
-rw-rw-r-- 1 uvxiao uvxiao 39 12月 9 10:27 0.profile.txt
-rw-rw-r-- 1 uvxiao uvxiao 37K 12月 9 10:27 inst.uid
-rw-rw-r-- 1 uvxiao uvxiao 34K 12月 9 10:27 inst.uid.txt
[10:41:45] [cost 0.143s] ll trace
[10:41:46] [~/repos/gem-forge-framework/result/stream/gfm/vec_add] [main ✖] ❱❱❱ ll replay
total 0
[10:41:50] [cost 0.090s] ll replay
I found that the missing files are gfm.vec_add.replay.bc and trace/0.0.trace. That's kind of weird, since the --trace command seems to work successfully. It gives the following output:
initialize hardExitCount to 10000000000.
initializing tracer...
Initializing traceMode to 0.
Initializing traceFolder to /home/uvxiao/repos/gem-forge-framework/result/stream/gfm/vec_add/trace.
Initializing traceROI to 0.
Initializing SKIP_INST to 0...
Initializing MAX_INST to 1...
Initializing START_INST to 0...
Initializing END_INST to 0...
Initializing PRINT_INTERVAL to 10000000...
Initializing isDebug to 0...
[0]1: bb 0 alloca
[0]2: bb 1 icmp
[0]3: bb 2 br
[0]4: bb3 0 getelementptr
Number of Threads: 1.
Data size 16384kB Offset 0kB Random 0.
[0]10000000: bb22 4 getelementptr
[0]10000001: bb22 5 bitcast
[0]10000002: bb22 6 load
[0]10000003: bb22 7 getelementptr
[0]10000004: bb22 8 bitcast
Profiled Inst #12627546
Done!
Is that a version problem? I'm using gem-forge-framework at commit 8ebd48 (main), with the driver at 95e004 (main) and the transform at b05a77 (main).
from gem-forge-framework.
Here is my tracing output. It seems yours is missing the clean-up log.
The first one is func enter.
[0]10000000:main (0) -> foo (1) -> bb6 9 icmp
[0]10000001:main (0) -> foo (1) -> bb6 10 br
[0]10000002:main (0) -> foo (1) -> bb6 0 phi
[0]10000003:main (0) -> foo (1) -> bb6 1 getelementptr
[0]10000004:main (0) -> foo (1) -> bb6 2 load
[0]20000000:main (0) -> foo (1) -> bb6 8 add
[0]20000001:main (0) -> foo (1) -> bb6 9 icmp
[0]20000002:main (0) -> foo (1) -> bb6 10 br
[0]20000003:main (0) -> foo (1) -> bb6 0 phi
[0]20000004:main (0) -> foo (1) -> bb6 1 getelementptr
[0]30000000:main (0) -> foo (1) -> bb6 7 store
[0]30000001:main (0) -> foo (1) -> bb6 8 add
[0]30000002:main (0) -> foo (1) -> bb6 9 icmp
[0]30000003:main (0) -> foo (1) -> bb6 10 br
[0]30000004:main (0) -> foo (1) -> bb6 0 phi
[0]40000000:main (0) -> foo (1) -> bb6 6 getelementptr
[0]40000001:main (0) -> foo (1) -> bb6 7 store
[0]40000002:main (0) -> foo (1) -> bb6 8 add
[0]40000003:main (0) -> foo (1) -> bb6 9 icmp
[0]40000004:main (0) -> foo (1) -> bb6 10 br
Clean up.
Clean up tid 0.
Thread 0 Traced #46137349
Clean up tid 1.
Clean up tid 2.
Clean up tid 3.
Clean up tid 4.
Clean up tid 5.
Clean up tid 6.
Clean up tid 7.
Clean up tid 8.
Clean up tid 9.
Profiled Inst #46137348
I am not sure this is a version problem, since the tracing part hasn't changed for more than two years. My guess is that somehow your binary doesn't write the trace to the file. Can you take a look at TracerProtobuf.cpp? There is a cleanup function which should be called to dump the trace to the file. Or you can increase the input size in GemForgeMicroSuite.py to make sure the binary runs longer and has more instructions in the trace (my guess is that when the number of traced instructions is low, a bug in the tracer keeps it from calling the cleanup() function). I hope this is helpful.
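Whether the clean-up ran can be read off the tracer's console output. The sketch below is a hypothetical log checker, not part of the framework; it relies only on the message formats visible in the two outputs quoted in this thread ("Clean up.", "Thread N Traced #...", "Profiled Inst #...").

```python
import re


def summarize_tracer_log(log_text):
    """Extract clean-up status and instruction counts from tracer output."""
    traced = {
        int(tid): int(count)
        for tid, count in re.findall(r"Thread (\d+) Traced #(\d+)", log_text)
    }
    profiled = re.search(r"Profiled Inst #(\d+)", log_text)
    return {
        "cleanup_ran": any(line.strip() == "Clean up."
                           for line in log_text.splitlines()),
        "traced": traced,
        "profiled": int(profiled.group(1)) if profiled else None,
    }


# Tail of the output with the clean-up log, copied from this thread.
good_log = """Clean up.
Clean up tid 0.
Thread 0 Traced #46137349
Clean up tid 1.
Profiled Inst #46137348
"""

# Tail of the output without it, also copied from this thread.
bad_log = """[0]10000004: bb22 8 bitcast
Profiled Inst #12627546
Done!
"""

print(summarize_tracer_log(good_log))
print(summarize_tracer_log(bad_log))
```

A log with a profiled count but no "Clean up." line and no "Traced" count matches the symptom described here: the binary ran, but the trace was never dumped to a file.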
from gem-forge-framework.
Thank you so much for your help! It turned out to be my own fault: the presumptuous changes I made to Benchmark.py caused the failure to generate the *.trace tracing results. Specifically, the buggy changes are as follows:
# # Remember to set the environment for trace.
# os.putenv('LLVM_TDG_TRACE_FOLDER', self.get_trace_folder_abs())
# os.putenv('LLVM_TDG_INST_UID_FILE', self.get_trace_inst_uid())
# # We need libunwind.so for profiling.
# os.putenv('LD_LIBRARY_PATH', os.path.join(C.LLVM_PATH, 'lib'))
env = os.environ.copy()
env['LLVM_TDG_TRACE_FOLDER'] = self.get_trace_folder_abs()
env['LLVM_TDG_INST_UID_FILE'] = self.get_trace_inst_uid()
env['LD_LIBRARY_PATH'] = '{}:{}'.format(os.path.join(C.LLVM_PATH, 'lib'), env['LD_LIBRARY_PATH'])
run_cmd = [
'./' + self.get_trace_bin(),
]
if self.get_args(input_name) is not None:
run_cmd += self.get_args(input_name)
# print('# Run traced binary...')
# Util.call_helper(run_cmd)
subprocess.run(run_cmd, env=env, check=True)
That was a careless mistake on my part. After reverting the code, I obtained the replay folder as follows:
[14:43:14] [~/repos/gem-forge-framework/result/stream/gfm/vec_add] [main ✖] ❱❱❱ ll replay
total 6.9M
-rw-rw-r-- 1 uvxiao uvxiao 6.1M 12月 10 14:40 0.tdg
-rw-rw-r-- 1 uvxiao uvxiao 748K 12月 10 14:40 0.tdg.cache
drwxrwxr-x 2 uvxiao uvxiao 4.0K 12月 10 14:40 0.tdg.extra
-rw-rw-r-- 1 uvxiao uvxiao 362 12月 10 14:39 0.tdg.stats.txt
[14:43:17] [cost 0.116s] ll replay
However, I'm actually kind of confused about the trace processing. According to "Analyzing Behavior Specialized Acceleration" at ASPLOS '16, the TDG representation composing
After generating the *.tdg file, I tried to trigger the simulation. I met the following error:
warn: The `get_runtime_isa` function is deprecated. Please migrate away from using this function.
NameError: name 'options' is not defined
At:
/home/uvxiao/repos/gem-forge-framework/gem5/configs/example/gem_forge/GemForgeCPUConfig.py(196): createCPUStandalone
/home/uvxiao/repos/gem-forge-framework/gem5/configs/example/gem_forge/GemForgeCPUConfig.py(222): initializeCPUs
/home/uvxiao/repos/gem-forge-framework/gem5/configs/example/gem_forge/run.py(466): <module>
build/X86/python/m5/main.py(599): main
The code segment is as follows:
# For each process, add a LLVMTraceCPU for simulation.
llvm_trace_cpu = \
GemForgeLLVMTraceCPUConfig.initializeLLVMTraceCPU(
options, len(cpus))
It seems that options is not defined. I guess the expected argument is args. After changing options to args, another error happens:
AttributeError: object 'LLVMTraceCPU' has no attribute 'ArchMMU'
(C++ object is not yet constructed, so wrapped C++ methods are unavailable.)
At:
build/X86/python/m5/SimObject.py(851): __getattr__
build/X86/cpu/BaseCPU.py(344): __init__
/home/uvxiao/repos/gem-forge-framework/gem5/configs/example/gem_forge/GemForgeLLVMTraceCPUConfig.py(5): initializeLLVMTraceCPU
/home/uvxiao/repos/gem-forge-framework/gem5/configs/example/gem_forge/GemForgeCPUConfig.py(195): createCPUStandalone
/home/uvxiao/repos/gem-forge-framework/gem5/configs/example/gem_forge/GemForgeCPUConfig.py(222): initializeCPUs
/home/uvxiao/repos/gem-forge-framework/gem5/configs/example/gem_forge/run.py(466): <module>
build/X86/python/m5/main.py(599): main
I found a similar issue (GEM5-1335), so I guess that it might be a gem5 bug. What do you think?
Thanks a lot.
from gem-forge-framework.
Thanks sincerely for your selfless help! I now understand more about the methodology, and I'd like to look at the transform for more details.
Thanks again for your debugging and maintaining efforts!
from gem-forge-framework.
BTW I forgot to mention that my master thesis is about this tool, you can take a look if you like.
from gem-forge-framework.
Sincerely appreciate your exciting work and selfless help!
Sorry for the late reply. I've just recovered from influenza 😭 I followed the git diff to get the code updated, and it seems to work well now! And your thesis helped me considerably to understand this tool.
Thanks again for your help!
from gem-forge-framework.
Take care. Good luck!
from gem-forge-framework.