Hello, I'm exploring the trace-based simulation feature in gem-forge

Request for guidance on trace-based simulation in gem-forge, about polyarch/gem-forge-framework

Comments (19)

seanzw commented on July 26, 2024

Hi,

Thanks for trying the gem-forge framework. Trace-based simulation is still buried in the code, but unfortunately it was deprioritized after we shifted to execution-based simulation. You can try running the commands one by one and see if the driver generates the trace. I can help if you run into specific problems (assuming you are using the vec-add example).

The first thing you want to do is add these to gem5/util/m5/m5op_empty.cpp.

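// Empty stubs for the stream-NUCA m5ops so that binaries calling them still
// link. `dummy` is assumed to be the file-scope variable already used by the
// other stubs in m5op_empty.cpp.
__attribute__((noinline)) extern void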
m5_stream_nuca_region(const char *regionName, const void *buffer,
                      uint64_t elementSize, uint64_t dim1, uint64_t dim2,
                      uint64_t dim3) {
  volatile int x = dummy;
  (void)x;
}
__attribute__((noinline)) extern void
m5_stream_nuca_align(const void *A, const void *B, int64_t elementOffset) {
  volatile int x = dummy;
  (void)x;
}
__attribute__((noinline)) extern void m5_stream_nuca_remap() {
  volatile int x = dummy;
  (void)x;
}
__attribute__((noinline)) extern uint64_t
m5_stream_nuca_get_cached_bytes(void *buffer) {
  return 0;
}

from gem-forge-framework.

seanzw commented on July 26, 2024

Hi,

This is very likely due to some changes in gem5. I have investigated it a little bit. You need to add these lines to LLVMTraceCPU.py:

from m5.objects.X86CPU import X86CPU
from m5.objects.X86MMU import X86MMU

# We need to inherit X86CPU to fake the ArchMMU.
class LLVMTraceCPU(BaseCPU, X86CPU):
    type = 'LLVMTraceCPU'
    cxx_header = 'cpu/gem_forge/llvm_trace_cpu.hh'
    cxx_class = 'gem5::LLVMTraceCPU'

    # Same: fake the mmu.
    mmu = X86MMU()

After doing this, rebuild gem5 with make gem5. Then you will encounter a new error which can be fixed by editing gem_forge/run.py:

if not args.llvm_standalone:
    system.workload = SEWorkload.init_compatible(
        system.cpu[0].workload[0].executable
    )

However, I then encounter a new error complaining that the X86 Interrupts object has no ThreadContext:

warn: The `get_runtime_isa` function is deprecated. Please migrate away from using this function.
warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
warn: The `get_runtime_isa` function is deprecated. Please migrate away from using this function.
IntlvLow 6 IntlvBits 1 0 134217728 XORHighBit 20
IntlvLow 6 IntlvBits 1 0 134217728 XORHighBit 20
Global frequency set at 1000000000000 ticks per second
warn: No dot file generated. Please install pydot to generate the dot file and pdf.
build/X86/mem/dram_interface.cc:690: warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (64 Mbytes)
build/X86/mem/dram_interface.cc:690: warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (64 Mbytes)
build/X86/base/statistics.hh:285: warn: One of the stats is a legacy stat. Legacy stat is a stat that does not belong to any statistics::Group. Legacy stat is deprecated.
build/X86/base/statistics.hh:285: warn: One of the stats is a legacy stat. Legacy stat is a stat that does not belong to any statistics::Group. Legacy stat is deprecated.
build/X86/base/statistics.hh:285: warn: One of the stats is a legacy stat. Legacy stat is a stat that does not belong to any statistics::Group. Legacy stat is deprecated.
build/X86/base/statistics.hh:285: warn: One of the stats is a legacy stat. Legacy stat is a stat that does not belong to any statistics::Group. Legacy stat is deprecated.
gem5.opt: build/X86/arch/x86/interrupts.cc:385: gem5::AddrRangeList gem5::X86ISA::Interrupts::getAddrRanges() const: Assertion `tc' failed.
Program aborted at tick 0

This is because my LLVMTraceCPU does not create a ThreadContext, but gem5 now requires one (since I am borrowing the X86 Interrupts). I can try to fix this next week, or you can try it yourself. Sorry about that.

As for your question: this is not a replication of Tony's original paper. Our workflow is to collect the trace first; you can then transform the trace to add or remove microarchitectural constraints (e.g. vectorization, dataflow, etc.), and the transformed trace is what gets simulated. Tony's work seems to mix these two phases (transformation and simulation), while we separate them since I don't want to implement the heavy transformation work inside gem5. That said, some microarchitectural details are exposed to the transformation. I hope this answers your question.

seanzw commented on July 26, 2024

No worries. I just fixed the LLVMTraceCPU, and I can now at least simulate the trace with the o8 configuration. Here is the git diff:
Feel free to try it.

diff --git a/configs/example/gem_forge/GemForgeCPUConfig.py b/configs/example/gem_forge/GemForgeCPUConfig.py
index 091087c3ad..16cab37df6 100644
--- a/configs/example/gem_forge/GemForgeCPUConfig.py
+++ b/configs/example/gem_forge/GemForgeCPUConfig.py
@@ -152,7 +152,7 @@ def createCPUNonStandalone(args, CPUClass, multiprocesses, numThreads):
             # For each process, add a LLVMTraceCPU for simulation.
             llvm_trace_cpu = \
                 GemForgeLLVMTraceCPUConfig.initializeLLVMTraceCPU(
-                    options, len(cpus))
+                    args, len(cpus))
 
             llvm_trace_cpu.cpu_id = len(cpus)
             llvm_trace_cpu.traceFile = tdg_fn
@@ -194,7 +194,7 @@ def createCPUStandalone(args):
         # For each process, add a LLVMTraceCPU for simulation.
         llvm_trace_cpu = \
             GemForgeLLVMTraceCPUConfig.initializeLLVMTraceCPU(
-                options, len(cpus))
+                args, len(cpus))
 
         # A dummy null driver to make the python script happy.
         llvm_trace_cpu.cpu_id = len(cpus)
diff --git a/configs/example/gem_forge/run.py b/configs/example/gem_forge/run.py
index 0addd4eeb8..71c80bb02a 100644
--- a/configs/example/gem_forge/run.py
+++ b/configs/example/gem_forge/run.py
@@ -492,9 +492,10 @@ system = System(cpu=initial_cpus,
 if future_cpus:
     system.future_cpus = future_cpus
 
-system.workload = SEWorkload.init_compatible(
-    system.cpu[0].workload[0].executable
-)
+if not args.llvm_standalone:
+    system.workload = SEWorkload.init_compatible(
+        system.cpu[0].workload[0].executable
+    )
 
 # Set the work count options.
 Simulation.setWorkCountOptions(system, args)
diff --git a/src/arch/x86/interrupts.cc b/src/arch/x86/interrupts.cc
index bfea600535..29eb75f068 100644
--- a/src/arch/x86/interrupts.cc
+++ b/src/arch/x86/interrupts.cc
@@ -382,8 +382,20 @@ X86ISA::Interrupts::completeIPI(PacketPtr pkt)
 AddrRangeList
 X86ISA::Interrupts::getAddrRanges() const
 {
-    assert(tc);
+    /**
+     * Originally we assert here for tc.
+     * However, for LLVMTraceCPU we have no TC, and no interrupts at all.
+     * Therefore, here we return empty range when we have no tc.
+     * Then we should receive no interrupts at all.
+     * TODO: Check if this is really LLVMTraceCPU.
+     */
+    // assert(tc);
     AddrRangeList ranges;
+    if (!tc) {
+      warn("Miss ThreadContext. Return empty addr range for interrupt.\n"
+        "    Make sure this is LLVMTraceCPU!\n");
+      return ranges;
+    }
     ranges.push_back(RangeSize(pioAddr, PageBytes));
     return ranges;
 }
diff --git a/src/cpu/gem_forge/LLVMTraceCPU.py b/src/cpu/gem_forge/LLVMTraceCPU.py
index 548714fcc5..f39d4ff616 100644
--- a/src/cpu/gem_forge/LLVMTraceCPU.py
+++ b/src/cpu/gem_forge/LLVMTraceCPU.py
@@ -7,6 +7,8 @@ from m5.objects.FuncUnitConfig import *
 from m5.objects.FUPool import FUPool
 from m5.objects.BranchPredictor import *
 from m5.objects.Process import EmulatedDriver
+from m5.objects.X86CPU import X86CPU
+from m5.objects.X86MMU import X86MMU
 
 
 class LLVMAccel(FUDesc):
@@ -34,11 +36,15 @@ class DefaultFUPool(FUPool):
               SIMD_Unit(), WritePort(), RdWrPort(), IprPort(), LLVMAccel()]
 
 
-class LLVMTraceCPU(BaseCPU):
+# We need to inherit X86CPU to fake the ArchMMU.
+class LLVMTraceCPU(BaseCPU, X86CPU):
     type = 'LLVMTraceCPU'
     cxx_header = 'cpu/gem_forge/llvm_trace_cpu.hh'
     cxx_class = 'gem5::LLVMTraceCPU'
 
+    # Same: fake the mmu.
+    mmu = X86MMU()
+
     traceFile = Param.String('', 'The input llvm trace file.')
 
     # Hack information of total active cpus that executes a trace.
diff --git a/src/cpu/gem_forge/llvm_trace_cpu.cc b/src/cpu/gem_forge/llvm_trace_cpu.cc
index b7e82f8bed..1c87868b9b 100644
--- a/src/cpu/gem_forge/llvm_trace_cpu.cc
+++ b/src/cpu/gem_forge/llvm_trace_cpu.cc
@@ -19,6 +19,7 @@ LLVMTraceCPU::LLVMTraceCPU(const Params &params)
     : BaseCPU(params), cpuParams(&params),
       pageTable(params.name + ".page_table", 0, params.system,
                 params.isa[0]->getPageBytes()),
+      memPools(log2i(params.isa[0]->getPageBytes())),
       instPort(params.name + ".inst_port", this),
       dataPort(params.name + ".data_port", this),
       traceFileName(params.traceFile), totalActiveCPUs(params.totalActiveCPUs),
@@ -126,6 +127,15 @@ void LLVMTraceCPU::init() {
     // Create the delegator and handshake with the accelerator manager.
     this->accelManager->handshake(this->cpuDelegator.get());
   }
+
+  AddrRangeList memories = this->system->getPhysMem().getConfAddrRanges();
+  const auto &m5op_range = this->system->m5opRange();
+
+  if (m5op_range.valid()) {
+      memories -= m5op_range;
+  }
+
+  memPools.populate(memories);
 }
 
 void LLVMTraceCPU::tick() {
@@ -553,9 +563,7 @@ Addr LLVMTraceCPU::translateAndAllocatePhysMem(Addr vaddr) {
     // Handle the page fault.
     Addr pageBytes = this->pageTable.pageSize();
     Addr startVaddr = this->pageTable.pageAlign(vaddr);
-    assert(this->process);
-    assert(this->process->seWorkload);
-    auto startPaddr = this->process->seWorkload->allocPhysPages(1);
+    auto startPaddr = this->memPools.allocPhysPages(1);
     this->pageTable.map(startVaddr, startPaddr, pageBytes);
     DPRINTF(LLVMTraceCPU, "Map vaddr 0x%x to paddr 0x%x\n", startVaddr,
             startPaddr);
diff --git a/src/cpu/gem_forge/llvm_trace_cpu.hh b/src/cpu/gem_forge/llvm_trace_cpu.hh
index 6f0a42fc65..a72a8ce132 100644
--- a/src/cpu/gem_forge/llvm_trace_cpu.hh
+++ b/src/cpu/gem_forge/llvm_trace_cpu.hh
@@ -24,6 +24,7 @@
 #include "cpu/gem_forge/thread_context.hh"
 #include "cpu/o3/fu_pool.hh"
 #include "mem/page_table.hh"
+#include "sim/mem_pool.hh"
 #include "params/LLVMTraceCPU.hh"
 
 namespace gem5 {
@@ -131,6 +132,8 @@ private:
 public:
   const LLVMTraceCPUParams *cpuParams;
   EmulationPageTable pageTable;
+  // MemPool to allocate paddr.
+  MemPools memPools;
   CPUPort instPort;
   CPUPort dataPort;

uv-xiao commented on July 26, 2024

Thanks sincerely for the rapid response. I'm running the commands in gfm.sh. After adding the function implementations to gem5/util/m5/m5op_empty.cpp, the python Driver.py $Benchmark --trace command fails with the following error if I remove the --fake-trace option:

Traceback (most recent call last):
  File "/home/uvxiao/repos/gem-forge-framework/driver/JobScheduler.py", line 36, in __call__
    out = result.get(self.__timeout)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "Driver.py", line 68, in trace
    benchmark.trace()
  File "/home/uvxiao/repos/gem-forge-framework/driver/BenchmarkDrivers/GemForgeMicroSuite.py", line 770, in trace
    self.run_trace()
  File "/home/uvxiao/repos/gem-forge-framework/driver/BenchmarkDrivers/Benchmark.py", line 695, in run_trace
    if self.get_args() is not None:
TypeError: get_args() missing 1 required positional argument: 'input_name'

It looks like the name of the input file required for tracing is not passed to the get_args() method. So I modified Driver.py, Benchmark.py, and GemForgeMicroSuite.py to pass sim_inputs into the trace() method and provide that argument.
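
The change described above amounts to threading the input name from trace() through run_trace() down to get_args(). A minimal sketch under simplified assumptions: ToyBenchmark, its input names, and its argument lists are hypothetical stand-ins, not the real driver classes in Benchmark.py or GemForgeMicroSuite.py.

```python
from typing import Dict, List, Optional


class ToyBenchmark:
    """Hypothetical stand-in for a driver benchmark class (not the real code)."""

    # Hypothetical input names and arguments, for illustration only.
    INPUT_ARGS: Dict[str, List[str]] = {
        "small": ["1", "1024"],
        "large": ["1", "16384"],
    }

    def get_trace_bin(self) -> str:
        return "trace.exe"

    def get_args(self, input_name: str) -> Optional[List[str]]:
        # The arguments depend on the input, hence the required parameter.
        return self.INPUT_ARGS.get(input_name)

    def run_trace(self, input_name: str) -> List[str]:
        run_cmd = ["./" + self.get_trace_bin()]
        args = self.get_args(input_name)
        if args is not None:
            run_cmd += args
        # The real driver would execute run_cmd here; the sketch returns it.
        return run_cmd

    def trace(self, input_name: str) -> List[str]:
        # Forward the input name instead of calling run_trace() without it.
        return self.run_trace(input_name)


if __name__ == "__main__":
    print(ToyBenchmark().trace("small"))
```

The point is only the signature chain: every caller on the path to get_args() must accept and forward the input name, otherwise the TypeError above reappears.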

After the fixing, I encounter the following error:

Error when executing ./trace.exe 1 4194304 0 1
Traceback (most recent call last):
  File "/home/uvxiao/repos/gem-forge-framework/driver/JobScheduler.py", line 36, in __call__
    out = result.get(self.__timeout)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "Driver.py", line 68, in trace
    benchmark.trace(trace_input)
  File "/home/uvxiao/repos/gem-forge-framework/driver/BenchmarkDrivers/GemForgeMicroSuite.py", line 770, in trace
    self.run_trace(input_name)
  File "/home/uvxiao/repos/gem-forge-framework/driver/BenchmarkDrivers/Benchmark.py", line 707, in run_trace
    Util.call_helper(run_cmd, env=env)
  File "/home/uvxiao/repos/gem-forge-framework/driver/Util.py", line 19, in call_helper
    raise e
  File "/home/uvxiao/repos/gem-forge-framework/driver/Util.py", line 16, in call_helper
    subprocess.check_call(cmd, stdout=stdout, stderr=stderr, env=env)
  File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['./trace.exe', '1', '4194304', '0', '1']' died with <Signals.SIGILL: 4>.

If I manually run the command, the error message is:

[1]    1308153 illegal hardware instruction (core dumped)

I use gdb to trace the error, and the message is:

Program received signal SIGILL, Illegal instruction.
0x0000000000411fbd in foo () at ./../omp_vec_add_avx/omp_vec_add_avx.c:38
38          ValueAVX valA = ValueAVXLoad(a + i);

I wonder how should I enable the tracing in the presence of stream instructions. Could you please provide some suggestions?
I appreciate your help very much.

uv-xiao commented on July 26, 2024

I've tried another benchmark (transform/benchmark/GemForgeMicroSuite/cond_array_sum/cond_array_sum.c), which contains no stream instructions, with similar commands. However, the same error occurs:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
initialize hardExitCount to 10000000000.
initializing tracer...
Initializing traceMode to 0.
Initializing traceFolder to /home/uvxiao/repos/gem-forge-framework/example-trace.
Initializing traceROI to 0.
Initializing SKIP_INST to 0...
Initializing MAX_INST to 1...
Initializing START_INST to 0...
Initializing END_INST to 0...
Initializing PRINT_INTERVAL to 10000000...
Initializing isDebug to 0...
[0]1: bb 0 alloca
[0]2: bb 1 call
[0]3: bb 2 br

Program received signal SIGILL, Illegal instruction.
0x000000000040acb4 in main () at /home/uvxiao/repos/gem-forge-framework/transform/benchmark/GemForgeMicroSuite/cond_array_sum/cond_array_sum.c:35
35        for (long long i = 0; i < N; i++) {

Can trace.exe provide any extra information for debugging? I'm kind of stuck on this error.

seanzw commented on July 26, 2024

This is because your CPU does not support AVX-512 instructions. Can you modify the benchmark to get rid of the AVX-512 instructions and try again? I would also suggest running the tracing binary under gdb to locate the unsupported instruction. I hope this is helpful. Let me know if you run into new problems!
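
One way to confirm this diagnosis on a Linux host before tracing is to look for the avx512f flag in /proc/cpuinfo. A minimal sketch: it assumes only the standard "flags" lines of /proc/cpuinfo on x86 Linux, and the helper names are hypothetical.

```python
import os
from typing import Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Collect the feature flags from the 'flags' lines of /proc/cpuinfo."""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def supports_avx512f(cpuinfo_text: str) -> bool:
    # The Linux kernel reports AVX-512 Foundation as "avx512f".
    return "avx512f" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = "/proc/cpuinfo"
    if os.path.exists(path):
        with open(path) as f:
            print("avx512f supported:", supports_avx512f(f.read()))
    else:
        print(path, "not found (this check is Linux-only)")
```

If the flag is missing, any code compiled for AVX-512 (explicit intrinsics or compiler auto-vectorization) will raise SIGILL on that host, which matches the gdb output above.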

uv-xiao commented on July 26, 2024

Thanks! I've removed -mavx512f from the compilation flags, and trace.exe now produces 0.bbtrace, 0.profile, and 0.profile.txt. How should I read the 0.bbtrace file? It seems to be in an unsupported text encoding. Additionally, how can I run the trace-based simulation with these trace files? I couldn't find scripts or gem5 configuration files that work. Thanks again for your help!

seanzw commented on July 26, 2024

That's good news. For trace-based simulation, the workflow is actually similar to execution-based simulation, but you need different LLVM transform passes to process the trace instead of generating a binary.

For the baseline, can you try replacing the valid.ex transform with replay? See if that enables trace processing. Then you need to simulate the processed trace in gem5; can you try driver/Configurations/Simulations/o8.json? It seems to me this will simulate the trace file using the LLVMTraceCPU in gem5. Let me know if you run into more problems.

uv-xiao commented on July 26, 2024

I've created a specialized script gfm-trace.sh for trace-based simulation as follows:

#!/bin/bash

Benchmark='-b '
Benchmark+='gfm.vec_add,'

SimInput=small-cold
Threads=64

python3 Driver.py $Benchmark --build
python3 Driver.py $Benchmark $SimTrace --trace --sim-input-size $SimInput

BaseTrans=replay
Parallel=100
sim_replay=o8

python3 Driver.py $Benchmark $SimTrace -t $BaseTrans --sim-input-size $SimInput --sim-configs $sim_replay --input-threads $Threads -s -j $Parallel --gem5-debug DRAMsim3 --gem5-debug-start 15502083420 | tee gfm.log

As stated in my previous comment, python3 Driver.py $Benchmark $SimTrace --trace --sim-input-size $SimInput produces 0.bbtrace and 0.profile under the trace folder, with the following output, which looks good:

initialize hardExitCount to 10000000000.
initializing tracer...
Initializing traceMode to 0.
Initializing traceFolder to /home/uvxiao/repos/gem-forge-framework/result/stream/gfm/vec_add/trace.
Initializing traceROI to 0.
Initializing SKIP_INST to 0...
Initializing MAX_INST to 1...
Initializing START_INST to 0...
Initializing END_INST to 0...
Initializing PRINT_INTERVAL to 10000000...
Initializing isDebug to 0...
[0]1: bb 0 alloca
[0]2: bb 1 icmp
[0]3: bb 2 br
[0]4: bb3 0 getelementptr

Number of Threads: 1.
Data size 1024kB Offset 0kB Random 0.
Profiled Inst #789306
Done!

However, the last command in the script fails to schedule any simulations. Debug prints that I added show that Driver.py fails to find any valid traces; the traces are initialized in Benchmark.py from *.trace files.

I found that the code in Benchmark.simulate seems to invoke trace-based simulation. Maybe I could type the required commands manually instead of running Driver.py. Or how should I pass the trace files to the driver? What do you think?

Thanks!

seanzw commented on July 26, 2024

Looking at your script, it seems you missed the command that processes the trace. There should be four commands:

Build the LLVM IR bytecode (bc file).
Build the tracing binary and trace the binary.
Process the trace (the one you missed).
Simulate it in gem5.

In the original script this command processes the trace:

python Driver.py $Benchmark $SimTrace -t $BaseTrans -d

Can you check if this fixes the problem?
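
The four steps above can be chained in a small Python wrapper. This is a sketch only: it reuses the Driver.py flags that appear in the scripts in this thread (seanzw's script for steps 1 to 3, gfm-trace.sh for step 4), assumes it is run from the driver folder with the o8 simulation configuration, and has not been tested against the real driver. The function names are hypothetical.

```python
import subprocess
from typing import List


def pipeline_commands(benchmark: str, input_size: str, threads: int,
                      trans: str = "replay", sim_config: str = "o8",
                      parallel: int = 1) -> List[List[str]]:
    """Return the four Driver.py invocations in order (sketch only)."""
    base = ["python3", "Driver.py", "-b", benchmark]
    return [
        # 1. Build the LLVM IR bytecode (raw.bc / raw.ll).
        base + ["--build"],
        # 2. Build the tracing binary and trace it.
        base + ["--input-size", input_size, "--trace"],
        # 3. Process the trace with the chosen transform (e.g. replay).
        base + ["-t", trans, "-d"],
        # 4. Simulate the processed trace in gem5.
        base + ["-t", trans, "--sim-input-size", input_size,
                "--sim-configs", sim_config,
                "--input-threads", str(threads),
                "-s", "-j", str(parallel)],
    ]


def run_pipeline(commands: List[List[str]]) -> None:
    """Run each step in order, stopping at the first failure."""
    for cmd in commands:
        print("#", " ".join(cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands; call run_pipeline(...) to execute them.
    for cmd in pipeline_commands("gfm.vec_add,", "large", 64, parallel=100):
        print(" ".join(cmd))
```

Keeping the steps as data makes it easy to see that step 3 (-d) sits between tracing and simulation; with check=True a failing step stops the chain instead of silently scheduling nothing.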

uv-xiao commented on July 26, 2024

Thanks for the rapid reply!
I've added the -d command for trace processing. However, the "trace missing" problem still occurs. I looked into Driver.schedule_transform, which retrieves the tdgs by calling Benchmark.get_tdgs, which in turn derives them from the traces. Since the "trace the binary" command didn't produce any *.trace files, the tdgs list is empty and no processing or simulation is executed. Should I modify the "trace the binary" step so that it produces the required files?

seanzw commented on July 26, 2024

Can you check the replay folder in the result directory? I tried on my side and successfully generated the tdg files to be simulated. This is the structure of my result folder:

➜  vec_add git:(main) ✗ ll
total 120K
-rw-r--r-- 1 gf gf 28K Dec  8 20:13 gfm.vec_add.replay.bc
-rw-r--r-- 1 gf gf 28K Dec  8 19:59 raw.bc
-rw-r--r-- 1 gf gf 64K Dec  8 19:59 raw.ll
drwxr-xr-x 3 gf gf 100 Dec  8 20:13 replay
drwxr-xr-x 2 gf gf 142 Dec  8 20:06 trace
➜  vec_add git:(main) ✗ ll trace
total 123M
-rw-r--r-- 1 gf gf 123M Dec  8 20:06 0.0.trace
-rw-r--r-- 1 gf gf 525K Dec  8 20:06 0.bbtrace
-rw-r--r-- 1 gf gf  208 Dec  8 20:06 0.profile
-rw-r--r-- 1 gf gf   19 Dec  8 20:06 0.profile.txt
-rw-r--r-- 1 gf gf  36K Dec  8 20:05 inst.uid
-rw-r--r-- 1 gf gf  34K Dec  8 20:05 inst.uid.txt
➜  vec_add git:(main) ✗ ll replay
total 393M
-rw-r--r-- 1 gf gf 354M Dec  8 20:13 0.tdg
-rw-r--r-- 1 gf gf  40M Dec  8 20:13 0.tdg.cache
drwxr-xr-x 2 gf gf   81 Dec  8 20:13 0.tdg.extra
-rw-r--r-- 1 gf gf  364 Dec  8 20:13 0.tdg.stats.txt
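
Given the layout listed above, a quick sanity check before simulating is to confirm that trace/ holds at least one *.trace file and replay/ holds at least one *.tdg. A minimal sketch: the folder names follow the listing above, while check_artifacts and demo_layout are hypothetical helpers. Note that the pattern *.trace does not match 0.bbtrace, so a folder containing only the profiling outputs is correctly reported as missing its trace.

```python
import glob
import os
import tempfile
from typing import Dict, List


def check_artifacts(result_dir: str, trans: str = "replay") -> Dict[str, List[str]]:
    """List the *.trace and *.tdg files the later steps rely on (sketch)."""
    found = {
        # Written by the tracing step, e.g. trace/0.0.trace (not 0.bbtrace).
        "trace": sorted(glob.glob(os.path.join(result_dir, "trace", "*.trace"))),
        # Written by the processing step (-d), e.g. replay/0.tdg.
        "tdg": sorted(glob.glob(os.path.join(result_dir, trans, "*.tdg"))),
    }
    for ext, files in found.items():
        if not files:
            print(f"warning: no *.{ext} files found under {result_dir}")
    return found


def demo_layout(with_trace: bool) -> Dict[str, List[str]]:
    """Build a throwaway result folder and check it (illustration only)."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "trace"))
        os.makedirs(os.path.join(root, "replay"))
        names = [os.path.join("trace", "0.bbtrace"),
                 os.path.join("trace", "0.profile")]
        if with_trace:
            names += [os.path.join("trace", "0.0.trace"),
                      os.path.join("replay", "0.tdg"),
                      os.path.join("replay", "0.tdg.cache")]
        for name in names:
            open(os.path.join(root, name), "w").close()
        found = check_artifacts(root)
        # Keep base names only, since the temporary folder is deleted on exit.
        return {k: [os.path.basename(p) for p in v] for k, v in found.items()}


if __name__ == "__main__":
    print(demo_layout(with_trace=False))  # only 0.bbtrace / 0.profile present
    print(demo_layout(with_trace=True))
```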

Once you have the tdg files, you should be able to simulate (I hope). Just in case, here is my modified script. Notice that I pass --input-size instead of --sim-input-size, but that should not matter.

#!/bin/bash

# rm -f /tmp/job_scheduler.*

# Specify the benchmark. The source file is in
# transform/benchmark/GemForgeMicroSuite/stream/vec_add/omp_vec_add_avx
Benchmark='-b '
Benchmark+='gfm.vec_add,'

# Specify the input size. Check driver/BenchmarkDrivers/GemForgeMicroSuite.py
# for the details of how the benchmark is built and simulated.
SimInput=large

# Specify the number of threads. The workload is parallelized with OpenMP.
Threads=64

# The following command builds the LLVM bytecode of the workload in
# /gem-forge-stack/result/stream/gfm/vec_add
# You can check the IR in raw.ll (text format).
# python Driver.py $Benchmark --build

# This command traces the binary and generates the trace in
# /gem-forge-stack/result/stream/gfm/vec_add/trace
# python Driver.py $Benchmark --input-size $SimInput --trace

# This command processes the LLVM IR trace for simulation.
# Baseline means there is no stream specialization.
# The processed trace is located in:
# /gem-forge-stack/result/stream/gfm/vec_add/replay/0.tdg
BaseTrans=replay
python Driver.py $Benchmark -t $BaseTrans -d

uv-xiao commented on July 26, 2024

It seems that our scripts are almost the same except for the input-size parameter.
After running the script (the final -d command has no effect), here is the structure of my result folder:

[10:36:28] [~/repos/gem-forge-framework/result/stream/gfm/vec_add] [main ✖] ❱❱❱ ll
total 100K
-rw-rw-r-- 1 uvxiao uvxiao  27K Dec  9 10:27 raw.bc
-rw-rw-r-- 1 uvxiao uvxiao  61K Dec  9 10:27 raw.ll
drwxrwxr-x 2 uvxiao uvxiao 4.0K Dec  9 10:27 replay
drwxrwxr-x 2 uvxiao uvxiao 4.0K Dec  9 10:27 trace
[10:36:29] [cost 0.100s] ll

[10:41:40] [~/repos/gem-forge-framework/result/stream/gfm/vec_add] [main ✖] ❱❱❱ ll trace
total 252K
-rw-rw-r-- 1 uvxiao uvxiao 168K Dec  9 10:27 0.bbtrace
-rw-rw-r-- 1 uvxiao uvxiao  448 Dec  9 10:27 0.profile
-rw-rw-r-- 1 uvxiao uvxiao   39 Dec  9 10:27 0.profile.txt
-rw-rw-r-- 1 uvxiao uvxiao  37K Dec  9 10:27 inst.uid
-rw-rw-r-- 1 uvxiao uvxiao  34K Dec  9 10:27 inst.uid.txt
[10:41:45] [cost 0.143s] ll trace

[10:41:46] [~/repos/gem-forge-framework/result/stream/gfm/vec_add] [main ✖] ❱❱❱ ll replay
total 0
[10:41:50] [cost 0.090s] ll replay

The missing files are gfm.vec_add.replay.bc and trace/0.0.trace. That's weird, since the --trace command seems to succeed. Its output is as follows:

initialize hardExitCount to 10000000000.
initializing tracer...
Initializing traceMode to 0.
Initializing traceFolder to /home/uvxiao/repos/gem-forge-framework/result/stream/gfm/vec_add/trace.
Initializing traceROI to 0.
Initializing SKIP_INST to 0...
Initializing MAX_INST to 1...
Initializing START_INST to 0...
Initializing END_INST to 0...
Initializing PRINT_INTERVAL to 10000000...
Initializing isDebug to 0...
[0]1: bb 0 alloca
[0]2: bb 1 icmp
[0]3: bb 2 br
[0]4: bb3 0 getelementptr

Number of Threads: 1.
Data size 16384kB Offset 0kB Random 0.
[0]10000000: bb22 4 getelementptr
[0]10000001: bb22 5 bitcast
[0]10000002: bb22 6 load
[0]10000003: bb22 7 getelementptr
[0]10000004: bb22 8 bitcast

Profiled Inst #12627546
Done!

Is that a version problem? I'm using gem-forge-framework at the commit 8ebd48 (main), with the driver at 95e004 (main) and the transform at b05a77 (main).

seanzw commented on July 26, 2024

Here is my tracing output. It seems yours is missing the cleanup log.

The first one is func enter.
[0]10000000:main (0) -> foo (1) ->  bb6 9 icmp
[0]10000001:main (0) -> foo (1) ->  bb6 10 br
[0]10000002:main (0) -> foo (1) ->  bb6 0 phi
[0]10000003:main (0) -> foo (1) ->  bb6 1 getelementptr
[0]10000004:main (0) -> foo (1) ->  bb6 2 load

[0]20000000:main (0) -> foo (1) ->  bb6 8 add
[0]20000001:main (0) -> foo (1) ->  bb6 9 icmp
[0]20000002:main (0) -> foo (1) ->  bb6 10 br
[0]20000003:main (0) -> foo (1) ->  bb6 0 phi
[0]20000004:main (0) -> foo (1) ->  bb6 1 getelementptr

[0]30000000:main (0) -> foo (1) ->  bb6 7 store
[0]30000001:main (0) -> foo (1) ->  bb6 8 add
[0]30000002:main (0) -> foo (1) ->  bb6 9 icmp
[0]30000003:main (0) -> foo (1) ->  bb6 10 br
[0]30000004:main (0) -> foo (1) ->  bb6 0 phi

[0]40000000:main (0) -> foo (1) ->  bb6 6 getelementptr
[0]40000001:main (0) -> foo (1) ->  bb6 7 store
[0]40000002:main (0) -> foo (1) ->  bb6 8 add
[0]40000003:main (0) -> foo (1) ->  bb6 9 icmp
[0]40000004:main (0) -> foo (1) ->  bb6 10 br

Clean up.
Clean up tid 0.
Thread 0 Traced #46137349
Clean up tid 1.
Clean up tid 2.
Clean up tid 3.
Clean up tid 4.
Clean up tid 5.
Clean up tid 6.
Clean up tid 7.
Clean up tid 8.
Clean up tid 9.
Profiled Inst #46137348

I am not sure this is a version problem, since the tracing part hasn't changed for more than two years. My guess is that somehow your binary doesn't write the trace to the file. Can you take a look at TracerProtobuf.cpp? There is a cleanup function that should be called to dump the trace to the file. Or you can increase the input size in GemForgeMicroSuite.py so that the binary runs longer and traces more instructions (my guess is that when the number of traced instructions is low, the tracer has a bug and does not call the cleanup() function). I hope this is helpful.
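
Since a skipped cleanup leaves no trace file behind, one quick diagnostic is to scan the tracer's console output for the "Clean up." line and the per-thread "Thread N Traced #M" counts shown in the log above. A minimal sketch that assumes exactly that log format; cleanup_summary is a hypothetical helper, not part of the tracer.

```python
import re
from typing import Dict

# Per-thread summary printed during cleanup, e.g. "Thread 0 Traced #46137349".
TRACED_RE = re.compile(r"^Thread (\d+) Traced #(\d+)[ \t]*$", re.MULTILINE)


def cleanup_summary(log: str) -> Dict[int, int]:
    """Map thread id -> traced instruction count; empty if cleanup never ran."""
    if "Clean up." not in log:
        return {}
    return {int(tid): int(count) for tid, count in TRACED_RE.findall(log)}
```

Save the tracer's output (for example with tee) and pass its text in. An empty result corresponds to the failing case in this thread, where the log ends at "Done!" without any "Clean up." lines.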

uv-xiao commented on July 26, 2024

Thank you so much for your help! It turned out that my own ill-advised changes to Benchmark.py were what prevented the *.trace files from being generated. Specifically, the buggy changes are as follows:

            # # Remember to set the environment for trace.
            # os.putenv('LLVM_TDG_TRACE_FOLDER', self.get_trace_folder_abs())
            # os.putenv('LLVM_TDG_INST_UID_FILE', self.get_trace_inst_uid())
            # # We need libunwind.so for profiling.
            # os.putenv('LD_LIBRARY_PATH', os.path.join(C.LLVM_PATH, 'lib'))
            
            env = os.environ.copy()
            env['LLVM_TDG_TRACE_FOLDER'] = self.get_trace_folder_abs()
            env['LLVM_TDG_INST_UID_FILE'] = self.get_trace_inst_uid()
            env['LD_LIBRARY_PATH'] = '{}:{}'.format(os.path.join(C.LLVM_PATH, 'lib'), env['LD_LIBRARY_PATH'])

            run_cmd = [
                './' + self.get_trace_bin(),
            ]

            if self.get_args(input_name) is not None:
                run_cmd += self.get_args(input_name)
            
            # print('# Run traced binary...')
            # Util.call_helper(run_cmd)
            subprocess.run(run_cmd, env=env, check=True)

That was a silly mistake. After restoring the original code, I got the following replay folder:

[14:43:14] [~/repos/gem-forge-framework/result/stream/gfm/vec_add] [main ✖] ❱❱❱ ll replay
total 6.9M
-rw-rw-r-- 1 uvxiao uvxiao 6.1M Dec 10 14:40 0.tdg
-rw-rw-r-- 1 uvxiao uvxiao 748K Dec 10 14:40 0.tdg.cache
drwxrwxr-x 2 uvxiao uvxiao 4.0K Dec 10 14:40 0.tdg.extra
-rw-rw-r-- 1 uvxiao uvxiao  362 Dec 10 14:39 0.tdg.stats.txt
[14:43:17] [cost 0.116s] ll replay
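
The thread does not say exactly why the rewritten block stopped producing *.trace files. One plausible mechanism (an assumption, not a confirmed diagnosis) is a documented Python behavior: os.putenv() changes the real process environment but does not update os.environ. If other tracer variables are set elsewhere with os.putenv(), a child started with env=os.environ.copy() never sees them, while the original code, which let the child inherit the process environment, does. The sketch demonstrates only that Python behavior; DEMO_TRACE_VAR is a hypothetical variable name.

```python
import os
import subprocess
import sys
from typing import Dict, Optional, Tuple


def child_sees(var: str, env: Optional[Dict[str, str]] = None) -> str:
    """Run a child Python process and return what it sees for `var`."""
    code = "import os; print(os.environ.get({!r}, '<unset>'))".format(var)
    out = subprocess.run([sys.executable, "-c", code], env=env,
                         stdout=subprocess.PIPE, universal_newlines=True,
                         check=True)
    return out.stdout.strip()


def demo(var: str = "DEMO_TRACE_VAR") -> Tuple[bool, str, str]:
    # os.putenv() changes the real process environment but not os.environ.
    os.putenv(var, "1")
    return (
        var in os.environ,                   # False: os.environ not updated
        child_sees(var),                     # env=None: child inherits "1"
        child_sees(var, os.environ.copy()),  # the copy lacks the variable
    )


if __name__ == "__main__":
    print(demo())
```

If this is the cause, assigning through os.environ[...] (which also calls putenv()) or putting every variable into the same env dict keeps both views of the environment consistent.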

However, I'm actually a bit confused about the trace processing. According to "Analyzing Behavior Specialized Acceleration" (ASPLOS '16), the TDG representation, which combines a μDG with the program IR, comes from the results of a simulator. Here, however, the TDG is produced by processing the tracing results. I suppose the program IR is implied by the traced instructions, but does this mean that the microarchitectural information can also be inferred from the tracing results? I hope you could share some insights behind the methodology. Thanks sincerely in advance.

After generating the *.tdg file, I tried to run the simulation and hit the following error:

warn: The `get_runtime_isa` function is deprecated. Please migrate away from using this function.
NameError: name 'options' is not defined

At:
  /home/uvxiao/repos/gem-forge-framework/gem5/configs/example/gem_forge/GemForgeCPUConfig.py(196): createCPUStandalone
  /home/uvxiao/repos/gem-forge-framework/gem5/configs/example/gem_forge/GemForgeCPUConfig.py(222): initializeCPUs
  /home/uvxiao/repos/gem-forge-framework/gem5/configs/example/gem_forge/run.py(466): <module>
  build/X86/python/m5/main.py(599): main

The code segment is as follows:

        # For each process, add a LLVMTraceCPU for simulation.
        llvm_trace_cpu = \
            GemForgeLLVMTraceCPUConfig.initializeLLVMTraceCPU(
                options, len(cpus))

It seems options is not defined; I guess the intended argument is args. After changing options to args, another error occurs:

AttributeError: object 'LLVMTraceCPU' has no attribute 'ArchMMU'
  (C++ object is not yet constructed, so wrapped C++ methods are unavailable.)

At:
  build/X86/python/m5/SimObject.py(851): __getattr__
  build/X86/cpu/BaseCPU.py(344): __init__
  /home/uvxiao/repos/gem-forge-framework/gem5/configs/example/gem_forge/GemForgeLLVMTraceCPUConfig.py(5): initializeLLVMTraceCPU
  /home/uvxiao/repos/gem-forge-framework/gem5/configs/example/gem_forge/GemForgeCPUConfig.py(195): createCPUStandalone
  /home/uvxiao/repos/gem-forge-framework/gem5/configs/example/gem_forge/GemForgeCPUConfig.py(222): initializeCPUs
  /home/uvxiao/repos/gem-forge-framework/gem5/configs/example/gem_forge/run.py(466): <module>
  build/X86/python/m5/main.py(599): main

I found a similar issue (GEM5-1335), so I guess this might be a gem5 bug. What do you think?

Thanks a lot.

uv-xiao commented on July 26, 2024

Thanks sincerely for your selfless help! I now understand the methodology better, and I'd like to look into the transform for more details.
Thanks again for your debugging and maintenance efforts!

seanzw commented on July 26, 2024

BTW, I forgot to mention that my master's thesis is about this tool; you can take a look if you like.

master's thesis

uv-xiao commented on July 26, 2024

I sincerely appreciate your exciting work and selfless help!
Sorry for the late reply. I've just recovered from influenza 😭 I followed the git diff to update the code, and it seems to work well now! Your thesis also helped me considerably in understanding this tool.
Thanks again for your help!

seanzw commented on July 26, 2024

Take care. Good luck!
