The ioapps's intro from jirihorky

IOapps, an IO profiler and IO traces replayer
written by Jiri Horky <[email protected]>, 2010, 2011
http://code.google.com/p/ioapps/

Quick description
-----------------
IOapps consists of two applications: ioreplay and ioprofiler. Both of them share common code base,
the input parsing and handling.

ioreplay is mainly intended for replaying of recorded traces, whereas ioprofiler is a GUI application
for plotting application's IO access pattern.

Both application can currently read strace output produced by:
strace -q -a1 -s0 -f -tttT -oOUT_FILE -e trace=file,desc,process,socket APPLICATION ARGUMENTS

Both applications can also read binary format of the above data, which is smaller, faster to process and
architecture independent.

Building:
-----------------
To build both application, run:
'make'

and to install it:

'make install'

If you only want to build or install one of them, run:
'make replay' or 'make profiler'
and then:
'make install_replay' or 'make install_profiler'.

Features:
--------
- multiple strace source possible by design
- "raw" strace.file produced by strace -q -a1 -s0 -f -tttT -ostrace.file APPLICATION
- binary format. This is recommended, as it is stored in natural
- other formats possible by design.
- intelligent file descriptor --> file name mapping tracking
- process/threads support (copying, sharing file descriptor tables)
- duplication of fds, pipe fd...
- pipe, socket file descriptor recognition, corresponding reads and writes are not done at all
- multiple options for timing of replaying (

------------------------
Supported syscalls:
-------------------------
open
close
read
write
lseek
_llseek
create
mkdir
access
unlink
rmdir
pipe - this doesn't really call pipe(2) syscall, just keeping track of newly created fds. We don't support IO operations
to pipe.
socket - similary to pipe, we just keep track of fd mappings
dup,dup2,dup3 - this is little bit tricky, as we don't call dup(2) at all. We just keep track of what happened in original
process. This is due to our data structures and because it is not necessary to actually call dup,
as newly created fd is pointing to the same context.
clone - we don't create a new process but just clone/copy it's FD table.

Not supported, but should be:
----------------------------
pipe2
pread !
pwrite !
tee
splice
vmsplice
stat
fstat

-----
Known Limitations:
-----
Single threaded as we were trying to keep the thing simple.

So it can't really replicate IOs from multiple threads, that are asynchronous and reentrant.
It is a big flaw in two views, see them below. But in the practice, one can live with that, if the program he uses is mostly
single threaded.

1. This can cause noticeable difference in IO performance.

2. Unfinished/resumed calls handling:

Imagine this strace:

18599 1271322908.363138 read(3, <unfinished ...>
18600 1271322908.363234 write(1, ""..., 4096) = 4096 <0.000008>
18599 1271322908.363264 <... read resumed> ""..., 8192) = 4096 <0.000118>

If a processing of interrupted syscalls is switched off, this read will not be caught.

If it is enabled (default), there are two approaches for this:
First (the bad) one resulting in:

18599 1271322908.363138 read(3, ""..., 8192) = 4096 <0.000118>
18600 1271322908.363234 write(1, ""..., 4096) = 4096 <0.000008>

Notice that the first read in process 18599 ended later than the write in process 18600 started:

Well, we could loose the exact order of the syscalls, but as we are still single-threaded (and even if we were multi-threaded,
it still would be impossible to do it EXACTLY in the right order), it doesn't matter. It would only matters, if those
syscall were dependent. This could cause deadlocks in our app. For example: The original read in one process blocks as it waits for data. If
everything so far goes as we expected, we would block too. But as we are just single threaded, we will never replicate
another write process that has written the data after we are done with processing of the first one.
This example is especially true for pipes, but IMHO can be spotted in other situations too.

So, we do it other way around:
18600 1271322908.363234 write(1, ""..., 4096) = 4096 <0.000008>
18599 1271322908.363138 read(3, ""..., 8192) = 4096 <0.000118>

Please notice, that the start time of the read is not touched, so although it is listed below the write, it has the true start time.
When replicating, it should not cause any problems. The only problematic scenario I can think of is following:
Process one tries to read from file descriptor, this read is interrupted. Then another process, which shares the descriptor, seeks it somewhere. Then, our
read is resumed. We will then misinterpret the read, as the seek will now be before the read. It's even possible that kernel do not allow such
behaviour. But it so unlikely, that I am too lazy to write a test program for it.

The real solution would be to really be multithreaded, but then more complexity arise...

2. Does not support asynchronous IOs. At all. Yet.

3. Does not support memory mapped files. At all. And it is IMHO not possible without kernel hacking.

4. We do not care about O_CLOEXEC when replicating ... should we?

Notes:
-------------
Close - behaviour on duplicated fds - one would expect that the close() call will close every copy of the file - which
is apparently not true. One needs to call close() for every instance of the previously duplicated fd. So we do it :-)

NOTES:
--------
When replicating, it may happen that not all actions will be replicated as they should be. For example: imagine some file, created by the program, then closed, and then tried to read again.
One would except that open will success as the file should exist. However, some 3rd party application could have deleted the file in the meanwhile. You can not see it from the strace output.
Let's say that the original program tried to open the file, it failed and then it created it again. The problem is, that our open call will not fail and thus the other creat
(or open with O_CREAT flag) will inherently cause error to our execution.

I have decided not to solve it, as it would probably need more time than would be the benefits. Moreover, one can imagine more special cases like that, e.g. with mkdirs or unlinks. This application is not
that perfect.

jirihorky / ioapps Goto Github PK

ioapps's Introduction

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent