gwinel / likwid
Automatically exported from code.google.com/p/likwid
License: GNU General Public License v3.0
Extend cpuid with the Pentium M variants
Original issue reported on code.google.com by [email protected]
on 28 Sep 2009 at 8:22
Groups can only be set statically for all architectures. This
is inflexible because some architectures have more counters or other events.
Solution:
Implement a dynamic group approach.
Add a routine to ask for the supported groups.
The API can be implemented with function pointers.
These are initialized according to the architecture.
Original issue reported on code.google.com by [email protected]
on 9 Oct 2009 at 6:18
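The dynamic group approach requested above could look roughly like the following. This is a minimal Python sketch, not likwid's actual code: a table of callables stands in for the C function pointers, and every name in it (`ArchOps`, `init_ops`, the architecture keys, and the group lists) is a hypothetical placeholder.

```python
from typing import Callable, Dict, List, NamedTuple


class ArchOps(NamedTuple):
    """Per-architecture operations, the analogue of a struct of function pointers."""
    supported_groups: Callable[[], List[str]]
    setup_group: Callable[[str], str]


def _core2_groups() -> List[str]:
    return ["FLOPS_DP", "L2", "MEM"]  # hypothetical group names


def _nehalem_groups() -> List[str]:
    return ["FLOPS_DP", "L2", "L3", "MEM"]  # hypothetical group names


def _make_setup(arch: str, groups: Callable[[], List[str]]) -> Callable[[str], str]:
    """Build a setup routine that rejects groups the architecture does not offer."""
    def setup(group: str) -> str:
        if group not in groups():
            raise ValueError(f"group {group!r} is not supported on {arch}")
        return f"{arch}:{group}"
    return setup


ARCH_TABLE: Dict[str, ArchOps] = {
    "core2": ArchOps(_core2_groups, _make_setup("core2", _core2_groups)),
    "nehalem": ArchOps(_nehalem_groups, _make_setup("nehalem", _nehalem_groups)),
}


def init_ops(arch: str) -> ArchOps:
    """Select the operation table once, at start-up, from the detected architecture."""
    return ARCH_TABLE[arch]
```

Callers then only use `init_ops(arch).supported_groups()` and never branch on the architecture themselves.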
Extend the module for Pentium M.
Probably a good opportunity for a partial rewrite.
Original issue reported on code.google.com by [email protected]
on 4 Oct 2009 at 9:02
The marker API could be simpler to use.
Analyse the requirements, then specify and implement a revised API.
Original issue reported on code.google.com by [email protected]
on 3 Dec 2010 at 9:49
When using markers it should be possible to accumulate a measurement
with multiple calls to the library.
Original issue reported on code.google.com by [email protected]
on 22 Sep 2009 at 5:22
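Accumulating one measurement over several marker calls, as requested above, can be illustrated with a small sketch. It assumes a caller-supplied `read_counter` function that returns the current raw counter value. The class and method names are hypothetical and do not reflect the likwid marker API.

```python
from typing import Callable, Dict


class RegionAccumulator:
    """Accumulates counter deltas per named region over repeated start/stop pairs."""

    def __init__(self, read_counter: Callable[[], int]) -> None:
        self._read = read_counter
        self._start: Dict[str, int] = {}
        self.totals: Dict[str, int] = {}
        self.calls: Dict[str, int] = {}

    def start(self, region: str) -> None:
        # Remember the counter value at region entry.
        self._start[region] = self._read()

    def stop(self, region: str) -> None:
        # Add the delta since the matching start() to the running total.
        delta = self._read() - self._start.pop(region)
        self.totals[region] = self.totals.get(region, 0) + delta
        self.calls[region] = self.calls.get(region, 0) + 1
```

Each `start`/`stop` pair on the same region name adds to the total instead of overwriting it, so a loop body can be measured on every iteration.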
The solution at the moment with Intel MPI is very specific and
does not scale.
Design a method based on a shell wrapper to implement this independently of
the MPI library used.
Original issue reported on code.google.com by [email protected]
on 28 Apr 2011 at 1:59
For many events, counting cycles requires initializing the CMASK and INVERT
fields in addition to the event ID and umask.
Plan:
Analyze the implications on the code.
Implement for Core 2 and Nehalem.
Original issue reported on code.google.com by [email protected]
on 16 Jun 2010 at 7:50
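To make the CMASK and INVERT point above concrete, here is a sketch of how the fields combine into one event select value. It assumes the Intel architectural performance monitoring layout of IA32_PERFEVTSELx (event select in bits 0-7, umask in bits 8-15, USR bit 16, OS bit 17, EN bit 22, INV bit 23, CMASK in bits 24-31). The function name and its defaults are illustrative only and are not taken from likwid.

```python
def encode_evtsel(event: int, umask: int, cmask: int = 0, invert: bool = False,
                  usr: bool = True, os_: bool = False, enable: bool = True) -> int:
    """Pack the event select fields into a single register value."""
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    reg = event | (umask << 8)      # bits 0-7 and 8-15
    reg |= int(usr) << 16           # count in user mode
    reg |= int(os_) << 17           # count in kernel mode
    reg |= int(enable) << 22        # enable the counter
    reg |= int(invert) << 23        # invert the CMASK comparison
    reg |= cmask << 24              # counter mask threshold, bits 24-31
    return reg
```

With a nonzero CMASK the counter increments once per cycle in which the event count reaches the threshold, and INVERT flips that comparison, which is why both fields are needed to turn an event count into a cycle count.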
Group ticket for release 2.0.
Open points:
* Formal review
* Testing
* Final benchmark set for likwid-bench
Original issue reported on code.google.com by [email protected]
on 20 Aug 2010 at 8:08
Convert programming style to WIKI
Original issue reported on code.google.com by [email protected]
on 7 Jun 2010 at 1:23
K8 is still a very common architecture.
Implement support for K8 in
* cpuid
* perfCtr
Original issue reported on code.google.com by [email protected]
on 23 Oct 2009 at 1:50
The clock measurement at the beginning of every start takes some time.
On automated runs it should be possible to turn it off.
1. Evaluate how far the duration can be reduced for exact results.
2. Add an option to turn it off.
Original issue reported on code.google.com by [email protected]
on 28 Apr 2011 at 1:57
What steps will reproduce the problem?
1. likwid-pin -c 0,1 ls
What is the expected output? What do you see instead?
Expected output: Message from likwid-pin about pinning, then output from ls.
Actual output:
[likwid-pin] Main PID -> core 0 - OK
ERROR: ld.so: object '/home.local/jan/bin/lib/liblikwidpin.so' from
LD_PRELOAD cannot be preloaded: ignored.
... and then the output from ls.
Original issue reported on code.google.com by [email protected]
on 5 Feb 2010 at 11:45
Besides the kernel-provided logical numbering, Intel specifies
a topological numbering.
Provide a switch so that likwid-pin can also be used with such a topological
numbering.
Original issue reported on code.google.com by [email protected]
on 11 May 2010 at 2:25
Compilation on woody fails with
./GCC/magma_dot.s:32: Error: no such instruction: `dpps xmm6,xmm5,0xFF'
Original issue reported on code.google.com by [email protected]
on 29 Jun 2010 at 8:50
The asm directives in the cpuid module, and probably elsewhere, do not compile
with gcc 4.4.1.
Original issue reported on code.google.com by [email protected]
on 28 Sep 2009 at 8:46
If counters are zero, NaN values appear because of a division by
zero.
Solution:
Test for zeros before division.
Original issue reported on code.google.com by [email protected]
on 15 Oct 2009 at 11:47
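The proposed zero test can be sketched as a small guard around every derived metric. Note one difference from C: a floating-point 0.0/0.0 in C yields NaN, whereas Python raises `ZeroDivisionError`, but the guard is the same. The helper names and the choice of 0.0 as the fallback value are assumptions made for illustration.

```python
def safe_ratio(numerator: float, denominator: float, default: float = 0.0) -> float:
    """Return numerator / denominator, or `default` when the denominator is zero."""
    if denominator == 0:
        return default
    return numerator / denominator


def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction; a zero instruction count yields 0.0, not NaN."""
    return safe_ratio(cycles, instructions)
```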
There is a socket lock for turning the Nehalem Uncore events on and off.
Still, with the marker API, different cores might accumulate the results
across multiple runs.
Solution:
Either force one core to turn the counters on and off, or sum up the results
of all cores later in the result presentation.
Original issue reported on code.google.com by [email protected]
on 30 May 2010 at 7:40
Add a make install target.
Put all configurable make settings in a separate file.
Original issue reported on code.google.com by [email protected]
on 4 Nov 2009 at 6:00
A special switch should re-interpret the specified core numbers as logical
numbers.
Original issue reported on code.google.com by [email protected]
on 12 Apr 2010 at 2:26
$ export OMP_NUM_THREADS=4 # dunnington1
$ echo $KMP_AFFINITY
disabled
$ likwid-pin -t intel_omp -c 0,3,6,9 ./stream_omp_NT.exe
[likwid-pin] Main PID -> core 0 - OK
[...]
[pthread wrapper] [pthread wrapper] PIN_MASK: 0->9
[pthread wrapper] SKIP MASK: 0x0
[pthread wrapper 0] Notice: Using libpthread.so.0
threadid 1073809728 -> core 9 - OK
[pthread wrapper 1] Notice: Using libpthread.so.0
threadid 1078008128 -> core 0 - OK
[pthread wrapper 2] Notice: Using libpthread.so.0
threadid 1082206528 -> core 0 - OK
[pthread wrapper 3] Notice: Using libpthread.so.0
threadid 1086404928 -> core 0 - OK
---------------------
In fact, all 4 application threads are running on core 0.
The only one on core 9 is the shepherd thread.
Original issue reported on code.google.com by [email protected]
on 5 Feb 2010 at 2:15
Up to now the sysfs file cpumap was used. I did not find any documentation
about these files. This will be changed to use the cpulist file, which is a plain
integer list.
Original issue reported on code.google.com by [email protected]
on 19 Sep 2010 at 2:30
At the moment there is much redundant processor specific code.
Solution:
Reduce the processor specific parts to data configurations which are
processed by generic routines.
Original issue reported on code.google.com by [email protected]
on 25 Mar 2010 at 8:24
The detection of the processor clock does not work reliably with SpeedStep
active.
Solution:
Add a check whether SpeedStep is enabled and output a warning.
Original issue reported on code.google.com by [email protected]
on 28 Sep 2009 at 7:45
Label for enabled SMT is too wide for box.
Solution:
1. Make the label width variable, or
2. increase the box width.
Original issue reported on code.google.com by [email protected]
on 28 Sep 2009 at 8:19
The performance groups should be easily extensible.
A solution is to specify the groups in simple text files, which
are parsed by a Perl script that generates the corresponding code.
Original issue reported on code.google.com by [email protected]
on 15 Jun 2010 at 10:42
Implement support for Sandy Bridge: Core and Uncore.
Original issue reported on code.google.com by [email protected]
on 27 Jan 2011 at 9:45
At the moment the skip mask supports only up to 64 threads due to the use of
64-bit integers.
Solution:
* Provide a solution where everything above 64 threads is truncated.
* Use a bitset of configurable size
Original issue reported on code.google.com by [email protected]
on 26 Nov 2010 at 1:20
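The second option in the skip mask issue above, a bitset of configurable size, can be sketched as follows. The sketch assumes that a set bit means "skip this thread when pinning". The class name and its methods are hypothetical and are not part of likwid.

```python
class SkipMask:
    """Fixed-size bitset; bit i set means thread i is skipped (assumed semantics)."""

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size
        self._bits = bytearray((size + 7) // 8)  # one bit per thread

    def _check(self, index: int) -> None:
        if not 0 <= index < self.size:
            raise IndexError(f"thread index {index} outside 0..{self.size - 1}")

    def set(self, index: int) -> None:
        self._check(index)
        self._bits[index // 8] |= 1 << (index % 8)

    def test(self, index: int) -> bool:
        self._check(index)
        return bool(self._bits[index // 8] & (1 << (index % 8)))

    @classmethod
    def from_int(cls, value: int, size: int) -> "SkipMask":
        """Build a mask from an integer such as 0x3; bits at or above `size` are dropped."""
        mask = cls(size)
        for i in range(size):
            if (value >> i) & 1:
                mask.set(i)
        return mask
```

`from_int` also covers the first option from the issue: an existing 64-bit mask value can be loaded into a larger bitset, and anything beyond the configured size is truncated.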
Also determine and print NUMA information in likwid-topology.
Original issue reported on code.google.com by [email protected]
on 26 Mar 2010 at 2:08
For threaded applications it is tedious to specify the core list
twice. A better solution is to integrate the pinning functionality into
likwid-perfctr.
Original issue reported on code.google.com by [email protected]
on 12 Oct 2010 at 11:09
The performance groups in perfCtr are not validated.
This is especially true for Intel Nehalem.
Solution:
Validate the existing groups with the implemented events.
Think about new metrics for performance groups.
Original issue reported on code.google.com by [email protected]
on 28 Sep 2009 at 8:24
The Intel Nehalem EX Uncore is different from Nehalem.
Task:
Implement the Nehalem EX Uncore environments
Original issue reported on code.google.com by [email protected]
on 12 Oct 2010 at 7:41
It is more convenient if this necessary environment variable is set by
likwid-pin. This can be done in the -t switch clause.
Original issue reported on code.google.com by [email protected]
on 11 May 2010 at 2:26
likwid-perfCtr should support a mode where the counters run during
execution of the application and are read out and printed at configured time
steps. This makes it possible to generate graphs of events or derived metrics
over an application's execution time.
Tasks:
1. Adapt the multiplex module to allow continuous measurements.
2. Extend the architectures code by a routine to just read out results.
3. Provide output and computations of raw events and derived metrics.
Original issue reported on code.google.com by [email protected]
on 20 Aug 2010 at 7:39
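The timeline mode described in the tasks above amounts to reading the running counters at fixed intervals and reporting the difference to the previous reading. A minimal sketch, assuming a caller-supplied `read_counters` function that returns raw counts per event name without stopping the counters (all names here are hypothetical):

```python
import time
from typing import Callable, Dict, Iterator, Tuple


def sample_deltas(read_counters: Callable[[], Dict[str, int]],
                  interval_s: float,
                  steps: int,
                  sleep: Callable[[float], None] = time.sleep,
                  ) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Yield (step, per-event delta) after each interval while counters keep running."""
    previous = read_counters()
    for step in range(1, steps + 1):
        sleep(interval_s)
        current = read_counters()
        # Report only what was counted during this interval.
        yield step, {name: current[name] - previous[name] for name in current}
        previous = current
```

Derived metrics for a graph can then be computed per step from the yielded deltas. The `sleep` parameter exists only so the loop can be exercised without real waiting.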
Just found likwid (i.e. minutes ago) and I am very happy to see it. I need to
measure main memory traffic on a dual socket Nehalem system (i.e. across both
sockets). After investigating all the performance counters, it appears that
some combination of the following is what I want:
UNC_QHL_REQUESTS.LOCAL_(READS|WRITES)
UNC_IMC_NORMAL_READS.ANY
UNC_IMS_WRITES_FULL.ANY
On to my question then -- in reading your wiki pages for Nehalem, I see that
the performance group "MEM" uses UNC_L3_LINES_IN_ANY as part of its bandwidth
measurement. I believe that this counter will count the allocation of a cache
line in any state, i.e., including modified or exclusive. This means that if I
assign a value without first reading it, you would incorrectly count this as
part of the memory traffic.
This is relevant, in (e.g.), my particular case which is sparse matrix vector
multiply (SpMV). The inner loop of SpMV accumulates the dot product of one row
in the matrix with the vector operand. The accumulation is held in a processor
register (ideally). Once finished, the value is simply written into the
destination vector, i.e., so as to avoid a write-miss. Thus, L3 cache should
allocate a line in the modified state without performing any DRAM access, thus
defeating the use of UNC_L3_LINES_IN_ANY as a useful counter for memory traffic.
Finally, I must admit that I am at the moment simply searching for a good
solution. If the above is wrong, then please simply let me know :)
As a related question: can likwid access all of the uncore performance
counters (if yes, then you can add this to your brag sheet as perf cannot do
this AFAIK).
Thank you,
Pete Stevenson
Original issue reported on code.google.com by [email protected]
on 5 Jun 2011 at 11:28
This is the ticket for keeping track of the TODOs for the V1 release.
There is already a named branch for this release.
Current open issues:
1. Port P6 and Pentium M to new perfmon module
2. Write test procedure document for release
3. Update and Review Documentation (man pages and WIKI)
4. Provide examples in WIKI for marker use case
5. Review, extend and validate the groups for Core 2 and Nehalem
Original issue reported on code.google.com by [email protected]
on 29 Apr 2010 at 12:42
Problem:
At the moment the core IDs can only be given as a comma-separated list.
For high core counts it is convenient to enter ranges, e.g. 0-7.
Fix:
Implement ranges in the command line handling of the perfCtr main routine.
Original issue reported on code.google.com by [email protected]
on 9 Oct 2009 at 6:29
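The range handling requested above could be sketched as a small parser that accepts both single IDs and ranges in one comma-separated list. The function name is hypothetical, and rejecting descending ranges is an assumption rather than documented likwid behavior.

```python
from typing import List


def parse_core_list(spec: str) -> List[int]:
    """Expand a core list with optional ranges into a flat list of core IDs."""
    cores: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty entry in core list {spec!r}")
        if "-" in part:
            first_text, last_text = part.split("-", 1)
            first, last = int(first_text), int(last_text)
            if last < first:
                raise ValueError(f"descending range {part!r}")
            cores.extend(range(first, last + 1))  # ranges are inclusive
        else:
            cores.append(int(part))
    return cores
```

For example, `parse_core_list("0,2,4-7")` expands to the cores 0, 2, 4, 5, 6 and 7.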
At the moment if using the marker API only one instance of likwid
can run on a node.
It is useful to be able to start several instances of likwid on a node.
Solution:
Include the pid of either the wrapper process or the application in
the filename of the marker results.
Original issue reported on code.google.com by [email protected]
on 22 Mar 2010 at 2:43
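Including the PID in the marker result filename, as proposed above, might look like the sketch below. The file name pattern and the default directory are purely hypothetical. The only point is that two instances on the same node end up with different paths.

```python
import os
import tempfile


def marker_file_path(pid: int, directory: str = tempfile.gettempdir()) -> str:
    """Return a per-process marker result path; the name pattern is hypothetical."""
    return os.path.join(directory, f"likwid_marker_{pid}.txt")


# The wrapper process would typically pass its own PID:
own_path = marker_file_path(os.getpid())
```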
To enhance the security of likwid, proper
handling of strings is crucial. bstring provides
a secure, fast and feature-rich interface.
Original issue reported on code.google.com by [email protected]
on 14 Jan 2010 at 12:26
What steps will reproduce the problem?
1. Instrument the code as described in:
http://code.google.com/p/likwid/wiki/Introduction
2. Run the command
./likwid-perfCtr -m -c 0 -g L2 ./likwid-pin -c 0 <executable>
What is the expected output? What do you see instead?
Expected output: the hardware counter information
Actual output: The message "WARNING: Number of threads in marker file unequal
to number of threads in likwid-perfCtr!"
and no output is seen.
What version of the product are you using? On what operating system?
Using 1.0.
On linux
Original issue reported on code.google.com by [email protected]
on 13 Jul 2010 at 2:51
Due to multiple applications and many supported architectures in likwid it
is necessary to follow a systematic work instruction to test likwid before
releases.
Task:
Write a document with work instructions and checklists to test and review
likwid before releases.
Original issue reported on code.google.com by [email protected]
on 15 Jan 2010 at 7:45
The Uncore events on the Intel Nehalem processors are implemented but
do not yet work reliably.
Solution:
Test and fix the Uncore events.
Add appropriate events to the performance groups.
Original issue reported on code.google.com by [email protected]
on 28 Sep 2009 at 8:25
What steps will reproduce the problem?
1. Set install path with a slash at the end
2. make && make install
3. likwid-pin will result in an error
What is the expected output? What do you see instead?
Usual likwid-pin behavior.
Instead, no pinning and error:
~/helper/likwid-2.2> likwid-pin -c0-1 uname
[likwid-pin] Main PID -> core 0 - OK
ERROR: ld.so: object '~/helper/likwid-2.2-inst' from LD_PRELOAD cannot be
preloaded: ignored.
What version of the product are you using? On what operating system?
2.2, Linux
Original issue reported on code.google.com by [email protected]
on 28 Jun 2011 at 6:47
Optionally support perf_event interface
available since Linux version 2.6.31.
Original issue reported on code.google.com by [email protected]
on 8 Feb 2010 at 2:00
A major problem in using the tools is if the msr module is not loaded
or if the rights on the device files are not sufficient.
Solution:
Add a check on startup and issue an informative warning if
problems are detected.
Original issue reported on code.google.com by [email protected]
on 23 Oct 2009 at 2:17
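A startup check of the kind proposed above can be sketched as follows. It assumes the usual Linux MSR device path `/dev/cpu/<n>/msr`. The function name and the warning texts are hypothetical, and a real check would probably cover every core that is going to be measured.

```python
import os
from typing import Optional


def check_msr_access(cpu: int = 0,
                     device_template: str = "/dev/cpu/{cpu}/msr") -> Optional[str]:
    """Return a warning string if the MSR device is missing or inaccessible, else None."""
    path = device_template.format(cpu=cpu)
    if not os.path.exists(path):
        return f"{path} not found: is the msr kernel module loaded (modprobe msr)?"
    if not os.access(path, os.R_OK | os.W_OK):
        return f"{path} exists but is not readable and writable by this user"
    return None
```

The tool would print the returned message and exit early instead of failing later with a less obvious error.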