Comments (10)
Reduced version:
import core.sys.posix.stdio;
struct File
{
private struct Impl
{
FILE * handle = null;
}
private Impl * p;
}
// Specialization for strings - a very frequent case
void writeln(T...)(T args)
{
fprintf(.stdout.p.handle, "%.*s\n",
cast(int) args[0].length, args[0].ptr);
}
extern(C) void std_stdio_static_this()
{
// stdout
__gshared File.Impl stdoutImpl;
stdoutImpl.handle = core.stdc.stdio.stdout;
.stdout.p = &stdoutImpl;
}
shared static this()
{
std_stdio_static_this();
}
File stdout;
void main()
{
char[4096*2] buff = 0x30;
writeln(buff);
}
from ldc.
I ran this (it produces native assembly, i.e. a .s file):
f=$1
# ldc2 compiled using llvm-3.0, binaries also belong to llvm-3.0 installation
time ldc2 "$f.d" -output-ll
time llvm-as "$f.ll"
time llvm-ld -native "$f.bc"
The first two steps were really fast, but the last step took about 3.5s, so most of the time goes into compiling the LLVM bitcode to native assembly.
Having ldc2 do the full compilation takes 8s (much more?) for some reason, and adding verbose output doesn't really help. All I see is the final linking pass, which is really fast (/usr/bin/gcc i49.o -o i49 -Xlinker -L/usr/local/lib -Xlinker -lphobos-ldc -ldl -lpthread -lm -m64).
Seems like the LLVM instruction scheduler goes crazy on the generated LLVM IR – almost all the time is spent there, and the emitted code is horrible: it unrolls the loop into an endless series of movs, instead of just emitting a rep stos like DMD does…
@dansanduleac: You can get really detailed log output for the LDC »glue code« part with -vv. In this case, it isn't really helpful, but it would at least tell you that the time is spent after the modules are passed to LLVM for codegen.
Reduced:
alias char[4096*2] Big;
void foo(Big big) {}
void main() {
Big buf = 0x30;
foo(buf);
}
… and some more:
define x86_stdcallcc i32 @foo([8192 x i8] %big_arg) {
entry:
ret i32 1
}
define x86_stdcallcc i32 @_Dmain({ i64, { i64, i8* }* } %unnamed) {
entry:
%buf = alloca [8192 x i8], align 1
%tmp4 = load [8192 x i8]* %buf
%tmp5 = call x86_stdcallcc i32 @foo([8192 x i8] %tmp4)
ret i32 %tmp5
}
Seems like parameter passing is the issue – but why?
According to Duncan Sands at #llvm, we should generate i32 @foo(byval [8192 x i8]* %big_arg) instead.
What's the purpose of [8192 x i8] as parameter type then if structs and arrays are to be passed via pointer + byval anyways?
Can't really make sense out of http://llvm.org/docs/LangRef.html#byval
@Trass3r: Apparently, it is there for cases like complex numbers, where you really want to have a pair/… of registers.
Apparently we need to pass around pointer types (e.g. [8192 x i8]* byval) in order to use the byval qualifier (which looks like it might act in a copy-on-write way, but nevertheless does what it should). I managed to make a patch that fixes this issue and also doesn't allocate any global memory (which I was afraid would happen when using a pointer value in LLVM). I will submit a pull request in a bit :)
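For anyone who wants to check the "does what it should" part, here is a minimal sketch of the semantics that have to survive the switch to byval: a static array passed by value is a copy, so writes in the callee must never show up in the caller. The names clobber and callerBufferUnchanged are made up for this illustration and are not part of the patch or of druntime/phobos.

```d
alias char[4096*2] Big;

// Hypothetical helper: overwrites its by-value parameter.
// With correct by-value semantics this only touches the callee's copy.
void clobber(Big big)
{
    big[] = 'x';
}

// Hypothetical check: returns true if the caller's buffer is untouched
// after passing it by value to clobber().
bool callerBufferUnchanged()
{
    Big buf = 0x30;          // every element is 0x30, i.e. '0'
    clobber(buf);
    foreach (c; buf)
    {
        if (c != 0x30)
            return false;    // the callee's write leaked into the caller
    }
    return true;
}

void main()
{
    assert(callerBufferUnchanged());
}
```

This only checks behaviour, not the generated code; whether the backend copies eagerly or not, the assertion should hold.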
With this fix, the druntime benchmarks also seem to have become much faster (aabench/string now takes 0.31s instead of 3.68s on 64-bit OSX with phobos as a shared library)!