mewmew commented on September 22, 2024

@pupiles, I tried compiling the LLVM IR example you provided using Clang (13.0.0), but got the same error:

u@x1 /t/foo [1]> clang -o foo foo.ll
foo.ll:1437:287: error: expected '('
  invoke void (%"class.std::__cxx11::basic_string"*, i32 (i8*, i64, i8*, %struct.__va_list_tag*)*, i64, i8*, ...) @_ZN9__gnu_cxx12__to_xstringINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEcEET_PFiPT0_mPKS8_P13__va_list_tagEmSB_z(%"class.std::__cxx11::basic_string"* nonnull sret align 8 %6, i32 (i8*, i64, i8*, %struct.__va_list_tag*)* nonnull @vsnprintf, i64 32, i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str.39, i64 0, i64 0), i64 %137)
                                                                                                                                                                                                                                                                                              ^
1 error generated.

So it seems the official LLVM tools are also unable to parse the gdpr_handler.cpp.o LLVM IR file. Try generating a new one using Clang 13.0.0.

Cheers,
Robin

dannypsnl commented on September 22, 2024

@pupiles btw, do you know what generates this file? It might be a new feature of LLVM.

pupiles commented on September 22, 2024

@mewmew @dannypsnl,
It is generated by Clang 11, and llc-11 processes it correctly, but I can't parse it with llir/llvm, either v0.3.3 (LLVM 11) or v0.3.4 (LLVM 12).

dannypsnl commented on September 22, 2024

@mewmew

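package main

import (
	"fmt"
	"github.com/llir/llvm/ir"
	"github.com/llir/llvm/ir/constant"
	"github.com/llir/llvm/ir/types"
	"github.com/llir/llvm/ir/value"
)

func main() {
	m := ir.NewModule()
	// Placeholder struct bodies; only the type names matter for reproducing the call.
	basicString := m.NewTypeDef("class.std::__cxx11::basic_string", types.NewStruct(types.I8))
	vaListTag := m.NewTypeDef("struct.__va_list_tag", types.NewStruct(types.I8))
	// i32 @vsnprintf(i8*, i64, i8*, %struct.__va_list_tag*) is not variadic.
	vsnprintf := m.NewFunc("vsnprintf", types.I32,
		ir.NewParam("", types.NewPointer(types.I8)),
		ir.NewParam("", types.I64),
		ir.NewParam("", types.NewPointer(types.I8)),
		ir.NewParam("", types.NewPointer(vaListTag)),
	)
	// The invokee is the variadic function, and its first parameter is a pointer.
	invokee := m.NewFunc("_ZN9__gnu_cxx12__to_xstringINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEcEET_PFiPT0_mPKS8_P13__va_list_tagEmSB_z",
		types.Void,
		ir.NewParam("", types.NewPointer(basicString)),
		ir.NewParam("", vsnprintf.Typ),
		ir.NewParam("", types.I64),
		ir.NewParam("", types.NewPointer(types.I8)),
	)
	invokee.Sig.Variadic = true
	mainFunc := m.NewFunc("main", types.I32)
	entry := mainFunc.NewBlock("entry")
	normal := mainFunc.NewBlock("normal")
	unwind := mainFunc.NewBlock("unwind")
	// Globals stand in for %6, @.str.39 and %137 of the original IR.
	str := m.NewGlobal("str", basicString)    // basic_string*
	format := m.NewGlobal("format", types.I8) // i8*
	n := entry.NewLoad(types.I64, m.NewGlobal("n", types.I64))
	args := []value.Value{str, vsnprintf, constant.NewInt(types.I64, 32), format, n}
	entry.NewInvoke(invokee, args, normal, unwind)
	normal.NewRet(constant.NewInt(types.I32, 0))
	// A valid unwind block needs a landingpad and a personality; omitted in this draft.
	unwind.NewUnreachable()
	fmt.Println(m)
}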

A draft to start from.

mewmew commented on September 22, 2024

It seems there are two primary issues with the LLVM IR file that cause the parsing to fail.

Firstly, sret parameter attributes without an explicit type are valid in LLVM 11.0, but not in LLVM 13.0 (see parseRequiredTypeAttr in the official LLVM source code). In LLVM 13.0, an explicit type is needed, e.g.

sret(i8)

This was verified by parsing the original gdpr_handler.ll file with opt from LLVM 13.0 (opt -S -o foo_13.ll gdpr_handler.ll).

A work-around is simply to remove sret from the input LLVM IR file.

Secondly, there is a known issue with llir/llvm where it is unable to parse align attributes. This is due to an LR(1) shift/reduce ambiguity in the original LLVM IR grammar (as described in #40).

If we remove the align and sret attributes, then llir/llvm (using its llvm13 branch) is able to parse the output produced by opt -S -o foo_13.ll gdpr_handler.ll with LLVM 13.0. Note that support for the DIFlagExportSymbols enum was added in 4653d58.
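
The work-around above can be scripted as a text filter. What follows is a minimal Go sketch, assuming line-oriented textual IR; `stripAttrs` and both patterns are hypothetical helpers, not part of llir/llvm. It drops `sret` with or without an explicit type (types containing parentheses are not handled) and only those `align N` occurrences that are not preceded by a comma, on the assumption that the `, align 4` suffixes of instructions and globals parse fine and should be kept.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

// sretAttr matches " sret" or " sret(<type>)"; types containing parentheses
// (e.g. function pointer types) are not handled by this sketch.
var sretAttr = regexp.MustCompile(`\s+sret\b(?:\([^()]*\))?`)

// paramAlign matches "align N" in attribute position, i.e. NOT preceded by a
// comma, so instruction/global suffixes such as ", align 4" are left alone.
var paramAlign = regexp.MustCompile(`([^,\s])\s+align\s+\d+`)

// stripAttrs removes sret and attribute-position align from one line of IR.
func stripAttrs(line string) string {
	line = sretAttr.ReplaceAllString(line, "")
	return paramAlign.ReplaceAllString(line, "${1}")
}

func main() {
	// Filter textual LLVM IR from stdin to stdout, one line at a time.
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 1024*1024), 64*1024*1024) // IR lines can be very long
	for sc.Scan() {
		fmt.Println(stripAttrs(sc.Text()))
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Used as a filter, e.g. `go run main.go < in.ll > out.ll`; the problematic argument from the error above, `%"class.std::__cxx11::basic_string"* nonnull sret align 8 %6`, becomes `%"class.std::__cxx11::basic_string"* nonnull %6`.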

Cheers,
Robin

pupiles commented on September 22, 2024

@mewmew
Thanks for your reply,
For the sret parameter attribute: it indicates the return value of the function, so I think it is important for data-flow analysis, and it may not be a good idea to remove it outright. If I don't care about the explicit type, is there any other solution?
The align \d+ attribute only specifies alignment, so it can be removed.
Some users may not care about strict LLVM IR. Would it be feasible to provide an option that makes the parser ignore align \d+ when it hits the ambiguity, instead of reporting an error?

mewmew commented on September 22, 2024

> Thanks for your reply,

You are most welcome :)

> For the sret parameter attribute: it indicates the return value of the function, so I think it is important for data-flow analysis, and it may not be a good idea to remove it outright. If I don't care about the explicit type, is there any other solution?

The grammar of LLVM 11.0 supported implicit sret, but for LLVM 13.0, an explicit type is required. This is also true for the official LLVM distribution.
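
Because the required type is the pointee type of the `sret` pointer parameter, implicit `sret` can also be upgraded mechanically instead of being removed, which keeps the attribute available for data-flow analysis. What follows is a minimal Go sketch, assuming line-oriented textual IR in which the `sret` parameter is a pointer to a named type (`%T*` or `%"quoted name"*`); other pointee types are left untouched. `addExplicitSret` is a hypothetical helper, not part of llir/llvm or LLVM.

```go
package main

import (
	"fmt"
	"regexp"
)

// implicitSret matches a pointer to a named type (%T* or %"quoted name"*),
// optionally followed by other attributes, then a bare "sret" without a
// "(type)" argument. Group 1 is the pointee type, group 2 the attributes in
// between, group 3 the character that follows "sret".
var implicitSret = regexp.MustCompile(`(%(?:"[^"]*"|[-\w.$]+))\*((?:\s+\w+)*?)\s+sret([\s,)])`)

// addExplicitSret rewrites LLVM 11-style "T* ... sret" into LLVM 13-style
// "T* ... sret(T)", keeping the attribute instead of deleting it.
func addExplicitSret(line string) string {
	return implicitSret.ReplaceAllString(line, "${1}*${2} sret(${1})${3}")
}

func main() {
	in := `invoke void @f(%"class.std::__cxx11::basic_string"* nonnull sret align 8 %6)`
	fmt.Println(addExplicitSret(in))
}
```

Running `main` prints the argument from the original error with `sret(%"class.std::__cxx11::basic_string")` filled in. The rewrite is idempotent, since an already explicit `sret(` is not matched.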

> Some users may not care about strict LLVM IR. Would it be feasible to provide an option that makes the parser ignore align \d+ when it hits the ambiguity, instead of reporting an error?

That's a good idea. I'm not sure if it is possible, but definitely worth investigating.

Would you care to take a look, @pupiles?

The generated lexer and parser are in llir/ll, and the grammar is at llir/grammar. The tool used to generate the lexer and parser is Textmapper. There is some documentation for Textmapper at https://textmapper.org/

Cheers,
Robin

dannypsnl commented on September 22, 2024

Maybe off-topic, but perhaps we could take the asm parser source code from the LLVM source tree, compile it, and link it with our Go code? The problems I can see are:

  1. We have to check the license.
  2. We would need to port this back to the older versions that map to different LLVM releases.
  3. We need to convert C/C++ structs back to our Go structs.

The benefits I can see are:

  1. We get the same behavior as the official parser.
  2. We don't have to maintain a parser.
  3. We are no longer limited by LR(1), which is not enough for LLVM IR.
  4. More accurate errors.

dannypsnl commented on September 22, 2024

@mewmew, since LLVM 13 is now supported, is this solved?

mewmew commented on September 22, 2024

Given that the llvm13 branch has been merged into master, the work-around mentioned in #212 (comment) should be enough to parse the LLVM IR example source.

The align ambiguity still remains, but it is already tracked by #40, so we can safely close this issue.

Cheers,
Robin

P.S. Feel free to re-open this issue, or open a new one, if there is a parse error related to LLVM 13.0 or LLVM 14.0.
