beevik / etree
Parse and generate XML easily in Go
License: BSD 2-Clause "Simplified" License
XML file:
.....
Then I get this error:
panic: xml: encoding "ISO-8859-1" declared but Decoder.CharsetReader is nil
My code is:
doc := etree.NewDocument()
if err := doc.ReadFromFile("./test.xml"); err != nil {
panic(err)
}
Normally, I would do an Xpath query like this one:
//*[contains(concat(' ', normalize-space(@class), ' '), ' LookForClass ') and text()='TheTextInTheClass']/../..
How should I do this?
In the docs I read:
XPath-like path string. Panics if an invalid path string is supplied.
I get : etree: path has invalid filter [brackets].
So after trying and trying.... This is what I found out:
//*[@class='LookForClass'][text()='TheTextInTheClass']/../..
Perhaps this can help others as an example.
I need a way to add an etree Element under another etree Element.
Trying to explain in code:
doc := etree.NewDocument()
doc.ReadFromFile("bookstore.xml")
root := doc.SelectElement("bookstore")
Now root is an etree Element under which there are a bunch of <book> XML elements.
Suppose now I have
docMore := etree.NewDocument()
docMore.ReadFromString(xmlMoreBooks)
The question is: how can I add docMore as new entries under the root etree Element?
I think such a feature would be needed by others as well. Please consider adding it.
Thanks
Is something like this possible:
for _, e := range doc.FindElements("./bookstore/book[1]/*") {
fmt.Printf("%s: %s\n", e.Tag, e.Content())
}
Which would show the content of the given search
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<p:price>30.00</p:price>
When I do e.Text() I only get the text value, but I need it all.
Need AddChild and Insert functions.
When a path contains a '/' that is not a path separator (for example inside a quoted filter value, as below), etree's path compilation fails.
//http[@internet='web']//url[@pattern='/web/app' ]
This is due to line 192 in path.go
for _, s := range strings.Split(path, "/") {
Any suggestions for a fix?
Need getnext and getprev functions for when we insert an element.
Hi,
I'm currently facing an issue where I can't explain the etree behaviour. The following code demonstrates it. I want to parse an HTML string as illustrated below, change the attribute of an element, and reprint the HTML when done.
string = "<p><center><code>git clone https://github.com/AlexeyAB/darknet.git</code></center></p>"
test = etree.fromstring(string, parser)
print(etree.tostring(test, pretty_print=True, method="html"))
I get this output:
<html><body>
<p></p>
<center><code>git clone https://github.com/AlexeyAB/darknet.git</code></center>
</body></html>
As you can see (let's ignore the <html> and <body> tags etree adds), the order of the nodes has been changed! The <p> tag that used to wrap the <center> tag now loses its content, and that content gets added after the </p> tag closes. Eh?
When I omit the <center> tag, all of a sudden the parsing is done right:
string = "<p><code>git clone https://github.com/AlexeyAB/darknet.git</code></p>"
test = etree.fromstring(string, parser)
print(etree.tostring(test, pretty_print=True, method="html"))
With output:
<html><body><p><code>git clone https://github.com/AlexeyAB/darknet.git</code></p></body></html>
Am I doing something wrong here? I have to use the HTML parser because I get a lot of parsing errors when not using it. I also can't change the order of the <p> and <center> tags, as I read them this way.
I checked https://godoc.org/github.com/beevik/etree#Path and it seems the (X)Path OR operator for different nodes searching is not there.
Please consider providing that feature, as it'd be very useful, but not too difficult to do.
To recap, from the above Stack Overflow question:
The XPath OR operator of
//bookstore/book/title or //bookstore/city/zipcode/title
is expressed as:
//bookstore/book/title|//bookstore/city/zipcode/title
Thanks!
Currently, when reading XML files with CRLF line endings, these will be converted to LF when writing the XML back to disk. How could I force a different line ending? The software that uses those XML files needs CRLF line endings (it expects line breaks in text blocks with CRLF and nothing else).
Given an XML like
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<library>
<!-- Great book. -->
<book id="b0836217462" available="true">
<isbn>0836217462</isbn>
<title lang="en">Being a Dog Is a Full-Time Job</title>
<quote>I'd dog paddle the deepest ocean.</quote>
<author id="CMS">
<name>Charles M Schulz</name>
<born>1922-11-26</born>
<dead>2000-02-12</dead>
</author>
<character id="PP">
<name>Peppermint Patty</name>
<born>1966-08-22</born>
<qualification>bold, brash and tomboyish</qualification>
</character>
<character id="Snoopy">
<name>Snoopy</name>
<born>1950-10-04</born>
<qualification>extroverted</qualification>
<qualification>beagle</qualification>
</character>
</book>
</library>
A query like library/book/character[qualification='beagle']/qualification would return all qualification elements of character for every character with a qualification='beagle'. It'd be good to allow text() XPath queries so that a query like library/book/character/qualification[text()='beagle'] only returns the nodes of type qualification whose text is beagle.
Deprecated: InsertChild is deprecated. Use InsertChildAt instead.
Please don't deprecate InsertChild(), because InsertChildAt won't work for my case.
The XML file that I'm working on has a rigid format of where things are:
<A attr=... >
<B attr=... />
<C attr=... />
<D attr=... />
</A>
B comes before C, which comes before D. I know the order doesn't matter to XML, but I'm tracking the file with version control, so I'd prefer as little change as possible.
Whether I do doc.InsertChildAt(0, c) or doc.InsertChildAt(1, c), C will always be inserted before B, whereas I need it after B but before D (after I've removed C beforehand).
Was I using InsertChildAt incorrectly, or is InsertChild() just not replaceable in my case? Thx.
func (e *Element) Text() string {
if len(e.Child) == 0 {
return ""
}
text := ""
for _, ch := range e.Child {
if cd, ok := ch.(*CharData); ok {
if text == "" {
text = cd.Data
} else {
text += cd.Data
}
} else {
break
}
}
return text
}
When I used this function to get the CharData in a tag, a problem happened: if an element has two children and the first is not CharData, the function never checks the second.
This is not like etree.
Nice work with this package!
I have a question regarding this scenario:
doc.FindElement("//This/Element/Does/Not/Exists")
Is there a way to check that this path actually exists? Currently I get:
--- FAIL: TestXMLResp (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x6618b4]
Consider the following document partial (source: https://community.cablelabs.com/wiki/plugins/servlet/cablelabs/alfresco/download?id=8f900e8b-d1eb-4834-bd26-f04bd623c3d2 , Appendix I.1)
<?xml version="1.0" ?>
<ADI>
<Metadata>
<AMS Provider="InDemand" Product="First-Run" Asset_Name="The_Titanic" Version_Major="1" Version_Minor="0" Description="The Titanic asset package" Creation_Date="2002-01-11" Provider_ID="indemand.com" Asset_ID="UNVA2001081701004000" Asset_Class="package"/>
<App_Data App="MOD" Name="Provider_Content_Tier" Value="InDemand1" />
<App_Data App="MOD" Name="Metadata_Spec_Version" Value="CableLabsVod1.1" />
</Metadata>
</ADI>
While I can use a Path like //AMS[@Asset_Class='package']/../App_Data[@Name='Provider_Content_Tier'] to get to a desired Element, I am not able to perform an XPath-style path search to extract just the data in the Value attribute for the identified elements as a []string. Most other XPath implementations support a path such as //AMS[@Asset_Class='package']/../App_Data[@Name='Provider_Content_Tier']/@Value to extract attribute values directly from the Path.
This would be a really great feature to have to allow us to port a legacy app over to Go, without having to refactor our existing paths that perform the attribute extractions.
I'll take a stab at implementing in the coming days.
Given the following xml document
<root>
<a><b><c>...</c></b></a>
<a><b>...</b></a>
<a><b>...</b></a>
<a><b>...</b></a>
</root>
It seems like there is currently no way to specify an XPath expression that selects only the <a> nodes that have a <c> descendant somewhere below them. I'm looking for support for something like this
//a[.//c]
or alternatively //a[b/c].
Currently, such XPath expressions fail with etree: path has invalid filter [brackets].
In order to adhere to some stupid API, I have to provide some XML with a blank space where an element will be placed.
Is it possible to remove an element and leave its whitespace in the document? I've tried removing an element and then inserting etree.NewElement(""), but that inserts </>.
If there's a way I can do it using this library, awesome, if not I'll have to just manipulate it as a text file
This is the HTML (a piece of material design light)
<body>
<!-- Wide card with share menu button -->
<div class="demo-card-wide mdl-card mdl-shadow--2dp">
<div class="mdl-card__title">
<h2 class="mdl-card__title-text">Welcome</h2>
</div>
<div class="mdl-card__supporting-text">
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Mauris sagittis pellentesque lacus eleifend lacinia...
</div>
<div class="mdl-card__actions mdl-card--border">
<a class="mdl-button mdl-button--colored mdl-js-button mdl-js-ripple-effect">
Get Started
</a>
</div>
<div class="mdl-card__menu">
<button class="mdl-button mdl-button--icon mdl-js-button mdl-js-ripple-effect">
<i class="material-icons">share</i>
</button>
</div>
</div>
<!-- Wide card with share menu button -->
<div class="demo-card-wide mdl-card mdl-shadow--2dp">
<div class="mdl-card__title">
<h2 class="mdl-card__title-text">Welcome</h2>
</div>
<div class="mdl-card__supporting-text">
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Mauris sagittis pellentesque lacus eleifend lacinia...
</div>
<div class="mdl-card__actions mdl-card--border">
<a class="mdl-button mdl-button--colored mdl-js-button mdl-js-ripple-effect">
Get Started
</a>
</div>
<div class="mdl-card__menu">
<button class="mdl-button mdl-button--icon mdl-js-button mdl-js-ripple-effect">
<i class="material-icons">share</i>
</button>
</div>
</div>
</body>
and I want to check the number of "cards" on the page.
I can do that with the following xpath
//*[contains(concat(' ', normalize-space(@class), ' '), ' mdl-card ')]
(I've tested this with https://www.freeformatter.com/xpath-tester.html)
It works... and shows the 2 elements, and I can see in the html it has 2.
And now for my code: (memHtml holds the HTML in memory...)
docLoc := etree.NewDocument()
if err := docLoc.ReadFromString(memHtml); err != nil {
panic(err)
}
counter := len(docLoc.FindElements(element))
fmt.Println("counter", counter)
And it prints 0 (zero)
By the way, I also tried the same XPath wrapped in "(" and ")", which is also valid XPath:
(//*[contains(concat(' ', normalize-space(@class), ' '), ' mdl-card ')])
But then etree complains about brackets.
Currently, SelectElement doesn't return anything indicating an error while trying to select an element in the XML string.
Trying to do further operations on the returned node causes the program to Panic with SEGFAULT.
For example:
Consider the following XML:
data := "
vmx
"
Now, if I try to run the following code:
xmlDoc := etree.NewDocument()
XMLReadError := xmlDoc.ReadFromString(capabilitiesXML)
if XMLReadError != nil {
log.Printf("Unable to read the Capabilities XML: %s", XMLReadError)
return
}
root := xmlDoc.SelectElement("host")
cpu := root.SelectElement("cpu")
The above code causes the program to panic with a SEGFAULT.
Returning an error when the requested element is not found seems a better approach for avoiding these kinds of issues. Alternatively, the documentation could state that nil is returned when the element is not found, so callers know to check for it.
About TestAddChild(): lines 316 to 318 are
testdoc := `<book lang="en">
<t:title>Great Expectations</t:title>
<author>Charles Dickens</author>
`
I.e., the <t:title> and <author> are not enclosed within <book> but parallel to it. Right? (Ref: L333)
If so, FindElements("//book/*") should return empty, right? I.e., the root.AddChild(e) result seems incorrect to me.
What am I missing? Thx.
The test suite currently uses Fail instead of Errorf (or similar). This is problematic as test failures provide no indication on what went wrong.
I.e. currently a test failure looks like this:
$ go test .
--- FAIL: TestDocument (0.00 seconds)
FAIL
FAIL github.com/felixge/etree 0.017s
When using Errorf, it would look like this:
$ go test .
--- FAIL: TestDocument (0.00 seconds)
etree_test.go:58: custom error message printed here
FAIL
FAIL github.com/felixge/etree 0.018s
I'd be happy to submit a patch that replaces all occurrences of Fail() with more sensible error reporting. Let me know.
The reverse side of question #28: is it possible to select those specific nodes that have no child nodes? Thx.
In etree, Attr.NamespaceURI always returns the containing element's namespace. This is wrong for two reasons:
Unprefixed attributes get no namespace assigned. This is different from elements. See XML Names 6.2:
The namespace name for an unprefixed attribute name always has no value.
For prefixed attributes, the prefix should be resolved into a URI.
Using a shortened etree test case:
<root xmlns="http://root.example.com"
xmlns:attrib="http://attrib.example.com"
a="foo"
attrib:b="bar" />
Let's use xmlstarlet, which is just a neat libxml2 CLI frontend.
$ xmlstarlet sel -N root=http://root.example.com -N attrib=http://attrib.example.com -N notattrib=http://attrib.example.com \
-t -m '/root:root' \
-o '@a[namespace-uri()=""]: ' -v '@a[namespace-uri()=""]' -nl \
-o '@root:a[namespace-uri()="http://root.example.com"]:' -v '@root:a[namespace-uri()="http://root.example.com"]' -nl \
-o '@b[namespace-uri()=""]:' -v '@b[namespace-uri()=""]' -nl \
-o '@root:b[namespace-uri()="http://root.example.com"]:' -v '@root:b[namespace-uri()="http://root.example.com"]' -nl \
-o '@attrib:b[namespace-uri()="http://attrib.example.com"]: ' -v '@attrib:b[namespace-uri()="http://attrib.example.com"]' -nl \
-o '@notattrib:b[namespace-uri()="http://attrib.example.com"]: ' -v '@notattrib:b[namespace-uri()="http://attrib.example.com"]' -nl \
test.xml
Which produces the following output:
@a[namespace-uri()=""]: foo
@root:a[namespace-uri()="http://root.example.com"]:
@b[namespace-uri()=""]:
@root:b[namespace-uri()="http://root.example.com"]:
@attrib:b[namespace-uri()="http://attrib.example.com"]: bar
@notattrib:b[namespace-uri()="http://attrib.example.com"]: bar
Note that without the test for namespace-uri(), all of the results except the last one would be correct in etree, since etree does not resolve prefixes for attribute and element selection (a different bug).
EDIT: messed up shell quoting.
Compare
package main
import (
"fmt"
"github.com/beevik/etree"
)
const xmlData = `<root xmlns:b='foo'><a /><b:b /><b:b xmlns:b='bar' /></root>`
func main() {
doc := etree.NewDocument()
doc.ReadFromString(xmlData)
fmt.Printf("%+v\n", doc.FindElements("//b"))
fmt.Printf("%+v\n", doc.FindElements("//b:b"))
}
which produces
$ go run showcase.go
[0xc0000b4240 0xc0000b42a0]
[0xc0000b4240 0xc0000b42a0]
to
import xml.etree.ElementTree as ET
import io
XML_DATA = "<root xmlns:b='foo'><a /><b:b /><b:b xmlns:b='bar' /></root>"
doc = ET.parse(io.StringIO(XML_DATA))
print(doc.findall('./b'))
# Fails, prefix b not defined
# print(doc.findall('.//b:b'))
print(doc.findall('./b:b', {'b': 'foo'}))
print(doc.findall('./b:b', {'b': 'bar'}))
# different prefix, still finds the same element!
print(doc.findall('./c:b', {'c': 'bar'}))
which results in
$ python showcase.py
[]
[<Element '{foo}b' at 0x7f0701e43e90>]
[<Element '{bar}b' at 0x7f0701e43ef0>]
[<Element '{bar}b' at 0x7f0701e43ef0>]
Note that in the Go version, both queries return both elements that have b as local name, and prefixes are only compared as text strings. The Python version is correct with regard to namespaces since:
It would be nice if your etree package offered similar features. Searching by prefix only is a blocker when receiving XML documents where prefixes are unknown (like the output of Go's XML Encoder, which uses strange, but correct, prefix names and placement).
How would you search for an XML element by namespace at all?
As I understand, ReadFromFile stores the entire file's content in memory before we can do whatever we have to do with the nodes.
This is not ideal when dealing with very large files.
Is there currently a way to process the nodes as we walk through the file, therefore avoiding the need to store it all in memory at once? Hope that makes sense
Hi,
I have xml like:
<nodes>
<node>
<nodeID>2</nodeID>
<args>
<arg0>
<source>1</source>
</arg0>
<arg1>
<source>2</source>
</arg1>
</args>
</node>
</nodes>
I am trying to get "arg*" elements. Not sure how to phrase this. Awesome library by the way. I'm just having to deal with some bad xml at the moment.
doc := etree.NewDocument()
if err := doc.ReadFromFile(filename); err != nil {
panic(err)
}
nodes := doc.SelectElement("nodes")
for _, node := range nodes.SelectElements("node") {
args := node.SelectElement("args")
for i, arg := range args.FindElements("arg*") {
// blah
}
}
Thanks!
doc.WriteTo is adding an extra newline after the BOM. I've illustrated it with et_dump.go and et_dump.xml, which you can find under https://github.com/suntong/lang/blob/master/lang/Go/src/xml/.
Here is the result:
$ go run et_dump.go | diff -wU 1 et_dump.xml -
--- et_dump.xml 2016-03-08 16:40:41.667010100 -0500
+++ - 2016-03-08 16:40:57.842603083 -0500
@@ -1,4 +1,4 @@
-<?xml version="1.0" encoding="utf-8"?>
+
+<?xml version="1.0" encoding="utf-8"?>
<bookstore xmlns:p="urn:schemas-books-com:prices">
-
<book category="COOKING">
@@ -9,3 +9,2 @@
</book>
-
<book category="CHILDREN">
...
@@ -34,3 +31,2 @@
</book>
-
</bookstore>
\ No newline at end of file
I.e., an extra newline is added after the BOM. This seems to be a trivial issue, but it causes my Microsoft Visual Studio to fail to recognize the webtest file that such a dump creates. :-(
Please consider removing the added extra new line.
Thanks
When reading from file (via ReadFrom() or ReadFromFile()), is it possible to skip the BOM (https://en.wikipedia.org/wiki/Byte_order_mark) char?
Every file created by MS under Windows has that witched char, which is very hard to get rid of.
So it'd be great if etree could skip it when reading from file.
The following file will fail:
$ cat et_example.xml | hexdump -C
00000000 ff fe 3c 00 62 00 6f 00 6f 00 6b 00 73 00 74 00 |..<.b.o.o.k.s.t.|
00000010 6f 00 72 00 65 00 3e 00 0d 00 0a 00 20 00 3c 00 |o.r.e.>..... .<.|
...
with the following error
panic: XML syntax error on line 1: invalid UTF-8
Hmm, wait, is it because of BOM or the UTF16 encoding?
thx
When outputting XML, indentation is not the only control necessary. A prefix string is necessary as well, for XML chunks that do not start at the root level. Ref: https://godoc.org/encoding/xml#MarshalIndent, which has two controls: prefix and indent string.
Sometimes I have to deal with non-standard implementations and I don't always know what Namespaces are going to be in there. Is there a possibility of adding support of finding elements with a namespace of *?
In https://golang.org/pkg/encoding/xml/ there is support for a tag ",innerxml", where the field is written verbatim. Is there support in etree for something similar?
For e.g.
doc.CreateInnerXML("<users/>")
From https://golang.org/pkg/encoding/xml/#Marshal
The XML element for a struct contains marshaled elements for each of the exported fields of the struct, with these exceptions:
- the XMLName field, described above, is omitted.
- a field with tag "-" is omitted.
- a field with tag "name,attr" becomes an attribute with
the given name in the XML element.
- a field with tag ",attr" becomes an attribute with the
field name in the XML element.
- a field with tag ",chardata" is written as character data,
not as an XML element.
- a field with tag ",cdata" is written as character data
wrapped in one or more <![CDATA[ ... ]]> tags, not as an XML element.
- a field with tag ",innerxml" is written verbatim, not subject
to the usual marshaling procedure.
- a field with tag ",comment" is written as an XML comment, not
subject to the usual marshaling procedure. It must not contain
the "--" string within it.
- a field with a tag including the "omitempty" option is omitted
if the field value is empty. The empty values are false, 0, any
nil pointer or interface value, and any array, slice, map, or
string of length zero.
- an anonymous struct field is handled as if the fields of its
value were part of the outer struct.
XML structure:
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="uuid_id" version="2.0">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
<dc:identifier opf:scheme="uuid" id="uuid_id"></dc:identifier>
<dc:title></dc:title>
<dc:creator opf:role="aut"></dc:creator>
<dc:language>eng</dc:language>
</metadata>
</package>
Go code:
doc.FindElement("/package/metadata/*[1]").SetText("test") //fails
doc.FindElement("/package/metadata/dc:identifier").SetText("test") //fails
This is my XML:
<if test="page < 0" >limit #{page}, #{size}</if>
The XML cannot be loaded because of the '<' in the attribute value; I think it should be parsed.
Instead it panics: XML syntax error on line 30: unescaped < inside quoted string [recovered]
The valid character range for XML is specified as:
Char := #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
When producing XML using WriteTo, a set of strings.Replacer objects in etree.go is used to escape strings.
However, these replacers fail to replace some characters that are not valid in XML (e.g. 0xB, vertical tab).
This leads to invalid XML being produced.
The Go standard package xml replaces invalid characters with \uFFFD (Unicode replacement character), which appears to be common practice across other XML libraries.
I propose using xml.EscapeText instead of the current approach based on strings.NewReplacer(...).
Hi @beevik,
Can you give an example using RemoveChild(), please?
Also, I searched the _test files, and it seems that there is no test coverage for RemoveChild() either.
thx
Hi, I have some code where I am using FindElements to get a slice of response elements, then calling FindElement on those to get their children. I am getting the same result for each iteration of the loop, i.e. the href is always "/1" instead of "/1" for the first response, then "/2" for the next, etc. Have I done something wrong or misunderstood the API? Or is this a bug?
package example
import (
"testing"
"fmt"
"github.com/beevik/etree"
)
func TestXML(t *testing.T) {
xml := `<multistatus xmlns="DAV:">
<response>
<href>/1</href>
<propstat>
<prop>
<getetag>A</getetag>
</prop>
<status>HTTP/1.1 200 OK</status>
</propstat>
</response>
<response>
<href>/2</href>
<propstat>
<prop>
<getetag>B</getetag>
</prop>
<status>HTTP/1.1 200 OK</status>
</propstat>
</response>
<response>
<href>/3</href>
<propstat>
<prop>
<getetag>C</getetag>
</prop>
<status>HTTP/1.1 200 OK</status>
</propstat>
</response>
<response>
<href>/4</href>
<propstat>
<prop>
<getetag>D</getetag>
</prop>
<status>HTTP/1.1 200 OK</status>
</propstat>
</response>
<response>
<href>/5</href>
<propstat>
<prop>
<getetag>E</getetag>
</prop>
<status>HTTP/1.1 200 OK</status>
</propstat>
</response>
</multistatus>`
doc := etree.NewDocument()
if err := doc.ReadFromString(xml); err != nil {
t.Fatalf("Failed to parse xml: %s", err.Error())
}
responseElements := doc.FindElements("//response")
if len(responseElements) != 5 {
t.Fatalf("Expected 5 response elements, got %d", len(responseElements))
}
for n, el := range responseElements {
hrefEl := el.FindElement("//href")
etagEl := el.FindElement("//getetag")
if hrefEl != nil && etagEl != nil {
href := hrefEl.Text()
expectedHref := fmt.Sprintf("/%d", 1+n)
if href != expectedHref {
t.Fatalf("Expected href %s, got %s", expectedHref, href)
}
etag := etagEl.Text()
expectedEtag := fmt.Sprintf("%c", 'A'+n)
if etag != expectedEtag {
t.Fatalf("Expected etag %s, got %s", expectedEtag, etag)
}
} else {
t.Fatalf("Missing href and/or etag")
}
}
}
Hi @beevik
After I did some tests I got the same error presented in issue #48.
Here is a sample of code + xml data :
var xmlDoc = `<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE broadWorksCDR>
<broadWorksCDR version="19.0">
<cdrData>
<headerModule>
<recordId>
<eventCounter>0002183384</eventCounter>
<systemId>CGRateSaabb</systemId>
<date>20160419210000.104</date>
<systemTimeZone>1+020000</systemTimeZone>
</recordId>
<type>Start</type>
</headerModule>
</cdrData>
</broadWorksCDR>
`
doc := etree.NewDocument()
if err := doc.ReadFromBytes([]byte(xmlDoc)); err != nil {
t.Error(err)
}
First I tried doc.ReadFromString, then doc.ReadFromBytes, but got the error both times.
Thanks,
TeoV
For instance the CardDAV prefix is urn:ietf:params:xml:ns:carddav. This doesn't play well with spaceDecompose:
Line 144 in 23e6ba8
The standard Go library is using a space character, probably because of this. Thoughts?
Given this document:
<?xml version="1.0" encoding="utf-8" ?>
<D:propfind xmlns:D="DAV:">
<D:prop xmlns:R="http://ns.example.com/boxschema/">
<R:bigbox/>
<R:author/>
<R:DingALing/>
<R:Random/>
</D:prop>
</D:propfind>
doc.Root().SelectElements("DAV:prop") returns zero elements
doc.Root().SelectElements("D:prop") returns one element
However, the D prefix is arbitrary; it could be set to any token really (e.g. xmlns:myawesomeprefix="DAV:"). SelectElements shouldn't care about it; it should resolve the prefix depending on xmlns attributes.
Thanks a ton for this package - super useful for my work.
I'm parsing some RSS feeds that contain HTML wrapped in <![CDATA[ ... ]]> sections, with formatted HTML for post descriptions, content, etc. It looks like when the CDATA section is preceded by a newline, the text can't be parsed out:
workingCDATAString := `
<rss>
<channel>
<item>
<summary><![CDATA[Sup]]></summary>
</item>
</channel>
</rss>
`
doc := etree.NewDocument()
doc.ReadFromString(workingCDATAString)
spew.Dump(doc.FindElement("rss").FindElement("channel").FindElement("item").FindElement("summary").Text())
// Output: (string) (len=3) "Sup"
brokenCDATAString := `
<rss>
<channel>
<item>
<summary>
<![CDATA[Sup]]>
</summary>
</item>
</channel>
</rss>
`
doc = etree.NewDocument()
doc.ReadFromString(brokenCDATAString)
spew.Dump(doc.FindElement("rss").FindElement("channel").FindElement("item").FindElement("summary").Text())
// Output: (string) (len=7) "\n\t\t\t \t\t"
I'm not familiar with XML parsing enough to say that this isn't the intended behavior, but I would expect these two code blocks to output the same thing ("Sup"). Any ideas?
Thanks for this library. Would you be interested in a patch that adds a Copy() method to Document / Element?
In the XMPP server I'm working on, one sometimes has to duplicate an incoming XML message and send out slightly modified versions of it.
I realize that I can implement Copy() in userland, but since it would probably be useful to others, I hope you're interested in a patch.
I'm creating a converter that modifies a bunch of exported XML files to be imported into a different software.
Currently, I'm using etree to read each file, add or change elements, then write it again to a different folder. But the destination software seems to need the <?xml... header, which is missing from the source, and CreateProcInst appends it rather than prepending it.
How can I prepend the ProcInst?
Hi,
I need to retrieve all text tokens of an element and its descendants, i.e. with a path looking something like //something/text().
Since Element.FindElementsPath() and friends return an Element and not a Token, it would alternatively be fine if I got a list of elements with non-nil text.
I'm ready to make a PR to add this functionality, but I would like to know beforehand how you see it:
- Should there be a new function on Element to retrieve Tokens from a path, or should we stick to the existing functions such as Element.FindElementsPath()?
- Should we use the //something/text() syntax or a different one (since we are actually retrieving the parent of the text nodes)? In this case, what would it be?
Thanks
Is it a bug that Text() only returns the first child?
The example below only shows half of the content.
<script type="text/javascript">
polymer.define('web.csrf', function (require) {
var token = "<t t-esc="request.csrf_token(None)"></t>";
require('web.core').csrf_token = token;
require('qweb').default_dict.csrf_token = token;
});
</script>
I want to change the encoding from gb2312 to utf-8, but I could only find the doc.CreateProcInst() method. Can this package do it?
<?xml version="1.0" encoding="gb2312"?>
<configuration>
<section name="CommissionManagerConfig" requirePermission="false"/>
</configuration>
Any way to query an attribute with xpath?
Was porting some code from python, mixed content was difficult/impossible to handle - see PR #60