
xmljson's Introduction

xmljson

This library is not actively maintained. Alternatives are xmltodict and untangle. Use only if you need to parse using specific XML to JSON conventions.

xmljson converts XML into Python dictionary structures (trees, like in JSON) and vice-versa.

About

XML can be converted to a data structure (such as JSON) and back. For example:

<employees>
    <person>
        <name value="Alice"/>
    </person>
    <person>
        <name value="Bob"/>
    </person>
</employees>

can be converted into this data structure (which is also a valid JSON object):

{
    "employees": {
        "person": [{
            "name": {
                "@value": "Alice"
            }
        }, {
            "name": {
                "@value": "Bob"
            }
        }]
    }
}

This uses the BadgerFish convention that prefixes attributes with @. The conventions supported by this library are:

  • Abdera: Use "attributes" for attributes, "children" for nodes
  • BadgerFish: Use "$" for text content, @ to prefix attributes
  • Cobra: Use "attributes" for sorted attributes (even when empty), "children" for nodes, values are strings
  • GData: Use "$t" for text content, attributes added as-is
  • Parker: Use tail nodes for text content, ignore attributes
  • Yahoo: Use "content" for text content, attributes added as-is
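To make these conventions concrete, here is a rough, stdlib-only sketch of the BadgerFish rules. It is hypothetical: badgerfish_data is not part of xmljson. It keeps every value as a string, skips whitespace-only text, and ignores tail text. It also assumes that repeated child tags are grouped into a list under a single key, which matches the output reported in the issues below.

```python
from xml.etree.ElementTree import Element, fromstring


def badgerfish_data(elem: Element) -> dict:
    """Hypothetical, simplified BadgerFish-style conversion (not xmljson's code).

    Attributes become "@name" keys, text becomes "$", and repeated child
    tags are collected into a list under one key. No numeric conversion.
    """
    value = {}
    for name, attr in elem.attrib.items():
        value["@" + name] = attr
    if elem.text and elem.text.strip():
        value["$"] = elem.text
    for child in elem:
        child_value = badgerfish_data(child)[child.tag]
        if child.tag not in value:
            value[child.tag] = child_value          # first occurrence
        elif isinstance(value[child.tag], list):
            value[child.tag].append(child_value)    # third and later
        else:
            value[child.tag] = [value[child.tag], child_value]  # second
    return {elem.tag: value}


print(badgerfish_data(fromstring('<p id="main">Hello<b>bold</b></p>')))
# {'p': {'@id': 'main', '$': 'Hello', 'b': {'$': 'bold'}}}
```

Because "@" and "$" cannot start an XML tag name, attribute and text keys never collide with child tags.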

Convert data to XML

To convert from a data structure to XML using the BadgerFish convention:

>>> from xmljson import badgerfish as bf
>>> bf.etree({'p': {'@id': 'main', '$': 'Hello', 'b': 'bold'}})

This returns a list of etree.Element objects. In this case, the result is identical to:

>>> from xml.etree.ElementTree import fromstring
>>> [fromstring('<p id="main">Hello<b>bold</b></p>')]

The result can be inserted into any existing root etree.Element:

>>> from xml.etree.ElementTree import Element, tostring
>>> result = bf.etree({'p': {'@id': 'main'}}, root=Element('root'))
>>> tostring(result)
'<root><p id="main"/></root>'

This works with lxml.html elements as well:

>>> from lxml.html import Element, tostring
>>> result = bf.etree({'p': {'@id': 'main'}}, root=Element('html'))
>>> tostring(result, doctype='<!DOCTYPE html>')
'<!DOCTYPE html>\n<html><p id="main"></p></html>'

For ease of use, strings are treated as node text. For example, both of the following are equivalent:

>>> bf.etree({'p': {'$': 'paragraph text'}})
>>> bf.etree({'p': 'paragraph text'})

By default, non-string values are converted to strings using Python's str, except for booleans, which are converted into true and false (lower case). Override this behaviour using xml_tostring:

>>> tostring(bf.etree({'x': 1.23, 'y': True}, root=Element('root')))
'<root><y>true</y><x>1.23</x></root>'
>>> from xmljson import BadgerFish              # import the class
>>> bf_str = BadgerFish(xml_tostring=str)       # convert using str()
>>> tostring(bf_str.etree({'x': 1.23, 'y': True}, root=Element('root')))
'<root><y>True</y><x>1.23</x></root>'

If the data contains invalid XML keys, these can be dropped via invalid_tags='drop' in the constructor:

>>> bf_drop = BadgerFish(invalid_tags='drop')
>>> data = bf_drop.etree({'$': '1', 'x': '1'}, root=Element('root'))    # Drops invalid <$> tag
>>> tostring(data)
'<root>1<x>1</x></root>'

Convert XML to data

To convert from XML to a data structure using the BadgerFish convention:

>>> bf.data(fromstring('<p id="main">Hello<b>bold</b></p>'))
{"p": {"$": "Hello", "@id": "main", "b": {"$": "bold"}}}

To convert this to JSON, use:

>>> from json import dumps
>>> dumps(bf.data(fromstring('<p id="main">Hello<b>bold</b></p>')))
'{"p": {"b": {"$": "bold"}, "@id": "main", "$": "Hello"}}'

To preserve the order of attributes and children, specify the dict_type as OrderedDict (or any other dictionary-like type) in the constructor:

>>> from collections import OrderedDict
>>> from xmljson import BadgerFish              # import the class
>>> bf = BadgerFish(dict_type=OrderedDict)      # pick dict class

By default, values are parsed into boolean, int or float where possible (except in the Yahoo method). Override this behaviour using xml_fromstring:

>>> dumps(bf.data(fromstring('<x>1</x>')))
'{"x": {"$": 1}}'
>>> bf_str = BadgerFish(xml_fromstring=False)   # Keep XML values as strings
>>> dumps(bf_str.data(fromstring('<x>1</x>')))
'{"x": {"$": "1"}}'
>>> bf_str = BadgerFish(xml_fromstring=repr)    # Custom string parser
>>> dumps(bf_str.data(fromstring('<x>1</x>')))
'{"x": {"$": "\'1\'"}}'

xml_fromstring can be any custom function that takes a string and returns a value. In the example below, only the integer 1 is converted to an integer. Everything else is retained as a string:

>>> def convert_only_int(val):
...     return int(val) if val.isdigit() else val
>>> bf_int = BadgerFish(xml_fromstring=convert_only_int)
>>> dumps(bf_int.data(fromstring('<p><x>1</x><y>2.5</y><z>NaN</z></p>')))
'{"p": {"x": {"$": 1}, "y": {"$": "2.5"}, "z": {"$": "NaN"}}}'

Conventions

To use a different conversion method, replace BadgerFish with one of the other classes. Currently, these are supported:

>>> from xmljson import abdera          # == xmljson.Abdera()
>>> from xmljson import badgerfish      # == xmljson.BadgerFish()
>>> from xmljson import cobra           # == xmljson.Cobra()
>>> from xmljson import gdata           # == xmljson.GData()
>>> from xmljson import parker          # == xmljson.Parker()
>>> from xmljson import yahoo           # == xmljson.Yahoo()

Options

Conventions may support additional options.

The Parker convention absorbs the root element by default. parker.data(preserve_root=True) preserves the root element:

>>> from xmljson import parker, Parker
>>> from xml.etree.ElementTree import fromstring
>>> from json import dumps
>>> dumps(parker.data(fromstring('<x><a>1</a><b>2</b></x>')))
'{"a": 1, "b": 2}'
>>> dumps(parker.data(fromstring('<x><a>1</a><b>2</b></x>'), preserve_root=True))
'{"x": {"a": 1, "b": 2}}'

Installation

This is a pure-Python package built for Python 2.7+ and Python 3.0+. To set up:

pip install xmljson

Simple CLI utility

After installation, you can use this package as a simple CLI utility. Currently, only XML-to-JSON conversion is supported. Example:

$ python -m xmljson -h
usage: xmljson [-h] [-o OUT_FILE]
            [-d {abdera,badgerfish,cobra,gdata,parker,xmldata,yahoo}]
            [in_file]

positional arguments:
in_file               defaults to stdin

optional arguments:
-h, --help            show this help message and exit
-o OUT_FILE, --out_file OUT_FILE
                        defaults to stdout
-d {abdera,badgerfish,...}, --dialect {...}
                        defaults to parker

$ python -m xmljson -d parker tests/mydata.xml
{
  "foo": "spam",
  "bar": 42
}

This is a typical UNIX filter program: it reads a file (or stdin), processes it (here, converting XML to JSON), then writes the result to stdout (or a file). Example with a pipe:

$ some-xml-producer | python -m xmljson | some-json-processor

pip also installs a console_scripts entry point, so you can call this utility as xml2json:

$ xml2json -d abdera mydata.xml

Roadmap

  • Test cases for Unicode
  • Support for namespaces and namespace prefixes
  • Support XML comments


xmljson's Issues

Unable to specify encoding

test = tostring(test01, encoding='utf-8')

encoding='utf-8' is invalid

def tostring(element, encoding=None, method=None, *, short_empty_elements=True):
    stream = io.StringIO() if encoding == 'unicode' else io.BytesIO()
    ElementTree(element).write(stream, encoding, method=method, short_empty_elements=short_empty_elements)
    return stream.getvalue()
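The quoted implementation shows that tostring picks its buffer from the encoding argument. With encoding='unicode' it returns a str; with an encoding such as 'utf-8' it returns bytes. The stdlib-only sketch below shows both. It assumes test01 in the report is a single ElementTree element. A list, such as the one etree() returns when no root is given, cannot be passed to tostring directly.

```python
from xml.etree.ElementTree import Element, SubElement, tostring

root = Element("root")
SubElement(root, "name").text = "Ünïcode"

# encoding="unicode" selects io.StringIO, so the result is a str
as_text = tostring(root, encoding="unicode")

# any other encoding (here "utf-8") selects io.BytesIO, so the result is bytes
as_bytes = tostring(root, encoding="utf-8")

print(type(as_text).__name__, as_text)    # str <root><name>Ünïcode</name></root>
print(type(as_bytes).__name__, as_bytes)
```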

How to Convert Back to XML?

Sorry if I missed this, but how do I convert my json back to XML?

from xml.etree.ElementTree import fromstring
from xmljson import BadgerFish              # import the class
bf = BadgerFish(dict_type=dict)             # pick dict class
data = bf.data(fromstring(xml_string))
print(data)

This prints JSON.

Let's say I modify the JSON and want to dump it back to XML. Can I do that?
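The README's answer is the etree() method, for example bf.etree(data, root=Element('root')). To show what that direction involves, here is a hypothetical stdlib-only sketch. append_badgerfish is not part of xmljson, and it uses str() for every value, whereas xmljson's default writes booleans in lower case.

```python
from xml.etree.ElementTree import Element, SubElement, tostring


def append_badgerfish(parent, data):
    """Append BadgerFish-style data under parent (hypothetical helper)."""
    for tag, value in data.items():
        # A list stands for the same tag repeated once per item
        items = value if isinstance(value, list) else [value]
        for item in items:
            elem = SubElement(parent, tag)
            if isinstance(item, dict):
                for key, child in item.items():
                    if key.startswith("@"):
                        elem.set(key[1:], str(child))        # attribute
                    elif key == "$":
                        elem.text = str(child)               # text content
                    else:
                        append_badgerfish(elem, {key: child})  # child element
            else:
                elem.text = str(item)  # plain values are treated as text
    return parent


root = append_badgerfish(Element("root"), {"p": {"@id": "main", "$": "Hello", "b": "bold"}})
print(tostring(root, encoding="unicode"))
# <root><p id="main">Hello<b>bold</b></p></root>
```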

mapping to list of lists

Hi. Thanks for a great module.

I notice when one does a round-trip from a dictionary containing a list of lists, the result is a dictionary containing a list of strings. This is using the Parker convention.

from xml.etree.ElementTree import Element, tostring, fromstring
from xmljson import parker, Parker

xml_str = tostring(parker.etree({'A':[[1,2],[3,4]]}, root=Element('data')))
Parker(dict_type=dict).data(fromstring(xml_str))
{'A': ['[1, 2]', '[3, 4]']}

Parker convention is not implemented correctly

It seems that the way the library converts XML using the Parker convention differs from how it was originally defined in this repository:
https://github.com/doekman/xml2json-xslt

Here is the comparison:

Input data

<root>
    <person>
        <age>12</age>
        <height>1.73</height>
    </person>
    <person>
        <age>12</age>
        <height>1.73</height>
    </person>
</root>

Using xmltodict

Output:

{
    "person": [
        {
            "age": 12,
            "height": 1.73
        },
        {
            "age": 12,
            "height": 1.73
        }
    ]
}

Using XSLT document from the original Google repository

Output:

{
    "root":[
        {
            "age":12,
            "height":1.73
        },
        {
            "age":12,
            "height":1.73
        }
    ]
}

Single element array

Hello,
Great job with this Python module!

However, I came across a simple problem.
Say I wish to convert the following XML to JSON using the Yahoo convention:

I get
{"commandes": {"commande": [{}, {}]}}
which is OK for me.

But if I convert the same XML with only one object,

I get
{"commandes": {"commande": {}}}
which is fine, but not the behavior I expect. I'd like to get the single element within an array, as if several objects were present. That is to say:
{"commandes": {"commande": [{}]}}

Thanks
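One workaround that does not depend on a library option is to post-process the converted data. The helper below is hypothetical (force_list is not an xmljson feature). It wraps every non-list value stored under a given key in a one-element list, recursing through nested dicts and lists.

```python
def force_list(data, key):
    """Wrap data[key] in a list wherever it is not already one (in place)."""
    if isinstance(data, dict):
        for k, v in data.items():
            if k == key and not isinstance(v, list):
                data[k] = [v]          # replacing a value does not resize the dict
            force_list(data[k], key)   # recurse into the (possibly wrapped) value
    elif isinstance(data, list):
        for item in data:
            force_list(item, key)
    return data


print(force_list({"commandes": {"commande": {}}}, "commande"))
# {'commandes': {'commande': [{}]}}
```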

XML namespaces

Hi.

Does this module actually have namespaces support?

For example, I have this XML file with the custom namespace prefix v8msg:

<?xml version="1.0" encoding="UTF-8"?>
<v8msg:Message xmlns:v8msg="http://v8.1c.ru/messages">
  <v8msg:Header>
    <v8msg:ExchangePlan>МобТорговля</v8msg:ExchangePlan>
    <v8msg:To>Моб1</v8msg:To>
    <v8msg:From>ЦУ</v8msg:From>
    <v8msg:MessageNo>83</v8msg:MessageNo>
    <v8msg:ReceivedNo>0</v8msg:ReceivedNo>
  </v8msg:Header>
  <v8msg:Body>
    <CatalogObject.ТипыЦенНоменклатуры>
      <Ref>ec986086-5ba8-11e4-ae81-74f06d6aab21</Ref>
      <DeletionMark>false</DeletionMark>
      <Code>000000001</Code>
      <Description>Роздріб</Description>
    </CatalogObject.ТипыЦенНоменклатуры>
  </v8msg:Body>
</v8msg:Message>

After converting XML to JSON and back, I get this:

<?xml version="1.0" encoding="UTF-8"?>
<ns0:Message xmlns:ns0="http://v8.1c.ru/messages">
  <ns0:Header>
    <ns0:ExchangePlan>МобТорговля</ns0:ExchangePlan>
    <ns0:To>Моб1</ns0:To>
    <ns0:From>ЦУ</ns0:From>
    <ns0:MessageNo>83</ns0:MessageNo>
    <ns0:ReceivedNo>0</ns0:ReceivedNo>
  </ns0:Header>
  <ns0:Body>
    <CatalogObject.ТипыЦенНоменклатуры>
      <Ref>ec986086-5ba8-11e4-ae81-74f06d6aab21</Ref>
      <DeletionMark>false</DeletionMark>
      <Code>000000001</Code>
      <Description>Роздріб</Description>
    </CatalogObject.ТипыЦенНоменклатуры>
  </ns0:Body>
</ns0:Message>

So is there any way to preserve the real namespace prefixes?
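The ns0 prefix comes from how the standard library stores and writes namespaces. After parsing, only the URI survives, in "{uri}local" form (the same form that appears as dictionary keys in another issue here). On output, ElementTree invents ns0, ns1 and so on unless a prefix has been registered. The sketch below uses the real stdlib function register_namespace. It assumes serialization is done by xml.etree.ElementTree; lxml elements handle prefixes through their own nsmap.

```python
from xml.etree.ElementTree import fromstring, register_namespace, tostring

V8MSG = "http://v8.1c.ru/messages"
xml = ('<v8msg:Message xmlns:v8msg="http://v8.1c.ru/messages">'
       '<v8msg:To>Моб1</v8msg:To></v8msg:Message>')

root = fromstring(xml)
# The parser keeps only the URI: tags become "{uri}local" (Clark notation)
print(root.tag)  # {http://v8.1c.ru/messages}Message

# Register the prefix before serializing; this setting is process-wide
register_namespace("v8msg", V8MSG)
print(tostring(root, encoding="unicode"))
# <v8msg:Message xmlns:v8msg="http://v8.1c.ru/messages"><v8msg:To>Моб1</v8msg:To></v8msg:Message>
```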

Node values (text) broken by sub-nodes

Hi,

I'm using xml2json to parse publications and have the following problem:

If the text in the abstract node contains XML tags, the text following these sub-nodes is simply dropped from the JSON output, so this XML example:

     <Abstract>
          <AbstractText>As photosynthetic prokaryotes, cyanobacteria can directly convert CO<sub>2</sub> to organic compounds and grow rapidly using sunlight as the sole source of energy. The direct biosynthesis of chemicals from CO<sub>2</sub> and sunlight in cyanobacteria is therefore theoretically more attractive than using glucose as carbon source in heterotrophic bacteria. To date, more than 20 different target chemicals have been synthesized from CO<sub>2</sub> in cyanobacteria. However, the yield and productivity of the constructed strains is about 100-fold lower than what can be obtained using heterotrophic bacteria, and only a few products reached the gram level. The main bottleneck in optimizing cyanobacterial cell factories is the relative complexity of the metabolism of photoautotrophic bacteria. In heterotrophic bacteria, energy metabolism is integrated with the carbon metabolism, so that glucose can provide both energy and carbon for the synthesis of target chemicals. By contrast, the energy and carbon metabolism of cyanobacteria are separated. First, solar energy is converted into chemical energy and reducing power via the light reactions of photosynthesis. Subsequently, CO<sub>2</sub> is reduced to organic compounds using this chemical energy and reducing power. Finally, the reduced CO<sub>2</sub> provides the carbon source and chemical energy for the synthesis of target chemicals and cell growth. Consequently, the unique nature of the cyanobacterial energy and carbon metabolism determines the specific metabolic engineering strategies required for these organisms. In this chapter, we will describe the specific characteristics of cyanobacteria regarding their metabolism of carbon and energy, summarize and analyze the specific strategies for the production of chemicals in cyanobacteria, and propose metabolic engineering strategies which may be most suitable for cyanobacteria.</AbstractText>
        </Abstract>

is converted to JSON as:

('Abstract',
 OrderedDict([('AbstractText',
               OrderedDict([('$', 'As photosynthetic prokaryotes, cyanobacteria can directly convert CO'),
                            ('sub', [OrderedDict([('$', 2)]),
                                     OrderedDict([('$', 2)]),
                                     OrderedDict([('$', 2)]),
                                     OrderedDict([('$', 2)]),
                                     OrderedDict([('$', 2)])])]))]))

Due to the sub-tags.

Is there a way to fix this problem?

Thanks!
Lea
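The missing text is still present in the parsed tree. ElementTree stores the text that follows a child element in that child's tail attribute, not in the parent's text. A converter that reads only .text therefore loses it, which is presumably what happens here and in the "tail nodes not covered" issue. The stdlib sketch below shows where the text lives. If one flattened string is enough, itertext() joins text and tails in document order.

```python
from xml.etree.ElementTree import fromstring

abstract = fromstring("<AbstractText>convert CO<sub>2</sub> to organic compounds</AbstractText>")

# .text holds only the text before the first child element
print(repr(abstract.text))            # 'convert CO'
# the text after </sub> is stored on the <sub> element as .tail
print(repr(abstract[0].tail))         # ' to organic compounds'
# itertext() walks text and tails in document order
print("".join(abstract.itertext()))   # convert CO2 to organic compounds
```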

"NAN" to invalid json

>>> from lxml import etree
>>> from xmljson import XMLData
>>> XMLData(attr_prefix=None, dict_type=dict).data(etree.XML('<ROOT test="NAN"/>'))
{'ROOT': {'test': nan}}
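The invalid JSON appears at the dumping stage. Python's json.dumps writes a float nan as the bare token NaN, which the JSON specification does not allow. Passing allow_nan=False makes json.dumps raise ValueError instead, so the problem surfaces early. A stdlib sketch:

```python
import json

value = float("NAN")   # float() accepts "NAN", "nan", "Infinity", ...

# Default behaviour: the output contains the non-standard token NaN
print(json.dumps({"test": value}))  # {"test": NaN}

# Strict behaviour: refuse to emit out-of-range floats
try:
    json.dumps({"test": value}, allow_nan=False)
except ValueError as exc:
    print("rejected:", exc)
```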

tail nodes not covered

When parsing a node, the 'tail' text is lost, e.g. 'text 2' in <a><span>text</span>text 2</a>. It would be nice if the library used the .tail attribute of lxml nodes. By the way, because this feature is missing, I moved to the xmltodict library.

int in _fromstring method removing leading zero

A value such as 000045535 is rendered as 45535.
It should use json.loads(value) instead of trying int and float here:

import json
@staticmethod
def _fromstring(value):
    '''Convert XML string value to None, boolean, int or float'''
    # NOTE: Is this even possible ?
    if value is None:
        return None

    # FIXME: In XML, booleans are either 0/false or 1/true (lower-case !)
    if value.lower() == 'true':
        return True
    elif value.lower() == 'false':
        return False

    # FIXME: Using int() or float() is eating whitespaces unintendedly here
    try:
        return json.loads(value)
    except ValueError:
        return value
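A quick stdlib check of this proposal: json.loads does reject leading zeros, so 000045535 would stay a string. It is not a complete fix for the other FIXMEs, though. It still accepts NaN and surrounding whitespace, and it unwraps JSON string literals.

```python
import json
import math

# int() silently drops the leading zeros
print(int("000045535"))  # 45535

# "000045535" is not a valid JSON number, so json.loads raises ValueError
# (json.JSONDecodeError subclasses ValueError) and the proposal keeps the string
try:
    json.loads("000045535")
except ValueError as exc:
    print("kept as a string:", exc)

# json.loads still accepts NaN and surrounding whitespace,
# and it unwraps quoted text such as '"abc"'
print(math.isnan(json.loads("NaN")))  # True
print(json.loads(" 12 "))             # 12
print(json.loads('"abc"'))            # abc
```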

  

typing issue

I'm having an issue where strings that appear to be numeric are being cast into integers.

For example:

<zip>01234</zip>

converts to:

"zip": 1234

This could be corrected with a simple test, shown below, or by passing an option to the constructor to bypass the _convert() method.

str(int('01234')) == '01234'
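The round-trip test above can be turned into a converter. keep_leading_zeros is a hypothetical function, not part of xmljson. The README says xml_fromstring accepts any function that takes a string and returns a value, so it could presumably be passed as BadgerFish(xml_fromstring=keep_leading_zeros). Because the round trip is strict, values such as 2.50, 1e5 or " 12" also stay strings, and booleans are not handled.

```python
import math


def keep_leading_zeros(value):
    """Convert to int or float only if str() of the result reproduces the
    original text exactly, so '01234' stays a string while '1234' becomes 1234."""
    for cast in (int, float):
        try:
            number = cast(value)
        except (TypeError, ValueError):
            continue
        # math.isfinite rejects 'nan' / 'inf', which would otherwise round-trip
        if str(number) == value and math.isfinite(number):
            return number
    return value


print(repr(keep_leading_zeros("01234")))  # '01234'
print(repr(keep_leading_zeros("1234")))   # 1234
print(repr(keep_leading_zeros("1.73")))   # 1.73
```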

Is it possible to convert & into & not &amp; symbol?

When I convert JSON to XML and a string contains special character references such as &#xD;, it is converted to &amp;#xD;.

Is it possible to prevent this? I tried different parsers but couldn't overcome this issue.
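A literal "&#xD;" in a JSON string is just five ordinary characters, so any XML serializer must escape its "&". One workaround, sketched here as an assumption about the desired result, is to decode character references to the actual character with the stdlib html.unescape before building the tree. Note that XML parsers normalize literal carriage returns when reading, so this sketch may not round-trip a CR exactly.

```python
import html
from xml.etree.ElementTree import Element, tostring

# The literal text "&#xD;" is escaped, because its "&" is data
literal = Element("x")
literal.text = "line&#xD;"
print(tostring(literal, encoding="unicode"))  # <x>line&amp;#xD;</x>

# Decoding the reference first stores the real character (carriage return)
decoded = Element("x")
decoded.text = html.unescape("line&#xD;")
print(repr(decoded.text))  # 'line\r'
```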

append() argument must be xml.etree.ElementTree.Element, not lxml.etree._Element

I get the following error when trying to insert a Python dictionary into an element tree:

  ...
    xml_data = xmljson.gdata.etree(robot.json, root=xml.etree.ElementTree.Element("root"))
  File "D:\Documents\Python310\lib\site-packages\xmljson\__init__.py", line 135, in etree
    result.append(elem)
TypeError: append() argument must be xml.etree.ElementTree.Element, not lxml.etree._Element

robot.json is formatted with GData format. The data in this particular case is

robot.json = {'link': {'name': 'part_2', '$t': {}}}

Spaces are trimmed

I have an XML string like this:

<enable_itf>
<source_phrase> Seleccionar todo</source_phrase>
</enable_itf>

When I parse it, the string I receive is "Seleccionar todo" instead of " Seleccionar todo".

Is there a way to preserve the blank spaces?
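The standard parser itself keeps the leading space, as the stdlib check below shows. This suggests the trimming happens in the conversion step rather than while parsing. That is an inference from this check, not a confirmed reading of xmljson's code.

```python
from xml.etree.ElementTree import fromstring

xml = """<enable_itf>

<source_phrase> Seleccionar todo</source_phrase>

</enable_itf>"""

phrase = fromstring(xml).find("source_phrase")
print(repr(phrase.text))  # ' Seleccionar todo' (leading space preserved)
```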

Abdera child nodes incorrectly placed in same dictionaries

I seem to have encountered a bug in the Abdera conversion logic, specifically in how it converts children:

for child in children:
    child_data = self.data(child)
    if (count[child.tag] == 1
            and len(children_list) > 1
            and isinstance(children_list[-1], dict)):
        # Merge keys to existing dictionary
        children_list[-1].update(child_data)
    else:
        # Add additional text
        children_list.append(self.data(child))

When given the following XML data:

<?xml version="1.0" encoding="ISO-8859-1"?>
<Data
   version="9.0"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:noNamespaceSchemaLocation="comp.xsd">
    <Airport
      country="Samoa"
      city="Apia"
      name="Faleolo Intl"
      lat="-13.8296668231487"
      lon="-171.997166723013"
      alt="17.678M"
      ident="NSFA"
      >
      <Services>
    	 <Fuel
    		type="JETA"
    		availability="YES"/>
      </Services>
      <Tower
    	 lat="-13.8320958986878"
    	 lon="-171.998676359653"
    	 alt="0.0M">
      </Tower>
      <Runway
    	 lat="-13.8300792127848"
    	 lon="-172.008545994759"
    	 alt="17.678M"
    	 surface="ASPHALT"
    	 heading="89.3199996948242"
    	 length="2999.23M"
    	 width="45.11M"
    	 number="08"
    	 designator="NONE">
      </Runway>
    </Airport>
</Data>

I receive the following output from ab.data(xml_file):

{
    "Data": {
        "attributes": {
            "version": 9.0,
            "{http://www.w3.org/2001/XMLSchema-instance}noNamespaceSchemaLocation": "comp.xsd"
        },
        "children": [
            {
                "Airport": {
                    "attributes": {
                        "country": "Samoa",
                        "city": "Apia",
                        "name": "Faleolo Intl",
                        "lat": -13.8296668231487,
                        "lon": -171.997166723013,
                        "alt": "17.678M",
                        "ident": "NSFA"
                    },
                    "children": [
                        {
                            "Services": {
                                "Fuel": {
                                    "attributes": {
                                        "type": "JETA",
                                        "availability": "YES"
                                    }
                                }
                            }
                        },
                        {
                            "Tower": {
                                "attributes": {
                                    "lat": -13.8320958986878,
                                    "lon": -171.998676359653,
                                    "alt": "0.0M"
                                }
                            },
                            "Runway": {
                                "attributes": {
                                    "lat": -13.8300792127848,
                                    "lon": -172.008545994759,
                                    "alt": "17.678M",
                                    "surface": "ASPHALT",
                                    "heading": 89.3199996948242,
                                    "length": "2999.23M",
                                    "width": "45.11M",
                                    "number": 8,
                                    "designator": "NONE"
                                }
                            }
                        }
                    ]
                }
            }
        ]
    }
}

The first child node, Services, is placed into a separate dictionary, while the rest of the children are placed in the same dictionary. Because multiple tags with the same keys can be present in XML, all children should probably be placed in separate dictionaries.

I'm working on a PR for this, but if you happen to know what the problem is, let me know.
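The quoted loop can be reproduced outside the library. In this sketch, simple_data is a hypothetical stand-in for self.data, and count is assumed to be a Counter of the children's tags. Because len(children_list) > 1 only becomes true from the third child on, Services and Tower each get their own dictionary while Runway is merged into Tower's. Appending every child separately, as proposed, gives one dictionary per child.

```python
from collections import Counter
from xml.etree.ElementTree import fromstring


def simple_data(child):
    # Stand-in for self.data(child): just the tag and its attributes
    return {child.tag: dict(child.attrib)}


def children_as_quoted(elem):
    """Replicates the merging loop quoted in the report."""
    count = Counter(child.tag for child in elem)
    children_list = []
    for child in elem:
        child_data = simple_data(child)
        if (count[child.tag] == 1
                and len(children_list) > 1
                and isinstance(children_list[-1], dict)):
            children_list[-1].update(child_data)   # merged into previous dict
        else:
            children_list.append(child_data)
    return children_list


def children_separate(elem):
    """Proposed behaviour: every child gets its own dictionary."""
    return [simple_data(child) for child in elem]


airport = fromstring('<Airport><Services/><Tower alt="0.0M"/><Runway number="08"/></Airport>')
print(children_as_quoted(airport))
# [{'Services': {}}, {'Tower': {'alt': '0.0M'}, 'Runway': {'number': '08'}}]
print(children_separate(airport))
# [{'Services': {}}, {'Tower': {'alt': '0.0M'}}, {'Runway': {'number': '08'}}]
```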

Parker convention

If my understanding is right, for the Parker convention:

# xml structure as 
 <root><item>1</item><item>2</item><item>three</item></root>
# gives json
 {"item":["1","2","three"]} NOT ["1","2","three"]

right? (More details)
However, in the current implementation

parker.data(fromstring('<root><item>1</item><item>2</item><item>three</item></root>'))
# gives
{'root': {'item': [{}, {}, {}]}}

This gives empty dictionaries.
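The output the reporter expects can be sketched with the standard library. parker_data is hypothetical and not xmljson's implementation. It absorbs the root, ignores attributes, turns leaf elements into their text, and collects repeated tags into a list. Unlike xmljson, it does no numeric conversion, so values stay strings.

```python
from collections import Counter
from xml.etree.ElementTree import fromstring


def parker_data(elem):
    """Hypothetical, simplified Parker-style conversion (not xmljson's code)."""
    children = list(elem)
    if not children:
        return elem.text                     # leaf: just its text
    counts = Counter(child.tag for child in children)
    result = {}
    for child in children:
        value = parker_data(child)
        if counts[child.tag] > 1:
            result.setdefault(child.tag, []).append(value)  # repeated tag
        else:
            result[child.tag] = value
    return result


print(parker_data(fromstring('<root><item>1</item><item>2</item><item>three</item></root>')))
# {'item': ['1', '2', 'three']}
```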

Given JSON example does not convert to XML correctly

The code runs as:

>>> temp = {'employees': [{'person': {'name': {'@value': 'Alice'}}}, {'person': {'name': {'@value': 'Bob'}}}]}

>>> tostring(bf.etree(temp, root=Element('root')))
'<root><employees><person><name value="Alice" /></person></employees><employees><person><name value="Bob" /></person></employees></root>'

While the example says it should convert to:

'<root><employees><person><name value="Alice" /></person><person><name value="Bob" /></person></employees></root>'

Converting this using tostring(bf.etree...) results in the correct XML:

{
  "employees": {
    "person": [
      {
        "name": {
          "@value": "Alice"
        }
      },
      {
        "name": {
          "@value": "Bob"
        }
      }
    ]
  }
}

First example in README does not match xmljson behavior version 0.2.1

The README offers the following sample XML input:

<employees>
    <person>
        <name value="Alice"/>
    </person>
    <person>
        <name value="Bob"/>
    </person>
</employees>

and JSON output:

{
    "employees": [{
        "person": {
            "name": {
                "@value": "Alice"
            }
        }
    }, {
        "person": {
            "name": {
                "@value": "Bob"
            }
        }
    }]
}

But I cannot reproduce that JSON output with version 0.2.1 (the latest), not with any of the -d options ('abdera', 'badgerfish', 'cobra', 'gdata', 'parker', 'xmldata', 'yahoo'). The option that yields the most similar output seems to be badgerfish, that produces the following list tagged "person" and containing objects that have the person data, which is a very different structure than what's shown in the README:

{
  "employees": {
    "person": [
      {
        "name": {
          "@value": "Alice"
        }
      },
      {
        "name": {
          "@value": "Bob"
        }
      }
    ]
  }
}

I didn't submit a PR correcting this because I don't know the real intent. I think the actual output is pretty confusing. Naively I expected minimal changes, with a list of "person" objects. That is arguably poor JSON (repeating the name every time) but it faithfully preserves the ugly XML input.

When the very first example is wrong, that makes the library fairly difficult to use or trust. Please reply, thanks.

Float silent NaN/Inf conversion

Hi.
Python's float function silently converts strings like "NAN" or "Infinity" to nan or inf.
I would like to keep those values as strings. If I fork your project and add a function parameter (strict_number, for instance), would you be interested?
By the way, since the JSON specification does not allow NaN or Infinity, actual NaN or Inf values should be converted to null. I can also do that if you wish (strict_json?).

Grégory

Remove prefixes

I am aware that the default prefix is "@", but can we remove it?

XML namespaces getting injected into attribute names - parker convention

Hi, after reading issues #1, #8 and PR #9, I think I have a different issue.

Essentially this:

<EML xmlns="urn:oasis:names:tc:evs:schema:eml" >
  <TransactionId>B313CBA4-BCA9-4B8F-8C50-BADE9423A020</TransactionId>
</EML>

becomes this:

{
  "{urn:oasis:names:tc:evs:schema:eml}TransactionId": "B313CBA4-BCA9-4B8F-8C50-BADE9423A020"
}

whereas I was actually expecting this:

{
  "TransactionId": "B313CBA4-BCA9-4B8F-8C50-BADE9423A020"
}

Am I missing something?
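The parsed tree already stores tags in Clark notation ("{uri}local"), and the JSON key mirrors that. One workaround is to strip namespaces from the tree before converting, for example before calling parker.data(root). This assumes the namespace information can be discarded. The stdlib sketch below uses hypothetical helper names.

```python
from xml.etree.ElementTree import fromstring


def local_name(name):
    """Strip a leading '{uri}' from a Clark-notation tag or attribute name."""
    return name.split("}", 1)[1] if name.startswith("{") else name


def strip_namespaces(root):
    """Rename every tag and attribute to its local name, in place.
    Namespace information is discarded; attributes that differ only by
    namespace would collide."""
    for elem in root.iter():
        elem.tag = local_name(elem.tag)
        for key in list(elem.attrib):
            new_key = local_name(key)
            if new_key != key:
                elem.attrib[new_key] = elem.attrib.pop(key)
    return root


xml = """<EML xmlns="urn:oasis:names:tc:evs:schema:eml">
  <TransactionId>B313CBA4-BCA9-4B8F-8C50-BADE9423A020</TransactionId>
</EML>"""
root = fromstring(xml)
print(root[0].tag)  # {urn:oasis:names:tc:evs:schema:eml}TransactionId
strip_namespaces(root)
print(root[0].tag)  # TransactionId
```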

Arrangement of children not preserved

How do you preserve the arrangement of children with same name?

XML To JSON:
Code:

dumps(badgerfish.data(eTree.fromstring('<alice><david>edgar</david><bob>charlie</bob><david>edgar</david></alice>')))

Output:

{"alice": {"david": [{"$": "edgar"}, {"$": "edgar"}], "bob": {"$": "charlie"}}}

Back to XML:
Code:

eTree.tostring(badgerfish.etree(loads('{"alice": {"david": [{"$": "edgar"}, {"$": "edgar"}], "bob": {"$": "charlie"}}}'))[0]).decode('utf-8')

Output:

<alice><david>edgar</david><david>edgar</david><bob>charlie</bob></alice>
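The interleaving is lost as soon as same-named children are grouped under one dictionary key. An order-preserving alternative, similar in shape to the Abdera "children" list, is a list of single-key dictionaries. The sketch below is hypothetical and stdlib-only (ordered_children and rebuild are not xmljson functions). It round-trips the example unchanged.

```python
from xml.etree.ElementTree import Element, SubElement, fromstring, tostring


def ordered_children(elem):
    """Leaf -> its text; otherwise a list of {tag: value}, in document order."""
    children = list(elem)
    if not children:
        return elem.text
    return [{child.tag: ordered_children(child)} for child in children]


def rebuild(tag, value, parent=None):
    """Inverse of ordered_children."""
    elem = Element(tag) if parent is None else SubElement(parent, tag)
    if isinstance(value, list):
        for pair in value:
            for child_tag, child_value in pair.items():
                rebuild(child_tag, child_value, elem)
    else:
        elem.text = value
    return elem


source = "<alice><david>edgar</david><bob>charlie</bob><david>edgar</david></alice>"
data = ordered_children(fromstring(source))
print(data)  # [{'david': 'edgar'}, {'bob': 'charlie'}, {'david': 'edgar'}]
print(tostring(rebuild("alice", data), encoding="unicode"))
```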

unclosed token: line 191784

Hi!

I am getting an error when parsing a file:

$ xml2json -o test.json lod1_33358-5804_geb.gml

Traceback (most recent call last):
  File "/usr/local/bin/xml2json", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/xmljson/__main__.py", line 39, in main
    json.dump(dialect.data(parse(in_file).getroot()), out_file, indent=2)
  File "/usr/lib/python3.6/xml/etree/ElementTree.py", line 1196, in parse
    tree.parse(source, parser)
  File "/usr/lib/python3.6/xml/etree/ElementTree.py", line 597, in parse
    self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: unclosed token: line 191784, column 8

The suspected line is this (the middle one):

<gen:stringAttribute name="MittlereTraufHoehe">
        <gen:value>40.65</gen:value>
</gen:stringAttribute>

The other 16810 files from the Provider are parsed without errors.
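To locate this kind of error, the stdlib ParseError exposes a position attribute with a (line, column) pair, so the offending line can be printed directly. The sketch below works on a string; the helper name is hypothetical. For a large file, it assumes the whole file can be read into memory first.

```python
import xml.etree.ElementTree as ET


def report_parse_error(text):
    """Return None if text parses, else (line, column, offending line)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position  # line counts from 1, column from 0
        return line, column, text.splitlines()[line - 1]
    return None


print(report_parse_error("<a>\n<b>\n</a>"))  # line 3 holds the mismatched </a>
print(report_parse_error("<a/>"))            # None
```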

Add (Apache) Abdera and (Cisco) Cobra convention

There is at least one other convention for handling XML attributes in JSON, which is also in use by Cisco for the ACI product.

<employees>
    <person>
        <name value="Alice"/>
    </person>
    <person>
        <name value="Bob"/>
    </person>
</employees>

becomes

{ "employees": [
    { "person": {
        "name": {
          "attributes": {"value": "Alice"}
        }
    } },
    { "person": {
        "name": {
          "attributes": {"value": "Bob"}
        }
    } }
] }

It would be nice if this could be supported as well.

You can find more information related to this at: http://wiki.open311.org/JSON_and_XML_Conversion/

ImportError: cannot import name 'badgerfish'

Traceback (most recent call last):
  File "D:\Codes\TPN\xmljson.py", line 1, in
    from xmljson import badgerfish as bf
  File "D:\Codes\TPN\xmljson.py", line 1, in
    from xmljson import badgerfish as bf
ImportError: cannot import name 'badgerfish'

I am using Python 3.6

None becomes "None"

return self._fromstring(root.text)

This line should be:

if root.text is None:
    return ''
else:
    return self._fromstring(root.text)

Not good:
<empty/> -> {"empty": null} -> <empty>None</empty>

Better:
<empty/> -> {"empty": null} -> <empty></empty>
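The README documents xml_tostring as the hook that controls how non-string values are written. The converter below is a hypothetical sketch that assumes xmljson passes None through xml_tostring, which is not confirmed here. It writes None as an empty string and keeps the documented lower-case booleans.

```python
def to_xml_text(value):
    """Hypothetical xml_tostring replacement: None -> '', booleans ->
    lower-case true/false, everything else via str()."""
    if value is None:
        return ""
    if isinstance(value, bool):   # check bool before anything numeric
        return "true" if value else "false"
    return str(value)


print([to_xml_text(v) for v in (None, True, 1.23)])  # ['', 'true', '1.23']
```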

travis failing for tests

https://travis-ci.org/sanand0/xmljson/jobs/80951823


======================================================================
ERROR: test_etree (tests.test_xmljson.TestBadgerFish)
BadgerFish conversion from data to etree
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/sanand0/xmljson/tests/test_xmljson.py", line 77, in test_etree
    eq({'x': 'a'}, '<x><a/></x>')
  File "/home/travis/build/sanand0/xmljson/tests/test_xmljson.py", line 42, in assertEqual
    self.assertEqual(decode(tostring(left)), ''.join(right))
  File "lxml.etree.pyx", line 3236, in lxml.etree.tostring (src/lxml/lxml.etree.c:71960)
TypeError: Type 'Element' cannot be serialized.
======================================================================
ERROR: test_etree (tests.test_xmljson.TestGData)
GData conversion from etree to data
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/sanand0/xmljson/tests/test_xmljson.py", line 232, in test_etree
    eq({'x': 'a'}, '<x><a/></x>')
  File "/home/travis/build/sanand0/xmljson/tests/test_xmljson.py", line 42, in assertEqual
    self.assertEqual(decode(tostring(left)), ''.join(right))
  File "lxml.etree.pyx", line 3236, in lxml.etree.tostring (src/lxml/lxml.etree.c:71960)
TypeError: Type 'Element' cannot be serialized.
======================================================================
ERROR: test_etree (tests.test_xmljson.TestParker)
Parker conversion from data to etree
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/sanand0/xmljson/tests/test_xmljson.py", line 151, in test_etree
    eq({'x': 'a'}, '<x>a</x>')
  File "/home/travis/build/sanand0/xmljson/tests/test_xmljson.py", line 42, in assertEqual
    self.assertEqual(decode(tostring(left)), ''.join(right))
  File "lxml.etree.pyx", line 3236, in lxml.etree.tostring (src/lxml/lxml.etree.c:71960)
TypeError: Type 'Element' cannot be serialized.
----------------------------------------------------------------------
Ran 15 tests in 0.010s
FAILED (errors=3)

Data mismatch

Look at this example:

    from xml.etree.ElementTree import fromstring
    import xmljson, json
    bf=xmljson.BadgerFish(dict_type=xmljson.OrderedDict)
    q=bf.data(fromstring('<a p="1">x<b r="2">y</b>z</a>'))
    print json.dumps(q,indent=2) # note this item ^ (z)!

Output will be:

    {
      "a": {
        "@p": 1, 
        "$": "x", 
        "b": {
          "@r": 2, 
          "$": "y"
        }
      }
    }

Where is z value?

Tested with

  • Python 2.7.12 (default, Nov 19 2016, 06:48:10)
  • IPython 2.4.1 -- An enhanced Interactive Python
  • Ubuntu 16.04.1 LTS (4.4.0-57-generic) x86_64

The xmljson was installed via pip.

I'd expect something like

    {
      "a": {
        "@p": 1, 
        "$": "x", 
        "b": {
          "@r": 2, 
          "$": "y"
        },
        "$$": "z"
      }
    }

gdata to json to xml

Hi, in one of your Stack Overflow answers referencing this library, you mention that:

It's possible to convert from XML to JSON and from JSON to XML using the same conventions

But how best to do that is a bit non-obvious to me. I'm dumping a gdata object to JSON using this library, but how do I convert back from JSON to an XML string? I assume I json.loads into an OrderedDict? Is there a method that will then convert that back into an XML string I can feed to atom.core.parse() or the like?

My Python's a bit rusty, so this might be kind of a dumb question. As an aside, you should set up Gratipay or Patreon; this is a pretty useful library!
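Going back is the etree() direction described in the README; another issue here calls it as xmljson.gdata.etree(...). The stdlib-only sketch below covers only the first step: loading the JSON while keeping key order with json.loads(..., object_pairs_hook=OrderedDict). Since Python 3.7 a plain dict also keeps insertion order. The sample JSON is hypothetical. The hand-off to xmljson appears only as a comment because it needs the library installed.

```python
import json
from collections import OrderedDict

# Hypothetical GData-style JSON ("$t" holds text content)
text = '{"entry": {"title": {"$t": "Hello"}, "id": {"$t": "42"}}}'

# object_pairs_hook builds each JSON object as an OrderedDict, in file order
data = json.loads(text, object_pairs_hook=OrderedDict)
print(list(data["entry"]))  # ['title', 'id']

# Next step (requires xmljson, not run here):
#   from xmljson import gdata
#   elements = gdata.etree(data)   # per the README, a list of elements
# then serialize with the tostring() that matches the element type.
```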
