greedo / python-xbrl

xbrl parser written in Python :bulb:

Home Page: https://pypi.python.org/pypi/python-xbrl

License: Apache License 2.0

python-xbrl's Introduction

python-xbrl is a library for parsing XBRL documents. It provides output both as a basic model object and as serialized objects, through marshmallow, for rendering into standard formats such as JSON or an HTTP API.

Installation

The easiest way to install python-xbrl is with pip

pip install python-xbrl

Or install the latest dev version from GitHub (or replace @master with a release version like @v1.1.0)

pip install git+https://github.com/greedo/python-xbrl.git@master

git clone https://github.com/greedo/python-xbrl.git

Copy the python-xbrl directory into your Python path.

Make sure your sys.path is correct.

Requirements

Python >= 2.6 or >= 3.3

python-xbrl relies on beautifulsoup4 which sits on top of the python XML parser lxml. It also requires marshmallow for serializing objects. For more details see requirements.txt

For PyPy support it is recommended you use https://github.com/amauryfa/lxml for lxml, as this is a fork that uses cffi instead of the Python C API.

Initialization

To start using the library, first import the XBRLParser

from xbrl import XBRLParser, GAAP, GAAPSerializer

Simple Parsing Workflow

First parse the incoming XBRL file into a new XBRL basic object

xbrl_parser = XBRLParser()
xbrl = xbrl_parser.parse(open("sam-20131228.xml"))

Then you can parse the document using different parsers

gaap_obj = xbrl_parser.parseGAAP(xbrl, doc_date="20131228", context="current", ignore_errors=0)

Now we have a GAAP model object that has the GAAP parsed elements from the document.

This model object supports several features, including:

context: current, year, and instant contexts are supported. If available, you can also get previous-quarter information by number of days from the doc date, for example 90 or 180.
Error handling (ignore_errors): 0 raises an exception for all parsing errors and halts parsing; 1 suppresses all parsing errors and continues parsing; 2 logs all parsing errors and continues parsing.
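The three error-handling levels can be pictured with a small sketch. This is not python-xbrl's implementation: handle_parse_error, to_float, and the logger name are hypothetical and only mirror the behavior described above (0 raise, 1 suppress, 2 log).

```python
import logging

logger = logging.getLogger("xbrl_sketch")  # hypothetical logger name


def handle_parse_error(error, ignore_errors=0):
    """Apply the documented policy: 0 raise, 1 suppress, 2 log and continue."""
    if ignore_errors == 0:
        raise error
    if ignore_errors == 2:
        logger.error("parsing error: %s", error)
    # Levels 1 and 2 both fall through so that parsing can continue.


def to_float(text, ignore_errors=0, default=0.0):
    """Convert a tag's text to float, routing failures through the policy."""
    try:
        return float(text)
    except ValueError as exc:
        handle_parse_error(exc, ignore_errors)
        return default
```

With levels 1 and 2 a malformed value falls back to the default, which is one plausible reason a field can come back as 0.0 without any visible error.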

You can serialize the GAAP model object into a serialized object suitable for rendering into a standard format such as JSON or an HTTP API.

serializer = GAAPSerializer()
result = serializer.dump(gaap_obj)

You can also just view the data in the serialized object

print(result.data)

You can apply various parsers to the base XBRLParser object to get different data than just GAAP data from the document. In addition as expected you can also create different serialized objects on the resulting parsed data object.

Extracting DEI Data

dei_obj = xbrl_parser.parseDEI(xbrl)
serializer = DEISerializer()
result = serializer.dump(dei_obj)

Extracting Custom Data

custom_obj = xbrl_parser.parseCustom(xbrl)
print(custom_obj())

Testing

To run the unit tests, you need pytest

pip install pytest

Once you have that, cd into the root directory of this repo and

py.test --tb=line -vs

Bugs

For any bugs you encounter, please open a GitHub issue.

Contribute

Check for open issues or open a fresh issue to start a discussion around a feature idea or a bug.
If you feel uncomfortable or uncertain about an issue or your changes, feel free to email @greedo and he will happily help you via email, Skype, remote pairing or whatever you are comfortable with.
Fork the repository on GitHub to start making your changes to the master branch (or branch off of it).
Write a test which shows that the bug was fixed or that the feature works as expected.
Send a pull request and bug the maintainer until it gets merged and published. :) Make sure to add yourself to AUTHORS.

License

Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.

python-xbrl's Issues

PEP8 standards

PEP8 Standards need to be applied for cleaner code

fix test

Unit test failed. I'm using python provided by Debian's system.
I fixed it and am going to send a patch as a pull request. Can you review it?

$ cat /etc/os-release  | grep VERSION
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster

$ which python3
/usr/bin/python3

$ apt-cache show python3 | grep -i version
Version: 3.7.3-1

Test Result

- yosuke@x250:~/Data/debian/python_xbrl$ pytest-3 
- ============================= test session starts ==============================
- platform linux -- Python 3.7.3, pytest-3.10.1, py-1.7.0, pluggy-0.8.0
- rootdir: /home/yosuke/Data/debian/python_xbrl, inifile:
- collected 15 items                                                             
- 
- python-xbrl/tests/test_parse.py F.FFFFFFFFFFFFF                          [100%]

context_ids

I have slightly changed xbrl.py to search for tags/values I am interested in. I am using aapl-20150627.xml, link
I was getting zeros for some tags even though they exist in the xbrl file. So I decided to print out context_ids and got

['eol_PE2035----1510-Q0008_STD_91_20150627_0', 'eol_PE2035----1510-Q0008_STD_0_20150627_0']

I then manually checked context id in the xbrl file for the values that I am getting zero and figured out that I need one more context id eol_PE2035----1510-Q0008_STD_273_20150627_0.
If I hard code it like

context_ids = ['eol_PE2035----1510-Q0008_STD_273_20150627_0',
                       'eol_PE2035----1510-Q0008_STD_91_20150627_0',
                       'eol_PE2035----1510-Q0008_STD_0_20150627_0']

It solves the issue with missing values, but it obviously only works for aapl-20150627.xml.
So the issue is that the code doesn't see all the context id.
I have manually checked xbrl file, and here is xbrl format for 2 out of 3 context id's

#THE CODE SEES THIS ONE
<context id="eol_PE2035----1510-Q0008_STD_91_20150627_0_1116875x1116381">
<entity>
<identifier scheme="http://www.sec.gov/CIK">0000320193</identifier>
<segment><xbrldi:explicitmember dimension="us-gaap:ConsolidationItemsAxis">us-gaap:MaterialReconcilingItemsMember</xbrldi:explicitmember></segment>
</entity>
<period>
<startdate>2015-03-29</startdate>
<enddate>2015-06-27</enddate>
</period>

#DOESN'T SEE THIS ONE
<context id="eol_PE2035----1510-Q0008_STD_273_20150627_0_1116875x1116381">
<entity>
<identifier scheme="http://www.sec.gov/CIK">0000320193</identifier>
<segment><xbrldi:explicitmember dimension="us-gaap:ConsolidationItemsAxis">us-gaap:MaterialReconcilingItemsMember</xbrldi:explicitmember></segment>
</entity>
<period>
<startdate>2014-09-28</startdate>
<enddate>2015-06-27</enddate>
</period>
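The numbers embedded in the two context ids (91 and 273) are consistent with the inclusive length in days of each period. A minimal sketch of that arithmetic follows; inclusive_days is a hypothetical helper, and reading the id as a day count is an assumption based only on these two examples.

```python
from datetime import date


def inclusive_days(start_iso, end_iso):
    """Number of calendar days in a period, counting both endpoints."""
    start = date.fromisoformat(start_iso)
    end = date.fromisoformat(end_iso)
    return (end - start).days + 1


print(inclusive_days("2015-03-29", "2015-06-27"))  # 91, as in ..._STD_91_...
print(inclusive_days("2014-09-28", "2015-06-27"))  # 273, as in ..._STD_273_...
```

The 273-day period is a year-to-date span, so a search that only looks for a quarter-length context would plausibly miss it.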

Common Stock Shares Outstanding

An enhancement providing access to the

<us-gaap:CommonStockSharesOutstanding>

tag when doing:

gaap_obj.CommonStockSharesOutstanding

Parsing for different document types

Parsing is needed for different kinds of documents

Ex:
10-K 10-Q

Can this be shared on conda-forge

Can you please share this on conda-forge? Let me know if I can assist.

ImportError: cannot import name 'Serializer'

After importing:
from xbrl import XBRLParser
I get error:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
 in ()
----> 1 from xbrl import XBRLParser

/usr/local/lib/python3.5/site-packages/xbrl/__init__.py in ()
      6 version = (1, 1, 0)
      7 
----> 8 from .xbrl import XBRLParser, GAAP, GAAPSerializer, XBRLParserException

/usr/local/lib/python3.5/site-packages/xbrl/xbrl.py in ()
      3 
      4 import re
----> 5 from marshmallow import Serializer, fields
      6 import datetime
      7 import collections

ImportError: cannot import name 'Serializer'

Remove ordereddict dependency

The ordereddict package was needed before collections.OrderedDict was introduced in Python 2.7. Since Python 2 is no longer supported, I feel it's probably safe to remove, although users on Python 2.6 or older may have trouble. If you are happy with this, I will send a pull request.

The following change should be enough.

diff --git a/setup.py b/setup.py
index 9b1bada..ec9ecc3 100644
--- a/setup.py
+++ b/setup.py
@@ -19,7 +19,7 @@ setup(
     keywords='xbrl, Financial, Accounting, file formats',
     packages=['xbrl'],
     install_requires=['pytest', 'pep8', 'marshmallow',
-    'beautifulsoup4', 'ordereddict', 'lxml', 'six'],
+    'beautifulsoup4', 'lxml', 'six'],
     classifiers=[
         'Intended Audience :: Developers',
         'Natural Language :: English',
diff --git a/xbrl/xbrl.py b/xbrl/xbrl.py
index b93f8e8..ff4087a 100644
--- a/xbrl/xbrl.py
+++ b/xbrl/xbrl.py
@@ -4,7 +4,7 @@
 import re
 from marshmallow import Schema, fields
 import datetime
-import collections
+import collections as odict
 import six
 import logging
 
@@ -13,12 +13,6 @@ try:
 except ImportError:
     from io import StringIO
 
-if 'OrderedDict' in dir(collections):
-    odict = collections
-else:
-    import ordereddict as odict
-
-
 def soup_maker(fh):
     """ Takes a file handler returns BeautifulSoup"""
     try:

Add Python 3 support

Support should be added for Python 3

How can I pass requests.text into the xbrl_parser.parse() method?

It is easy for me to get the filenames of XML docs on the SEC website, and I can then do

requests.get("url_file_location")
I would then want to pass the response's .text, which is the contents of that file, rather than downloading and then opening the file.

Thank you.
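One way to avoid the temporary file is to wrap the downloaded text in an in-memory file-like object. The sketch below uses only the standard library; text_to_file_handle is a hypothetical helper, and sample_text stands in for the .text of a requests response.

```python
import io


def text_to_file_handle(text):
    """Wrap already-downloaded text in an in-memory, file-like object."""
    return io.StringIO(text)


# Stand-in for response.text from requests.get(url); no network access needed.
sample_text = "<xbrl><context id='c1'/></xbrl>"
handle = text_to_file_handle(sample_text)
first_read = handle.read()
```

Whether xbrl_parser.parse() accepts any object with a read() method is an assumption here; if it does, passing text_to_file_handle(response.text) would take the place of open("...").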

Investigate integration with pandas

A lot of this data can be read into pandas dataframes easily. This could be a fun and useful integration.

Error running provided example (gaap.py)

When I run gaap.py I am met with the following error:

File "/Users/xx/Coding/financial_fundamentals/venv/lib/python2.7/site-packages/xbrl/xbrl.py", line 144, in parseGAAP
context_tags = xbrl.find_all(name=re.compile(doc_root + "context",
AttributeError: 'file' object has no attribute 'find_all'

I can circumvent this error by simply deleting that line in the code, but I am still met with another error (shown below):

File "/Users/xx/Coding/python-xbrl/examples/xbrl.py", line 147, in parseGAAP
context_tags = xbrl.find_all(name=re.compile(doc_root + "context",
AttributeError: 'file' object has no attribute 'find_all'

Any idea what is going on here or how to resolve it? I have attempted a reinstall and a manual installation, but have no solution yet.

XBRLParser.Parse() Doesn't Parse 10-Q or 10-K

I am kind of new to Python, so forgive me if I miss something.

I am using python-xbrl with Python 2.7. I updated python-xbrl from git to ddc1d69 to fix the marshmallow Schema problem. I have tried parsing 10-K and 10-Q files, but I have yet to have one work. parse() takes a really long time, and then always errors out saying that the file is empty.

Dependencies versions

marshmallow versions equal to or greater than 2.0.0 don't support from marshmallow import Serializer. The dependency therefore needs, at the least, version numbers added in the relevant places.
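Besides pinning the version, another option is to pick whichever class name the installed release provides. The sketch below shows the pattern on a standard-library module so it runs without marshmallow; import_first is a hypothetical helper, not part of python-xbrl or marshmallow.

```python
import importlib


def import_first(module_name, candidates):
    """Return the first attribute in `candidates` that `module_name` provides."""
    module = importlib.import_module(module_name)
    for name in candidates:
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(
        "none of %r found in %s" % (list(candidates), module_name)
    )


# Demonstrated on the standard library so the sketch runs without marshmallow.
ordered = import_first("collections", ["NoSuchName", "OrderedDict"])
```

With marshmallow installed, import_first("marshmallow", ["Schema", "Serializer"]) would return whichever of the two names exists in that release.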

add CI by Github actions

Unit tests become available after #53 is solved. Since 2020, Travis CI has changed its service plan and (I personally think) become unfriendly to OSS. Can I work on GitHub Actions instead?

in python-xbrl v1.1.0, marshmallow version 3.0.0.b7, error: has not object "Serializer"

get_xbrl_data.py:Line #3 > from xbrl import XBRLParser, GAAP, GAAPSerializer

xbrl/init.py: Line #8 > from .xbrl import XBRLParser, GAAP, GAAPSerializer, XBRLParserException

Runtime ERROR:

/System/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7 /Users/.../PyCharmprojects/xbrlpy/get_xbrl_data.py
Traceback (most recent call last):
File "/Users/.../PyCharmprojects/xbrlpy/get_xbrl_data.py", line 3, in
from xbrl import XBRLParser, GAAP, GAAPSerializer
File "/usr/local/lib/python2.7/site-packages/xbrl/init.py", line 8, in
from .xbrl import XBRLParser, GAAP, GAAPSerializer, XBRLParserException
File "/usr/local/lib/python2.7/site-packages/xbrl/xbrl.py", line 5, in
from marshmallow import Serializer, fields
ImportError: cannot import name Serializer

Parse "custom" xbrl files

I'm unsure if this is a bug or just me being too stupid to get this lib working.
I have tried to work with danish xbrl files from the danish site: virk.dk
Look at this f.ex:
https://datacvr.virk.dk/data/visenhed?enhedstype=virksomhed&id=27400744&soeg=danmark&type=Alle
Under "regnskaber" you should find a link to xbrl-files.
F.ex. this: https://datacvr.virk.dk/data/offentliggorelse?dl_ref=Y3ZyLmRrOi8veGJybHMvWC03RENCRUIxMi0yMDE1MDEzMF8xNjA5NTJfNzA0
When i look into the file, the xml is in this format:

I see tags like: e:ProfitLoss
But when i look into your parser, it looks for:
income_loss += xbrl.find_all(name=re.compile("(us-gaap:profitloss)", re.IGNORECASE | re.MULTILINE))

So are the only XBRL files supported the ones with us-gaap: in front of the tags?

parseGAAP() got an unexpected keyword argument 'doc_type'

Python 3.6.0 (default, Feb 11 2017, 14:54:45)
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.

from xbrl import XBRLParser, GAAP, GAAPSerializer
xbrl_parser = XBRLParser()
xbrl = xbrl_parser.parse('aapl-20150627.xml')
gaap_obj = xbrl_parser.parseGAAP(xbrl, doc_date='20150627', doc_type='10-Q', context='current')
Traceback (most recent call last):
File "", line 1, in
TypeError: parseGAAP() got an unexpected keyword argument 'doc_type'

How to fix it?

CVE-2018-17175 Fix

Based on CVE-2018-17175 the version of Marshmallow in use has a security vulnerability

Installed on OSX 10.9 using "sudo pip" but GAAPSerializer not found

sudo pip install --upgrade python-xbrl
Requirement already up-to-date: python-xbrl in /Library/Python/2.7/site-packages
Requirement already up-to-date: pytest in /Library/Python/2.7/site-packages (from python-xbrl)
Requirement already up-to-date: pep8 in /Library/Python/2.7/site-packages (from python-xbrl)
Requirement already up-to-date: marshmallow in /Library/Python/2.7/site-packages (from python-xbrl)
Requirement already up-to-date: beautifulsoup4 in /Library/Python/2.7/site-packages (from python-xbrl)
Requirement already up-to-date: ordereddict in /Library/Python/2.7/site-packages (from python-xbrl)
Requirement already up-to-date: lxml in /Library/Python/2.7/site-packages (from python-xbrl)
Requirement already up-to-date: six in /Library/Python/2.7/site-packages (from python-xbrl)
Requirement already up-to-date: py>=1.4.25 in /Library/Python/2.7/site-packages (from pytest->python-xbrl)
Cleaning up...

from xbrl import XBRLParser, GAAPSerializer

xbrl_parser = XBRLParser()

xbrl = XBRLParser.parse(file("~/Desktop/csco-20141025.xml"))

gaap_obj = XBRLParser.parseGAAP(xbrl, doc_date="20141025", doc_type="10-Q", context="current")
serialized = GAAPSerializer(gaap_obj)
print serialized.data

from xbrl import XBRLParser, GAAPSerializer

ImportError: cannot import name GAAPSerializer

maximum recursion depth exceeded in instancecheck

Is there any way to increase the recursion depth? I keep running into this issue. I don't mind if it takes a bit longer to process a file, but I'd prefer the action to finish rather than error out.
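Python's recursion limit can be raised with sys.setrecursionlimit. The sketch below wraps it in a context manager so the previous limit is restored afterwards; recursion_limit and depth are hypothetical names, and depth is only a toy stand-in for a deeply nested parse.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit):
    """Temporarily raise the interpreter's recursion limit."""
    previous = sys.getrecursionlimit()
    sys.setrecursionlimit(max(previous, limit))
    try:
        yield
    finally:
        sys.setrecursionlimit(previous)


def depth(n):
    """Toy recursive function standing in for a deeply nested parse."""
    return 0 if n == 0 else 1 + depth(n - 1)


with recursion_limit(5000):
    reached = depth(2000)
```

A very high limit can exhaust the C stack and crash the interpreter, so it is safer to raise it in moderate steps.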

Add tests for all the xbrl document creation software

Tests are needed for correct parsing of XBRL documents created by all the major creation software platforms

Sample run returns 0 on all data


# coding: utf-8

# In[1]:

from xbrl import XBRLParser, GAAP, GAAPSerializer

response = urllib2.urlopen("https://www.sec.gov/Archives/edgar/data/909832/000090983216000032/cost-20160828.xml")
html = response.read()
with open("xbrl.xml","w") as f:
    f.write(html)


# In[9]:

xbrl_parser = XBRLParser()
xbrl = xbrl_parser.parse(file("xbrl.xml"))


# In[13]:

gaap_obj=xbrl_parser.parseGAAP(xbrl)
gaap_obj.revenues

#returns 0

Serialized values all 0

I've run through the setup and tried parsing this google 10-K form:

http://www.sec.gov/Archives/edgar/data/1288776/000128877614000020/goog-20131231.xml

However when I serialized the data, all values are 0, as seen below. Any idea what's happening here?

MarshalResult(data={u'liabilities': 0.0, u'net_cash_flows_financing_continuing': 0.0, u'revenue': 0.0, u'income_tax_expense_benefit': 0.0, u'income_from_equity_investments': 0.0, u'preferred_stock_dividends': 0.0, u'redeemable_noncontrolling_interest': 0.0, u'extraordary_items_gain_loss': 0.0, u'temporary_equity': 0.0, u'costs_and_expenses': 0.0, u'non_current_assets': 0.0, u'net_cash_flows_discontinued': 0.0, u'net_cash_flows_investing_discontinued': 0.0, u'liabilities_and_equity': 0.0, u'other_operating_income': 0.0, u'operating_income_loss': 0.0, u'income_before_equity_investments': 0.0, u'net_income_parent': 0.0, u'equity': 0.0, u'income_loss': 0.0, u'cost_of_revenue': 0.0, u'operating_expenses': 0.0, u'noncurrent_liabilities': 0.0, u'current_liabilities': 0.0, u'net_cash_flows_investing': 0.0, u'stockholders_equity': 0.0, u'net_income_loss': 0.0, u'net_cash_flows_investing_continuing': 0.0, u'nonoperating_income_loss': 0.0, u'net_cash_flows_financing': 0.0, u'net_income_shareholders': 0.0, u'comprehensive_income': 0.0, u'equity_attributable_interest': 0.0, u'commitments_and_contingencies': 0.0, u'comprehensive_income_parent': 0.0, u'net_cash_flows_operating_discontinued': 0.0, u'comprehensive_income_interest': 0.0, u'other_comprehensive_income': 0.0, u'equity_attributable_parent': 0.0, u'assets': 0.0, u'gross_profit': 0.0, u'net_cash_flows_operating_continuing': 0.0, u'current_assets': 0.0, u'interest_and_debt_expense': 0.0, u'net_income_loss_noncontrolling': 0.0, u'net_cash_flows_operating': 0.0}, errors={})

Parsing of only data from this quarter

Many documents will contain multiple elements from different quarters.

Example
StockholdersEquity from both the current quarters and previous years quarters is usually listed

ImportError: cannot import name Serializer

Hi,
I've imported the latest updates from git and checked all of the requirements but I'm still getting this error:

Traceback (most recent call last):
File "C:\Users\kate\Anaconda2\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 326, in RunScript
exec codeObject in main.dict
File "C:\Users\kate\Dropbox\CVR\Regnskab.py", line 15, in
import xbrl
File "C:\Users\kate\Anaconda2\lib\site-packages\xbrl__init__.py", line 8, in
from .xbrl import XBRLParser, GAAP, GAAPSerializer, XBRLParserException
File "C:\Users\kate\Anaconda2\lib\site-packages\xbrl\xbrl.py", line 5, in
from marshmallow import Serializer, fields
ImportError: cannot import name Serializer

Do you have any idea how I can fix this? I'd like to use the library. Thanks.

Allow parsing under different contexts

Also parsing under different contexts. To save working memory we will only allow one context a time

Parsing the first matching value

I am using Arelle app to open xml files and double check the output from the code below

from xbrl import XBRLParser, GAAP, GAAPSerializer

xbrl_parser = XBRLParser()
xbrl = xbrl_parser.parse('aapl-20150627.xml')
gaap_obj = xbrl_parser.parseGAAP(xbrl, doc_date='20150627', context='current', ignore_errors = 0)
serializer = GAAPSerializer()
result = serializer.dump(gaap_obj)

print result

Output:

MarshalResult(data={u'liabilities': 65285.0, u'net_cash_flows_financing_continuing': 0.0, u'revenue': 0.0, u'income_tax_expense_benefit': 3796.0, u'common_shares_authorized': 0.0, u'income_from_equity_investments': 0.0, u'preferred_stock_dividends': 0.0, u'redeemable_noncontrolling_interest': 0.0, u'extraordary_items_gain_loss': 0.0, u'temporary_equity': 0.0, u'costs_and_expenses': 0.0, u'non_current_assets': 4081.0, u'net_cash_flows_discontinued': 0.0, u'net_cash_flows_investing_discontinued': 0.0, u'liabilities_and_equity': 273151.0, u'other_operating_income': 0.0, u'operating_income_loss': 0.0, u'income_before_equity_investments': 0.0, u'net_income_parent': 0.0, u'equity': 0.0, u'income_loss': 14083.0, u'cost_of_revenue': 0.0, u'operating_expenses': 5598.0, u'noncurrent_liabilities': 0.0, u'current_liabilities': 0.0, u'net_cash_flows_investing': 0.0, u'stockholders_equity': 125677.0, u'net_income_loss': 10677.0, u'net_cash_flows_investing_continuing': 0.0, u'nonoperating_income_loss': 0.0, u'common_shares_outstanding': 0.0, u'net_cash_flows_financing': 0.0, u'net_income_shareholders': 0.0, u'comprehensive_income': 9065.0, u'equity_attributable_interest': 0.0, u'commitments_and_contingencies': 0.0, u'comprehensive_income_parent': 9065.0, u'net_cash_flows_operating_discontinued': 0.0, u'comprehensive_income_interest': 0.0, u'other_comprehensive_income': 0.0, u'equity_attributable_parent': 0.0, u'assets': 3991.0, u'common_shares_issued': 0.0, u'gross_profit': 19681.0, u'net_cash_flows_operating_continuing': 0.0, u'current_assets': 0.0, u'interest_and_debt_expense': 0.0, u'net_income_loss_noncontrolling': 0.0, u'net_cash_flows_operating': 0.0}, errors={})

The problem is that every value is the first matching value in the XML file. So liabilities = 65285.0 is actually us-gaap:LiabilitiesCurrent, which comes before us-gaap:Liabilities.
The same happens with assets = 3991.0, which is actually
us-gaap:FiniteLivedIntangibleAssetsAccumulatedAmortization; it comes before us-gaap:Assets = 273 151 000 000.

I believe it can be solved by slightly changing part of def parseGAAP() in xbrl.py where xbrl.find_all is used for every value (assets, current_assets, etc)
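The effect can be reproduced with plain regular expressions on a list of tag names. This is only a sketch: the patterns are illustrative rather than the ones in xbrl.py, and it assumes the name filter is applied with search semantics, so a pattern can match anywhere inside a tag name.

```python
import re

# Tag names in document order, as in the report above.
tag_names = [
    "us-gaap:LiabilitiesCurrent",
    "us-gaap:Liabilities",
    "us-gaap:FiniteLivedIntangibleAssetsAccumulatedAmortization",
    "us-gaap:Assets",
]


def first_match(pattern, names):
    """Return the first name the pattern matches anywhere, or None."""
    regex = re.compile(pattern, re.IGNORECASE)
    return next((n for n in names if regex.search(n)), None)


loose = first_match(r"(us-gaap:liabilities)", tag_names)
exact = first_match(r"^us-gaap:liabilities$", tag_names)
loose_assets = first_match(r"assets", tag_names)
exact_assets = first_match(r"^us-gaap:assets$", tag_names)
```

Anchoring the pattern with ^ and $ makes it match only the exact tag name, so the earlier, longer tag no longer wins.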

handle error from improperly formed XBRL contentRef

A content error is generating when parsing xbrl from

INTERCLOUD SYSTEMS, INC. CIK#: 0001128725

do pep8 and add it to the requirements

Format code to match pep8 standards and add it to the requirements list and build

Proper Unit Test

Needs some proper unit tests, mostly of parsing different files with various errors

name file is not defined error showing up.

xbrl = xbrl_parser.parseGAAP(file(r"C:\Users\anant\NETRA\https---archives.nseindia.com-corporate-xbrl-INDAS_90168_794349_14022023041135.xml.url"))

I ran this code and it gives the error "name 'file' is not defined". Kindly let me know how I can resolve it.
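The file() builtin exists only in Python 2; in Python 3 the equivalent is open(). A self-contained sketch follows, using a temporary stand-in document so that it runs anywhere.

```python
import os
import tempfile

# Create a small stand-in document so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample.xml")
with open(path, "w", encoding="utf-8") as out:
    out.write("<xbrl></xbrl>")

# Python 2 code often wrote file(path); in Python 3 use open(path) instead.
with open(path, encoding="utf-8") as handle:
    content = handle.read()
```

Following the workflow in the introduction, the open handle goes to xbrl_parser.parse() first, and the object it returns is what parseGAAP() expects.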

cannot import name Serializer

Hi there, I have installed the latest version of marshmallow and when trying to run python-xbrl, I get this error:

import xbrl
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/lib/python2.7/dist-packages/xbrl/init.py", line 8, in
from .xbrl import XBRLParser, GAAP, GAAPSerializer, XBRLParserException
File "/usr/local/lib/python2.7/dist-packages/xbrl/xbrl.py", line 5, in
from marshmallow import Serializer, fields
ImportError: cannot import name Serializer

Incorrect parsing of current_assets and possibly others

I noticed that in the tests in many cases the assert for current_assets was comparing to 0.0 and in other cases was comparing to a value found in the xml file associated with NoncurrentAssets

The GAAP property name for current assets is us-gaap:AssetsCurrent
The search regexp at line 206 of xbrl.py was searching for currentassets, the wrong property name. In addition, because of the [^s]* in the regexp, it also matched noncurrentassets, precisely the wrong thing.
The tests also appeared to perpetuate this, for example:
- The test_parse_GAAP10Q_QXInteractive() this was incorrectly checking for an
  assert result.data['current_assets'] == 46431.0
  -The only tag containing a value of 46431 in the file "tests/rsh20131231.xml" is that for Noncurrentassets!
A similar issue exists in the current_assets assertion for test_parse_GAAP10K_ThomsonReuters():

I'm working on a fix for these current_asset issues and just wanted to give a "heads up". I will fix the parsing and the tests.

I recognize that FASB is constantly updating its definitions; I'm using the 2016 recommendations.

As a follow-up, I propose we add some assertions in the tests to check that the balance sheet balances, e.g. that gaap:LiabilitiesAndStockholdersEquity == gaap:Assets

Regards

Rob Rennison
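The balance check proposed above could look roughly like this. balance_sheet_balances is a hypothetical helper; the key names follow the serialized output shown in other issues on this page, and both sample dictionaries use values from those outputs.

```python
def balance_sheet_balances(data, tolerance=0.0):
    """True when total assets equal total liabilities and equity."""
    return abs(data["assets"] - data["liabilities_and_equity"]) <= tolerance


# Values taken from serialized results reported elsewhere in these issues.
sample = {"assets": 31231.0, "liabilities_and_equity": 31231.0}
mismatch = {"assets": 3991.0, "liabilities_and_equity": 273151.0}
```

In a test this would become assert balance_sheet_balances(result.data), which would have flagged the mismatched case where assets picked up the wrong tag.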

parsing an xml URL? [question]

Is there any supported way of parsing from a URL instead of a local file?

So something like this:

url = "http://regnskaber.virk.dk/17438023/eGJybHN0b3JlOi8vWC1DOUZFREM2OC0yMDE0MDUyMV8xMzU2MDdfMTA2L3hicmw.xml"

xbrl_parser = XBRLParser(precision=0)

xbrl = xbrl_parser.parse(file(url))

(probably with out the file())

DEISerializer is not defined NameError

Running the example gaap.py results in the following error:

Traceback (most recent call last):
File "gaap.py", line 29, in
serializer = DEISerializer()
NameError: name 'DEISerializer' is not defined

Add logging support and catch parsing errors

Many times while parsing a document, parsing errors occur due to incorrect tags. We should catch these errors and log them.

Cannot fetch common shares outstanding

Hello
Great library! Just what I need to monitor historical data for corporates. Unfortunately, I cannot get common_shares_outstanding out of the gaap_obj, which is what I desperately need.
Would you be able to tell me why?
Here's my sample code; I'll attach the XBRL I am trying to parse.

    >>> from xbrl import *
    >>> xbrl_parser = XBRLParser()
    >>> xbrl = xbrl_parser.parse(file('brn-20171231.xml'))
    >>> gaap_obj = xbrl_parser.parseGAAP(xbrl, context="current", ignore_errors=0)
    >>>gaap_obj.__dict__
    {'assets': 31231.0,

'commitments_and_contingencies': 0,
'common_shares_authorized': 0,
'common_shares_issued': 0,
'common_shares_outstanding': 0,
'comprehensive_income': -999.0,
'comprehensive_income_interest': 0,
'comprehensive_income_parent': -999.0,
'cost_of_revenue': 0.0,
'costs_and_expenses': 3240.0,
'current_assets': 0,
'current_liabilities': 0,
'equity': 0,
'equity_attributable_interest': 408.0,
'equity_attributable_parent': 0,
'extraordary_items_gain_loss': 0,
'gross_profit': 0,
'income_before_equity_investments': -1384.0,
'income_continuing_operations_tax': 0,
'income_discontinued_operations': 0,
'income_from_equity_investments': -153.0,
'income_loss': -1537.0,
'income_tax_expense_benefit': -503.0,
'interest_and_debt_expense': 0,
'liabilities': 227.0,
'liabilities_and_equity': 31231.0,
'net_cash_flows_discontinued': 0,
'net_cash_flows_financing': 0,
'net_cash_flows_financing_continuing': 0,
'net_cash_flows_investing': 0,
'net_cash_flows_investing_continuing': 0,
'net_cash_flows_investing_discontinued': 0,
'net_cash_flows_operating': 0,
'net_cash_flows_operating_continuing': 0.0,
'net_cash_flows_operating_discontinued': 0,
'net_cash_operating_continuing': 0,
'net_income_loss': -1017.0,
'net_income_loss_noncontrolling': 0,
'net_income_parent': 0.0,
'net_income_shareholders': 0,
'non_current_assets': -31231.0,
'noncurrentLiabilities': 0.0,
'noncurrent_liabilities': 0,
'nonoperating_income_loss': 0,
'operating_expenses': 0,
'operating_income_loss': 0,
'other_comprehensive_income': 0,
'other_operating_income': 0,
'preferred_stock_dividends': 0,
'redeemable_noncontrolling_interest': 0,
'revenue': 0.0,
'revenues': 821.0,
'stockholders_equity': 16254.0,
'temporary_equity': 0}

If u dont have time, pls give me some hints and i'll dig in the code myself
thanks a lot!
Damn, i cant attach any files. the XBRL i am tyring to parse is at this location

https://www.sec.gov/Archives/edgar/data/10048/000001004818000002/

Change object serialization for marshmallow 1.0

As noted marshmallow 1.0 has changed to provide a cleaner API to how objects get serialized. Update code to use the new API.

Expose custom XBRL tags

Expose the custom XBRL tags using a blanket tag catcher

'NoneType' object has no attribute 'name'

Running the example in gaap.py returns the error:

File "C:\Python27\lib\site-packages\xbrl\xbrl.py", line 80, in parse
re.IGNORECASE | re.MULTILINE)).name
AttributeError: 'NoneType' object has no attribute 'name'

doc_type Parameter

What is the point of including this parameter throughout the project? From what I can tell it's not being used anywhere, and it doesn't do anything to further limit parsing of 10-K or 10-Q document types anyway. It might be worth considering its removal.

Parsing error on AAPL's most recent filing.

Parsing error is occurring with
http://www.sec.gov/Archives/edgar/data/320193/000119312514383437/aapl-20140927.xml

xbrl = XBRLParser.parse(file("aapl-20140927.xml"))

And this is the error I get back:
AttributeError Traceback (most recent call last)
in ()
----> 1 xbrl = XBRLParser.parse(file("aapl-20140927.xml"))

C:\Users\PJE\AppData\Local\Enthought\Canopy32\User\lib\site-packages\xbrl\xbrl.pyc in parse(self, file_handle)
76 # lookahead to see if we need a custom leading element
77 lookahead = xbrl.find(name=re.compile("context",
---> 78 re.IGNORECASE | re.MULTILINE)).name
79 if ":" in lookahead:
80 self.xbrl_base = lookahead.split(":")[0] + ":"

AttributeError: 'NoneType' object has no attribute 'name'
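The traceback shows .name being read from the result of a find() that returned None, meaning no context element was found. A guarded version of that lookahead might look like the sketch below. It works on a plain list of tag names rather than on a parsed document; base_prefix is a hypothetical helper, and returning an empty prefix when there is no colon is an assumption.

```python
def base_prefix(tag_names):
    """Return the namespace prefix of the first context tag, e.g. 'xbrli:'."""
    lookahead = next((n for n in tag_names if "context" in n.lower()), None)
    if lookahead is None:
        # Fail with a clear message instead of an AttributeError on None.
        raise ValueError("no context element found; is this an XBRL instance?")
    return lookahead.split(":")[0] + ":" if ":" in lookahead else ""
```

A check like this would surface the likely cause, such as an empty or non-XBRL input file, rather than an opaque AttributeError.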

Module rather than a script

Why is the module installed as a script and not as a module? Instead of

scripts=['python-xbrl/xbrl.py'],

in the setup.py I would have 'py_modules' argument set to the module name. At the moment the module gets installed in /usr/local/bin/ on my computer.

TypeError: '_io.TextIOWrapper' object is not callable

Great library, but I need help with the following error when executing my code
for filename in reversed(filelist):
    print("filename", filename)
    # print("index", index)
    this_file = {}
    thisfilename = os.path.join(this_path, filename)
    with open(thisfilename, 'r', encoding='utf-8') as file:
        print(thisfilename)
        this10q = file.read()
        xbrl_parser = XBRLParser()
        xbrl = xbrl_parser.parse(file(thisfilename))

Error message is detailed below :
Traceback (most recent call last):
  File "C:/Users/sagar/Documents/DEAR/analysis-automation-master/my-analyzer/secfilingextraction.py", line 21, in <module>
    xbrl = xbrl_parser.parse(file(thisfilename))
TypeError: '_io.TextIOWrapper' object is not callable
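In the code above, the name file is bound by the with statement to the already-open file object, so file(thisfilename) tries to call that object. A minimal sketch of the situation follows, with io.StringIO standing in for the opened file.

```python
import io

# Stand-in for open(thisfilename, 'r', encoding='utf-8').
with io.StringIO("<xbrl></xbrl>") as handle:
    # The name bound by `with ... as` is already the open file object;
    # calling it like handle(path) is what raises "object is not callable".
    is_callable = callable(handle)
    contents = handle.read()
```

Passing the handle itself to xbrl_parser.parse(), before anything else has read from it, avoids the call. Whether parse() then succeeds on a given filing is a separate question.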