Comments (5)
Hey there,
the goal of this tool is certainly not to standardize financial data. That is the goal of the XBRL standard itself. How well the data is standardized depends solely on the financial regulators and on the creator of the XBRL document.
I guess your question is probably: "How can I use this tool to collect and compare data from different companies?"
With py-xbrl you can extract any information that is tagged in an XBRL or iXBRL document. If you are not familiar with XBRL, have a look at this iXBRL viewer. All values that are "clickable" are tagged with XBRL and can be read with py-xbrl:
https://www.sec.gov/ix?doc=/Archives/edgar/data/320193/000032019322000108/aapl-20220924.htm
For example, the following code extracts "Earnings per share" for Apple and Microsoft:
import logging
from xbrl.cache import HttpCache
from xbrl.instance import XbrlParser, XbrlInstance

# Downloaded filings and taxonomies are cached in this local directory.
cache: HttpCache = HttpCache('./cache')
xbrlParser = XbrlParser(cache)

# One filing URL per ticker.
subs = {
    "AAPL": "https://www.sec.gov/Archives/edgar/data/320193/000032019322000108/aapl-20220924.htm",
    "MSFT": "https://www.sec.gov/Archives/edgar/data/789019/000156459022035087/msft-10q_20220930.htm"
}

for ticker, url in subs.items():
    inst: XbrlInstance = xbrlParser.parse_instance(url)
    # Print every fact tagged with the basic EPS concept.
    for fact in inst.facts:
        if fact.concept.name == 'EarningsPerShareBasic':
            print(f"On {fact.context.end_date} {ticker} had an EPS of {fact.value}")
Output:
On 2022-09-24 AAPL had an EPS of 6.15
On 2021-09-25 AAPL had an EPS of 5.67
On 2020-09-26 AAPL had an EPS of 3.31
On 2022-09-30 MSFT had an EPS of 2.35
On 2021-09-30 MSFT had an EPS of 2.73
With py-xbrl you can extract thousands of different facts from thousands of companies directly from the source (the actual financial report from the company) instead of going through an API.
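To compare companies, it can help to group the extracted values by ticker and period instead of printing them. The following is a minimal sketch, not part of py-xbrl: `collect_concept` is a hypothetical helper, and the small dataclasses are stand-ins that only mimic the attributes used in the loop above (`fact.concept.name`, `fact.context.end_date`, `fact.value`), so the example runs without network access. The sample values are the ones from the output above.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Any, Dict, Iterable


def collect_concept(facts_by_ticker: Dict[str, Iterable[Any]],
                    concept_name: str) -> Dict[str, Dict[date, Any]]:
    """Group the values of one concept by ticker and period end date.

    Hypothetical helper: it relies only on fact.concept.name,
    fact.context.end_date and fact.value.
    """
    table: Dict[str, Dict[date, Any]] = defaultdict(dict)
    for ticker, facts in facts_by_ticker.items():
        for fact in facts:
            if fact.concept.name != concept_name:
                continue
            # Skip facts whose context has no end_date attribute
            # (assumed here to be point-in-time contexts).
            end = getattr(fact.context, "end_date", None)
            if end is None:
                continue
            table[ticker][end] = fact.value
    return dict(table)


# Stand-in objects for illustration only; these are not py-xbrl classes.
@dataclass
class _Concept:
    name: str


@dataclass
class _Context:
    end_date: date


@dataclass
class _Fact:
    concept: _Concept
    context: _Context
    value: float


def _eps(end: date, value: float) -> _Fact:
    return _Fact(_Concept("EarningsPerShareBasic"), _Context(end), value)


sample = {
    "AAPL": [
        _eps(date(2022, 9, 24), 6.15),
        _eps(date(2021, 9, 25), 5.67),
        # A fact for another concept, which the helper filters out.
        _Fact(_Concept("SomeOtherConcept"), _Context(date(2022, 9, 24)), 0.0),
    ],
    "MSFT": [_eps(date(2022, 9, 30), 2.35)],
}

eps_table = collect_concept(sample, "EarningsPerShareBasic")
for ticker, by_period in eps_table.items():
    for end, value in sorted(by_period.items()):
        print(f"{ticker} {end}: {value}")
```

With real filings, the input could be built from the parser shown above, for example `{ticker: xbrlParser.parse_instance(url).facts for ticker, url in subs.items()}`. Note that the periods still need care when comparing: one of the two filings above is an annual report and the other a quarterly one.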
from py-xbrl.
Pretty damn cool. What would be the difference between what you are doing and what Ties de Kok did with https://github.com/TiesdeKok/fast_xbrl_parser?
It seems that you are parsing the htm file https://www.sec.gov/Archives/edgar/data/320193/000032019322000108/aapl-20220924.htm
And that he is parsing the xml file: https://www.sec.gov/Archives/edgar/data/1652044/000165204423000016/goog-20221231_def.xml
Do you know if these datasets are meant to contain the same information (facts/concepts)? I wonder what the advantages and disadvantages of using one over the other would be.
@rayniervanegmond Thank you for the great explanation! I can only agree entirely with what @rayniervanegmond said!
It is true that the SEC also provides XBRL files for iXBRL submissions. However, these are converted from the original iXBRL filings; this is a service the SEC provides for compatibility reasons.
But I would always prefer to parse iXBRL since it has several benefits.
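To tell the file types in such a submission apart, a rough sketch like the following can sort URLs by file name. The helper `classify_filing_url` is hypothetical and not part of py-xbrl. The suffix mapping is an assumption based on the file naming commonly seen in EDGAR submissions, where `_cal.xml`, `_def.xml`, `_lab.xml` and `_pre.xml` denote taxonomy linkbases rather than instance documents, and an `.htm` file is only likely, not guaranteed, to be iXBRL.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed naming convention for linkbase files in EDGAR submissions.
LINKBASE_SUFFIXES = ("_cal.xml", "_def.xml", "_lab.xml", "_pre.xml")


def classify_filing_url(url: str) -> str:
    """Guess the kind of XBRL-related file from the URL's file name."""
    name = PurePosixPath(urlparse(url).path).name.lower()
    if name.endswith((".htm", ".html")):
        return "ixbrl"      # likely an inline XBRL report
    if name.endswith(LINKBASE_SUFFIXES):
        return "linkbase"   # taxonomy linkbase, carries no facts
    if name.endswith(".xsd"):
        return "schema"     # taxonomy schema
    if name.endswith((".xml", ".xbrl")):
        return "xbrl"       # likely a plain XBRL instance document
    return "unknown"


urls = [
    "https://www.sec.gov/Archives/edgar/data/320193/000032019322000108/aapl-20220924.htm",
    "https://www.sec.gov/Archives/edgar/data/1652044/000165204423000016/goog-20221231_def.xml",
]
kinds = [classify_filing_url(u) for u in urls]
for u, kind in zip(urls, kinds):
    print(kind, u)
```

Under this assumption, the `goog-20221231_def.xml` URL mentioned earlier in the thread would be a definition linkbase, so the facts themselves would have to come from the instance document of that submission.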
Regarding your second question (@firmai):
To be honest, I have not tried "fast_xbrl_parser" from TiesdeKok. It seems to be written in Rust, while py-xbrl is pure Python.
Another great open-source library for parsing XBRL is Arelle. It offers far more functionality than py-xbrl, but that range of functionality also increases complexity. The goal of py-xbrl has always been to parse filings and get all of the data as easily as possible. It was never XBRL validation, which is also a huge part of a proper XBRL processor.