Comments (5)
Thank you for your issue!
No, the HTTP cache is not optional at this time. Even if you have already downloaded the instance file and/or the files of the extension taxonomy, the parser must still download every taxonomy (and its files) that the XBRL instance file imports.
For SEC submissions this includes, for example, the US-GAAP, DEI and SRT taxonomies.
These standard taxonomies can be quite large (e.g. US-GAAP 2020 has about 18 MB of XML files), so caching is required when parsing multiple submissions. You don't want to download the same standard taxonomy again for each of your 1000 submissions.
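To illustrate why the cache matters, here is a minimal sketch of a URL-to-disk cache. It is not py-xbrl's HttpCache implementation; the class name DiskCache, the injected fetch callable and the downloads counter are all hypothetical and only there to show that a repeated request for the same taxonomy file hits the network once.

```python
import hashlib
from pathlib import Path
from typing import Callable


class DiskCache:
    """Hypothetical URL-to-disk cache (illustration only, not py-xbrl's HttpCache)."""

    def __init__(self, cache_dir: str, fetch: Callable[[str], bytes]) -> None:
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch  # injected downloader, e.g. a thin urllib wrapper
        self.downloads = 0  # counts real fetches, for illustration only

    def _path_for(self, url: str) -> Path:
        # Hash the URL so that any URL maps to a safe file name.
        return self.cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()

    def get(self, url: str) -> bytes:
        path = self._path_for(url)
        if path.exists():
            return path.read_bytes()  # cache hit: no network traffic
        data = self.fetch(url)  # cache miss: download once, then store
        self.downloads += 1
        path.write_bytes(data)
        return data
```

With a cache like this, parsing 1000 submissions that all import the same US-GAAP schema would trigger one download of that schema instead of 1000.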
I got your example running with the following code:
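# module paths as in the 1.2.0 snippet further down in this thread
import logging
from xbrl_parser.cache import HttpCache
from xbrl_parser.instance import parse_xbrl
logging.basicConfig(level=logging.INFO)
cache: HttpCache = HttpCache('./cache/')
# parse from path; the third argument is the base URL of the submission
instance_path = './data/TSLA/tsla-10k_20201231_htm.xml'
inst1 = parse_xbrl(instance_path, cache, 'https://www.sec.gov/Archives/edgar/data/1318605/000156459021004599/')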
Currently you have to define the base URL of the submission because the taxonomy schema is imported with a relative path in the instance file.
For example:
<link:schemaRef xlink:href="./tsla-20201231.xsd" xlink:type="simple"/>
But you are correct, this is very inconvenient if you have already downloaded the files of the extension taxonomy. The parser should at least try to find the schema file in the current directory or the instance file you want to parse.
I will implement this in the next few days.
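The lookup described above could be sketched roughly as follows. This is not the code that ended up in py-xbrl; the function names schema_ref_href and resolve_schema are hypothetical, and the sketch only assumes the standard XBRL linkbase and XLink namespaces. It reads the schemaRef from the instance, prefers a schema file next to the instance, and falls back to joining the relative path onto a base URL.

```python
import os
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urljoin

LINK_NS = "http://www.xbrl.org/2003/linkbase"
XLINK_NS = "http://www.w3.org/1999/xlink"


def schema_ref_href(instance_xml: str) -> str:
    """Return the xlink:href of the first link:schemaRef in the instance."""
    root = ET.fromstring(instance_xml)
    ref = root.find(f".//{{{LINK_NS}}}schemaRef")
    if ref is None:
        raise ValueError("instance has no link:schemaRef element")
    return ref.attrib[f"{{{XLINK_NS}}}href"]


def resolve_schema(href: str, instance_path: str,
                   base_url: Optional[str] = None) -> str:
    """Resolve a schemaRef href to a local path or an absolute URL."""
    if href.startswith(("http://", "https://")):
        return href  # already absolute, nothing to resolve
    # First choice: a schema file stored next to the instance file.
    local = os.path.normpath(os.path.join(os.path.dirname(instance_path), href))
    if os.path.isfile(local):
        return local
    # Fallback: join the relative path onto the submission's base URL.
    if base_url is None:
        raise FileNotFoundError(f"{href} not found locally and no base URL given")
    return urljoin(base_url, href)
```

For the Tesla example, resolve_schema('./tsla-20201231.xsd', instance_path, base_url) would return the local file if it sits in the same directory as the instance, and otherwise the URL under the submission folder on sec.gov.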
from py-xbrl.
I will do some further testing and documentation, then upload a new package version to PyPI in the next 2-3 days.
from py-xbrl.
"The parser should at least try to find the schema file in the current directory or the instance file you want to parse."
Awesome, thank you. It'd be great if the parser could find the schema locally. Great job on the project, I'm finding it super helpful!
from py-xbrl.
It should now work with the new package version 1.2.0.
I used the following code to get your example running:
from xbrl_parser.instance import parse_xbrl
from xbrl_parser.cache import HttpCache
import logging
logging.basicConfig(level=logging.INFO)
cache: HttpCache = HttpCache('./../cache/')
# cache.set_headers({'From': '', 'User-Agent': 'py-xbrl/1.1.4'})
# parse from path
instance_path = './data/TSLA/10-k/20201231/tsla-10k_20201231_htm.xml'
inst1 = parse_xbrl(instance_path, cache)
print(inst1)
I also tested it on ~100 other SEC EDGAR submissions, both XBRL and iXBRL, and it worked quite reliably.
Nevertheless, I would be happy to get your feedback on whether it works for you.
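If you want to run the same kind of check over your own set of submissions, a small helper along these lines might be useful. It is only a sketch: the name parse_all is hypothetical, and the parser is passed in as a callable so the helper does not depend on any particular py-xbrl version.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def parse_all(paths: Iterable[str],
              parse: Callable[[str], object]) -> Tuple[List[str], Dict[str, str]]:
    """Try to parse every path; return the successes and a map of failures."""
    ok: List[str] = []
    failed: Dict[str, str] = {}
    for path in paths:
        try:
            parse(path)
        except Exception as exc:  # broad on purpose: record every failure
            failed[path] = f"{type(exc).__name__}: {exc}"
        else:
            ok.append(path)
    return ok, failed
```

With the snippet above, the callable could be something like lambda p: parse_xbrl(p, cache), and the failed map gives a list of files worth reporting back.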
from py-xbrl.
Thanks @manusimidt. I'll try it this weekend and reopen if I have any issues.
from py-xbrl.