mwaldstein / edgarwebr
R package for interacting with the SEC's EDGAR filing search and retrieval system
Home Page: https://mwaldstein.github.io/edgarWebR/
License: Other
Hello,
I'm running into an issue with the company_filings function in the edgarWebR package. Specifically, the browse_edgar helper sporadically throws an error when looking up a company's filings:
Error in browse_edgar(x, ownership = ownership, type = type, before = before, :
Could not find company: XXXXXXX
...where XXXXXXX stands for a company's CIK.
Sometimes the function will work and return the desired results, but most times it fails with the error message above. I'm currently running the function in a loop with several CIKs and each time a different CIK causes the function to error out.
I'm running R 3.6.2 in RStudio 1.2.1335.
Any help you can provide is greatly appreciated.
Thank you,
-Mike
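Since a different CIK fails on each run, the failures look like transient server-side errors or rate limiting rather than bad input. A minimal sketch of a more defensive loop (the `ciks` vector and the pause length are assumptions, not from the original report), which pauses between requests and catches per-CIK failures so one error doesn't abort the whole run:

```r
library(edgarWebR)

ciks <- c("0000320193", "0000789019")  # hypothetical example CIKs

results <- list()
for (cik in ciks) {
  results[[cik]] <- tryCatch(
    company_filings(cik, type = "10-K", count = 10),
    error = function(e) {
      message("Failed for ", cik, ": ", conditionMessage(e))
      NULL  # record the failure and keep going
    }
  )
  Sys.sleep(0.5)  # stay well under the SEC's request-rate guidelines
}
```

Any CIKs that still fail end up as NULL entries and can be retried in a second pass.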
Hi, I love your package. This worked on Feb 18th but doesn't now. I'm not sure if this is an xml2 problem; it looks like some updates to that package came out subsequently.
filing_list <-
company_filings(
as.character('AAPL'),
ownership = FALSE,
type = '10-K',
before = "2020207",
count = 40,
page = 1)
Error in xml2::url_absolute(res[[ref]], xml2::xml_url(doc)) :
Base URL must be length 1
Any thoughts appreciated.
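One thing worth checking (an observation, not a confirmed diagnosis): the `before` value `"2020207"` has only seven digits, while EDGAR's date filter expects a `YYYYMMDD` value. A sketch of the same call with an eight-digit date:

```r
library(edgarWebR)

filing_list <- company_filings(
  "AAPL",
  ownership = FALSE,
  type = "10-K",
  before = "20200207",  # YYYYMMDD: note the eight digits
  count = 40,
  page = 1
)
```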
Hi and thanks for the package! Looks very useful. Just a quick editor nit -- the vignette on CRAN still says this:
How to Download
edgarWebR is not yet of CRAN, as the API hasn’t stabilized yet. In the meantime, you can get a copy from github by using devtools:
edgarWebR functions are not vectorized, which causes unexpected and unclear errors.
To fix this, functions should warn (or fail fast) on multiple inputs.
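A minimal sketch of the kind of guard this request describes (the helper name is hypothetical, not part of edgarWebR):

```r
# Hypothetical input guard: reject vectors before they reach URL building.
check_scalar <- function(x, name = deparse(substitute(x))) {
  if (length(x) != 1) {
    stop("`", name, "` must be length 1, got ", length(x),
         "; edgarWebR functions are not vectorized.", call. = FALSE)
  }
  invisible(x)
}
```

Each exported function could call this on its inputs, turning silent misbehavior into a clear error message.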
I noticed the company search page was down Friday night, 7/3/2020. It appears an update has been made to the RSS feeds.
aapl <- company_filings(
x = "AAPL"
)
This returns the error shown in the attached screenshot for me.
Hi,
When I use parse_filing on the URLs below, I get the following error:
Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html = as_html, :
Excessive depth in document: 256 use XML_PARSE_HUGE option [1]
Here are a few sample URLs:
https://www.sec.gov/Archives/edgar/data/1065648/000106564809000009/form_10k.htm
https://www.sec.gov/Archives/edgar/data/1010247/000101024709000005/form10k.htm
https://www.sec.gov/Archives/edgar/data/861459/000086145909000013/form10-q.htm
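As a possible workaround (a sketch using xml2 directly, not edgarWebR's own API), the same documents can be read with xml2's `HUGE` parse option, which lifts libxml2's depth limit:

```r
library(xml2)

url <- "https://www.sec.gov/Archives/edgar/data/1065648/000106564809000009/form_10k.htm"

# Passing the HUGE option relaxes libxml2's hard-coded depth/size limits
# that trigger "Excessive depth in document: 256".
doc <- read_html(url, options = c("RECOVER", "NOERROR", "HUGE"))
```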
Again, thanks very much for contributing this package! It's fantastic.
Best regards
The EDGAR website allows searching by either company name (regular search) or by CIK/ticker symbol (Fast Search).
https://www.sec.gov/edgar/searchedgar/companysearch.html
I would love to have company_search() extended with a CIK = TRUE/FALSE parameter, using something like the following logic:
ifelse(CIK,
       ifelse(file_number,
              paste0("&CIK=&filenum=", x),
              paste0("&CIK=", URLencode(x, reserved = TRUE))),
       ifelse(file_number,
              paste0("&company=&filenum=", x),
              paste0("&company=", URLencode(x, reserved = TRUE))))
I was not able to get this into a testable state, so I can't confirm the details.
Again, would love to have this implemented!
Hello and thank you for the great package
I wonder if it is possible to parse the complete submission of an 8-K filing.
Regards
snvv
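edgarWebR does expose a parse_submission() function for complete-submission files; a sketch of how it might be used (the URL below is a hypothetical example of a complete-submission .txt path, and the column names are my reading of the returned data frame):

```r
library(edgarWebR)

# Hypothetical complete-submission URL: the .txt file that bundles
# every document in the filing.
sub_url <- "https://www.sec.gov/Archives/edgar/data/320193/000032019320000010/0000320193-20-000010.txt"

docs <- parse_submission(sub_url)

# The result is one row per embedded document, with columns such as
# TYPE, FILENAME, and the document TEXT itself.
head(docs[, c("TYPE", "FILENAME")])
```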
Fails to extract item lines for combined items like:
Items 1 and 2. | BUSINESS AND PROPERTIES.
Example url: https://www.sec.gov/Archives/edgar/data/1031093/000107997317000033/svbl_10k-103116.htm
Getting an error while trying to use full_text:
full_text(type="D",count=100)
Error in full_text(type = "D", count = 100) :
Unable to reach the SEC full text search endpoint (https://searchwww.sec.gov/EDGARFSClient/jsp/EDGAR_MainAccess.jsp)
Looks like support for the legacy full text ended October 1st.
It isn't always clear from the metadata alone whether a filing is HTML or text-only.
parse_filing already checks for HTML-wrapped text; it should be extended to also check for exclusively plain-text files.
Currently, tests are run against a local cache using httptest; the vignettes, however, hit the SEC server, adding overhead to package testing that isn't helpful. The new version of httptest has some features to help use an HTTP cache for vignettes.
Reference doc: https://enpiar.com/r/httptest/articles/vignettes.html
Hi, Micah
I detected another issue: in the parse_filing function, I understand it splits the content mainly based on certain parent-node tags.
Here is the example:
https://www.sec.gov/Archives/edgar/data/1424844/000092290708000774/form10k_122308.htm
thanks in advance!
Regards
Derek
I'm using edgarWebR happily. Thank you very much!
It's really beneficial for me!
But I want to know how to get the original DEF 14A document URL.
I tried to get URLs using the company_filings function below, but I can only get the master index URL.
db_def <- company_filings(db_cik$Ticker[i], type = "DEF 14A", count = 1)

  accession_number act file_number filing_date accepted_date href type
1                   34   001-07463  2022-12-13    2022-12-13 https://www.sec.gov/Archives/edgar/data/52988/000119312522303804/0001193125-22-303804-index.htm DEF 14A
2                   34   001-07463  2021-12-10    2021-12-10 https://www.sec.gov/Archives/edgar/data/52988/000119312521354013/0001193125-21-354013-index.htm DEF 14A
I've searched all the sources I could find, including GitHub, StackOverflow, and lots of tech blogs, but I can't find a way to do this.
T-T
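One approach (a sketch using edgarWebR's filing_documents() function; the ticker and indexing are illustrative) is to take the index `href` returned by company_filings() and expand it into the direct document URLs:

```r
library(edgarWebR)

filings <- company_filings("AAPL", type = "DEF 14A", count = 1)

# Each index page lists the documents inside the filing;
# filing_documents() returns their direct URLs in the `href` column.
docs <- filing_documents(filings$href[1])
doc_url <- docs$href[docs$type == "DEF 14A"][1]
```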
In the parse_filing function, some words are still run together after parsing, such as "Weightedaverageexerciseprice", so they are no longer recognizable as proper words.
In the original HTML document the phrase appears as "Weighted average exercise price", split across line breaks, which indicates that parse_filing joins those lines without inserting whitespace.
Please fix this, and many thanks!
I am trying to retrieve SC 13G filings for a given company. My issue is that there are more than 100 filings on certain dates, so when I fetch the data from EDGAR I miss some filings. Is there a way to deal with that?
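Since company_filings() exposes `count` and `page` parameters, one workaround (a sketch; the CIK is an example and the polite-delay value is an assumption) is to page through the results and bind the batches together:

```r
library(edgarWebR)

all_filings <- list()
page <- 1
repeat {
  batch <- company_filings("0000320193", type = "SC 13G",
                           count = 100, page = page)
  if (is.null(batch) || nrow(batch) == 0) break
  all_filings[[page]] <- batch
  if (nrow(batch) < 100) break  # a short batch means this was the last page
  page <- page + 1
  Sys.sleep(0.5)                # be polite to the SEC servers
}
filings <- do.call(rbind, all_filings)
```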
balthasar's workaround appears to work great! Thanks for the help. I'm not too familiar with APIs, so it took a while to figure out what was going wrong. This might be helpful to someone else who isn't too familiar with API keys:
install.packages("usethis")
library(usethis)
usethis::edit_r_environ()
# A .Renviron window will open. Add the following line to .Renviron and don't forget to save:
EDGARWEBR_USER_AGENT = "XXXX"
# Then run the rest of balthasar's code to access EDGAR
Example:
parse_filing(paste0('https://www.sec.gov/Archives/edgar/data/1048911/000119312515252494/d48165d10k.htm'))
In this case, Item 1A (Risk Factors) is incorporated by reference; the actual text is on pages 81-86.
The section labelled "Risk Factors" only identifies where to look for the actual text.
SEC regulations will almost surely require more incorporation by reference (contact me for details if an explanation is needed), so the issue will become more acute in the near future.
When retrieving filings, I receive an error saying I am an Undeclared Automated Tool. For example, when using
latest_filings()
I receive error
No encoding supplied: defaulting to UTF-8.
Error in check_result(res) :
EDGAR request blocked from Undeclared Automated Tool.
Please visit https://www.sec.gov/developer for best practices.
See https://mwaldstein.github.io/edgarWebR/index.html#ethical-use--fair-access for your responsibilities
Consider also setting the environment variable 'EDGARWEBR_USER_AGENT
I found the following information in the README:
Because of abusive use of this library, the SEC is likely to block its use “as is” without setting a custom ‘User Agent’ identifier. Details for setting a custom agent are below.
However, no details were given below. Could anyone help me with setting the user agent?
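For the current session, the variable can be set directly with Sys.setenv() (the identifier string below is a placeholder; the SEC asks for a descriptive agent, typically a name and contact email):

```r
# Set for the current R session only; put the same line in ~/.Renviron
# (without the Sys.setenv wrapper) to make it permanent.
Sys.setenv(EDGARWEBR_USER_AGENT = "Jane Doe jane.doe@example.com")
```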
I have code that extracts the URL to Exhibit 21 from the URL of the 10-K filing. However, I am now experiencing problems with the parse_submission function.
I receive this error message:
Error in charToText(x) :
Unable to reach the SEC endpoint (https://www.sec.gov/Archives/edgar/data/835011/000117184312000904/0001171843-12-000904.txt)
I am using the function in a loop. The problem occurs on different links each time I rerun the code. If I run the failing call manually, it works. If I rerun the code enough times I eventually get output, but never for all 3,370 observations in my data set.
Any help would be useful, thanks.
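Since manual reruns succeed, the failures look transient, so retrying with a short backoff may get the loop through every observation (a sketch; the retry count and delay are arbitrary choices, and the wrapper name is illustrative):

```r
library(edgarWebR)

# Retry a flaky parse_submission() call a few times before giving up.
parse_with_retry <- function(url, tries = 3, delay = 2) {
  for (i in seq_len(tries)) {
    result <- tryCatch(parse_submission(url), error = function(e) NULL)
    if (!is.null(result)) return(result)
    Sys.sleep(delay * i)  # back off a little more after each failure
  }
  warning("Giving up on ", url)
  NULL
}
```

Replacing the direct parse_submission() call in the loop with parse_with_retry() should leave only the genuinely unreachable URLs as NULLs.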
Hello
I have found many URLs that do not return part.name and item.name.
Below is a minimal example to reproduce the problem:
library(edgarWebR)
filing_doc <- "https://www.sec.gov/Archives/edgar/data/1560385/000155837019010666/lmca-20190930x10q1a7cb4.htm"
doc <- parse_filing(filing_doc, include.raw = TRUE)
doc$part.name
doc$item.name
Thank you
snvv
Currently, when cleaning text filings, page-marker tags should get stripped out. The current code expects the page markers to have a page number, e.g.
<PAGE> 10
The regex needs to be altered to also look for and remove the marker when there is no page number.
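A sketch of such a regex, with the page number made optional (the wrapper name is illustrative, not the package's actual internals):

```r
# Remove "<PAGE>" markers whether or not a page number follows them.
clean_pages <- function(text) {
  gsub("<PAGE>[ \t]*[0-9]*", "", text)
}

clean_pages("intro <PAGE> 10 body <PAGE> outro")
```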
I get this error every time I try to run the full_text function:
Error in curl::curl_fetch_memory(url, handle = handle) :
Could not resolve host: searchwww.sec.gov
Does anyone have a similar issue?
Thanks!
Because of #11, if default HTML parsing fails, it attempts to repeat using the HUGE option passed to read_html.
This option should be used all the time for consistency, but most of the existing parsing tests started failing when the option was applied universally, leading to the conditional so it is only used as needed. We need to better understand the parsing difference and do a bunch of testing to understand the impact of making the option universal.
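The conditional fallback described above might look something like this sketch (the function name is illustrative, not edgarWebR's actual internal code):

```r
library(xml2)

# Try the strict default parse first; fall back to libxml2's HUGE option
# only when it fails (e.g. "Excessive depth in document: 256").
read_html_fallback <- function(url) {
  tryCatch(
    read_html(url),
    error = function(e) {
      read_html(url, options = c("RECOVER", "NOERROR", "NOBLANKS", "HUGE"))
    }
  )
}
```

Making the option universal would simplify this, at the cost of the test regressions noted above.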
Bug report from Mohan:
Filing: https://www.sec.gov/Archives/edgar/data/104938/0000950131-94-000440.txt
Fix:
chomp empty lines in parse_text_filing:
filing_doc <- gsub("\\n +\\n", "\n\n", filing_doc)