
edgarWebR's People

Contributors

mwaldstein


edgarWebR's Issues

Parse all HTML with `option = "HUGE"`

Because of #11, if default HTML parsing fails, parsing is retried with the HUGE option passed to read_html.

This option should be used all the time for consistency, but most of the existing parsing tests started failing when it was applied universally, hence the conditional that applies it only when needed. We need to better understand the parsing differences and do more testing to gauge the impact of making the option universal.
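
A minimal sketch of the conditional retry described above (the wrapper name is made up, not the package's actual implementation; xml2's read_html accepts parser options including "HUGE"):

    library(xml2)

    # Hypothetical wrapper: try the default parser first, and only fall back
    # to the HUGE option when libxml2 rejects the document (e.g. excessive depth).
    read_filing_html <- function(url) {
      tryCatch(
        read_html(url),
        error = function(e) {
          read_html(url, options = c("RECOVER", "NOERROR", "NOBLANKS", "HUGE"))
        }
      )
    }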

Documentation: Document how 'Fast Search' works and is implemented in edgarWebR

The EDGAR website allows searching either by company name (regular search) or by CIK/ticker symbol (Fast Search):
https://www.sec.gov/edgar/searchedgar/companysearch.html

I would love to have company_search() extended with a CIK = TRUE/FALSE parameter and something like the following logic:

if (CIK) {
  if (file_number) {
    paste0("&CIK=&filenum=", x)
  } else {
    paste0("&CIK=", URLencode(x, reserved = TRUE))
  }
} else {
  if (file_number) {
    paste0("&company=&filenum=", x)
  } else {
    paste0("&company=", URLencode(x, reserved = TRUE))
  }
}

I was not able to test it, so I don't know:

  • whether the API would actually accept the CIK field
  • how CIK and filenum interact, i.e. which combinations are compatible

Again, would love to have this implemented!

Incomplete return of parse_filing

Hello
I have found many URLs for which parse_filing does not return part.name and item.name.

Below is a minimal example to reproduce the problem

library(edgarWebR)
filing_doc = "https://www.sec.gov/Archives/edgar/data/1560385/000155837019010666/lmca-20190930x10q1a7cb4.htm"
doc <- parse_filing(filing_doc, include.raw = TRUE)
doc$part.name
doc$item.name

Thank you
snvv

company_filings and filing_details now giving "xml2::url_absolute error"

Hi, I love your package. This worked on Feb 18th but doesn't now. I'm not sure whether this is an xml2 problem; it looks like some updates to that package were released subsequently.

filing_list <- company_filings(
  as.character('AAPL'),
  ownership = FALSE,
  type = '10-K',
  before = "2020207",
  count = 40,
  page = 1)

Error in xml2::url_absolute(res[[ref]], xml2::xml_url(doc)) :
Base URL must be length 1

Any thoughts appreciated.

Parse Filings fails if the section is included by reference

Example:
parse_filing('https://www.sec.gov/Archives/edgar/data/1048911/000119312515252494/d48165d10k.htm')

In this case, Item 1A. Risk Factors is incorporated by reference; it appears on pages 81-86.
The section labelled "Risk Factors" only identifies where to look for the actual text.

SEC regulations will almost surely require more filings that incorporate sections by reference (contact me for details if an explanation is needed), so the issue will become more acute in the near future.

balthasars' workaround appears to work

balthasars' workaround appears to work great! Thanks for the help. I'm not too familiar with APIs, so it took a while to figure out what was going wrong. This might be helpful to someone who isn't too familiar with API keys:

   install.packages("usethis")
   library(usethis)
   usethis::edit_r_environ()

   # A .Renviron window will open. Add the following line to .Renviron and don't forget to save:
   EDGARWEBR_USER_AGENT = "XXXX"
   # Run the rest of balthasars' code to access EDGAR

Vignette info

Hi, and thanks for the package! Looks very useful. Just a quick editorial nit -- the vignette on CRAN still says this:

How to Download

edgarWebR is not yet of CRAN, as the API hasn’t stabilized yet. In the meantime, you can get a copy from github by using devtools:

Unable to find documentation to set User Agent

When retrieving filings, I receive an error for being an Undeclared Automated Tool. For example, when using

latest_filings()

I receive this error:

No encoding supplied: defaulting to UTF-8.
Error in check_result(res) : 
  EDGAR request blocked from Undeclared Automated Tool.
Please visit https://www.sec.gov/developer for best practices.
See https://mwaldstein.github.io/edgarWebR/index.html#ethical-use--fair-access for your responsibilities
Consider also setting the environment variable 'EDGARWEBR_USER_AGENT

I found the following information in the README:

Because of abusive use of this library, the SEC is likely to block its use “as is” without setting a custom ‘User Agent’ identifier. Details for setting a custom agent are below.

However, no details were given below. Could anyone help me set the user agent?
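
For what it's worth, here is a minimal sketch based on the environment variable named in the error message above (the exact agent string format the SEC expects is an assumption):

    # Set the user agent for the current R session; edgarWebR reads the
    # EDGARWEBR_USER_AGENT environment variable mentioned in the error.
    Sys.setenv(EDGARWEBR_USER_AGENT = "Your Name your.email@example.com")

    latest_filings()  # should no longer be blocked as an undeclared automated tool

    # To make it permanent, put the same assignment (without Sys.setenv) in your
    # .Renviron file, e.g. opened via usethis::edit_r_environ():
    # EDGARWEBR_USER_AGENT = "Your Name your.email@example.com"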

Could I get the DEF 14A link using company_filings function?

I'm using edgarWebR and it works well. Thank you very much!
It's really beneficial for me!!

But I want to know how to get the original DEF 14A document URL.

I tried to get URLs using the company_filings function below, but I can only get the master index URL.

db_def <- company_filings(db_cik$Ticker[i], type = "DEF 14A", count = 1)

  accession_number act file_number filing_date accepted_date href type
1                   34   001-07463  2022-12-13    2022-12-13 https://www.sec.gov/Archives/edgar/data/52988/000119312522303804/0001193125-22-303804-index.htm DEF 14A
2                   34   001-07463  2021-12-10    2021-12-10 https://www.sec.gov/Archives/edgar/data/52988/000119312521354013/0001193125-21-354013-index.htm DEF 14A

I have searched all over, including GitHub, StackOverflow, and lots of tech blogs, but I can't find a way to do what I want.
T-T
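
One possible approach, sketched under the assumption that filing_documents() accepts the index href returned by company_filings() and returns per-document hrefs and types:

    library(edgarWebR)

    db_def <- company_filings(db_cik$Ticker[i], type = "DEF 14A", count = 1)

    # List the documents inside the filing index, then keep the DEF 14A itself.
    docs <- filing_documents(db_def$href[1])
    def14a_url <- docs$href[docs$type == "DEF 14A"][1]
    def14a_url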

company_filings function sporadically locates company filings

Hello,

I'm running into an issue with the company_filings function in the edgarWebR package. Specifically, the browse_edgar subfunction sporadically throws an error when trying to find a company's filings:

Error in browse_edgar(x, ownership = ownership, type = type, before = before, :
Could not find company: XXXXXXX

...where XXXXXX stands for a company's CIK.

Sometimes the function works and returns the desired results, but most of the time it fails with the error message above. I'm currently running the function in a loop over several CIKs, and each time a different CIK causes the function to error out.

I'm running R 3.6.2 in RStudio 1.2.1335.

Any help you can provide is greatly appreciated.

Thank you,

-Mike
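
If the intermittent failures are transient blocks on EDGAR's side rather than genuinely unknown CIKs (an assumption, not a confirmed diagnosis), a throttled retry around the call may help. A hypothetical sketch:

    library(edgarWebR)

    # Hypothetical helper: retry company_filings() a few times with a pause
    # between attempts before giving up on a CIK.
    filings_with_retry <- function(cik, tries = 3, pause = 2, ...) {
      for (attempt in seq_len(tries)) {
        result <- tryCatch(company_filings(cik, ...), error = function(e) NULL)
        if (!is.null(result)) return(result)
        Sys.sleep(pause)  # back off before the next attempt
      }
      warning("Could not retrieve filings for CIK ", cik)
      NULL
    }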

Some words are not parsed correctly due to the <BR> tag

In the parse_filing function, some words are still joined together after parsing, such as "Weightedaverageexerciseprice", which means they cannot be recognized as proper words.
In the original HTML document, the text is written as Weighted<BR>average<BR>exercise<BR>price, which indicates that parse_filing does not handle the <BR> tag properly (the tag is dropped without inserting any whitespace).

Please fix this. Many thanks!
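
One possible workaround, not part of edgarWebR itself: replace <BR> tags with spaces in the raw HTML before parsing, so adjacent words stay separated. The URL below is a placeholder, and a custom user agent may still be required, as discussed in the other issues:

    library(xml2)

    filing_url <- "https://www.sec.gov/Archives/edgar/data/.../form10k.htm"  # placeholder

    # Fetch the raw HTML, turn <br>/<BR> tags into spaces, then parse as usual.
    raw_html <- paste(readLines(filing_url, warn = FALSE), collapse = "\n")
    raw_html <- gsub("<br\\s*/?>", " ", raw_html, ignore.case = TRUE)
    doc <- read_html(raw_html)

    # xml_text(doc) now yields "Weighted average exercise price" instead of
    # "Weightedaverageexerciseprice".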

Unable to use full_text

I get this error every time I try to run the full_text function:

Error in curl::curl_fetch_memory(url, handle = handle) :
Could not resolve host: searchwww.sec.gov

Does anyone have a similar issue?

Thanks!

Make parse_filing function support html-wrapped text filings

Hi Micah,
I detected another issue. In the parse_filing function, I understand it splits the content mainly based on parent nodes; however, it cannot parse the child nodes, so the item and part cannot be recognized correctly. The ideal solution would be to parse all nodes (including child nodes) to make the parse function as permissive as possible; otherwise we could miss quite a lot of information.

Here is the example:
https://www.sec.gov/Archives/edgar/data/1424844/000092290708000774/form10k_122308.htm

thanks in advance!

Regards
Derek

Unable to reach the SEC endpoint

I have code that extracts the URL of Exhibit 21 from the URL of a 10-K filing. However, I am now experiencing problems with the parse_submission function.

I receive this error message:
Error in charToText(x) :
Unable to reach the SEC endpoint (https://www.sec.gov/Archives/edgar/data/835011/000117184312000904/0001171843-12-000904.txt)

I am using the function in a loop. The problem occurs on different links each time I rerun the code. If I run the failing call manually, it works. If I rerun the loop enough times I eventually get output, but never for all 3,370 observations in my data set.

Any help would be useful, thanks.
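
A hedged sketch of one way to cope with this in a loop (all names here are hypothetical): record which URLs fail, then re-run only the failures with a pause, until everything succeeds or a pass limit is reached.

    library(edgarWebR)

    # urls: character vector of submission URLs (e.g. the 3,370 .txt links).
    fetch_all <- function(urls, max_passes = 5, pause = 1) {
      results <- vector("list", length(urls))
      pending <- seq_along(urls)
      for (pass in seq_len(max_passes)) {
        for (i in pending) {
          results[[i]] <- tryCatch(parse_submission(urls[i]), error = function(e) NULL)
          Sys.sleep(pause)  # be polite to the SEC endpoint between requests
        }
        pending <- which(vapply(results, is.null, logical(1)))
        if (length(pending) == 0) break
      }
      results
    }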

Excessive depth in document

Hi,

When I use parse_filing on the URLs below, I get the following error:

Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html = as_html, :
Excessive depth in document: 256 use XML_PARSE_HUGE option [1]

Here are a few sample URLs:
https://www.sec.gov/Archives/edgar/data/1065648/000106564809000009/form_10k.htm
https://www.sec.gov/Archives/edgar/data/1010247/000101024709000005/form10k.htm
https://www.sec.gov/Archives/edgar/data/861459/000086145909000013/form10-q.htm

Again, thanks very much for contributing this package! It's fantastic.

Best regards

Warning on vector inputs

edgarWebR functions are not vectorized, which causes unexpected and unclear errors.

To fix this, functions should warn on multiple inputs.
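
A minimal sketch of the kind of guard this would add (the helper name is made up, not the package's actual code):

    # Hypothetical guard: warn and truncate when a caller passes a vector where
    # a single value is expected.
    check_scalar_input <- function(x, arg = deparse(substitute(x))) {
      if (length(x) > 1) {
        warning(arg, " has length ", length(x),
                "; edgarWebR functions are not vectorized, so only the first element is used.")
        x <- x[1]
      }
      x
    }

    # e.g. inside company_filings(): x <- check_scalar_input(x)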
