python-edgar's Introduction

Build a master index of SEC filings since 1993 with python-edgar

The SEC filings index is split into quarterly files going back to 1993 (1993-QTR1, 1993-QTR2, ...). With python-edgar and some scripting, you can rebuild a master index of all filings since 1993 by stitching the quarterly index files together. The master index file can then be fed to a database, a pandas DataFrame, Stata, etc.

An index file is a csv-like (pipe | separated) file that contains the following information:

  • Company name (eg. TWITTER, INC)
  • Company CIK (eg. 0001418091)
  • Filing date (eg. 2013-10-03)
  • Filing type (eg. S-1)
  • Filing URL on EDGAR (eg. edgar/data/1418091/0001193125-13-390321.txt)

Once python-edgar has finished downloading the index files, you can open one with csv.reader or pandas.read_csv to work with the data programmatically. Remember that the delimiter character is |!
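As a minimal sketch of reading an index file with the standard library, the snippet below parses pipe-delimited rows with csv.reader. The column names are an assumption inferred from the sample rows shown later in this README (CIK, company name, form type, filing date and two paths); they are supplied by hand and are not taken from the library.

```python
import csv
import io

# Assumed column order, inferred from the sample rows in this README.
COLUMNS = ["cik", "company", "form_type", "date_filed", "txt_path", "index_path"]

def read_index(lines):
    """Yield one dict per filing from an iterable of pipe-delimited lines."""
    # QUOTE_NONE keeps quote characters in company names as plain text.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(COLUMNS):
            continue  # skip blank or malformed lines
        yield dict(zip(COLUMNS, row))

# In practice, pass an open file instead, for example:
# open("2018-QTR2.tsv", encoding="utf-8", newline="")
sample = io.StringIO(
    "1000045|NICHOLAS FINANCIAL INC|10-K|2018-06-27|"
    "edgar/data/1000045/0001193125-18-205637.txt|"
    "edgar/data/1000045/0001193125-18-205637-index.html\n"
)
filings = list(read_index(sample))
print(filings[0]["company"], filings[0]["form_type"])
```

Supplying the names explicitly keeps the sketch independent of whether a given file starts with a header row.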

python-edgar can be used as a library called from another python script, or as a standalone script.

Features

  • Compliant: Follows fair access guidelines established by the SEC at https://www.sec.gov/os/accessing-edgar-data
  • Efficient: retrieves compressed archives instead of the raw index files, which are 10 times bigger
  • Import as a library in your Python project or run as a standalone script
  • 0 external dependencies; Python 3 only as of v3.0.0
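To illustrate what paced downloading can look like, here is a rough sketch of a serialized loop that spends at least a fixed time budget on each request. The 200 ms budget mirrors a code comment visible in a traceback further down this page ("naive: 200ms or 5QPS serialized"); the function and parameter names are hypothetical, and this is not presented as the library's actual implementation.

```python
import time

REQUEST_BUDGET_MS = 200  # assumed budget: at most about 5 requests per second

def run_paced(tasks, budget_ms=REQUEST_BUDGET_MS, clock=time.monotonic, sleep=time.sleep):
    """Run callables one after another, spending at least budget_ms on each."""
    results = []
    for task in tasks:
        start = clock()
        results.append(task())
        elapsed_ms = (clock() - start) * 1000
        if elapsed_ms < budget_ms:
            # Finished early: wait out the remainder of the budget.
            sleep((budget_ms - elapsed_ms) / 1000)
    return results
```

Each task would be a zero-argument callable that performs one download. The clock and sleep parameters exist only so the pacing can be exercised without real waiting.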

Usage

Using python-edgar as a library

Install from pip in a virtualenv

pip install python-edgar

Call the library

import edgar
edgar.download_index(dest, since_year, user_agent, skip_all_present_except_last=False)

Output

2018-06-23 12:41:46,451 - DEBUG - downloads will be saved to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o
2018-06-23 12:41:46,451 - DEBUG - downloading files since 2017
2018-06-23 12:41:46,451 - INFO - 6 index files to retrieve
2018-06-23 12:41:46,465 - DEBUG - worker count: 4
2018-06-23 12:41:48,359 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR3/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o/2017-QTR3.tsv
2018-06-23 12:41:48,611 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2018/QTR2/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o/2018-QTR2.tsv
2018-06-23 12:41:48,649 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR4/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o/2017-QTR4.tsv
2018-06-23 12:41:48,935 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2018/QTR1/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o/2018-QTR1.tsv
2018-06-23 12:41:49,750 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR2/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o/2017-QTR2.tsv
2018-06-23 12:41:50,237 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR1/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o/2017-QTR1.tsv
2018-06-23 12:41:50,376 - INFO - complete
2018-06-23 12:41:50,377 - INFO - Files downloaded in /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o

Using python-edgar as a standalone script

  • Download this repository as a zip ("Clone or Download" green button, > Download as zip.)
  • Open your terminal inside that directory and run python run.py -h. You can specify a destination directory for downloaded index files like -d edgar-idx (defaults to a temporary directory) and/or specify the year from which you want to build the index with -y 2017 (defaults to current year).
 $ python run.py -y 2017 -ua "MyCompany [email protected]"
2018-06-23 12:41:46,451 - DEBUG - downloads will be saved to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o
2018-06-23 12:41:46,451 - DEBUG - downloading files since 2017
2018-06-23 12:41:46,451 - INFO - 6 index files to retrieve
2018-06-23 12:41:46,465 - DEBUG - worker count: 4
2018-06-23 12:41:48,359 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR3/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o/2017-QTR3.tsv
2018-06-23 12:41:48,611 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2018/QTR2/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o/2018-QTR2.tsv
2018-06-23 12:41:48,649 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR4/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o/2017-QTR4.tsv
2018-06-23 12:41:48,935 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2018/QTR1/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o/2018-QTR1.tsv
2018-06-23 12:41:49,750 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR2/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o/2017-QTR2.tsv
2018-06-23 12:41:50,237 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR1/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o/2017-QTR1.tsv
2018-06-23 12:41:50,376 - INFO - complete
2018-06-23 12:41:50,377 - INFO - Files downloaded in /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o

Common issues

RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

See https://pytorch.org/docs/stable/notes/windows.html#multiprocessing-error-without-if-clause-protection

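You will need to wrap your code in a main() function, called under an if __name__ == '__main__': guard, to be able to run it on Windows:

def main():
    import edgar
    # Placeholder user agent: replace with your own company name and contact email.
    user_agent = "MyCompany name@example.com"
    edgar.download_index(".", 2020, user_agent, skip_all_present_except_last=False)

if __name__ == '__main__':
    main()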

I am using python 2 ...

Python 2 support has been dropped as of October 2019. See https://pythonclock.org.

Stitch quarterly files to a master file

python-edgar does only one thing and does it well: downloading the quarterly index files, uncompressed and cleaned, to your computer. In the spirit of the Unix philosophy, use command-line tools to stitch these index files together into a master index file.

In this example, we call python run.py with -y 1993, which downloads every quarterly index file since 1993.

 python run.py -y 1993
 
2018-06-23 13:00:16,855 - DEBUG - downloads will be saved to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpcF1rx7
2018-06-23 13:00:16,855 - DEBUG - downloading files since 1993
2018-06-23 13:00:16,856 - INFO - 102 index files to retrieve
2018-06-23 13:00:16,879 - DEBUG - worker count: 4
2018-06-23 13:00:18,814 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR4/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpcF1rx7/2017-QTR4.tsv
2018-06-23 13:00:19,026 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR3/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpcF1rx7/2017-QTR3.tsv
2018-06-23 13:00:19,157 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2018/QTR2/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpcF1rx7/2018-QTR2.tsv
2018-06-23 13:00:19,543 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2018/QTR1/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpcF1rx7/2018-QTR1.tsv
2018-06-23 13:00:20,521 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR2/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpcF1rx7/2017-QTR2.tsv
2018-06-23 13:00:20,719 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2016/QTR4/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpcF1rx7/2016-QTR4.tsv
2018-06-23 13:00:21,016 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2016/QTR3/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpcF1rx7/2016-QTR3.tsv
2018-06-23 13:00:21,134 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR1/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpcF1rx7/2017-QTR1.tsv
2018-06-23 13:00:22,099 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2016/QTR2/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpcF1rx7/2016-QTR2.tsv
(...)
dcw07x6zrrr0000gn/T/tmpcF1rx7/1993-QTR2.tsv
2018-06-23 13:00:54,378 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/1993/QTR1/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpcF1rx7/1993-QTR1.tsv
2018-06-23 13:00:54,423 - INFO - complete
2018-06-23 13:00:54,424 - INFO - Files downloaded in /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpcF1rx7

Inspect the directory where our files were downloaded:

$ ls -lh /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpcF1rx7
total 4964656
drwx------  104 eswiac  staff   3.3K Jun 23 13:00 .
drwxr-xr-x  342 eswiac  staff    11K Jun 23 13:01 ..
-rw-r--r--    1 eswiac  staff   585B Jun 23 13:00 1993-QTR1.tsv
-rw-r--r--    1 eswiac  staff   580B Jun 23 13:00 1993-QTR2.tsv
-rw-r--r--    1 eswiac  staff   1.0K Jun 23 13:00 1993-QTR3.tsv
-rw-r--r--    1 eswiac  staff   2.8K Jun 23 13:00 1993-QTR4.tsv
-rw-r--r--    1 eswiac  staff   2.9M Jun 23 13:00 1994-QTR1.tsv
-rw-r--r--    1 eswiac  staff   2.3M Jun 23 13:00 1994-QTR2.tsv
(...)
-rw-r--r--    1 eswiac  staff    27M Jun 23 13:00 2017-QTR3.tsv
-rw-r--r--    1 eswiac  staff    27M Jun 23 13:00 2017-QTR4.tsv
-rw-r--r--    1 eswiac  staff    41M Jun 23 13:00 2018-QTR1.tsv
-rw-r--r--    1 eswiac  staff    31M Jun 23 13:00 2018-QTR2.tsv

Head to that directory so we can merge these files into a master file using cat:

$ cd  /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpcF1rx7
$ cat *.tsv > master.tsv
$ du -h master.tsv
2.3G	master.tsv

Now you have the master index file. It's not sorted, but that's easy to do (hint: look into the sort command).
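The same merge can be sketched in Python. The function below is an illustration rather than part of python-edgar (its name and the master.tsv default are assumptions). Unlike a second run of cat *.tsv > master.tsv in the same directory, where the glob would also match master.tsv itself, it leaves any existing master file out of the inputs, so the output is never fed back into itself.

```python
import shutil
from pathlib import Path

def merge_quarterly_files(directory, output_name="master.tsv"):
    """Concatenate every quarterly .tsv in directory into one master file."""
    directory = Path(directory)
    output = directory / output_name
    # Sorted names such as 1993-QTR1.tsv ... 2018-QTR2.tsv are already chronological.
    parts = sorted(p for p in directory.glob("*.tsv") if p.name != output_name)
    with open(output, "wb") as master:
        for part in parts:
            with open(part, "rb") as src:
                # Stream in chunks so multi-gigabyte inputs never sit in memory.
                shutil.copyfileobj(src, master)
    return output
```

Copying in binary mode reproduces what cat does byte for byte and avoids any encoding questions during the merge.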

Grab filings from a specific company

Now that we have downloaded the index files, a bit of command-line scripting makes it easy to filter by company and extract the URLs of the filings we want with grep. In the following example we grep by CIK (1000045) and store the output in an intermediate text file, which we then re-open with cat and grep again for form 10-K. Prefix the paths with https://www.sec.gov/Archives/ and you'll get the full URLs.

eswiac@mbp python-edgar (master) $ grep -h 1000045 /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpvwOzOU/* > 1000045.txt
eswiac@mbp python-edgar (master) $ cat 1000045.txt | grep -h 10-K
1000045|NICHOLAS FINANCIAL INC|10-K|2015-06-15|edgar/data/1000045/0001193125-15-223218.txt|edgar/data/1000045/0001193125-15-223218-index.html
1000045|NICHOLAS FINANCIAL INC|10-K|2016-06-14|edgar/data/1000045/0001193125-16-620952.txt|edgar/data/1000045/0001193125-16-620952-index.html
1000045|NICHOLAS FINANCIAL INC|10-K|2017-06-14|edgar/data/1000045/0001193125-17-203193.txt|edgar/data/1000045/0001193125-17-203193-index.html
1000045|NICHOLAS FINANCIAL INC|10-K|2018-06-27|edgar/data/1000045/0001193125-18-205637.txt|edgar/data/1000045/0001193125-18-205637-index.html
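For comparison, here is a hedged Python sketch of the same filter. It assumes the column order visible in the output above (CIK, company, form type, date, path to the .txt filing); the function name is hypothetical. It compares the CIK and form fields exactly, whereas grep matches the pattern anywhere in the line.

```python
ARCHIVES_PREFIX = "https://www.sec.gov/Archives/"

def filing_urls(lines, cik, form_type):
    """Return full URLs of filings matching an exact CIK and form type."""
    urls = []
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 5:
            continue  # not a filing row
        row_cik, _company, row_form, _date, txt_path = fields[:5]
        if row_cik == str(cik) and row_form == form_type:
            urls.append(ARCHIVES_PREFIX + txt_path)
    return urls

rows = [
    # Row copied from the grep output above.
    "1000045|NICHOLAS FINANCIAL INC|10-K|2018-06-27|"
    "edgar/data/1000045/0001193125-18-205637.txt|"
    "edgar/data/1000045/0001193125-18-205637-index.html\n",
    # Made-up row, included only to show that other companies are filtered out.
    "9999999|EXAMPLE CO|10-K|2018-01-01|"
    "edgar/data/9999999/example.txt|edgar/data/9999999/example-index.html\n",
]
print(filing_urls(rows, 1000045, "10-K"))
```

Passing an open index file instead of the rows list works the same way, since the function only iterates over lines.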

Query the master index with q

https://github.com/harelba/q allows you to run SQL directly on tabular data.

Use with caution: q does not use indexes so running queries against the master index will be very slow since it's rather large. Sorting the master index or narrowing the data to a smaller subset will make search faster. Ultimately you want to load the master index file into a proper database that's able to handle the size.

Some queries you may want to try

  • q "SELECT COUNT(1) FROM 1999-QTR4.tsv"
  • q -d"|" "SELECT * FROM master.tsv where c1 = 1418091 and c3 = '10-Q' order by c4"
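As one possible way to move to "a proper database", the sketch below loads index rows into SQLite using Python's standard sqlite3 module and adds an index on CIK and form type. The table and column names are assumptions chosen for illustration; the column order follows the sample rows shown earlier.

```python
import sqlite3

def load_index(conn, lines):
    """Load pipe-delimited index rows into an indexed SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS filings ("
        "cik INTEGER, company TEXT, form_type TEXT, "
        "date_filed TEXT, txt_path TEXT, index_path TEXT)"
    )
    rows = (line.rstrip("\n").split("|") for line in lines)
    conn.executemany(
        "INSERT INTO filings VALUES (?, ?, ?, ?, ?, ?)",
        (row for row in rows if len(row) == 6),  # skip malformed lines
    )
    # The index is what makes lookups by company and form fast.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_cik_form ON filings (cik, form_type)")
    conn.commit()

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
sample = [
    "1000045|NICHOLAS FINANCIAL INC|10-K|2016-06-14|"
    "edgar/data/1000045/0001193125-16-620952.txt|"
    "edgar/data/1000045/0001193125-16-620952-index.html\n",
    "1000045|NICHOLAS FINANCIAL INC|10-K|2015-06-15|"
    "edgar/data/1000045/0001193125-15-223218.txt|"
    "edgar/data/1000045/0001193125-15-223218-index.html\n",
]
load_index(conn, sample)
dates = [
    row[0]
    for row in conn.execute(
        "SELECT date_filed FROM filings WHERE cik = ? AND form_type = ? ORDER BY date_filed",
        (1000045, "10-K"),
    )
]
print(dates)
```

The final query is the counterpart of the second q example above, with named columns in place of c1, c3 and c4.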

License

MIT

python-edgar's People

Contributors

arealseal, edouardswiac, epogrebnyak, gaurang105, jiacheng-liu, karthicraghupathi, svaningelgem, yenchiayi

python-edgar's Issues

does not download all quarters

I have noticed that edgar.download_index does not download the .tsv files for all quarters. Sometimes I get all 4 quarters for, say, 2015, while other times I get only 2. See the output of my log below (it should have downloaded 26 files, but it downloaded only 20).

2021-06-29 14:57:46,763 - DEBUG - downloads will be saved to ../data/edgar
2021-06-29 14:57:46,763 - DEBUG - downloading files since 2015
2021-06-29 14:57:46,763 - INFO - 26 index files to retrieve
2021-06-29 14:57:46,763 - DEBUG - worker count: 8
2021-06-29 14:57:47,441 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2019/QTR3/master.zip to ../data/edgar/2019-QTR3.tsv
2021-06-29 14:57:47,629 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2019/QTR2/master.zip to ../data/edgar/2019-QTR2.tsv
2021-06-29 14:57:47,739 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2021/QTR1/master.zip to ../data/edgar/2021-QTR1.tsv
2021-06-29 14:57:47,785 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2021/QTR2/master.zip to ../data/edgar/2021-QTR2.tsv
2021-06-29 14:57:47,815 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2018/QTR2/master.zip to ../data/edgar/2018-QTR2.tsv
2021-06-29 14:57:47,819 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2020/QTR4/master.zip to ../data/edgar/2020-QTR4.tsv
2021-06-29 14:57:47,860 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2019/QTR4/master.zip to ../data/edgar/2019-QTR4.tsv
2021-06-29 14:57:47,937 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2019/QTR1/master.zip to ../data/edgar/2019-QTR1.tsv
2021-06-29 14:57:48,203 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR4/master.zip to ../data/edgar/2017-QTR4.tsv
2021-06-29 14:57:48,251 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2018/QTR1/master.zip to ../data/edgar/2018-QTR1.tsv
2021-06-29 14:57:48,389 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR3/master.zip to ../data/edgar/2017-QTR3.tsv
2021-06-29 14:57:48,516 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2016/QTR3/master.zip to ../data/edgar/2016-QTR3.tsv
2021-06-29 14:57:48,578 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR2/master.zip to ../data/edgar/2017-QTR2.tsv
2021-06-29 14:57:48,643 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2016/QTR4/master.zip to ../data/edgar/2016-QTR4.tsv
2021-06-29 14:57:48,736 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2016/QTR2/master.zip to ../data/edgar/2016-QTR2.tsv
2021-06-29 14:57:48,763 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR1/master.zip to ../data/edgar/2017-QTR1.tsv
2021-06-29 14:57:48,914 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2015/QTR4/master.zip to ../data/edgar/2015-QTR4.tsv
2021-06-29 14:57:48,939 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2015/QTR3/master.zip to ../data/edgar/2015-QTR3.tsv
2021-06-29 14:57:49,108 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2015/QTR2/master.zip to ../data/edgar/2015-QTR2.tsv
2021-06-29 14:57:49,217 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2015/QTR1/master.zip to ../data/edgar/2015-QTR1.tsv
2021-06-29 14:57:49,230 - INFO - complete
2021-06-29 14:57:49,231 - INFO - Files downloaded in ../data/edgar
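To see exactly which quarters are missing after a run, the expected file names can be compared with what is on disk. This is a sketch, not part of python-edgar: the function names are hypothetical, and the YYYY-QTRn.tsv naming is taken from the log above. It also flags zero-byte files, since an empty file is no more useful than a missing one.

```python
from datetime import date
from pathlib import Path

def expected_files(since_year, today=None):
    """Names of the quarterly files from since_year up to today's quarter."""
    today = today or date.today()
    current_quarter = (today.month - 1) // 3 + 1
    names = []
    for year in range(since_year, today.year + 1):
        last_quarter = current_quarter if year == today.year else 4
        for quarter in range(1, last_quarter + 1):
            names.append(f"{year}-QTR{quarter}.tsv")
    return names

def missing_files(directory, since_year, today=None):
    """Expected quarterly files that are absent or empty in directory."""
    directory = Path(directory)
    return [
        name
        for name in expected_files(since_year, today)
        if not (directory / name).is_file() or (directory / name).stat().st_size == 0
    ]
```

For the run logged above, expected_files(2015, date(2021, 6, 29)) yields the 26 names the log announces, and missing_files("../data/edgar", 2015) would list the ones that never arrived.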

HTTP Error 404

Hello, when running the command:

edgar.download_index(path, year, user_agent, skip_all_present_except_last=False)

I'm repeatedly getting the following error:

urllib.error.HTTPError: HTTP Error 404: Not Found

It was working fine for several months but suddenly started breaking down yesterday.

Could you please check?

Thanks!

Spelling mistake

In the description of the repo, it says "SEC fillings" instead of "SEC filings".

fail to write to ASCII file due to encoding problem

I found that the downloaded "2011-QTR4.tsv" and "2017-QTR3.tsv" are empty. This seems to be related to issue #3, whose solution simply ignored the error instead of fixing it thoroughly. After checking the source code, I found that the problem again comes from encoding.

Since the error appears when writing to an ASCII file, I tried to fix it by explicitly writing a UTF-8 file, as described on this Stack Overflow page (https://stackoverflow.com/questions/934160/write-to-utf-8-file-in-python). And it works! Specifically, I replaced the line with open(dest+dest_name, 'w+') as idxfile: by the line with codecs.open(dest+dest_name, 'w+', 'utf-8') as idxfile:. I will submit a pull request for this later if you don't mind. Thanks.
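In Python 3 the built-in open accepts an encoding argument, so the same idea can be sketched without codecs. The helper name and the sample row below are made up for illustration.

```python
import os
import tempfile

def write_text_utf8(path, text):
    """Write text with an explicit encoding instead of the platform default."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)

path = os.path.join(tempfile.mkdtemp(), "sample.tsv")
write_text_utf8(path, "1|SOCIÉTÉ EXAMPLE|10-K\n")  # made-up row with non-ASCII text

with open(path, "rb") as handle:
    raw = handle.read()  # the bytes on disk are UTF-8 whatever the locale is
```

Naming the encoding explicitly means the result no longer depends on the machine's default, which is where an ASCII default would raise on the accented characters.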

retrieve daily updates

Is your feature request related to a problem? Please describe.
Would like to be able to get only new filings without retrieving the old ones

Describe the solution you'd like
Would like to be able to get only new filings without retrieving the old ones

TypeError: expected string or bytes-like object

I am unable to use the updated package python-edgar==3.1.3 because it raises a type error. When running

      import edgar
      edgar.download_index('../data', 1993, USER_AGENT, skip_all_present_except_last=False)

The following error appears:
1 import edgar
----> 2 edgar.download_index('../data', 1993, USER_AGENT, skip_all_present_except_last=False)

File ~/miniforge3/envs/pruebas/lib/python3.10/site-packages/edgar/main.py:133, in download_index(dest, since_year, user_agent, skip_all_present_except_last)
131 # naive: 200ms or 5QPS serialized
132 start = _get_millis()
--> 133 _download(file, dest, skip_file, user_agent)
134 elapsed = _get_millis() - start
135 if elapsed < REQUEST_BUDGET_MS:

File ~/miniforge3/envs/pruebas/lib/python3.10/site-packages/edgar/main.py:97, in _download(file, dest, skip_file, user_agent)
95 if url.endswith("zip"):
96 with tempfile.TemporaryFile(mode="w+b") as tmp:
---> 97 tmp.write(_url_get(url, user_agent))
98 with zipfile.ZipFile(tmp).open("master.idx") as z:
99 with io.open(dest + dest_name, "w+", encoding="utf-8") as idxfile:

File ~/miniforge3/envs/pruebas/lib/python3.10/site-packages/edgar/main.py:69, in _url_get(url, user_agent)
67 hdr = { 'User-Agent' : user_agent }
68 req = urllib.request.Request(url, headers=hdr)
---> 69 content =urllib.request.urlopen(req).read()
70 else:
71 # python 2
72 import urllib2

File ~/miniforge3/envs/pruebas/lib/python3.10/urllib/request.py:216, in urlopen(url, data, timeout, cafile, capath, cadefault, context)
214 else:
215 opener = _opener
--> 216 return opener.open(url, data, timeout)

File ~/miniforge3/envs/pruebas/lib/python3.10/urllib/request.py:519, in OpenerDirector.open(self, fullurl, data, timeout)
516 req = meth(req)
518 sys.audit('urllib.Request', req.full_url, req.data, req.headers, req.get_method())
--> 519 response = self._open(req, data)
521 # post-process response
522 meth_name = protocol+"_response"

File ~/miniforge3/envs/pruebas/lib/python3.10/urllib/request.py:536, in OpenerDirector._open(self, req, data)
533 return result
535 protocol = req.type
--> 536 result = self._call_chain(self.handle_open, protocol, protocol +
537 '_open', req)
538 if result:
539 return result

File ~/miniforge3/envs/pruebas/lib/python3.10/urllib/request.py:496, in OpenerDirector._call_chain(self, chain, kind, meth_name, *args)
494 for handler in handlers:
495 func = getattr(handler, meth_name)
--> 496 result = func(*args)
497 if result is not None:
498 return result

File ~/miniforge3/envs/pruebas/lib/python3.10/urllib/request.py:1391, in HTTPSHandler.https_open(self, req)
1390 def https_open(self, req):
-> 1391 return self.do_open(http.client.HTTPSConnection, req,
1392 context=self._context, check_hostname=self._check_hostname)

File ~/miniforge3/envs/pruebas/lib/python3.10/urllib/request.py:1348, in AbstractHTTPHandler.do_open(self, http_class, req, **http_conn_args)
1346 try:
1347 try:
-> 1348 h.request(req.get_method(), req.selector, req.data, headers,
1349 encode_chunked=req.has_header('Transfer-encoding'))
1350 except OSError as err: # timeout error
1351 raise URLError(err)

File ~/miniforge3/envs/pruebas/lib/python3.10/http/client.py:1282, in HTTPConnection.request(self, method, url, body, headers, encode_chunked)
1279 def request(self, method, url, body=None, headers={}, *,
1280 encode_chunked=False):
1281 """Send a complete request to the server."""
-> 1282 self._send_request(method, url, body, headers, encode_chunked)

File ~/miniforge3/envs/pruebas/lib/python3.10/http/client.py:1323, in HTTPConnection._send_request(self, method, url, body, headers, encode_chunked)
1320 encode_chunked = False
1322 for hdr, value in headers.items():
-> 1323 self.putheader(hdr, value)
1324 if isinstance(body, str):
1325 # RFC 2616 Section 3.7.1 says that text default has a
1326 # default charset of iso-8859-1.
1327 body = _encode(body, 'body')

File ~/miniforge3/envs/pruebas/lib/python3.10/http/client.py:1259, in HTTPConnection.putheader(self, header, *values)
1256 elif isinstance(one_value, int):
1257 values[i] = str(one_value).encode('ascii')
-> 1259 if _is_illegal_header_value(values[i]):
1260 raise ValueError('Invalid header value %r' % (values[i],))
1262 value = b'\r\n\t'.join(values)

TypeError: expected string or bytes-like object

Other info:

  • OS: macOS
  • Chip: M1 Pro
  • Environment: miniforge3
  • python 3.10
  • python-edgar 3.1.3
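The traceback ends in http.client's putheader, which is where header values are validated, so one plausible cause is that USER_AGENT was not a string (for example None) when it reached the User-Agent header. That is an inference from the trace, not a confirmed diagnosis. A small guard such as the hypothetical helper below would surface the problem before any request is made:

```python
def require_user_agent(value):
    """Return a stripped user agent string or raise a clear error."""
    if not isinstance(value, str) or not value.strip():
        raise ValueError(
            "user_agent must be a non-empty string, "
            "for example a company name and a contact email"
        )
    return value.strip()
```

Calling require_user_agent(USER_AGENT) just before edgar.download_index turns an obscure TypeError deep inside the standard library into an immediate, readable message.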

HTTPError: Forbidden

Hello!
I'm not an expert, so I hope I'll be clear.

Calling the edgar.download_index method returns the following error:
HTTPError: Forbidden

I can reproduce the error just by running the code shown in the "Common issues" section of the documentation (adding the user_agent):

def main():
    import edgar
    user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36 Edg/86.0.622.51'
    edgar.download_index(".", 2015, user_agent, skip_all_present_except_last=False)

if __name__ == '__main__':
    main()

Running it produces the same HTTPError: Forbidden.

I was using an older version, which started to have issues such as missing quarters.
I updated to the latest version, and this is what happens now.
I also updated requests, but the error persists.

Desktop (please complete the following information):

  • OS: Win 10
  • Python 3.7.11 (the screenshot is with 3.7.3, but with the .11 is the same)

In any case, thank you for the support and, first of all, thank you for the package.

Some files fail to download due to encoding errors under python3

I've noticed that 2011-QTR4.tsv and 2017-QTR3.tsv specifically end up with zero byte size after downloading with Python 3. I've looked through the code and traced this to an encoding error, since these files apparently contain something that doesn't fit in UTF-8. Running under Python 2 works fine because it apparently swept some errors under the carpet, which Python 3 no longer does.

There is a good description here of what changed between python versions:

I've managed to fix it by changing to latin-1, which behaves like python2. Not sure if this is the desired behavior though because errors are ignored and data may be corrupted.
lines = str(lines, "latin-1")
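To make the trade-off concrete, here is a small sketch with made-up bytes (not taken from the actual index files). Decoding as Latin-1 cannot raise, because every byte value maps to a character, so nothing is dropped; the risk is rather that bytes written in some other encoding are silently turned into the wrong characters.

```python
raw = b"CAF\xc9 HOLDINGS"  # made-up bytes: 0xC9 is "É" in Latin-1 but invalid here as UTF-8

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False  # strict UTF-8 decoding rejects the stray byte

# Never fails: each byte maps to exactly one character.
latin1_text = raw.decode("latin-1")

# Also never fails, but marks undecodable bytes with U+FFFD instead of guessing.
lossy_text = raw.decode("utf-8", errors="replace")

print(utf8_ok, latin1_text, lossy_text)
```

Which of the two fallbacks is preferable depends on whether the original bytes really are Latin-1; errors="replace" at least leaves a visible marker where the data could not be decoded.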

Asking about features this offers vs other Edgar tools

I'm new to EDGAR filings, so I'm trying to understand the difference between:
edgarminers/python-edgar: updated 3 months ago, 265 stars, 24 watchers, 62 forks

sec-edgar-downloader: updated 5 months ago, 265 stars, 11 watchers, 75 forks

pyedgar: updated 8 months ago, 27 stars, 3 watchers, 11 forks

Are they meant to work together, or are they competing alternatives?
There are other EDGAR tools as well, and I'm trying to understand how to evaluate them.

Describe the solution you'd like
I was looking for a single solution that also works with pandas, perhaps a scraper that saves to a database, but I'm still researching EDGAR and how that data affects my models. I currently work with a lot of data (about 17 TB domestically), so I would be a heavy user and would like to contribute if it's a fit.

Some of them have search functions others do not
Search by stock ticker
Search by Central Index Key (CIK)

No Search by SEC CIK SEC CIK lookup tool if you cannot find an appropriate ticker

Search for 351 filings - list here

Describe alternatives you've considered
Rolled my own based off what I found online but your script would solve some issues.

Additional context
Pandas integration, database advice

Oops! We can't find this file

Please open https://www.sec.gov/Archives/ in your browser. It shows:

Oops! We can't find this file!

It seems that the SEC changed the web directory on sec.gov.
How can I use python-edgar now?

Cannot merge tsv files

I downloaded the files but when I tried to merge them into one master file following your instruction, the code kept running and the master file kept getting bigger. I'm not sure what I did wrong 'cause I tried to replicate what you did in the example. Could you please help me with this?

Runtime error on Windows

Describe the bug

Cannot start python-edgar due to Runtime error, possibly in threading.

To Reproduce

Run

import edgar
edgar.download_index(".", 2020, skip_all_present_except_last=False)

Expected behavior

Downloads files into current directory.

Screenshots

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\Anaconda3\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "D:\Anaconda3\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "D:\Anaconda3\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "D:\Anaconda3\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "D:\Anaconda3\lib\runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "D:\Anaconda3\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "D:\Anaconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\\u0415\u0432\u0433\u0435\u043d\u0438\u0439\Desktop\edgar\edgar1.py", line 2, in <module>
    edgar.download_index(".", 2020, skip_all_present_except_last=False)
  File "D:\Anaconda3\lib\site-packages\edgar\main.py", line 132, in download_index
    pool = multiprocessing.Pool(worker_count)
  File "D:\Anaconda3\lib\multiprocessing\context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "D:\Anaconda3\lib\multiprocessing\pool.py", line 212, in __init__
    self._repopulate_pool()
  File "D:\Anaconda3\lib\multiprocessing\pool.py", line 303, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "D:\Anaconda3\lib\multiprocessing\pool.py", line 326, in _repopulate_pool_static
    w.start()
  File "D:\Anaconda3\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "D:\Anaconda3\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "D:\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "D:\Anaconda3\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "D:\Anaconda3\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

Desktop (please complete the following information):

  • Windows 10

Documentation update

When calling the python-edgar module as documented, we get a destination error.

Before:
edgar.download_index(download_directory, since_year, user_agent, skip_all_present_except_last=False)

After:
edgar.download_index(dest, since_year, user_agent, skip_all_present_except_last=False)

It should be like this.

RuntimeError for windows

A RuntimeError occurs when running edgar.download_index(dir, year).

This seems to be caused by multiprocessing on Windows. See the explanation here: https://pytorch.org/docs/stable/notes/windows.html#multiprocessing-error-without-if-clause-protection

RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

HTTP Error 403: Forbidden

Hello,

My user agent is:

user_agent = "FTC [email protected]"

and the download command is

edgar.download_index(path, 2021, user_agent, skip_all_present_except_last=False)

This is in accordance with the SEC's new fair usage standards.

Yet, we've started to receive the following error:

urllib.error.HTTPError: HTTP Error 403: Forbidden

Please help.

Thanks!
