edgarminers / python-edgar

Download the SEC filings index from EDGAR since 1993

License: MIT License

python edgar 10k cik 8k quarterly-files sec

python-edgar's Introduction

Build a master index of SEC filings since 1993 with `python-edgar`

The SEC filings index has been split into quarterly files since 1993 (1993-QTR1, 1993-QTR2...). With python-edgar and some scripting, you can rebuild a master index of all filings since 1993 by stitching the quarterly index files together. The master index file can then be fed to a database, a pandas dataframe, Stata, etc.

An index file is a csv-like (pipe | separated) file that contains the following information:

Company name (eg. TWITTER, INC)
Company CIK (eg. 0001418091)
Filing date (eg. 2013-10-03)
Filing type (eg. S-1)
Filling URL on EDGAR (edgar/data/1418091/0001193125-13-390321.txt)

Once python-edgar has finished downloading index files, you can open an index file with csv.reader or pandas.read_csv to work with the data programmatically. Remember that the delimiter character is |!
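As a minimal sketch of reading an index file with the standard library: the column names below are an assumption inferred from the sample rows shown later in this README (CIK first, then company name, form type, filing date and two paths), and read_index is a hypothetical helper, not part of python-edgar.

```python
import csv
import io

# Assumed column order, inferred from the sample rows in this README.
COLUMNS = ["cik", "company", "form_type", "date_filed", "txt_path", "index_path"]


def read_index(lines):
    """Yield one dict per filing from an iterable of pipe-delimited lines."""
    # QUOTE_NONE treats quote characters in company names as plain text.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(COLUMNS):
            continue  # skip blank or malformed lines
        yield dict(zip(COLUMNS, row))


# One row copied from the grep example in this README.
sample = io.StringIO(
    "1000045|NICHOLAS FINANCIAL INC|10-K|2015-06-15|"
    "edgar/data/1000045/0001193125-15-223218.txt|"
    "edgar/data/1000045/0001193125-15-223218-index.html\n"
)
filings = list(read_index(sample))
print(filings[0]["company"], filings[0]["form_type"])
```

For a real file, pass an open file object instead of the StringIO, for example open(path, newline="", encoding="utf-8"). With pandas, pandas.read_csv(path, sep="|", header=None, names=COLUMNS, dtype=str) should be roughly equivalent, assuming the file has no header row.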

python-edgar can be used as a library called from another python script, or as a standalone script.

Features

Compliant: Follows fair access guidelines established by the SEC at https://www.sec.gov/os/accessing-edgar-data
Efficient: retrieves compressed archives instead of the raw index files, which are 10 times bigger
Import as a library in your python project or run as a standalone script
Python 3 only (as of v3.0.0), with 0 external dependencies
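To illustrate what pacing requests for fair access can look like: a traceback further down this page shows the library timing each download against a REQUEST_BUDGET_MS constant, with the comment "naive: 200ms or 5QPS serialized". The sketch below follows that idea but is not the library's code; run_throttled, its parameters and the 200 ms default are assumptions made for illustration.

```python
import time

REQUEST_BUDGET_MS = 200  # assumed budget: roughly 5 requests per second


def run_throttled(tasks, budget_ms=REQUEST_BUDGET_MS,
                  clock=time.monotonic, sleep=time.sleep):
    """Run callables one after another, at most one per budget_ms."""
    results = []
    for task in tasks:
        start = clock()
        results.append(task())
        elapsed_ms = (clock() - start) * 1000
        if elapsed_ms < budget_ms:
            # Sleep off whatever is left of this request's time budget.
            sleep((budget_ms - elapsed_ms) / 1000)
    return results
```

Each task would be a zero-argument callable that performs one download. The clock and sleep parameters exist only so the pacing can be checked without waiting.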

Usage

Using python-edgar as a library

Install from pip in a virtualenv

pip install python-edgar

Call the library

import edgar
edgar.download_index(dest, since_year, user_agent, skip_all_present_except_last=False)

Output

2018-06-23 12:41:46,451 - DEBUG - downloads will be saved to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o
2018-06-23 12:41:46,451 - DEBUG - downloading files since 2017
2018-06-23 12:41:46,451 - INFO - 6 index files to retrieve
2018-06-23 12:41:46,465 - DEBUG - worker count: 4
2018-06-23 12:41:48,359 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR3/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o/2017-QTR3.tsv
2018-06-23 12:41:48,611 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2018/QTR2/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o/2018-QTR2.tsv
2018-06-23 12:41:48,649 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR4/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o/2017-QTR4.tsv
2018-06-23 12:41:48,935 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2018/QTR1/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o/2018-QTR1.tsv
2018-06-23 12:41:49,750 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR2/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o/2017-QTR2.tsv
2018-06-23 12:41:50,237 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR1/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o/2017-QTR1.tsv
2018-06-23 12:41:50,376 - INFO - complete
2018-06-23 12:41:50,377 - INFO - Files downloaded in /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o

Using python-edgar as a standalone script

Download this repository as a zip ("Clone or Download" green button, > Download as zip.)
Open your terminal inside that directory and run python run.py -h. You can specify a destination directory for downloaded index files like -d edgar-idx (defaults to a temporary directory) and/or specify the year from which you want to build the index with -y 2017 (defaults to current year).

 $ python run.py -y 2017 -ua "MyCompany [email protected]"
2018-06-23 12:41:46,451 - DEBUG - downloads will be saved to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o
2018-06-23 12:41:46,451 - DEBUG - downloading files since 2017
2018-06-23 12:41:46,451 - INFO - 6 index files to retrieve
2018-06-23 12:41:46,465 - DEBUG - worker count: 4
2018-06-23 12:41:48,359 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR3/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o/2017-QTR3.tsv
2018-06-23 12:41:48,611 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2018/QTR2/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o/2018-QTR2.tsv
2018-06-23 12:41:48,649 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR4/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o/2017-QTR4.tsv
2018-06-23 12:41:48,935 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2018/QTR1/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o/2018-QTR1.tsv
2018-06-23 12:41:49,750 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR2/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o/2017-QTR2.tsv
2018-06-23 12:41:50,237 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR1/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o/2017-QTR1.tsv
2018-06-23 12:41:50,376 - INFO - complete
2018-06-23 12:41:50,377 - INFO - Files downloaded in /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpr2Nk3o

Common issues

RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

See https://pytorch.org/docs/stable/notes/windows.html#multiprocessing-error-without-if-clause-protection

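You will need to wrap your code in a main() function, called under an if __name__ == '__main__' guard, to be able to run it on Windows:

import edgar


def main():
    # user_agent is a required argument; replace this placeholder
    # with your own company name and contact email.
    user_agent = "MyCompany name@example.com"
    edgar.download_index(".", 2020, user_agent, skip_all_present_except_last=False)


if __name__ == '__main__':
    main()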

I am using python 2 ...

Python 2 support has been dropped as of October 2019. See https://pythonclock.org.

Stitch quarterly files to a master file

python-edgar does only one thing and does it well: downloading quarterly index files to your computer, uncompressed and cleaned. Use command line tools, in the spirit of the Unix philosophy, to stitch these index files together and create your master index file.

In this example, we call python run.py -y 1993, which downloads every quarterly index file since 1993.

 python run.py -y 1993
 
2018-06-23 13:00:16,855 - DEBUG - downloads will be saved to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpcF1rx7
2018-06-23 13:00:16,855 - DEBUG - downloading files since 1993
2018-06-23 13:00:16,856 - INFO - 102 index files to retrieve
2018-06-23 13:00:16,879 - DEBUG - worker count: 4
2018-06-23 13:00:18,814 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR4/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpcF1rx7/2017-QTR4.tsv
2018-06-23 13:00:19,026 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR3/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpcF1rx7/2017-QTR3.tsv
2018-06-23 13:00:19,157 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2018/QTR2/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpcF1rx7/2018-QTR2.tsv
2018-06-23 13:00:19,543 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2018/QTR1/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpcF1rx7/2018-QTR1.tsv
2018-06-23 13:00:20,521 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR2/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpcF1rx7/2017-QTR2.tsv
2018-06-23 13:00:20,719 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2016/QTR4/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpcF1rx7/2016-QTR4.tsv
2018-06-23 13:00:21,016 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2016/QTR3/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpcF1rx7/2016-QTR3.tsv
2018-06-23 13:00:21,134 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR1/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpcF1rx7/2017-QTR1.tsv
2018-06-23 13:00:22,099 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2016/QTR2/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpcF1rx7/2016-QTR2.tsv
(...)
dcw07x6zrrr0000gn/T/tmpcF1rx7/1993-QTR2.tsv
2018-06-23 13:00:54,378 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/1993/QTR1/master.zip to /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpcF1rx7/1993-QTR1.tsv
2018-06-23 13:00:54,423 - INFO - complete
2018-06-23 13:00:54,424 - INFO - Files downloaded in /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpcF1rx7

Inspect the directory where our files were downloaded:

$ ls -lh /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpcF1rx7
total 4964656
drwx------  104 eswiac  staff   3.3K Jun 23 13:00 .
drwxr-xr-x  342 eswiac  staff    11K Jun 23 13:01 ..
-rw-r--r--    1 eswiac  staff   585B Jun 23 13:00 1993-QTR1.tsv
-rw-r--r--    1 eswiac  staff   580B Jun 23 13:00 1993-QTR2.tsv
-rw-r--r--    1 eswiac  staff   1.0K Jun 23 13:00 1993-QTR3.tsv
-rw-r--r--    1 eswiac  staff   2.8K Jun 23 13:00 1993-QTR4.tsv
-rw-r--r--    1 eswiac  staff   2.9M Jun 23 13:00 1994-QTR1.tsv
-rw-r--r--    1 eswiac  staff   2.3M Jun 23 13:00 1994-QTR2.tsv
(...)
-rw-r--r--    1 eswiac  staff    27M Jun 23 13:00 2017-QTR3.tsv
-rw-r--r--    1 eswiac  staff    27M Jun 23 13:00 2017-QTR4.tsv
-rw-r--r--    1 eswiac  staff    41M Jun 23 13:00 2018-QTR1.tsv
-rw-r--r--    1 eswiac  staff    31M Jun 23 13:00 2018-QTR2.tsv

Head to that directory so we can merge these files into a master file using cat

$ cd  /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpcF1rx7
$ cat *.tsv > master.tsv
$ du -h master.tsv
2.3G	master.tsv

Now you have this master index file. It's not sorted but that's easy to do (hint: Look into the sort command)
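If you would rather stitch the files from Python, here is a minimal sketch. It addresses two problems reported in the issues on this page: a downloaded file whose last line lacks a trailing newline, and master.tsv being picked up by *.tsv when the merge is run a second time. stitch is a hypothetical helper, not part of python-edgar.

```python
import glob
import os


def stitch(src_dir, out_name="master.tsv"):
    """Concatenate every .tsv file in src_dir into out_name, in name order."""
    out_path = os.path.join(src_dir, out_name)
    sources = sorted(
        p for p in glob.glob(os.path.join(src_dir, "*.tsv"))
        # Never read our own output, even if it exists from an earlier run.
        if os.path.abspath(p) != os.path.abspath(out_path)
    )
    with open(out_path, "wb") as out:
        for path in sources:
            with open(path, "rb") as src:
                data = src.read()
            out.write(data)
            if data and not data.endswith(b"\n"):
                # Keep the last row of this file off the first row of the next.
                out.write(b"\n")
    return out_path
```

Because the files are named YYYY-QTRn.tsv, sorting by name also puts the quarters in chronological order. Rows within each quarterly file keep their original order.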

Grab filings from a specific company

Now that we have downloaded the index files, it becomes easy, with a bit of command line scripting, to filter by company and extract the URLs of the filings we want with grep. In the following example we grep by CIK (1000045), store the output in an intermediate text file, then cat that file and grep again for form 10-K. Prefix the paths with https://www.sec.gov/Archives/ and you'll get the full URL.

eswiac@mbp python-edgar (master) $ grep -h 1000045 /var/folders/bv/2zbdkyyj14766dcw07x6zrrr0000gn/T/tmpvwOzOU/* > 1000045.txt
eswiac@mbp python-edgar (master) $ cat 1000045.txt | grep -h 10-K
1000045|NICHOLAS FINANCIAL INC|10-K|2015-06-15|edgar/data/1000045/0001193125-15-223218.txt|edgar/data/1000045/0001193125-15-223218-index.html
1000045|NICHOLAS FINANCIAL INC|10-K|2016-06-14|edgar/data/1000045/0001193125-16-620952.txt|edgar/data/1000045/0001193125-16-620952-index.html
1000045|NICHOLAS FINANCIAL INC|10-K|2017-06-14|edgar/data/1000045/0001193125-17-203193.txt|edgar/data/1000045/0001193125-17-203193-index.html
1000045|NICHOLAS FINANCIAL INC|10-K|2018-06-27|edgar/data/1000045/0001193125-18-205637.txt|edgar/data/1000045/0001193125-18-205637-index.html
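The same filter can be sketched in Python. Unlike a plain grep, it compares whole fields, so a CIK that happens to appear inside a path, or a form type such as 10-K/A, does not match by accident. The column order is an assumption taken from the rows above, and filing_urls is a hypothetical helper.

```python
ARCHIVES = "https://www.sec.gov/Archives/"


def filing_urls(lines, cik, form_type):
    """Return full URLs of filings whose CIK and form type match exactly."""
    urls = []
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        # Assumed column order: CIK, company, form type, date, .txt path, ...
        if fields[0] == str(cik) and fields[2] == form_type:
            urls.append(ARCHIVES + fields[4])
    return urls


# One row copied from the output above.
rows = [
    "1000045|NICHOLAS FINANCIAL INC|10-K|2015-06-15|"
    "edgar/data/1000045/0001193125-15-223218.txt|"
    "edgar/data/1000045/0001193125-15-223218-index.html\n",
]
print(filing_urls(rows, 1000045, "10-K"))
```

To run it over the master index, pass an open file object, for example open("master.tsv", encoding="utf-8"), in place of rows.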

Query the master index with `q`

https://github.com/harelba/q allows you to run SQL directly on tabular data.

Use with caution: q does not use indexes so running queries against the master index will be very slow since it's rather large. Sorting the master index or narrowing the data to a smaller subset will make search faster. Ultimately you want to load the master index file into a proper database that's able to handle the size.

Some queries you may want to try

q "SELECT COUNT(1) FROM 1999-QTR4.tsv"
q -d"|" "SELECT * FROM master.tsv where c1 = 1418091 and c3 = '10-Q' order by c4"
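For the "proper database" route, here is a minimal sketch using the sqlite3 module from the standard library. The table layout, column names and index are assumptions based on the six-field rows shown earlier; load_index is a hypothetical helper.

```python
import sqlite3


def load_index(conn, lines):
    """Load pipe-delimited index rows into a 'filings' table and index it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS filings ("
        "cik INTEGER, company TEXT, form_type TEXT, "
        "date_filed TEXT, txt_path TEXT, index_path TEXT)"
    )
    rows = (line.rstrip("\n").split("|") for line in lines)
    conn.executemany(
        "INSERT INTO filings VALUES (?, ?, ?, ?, ?, ?)",
        (r for r in rows if len(r) == 6),  # skip blank or malformed lines
    )
    # The index is what makes lookups by company and form type fast.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_cik_form ON filings (cik, form_type)"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # use a file path to keep the database
load_index(conn, [
    "1000045|NICHOLAS FINANCIAL INC|10-K|2015-06-15|"
    "edgar/data/1000045/0001193125-15-223218.txt|"
    "edgar/data/1000045/0001193125-15-223218-index.html\n",
])
result = conn.execute(
    "SELECT date_filed FROM filings WHERE cik = ? AND form_type = ? "
    "ORDER BY date_filed",
    (1000045, "10-K"),
).fetchall()
print(result)
```

To load the whole master index, pass open("master.tsv", encoding="utf-8") as the lines argument.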

License

MIT

python-edgar's Issues

does not download all quarters

I have noticed that edgar.download_index does not download the .tsv files for all quarters. Sometimes I get all 4 quarters for, say, 2015, while other times I get only 2. See my log output below (it should have downloaded 26 files, but it downloaded only 20).

2021-06-29 14:57:46,763 - DEBUG - downloads will be saved to ../data/edgar
2021-06-29 14:57:46,763 - DEBUG - downloading files since 2015
2021-06-29 14:57:46,763 - INFO - 26 index files to retrieve
2021-06-29 14:57:46,763 - DEBUG - worker count: 8
2021-06-29 14:57:47,441 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2019/QTR3/master.zip to ../data/edgar/2019-QTR3.tsv
2021-06-29 14:57:47,629 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2019/QTR2/master.zip to ../data/edgar/2019-QTR2.tsv
2021-06-29 14:57:47,739 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2021/QTR1/master.zip to ../data/edgar/2021-QTR1.tsv
2021-06-29 14:57:47,785 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2021/QTR2/master.zip to ../data/edgar/2021-QTR2.tsv
2021-06-29 14:57:47,815 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2018/QTR2/master.zip to ../data/edgar/2018-QTR2.tsv
2021-06-29 14:57:47,819 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2020/QTR4/master.zip to ../data/edgar/2020-QTR4.tsv
2021-06-29 14:57:47,860 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2019/QTR4/master.zip to ../data/edgar/2019-QTR4.tsv
2021-06-29 14:57:47,937 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2019/QTR1/master.zip to ../data/edgar/2019-QTR1.tsv
2021-06-29 14:57:48,203 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR4/master.zip to ../data/edgar/2017-QTR4.tsv
2021-06-29 14:57:48,251 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2018/QTR1/master.zip to ../data/edgar/2018-QTR1.tsv
2021-06-29 14:57:48,389 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR3/master.zip to ../data/edgar/2017-QTR3.tsv
2021-06-29 14:57:48,516 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2016/QTR3/master.zip to ../data/edgar/2016-QTR3.tsv
2021-06-29 14:57:48,578 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR2/master.zip to ../data/edgar/2017-QTR2.tsv
2021-06-29 14:57:48,643 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2016/QTR4/master.zip to ../data/edgar/2016-QTR4.tsv
2021-06-29 14:57:48,736 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2016/QTR2/master.zip to ../data/edgar/2016-QTR2.tsv
2021-06-29 14:57:48,763 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2017/QTR1/master.zip to ../data/edgar/2017-QTR1.tsv
2021-06-29 14:57:48,914 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2015/QTR4/master.zip to ../data/edgar/2015-QTR4.tsv
2021-06-29 14:57:48,939 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2015/QTR3/master.zip to ../data/edgar/2015-QTR3.tsv
2021-06-29 14:57:49,108 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2015/QTR2/master.zip to ../data/edgar/2015-QTR2.tsv
2021-06-29 14:57:49,217 - INFO - > downloaded https://www.sec.gov/Archives/edgar/full-index/2015/QTR1/master.zip to ../data/edgar/2015-QTR1.tsv
2021-06-29 14:57:49,230 - INFO - complete
2021-06-29 14:57:49,231 - INFO - Files downloaded in ../data/edgar
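One way to detect this after a run is to compare the destination directory with the list of quarters that should exist. The sketch below assumes the YYYY-QTRn.tsv naming seen in the logs and also treats zero-byte files as missing; both helpers are hypothetical and not part of python-edgar.

```python
import datetime
import os


def expected_index_files(since_year, today):
    """List the quarterly file names from since_year up to today's quarter."""
    names = []
    for year in range(since_year, today.year + 1):
        # The current year only has quarters up to the one containing today.
        last_quarter = (today.month - 1) // 3 + 1 if year == today.year else 4
        for quarter in range(1, last_quarter + 1):
            names.append(f"{year}-QTR{quarter}.tsv")
    return names


def missing_index_files(directory, since_year, today=None):
    """Return expected file names that are absent or empty in directory."""
    today = today or datetime.date.today()
    missing = []
    for name in expected_index_files(since_year, today):
        path = os.path.join(directory, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing
```

For the run logged above (since 2015, on 2021-06-29), expected_index_files yields 26 names, which matches the "26 index files to retrieve" line.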

Error Message in Windows with Python 2.7

Getting Error Message

[Error 2] No such file or directory: "tmp/edgar.daily-index.master.20140417.idx"

downloaded files are empty

How to download all 10-k filings for 2019?

failing to install via pip

If I try to install the library via pip, setup.py is not able to find README.md.

HTTP Error 404

Hello, when running the command:

edgar.download_index(path, year, user_agent, skip_all_present_except_last=False)

I'm repeatedly getting the following error:

urllib.error.HTTPError: HTTP Error 404: Not Found

It was working fine for several months but suddenly started breaking down yesterday.

Could you please check?

Thanks!

Spelling mistake

In the description of the repo, it says "SEC fillings" instead of "SEC filings".

fail to write to ASCII file due to encoding problem

I found that the downloaded "2011-QTR4.tsv" and "2017-QTR3.tsv" are empty. This seems to be related to issue #3, where the solution simply ignored the error instead of fixing it thoroughly. After checking the source code, I found that the problem again comes from encoding, as shown in the picture below.

Since the error appears when writing to an ASCII file, I tried to fix it by explicitly writing to a UTF-8 file, as shown in this stackoverflow page (https://stackoverflow.com/questions/934160/write-to-utf-8-file-in-python). And it works! Specifically, I replaced `with open(dest+dest_name, 'w+') as idxfile:` with `with codecs.open(dest+dest_name, 'w+', 'utf-8') as idxfile:`. I will submit a pull request for this later if you don't mind. Thanks.
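A small sketch of the failure mode described here, using the built-in open with an explicit encoding argument (the Python 3 counterpart of codecs.open). The sample text is illustrative and not taken from an actual index file.

```python
import os
import tempfile

# Illustrative row containing non-ASCII characters.
text = "SOCIÉTÉ EXAMPLE|10-K\n"
path = os.path.join(tempfile.mkdtemp(), "sample.tsv")

# Writing through an ASCII codec fails as soon as a non-ASCII character appears.
try:
    with open(path, "w", encoding="ascii") as f:
        f.write(text)
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Naming the encoding explicitly makes the write independent of the locale.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)
with open(path, encoding="utf-8") as f:
    round_trip = f.read()

print(ascii_ok, round_trip == text)
```

Without an encoding argument, open falls back to a locale-dependent default, which is why the same script can succeed on one machine and leave a truncated or empty file on another.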

retrieve daily updates

Is your feature request related to a problem? Please describe.
Would like to be able to get only new filings without retrieving the old ones

Describe the solution you'd like
Would like to be able to get only new filings without retrieving the old ones

TypeError: expected string or bytes-like object

I am unable to use the updated package python-edgar==3.1.3, since it raises a type error. When running

import edgar
edgar.download_index('../data', 1993, USER_AGENT, skip_all_present_except_last=False)

The following error appears:
1 import edgar
----> 2 edgar.download_index('../data', 1993, USER_AGENT, skip_all_present_except_last=False)

File ~/miniforge3/envs/pruebas/lib/python3.10/site-packages/edgar/main.py:133, in download_index(dest, since_year, user_agent, skip_all_present_except_last)
131 # naive: 200ms or 5QPS serialized
132 start = _get_millis()
--> 133 _download(file, dest, skip_file, user_agent)
134 elapsed = _get_millis() - start
135 if elapsed < REQUEST_BUDGET_MS:

File ~/miniforge3/envs/pruebas/lib/python3.10/site-packages/edgar/main.py:97, in _download(file, dest, skip_file, user_agent)
95 if url.endswith("zip"):
96 with tempfile.TemporaryFile(mode="w+b") as tmp:
---> 97 tmp.write(_url_get(url, user_agent))
98 with zipfile.ZipFile(tmp).open("master.idx") as z:
99 with io.open(dest + dest_name, "w+", encoding="utf-8") as idxfile:

File ~/miniforge3/envs/pruebas/lib/python3.10/site-packages/edgar/main.py:69, in _url_get(url, user_agent)
67 hdr = { 'User-Agent' : user_agent }
68 req = urllib.request.Request(url, headers=hdr)
---> 69 content =urllib.request.urlopen(req).read()
70 else:
71 # python 2
72 import urllib2

File ~/miniforge3/envs/pruebas/lib/python3.10/urllib/request.py:216, in urlopen(url, data, timeout, cafile, capath, cadefault, context)
214 else:
215 opener = _opener
--> 216 return opener.open(url, data, timeout)

File ~/miniforge3/envs/pruebas/lib/python3.10/urllib/request.py:519, in OpenerDirector.open(self, fullurl, data, timeout)
516 req = meth(req)
518 sys.audit('urllib.Request', req.full_url, req.data, req.headers, req.get_method())
--> 519 response = self._open(req, data)
521 # post-process response
522 meth_name = protocol+"_response"

File ~/miniforge3/envs/pruebas/lib/python3.10/urllib/request.py:536, in OpenerDirector._open(self, req, data)
533 return result
535 protocol = req.type
--> 536 result = self._call_chain(self.handle_open, protocol, protocol +
537 '_open', req)
538 if result:
539 return result

File ~/miniforge3/envs/pruebas/lib/python3.10/urllib/request.py:496, in OpenerDirector._call_chain(self, chain, kind, meth_name, *args)
494 for handler in handlers:
495 func = getattr(handler, meth_name)
--> 496 result = func(*args)
497 if result is not None:
498 return result

File ~/miniforge3/envs/pruebas/lib/python3.10/urllib/request.py:1391, in HTTPSHandler.https_open(self, req)
1390 def https_open(self, req):
-> 1391 return self.do_open(http.client.HTTPSConnection, req,
1392 context=self._context, check_hostname=self._check_hostname)

File ~/miniforge3/envs/pruebas/lib/python3.10/urllib/request.py:1348, in AbstractHTTPHandler.do_open(self, http_class, req, **http_conn_args)
1346 try:
1347 try:
-> 1348 h.request(req.get_method(), req.selector, req.data, headers,
1349 encode_chunked=req.has_header('Transfer-encoding'))
1350 except OSError as err: # timeout error
1351 raise URLError(err)

File ~/miniforge3/envs/pruebas/lib/python3.10/http/client.py:1282, in HTTPConnection.request(self, method, url, body, headers, encode_chunked)
1279 def request(self, method, url, body=None, headers={}, *,
1280 encode_chunked=False):
1281 """Send a complete request to the server."""
-> 1282 self._send_request(method, url, body, headers, encode_chunked)

File ~/miniforge3/envs/pruebas/lib/python3.10/http/client.py:1323, in HTTPConnection._send_request(self, method, url, body, headers, encode_chunked)
1320 encode_chunked = False
1322 for hdr, value in headers.items():
-> 1323 self.putheader(hdr, value)
1324 if isinstance(body, str):
1325 # RFC 2616 Section 3.7.1 says that text default has a
1326 # default charset of iso-8859-1.
1327 body = _encode(body, 'body')

File ~/miniforge3/envs/pruebas/lib/python3.10/http/client.py:1259, in HTTPConnection.putheader(self, header, *values)
1256 elif isinstance(one_value, int):
1257 values[i] = str(one_value).encode('ascii')
-> 1259 if _is_illegal_header_value(values[i]):
1260 raise ValueError('Invalid header value %r' % (values[i],))
1262 value = b'\r\n\t'.join(values)

TypeError: expected string or bytes-like object

Other info:

OS: macOS
Chip: M1 Pro
Environment: miniforge3
python 3.10
python-edgar 3.1.3
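The traceback ends in http.client's putheader, which runs a pattern check over each header value. A value that is neither a string, bytes nor an int (None, for example) fails there with exactly this TypeError. One plausible cause, which is an assumption since the report does not show how USER_AGENT was defined, is that it was not a string. A hypothetical guard to run before calling the library:

```python
def check_user_agent(user_agent):
    """Return user_agent unchanged, or raise if it cannot be sent as a header."""
    if not isinstance(user_agent, str):
        raise TypeError(
            f"user_agent must be a str, got {type(user_agent).__name__}"
        )
    if not user_agent.strip():
        raise ValueError("user_agent must not be empty")
    return user_agent
```

Calling check_user_agent(USER_AGENT) first turns the deep urllib traceback into a one-line error that points at the actual argument.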

FTP closed on SEC EDGAR as of 2016-12-30

"On December 30, 2016, FTP services for retrieving EDGAR filing documents were permanently retired."

https://www.sec.gov/edgar/searchedgar/accessing-edgar-data.htm

This is unfortunate because this was a useful module.

HTTPError: Forbidden

Hello!
I'm not an expert, so I hope I'll be clear.

With "edgar.download_index" method, the following error returns:
HTTPError: Forbidden

I can reproduce the error even just by running the code shown in the "Common issues" paragraph of the documentation (adding the user_agent)
def main():
    import edgar
    user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36 Edg/86.0.622.51'
    edgar.download_index(".", 2015, user_agent, skip_all_present_except_last=False)

if __name__ == '__main__':
    main()

I obtain this

I was using an older version, which started to give some issues, such as missing quarters.
I updated to the latest version, and this is what's happening.
I also updated requests, but I still get the error.

Desktop (please complete the following information):

OS: Win 10
Python 3.7.11 (the screenshot is with 3.7.3, but with the .11 is the same)

In any way:
thank you for the support, but first, thank you for the package.

Some files fail to download due to encoding errors under python3

I've noticed that specifically 2011-QTR4.tsv and 2017-QTR3.tsv end up having zero byte size after downloading using python3. I've looked through the code and traced this down to encoding error since apparently these files contain something that doesn't fit in utf-8. Running in python2 works fine because apparently it swept some errors under the carpet, which python3 doesn't anymore.

There is a good description here of what changed between python versions:

I've managed to fix it by changing to latin-1, which behaves like python2. Not sure if this is the desired behavior though because errors are ignored and data may be corrupted.
lines = str(lines, "latin-1")
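To make the trade-off concrete, here is a small sketch of how the same bytes behave under the different decoders. The byte string is illustrative, not taken from an actual index file.

```python
# 0xC9 is 'É' in latin-1 but is not valid UTF-8 when followed by an ASCII byte.
raw = b"SOCI\xc9T\xc9 EXAMPLE|10-K\n"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# latin-1 never fails: every byte value maps to exactly one code point.
as_latin1 = raw.decode("latin-1")

# errors="replace" also never fails, but marks each bad byte with U+FFFD.
as_replaced = raw.decode("utf-8", errors="replace")

print(utf8_ok, as_latin1, as_replaced)
```

Decoding as latin-1 cannot raise, so it hides the problem rather than detecting it: if the bytes were really written in another encoding, the text comes out wrong without any error. Using errors="replace" at least leaves a visible marker where a byte could not be decoded.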

last line of downloaded files doesn't end with a new line

As a result, cat *.tsv > master.tsv does not treat the last line as a proper line and joins it with the first line of the next file.

Asking about features this offers vs other Edgar tools

I'm new to Edgar filings so I'm trying to understand the difference between
edgarminers/python-edgar updated 3 months ago, 265 stars, 24 watchers, 62 forks

sec-edgar-downloader updated 5 months ago, 265 stars, 11 Watchers, 75 Forks

pyedgar updated 8 months ago 27 stars, 3 Watchers, 11 Forks

Are they meant to work together, or are they competing tools that do similar things?
There are other EDGAR tools and I'm trying to understand how to evaluate them

Describe the solution you'd like
I was looking for a single solution that also works with pandas, perhaps a scraper, that saves to a database, but I'm still researching EDGAR and how that data affects my models. I work with a ton of data right now, about 17 TB domestically, so I would be a heavy user and would like to contribute if it's a fit.

Some of them have search functions others do not
Search by stock ticker
Search by Central Index Key (CIK)

No Search by SEC CIK SEC CIK lookup tool if you cannot find an appropriate ticker

Search for 351 filings - list here

Describe alternatives you've considered
Rolled my own based off what I found online but your script would solve some issues.

Additional context
Pandas integration, database advice

Oops! We can't find this file

Please enter https://www.sec.gov/Archives/ in your browser.

Oops! We can't find this file!

It seems that the SEC changed the web directory on sec.gov.
How can I use python-edgar?

Cannot merge tsv files

I downloaded the files but when I tried to merge them into one master file following your instruction, the code kept running and the master file kept getting bigger. I'm not sure what I did wrong 'cause I tried to replicate what you did in the example. Could you please help me with this?

Runtime error on Windows

Describe the bug

Cannot start python-edgar due to Runtime error, possibly in threading.

To Reproduce

Run

import edgar
edgar.download_index(".", 2020, skip_all_present_except_last=False)

Expected behavior

Downloads files into current directory.

Screenshots

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\Anaconda3\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "D:\Anaconda3\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "D:\Anaconda3\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "D:\Anaconda3\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "D:\Anaconda3\lib\runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "D:\Anaconda3\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "D:\Anaconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\Евгений\Desktop\edgar\edgar1.py", line 2, in <module>
    edgar.download_index(".", 2020, skip_all_present_except_last=False)
  File "D:\Anaconda3\lib\site-packages\edgar\main.py", line 132, in download_index
    pool = multiprocessing.Pool(worker_count)
  File "D:\Anaconda3\lib\multiprocessing\context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "D:\Anaconda3\lib\multiprocessing\pool.py", line 212, in __init__
    self._repopulate_pool()
  File "D:\Anaconda3\lib\multiprocessing\pool.py", line 303, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "D:\Anaconda3\lib\multiprocessing\pool.py", line 326, in _repopulate_pool_static
    w.start()
  File "D:\Anaconda3\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "D:\Anaconda3\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "D:\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "D:\Anaconda3\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "D:\Anaconda3\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

Desktop (please complete the following information):

Windows 10

Documentation update

While calling the python-edgar module, we'll get a destination error.

Before:
edgar.download_index(download_directory, since_year, user_agent, skip_all_present_except_last=False)

After:
edgar.download_index(dest, since_year, user_agent, skip_all_present_except_last=False)

It should be like this.

RuntimeError for windows

A RuntimeError occurs when running edgar.download_index(dir, year).

This seems to be an error due to multiprocessing on Windows. See explanation here: https://pytorch.org/docs/stable/notes/windows.html#multiprocessing-error-without-if-clause-protection

RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

HTTP Error 403: Forbidden

Hello,

My user agent is:

user_agent = "FTC [email protected]"

and the download command is

edgar.download_index(path, 2021, user_agent, skip_all_present_except_last=False)

This is in accordance with the SEC's new fair usage standards.

Yet, we've started to receive the following error:

urllib.error.HTTPError: HTTP Error 403: Forbidden

Please help.