Add public function for retrieving filing URLs without downloading #32

mksamelson · 2020-03-07T17:56:28Z

It would be nice to be able to access the files online for scraping instead of downloading them all. A feature that just returns the filing URLs would be handy.

jadchaar · 2020-03-07T18:11:28Z

Hey @mksamelson, thanks for reaching out and using the tool!

I actually have an internal utility function that does exactly what you are requesting:

env ❯ python3
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 16:52:21)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from sec_edgar_downloader._utils import get_filing_urls_to_download
>>> get_filing_urls_to_download("10-K", "AAPL", 20, "2010-12-31", "2019-12-31", False)
[FilingMetadata(filename='0000320193-19-000119.txt', url='https://www.sec.gov/Archives/edgar/data/320193/000032019319000119/0000320193-19-000119.txt'), FilingMetadata(filename='0000320193-18-000145.txt', url='https://www.sec.gov/Archives/edgar/data/320193/000032019318000145/0000320193-18-000145.txt'), FilingMetadata(filename='0000320193-17-000070.txt', url='https://www.sec.gov/Archives/edgar/data/320193/000032019317000070/0000320193-17-000070.txt'), FilingMetadata(filename='0001628280-16-020309.txt', url='https://www.sec.gov/Archives/edgar/data/320193/000162828016020309/0001628280-16-020309.txt'), FilingMetadata(filename='0001193125-15-356351.txt', url='https://www.sec.gov/Archives/edgar/data/320193/000119312515356351/0001193125-15-356351.txt'), FilingMetadata(filename='0001193125-14-383437.txt', url='https://www.sec.gov/Archives/edgar/data/320193/000119312514383437/0001193125-14-383437.txt'), FilingMetadata(filename='0001193125-13-416534.txt', url='https://www.sec.gov/Archives/edgar/data/320193/000119312513416534/0001193125-13-416534.txt'), FilingMetadata(filename='0001193125-12-444068.txt', url='https://www.sec.gov/Archives/edgar/data/320193/000119312512444068/0001193125-12-444068.txt'), FilingMetadata(filename='0001193125-11-282113.txt', url='https://www.sec.gov/Archives/edgar/data/320193/000119312511282113/0001193125-11-282113.txt'), FilingMetadata(filename='0001193125-10-238044.txt', url='https://www.sec.gov/Archives/edgar/data/320193/000119312510238044/0001193125-10-238044.txt')]

The function sec_edgar_downloader._utils.get_filing_urls_to_download returns a list of FilingMetadata objects, which contain the URL you are looking for. The parameters and interface are exactly the same as the get method, but all parameters are required. Since this is an internal method, I have not gotten around to putting a docstring on it.
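For anyone who only needs the URLs, here is a minimal sketch of pulling them out of the returned list. The FilingMetadata namedtuple below is a local stand-in so the snippet runs without the package or network access. Its two fields mirror the output above, but the real class is internal and may change between releases. extract_urls is a hypothetical helper name, not part of the package.

```python
from collections import namedtuple

# Stand-in for the package's internal FilingMetadata (assumption: only the
# two fields visible in the REPL output above).
FilingMetadata = namedtuple("FilingMetadata", ["filename", "url"])


def extract_urls(filings):
    """Return just the URL of each FilingMetadata-like object."""
    return [filing.url for filing in filings]


# Two entries copied from the output above.
filings = [
    FilingMetadata(
        filename="0000320193-19-000119.txt",
        url="https://www.sec.gov/Archives/edgar/data/320193/000032019319000119/0000320193-19-000119.txt",
    ),
    FilingMetadata(
        filename="0000320193-18-000145.txt",
        url="https://www.sec.gov/Archives/edgar/data/320193/000032019318000145/0000320193-18-000145.txt",
    ),
]

urls = extract_urls(filings)
print(*urls, sep="\n")
```

With the package installed, the list returned by get_filing_urls_to_download in the session above could be passed to the same helper.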

Let me know if this helps, or if you would like to see something different implemented in a future release!

mksamelson · 2020-03-07T18:24:50Z

Thanks, this is helpful. It would be great if a future release had a utility that provided URLs for other file formats. Your utility accesses the *.txt document (the full filing). A way to (1) list the URLs and (2) download the HTML and XML files would be great.

The image below shows the file you reference (circled in red). The file types highlighted in yellow are also very useful.
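Until such a utility exists, one possible starting point is the folder part of the .txt URL. The sketch below only strips the file name from a URL taken from the output earlier in this thread. That the HTML and XML documents live in that same folder is an assumption about EDGAR's layout, not something the package guarantees, and filing_folder_url is a hypothetical helper name.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def filing_folder_url(full_submission_url):
    """Drop the file name from a full-submission URL, keeping the folder."""
    parts = urlsplit(full_submission_url)
    # posixpath.dirname removes the last path segment (the .txt file name).
    folder_path = posixpath.dirname(parts.path) + "/"
    return urlunsplit((parts.scheme, parts.netloc, folder_path, "", ""))


txt_url = (
    "https://www.sec.gov/Archives/edgar/data/320193/"
    "000032019319000119/0000320193-19-000119.txt"
)
folder_url = filing_folder_url(txt_url)
print(folder_url)
```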

jadchaar · 2020-03-07T18:34:31Z

Your request has been noted! This is actually quite related to #31. When I get a free moment, I will work toward adding this feature!

Originally I created this tool for text parsing purposes, but I have seen a nice influx of users requesting the ability to download XML and HTML versions as well, so this will hopefully be the next feature I work on!

mksamelson · 2020-03-07T18:36:59Z

Thanks.

Just for additional clarity, the txt files have html tags but often have a lot of other junk that causes issues when trying to use an html/xml parser. So you usually have to resort to regular expressions to parse. However, the raw html and xml files don't have this issue.
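As a rough illustration of the regex approach, here is a sketch that splits a full-submission text into its embedded documents. It assumes each document is wrapped in <DOCUMENT>...</DOCUMENT> with a <TYPE> line and a <TEXT> body. The sample string is made up for the example and is not a real filing, and split_documents is a hypothetical helper name.

```python
import re

# Assumption: each embedded document is wrapped in <DOCUMENT>...</DOCUMENT>,
# with a <TYPE> header line and a <TEXT>...</TEXT> body.
DOCUMENT_RE = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.DOTALL | re.IGNORECASE)
TYPE_RE = re.compile(r"<TYPE>([^\n<]+)", re.IGNORECASE)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL | re.IGNORECASE)


def split_documents(full_submission):
    """Return (document type, body) pairs for each embedded document."""
    results = []
    for match in DOCUMENT_RE.finditer(full_submission):
        block = match.group(1)
        doc_type = TYPE_RE.search(block)
        body = TEXT_RE.search(block)
        # Skip blocks that are missing either the type line or the body.
        if doc_type and body:
            results.append((doc_type.group(1).strip(), body.group(1).strip()))
    return results


# Made-up sample, not a real filing.
sample = """<SEC-HEADER>header junk</SEC-HEADER>
<DOCUMENT>
<TYPE>10-K
<SEQUENCE>1
<TEXT>
<html><body><p>Annual report</p></body></html>
</TEXT>
</DOCUMENT>
<DOCUMENT>
<TYPE>EX-21.1
<SEQUENCE>2
<TEXT>
<html><body><p>Subsidiaries</p></body></html>
</TEXT>
</DOCUMENT>
"""

documents = split_documents(sample)
for doc_type, body in documents:
    print(doc_type, len(body))
```

Each extracted body can then be handed to an HTML parser on its own, without the surrounding header text.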

jadchaar · 2020-03-07T18:51:17Z

Thanks for letting me know and thanks for finding a regex workaround in the meantime :).

jadchaar · 2021-01-18T04:03:53Z

v4 of this package will add the ability to download XML and HTML filing details in addition to the full submission TXT: #52. I still need to make a public facing function for obtaining the URLs without downloading, but the utility function can still serve this purpose until a public function on the Downloader class is added.

jadchaar · 2021-05-09T04:58:00Z

Another user requested this functionality in an email to me:

> I don't use it to download files. Instead, I use it to generate the full_submission_url, and save the urls. i.e., I modified the Downloader() function so that it returns the filings_to_fetch FilingMetadata object.
>
> As such, I'm wondering, in future versions of sec-edgar-download, can you add an option to return the FilingMetadata object filings_to_fetch?

jadchaar added the enhancement New feature or request label Mar 7, 2020

jadchaar changed the title ~~Ability to Just Pull Filing URLs~~ Ability to download XML and HTML filing data and retrieve corresponding URLs Mar 7, 2020

jadchaar mentioned this issue Apr 1, 2020

Different format report compare CIK 1067983 vs 1541617 #34

Closed

jadchaar changed the title ~~Ability to download XML and HTML filing data and retrieve corresponding URLs~~ Add public function for retrieving filing URLs without downloading Jan 18, 2021

jreed1701 mentioned this issue May 15, 2021

Include downloading of XBRL zip data and return FilingMetaData. #76

Closed

jadchaar added this to the v5 milestone Jan 8, 2022
