Add public function for retrieving filing URLs without downloading #32
Comments
Hey @mksamelson, thanks for reaching out and using the tool! I actually have an internal utility function that does exactly what you are requesting:
env ❯ python3
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 16:52:21)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from sec_edgar_downloader._utils import get_filing_urls_to_download
>>> get_filing_urls_to_download("10-K", "AAPL", 20, "2010-12-31", "2019-12-31", False)
[FilingMetadata(filename='0000320193-19-000119.txt', url='https://www.sec.gov/Archives/edgar/data/320193/000032019319000119/0000320193-19-000119.txt'), FilingMetadata(filename='0000320193-18-000145.txt', url='https://www.sec.gov/Archives/edgar/data/320193/000032019318000145/0000320193-18-000145.txt'), FilingMetadata(filename='0000320193-17-000070.txt', url='https://www.sec.gov/Archives/edgar/data/320193/000032019317000070/0000320193-17-000070.txt'), FilingMetadata(filename='0001628280-16-020309.txt', url='https://www.sec.gov/Archives/edgar/data/320193/000162828016020309/0001628280-16-020309.txt'), FilingMetadata(filename='0001193125-15-356351.txt', url='https://www.sec.gov/Archives/edgar/data/320193/000119312515356351/0001193125-15-356351.txt'), FilingMetadata(filename='0001193125-14-383437.txt', url='https://www.sec.gov/Archives/edgar/data/320193/000119312514383437/0001193125-14-383437.txt'), FilingMetadata(filename='0001193125-13-416534.txt', url='https://www.sec.gov/Archives/edgar/data/320193/000119312513416534/0001193125-13-416534.txt'), FilingMetadata(filename='0001193125-12-444068.txt', url='https://www.sec.gov/Archives/edgar/data/320193/000119312512444068/0001193125-12-444068.txt'), FilingMetadata(filename='0001193125-11-282113.txt', url='https://www.sec.gov/Archives/edgar/data/320193/000119312511282113/0001193125-11-282113.txt'), FilingMetadata(filename='0001193125-10-238044.txt', url='https://www.sec.gov/Archives/edgar/data/320193/000119312510238044/0001193125-10-238044.txt')]
The function Let me know if this helps, or if you would like to see something different implemented in a future release!
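Since the result of that call is a list of records with `filename` and `url` fields, pulling out only the URLs is a short step. Here is a minimal sketch that assumes nothing beyond what the session above shows. The `FilingMetadata` namedtuple is a stand-in defined locally so the snippet runs without the package installed, and `urls_only` is a hypothetical helper name, not part of sec_edgar_downloader. Keep in mind that `_utils` is a private module, so the real function's signature may change between releases.

```python
from collections import namedtuple

# Stand-in for the package's FilingMetadata (assumption: it exposes
# `filename` and `url` attributes, as the REPL output suggests).
FilingMetadata = namedtuple("FilingMetadata", ["filename", "url"])


def urls_only(filings):
    """Return just the URL of each filing record, preserving order."""
    return [filing.url for filing in filings]


# Two records copied from the output shown in the thread.
sample = [
    FilingMetadata(
        filename="0000320193-19-000119.txt",
        url="https://www.sec.gov/Archives/edgar/data/320193/000032019319000119/0000320193-19-000119.txt",
    ),
    FilingMetadata(
        filename="0000320193-18-000145.txt",
        url="https://www.sec.gov/Archives/edgar/data/320193/000032019318000145/0000320193-18-000145.txt",
    ),
]

for url in urls_only(sample):
    print(url)
```

With the package installed, the list returned by `get_filing_urls_to_download(...)` could presumably be passed to `urls_only` in place of `sample`.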
Thanks, this is helpful. It would be great if a future release had a utility that provided URLs for other file formats. Your utility accesses the *.txt document (the full filing). A way to (1) list the URLs and (2) download the HTML and XML files would be great. The image below shows the file you reference (circled in red). The file types highlighted in yellow are also very useful.
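Until such a utility exists, the other files of a filing can probably be located from the `.txt` URL alone, because each URL in the output above sits inside a per-filing folder. The sketch below is plain string handling on those URLs. `filing_directory_url` and `filing_index_url` are hypothetical helper names, and the `-index.htm` naming is an assumption about EDGAR's layout that this thread does not confirm, so check it against a real filing before relying on it.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def filing_directory_url(txt_url):
    """Drop the trailing '<accession>.txt' to get the filing's folder URL."""
    parts = urlsplit(txt_url)
    folder = posixpath.dirname(parts.path) + "/"
    return urlunsplit((parts.scheme, parts.netloc, folder, "", ""))


def filing_index_url(txt_url):
    """Assumption: the index page is named '<accession>-index.htm'."""
    if not txt_url.endswith(".txt"):
        raise ValueError("expected a full-submission .txt URL")
    return txt_url[: -len(".txt")] + "-index.htm"


txt_url = (
    "https://www.sec.gov/Archives/edgar/data/320193/"
    "000032019319000119/0000320193-19-000119.txt"
)
print(filing_directory_url(txt_url))
print(filing_index_url(txt_url))
```

The folder URL is the safer of the two, since it is derived purely from the URL already returned by the utility function.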
Your request has been noted! This is actually quite related to #31. When I get a free moment, I will work toward adding this feature! Originally I created this tool for text parsing purposes, but I have seen a nice influx of users requesting the ability to download XML and HTML versions as well, so this will hopefully be the next feature I work on!
Thanks. Just for additional clarity: the TXT files contain HTML tags but often also a lot of other junk that trips up an HTML/XML parser, so you usually have to resort to regular expressions to parse them. The raw HTML and XML files don't have this issue.
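For anyone stuck with the TXT files in the meantime, one regex approach is to cut the full submission into its embedded documents first and only then hand each piece to a parser. This is a rough sketch, assuming the submission wraps each document in `<DOCUMENT>...</DOCUMENT>` with `<TYPE>`, `<FILENAME>` and `<TEXT>` header lines. The sample string is made up for illustration, and `split_submission` is a hypothetical helper, not something the package provides.

```python
import re

# Patterns for the assumed SGML-style wrapper around each embedded document.
DOCUMENT_RE = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.DOTALL)
TYPE_RE = re.compile(r"<TYPE>([^\n]+)")
FILENAME_RE = re.compile(r"<FILENAME>([^\n]+)")
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)


def split_submission(raw):
    """Return one dict per embedded document found in the raw text."""
    documents = []
    for match in DOCUMENT_RE.finditer(raw):
        block = match.group(1)
        doc_type = TYPE_RE.search(block)
        filename = FILENAME_RE.search(block)
        text = TEXT_RE.search(block)
        documents.append({
            "type": doc_type.group(1).strip() if doc_type else None,
            "filename": filename.group(1).strip() if filename else None,
            "text": text.group(1).strip() if text else "",
        })
    return documents


# Made-up miniature submission, only for demonstration.
sample = """<SEC-DOCUMENT>0000000000-00-000000.txt
<DOCUMENT>
<TYPE>10-K
<SEQUENCE>1
<FILENAME>example10k.htm
<TEXT>
<html><body><p>Annual report</p></body></html>
</TEXT>
</DOCUMENT>
<DOCUMENT>
<TYPE>EX-21
<SEQUENCE>2
<FILENAME>ex21.htm
<TEXT>
<html><body><p>Subsidiaries</p></body></html>
</TEXT>
</DOCUMENT>
</SEC-DOCUMENT>
"""

for doc in split_submission(sample):
    print(doc["type"], doc["filename"], len(doc["text"]))
```

Each `text` value can then be passed to an HTML or XML parser on its own, which avoids feeding the surrounding header material to the parser.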
Thanks for letting me know and thanks for finding a regex workaround in the meantime :).
v4 of this package will add the ability to download XML and HTML filing details in addition to the full submission TXT: #52. I still need to make a public facing function for obtaining the URLs without downloading, but the utility function can still serve this purpose until a public function on the
Another user requested this functionality in an email to me:
Would be nice to be able to access the files online for scraping as opposed to downloading them all. A feature for just returning filing URLs would be handy.