New lists: Big ass change #287
…eFilters for consistency
…rs for consistency
…to int for consistency
…mprove error handling
…ve error handling; enhance listing expiration checks for BHHS and The Agency
…ve NaN value handling
…n a bunch of other shit fuck this
Add support for new list format, new dataset from 10/21, other fixes
mls_number = str(getattr(row, 'mls_number', ''))

# Check if the listing is expired on BHHS
if 'bhhscalifornia.com' in listing_url:
Check failure
Code scanning / CodeQL
Incomplete URL substring sanitization High
bhhscalifornia.com
Copilot Autofix AI about 2 months ago
To fix the problem, we should parse the URL and check the hostname instead of using a substring check. This ensures that the check is performed on the actual host part of the URL, preventing bypasses that embed the allowed host in an unexpected location.
The best way to fix this is to use the urlparse function from the urllib.parse module to extract the hostname from the URL and then check whether it matches the allowed host. This change should be made in the remove_inactive_listings function.
@@ -3,2 +3,3 @@
 from loguru import logger
+from urllib.parse import urlparse
 import asyncio
@@ -29,3 +30,4 @@
 # Check if the listing is expired on BHHS
-if 'bhhscalifornia.com' in listing_url:
+parsed_url = urlparse(listing_url)
+if parsed_url.hostname == 'bhhscalifornia.com':
 is_expired = check_expired_listing_bhhs(listing_url, mls_number)
@@ -35,3 +37,3 @@
 # Check if the listing is expired on The Agency
-elif 'theagencyre.com' in listing_url:
+elif parsed_url.hostname == 'theagencyre.com':
 is_sold = check_expired_listing_theagency(listing_url, mls_number)
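To illustrate why CodeQL flags the substring test, here is a minimal sketch comparing it with a hostname comparison. The helper names and the example URLs (including evil.example) are hypothetical and are not taken from the repository; only urlparse from the standard library is assumed.

```python
from urllib.parse import urlparse

ALLOWED_HOST = "bhhscalifornia.com"


def substring_check(url: str) -> bool:
    # The flagged pattern: matches the allowed host anywhere in the string.
    return ALLOWED_HOST in url


def hostname_check(url: str) -> bool:
    # Compares only the parsed host component of the URL.
    return urlparse(url).hostname == ALLOWED_HOST


# Hypothetical URLs for illustration only.
genuine = "https://bhhscalifornia.com/listing/123"
spoofed_path = "https://evil.example/bhhscalifornia.com/listing/123"
spoofed_host = "https://bhhscalifornia.com.evil.example/listing/123"

for url in (genuine, spoofed_path, spoofed_host):
    print(url, substring_check(url), hostname_check(url))
```

The substring test accepts all three URLs, while the hostname comparison accepts only the first.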
indexes_to_drop.append(row.Index)
logger.success(f"Removed MLS {mls_number} (Index: {row.Index}) from the DataFrame because the listing has expired on BHHS.")
# Check if the listing is expired on The Agency
elif 'theagencyre.com' in listing_url:
Check failure
Code scanning / CodeQL
Incomplete URL substring sanitization High
theagencyre.com
Copilot Autofix AI about 2 months ago
To fix the problem, we need to parse the URL and check the hostname to ensure it matches the expected domain. This can be done using the urlparse function from the urllib.parse module. Specifically, we will:
- Parse the listing_url to extract the hostname.
- Check if the hostname matches the expected domain (bhhscalifornia.com or theagencyre.com).
This approach ensures that the check is performed on the actual hostname, preventing bypasses through substring manipulation.
@@ -29,3 +29,4 @@
 # Check if the listing is expired on BHHS
-if 'bhhscalifornia.com' in listing_url:
+parsed_url = urlparse(listing_url)
+if parsed_url.hostname == 'bhhscalifornia.com':
 is_expired = check_expired_listing_bhhs(listing_url, mls_number)
@@ -35,3 +36,3 @@
 # Check if the listing is expired on The Agency
-elif 'theagencyre.com' in listing_url:
+elif parsed_url.hostname == 'theagencyre.com':
 is_sold = check_expired_listing_theagency(listing_url, mls_number)
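One caveat with an exact hostname comparison: it would reject a URL whose host carries a subdomain such as www. Whether the dataset's listing URLs use such a prefix is not shown in this thread, so the following is only a sketch under that assumption. host_matches is a hypothetical helper, not a function in the repository.

```python
from urllib.parse import urlparse


def host_matches(url: str, domain: str) -> bool:
    """Return True if the URL's host is `domain` or a subdomain of it."""
    # hostname is None when the URL has no scheme or network location.
    hostname = urlparse(url).hostname
    if hostname is None:
        return False
    # Requiring the leading dot stops "eviltheagencyre.com" from matching.
    return hostname == domain or hostname.endswith("." + domain)


# Hypothetical URLs for illustration only.
print(host_matches("https://www.theagencyre.com/listing/1", "theagencyre.com"))
print(host_matches("https://eviltheagencyre.com/listing/1", "theagencyre.com"))
```

This assumes listing_url is already a string. A missing value such as NaN would need to be filtered out before calling urlparse.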
uv
in the container