New lists: Big ass change #287
Merged
88 commits
Commits (all by perfectly-preserved-pie):
- 961219a Initial commit
- a27f3c8 Add a function to webscrape TLA
- e7e98a6 Add a function to use The Agency API to get property details
- 3847346 Fall back to fetching property data from The Agency if it doesn't exi…
- 72f260f Refactor fetch_the_agency_data to include row_index and total_rows pa…
- fa8b0cd Fetch the first property listing image from TA
- ed34e4e fetch and transform the first property listing image from The Agency API
- 0e839e4 Return an additional None
- 6193a17 Move webscraping logic into its own function
- ffbcf59 Add logging
- 1e16c6b Rename parking key
- 987ccbe Clean up lease dataset. Rename columns to match the new lists'. Drop …
- 1ea0421 Change column dtypes to the optimal dtype
- f47d304 Change pet_policy and laundry and subtype to category dtype
- 718947b Cast dtypes of new list to (hopefully) match those of the old dataset
- dae6a9b Rename "Garage Spaces" to "Parking Spaces" in popup.js
- c873f51 Convert senior_community into boolean
- fc9ba9a Normalize 'Sqft' column name to 'sqft' in LeaseFilters and BuyFilters
- c8dece3 Rename 'garage_spaces' to 'parking_spaces' in LeaseFilters for consis…
- 4c48ccf Rename 'PetsAllowed' to 'pet_policy' in LeaseFilters for consistency
- 754d66e Normalize 'Furnished' column name to 'furnished' in LeaseFilters for …
- b73dc94 Normalize 'DepositSecurity' column name to 'security_deposit' in Leas…
- e99ef62 Normalize 'DepositPets' column name to 'pets_deposit' in LeaseFilters…
- f0aa22d Normalize 'DepositKey' column name to 'key_deposit' in LeaseFilters f…
- d8dd2f2 Normalize 'DepositOther' column name to 'other_deposit' in LeaseFilte…
- e3e18f9 Normalize 'Terms' column name to 'terms' in LeaseFilters for consistency
- 39f42b7 Normalize 'LaundryCategory' column name to 'laundry' in LeaseFilters …
- cba3864 Change latitude and longitude colum nmnames
- caa5151 Add .venv/ to .gitignore to exclude virtual environment files
- b62a828 Change slider parameter types in sqft_radio_button method from float …
- 71f253a Normalize 'YrBuilt' column name to 'year_built' in LeaseFilters for c…
- d658b0c Change slider parameter types in ppsqft_radio_button method from floa…
- 9b62e8d Refactor map update logic to use new column names
- 8d50813 Refactor the rest of the lease filters
- 4393e57 Normalize column names and update references in LeaseComponents for c…
- c66186c Revert buy components to how they were before
- 7c7a516 Update column names in LeaseComponents and lease_page for consistency
- a59d704 Update property details in popup.js to use MLS number, MLS photo, and…
- 1936ecb Update popup.js to display total bathrooms instead of individual bath…
- bfd76a6 Capitalize "Bedrooms" in the bedrooms slider header
- a35e49d Remove check for 'ppsqft' column in lease_dataframe.py
- decac21 Change full street address column to use StringDtype in lease_datafra…
- 3992052 Refactor zip code retrieval in geocoding_utils.py and update zipcode …
- b28a36a Refactor data type handling in lease_dataframe.py: clean numeric colu…
- af4f3b2 Refactor agency data handling in update_dataframe_with_listing_data: …
- a73fc34 Improve error handling and logging for JSON parsing in fetch_the_agen…
- 44b73ee Enhance JSON parsing and error handling in fetch_the_agency_data: add…
- d40c936 Enhance MLS number matching in fetch_the_agency_data: implement fuzzy…
- c31955a Sort imports alphabetically
- 1dcabb6 Change log level from debug to info for MLS number matching in fetch_…
- a2fd701 Fix not being able to find the property image src
- eb86ce7 Use regex to find the correct property based on street name
- 422619c Refactor fetch_the_agency_data function:
- 55fab2d DOCSTRINGS BABY!!!!!
- 9e20dc6 Minor edit in the docstring
- 455adaa Minor edit again dammit
- 7f86b38 Refactor logging in fetch_the_agency_data and update_dataframe_with_l…
- d013b8a Update docstring for fetch agency data
- 21741da Remove dead comments
- 17b86e1 Preliminary function to remove expired listings on The Agency
- d83863a Sort imports alphabetically
- 0d72883 Consolidate expired listings check into one function
- b55a18e Remove unneeded imports
- 88231d5 Move The Agency removal check to webscraping_utils and change the log…
- d53892e Remove unneeded imports
- ab2c1c6 Ensure listing_url and mls_number are strings in remove_inactive_list…
- 0c1861b Refactor check_expired_listing_bhhs to use synchronous requests and i…
- 42411cc Fix wrong expired listing check message on BHHS
- 74ce223 Use specific domains for removing inactive listings
- 75f1f6a Refactor web scraping functions to use synchronous requests and impro…
- 935910d Fix categorize_laundry_features to handle NaN values using pd.isna
- 3805db8 New lease dataset for 10/21
- 5a2e118 Enhance rental terms checklist to handle 'Unknown' category and impro…
- 2949f19 Drop old/redundant columns
- 48abd93 Remove redundant bedrooms_bathrooms field from lease map data
- c14ae1d Changing column dtypes
- ded1f13 Drop a row with wild ass bedrooms/bathrooms. Fuck this i'm not dealin…
- b30e34e Drop another row with fucked up sqft
- 6422e7c Remove trailing .0 in zipcode
- 2ea2313 Remove trailing .0 in full_street_address
- 4dea27a Fix missing city and zipcode
- 339618c fix city
- faebf07 Cast 'sqft' to UInt32 and update numeric columns to use UInt16Dtype
- 64a86a8 Copy missing values from their old column counterpart, set dtypes, ma…
- 48287ef Merge pull request #285 from perfectly-preserved-pie/new-lists
- 477de27 Better way of installing uv?
- eda2cdb Remove dead code, comment out curl (do i even need this?)
- 8b17506 Add non-root user to Dockerfile and update permissions
.gitignore

```diff
@@ -5,6 +5,7 @@ __pycache__/larentals.cpython-310.pyc
 *.csv
 *.pyc
 *.xlsx
+.venv/
 env
 hdf
 larentals-checkpoint.py
```
Dockerfile

```diff
@@ -1,22 +1,31 @@
 FROM python:3.11-slim

-COPY requirements.txt .
+# Set the working directory
+WORKDIR /app

-# Install curl
-RUN apt-get update && apt-get install -y curl
+# Switch to root user to install dependencies
+USER root

-# Using uv to install packages because it's fast as fuck boiiii
-# https://www.youtube.com/watch?v=6E7ZGCfruaw
-# https://ryxcommar.com/2024/02/15/how-to-cut-your-python-docker-builds-in-half-with-uv/
-ADD --chmod=655 https://astral.sh/uv/install.sh /install.sh
-RUN /install.sh && rm /install.sh
-RUN /root/.cargo/bin/uv pip install --system --no-cache -r requirements.txt
+# Create the nonroot user and set permissions
+RUN adduser --disabled-password --gecos "" nonroot && chown -R nonroot /app

-COPY . ./
+# Copy everything into the working directory
+COPY . /app

+# Copy uv binary directly from the UV container image
+COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv
+
+# Install dependencies directly into the system environment using uv
+RUN uv pip install --system --no-cache-dir -r requirements.txt
+
+# Switch back to non-root user
+USER nonroot
+
+# Install curl (if needed, uncomment this line)
+# RUN apt-get update && apt-get install -y curl
+
 # Run the app using gunicorn.
 # Expose the port gunicorn is listening on (80).
 # Set the number of workers to 10.
-# Preload the app to avoid the overhead of loading the app for each worker. See https://www.joelsleppy.com/blog/gunicorn-application-preloading/
-# Set the app to be the server variable in app.py.
-CMD ["gunicorn", "-b", "0.0.0.0:80", "-k", "gevent", "--workers=10", "--preload", "app:server"]
+# Preload the app to avoid the overhead of loading the app for each worker.
+CMD ["gunicorn", "-b", "0.0.0.0:80", "-k", "gevent", "--workers=10", "--preload", "app:server"]
```
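The comments above the `CMD` explain each gunicorn flag. As a sketch only, the same settings could live in a gunicorn configuration file, which gunicorn reads as a Python module of setting variables. This `gunicorn.conf.py` is hypothetical (it is not part of this PR), and the `wsgi_app` setting assumes gunicorn 20.1 or later.

```python
# gunicorn.conf.py (hypothetical): mirrors the flags in the Dockerfile CMD.
# gunicorn treats module-level names in this file as its settings.

bind = "0.0.0.0:80"       # same as -b 0.0.0.0:80
worker_class = "gevent"   # same as -k gevent (needs the gevent package installed)
workers = 10              # same as --workers=10
preload_app = True        # same as --preload: import the app once in the master, then fork workers
wsgi_app = "app:server"   # same as the positional app:server argument (gunicorn >= 20.1)
```

With such a file in `/app`, the container command could shrink to `CMD ["gunicorn", "-c", "gunicorn.conf.py"]`. Keeping the flags inline, as the PR does, avoids an extra file; a config file is easier to comment and diff once settings multiply.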
Binary file not shown.
Binary file not shown.
geocoding_utils.py

```diff
@@ -66,39 +66,39 @@ def fetch_missing_city(address: str, geolocator: GoogleV3) -> Optional[str]:

     return city

-def return_postalcode(address: str, geolocator: GoogleV3) -> Optional[Union[int, type(pd.NA)]]:
+def return_zip_code(address: str, geolocator: GoogleV3) -> Optional[str]:
     """
-    Fetches the postal code for a given short address using forward and reverse geocoding.
+    Fetches the postal code for a given address using geocoding.

     Parameters:
-    address (str): The short address.
-    geolocator (GoogleV3): An instance of a GoogleV3 geocoding class.
+    address (str): The full street address.
+    geolocator (GoogleV3): An instance of the GoogleV3 geocoding class.

     Returns:
-    Optional[Union[int, type(pd.NA)]]: The postal code as an integer, or pd.NA if unsuccessful.
+    Optional[str]: The postal code as a string, or None if unsuccessful.
     """
     # Initialize postalcode variable
     postalcode = None

     try:
-        geocode_info = geolocator.geocode(address, components={'administrative_area': 'CA', 'country': 'US'})
-        components = geolocator.geocode(f"{geocode_info.latitude}, {geocode_info.longitude}").raw['address_components']
-
-        # Create a dataframe from the list of dictionaries
-        components_df = pd.DataFrame(components)
-
-        # Iterate through rows to find the postal code
-        for row in components_df.itertuples():
-            if row.types == ['postal_code']:
-                postalcode = int(row.long_name)
-
-        logger.info(f"Fetched postal code {postalcode} for {address}.")
-    except AttributeError:
-        logger.warning(f"Geocoding returned no results for {address}.")
-        return pd.NA
+        geocode_info = geolocator.geocode(
+            address, components={'administrative_area': 'CA', 'country': 'US'}
+        )
+        if geocode_info:
+            raw = geocode_info.raw['address_components']
+            # Find the 'postal_code'
+            postalcode = next(
+                (addr['long_name'] for addr in raw if 'postal_code' in addr['types']),
+                None
+            )
+            if postalcode:
+                logger.info(f"Fetched zip code ({postalcode}) for {address}.")
+            else:
+                logger.warning(f"No postal code found in geocoding results for {address}.")
+        else:
+            logger.warning(f"Geocoding returned no results for {address}.")
     except Exception as e:
-        logger.warning(f"Couldn't fetch postal code for {address} because {e}.")
-        return pd.NA
+        logger.warning(f"Couldn't fetch zip code for {address} because of {e}.")
+        postalcode = None

     return postalcode
```
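The refactored `return_zip_code` drops the reverse-geocoding round trip and the intermediate DataFrame: it reads the `address_components` list from the first geocode result and takes the first entry whose `types` contains `'postal_code'`. The sketch below isolates that lookup so it can run without a network call. `extract_postal_code` and the sample payload are hypothetical; the payload is only shaped after the Google Geocoding API's `address_components` format.

```python
from typing import Any, Dict, List, Optional


def extract_postal_code(address_components: List[Dict[str, Any]]) -> Optional[str]:
    """Hypothetical helper: return the long_name of the first 'postal_code' component, or None.

    Mirrors the next(...) lookup in return_zip_code.
    """
    return next(
        # Membership test ('in') rather than list equality, so a component tagged
        # with extra types alongside 'postal_code' still matches.
        (c["long_name"] for c in address_components if "postal_code" in c.get("types", [])),
        None,  # default when no component is a postal code
    )


# Illustrative payload only; not real geocoder output.
sample = [
    {"long_name": "123", "short_name": "123", "types": ["street_number"]},
    {"long_name": "Los Angeles", "short_name": "Los Angeles", "types": ["locality", "political"]},
    {"long_name": "90012", "short_name": "90012", "types": ["postal_code"]},
]

print(extract_postal_code(sample))  # 90012
print(extract_postal_code([]))      # None
```

Returning the `long_name` string unchanged, instead of the old `int(row.long_name)`, keeps the value a string end to end, which matches the new `Optional[str]` return annotation.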
Check failure: Code scanning / CodeQL
Incomplete URL substring sanitization (severity: High)

Copilot Autofix (AI), about 2 months ago:
To fix the problem, we should parse the URL and check the hostname instead of using a substring check. This ensures that the check is performed on the actual host part of the URL, preventing bypasses through embedding the allowed host in an unexpected location.
The best way to fix this is to use the `urlparse` function from the `urllib.parse` module to extract the hostname from the URL and then check if it matches the allowed host. This change should be made in the `remove_inactive_listings` function.
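A minimal sketch of the hostname check the autofix describes, assuming the allowed value is a bare domain and that subdomains such as `www.` should also pass. `is_allowed_listing_url` and the `example.com` / `evil.test` hosts are hypothetical placeholders, not the actual code or domains in `remove_inactive_listings`.

```python
from urllib.parse import urlparse


def is_allowed_listing_url(url: str, allowed_domain: str) -> bool:
    """Hypothetical helper: True only if the URL's host is allowed_domain or one of its subdomains."""
    hostname = urlparse(url).hostname  # lowercased host, or None if the URL has no network location
    if hostname is None:
        return False
    # Exact match, or a true subdomain; the leading "." rejects look-alikes such as "notexample.com".
    return hostname == allowed_domain or hostname.endswith("." + allowed_domain)


# A substring check accepts URLs whose real host is something else entirely:
bad = "https://evil.test/?ref=example.com"
print("example.com" in bad)                                                  # True (unsafe)
print(is_allowed_listing_url(bad, "example.com"))                            # False
print(is_allowed_listing_url("https://example.com.evil.test/", "example.com"))  # False
print(is_allowed_listing_url("https://www.example.com/listing/1", "example.com"))  # True
```

Comparing against the parsed `hostname` rather than the raw string is what closes the bypass: the allowed domain can no longer sneak in through the path, query string, or a longer host that merely contains it.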