reverse colorbar #65

Open: wants to merge 6 commits into base: master

Conversation

Makhsuda
Collaborator

No description provided.

@Makhsuda Makhsuda requested a review from chendaniely June 25, 2020 00:02
@chendaniely
Member

I can get the plot to work, but I think it's best to change the code so that we put in a placeholder for the date. I think the dashboard can handle the animation instead of plotly doing it directly. See the code comment for the changes I made to make the iteration process faster.

Member

@chendaniely left a comment

If you change your code to this, it should at least plot faster...


# plt.show()
# color_map = plt.cm.get_cmap('viridis')
# reversed_viridis = color_map.reversed()


fig = px.choropleth(molten_df,
Member

I created a dataframe that is just the subset for one particular date, and then used that subsetted dataframe to plot the figure:

plot_data = molten_df[molten_df.date_iso == '2020-02-01']
fig = px.choropleth(plot_data,
                    geojson=counties,
                    locations=plot_data.fips_str,
                    color='value',
                    #animation_frame='date',
                    hover_data=['State', 'value'],
                    color_continuous_scale='viridis_r',
                    range_color=(0, 300),
                    scope="usa",
                    title='Confirmed cases',
                    labels={'value': 'confirmed cases'}
                    )

@@ -34,7 +34,10 @@
molten_df['date_iso'] = pd.to_datetime(molten_df['date'], format="%m/%d/%y") # change date to ISO8601 standard format

fips = molten_df['fips_str'].tolist()
Member

Because of the changes below, you don't need this line anymore, since you're passing the column of values into the plotting function.


confirmed_df = pd.read_csv('https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/'
'csse_covid_19_time_series/time_series_covid19_confirmed_US.csv')
loc_df = pd.read_excel(here('./data/db/original/maps/State_FIPS.xlsx'))
pop_df = pd.read_excel(here('./data/db/original/maps/PopulationEstimates.xls')) # population dataset for 2019
Member

Where did this dataset come from?

Comment on lines 19 to 21
'csse_covid_19_time_series/time_series_covid19_confirmed_US.csv')
loc_df = pd.read_excel(here('./data/db/original/maps/State_FIPS.xlsx'))
pop_df = pd.read_excel(here('./data/db/original/maps/PopulationEstimates.xls')) # population dataset for 2019
Member

You should provide a download link showing where you got these datasets.


molten_pop_df = pd.merge(molten_df, pop_df, on='fips_str') # add population per county
grouped_by = molten_pop_df.groupby(['fips_str', 'date_iso', 'Admin2', 'POP_ESTIMATE_2019'])['value'].sum().reset_index()
grouped_by['value'] = grouped_by['value']/grouped_by['POP_ESTIMATE_2019'] # get per capita value
Member

Don't overwrite the original 'value' column. You should make a new column (in this case something like 'total_per_cap') and assign the per capita value to it.
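A minimal sketch of that suggestion, using a toy dataframe in place of `grouped_by` (the numbers are made up; `total_per_cap` is the column name proposed in the comment above):

```python
import pandas as pd

# Toy stand-in for grouped_by; the values are invented for illustration.
grouped_by = pd.DataFrame({
    "fips_str": ["01001", "01003"],
    "value": [50, 200],
    "POP_ESTIMATE_2019": [1000, 4000],
})

# Leave the raw count in 'value' and store the rate in a new column.
grouped_by["total_per_cap"] = grouped_by["value"] / grouped_by["POP_ESTIMATE_2019"]
```

Both the raw count and the per capita value then stay available for plotting.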

- color_continuous_scale="Viridis",
- range_color=(0, 300),
+ color_continuous_scale='viridis_r',
+ range_color=(0, 500),
Member

Why did you choose 500? Can we set this to something like max(per_cap) and use a variable instead of hard-coding a value?

Collaborator Author

Yeah, you're right, and I'm working on that. I was thinking of using the third quartile (75%) there, because the max value, which is for New York, is much higher than for the other states, so the coloring comes out a bit off. I tried to use a quartile function, but range_color didn't accept my input. The same goes for the per capita case, but there a different state shows the highest number of cases, which is very strange, so I assume I might be doing the calculation wrong.
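One way the third-quartile idea might be sketched is below. The thread does not say why `range_color` rejected the input; a possible cause (an assumption, not confirmed here) is passing a Series or array rather than a pair of plain numbers. The toy data is made up.

```python
import pandas as pd

# Toy counts with one extreme outlier; the values are invented for illustration.
plot_data = pd.DataFrame({"value": [10, 20, 30, 40, 5000]})

# With a single q, Series.quantile returns a scalar; float() makes it a plain Python number.
upper = float(plot_data["value"].quantile(0.75))

# A (min, max) pair of numbers, the same shape as the hard-coded (0, 500).
range_color = (0, upper)
```

Values above `upper` would then all take the color at the end of the scale, so the outlier no longer stretches the colorbar.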

Comment on lines +38 to +46
'''
# ax = sns.lineplot(x="date_iso", y="value", hue='Province_State', data=grouped_counts) # show cases per state monthly
# ax = sns.stripplot(x="date_iso", y="value", hue='Province_State', data=grouped_counts)
# ax = sns.violinplot(x='date_iso', y='value', hue='Province_State', data=grouped_counts, palette="Set2", split=True,
# scale="count", inner="quartile")
# ax = sns.countplot(x="date_iso", hue='Province_State', data=grouped_counts) # works better if there are certain dates
# plt.tight_layout()
# plt.show()
'''
Member

Why did you comment these out? We could add general values to the dashboard too.

# animation_frame='date',
hover_data=['Admin2', 'value', 'POP_ESTIMATE_2019'],
color_continuous_scale='viridis_r',
range_color=(0, plot_data['value'].max()),
Member

When you switch to the new column name, make sure you change this as well.




'''
Member

Files should end with a newline.

Also, it might be worth having both the raw count and the per-capita count, with a toggle between the maps.
Since the only real difference in the plotting code is which column you're plotting, we can make a function that takes a dataframe and a plotting column as input and returns the plot.

We could then use the function to return both plots, which we would feed into the dashboard.
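A rough sketch of what such a function could look like. The names `choropleth_kwargs` and `make_choropleth` are hypothetical; the column names (`fips_str`, `Admin2`) and the plotting options are taken from the code in this PR, and using the column maximum as the upper color bound is just one option.

```python
import pandas as pd


def choropleth_kwargs(df, column, title):
    """Build the px.choropleth arguments that depend on the plotted column."""
    upper = float(df[column].max())  # upper color bound taken from the data, not hard-coded
    return dict(
        locations="fips_str",
        color=column,
        hover_data=["Admin2", column],
        color_continuous_scale="viridis_r",
        range_color=(0, upper),
        scope="usa",
        title=title,
    )


def make_choropleth(df, column, geojson, title):
    """Return a choropleth figure for the given column (hypothetical helper)."""
    import plotly.express as px  # imported here so the helper above can be used without plotly

    return px.choropleth(df, geojson=geojson, **choropleth_kwargs(df, column, title))
```

Under these assumptions, the dashboard could call `make_choropleth(plot_data, 'value', counties, 'Confirmed cases')` for the raw counts and the same function with the per capita column for the second map.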

# TODO: See if rate is changing, counts over time (a 14 day sliding window count)
# Choropleth map with time slider and hover text
# TODO: Try to merge PopulationEstimates.xls to confirmed_df and remove State_FIPS.xlsx

confirmed_df = pd.read_csv('https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/'
'csse_covid_19_time_series/time_series_covid19_confirmed_US.csv')
loc_df = pd.read_excel(here('./data/db/original/maps/State_FIPS.xlsx'))
Member

Add a link to where you got the data from.
