Skip to content

Latest commit

 

History

History
68 lines (43 loc) · 2.68 KB

1.4-histogram-plot.md

File metadata and controls

68 lines (43 loc) · 2.68 KB

3.5 Histogram Chart

A frequency distribution shows how often each different value in a set of data occurs. A histogram is the most commonly used graph to show frequency distributions and can be a great first step in understanding a dataset.

1. Normal distribution

data = np.random.randn(1000)
plt.style.use('ggplot')    # customize the chart style
plt.hist(data)

Figure: Normal Distribution

2. Bins

fig,(ax1,ax2) = plt.subplots(1,2,figsize=(20,6))

ax1.hist(data, alpha=0.5, bins=50,label  ='bins: 50',color = 'r')
ax1.legend()
ax2.hist(data, alpha=0.5, bins=200,  label = 'bins: 100',color = 'r')
ax2.legend()
plt.suptitle('Histogram Chart in different bins')

Figure: Histogram chart in different bins

3. Skewed Distribution

The normal distribution is balanced and beautiful. However, In real life, sometimes it does not follow the law. For example, in the credit card case, while the vast majority of transactions are very low, this distribution is extremely skewed.

Figure 1.4.2 Skewed  Distribution

4. Histogram Plot

Let's use the "tips" dataset as an example. You can download it here, or load it via seaborn package.

import seaborn as sns
tips = sns.load_dataset('tips')

If you are the owner of a restaurant, of course, you may have a fist of things that want to know about your business. For instance, how much money can I earn per day? What price zone can make my customer happy and also make me happy? If you are the waiter or waitress, you probably care about a very similar question. How many tips can I earn per table?

plt.rcParams.update({'font.size': 20}) #  customize the font size

# create a subplot includes one row and two columns, the two itmes share y_axis
fig, (ax1, ax2) = plt.subplots(1, 2, sharey = True, figsize = (12,6))

ax1.hist(df.tip,color ='c',alpha =0.5) # customize color
ax1.set_title('Tips Distribution')     # set title separately

ax2.hist(df.total_bill,color = 'b',alpha =0.5)
ax2.set_title('Bills Distribution')

ax1.set_ylabel('Frequency')          # set the y_axis name
ax1.set_xlabel('USD')             # set the x_axis name

Figure 1.4.3 Tips and Bills Distribution

With the histogram plot, questions can be answered easily. A waiter/waitress may earn 2 - 4 dollars per table, sometime you may have the good luck to earn 6 dollars a table. While the restaurant's price is very customer friendly, a nice meal only costs 15 - 25 dollars.