---
title: "Median Price Auction"
author: "Mojtaba Tefagh"
date: "4/9/2019"
output:
  md_document:
    variant: markdown_github
---
We first download the data from the Ethereum blockchain.
```{python}
import pandas as pd
import numpy as np
from web3 import Web3, HTTPProvider

# Connect to a local Ethereum node over HTTP.
web3 = Web3(HTTPProvider('http://localhost:8545'))


class CleanTx():
    """transaction object / methods for pandas"""

    def __init__(self, tx_obj):
        self.hash = tx_obj.hash
        self.block_mined = tx_obj.blockNumber
        self.gas_price = tx_obj['gasPrice']
        self.round_gp_10gwei()

    def to_dataframe(self):
        data = {self.hash: {'block_mined': self.block_mined,
                            'gas_price': self.gas_price,
                            'round_gp_10gwei': self.gp_10gwei}}
        return pd.DataFrame.from_dict(data, orient='index')

    def round_gp_10gwei(self):
        """Buckets the gas price, measured in units of 1e8 wei (0.1 gwei)."""
        gp = self.gas_price/1e8
        if gp >= 1 and gp < 10:
            # Single-digit values are floored to an integer.
            gp = np.floor(gp)
        elif gp >= 10:
            # Larger values are floored to a multiple of 10.
            gp = gp/10
            gp = np.floor(gp)
            gp = gp*10
        else:
            gp = 0
        self.gp_10gwei = gp


# Collect one single-row frame per transaction, then concatenate once.
frames = []
for block in range(5000000, 5000100, 1):
    # Older web3.py releases name this method getBlock.
    block_obj = web3.eth.get_block(block, True)
    for transaction in block_obj.transactions:
        clean_tx = CleanTx(transaction)
        frames.append(clean_tx.to_dataframe())
block_df = pd.concat(frames)
block_df.to_csv('tx.csv', sep='\t', index=False)
```
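The bucketing rule inside `round_gp_10gwei` is easier to follow in isolation. The block below is a minimal standalone sketch of the same rule, assuming gas prices arrive in wei. The function name `bucket_gas_price` is hypothetical and not part of the script above, and it uses `math.floor` (which returns integers) where the class uses `np.floor`.

```python
import math


def bucket_gas_price(gas_price_wei):
    """Bucket a gas price (in wei) following the rule in CleanTx.round_gp_10gwei."""
    gp = gas_price_wei / 1e8  # units of 1e8 wei, i.e. 0.1 gwei
    if 1 <= gp < 10:
        return math.floor(gp)            # single-digit values are floored
    if gp >= 10:
        return math.floor(gp / 10) * 10  # larger values floor to a multiple of 10
    return 0                             # anything below 1e8 wei collapses to 0


# Hypothetical sample prices in wei: 0.05 gwei, 0.5 gwei, and 4.5 gwei.
for wei in (50_000_000, 500_000_000, 4_500_000_000):
    print(wei, bucket_gas_price(wei))
```

For instance, a price of 4.5 gwei (4,500,000,000 wei) lands in bucket 40, which corresponds to 4.0 gwei.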
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo=FALSE, warning=FALSE, message=FALSE)
library(tidyverse)
```
```{r}
tx.raw <- as_tibble(read.csv("tx.csv", sep = "\t"))
```
Before we begin, here are some plots of the raw data (the gas price is normalized later):
```{r}
ggplot(tx.raw,aes(x=block_mined, y=round_gp_10gwei))+geom_point()+labs(title="Gas Price")
ggplot(tx.raw,aes(x=block_mined, y=gas_price))+geom_point()
```
Now we drop the `round_gp_10gwei` column and divide `gas_price` by $10^8$. We then group the gas prices by block and compute a summary (`min`, `median`, `mean`, `max`) for each block. The blocks are consecutive, so we shift their numbers to start from zero, which makes the plots easier to read.
```{r}
tx.summary <- tx.raw %>%
  select(-round_gp_10gwei) %>%
  mutate(gas_price = gas_price / 10^8,
         block_mined = block_mined - min(block_mined)) %>%
  group_by(block_mined) %>%
  summarise(min = min(gas_price),
            median = median(gas_price),
            mean = mean(gas_price),
            max = max(gas_price))
```
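For readers who prefer to stay in Python, the same per-block summary can be sketched with the standard library alone. This is an illustration under stated assumptions, not the pipeline used for the plots: the function name `summarise_by_block` and the sample rows are hypothetical, and the input is assumed to be a list of `(block_mined, gas_price)` pairs with the gas price in wei.

```python
from collections import defaultdict
from statistics import mean, median


def summarise_by_block(rows):
    """Group (block_mined, gas_price_wei) pairs by block and summarise each group."""
    first_block = min(block for block, _ in rows)
    grouped = defaultdict(list)
    for block, gas_price in rows:
        # Shift block numbers to start at zero and rescale the price by 10^8.
        grouped[block - first_block].append(gas_price / 10**8)
    return {
        block: {
            "min": min(prices),
            "median": median(prices),
            "mean": mean(prices),
            "max": max(prices),
        }
        for block, prices in sorted(grouped.items())
    }


# Hypothetical rows: three transactions in one block, one in the next.
rows = [
    (5000000, 1_000_000_000),
    (5000000, 2_000_000_000),
    (5000000, 60_000_000_000),
    (5000001, 4_000_000_000),
]
summary = summarise_by_block(rows)
print(summary)
```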
Here are some plots. By default, `geom_smooth()` fits a local regression (`loess` for short) when there are fewer than 1,000 observations, as is the case here:

*Loess regression is a common method for smoothing a volatile time series. It is a non-parametric method in which least squares regression is performed on localized subsets, which makes it a suitable candidate for smoothing any numerical vector.*
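
To make "least squares regression on localized subsets" concrete, here is a minimal sketch of a tricube-weighted local linear smoother in plain Python. It is a simplification and not what `geom_smooth()` runs: R's `loess` fits local quadratics by default, whereas this sketch fits local lines. The function name `local_linear_smooth` and the `span` default are assumptions made for illustration.

```python
def local_linear_smooth(xs, ys, span=0.75):
    """Smooth ys over xs with tricube-weighted local linear fits."""
    n = len(xs)
    k = min(n, max(3, round(span * n)))  # number of neighbours per local fit
    fitted = []
    for x0 in xs:
        # The k points closest to x0 form the localized subset.
        nearest = sorted(range(n), key=lambda i: abs(xs[i] - x0))[:k]
        d_max = max(abs(xs[i] - x0) for i in nearest) or 1.0
        # Tricube weights: 1 at x0, falling to 0 at distance d_max.
        w = [(1 - (abs(xs[i] - x0) / d_max) ** 3) ** 3 for i in nearest]
        sw = sum(w)
        mx = sum(wi * xs[i] for wi, i in zip(w, nearest)) / sw
        my = sum(wi * ys[i] for wi, i in zip(w, nearest)) / sw
        sxx = sum(wi * (xs[i] - mx) ** 2 for wi, i in zip(w, nearest))
        sxy = sum(wi * (xs[i] - mx) * (ys[i] - my) for wi, i in zip(w, nearest))
        # Weighted least squares line through the subset, evaluated at x0.
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Hypothetical series: flat at 10 with a single spike in the middle.
xs = list(range(11))
ys = [10.0] * 11
ys[5] = 100.0
print(local_linear_smooth(xs, ys))
```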
To show the stability of the median relative to the other statistics, we plot each per-block summary (`min`, `median`, `mean`, `max`) together with its smoothed curve. Note that `max` is on a much larger scale: although its curve looks stable, its errors around the curve are much larger than those of the median. See the last two plots to compare the scale of their fluctuations.
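
The intuition behind the median's stability can be checked on a toy example. The sketch below uses made-up prices (not data from the blockchain) and measures how far each statistic moves when a single overpaying transaction is added to a block.

```python
from statistics import mean, median

# Hypothetical gas prices for one block, in arbitrary units.
prices = [10, 12, 11, 13, 12]
with_outlier = prices + [500]  # one transaction overpays heavily

# How far each statistic moves once the outlier is included.
shift = {
    "median": median(with_outlier) - median(prices),
    "mean": mean(with_outlier) - mean(prices),
    "max": max(with_outlier) - max(prices),
}
print(shift)
```

In this toy case the median does not move at all, while the mean shifts by about 81 and the max by 487.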
```{r}
ggplot(tx.summary,aes(x=block_mined,y=min))+geom_smooth()+geom_point()+labs(title="min of gas prices")
```
```{r}
ggplot(tx.summary,aes(x=block_mined,y=mean))+geom_smooth()+geom_point()+labs(title="mean of gas prices")
```
```{r}
ggplot(tx.summary,aes(x=block_mined,y=max))+geom_smooth()+geom_point()+labs(title="max of gas prices")
```
```{r}
ggplot(tx.summary,aes(x=block_mined,y=median))+geom_smooth()+geom_point()+labs(title="median of gas prices")
```
Here is the max, median, and min statistics summary plot:
```{r}
ggplot(data = tx.raw %>% group_by(block_mined)) +
  stat_summary(
    mapping = aes(x = block_mined, y = gas_price),
    fun.ymin = min,
    fun.ymax = max,
    fun.y = median
  ) + labs(title = "Summary of gas_prices")
```
Here is how the mean, median, and minimum curves compare:
```{r}
ggplot(tx.summary, aes(x = block_mined)) +
  geom_smooth(aes(y = min, colour = "min")) +
  geom_smooth(aes(y = median, colour = "median")) +
  geom_smooth(aes(y = mean, colour = "mean")) +
  labs(title = "min_median_mean") +
  ylab("min_median_mean") +
  scale_colour_manual(name = "legend", values = c("blue", "red", "green"))
```
Here is how all curves compare:
```{r}
ggplot(tx.summary, aes(x = block_mined)) +
  geom_smooth(aes(y = mean, colour = "mean")) +
  geom_smooth(aes(y = median, colour = "median")) +
  geom_smooth(aes(y = min, colour = "min")) +
  geom_smooth(aes(y = max, colour = "max")) +
  labs(title = "Summary gas prices") +
  ylab("min_median_mean_max") +
  scale_colour_manual(name = "legend", values = c("blue", "red", "green", "black"))
```
<!-- Let us see how the method `gam` errors compare in each case: -->
<!-- Here you will first need to normalize each column to make this meaningful -->
<!-- ```{r} -->
<!-- gam.med <- gam(min_gas_price~block_mined,data=tx.summary) -->
<!-- gam.med$deviance -->
<!-- ``` -->