--- a/docs/src/man/categorical.md
+++ b/docs/src/man/categorical.md
@@ -45,9 +45,9 @@ cv = categorical(v)
 Or you can edit the columns of a `DataFrame` in-place using the `categorical!` function:
 
 ```julia
-dt = DataFrame(A = [1, 1, 1, 2, 2, 2],
+df = DataFrame(A = [1, 1, 1, 2, 2, 2],
                B = ["X", "X", "X", "Y", "Y", "Y"])
-categorical!(dt, [:A, :B])
+categorical!(df, [:A, :B])
 ```
 
 Using categorical arrays is important for working with the [GLM package](https://github.com/JuliaStats/GLM.jl). When fitting regression models, `CategoricalArray` columns in the input are translated into 0/1 indicator columns in the `ModelMatrix` with one column for each of the levels of the `CategoricalArray`. This allows one to analyze categorical data efficiently.
--- a/docs/src/man/getting_started.md
+++ b/docs/src/man/getting_started.md
@@ -107,59 +107,59 @@ julia> nulls(Int, 1, 3)
 The `DataFrame` type can be used to represent data tables, each column of which is a vector. You can specify the columns using keyword arguments:
 
 ```julia
-dt = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])
+df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])
 ```
 
 It is also possible to construct a `DataFrame` in stages:
 
 ```julia
-dt = DataFrame()
-dt[:A] = 1:8
-dt[:B] = ["M", "F", "F", "M", "F", "M", "M", "F"]
-dt
+df = DataFrame()
+df[:A] = 1:8
+df[:B] = ["M", "F", "F", "M", "F", "M", "M", "F"]
+df
 ```
 
 The `DataFrame` we build in this way has 8 rows and 2 columns. You can check this using `size` function:
 
 ```julia
-nrows = size(dt, 1)
-ncols = size(dt, 2)
+nrows = size(df, 1)
+ncols = size(df, 2)
 ```
 
 We can also look at small subsets of the data in a couple of different ways:
 
 ```julia
-head(dt)
-tail(dt)
+head(df)
+tail(df)
 
-dt[1:3, :]
+df[1:3, :]
 ```
 
 Having seen what some of the rows look like, we can try to summarize the entire data set using `describe`:
 
 ```julia
-describe(dt)
+describe(df)
 ```
 
 To focus our search, we start looking at just the means and medians of specific columns. In the example below, we use numeric indexing to access the columns of the `DataFrame`:
 
 ```julia
-mean(Nulls.skip(dt[1]))
-median(Nulls.skip(dt[1]))
+mean(Nulls.skip(df[1]))
+median(Nulls.skip(df[1]))
 ```
 
 We could also have used column names to access individual columns:
 
 ```julia
-mean(Nulls.skip(dt[:A]))
-median(Nulls.skip(dt[:A]))
+mean(Nulls.skip(df[:A]))
+median(Nulls.skip(df[:A]))
 ```
 
 We can also apply a function to each column of a `DataFrame` with the `colwise` function. For example:
 
 ```julia
-dt = DataFrame(A = 1:4, B = randn(4))
-colwise(c->cumsum(Nulls.skip(c)), dt)
+df = DataFrame(A = 1:4, B = randn(4))
+colwise(c->cumsum(Nulls.skip(c)), df)
 ```
 
 ## Importing and Exporting Data (I/O)
@@ -191,8 +191,8 @@ a `DataFrame` rather than the default `DataFrame`. Keyword arguments may be pass
 
 A DataFrame can be written to a CSV file at path `output` using
 ```julia
-dt = DataFrame(x = 1, y = 2)
-CSV.write(output, dt)
+df = DataFrame(x = 1, y = 2)
+CSV.write(output, df)
 ```
 
 For more information, use the REPL [help-mode](http://docs.julialang.org/en/stable/manual/interacting-with-julia/#help-mode) or checkout the online [CSV.jl documentation](https://juliadata.github.io/CSV.jl/stable/)!
--- a/docs/src/man/reshaping_and_pivoting.md
+++ b/docs/src/man/reshaping_and_pivoting.md
@@ -43,20 +43,20 @@ d = stack(iris)
 `unstack` converts from a long format to a wide format. The default is requires specifying which columns are an id variable, column variable names, and column values:
 
 ```julia
-longdt = melt(iris, [:Species, :id])
-widedt = unstack(longdt, :id, :variable, :value)
+longdf = melt(iris, [:Species, :id])
+widedf = unstack(longdf, :id, :variable, :value)
 ```
 
 If the remaining columns are unique, you can skip the id variable and use:
 
 ```julia
-widedt = unstack(longdt, :variable, :value)
+widedf = unstack(longdf, :variable, :value)
 ```
 
-`stackdt` and `meltdt` are two additional functions that work like `stack` and `melt`, but they provide a view into the original wide DataFrame. Here is an example:
+`stackdf` and `meltdf` are two additional functions that work like `stack` and `melt`, but they provide a view into the original wide DataFrame. Here is an example:
 
 ```julia
-d = stackdt(iris)
+d = stackdf(iris)
 ```
 
 This saves memory. To create the view, several AbstractVectors are defined:
@@ -73,13 +73,13 @@ This repeats the original columns N times where N is the number of columns stack
 For more details on the storage representation, see:
 
 ```julia
-dump(stackdt(iris))
+dump(stackdf(iris))
 ```
 
 None of these reshaping functions perform any aggregation. To do aggregation, use the split-apply-combine functions in combination with reshaping. Here is an example:
 
 ```julia
 d = stack(iris)
-x = by(d, [:variable, :Species], dt->DataFrame(vsum = mean(Nulls.skip(dt[:value]))))
+x = by(d, [:variable, :Species], df->DataFrame(vsum = mean(Nulls.skip(df[:value]))))