<?xml-stylesheet type="text/xsl" href="XSL/Todo.xsl" ?>
<topics
xmlns:r="http://www.r-project.org"
xmlns:omg="http://www.omegahat.org">
<topic>
<title>
</title>
<items>
<item status="complete">
global option to use double rather than float.
And use it everywhere. Connect to strict for cudaMalloc.
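<br/>
A minimal sketch of how such an option might be consulted; the option name "RCUDA.useDouble" and the helper are assumptions for illustration, not the package's actual setting.
<r:code>
# Hypothetical sketch: pick the element type from a global option.
# The option name "RCUDA.useDouble" is an assumption.
elementType = function()
   if(isTRUE(getOption("RCUDA.useDouble", FALSE))) "double" else "float"

options(RCUDA.useDouble = TRUE)
elementType()   # "double"
</r:code>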
</item>
<item status="done">
copy to device for a numeric comes back with the size of the elements being 8
and type double. Should this be 4 and float?
See tests/doubleFloat.R
</item>
<item status="done">
If GPU supports double, then don't use float.
(Change the type in cudaMalloc and add code for copying doubles.)
</item>
<item>
seg faulting when quitting R, at least on lipschitz.
<br/>
Could it be the version of gcc, R, etc.? No, it looks like a CUDA library issue.
<br/>
If we allocate memory, we don't get the segfault,
so I've added a C routine that we call in .onLoad()
to create the context and allocate a byte.
<br/>
Looks like a clean-up issue in libcuruntime.
A similar issue exists in pycuda, but with libcudarand.
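<br/>
A sketch of the kind of load hook this describes; the routine name "R_initContext" is hypothetical, the real symbol in the package may differ.
<r:code>
# Hypothetical sketch: a C routine called at load time creates the CUDA context
# and allocates a single byte so the runtime is initialized before user code runs.
# The symbol name "R_initContext" is an assumption.
.onLoad = function(libname, pkgname)
   .Call("R_initContext", PACKAGE = pkgname)
</r:code>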
</item>
<item status="done">
test synchronization.
See tests/async.R
</item>
<item>
Add a finalizer on the stream (a sketch follows below).
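<br/>
A sketch of how this could look, assuming the stream is represented as an external pointer and that a destroy binding (called cuStreamDestroy here) is available; both are assumptions.
<r:code>
# Hypothetical sketch: register a finalizer so the stream is destroyed when the
# R object is garbage collected or R exits.  cuStreamDestroy() stands in for
# whatever destroy binding the package exposes.
registerStreamFinalizer = function(stream)
   reg.finalizer(stream, function(s) cuStreamDestroy(s), onexit = TRUE)
</r:code>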
</item>
<item>
Copy asynchronously.
Does this make sense in R?
Use pinned memory allocated with cudaMallocHost().
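<br/>
A purely hypothetical sketch of the intended flow, assuming bindings such as cudaMallocHost(), cudaMemcpyAsync(), cudaStreamCreate() and cudaStreamSynchronize() exist or will be generated; none of the signatures here are confirmed.
<r:code>
# Hypothetical sketch only: every function and signature below is an assumption.
n      = 1000L
stream = cudaStreamCreate()
host   = cudaMallocHost(n * 4L)          # pinned host buffer for n floats
dev    = cudaMalloc(n)                   # device buffer
cudaMemcpyAsync(dev, host, n * 4L, cudaMemcpyHostToDevice, stream)
cudaStreamSynchronize(stream)            # wait before reusing the buffers
</r:code>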
</item>
<item status="working">
streams - queues of tasks.
Example - distances and then clustering. Get the gputools kernels working
for the hierarchical clustering.
</item>
<item status="done">
Events on streams. Do they make sense in R, i.e. single thread?
Yes.
We can use a single C routine and pass the R function as user data for the callbacks.
See tests/event.R and tests/eventSync.R.
</item>
<item>
Implement cuCtxGetStreamPriorityRange and any other such routines.
We currently just have the one for the Device, not the Ctx.
</item>
<item status="check">
cudaMalloc() etc. should return an object derived from cudaPtrWithLength
so we know the size.
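<br/>
A sketch of one way to carry that information until the class hierarchy is in place; the wrapper name, the use of attributes, and the assumption that cudaMalloc() takes a byte count here are all illustrative only.
<r:code>
# Hypothetical sketch: attach the element count and type to the pointer that
# cudaMalloc() returns.  The real fix is to return a cudaPtrWithLength subclass.
cudaMallocWithLength = function(nels, elType = "float", sizeof = 4L) {
   ptr = cudaMalloc(nels * sizeof)       # assumed to take a size in bytes
   attr(ptr, "nels") = nels
   attr(ptr, "elType") = elType
   ptr
}
</r:code>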
</item>
<item>
Problem when checking tag on a reference.
Specifically void and voidPtr.
See inst/doc/distPitch.R.
<r:code>
library(RCUDA)
m = matrix(as.numeric(1:20), 5, 4)
mem = cudaMallocPitch(ncol(m) * 4L, nrow(m))   # pitched device allocation: (pointer, pitch)
ref = convertToPtr(t(m), "float")              # host data as a float pointer
  # host -> device: (dst, dpitch, src, spitch, width, height, kind)
cudaMemcpy2D(mem[[1]], mem[[2]], ref, ncol(m)*4L, ncol(m)*4L, nrow(m), cudaMemcpyHostToDevice)
  # device -> host: destination pitch is the host row width, source pitch the device pitch
cudaMemcpy2D(ref, ncol(m)*4L, mem[[1]], mem[[2]], ncol(m)*4L, nrow(m), cudaMemcpyDeviceToHost)
</r:code>
<br/>
We can disable this test on the tag, but it would be better to get it right and consistent,
rather than special-casing or doing string comparisons on the tag.
</item>
<item status="done">
cudaMemcpy2D should coerce to a voidPtr, not a void.
Fixed in the makeCoerceArg function in the RCodeGen package.
</item>
<item status="check">
Need a mechanism to convert an R object to a pointer in cudaMemcpy2D, i.e. for the src argument.
See convertToPtr.
Checked for float.
</item>
<item>
<r:code>
AB = matrix(1:(300*299), 300, 299)
mem = cudaMallocPitch(ncol(AB) * 4L, nrow(AB))
RCUDA:::cudaMemcpy2D(mem[[1]], mem[[2]], t(AB), nrow(AB)*4L, nrow(AB)*4L, ncol(AB), RCUDA:::cudaMemcpyHostToDevice)
</r:code>
<br/>
This fails trying to coerce the matrix to a voidPtr.
Should this be a void? Either way, there is still no method to coerce an R object to void.
</item>
<item status="done">
[Check] Higher-level R-like functions for cudaMallocPitch &amp; cudaMemcpy2D.
See inst/doc/distPitch.R
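<br/>
For example, a thin wrapper assembled from the calls shown earlier in this file; this is a sketch, not the actual helper in inst/doc/distPitch.R, which may differ.
<r:code>
# Sketch of a higher-level helper built from cudaMallocPitch/cudaMemcpy2D;
# returns the (pointer, pitch) pair from cudaMallocPitch.
copyMatrixToDevice = function(m, elSize = 4L) {
   mem = cudaMallocPitch(ncol(m) * elSize, nrow(m))
   ref = convertToPtr(t(m), "float")
   cudaMemcpy2D(mem[[1]], mem[[2]], ref, ncol(m) * elSize,
                ncol(m) * elSize, nrow(m), cudaMemcpyHostToDevice)
   mem
}
</r:code>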
</item>
<item status="check">
cudaMemcpy2D should raise an error if the C routine doesn't return 0.
The C routine returns an object of class cudaError_t,
which is different from CUresult,
so we have to generate the code differently so that it knows which
error type it is getting.
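<br/>
One way the check could look from the R side, assuming both status classes are integer-backed so as.integer() recovers the error code; the helper name is hypothetical.
<r:code>
# Hypothetical sketch: raise an R error for any non-zero status, whether the
# routine returned a cudaError_t or a CUresult.
checkCUDAStatus = function(status) {
   if(as.integer(status) != 0)
      stop("CUDA call failed: ", class(status)[1], " error ", as.integer(status))
   invisible(status)
}
</r:code>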
</item>
<item>
Make the code that expects a device number consistently use as(, "CUDeviceNum") or as(, "CUdevice").
Auto-generated code such as cudaDeviceGetPCIBusId causes problems, as the declaration is an integer.
Also, cuDeviceGet() gets back the correct number but then calls as(num, "CUdevice"), which decrements
the value so it is wrong. When we get the value back from C code, we should leave it as is,
so use new("CUdevice", value) or have the C code do it.
When the device is an argument, then do the subtraction.
<item status="check">
Coercing an integer to CUdevice should subtract 1.
And when returning a CUdevice, put a class on it
so that we keep it as is when we pass it to an
R function/C routine.
Put this in the typeMap. A sketch of the coercion follows below.
</item>
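<br/>
A sketch of the intended coercion, assuming CUdevice is an integer-backed S4 class; the actual class definition in RCUDA may differ.
<r:code>
library(methods)
# Hypothetical sketch: user-facing device numbers are 1-based, CUDA's are 0-based,
# so coercing an integer subtracts 1; values coming back from C code are wrapped
# with new("CUdevice", value) and left untouched.
setClass("CUdevice", contains = "integer")
setAs("integer", "CUdevice", function(from) new("CUdevice", from - 1L))

as(1L, "CUdevice")      # device 0 on the CUDA side
new("CUdevice", 0L)     # a value returned from C code, kept as is
</r:code>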
</item>
<item status="optimize">
Subsetting and assigning to parts of a cudaPtrWithLength.
Subsetting is done for integer and logical indices.
Subset assignment is not done yet; a possible approach is sketched below.
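<br/>
A possible (naive) approach for the missing assignment method, assuming obj[] copies the full contents back to R and that some helper can overwrite the device buffer; copyToDevice(vals, x) here is an assumption, not the package's actual API.
<r:code>
library(RCUDA)
# Hypothetical sketch: round-trip subset assignment - read the contents back
# (subsetting already works), modify in R, and write the result to the device.
# copyToDevice(vals, x) is an assumed way to overwrite the existing buffer.
setMethod("[<-", "cudaPtrWithLength",
          function(x, i, j, ..., value) {
             vals = x[]
             vals[i] = value
             copyToDevice(vals, x)
             x
          })
</r:code>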
</item>
<item status="low">
Show how to use structs in PTX code and pass them from R as inputs.
</item>
<item status="low">
Allow obj[] to take a routine to copy each element to a SEXP.
Caller specifies a native symbol.
</item>
<item status="low">
cudaMalloc should allow specification of the device.
Implicit in the current context?
</item>
<item status="low">
Allow .device/.gpu argument in .gpu/.cuda function to switch
to a specific device. Probably too much overhead for common use.
</item>
<item status="done">
In cuGetContext(), check if cuCtxGetCurrent returns a NULL pointer, not a NULL object.
<br/>
Move C code and R function to RAutoGenRunTime.
Isn't this already done - isNativeNull?
</item>
<item status="check">
Make certain to clean up so we don't run out of memory across sessions.
When we quit R, we want to release resources.
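<br/>
A sketch of an unload hook for this; the routine name "R_releaseContext" is hypothetical.
<r:code>
# Hypothetical sketch: release the context and any remaining device allocations
# when the package is unloaded or R quits.  The symbol name is an assumption.
.onUnload = function(libpath)
   .Call("R_releaseContext", PACKAGE = "RCUDA")
</r:code>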
</item>
<item status="done">
Examples - perhaps taken from gputools or rgpu
but done directly from R code, not with C wrappers.
<br/>
See dist stuff in sampleKernels and Paper/
</item>
<item status="test">
Configure script.
</item>
<item status="InProgress">
Generate bindings via a TU and Clang.
<br/>
Ignore the deprecated routines. How can we tell they are deprecated in clang?
<br/>
Add default arguments for types such as the device, e.g. 1L or new("CUdevice", 0L).
</item>
<item>
Functions to manipulate a module.
Any way to find the names in a module?
There is a C++ API: http://adsm.googlecode.com/svn/trunk/libgmac/src/api/cudadrv/Module.h.
</item>
<item status="low">
Write a function for reading the profiler output in key=value form.
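<br/>
A small sketch of such a reader using only base R; the assumption is that the output has one key=value pair per line.
<r:code>
# Sketch: read profiler output written as key=value lines into a named character vector.
readProfile = function(file) {
   lines = grep("=", readLines(file), fixed = TRUE, value = TRUE)
   parts = strsplit(lines, "=", fixed = TRUE)
   vals  = sapply(parts, function(x) trimws(paste(x[-1], collapse = "=")))
   names(vals) = sapply(parts, function(x) trimws(x[1]))
   vals
}
</r:code>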
</item>
<item status="done">
Profiler
</item>
<item status="check">
Put class information on what cudaAlloc() returns so that
we know how to retrieve the result later.
Put length information on it as well.
Make the returned objects RC++Reference instances, not just external pointers.
</item>
<item status="done">
Find out why cubin files cannot be loaded.
<br/>
We need to get the nvcc flags to generate code for the correct device.
</item>
</items>
</topic>
</topics>