<!DOCTYPE html>
<html>
<meta charset="utf-8">
<title>W3C Data Platform</title>
<link rel="stylesheet" type="text/css" href="style/base.css" />
<h1>W3C Data Platform</h1>
<section>
<h2 id="introduction">Introduction</h2>
<p>This page is part of a W3C project to use a data-centric approach to redesigning its site and services. If you have comments, please write to <a href="http://lists.w3.org/Archives/Public/site-comments/">site comments</a>. Note that messages sent there are publicly archived.</p>
<p>For particulars about the design of the API, and the implementation of a prototype for search, see data platform design.</p>
</section>
<section>
<h2 id="people">People</h2>
<p>Antonio, Denis, Jean-Gui, Ted, Ian, Daniel Davis</p>
<h2 id="scope">Scope</h2>
<p>At the current time:</p>
<ul>
<li>Access to data about W3C specifications, groups, and events.
<ul>
<li>Out of scope for the first version: data about mail
</ul>
<li>Still in scope but lower priority: individuals, organizations.
<li>JSON representation of data
</ul>
<h2 id="timeline">Timeline</h2>
<p>Depends on:</p>
<ul>
<li>Whether we can leverage some modules/extensions
<li>Changes to CG site based on beta
</ul>
<h3 id="estimate">Rough estimate</h3>
<p>Timeline:</p>
<ul>
<li>August, September 2014: Research into what we can use
<li>September 2014: Work starts in earnest
<li>December 2014: Crisp definition of APIs; templates for lists and metadata pages available
<li>February 2015: APIs available publicly
</ul>
<h2 id="requirements">Requirements</h2>
<h3 id="project_requirements">Project Requirements</h3>
<ul>
<li>The APIs must be developed in public.
<li>The APIs must be publicly documented.
</ul>
<h3 id="design_requirements">Design Requirements</h3>
<ul>
<li>Client-side functionality in key index pages:
<ul>
<li>For specifications: Filtering by group, title, tag
<li>For groups: Filtering by title, tag
<li>For events: Filtering by title, tag
</ul>
<li>APIs return JSON
<ul>
<li>To be formatted on the client side
<li>In the future there may be APIs for getting HTML-formatted information (e.g., for inclusion on vertical pages)
</ul>
<li>APIs must be able to take into account the user's credentials.
<ul>
<li>For example, for a given API we might provide Member-only additional information if the user is logged in with an appropriate account.
</ul>
<li>Filter/sort states on the client should be URI addressable
<ul>
<li>We will need general purpose query strings
<li>We may also want some friendlier URIs for common queries.
</ul>
<li>There should be an "all" view of each kind of index (specs, groups, etc.) that does not rely on JavaScript.
</ul>
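<p>To illustrate the "URI addressable" requirement above, here is a minimal sketch of round-tripping a client filter/sort state through a general-purpose query string. The parameter names (<code>group</code>, <code>tag</code>, <code>sort</code>, <code>order</code>) and the tag values are assumptions made for illustration only; no query vocabulary has been decided. It is written in Python for brevity, although the real client would be JavaScript.</p>

```python
from urllib.parse import urlencode, parse_qs


def state_to_query(state):
    """Serialize a filter/sort state dict into a stable query string."""
    pairs = []
    # Sorting the keys means the same state always yields the same URI,
    # which also helps any cache that sits in front of the API.
    for key in sorted(state):
        value = state[key]
        values = value if isinstance(value, list) else [value]
        for v in values:
            pairs.append((key, v))
    return urlencode(pairs)


def query_to_state(query):
    """Parse a query string back into a state dict.

    Repeated keys become lists; a key that appears once becomes a string.
    """
    parsed = parse_qs(query)
    return {k: (v if len(v) > 1 else v[0]) for k, v in parsed.items()}


# Hypothetical state: specs from the DAP group with two tags, newest first.
state = {
    "group": "dap",
    "tag": ["automotive", "geolocation"],
    "sort": "date",
    "order": "desc",
}
query = state_to_query(state)
print(query)
print(query_to_state(query) == state)
```

<p>A friendlier URI for a common query could then simply be an alias that expands to one of these query strings.</p>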
<h4>Questions</h4>
<ul>
<li>How will we identify individual specifications? Related specifications (e.g., short name)?
<li>How will we identify individual groups? Activities? Domains?
</ul>
<h3 id="server_requirements">Server Requirements</h3>
<ul>
<li>Our data servers must be extremely high performance (e.g., with strong caching in front).
<ul>
<li>Some caching in Symfony, some in a layer between Symfony and the public (e.g., Varnish)
</ul>
<li>We need to consider unwanted bot traffic as we will have a larger number of URIs advertised.
</ul>
<h4>Questions</h4>
<ul>
<li>Our current framework sits on top of several databases with inconsistent field-naming conventions and other quirks that, if exposed directly, could confuse or frustrate developers. There are extensions to our framework that could greatly facilitate creating this API, but they would not give the best public-facing mapping. How much does this need to be cleaned up to make the API more intuitive?
<li>W3C logic for the publication workflow, group formation, etc. would make it challenging to permit modification through the Web API (e.g., POST of revised or new JSON), even with authentication. What would it be desirable to allow writes to? Individuals' information is a candidate, with the exception of organization affiliation, which requires verification.
<li>Should the API URI scheme leverage our current convention for groups, users, and specs (not visible to the public at present) or establish a new scheme? It would be nice to use the existing convention for discoverability and consistency, but a new URI space could be beneficial for caching, scalability, ACLs, and other considerations.
<li>Do we want to be our own CDN?
</ul>
<h3 id="privacy_requirements">Privacy requirements</h3>
<ul>
<li>Will address when we get to individuals and members.
</ul>
<h2 id="Use_Cases">Use Cases</h2>
<h3 id="specifications">Specifications</h3>
<ul>
<li>A developer wants to write a new app based on our data and requires all the data about all specifications.
<ul>
<li>Includes all Technical Reports of any status and all registered CG and BG reports.
</ul>
<li>TR page views
<ul>
<li>Each view will be a list of Limited_Specification_Data restricted to WG/IG publications.
<li>Each view will allow:
<ul>
<li>Sorting by date (ascending/descending), title (ascending/descending)
<li>Filtering by group, title, tag
</ul>
<li>Views
<ul>
<li>Fifty most recently published specifications, and the ability to request the next 50.
<li>All publications within a date range (e.g., for AC meeting fact sheet)
<li>The most recent draft of every technical report.
</ul>
</ul>
<li>Group wishes to create custom display of information about its publications
<ul>
<li>Specifications published by a list of 1 or more identified groups, all data.
<li>Specifications published by a list of 1 or more identified groups, Limited_Specification_Data
</ul>
<li>Per specification
<ul>
<li>All data for an identified specification; this will be used to construct the "spec view" for each spec in the TR page.
</ul>
<li>Title query
<ul>
<li>Limited_Specification_Data for all specifications whose title at least partially matches (case-insensitive) a string.
</ul>
<li>Tag query
<ul>
<li>Limited_Specification_Data for all specifications whose tags correspond exactly to an input list of tags. We will use this in the "major topic" views (e.g., show me specs of particular interest to the automotive industry).
</ul>
</ul>
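<p>The "fifty most recent, then the next 50" and date-range views above can be sketched as follows. This is only an illustration under stated assumptions: the record shape (a dict with <code>title</code> and <code>date</code>) and the zero-based <code>page</code> parameter are hypothetical, and the sample data is generated.</p>

```python
from datetime import date, timedelta


def recent_page(specs, page=0, page_size=50):
    """Return one page of specs, most recently published first."""
    ordered = sorted(specs, key=lambda s: s["date"], reverse=True)
    start = page * page_size
    return ordered[start:start + page_size]


def in_date_range(specs, start, end):
    """Return specs published between start and end, inclusive."""
    return [s for s in specs if start <= s["date"] <= end]


# Hypothetical sample data: 120 specs published on consecutive days.
specs = [
    {"title": f"Spec {i}", "date": date(2014, 1, 1) + timedelta(days=i)}
    for i in range(120)
]

first = recent_page(specs, page=0)   # the fifty most recent
third = recent_page(specs, page=2)   # the last, partial page
early = in_date_range(specs, date(2014, 1, 1), date(2014, 1, 10))
print(len(first), len(third), len(early))
```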
<h4>Notes</h4>
<ul>
<li>We are only planning limited query capabilities. For custom queries, use a series of individual queries.
</ul>
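<p>As a sketch of what "a series of individual queries" might look like in practice, the following combines a title query and a tag query on the client by intersecting their results. The record fields, the <code>shortname</code> identifier, and the sample specifications are assumptions. "Correspond exactly" is read here as "the tag sets are equal", which is one possible interpretation.</p>

```python
def title_query(specs, text):
    """Case-insensitive partial match on the title."""
    needle = text.casefold()
    return [s for s in specs if needle in s["title"].casefold()]


def tag_query(specs, tags):
    """Specs whose tag set equals the input tag set (assumed reading)."""
    wanted = set(tags)
    return [s for s in specs if set(s["tags"]) == wanted]


def combine(*results):
    """Intersect result lists by shortname, keeping the first list's order."""
    if not results:
        return []
    common = set.intersection(*({s["shortname"] for s in r} for r in results))
    return [s for s in results[0] if s["shortname"] in common]


# Hypothetical sample records.
specs = [
    {"shortname": "discovery-api", "title": "Network Service Discovery", "tags": ["network"]},
    {"shortname": "vehicle-api", "title": "Vehicle Information API", "tags": ["automotive"]},
    {"shortname": "vehicle-data", "title": "Vehicle Data", "tags": ["automotive", "data"]},
]

by_title = title_query(specs, "VEHICLE")
by_tag = tag_query(specs, ["automotive"])
both = combine(by_title, by_tag)
print([s["shortname"] for s in both])
```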
<h3 id="groups">Groups</h3>
<ul>
<li>A developer wants to write a new app based on our data and requires all the data about all groups.
<ul>
<li>Includes WGs, IGs, CGs, BGs, Coordination Groups, TAG, and AB
</ul>
<li>List of Groups page
<ul>
<li>A list of Limited_Group_Data
<li>Allow:
<ul>
<li>Sorting by title (ascending/descending)
<li>Filtering by domain, title, tag
</ul>
</ul>
<li>Per group
<ul>
<li>All data for an identified group; this will be used to construct the "group view" for each group in the list
</ul>
<li>Title query
<ul>
<li>Limited_Group_Data for all groups whose title at least partially matches (case-insensitive) a string.
</ul>
<li>Tag query
<ul>
<li>Limited_Group_Data for all groups whose tags correspond exactly to an input list of tags. We will use this in the "major topic" views (e.g., show me groups of particular interest to the automotive industry).
</ul>
</ul>
<h3 id="individuals">Individuals</h3>
<ul>
<li>Group participation
<li>Roles (e.g., editor of which spec)
</ul>
<h3 id="organizations">Organizations</h3>
<ul>
<li>@@
</ul>
<h3 id="events">Events</h3>
<p>Note: "Event" here refers to anything with a date/time stamp. We plan to integrate meeting information, publication information, review end dates, etc. into a unified W3C calendar.</p>
<ul>
<li>A developer wants to write a new app based on our data and requires all the data about all events.
<ul>
<li>Includes meetings and other events, publications, spec/charter review start and end dates, election deadlines, registration deadlines.
</ul>
<li>W3C calendar
<ul>
<li>A list of Limited_Event_Data
</ul>
<li>Title query
<ul>
<li>Limited_Event_Data for all events whose title at least partially matches (case-insensitive) a string.
</ul>
<li>Tag query
<ul>
<li>Limited_Event_Data for all events whose tags correspond exactly to an input list of tags. We will use this in the "major topic" views (e.g., show me events of particular interest to the automotive industry).
</ul>
</ul>
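<p>One way to picture the unified calendar is as a merge of several date-ordered feeds into a single list. The sketch below assumes each source is already sorted by start date; the field names (<code>title</code>, <code>type</code>, <code>start</code>) and all sample events are hypothetical.</p>

```python
import heapq
from datetime import date


def unified_calendar(*sources):
    """Merge already-sorted event lists into one list ordered by start date."""
    return list(heapq.merge(*sources, key=lambda e: e["start"]))


# Hypothetical feeds, each sorted by start date.
meetings = [
    {"title": "Example WG face-to-face", "type": "meeting", "start": date(2014, 10, 27)},
]
publications = [
    {"title": "Example spec, Working Draft", "type": "publication", "start": date(2014, 9, 9)},
    {"title": "Example spec, Candidate Recommendation", "type": "publication", "start": date(2014, 11, 4)},
]
deadlines = [
    {"title": "Example charter review ends", "type": "review-end", "start": date(2014, 10, 1)},
]

calendar = unified_calendar(meetings, publications, deadlines)
for event in calendar:
    print(event["start"].isoformat(), event["type"], event["title"])
```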
<h2 id="apis">APIs</h2>
<h3 id="api-specs">Specifications</h3>
<h3 id="api-groups">Groups</h3>
<h3 id="api-events">Events</h3>
<h2 id="data">Data</h2>
<h3 id="specs_data">Specifications</h3>
<ul>
<li>@IJ likely to rewrite this as a table with different profiles, including "all" and "limited"
</ul>
<h4>Limited Specification Data</h4>
<ul>
<li>Title
<li>Link to latest WD
<li>Publication date
<li>Status
<li>Review end date (if applicable)
<li>Identifier(s) of responsible group(s)
<li>Specification tags
<li>URI to get full data
</ul>
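<p>To make the field list above concrete, here is a minimal sketch of how one Limited Specification Data record might be represented and serialized as JSON. The property names, the <code>example.org</code> link, and the sample values are assumptions; the eventual JSON vocabulary is not defined on this page.</p>

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List, Optional


@dataclass
class LimitedSpecificationData:
    title: str
    latest_uri: str                 # link to the latest draft
    publication_date: str           # ISO 8601 date
    status: str
    full_data_uri: str              # where to fetch the full record
    review_end_date: Optional[str] = None   # only when applicable
    groups: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)

    def to_json(self):
        """Serialize to JSON, omitting fields that do not apply."""
        data = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(data, sort_keys=True)


# Hypothetical record; URIs and values are placeholders.
spec = LimitedSpecificationData(
    title="Network Service Discovery",
    latest_uri="https://example.org/TR/discovery-api/",
    publication_date="2014-02-20",
    status="WD",
    full_data_uri="/specs/discovery-api",
    groups=["dap"],
    tags=["network"],
)
payload = json.loads(spec.to_json())
print(spec.to_json())
```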
<h4>Full Public Specification Data</h4>
<h3 id="spec_groups">Groups</h3>
<h4>Limited Group Data</h4>
<ul>
<li>Title
<li>URI of group home
<li>Group type
<li>Identifier of Domain
<li>Group tags
<li>URI to get full data
</ul>
<h4>Full Public Group Data</h4>
<h4>Additional Member Data</h4>
<h3 id="spec_events">Events</h3>
<h4>Limited Event Data</h4>
<ul>
<li>Title
<li>URI of event announcement (e.g., home page or news item)
<li>Event start/end dates
<li>Event location (may be empty)
<li>Event type
<li>Event tags
<li>URI to get full data
</ul>
<h4>Full Public Event Data</h4>
<h4>Additional Member Data</h4>
<h2 id="performance">Notes on Performance</h2>
<ul>
<li>One idea is to create cached copies of the data, since it does not change that frequently. These copies can be included server-side (or accessed by URL).
</ul>
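<p>The cached-copy idea can be sketched as a small wrapper that reloads the data only once the copy is older than a chosen maximum age. The class name, the one-hour lifetime, and the loader are all hypothetical; in practice an HTTP cache in front of the servers might do this job instead.</p>

```python
import time


class CachedCopy:
    """Keep a copy of slowly changing data and refresh it when it is stale."""

    def __init__(self, loader, max_age_seconds, clock=time.monotonic):
        self._loader = loader
        self._max_age = max_age_seconds
        self._clock = clock          # injectable so the sketch can be tested
        self._value = None
        self._loaded_at = None

    def get(self):
        now = self._clock()
        stale = self._loaded_at is None or now - self._loaded_at >= self._max_age
        if stale:
            self._value = self._loader()
            self._loaded_at = now
        return self._value


# Hypothetical loader standing in for a database query.
calls = []

def load_groups():
    calls.append("load")
    return {"groups": ["dap", "dpub"]}

fake_now = [0.0]
cache = CachedCopy(load_groups, max_age_seconds=3600, clock=lambda: fake_now[0])

cache.get()
cache.get()                  # served from the cached copy
loads_while_fresh = len(calls)

fake_now[0] = 3600.0         # one hour later the copy is stale
cache.get()
loads_after_expiry = len(calls)
print(loads_while_fresh, loads_after_expiry)
```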
<h2 id="without_js">Notes on Pages without JavaScript</h2>
<ul>
<li>When JS is turned off, we still want to provide access to all the data, which means serving an HTML file. One technique for this is to use something like PhantomJS to format the all.html page, which gets updated on a regular basis. This would allow us to reuse the same templates for "view some" as for "view all." PLH likes this approach but wants to be sure that this technique will still be available in a decade.
</ul>
<h2 id="feeds">Feeds/Notifications</h2>
<ul>
<li>Twitter?
<li>Atom?
</ul>
<h2 id="Daniel_Davis_URI_notes">Daniel Davis URI notes</h2>
<p>(Based on guidelines from "Web API Design" by Brian Mulloy, <a href="http://daniemon.com/blog/web-api-design-a-definitive-guide/">summarised here</a>.)</p>
<p>The structure of API URLs would be:</p>
<pre>
/collection/item
</pre>
<p>To get all children within a parent resource, the resources would be listed hierarchically:</p>
<pre>
/collection/item/sub-collection
</pre>
<p>For filtering and queries beyond the base resources, we can use "?" parameters:</p>
<pre>
/collection/item?filter=value
</pre>
<p>This gives us a couple of benefits:</p>
<ul>
<li>Consistency and predictability (which should make it future-proof)
<li>Uses existing URL strings for specs, groups, etc.
</ul>
<h3 id="examples">Examples</h3>
<pre>
/groups - data for all groups
/groups/dpub - data for DigiPub group
/domains/ink - data for INK domain
</pre>
<pre>
/domains/ink/groups - data for all groups within INK domain
</pre>
<pre>
/domains/ink/groups?type=ig - data for all IGs within the INK domain
/specs/discovery-api - data for the Network Service Discovery API
/groups/dap/specs?status=wd - data for all WD specs in DAP WG
</pre>
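<p>As a rough check that the examples above fit the <code>/collection/item/sub-collection?filter=value</code> pattern, the following sketch splits an API URI into those parts. The function name and the returned keys are hypothetical, and rejecting deeper paths is an assumption of this sketch, not a decision recorded here.</p>

```python
from urllib.parse import urlsplit, parse_qsl


def parse_api_uri(uri):
    """Split an API URI into collection, item, sub-collection, and filters."""
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    if not segments or len(segments) > 3:
        raise ValueError("expected /collection[/item[/sub-collection]]")
    # Pad with None so that missing trailing segments are explicit.
    padded = segments + [None] * (3 - len(segments))
    return {
        "collection": padded[0],
        "item": padded[1],
        "sub_collection": padded[2],
        "filters": dict(parse_qsl(parts.query)),
    }


all_groups = parse_api_uri("/groups")
wd_specs = parse_api_uri("/groups/dap/specs?status=wd")
print(all_groups)
print(wd_specs)
```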
<h2>Discussion</h2>
<ul>
<li><a href="https://www.w3.org/2014/09/09-comm-minutes.html">9 September 2014 minutes</a>
</ul>
<h2>Resources</h2>
<ul>
<li><a href="https://www.w3.org/wiki/SpecProd/Restyle">Tantek/Elika work on spec styles</a>
</ul>
</section>
</html>