
## 4.3 Establishing a Baseline: Safe Optimizations

> These are the first two stages of the Minerva design flow.
### 4.3.1 Training Space Exploration

Searching the training space brings training into the hardware design loop. Being able to intentionally train for certain network properties offers more optimization opportunities, but it also makes it more difficult for a human to reason about parameter settings and inter-parameter relationships.
The search covers, for example, the number of neurons per layer and the datatypes, chosen to both maximize the model's accuracy and minimize its energy consumption.
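
As a rough illustration of what such a joint sweep might look like, the sketch below enumerates layer widths and weight datatype widths and records an (accuracy, energy) point per configuration. The `train_and_evaluate` helper is a hypothetical stand-in (not Minerva's tooling); it is stubbed with a toy analytic model only so the loop runs.

```python
import itertools

# Hypothetical stand-in for a real training + energy-estimation run
# (assumption: not Minerva's actual tooling). The toy model only exists
# to make the sweep runnable: wider layers and wider datatypes cost more
# energy and give diminishing accuracy returns.
def train_and_evaluate(hidden_nodes, weight_bits):
    accuracy = 0.90 + 0.08 * (1 - 1 / (hidden_nodes / 64)) + 0.01 * (weight_bits / 16)
    energy_nj = hidden_nodes * weight_bits * 0.05   # toy energy proxy
    return accuracy, energy_nj

# Training-space sweep: network width x datatype width.
results = []
for hidden_nodes, weight_bits in itertools.product([64, 128, 256, 512], [8, 12, 16]):
    acc, energy = train_and_evaluate(hidden_nodes, weight_bits)
    results.append({"hidden_nodes": hidden_nodes,
                    "weight_bits": weight_bits,
                    "accuracy": acc,
                    "energy_nj": energy})

for r in sorted(results, key=lambda r: r["energy_nj"]):
    print(r)
```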
- Hyperparameter space exploration:
  - number of hidden layers
  - number of nodes per layer
  - L1/L2 weight regularization penalties
  - *dropout rate*

**Pareto frontier**: the set of trade-off points that cannot be improved in one objective without giving something up in another. Intuitively, for a fixed group of people and a fixed pool of resources, a distribution is *Pareto-optimal* if no change to the distribution rule can make someone better off without making at least one other person worse off; the set of all such distributions is the *Pareto frontier*. Here, the objectives are the model's accuracy and its energy consumption.
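
A short sketch of how the Pareto frontier of such (accuracy, energy) points can be extracted. The dominance test is just the standard definition (higher accuracy is better, lower energy is better), not code from the Minerva paper, and the example points are made up.

```python
def pareto_frontier(points):
    """Keep the points not dominated by any other point.

    Each point is (accuracy, energy): higher accuracy is better,
    lower energy is better.
    """
    def dominates(a, b):
        # a dominates b if a is no worse in both objectives and
        # strictly better in at least one.
        return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

# Example: made-up (accuracy, energy in nJ) pairs from a hypothetical sweep.
points = [(0.97, 120.0), (0.96, 80.0), (0.97, 150.0), (0.92, 40.0), (0.95, 90.0)]
print(pareto_frontier(points))   # -> [(0.97, 120.0), (0.96, 80.0), (0.92, 40.0)]
```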
### 4.3.2 Accelerator Design Space

For the network selected and trained in stage 1, this stage searches the accelerator's microarchitectural design space for high-performance designs.
Minerva uses Aladdin for this search and ultimately yields a power-performance Pareto frontier. Overall, Minerva reduces power by around an order of magnitude.
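
In spirit, the stage-2 sweep enumerates microarchitectural knobs and keeps only the power-performance Pareto-optimal points. The sketch below uses a toy analytic cycle/power model invented for illustration; it is not Aladdin's pre-RTL model, and the knobs and constants are placeholder assumptions.

```python
import itertools

# Assumed illustrative models, not Aladdin's: more parallel MAC lanes reduce
# cycles until SRAM bandwidth starves them, and each lane/bank adds power.
def estimate_design(lanes, sram_banks, macs=1_000_000, words_per_bank_per_cycle=1):
    bandwidth_limit = sram_banks * words_per_bank_per_cycle   # words/cycle
    effective_lanes = min(lanes, bandwidth_limit)              # lanes starve past this
    cycles = macs / effective_lanes
    power_mw = 0.5 * lanes + 0.3 * sram_banks                  # toy power model
    return cycles, power_mw

designs = []
for lanes, banks in itertools.product([4, 8, 16, 32], [2, 4, 8, 16, 32]):
    cycles, power = estimate_design(lanes, banks)
    designs.append((lanes, banks, cycles, power))

# Keep only power-performance Pareto-optimal designs (lower is better for both).
pareto = [d for d in designs
          if not any(o[2] <= d[2] and o[3] <= d[3] and (o[2] < d[2] or o[3] < d[3])
                     for o in designs)]
for lanes, banks, cycles, power in sorted(pareto, key=lambda d: d[3]):
    print(f"lanes={lanes:2d} banks={banks:2d} cycles={cycles:>9.0f} power={power:.1f} mW")
```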
An accelerator (not a CPU/GPU) is used as the design baseline.
Aladdin architecture:

*DSE*: design space exploration.
Because the neural network kernel is embarrassingly parallel within a single layer, the bottleneck to performance and energy scaling quickly becomes memory bandwidth. To supply that bandwidth, the SRAM is partitioned heavily into smaller banks. Once the minimum SRAM design granularity is reached, however, additional partitioning becomes wasteful: it imposes a heavy area penalty on excessively parallel designs and provides little significant energy improvement, as seen in the most parallel designs.
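
A small model of that trade-off, with all constants assumed for illustration (they are not values from the paper): bandwidth grows with the bank count, but once the requested bank size falls below an assumed minimum macro size, extra banks only replicate peripheral area and over-provision capacity.

```python
# Assumed illustrative parameters (not from the Minerva/Aladdin papers).
TOTAL_KB = 64             # total SRAM capacity to be banked
MIN_BANK_KB = 2           # smallest macro the SRAM compiler can generate
PERIPHERY_AREA = 0.010    # fixed area overhead per bank (mm^2), assumed
CELL_AREA_PER_KB = 0.004  # storage-cell area per KB (mm^2), assumed

def banked_sram(num_banks):
    requested_kb = TOTAL_KB / num_banks
    bank_kb = max(requested_kb, MIN_BANK_KB)   # cannot go below min granularity
    area = num_banks * (PERIPHERY_AREA + bank_kb * CELL_AREA_PER_KB)
    bandwidth = num_banks                      # one word per bank per cycle
    return bank_kb, area, bandwidth

for banks in [1, 2, 4, 8, 16, 32, 64, 128]:
    bank_kb, area, bw = banked_sram(banks)
    wasteful = banks * bank_kb > TOTAL_KB      # capacity now over-provisioned
    print(f"banks={banks:3d} bank={bank_kb:5.1f}KB area={area:.3f}mm^2 "
          f"bw={bw:3d} words/cycle wasteful={wasteful}")
```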