![five Stages of Minerva](./five_Stages_of_Minerva.png)
## 4.3 Establishing a Baseline: Safe Optimizations
> first two stages in Minerva
### 4.3.1 Training Space Exploration
Searching the training space brings training into the hardware design loop: being able to train intentionally for certain network properties offers more optimization opportunities, but it is more difficult for a human to reason about parameter settings and inter-parameter relationships.

e.g. choosing the number of neurons per layer and the datatypes so as to both maximize the model's accuracy and minimize its energy consumption.

- Hyperparameter space exploration (a sweep sketch follows this list):
  - number of hidden layers
  - number of nodes per layer
  - L1/L2 weight regularization penalties
  - *dropout rate*
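A minimal sketch of such a sweep, assuming a hypothetical `train_and_evaluate(config)` helper that stands in for the real training run plus an energy model; the grid values below are illustrative, not Minerva's actual settings:

```python
import itertools

# Illustrative hyperparameter grid (made-up values, not Minerva's).
grid = {
    "hidden_layers":   [1, 2, 3],
    "nodes_per_layer": [64, 128, 256],
    "l2_penalty":      [0.0, 1e-4, 1e-3],
    "dropout_rate":    [0.0, 0.25, 0.5],
}

def sweep(train_and_evaluate):
    """Train one network per grid point and record (config, error, energy).

    `train_and_evaluate(config) -> (val_error, energy_estimate)` is a
    hypothetical helper standing in for real training and energy estimation.
    """
    results = []
    for values in itertools.product(*grid.values()):
        config = dict(zip(grid.keys(), values))
        val_error, energy = train_and_evaluate(config)
        results.append((config, val_error, energy))
    return results
```

Each grid point yields an (error, energy) pair; these are the raw points from which the Pareto frontier described next is drawn.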
**Pareto frontier**: the set of trade-off points that cannot be improved upon. For a fixed group of people and a fixed pool of resources to distribute, a distribution is *Pareto optimal* if no other distribution makes at least one person better off without making anyone else worse off; the set of all such non-dominated distributions is the *Pareto frontier*. Here the objectives are accuracy and energy: a design is on the frontier if no other design improves one without worsening the other.
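Concretely, with two objectives to minimize (e.g. the error/energy pairs collected by the sweep above), a point is on the frontier when no other point is at least as good in both objectives and strictly better in one. A small sketch:

```python
def pareto_frontier(points):
    """Return the non-dominated points from a list of (error, energy) tuples.

    A point is dominated if some other point has error <= and energy <=,
    with at least one strict inequality; both objectives are minimized.
    """
    frontier = []
    for p in points:
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)
        if not dominated:
            frontier.append(p)
    return frontier

# (error, energy) pairs: only the non-dominated ones survive.
print(pareto_frontier([(0.02, 9.0), (0.03, 5.0), (0.05, 4.0), (0.04, 6.0)]))
# -> [(0.02, 9.0), (0.03, 5.0), (0.05, 4.0)]; (0.04, 6.0) is dominated by (0.03, 5.0)
```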
### 4.3.2 Accelerator Design Space
For the network selected and trained in stage 1, this stage searches the accelerator microarchitectural design space for high-performance designs.

Minerva uses Aladdin and ultimately yields a power-performance Pareto frontier; Minerva reduces power by around an order of magnitude.

Use an accelerator as the design baseline (not a CPU/GPU).

Aladdin architecture:

![Aladdin architecture and pareto frontier and design opt](./aladdin_arch_design_tradeoffs.png)
*DSE*: design space exploration.

Because the neural network kernel is embarrassingly parallel within a single layer, the bottleneck to performance and energy scaling quickly becomes memory bandwidth. To supply that bandwidth, the SRAM is partitioned heavily into smaller banks. Once the minimum SRAM design granularity is reached, additional partitioning becomes wasteful: it imposes a heavy area penalty on excessively parallel designs while providing little significant energy improvement, as seen in the most parallel designs.
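A back-of-the-envelope illustration of that trade-off; the per-bank port width, minimum bank size, and per-bank overhead below are made-up numbers, not Aladdin's parameters:

```python
# Made-up parameters for illustration only.
WORDS_PER_BANK_PER_CYCLE = 1       # each SRAM bank supplies one operand per cycle
MIN_BANK_KB = 2                    # assumed minimum sensible bank size
PER_BANK_OVERHEAD_KB_EQUIV = 0.5   # decoder/sense-amp cost, in kB-equivalent area

def banking_cost(parallel_macs, weight_sram_kb):
    """Banks needed to feed `parallel_macs` MAC units per cycle, plus the
    area overhead once partitioning passes the minimum granularity."""
    banks = parallel_macs // WORDS_PER_BANK_PER_CYCLE
    bank_kb = weight_sram_kb / banks
    overhead_kb = banks * PER_BANK_OVERHEAD_KB_EQUIV
    wasteful = bank_kb < MIN_BANK_KB   # banks smaller than the minimum granularity
    return banks, bank_kb, overhead_kb, wasteful

for macs in (8, 32, 128, 512):
    print(macs, banking_cost(macs, weight_sram_kb=256))
```

Past the point where banks shrink below the minimum granularity, each extra bank adds decoder and sense-amp area without adding usable capacity or saving much energy.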
![Aladdin microarchitecture](./aladdin_microarch.png)
*the red wires and boxes denote additional logic needed to accommodate the Minerva optimizations*

- additional safe optimization tricks (a weight-reuse sketch follows this list):
  - input batching for increased locality
  - different architectures
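A sketch of why input batching increases locality: instead of one matrix-vector product per input, a layer can process a batch as a single matrix-matrix product, so each weight fetched from SRAM is reused across the whole batch (the layer sizes here are arbitrary):

```python
import numpy as np

def layer_unbatched(W, inputs):
    # One matrix-vector product per input: W is re-read for every input.
    return [W @ x for x in inputs]

def layer_batched(W, inputs):
    # One matrix-matrix product: each weight is fetched once and reused
    # across the whole batch of inputs.
    X = np.stack(inputs, axis=1)   # (in_dim, batch)
    return W @ X                   # (out_dim, batch)

W = np.random.randn(256, 784)                       # arbitrary 784 -> 256 layer
inputs = [np.random.randn(784) for _ in range(32)]  # batch of 32 inputs
out = layer_batched(W, inputs)
assert np.allclose(np.stack(layer_unbatched(W, inputs), axis=1), out)
```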

## 4.4 Low-Power Neural Network Accelerators: Unsafe Optimizations
### 4.4.1 Data Type Quantization