2018 03 07
- Documentation
- Performance tuning
- https://github.com/PaddlePaddle/Paddle/pull/8673#pullrequestreview-100614721
- https://github.com/PaddlePaddle/Paddle/pull/8690#pullrequestreview-100617886
- https://github.com/PaddlePaddle/Paddle/issues/8638#issuecomment-371375683
- https://github.com/PaddlePaddle/Paddle/issues/8729#issuecomment-371374110
- CSP
- Adding more unit tests for ChannelHolder class https://github.com/PaddlePaddle/Paddle/pull/8668
- Redesign channel implementation for Select Op https://github.com/PaddlePaddle/Paddle/pull/8814
- [WIP] Implement non blocking CanSend and CanReceive in Channels for Select OP https://github.com/PaddlePaddle/Paddle/issues/8815
- Issues Opened to Support Select
- Fluid Channel should match Go Channel in Semantics https://github.com/PaddlePaddle/Paddle/issues/8813
- Expose methods to create and add QueueMessage to Send or Receive Queue https://github.com/PaddlePaddle/Paddle/issues/8863
- Add ability to provide callbacks in QueueMessage Notify https://github.com/PaddlePaddle/Paddle/issues/8864
- Update Authors.md https://github.com/PaddlePaddle/Paddle/pull/8692
- PR Reviews
- https://github.com/PaddlePaddle/Paddle/pull/8666#pullrequestreview-100283573
- https://github.com/PaddlePaddle/Paddle/pull/8848#pullrequestreview-102045995
- https://github.com/PaddlePaddle/Paddle/pull/8844#pullrequestreview-102043939
- https://github.com/PaddlePaddle/Paddle/pull/8801#pullrequestreview-101624463
- https://github.com/PaddlePaddle/Paddle/pull/8760#pullrequestreview-101309898
- https://github.com/PaddlePaddle/Paddle/pull/8800#pullrequestreview-101625047
- Paddle
- Fix Documentation and API links on paddlepaddle.org (https://github.com/PaddlePaddle/Paddle/pull/8760)
- CSP
- Select Operator, Select Operator Unit tests (https://github.com/PaddlePaddle/Paddle/pull/8867)
- VisualDL
- Fix error when scalar data empty https://github.com/PaddlePaddle/VisualDL/pull/298
- Fix issue where download data link points to empty run https://github.com/PaddlePaddle/VisualDL/pull/294
- Reorganize css and fix font style and size https://github.com/PaddlePaddle/VisualDL/pull/287
- Improve distribution GPU fluid
- ISSUE: https://github.com/PaddlePaddle/Paddle/issues/8638
- Locate the problem: https://github.com/PaddlePaddle/Paddle/pull/8762
- Communication with BRPC: https://github.com/brpc/brpc/issues/255
- Unit test of package request: https://github.com/typhoonzero/grpc_zerocopy_async_example/pull/1
- PaddleCloud
- Document:
- Review:
- Inference Engine
- To-do lists for float16 support: https://github.com/PaddlePaddle/Paddle/issues/8693
- Add float16 GEMM GPU function in math_function: https://github.com/PaddlePaddle/Paddle/pull/8695
- Add float16 support for Mul Op: https://github.com/PaddlePaddle/Paddle/pull/8817
- Fix data_type_transform failure: https://github.com/PaddlePaddle/Paddle/pull/8850
- Review:
Memory optimization on Fluid:
| Model | no optimize | reuse memory | release memory | forward memory |
|---|---|---|---|---|
| Resnet | 170590208 | 92995584 (reduce 45.5%) | 78004224 (reduce 54.3%) | 77488128 |
- https://github.com/PaddlePaddle/Paddle/pull/8690
- Try to combine scheduler and memory optimization together
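
The reduction percentages quoted for Resnet in the memory figures above can be re-derived from the raw numbers. A minimal sketch in Python; the variable and function names are illustrative, and the unit of the figures (presumably bytes) is an assumption, since the notes do not state it.

```python
# Memory figures for Resnet, copied from the notes above.
# The unit (likely bytes) is an assumption; the notes do not state it.
baseline = 170590208  # "no optimize"
strategies = {
    "reuse memory": 92995584,
    "release memory": 78004224,
}


def reduction_percent(before, after):
    """Return the relative saving of `after` versus `before`, in percent."""
    return (before - after) / before * 100.0


# Round to one decimal place, matching the precision used in the notes.
reductions = {
    name: round(reduction_percent(baseline, used), 1)
    for name, used in strategies.items()
}

for name, pct in reductions.items():
    print(f"{name}: reduce {pct}%")
```
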
Fix and Enhance:
Docs:
- PR review:
- Inception-V4 for Fluid: https://github.com/PaddlePaddle/models/pull/672#pullrequestreview-100702951
- squeeze/unsqueeze operator: https://github.com/PaddlePaddle/Paddle/pull/8541#pullrequestreview-100686151
- PR:
- reorganized NMT directory: https://github.com/PaddlePaddle/models/pull/690
- [WIP, no PR created yet]: add multi-card/multi-thread implementation for Transformer
- need to enhance reshape operator first: https://github.com/PaddlePaddle/Paddle/issues/8781
- basic implementation finished; still need to test and clean up the code.
- OCR CTC
- Refine training net; final sequence error is 0.286 (fluid) vs. 0.278 (V1) without 'model average'
- Add model average option for fluid API [WIP]
- Add guide for howto documentation:
- Remove `Accuracy` from Evaluator:
- A new design of model save/load:
- save/load layer: https://github.com/PaddlePaddle/Paddle/pull/8711
- related issue: https://github.com/PaddlePaddle/Paddle/issues/8521
- [WIP] Double buffer for C++ reader:
- Doc update:
- compile:
- fix "only shared variables could be declared as static in the device code" on cuda 7.5: https://github.com/PaddlePaddle/Paddle/pull/8716
- fix warning: statement is unreachable: https://github.com/PaddlePaddle/Paddle/pull/8740
- refine operators/math/CMakeLists.txt: https://github.com/PaddlePaddle/Paddle/pull/8682
- inference:
- compile and install the static library of fluid inference: https://github.com/PaddlePaddle/Paddle/pull/7827
- doc:
- remove doc_theme: https://github.com/PaddlePaddle/Paddle/pull/8752
- Create new structure for V2 and fluid doc: https://github.com/PaddlePaddle/Paddle/pull/8761
- fix document deployment: https://github.com/PaddlePaddle/Paddle/pull/8772
- code review:
- Fix building error of missing end-group for Android.: https://github.com/PaddlePaddle/Paddle/pull/8680
- Fix fluid distribute build: https://github.com/PaddlePaddle/Paddle/pull/8704
- [Intel] MKLDNN conv2d kernels added: https://github.com/PaddlePaddle/Paddle/pull/8451
- 5 doc guides: https://github.com/PaddlePaddle/Paddle/issues?q=is%3Aissue+label%3Adocumentation+author%3Ashanyi15+is%3Aopen
- distribute training performance boost:
- EDL reviews:
- SSD on Fluid:
- Verify the backward of SSD loss.
- Verify the correctness of detection_output API in SSD.
- Implement detection mAP evaluator wrapper and unify label format between SSD loss and mAP evaluator.
- Enable device automatically switching in mine_hard_examples_op.
- Refine the doc in detection_output API.
- Fix bug in detection mAP evaluator.
- Refine MobileNet SSD model and add mAP evaluator.
- Fix the test reader for SSD.
- Converted a TensorFlow MobileNet-SSD model (based on the COCO dataset) for pre-training, and trained MobileNet-SSD.
- [WIP] Fix sparse update bug, https://github.com/PaddlePaddle/Paddle/issues/8678
- document fix: https://github.com/PaddlePaddle/Paddle/pull/8853
- fix nccl version in manylinux Dockerfile, https://github.com/PaddlePaddle/Paddle/pull/8708
- Profile
- `se_resnet_152` multi-gpu profile https://github.com/PaddlePaddle/Paddle/issues/8661
- [Speed] speed up python executor in fluid https://github.com/PaddlePaddle/Paddle/issues/8729
- add se resnet 152 profile script https://github.com/dzhwinter/benchmark/pull/84
- Add program cache for executor.py https://github.com/PaddlePaddle/Paddle/pull/8744
- add image resnet profile https://github.com/dzhwinter/benchmark/pull/83
- fluid vs pytorch profile https://github.com/PaddlePaddle/Paddle/issues/8677

| `time_per_batch` | memory |
|---|---|
| 1.95/1.15 = 1.696 | 18341 / 13359.0 = 1.373 |

- DeepASR profile https://github.com/PaddlePaddle/Paddle/issues/8750
- add timeline profile howto https://github.com/PaddlePaddle/Paddle/pull/8844
- DataReader profile https://github.com/PaddlePaddle/Paddle/issues/8857
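
The two ratios reported for the fluid vs pytorch profile above are plain quotients of the paired measurements. A small sketch that recomputes them; the notes do not say which framework is the numerator, so the keys `first` and `second` are placeholders rather than labels from the source.

```python
# Measurements copied from the "fluid vs pytorch profile" item above.
# Which framework each number belongs to is not stated in these notes,
# so the keys "first"/"second" are placeholders (an assumption).
time_per_batch = {"first": 1.95, "second": 1.15}
memory = {"first": 18341, "second": 13359.0}


def ratio(pair):
    """Quotient of the two measurements, rounded to three decimals."""
    return round(pair["first"] / pair["second"], 3)


time_ratio = ratio(time_per_batch)
memory_ratio = ratio(memory)
print(time_ratio, memory_ratio)
```
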
- bugfix and code optimize
- fix snappy build https://github.com/PaddlePaddle/Paddle/pull/8804
- add `print_log` to memory_optimize https://github.com/PaddlePaddle/Paddle/pull/8831
- Review
- Add CPU time and MemCopy to the timeline. https://github.com/PaddlePaddle/Paddle/pull/8679
- Enable device switching automatically for some operators in detection. https://github.com/PaddlePaddle/Paddle/pull/8684
- Enable device automatically switching in mine_hard_examples_op. https://github.com/PaddlePaddle/Paddle/pull/8706
- add inplace to reshape https://github.com/PaddlePaddle/Paddle/pull/8747
- [Speed]Avoid init_nccl for every steps. https://github.com/PaddlePaddle/Paddle/pull/8758
- Improve the timeline profiler https://github.com/PaddlePaddle/Paddle/pull/8775
- [Speed]Refine elementwise_mul_op gradient functor https://github.com/PaddlePaddle/Paddle/pull/8810
- [Speed] Refine elementwise sub,div,min,max gradient functor https://github.com/PaddlePaddle/Paddle/pull/8820
- fix mac build error https://github.com/PaddlePaddle/Paddle/pull/8856
- Refine doc
- Survey and Discussion about API document standard
- Fix the error in Readme.md of mt_with_external_memory.
- Replace paddle.v2.fluid by paddle.fluid in benchmark.
- [WIP]Operators profiling in Transformer model.
- [WIP] Implement rnn_search model with multi-gpu.
- ParallelDo Profiling: https://github.com/PaddlePaddle/Paddle/issues/8719
- VGG net: https://github.com/PaddlePaddle/Paddle/issues/8718
- NCCL: https://github.com/PaddlePaddle/Paddle/pull/8575
- Parallel Executor: https://github.com/helinwang/Paddle/commit/07709351157f30260bff2da12c72a565b021bb70
- Document: https://github.com/PaddlePaddle/Paddle/pull/8656
- ISSUE: https://github.com/PaddlePaddle/cloud/issues/638
- EDL: Deploy EDL on Sys Kubernetes Cluster
- NMT:
- [WIP] Add inference and external beam search implementation for Transformer.
- Tune the Transformer model to achieve a training cost comparable to PyTorch.
- Refine document https://github.com/PaddlePaddle/Paddle/pull/8737
- Refine api design for beam search https://github.com/PaddlePaddle/models/pull/675
- Enhance data reader for DeepASR [WIP]
- Inference Framework
- [Merged] Add profiling information for inference example
- [Reviewing] Add test for nested RecordEvent
- Review
- Fix nullptr when doing nested profiling: https://github.com/PaddlePaddle/Paddle/pull/8782
- refine operators/math/CMakeLists.txt: https://github.com/PaddlePaddle/Paddle/pull/8682
- Enable is_test attr of batch_norm and drop out op for test program: https://github.com/PaddlePaddle/Paddle/pull/8642
- compile and install the static library of fluid inference: https://github.com/PaddlePaddle/Paddle/pull/7827
- Mobile
- [Merged] Fix building error of missing end-group for Android
- [Merged] Try to add build_android task back to travis
- [Doing] Enable the building of Fluid for Android
- Add guideline for the doc of cmd parameters
- Setup DeepASR training on k8s cluster (thanks to @wuyi @weibao @yanxu), 2x speedup (4 GPUs, P40 vs. K40).
- Model training: found that the momentum optimizer converges poorly
- Some small fixes
- Code Review:
- [Reviewing] RecordIO Reader
- [Merged] RecordIO File Utilities
- Profiling GPU/CPU code
- Several internal web page links.
- Several enhancements
- Performance optimization on a single card
- Refine elementwise_mul_op gradient functor
- Refine concat_op
- Refine elementwise sub,div,min,max gradient functor
- Enhancement
- Add log before op Run
- Debug
- For `test_machine_translation.py`, with regularization, the program hangs. https://github.com/PaddlePaddle/Paddle/issues/8697
- For
- Review
- Fluid
- add develop standard
- add feature/recordio io format
- refine gpu common
- paddle performance profiling
- paddle performance optimization
- paddle multi-threaded scheduling
Inference:
- TensorRT results: https://github.com/PaddlePaddle/Paddle/issues/8790 (Code has been shared with Kexin and Xreki in another repo)
- Discussion with Nvidia folks for TensorRT integration
PR review:
- Profiling info for inference example: https://github.com/PaddlePaddle/Paddle/pull/8748
- float16 add context wait: https://github.com/PaddlePaddle/Paddle/pull/8850
- float16 data type transform: https://github.com/PaddlePaddle/Paddle/pull/8619
- Performance tuning:
- https://github.com/PaddlePaddle/Paddle/pull/8720
- https://github.com/PaddlePaddle/Paddle/pull/7814#pullrequestreview-101288703
- https://github.com/PaddlePaddle/Paddle/pull/7814#discussion_r172305161
- https://github.com/PaddlePaddle/Paddle/issues/8724#issuecomment-370894576
- https://github.com/PaddlePaddle/Paddle/pull/8674#pullrequestreview-101689269
- https://github.com/PaddlePaddle/Paddle/issues/8638#issuecomment-370907258
- Move EDL from PaddlePaddle/cloud to PaddlePaddle/edl with Yi: https://github.com/PaddlePaddle/edl/pull/5
- Integrate PyBind documentation into Sphinx documentation generation: https://github.com/PaddlePaddle/VisualDL/pull/292
- Update Travis-CI setting: https://github.com/PaddlePaddle/VisualDL/pull/295, https://github.com/PaddlePaddle/VisualDL/pull/297
- Fix incorrect documentation link: https://github.com/PaddlePaddle/VisualDL/pull/288
- Add English examples of how to use VisualDL with PyTorch, MXNet, Keras: https://github.com/PaddlePaddle/VisualDL/pull/299
- Publish the new examples to the websites: https://github.com/PaddlePaddle/VisualDL/pull/300
- WIP of CSP Select op and Channels redesign: https://github.com/PaddlePaddle/Paddle/pull/8867
- Collaborate with Jeff on VDL issues