Update OpenFPGA #2983
Merged
The original implementation of APPack focused on reconstructing a given flat placement. This can cause issues if the flat placement disagrees with the decisions of the packer. Instead, APPack now treats the flat placement as a hint that guides how it performs clustering. Added the following new features:
- APPack computes the location of each cluster as the centroid of the molecules packed within it.
- APPack attenuates the gain terms of candidates based on their distance from the cluster.
- APPack drops candidates that are too far from the cluster being created.
Removed the step that adds molecules near the position of the cluster. This had effects similar to unrelated clustering and should be investigated separately later. With these changes to APPack, the AP flow now improves the wirelength (WL) of circuits by 1-3%, at the cost of up to 15% more runtime compared to the default VPR flow.
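The three APPack features above can be sketched as follows. This is a minimal illustration, not APPack's code: the names `Point`, `cluster_centroid` and `attenuated_gain`, the equal molecule weights, the Manhattan distance, the `gain / (1 + attenuation * dist)` attenuation form and the `max_dist` cutoff are all assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <optional>
#include <vector>

// Hypothetical flat-placement location of a molecule (from global placement).
struct Point {
    double x = 0.0;
    double y = 0.0;
};

// Cluster location = centroid of the molecules already packed into it.
// Assumption: every molecule has equal weight.
Point cluster_centroid(const std::vector<Point>& packed) {
    assert(!packed.empty());
    Point c;
    for (const Point& p : packed) {
        c.x += p.x;
        c.y += p.y;
    }
    c.x /= static_cast<double>(packed.size());
    c.y /= static_cast<double>(packed.size());
    return c;
}

// Manhattan distance between two flat-placement locations (assumed metric).
double manhattan(const Point& a, const Point& b) {
    return std::abs(a.x - b.x) + std::abs(a.y - b.y);
}

// Returns the attenuated gain of a candidate, or std::nullopt if the candidate
// is too far from the cluster and should be dropped.
// `attenuation` and `max_dist` are hypothetical tuning parameters.
std::optional<double> attenuated_gain(double raw_gain, const Point& candidate,
                                      const Point& centroid, double attenuation,
                                      double max_dist) {
    const double dist = manhattan(candidate, centroid);
    if (dist > max_dist) {
        return std::nullopt;  // too far: remove from the candidate list
    }
    // Candidates closer to the cluster keep more of their original gain.
    return raw_gain / (1.0 + attenuation * dist);
}
```

Because the centroid is recomputed from the molecules actually packed, the flat placement only biases candidate selection; it never forces a molecule into a cluster.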
Remove Redundant print_pb
[APPack] Updated How APPack Adheres to Given Placement
Parsing Initial Placement WL and CPD
[RR Graph] RR Node Indices Value Type
…_packing Remove usage of atom to pb lookup from packing
Updated the partial legalizer to take block types into account when spreading blocks. It now creates windows around overfilled bins that are aware of which block types are overfilled and how large each window must be to accommodate them. It also uses the block types while spreading, so that blocks may only spread into sub-windows where they can exist. This improves quality but was detrimental to performance, so some performance improvements were needed.
To improve the performance of the partial legalizer, I split the problem into groups of models that must be spread together. This allows tighter windows and makes some parts of the legalizer more efficient. Created a model grouper class that forms the model pack patterns into a graph and finds the disconnected sub-graphs to form the model groups.
Also improved window generation by pre-clustering the overfilled bins before creating the windows. This sped up the window-generation code since fewer windows overlap.
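Finding the disconnected sub-graphs of the pack-pattern graph is a connected-components problem. The sketch below uses a union-find structure. It assumes models are numbered `0..n-1` and that each pack pattern is given as the list of model ids it uses. `DisjointSet` and `group_models` are hypothetical names, not the model grouper class's actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Minimal union-find (disjoint-set) structure.
class DisjointSet {
public:
    explicit DisjointSet(std::size_t n) : parent_(n) {
        std::iota(parent_.begin(), parent_.end(), std::size_t{0});
    }
    std::size_t find(std::size_t x) {
        while (parent_[x] != x) {
            parent_[x] = parent_[parent_[x]];  // path halving
            x = parent_[x];
        }
        return x;
    }
    void unite(std::size_t a, std::size_t b) { parent_[find(a)] = find(b); }

private:
    std::vector<std::size_t> parent_;
};

// Groups models into the connected components of the pack-pattern graph:
// all models that appear in the same pack pattern are connected.
// Returns a group id per model, numbered 0, 1, ... in order of first appearance.
std::vector<int> group_models(std::size_t num_models,
                              const std::vector<std::vector<std::size_t>>& pack_patterns) {
    DisjointSet ds(num_models);
    for (const auto& pattern : pack_patterns) {
        for (std::size_t i = 1; i < pattern.size(); ++i) {
            assert(pattern[0] < num_models && pattern[i] < num_models);
            ds.unite(pattern[0], pattern[i]);
        }
    }
    std::vector<int> root_to_group(num_models, -1);
    std::vector<int> group(num_models);
    int next_group = 0;
    for (std::size_t m = 0; m < num_models; ++m) {
        const std::size_t root = ds.find(m);
        if (root_to_group[root] == -1) root_to_group[root] = next_group++;
        group[m] = root_to_group[root];
    }
    return group;
}
```

Models that never share a pack pattern end up in different groups, so each group can be spread in its own, tighter windows.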
…lizer [AP][GlobalPlacement] Improved Partial Legalizer Legality
When no fixed blocks are provided by the user, the AP flow can still work. Previously, in the first iteration, the solver put all blocks at (0, 0) and used the legalized solution as fixed points in the next iteration. Instead of (0, 0), it makes more sense to put the blocks in the center of the device. Also added an initial guess to the solver to help conjugate gradient (CG) converge faster in each iteration. Added a regression test to ensure that omitting the fixed blocks is supported.
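The sketch below shows what "an initial guess for CG" means in practice: conjugate gradient iterates from whatever `x` the caller supplies instead of from zero. It uses a small dense symmetric positive-definite system and solves one dimension. It is a generic textbook CG, not VTR's solver, and the source does not say which guess the AP solver uses. The previous iteration's solution or the device center are only examples.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // dense and row-major; fine for a small sketch

Vec mat_vec(const Mat& A, const Vec& x) {
    Vec y(x.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j) y[i] += A[i][j] * x[j];
    return y;
}

double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Conjugate gradient for a symmetric positive-definite A. The iteration starts
// from the caller's initial guess in `x`, which is overwritten with the
// solution. Returns the number of iterations performed.
int conjugate_gradient(const Mat& A, const Vec& b, Vec& x,
                       double tol = 1e-10, int max_iters = 1000) {
    assert(A.size() == b.size() && b.size() == x.size());
    const Vec Ax = mat_vec(A, x);
    Vec r(b.size());
    for (std::size_t i = 0; i < b.size(); ++i) r[i] = b[i] - Ax[i];
    Vec p = r;
    double rs_old = dot(r, r);
    int iter = 0;
    // A good guess starts with a small residual, so fewer iterations are needed.
    while (iter < max_iters && std::sqrt(rs_old) > tol) {
        const Vec Ap = mat_vec(A, p);
        const double alpha = rs_old / dot(p, Ap);
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        const double rs_new = dot(r, r);
        for (std::size_t i = 0; i < p.size(); ++i) p[i] = r[i] + (rs_new / rs_old) * p[i];
        rs_old = rs_new;
        ++iter;
    }
    return iter;
}
```

If the guess is already the solution, the loop exits immediately. That is the extreme case of the speedup a warm start provides.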
[AP][Solver] Supporting Unfixed Blocks
…rilog-to-routing into openfpga_update
The SDF file generated by the post-implementation netlist writer used only the max delays of the timing connections in the timing graph: it set every value of the rising and falling triples to the max delay. When this SDF file was used for external timing analysis, the minimum-timing (hold) paths were incorrect.
Updated the netlist writer to work with triples instead of bare delays. This allows (minimum, typical, maximum) delays to be passed through the different functions and printed cleanly. For standard delay signals in the circuit (not setup/hold times), Tatum provides the minimum delays. These are now printed in the SDF file, and the external timing analyzer now finds the minimum timing paths correctly.
Also cleaned up parts of the netlist printing code:
1) netlist_writer.cpp declared many functions in the global scope, which may cause conflicts at link time in VTR. Moved all of these functions into an anonymous namespace to prevent this.
2) The code converted delays from seconds to picoseconds at several scattered points, which was tricky to work with since both are stored as doubles. Changed the code to work only with delays in seconds and to convert to picoseconds only when printing.
3) General cleanup of the header file and the include files.
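The triple-based approach can be sketched like this. Delays stay in seconds inside a small struct, conversion to picoseconds happens only in the formatting helper, and the helper lives in an anonymous namespace. `DelayTriple`, `sdf_triple` and `sdf_iopath` are hypothetical names. The output assumes the SDF header declares a 1 ps `TIMESCALE`, and the one-decimal formatting is an arbitrary choice.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical delay triple, always stored in seconds.
struct DelayTriple {
    double min_s;
    double typ_s;
    double max_s;
};

namespace {  // internal linkage, mirroring the anonymous-namespace cleanup

constexpr double kPicosecondsPerSecond = 1e12;

// The only place where seconds are converted to picoseconds.
std::string format_ps(double seconds) {
    std::ostringstream os;
    os << std::fixed << std::setprecision(1) << seconds * kPicosecondsPerSecond;
    return os.str();
}

}  // namespace

// Formats a triple in SDF "(min:typ:max)" syntax, in picoseconds.
std::string sdf_triple(const DelayTriple& d) {
    assert(d.min_s <= d.typ_s && d.typ_s <= d.max_s);
    return "(" + format_ps(d.min_s) + ":" + format_ps(d.typ_s) + ":" +
           format_ps(d.max_s) + ")";
}

// Formats an IOPATH entry that uses the same triple for rise and fall.
std::string sdf_iopath(const std::string& in_port, const std::string& out_port,
                       const DelayTriple& d) {
    const std::string t = sdf_triple(d);
    return "(IOPATH " + in_port + " " + out_port + " " + t + " " + t + ")";
}
```

Unlike the old output, where all three fields held the max delay, the first field now carries the minimum delay, which is what hold-time analysis reads.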
Thank you to Fred Tombs for pointing out this issue!
The old initial placer used in the AP flow was built inside the initial placer of the non-AP flow. This forced the AP flow to place blocks one at a time with minimum displacement. This is not ideal, since blocks placed earlier got first pick of locations and could displace a later cluster that would be a better fit for that location.
Separated out the AP initial placement code. For AP, initial placement is done in passes. The first pass tries to place each cluster exactly at the tile containing the centroid of the desired positions (according to the global placement) of all atoms within the cluster. Any clusters that cannot be placed are deferred to the next pass. The second pass allows clusters to be placed within 1 tile of their centroid, and each subsequent pass allows clusters to be placed exponentially farther from their centroid. Initial placement terminates when all clusters have been placed or when the max displacement reaches the size of the entire device.
The clusters are sorted by the size of the macro that contains them and by the variance of the placement of the atoms within the macro. This allows large macros with low variance to be placed first.
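The pass structure and ordering described above are sketched below on a simplified device: a `W x H` grid with one cluster per tile. There are no tile types, no capacities, and no multi-tile macros. `Cluster`, `placement_order` and `place_in_passes` are hypothetical names. The Chebyshev window and the "closest free tile wins" rule are assumptions, and the allowed displacement follows 0, 1, 2, 4, ... until it covers the device.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical cluster record; field names are illustrative only.
struct Cluster {
    int target_x;     // tile containing the centroid of the cluster's atoms
    int target_y;
    int macro_size;   // number of clusters in the macro containing this one
    double variance;  // spread of the macro's atoms in the global placement
};

struct Loc {
    int x;
    int y;
};

// Larger macros first; ties broken by lower variance.
std::vector<std::size_t> placement_order(const std::vector<Cluster>& clusters) {
    std::vector<std::size_t> order(clusters.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::stable_sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        if (clusters[a].macro_size != clusters[b].macro_size)
            return clusters[a].macro_size > clusters[b].macro_size;
        return clusters[a].variance < clusters[b].variance;
    });
    return order;
}

// Places clusters in passes. Pass 0 allows displacement 0, pass 1 allows 1, and
// each later pass doubles the allowed (Chebyshev) displacement. Stops once all
// clusters are placed or the window covers the whole device; clusters that
// could not be placed are left as std::nullopt.
std::vector<std::optional<Loc>> place_in_passes(const std::vector<Cluster>& clusters,
                                                int W, int H) {
    assert(W > 0 && H > 0);
    std::vector<std::vector<bool>> occupied(static_cast<std::size_t>(W),
                                            std::vector<bool>(static_cast<std::size_t>(H), false));
    std::vector<std::optional<Loc>> result(clusters.size());
    std::vector<std::size_t> pending = placement_order(clusters);
    const int device_size = std::max(W, H);
    int max_disp = 0;
    while (!pending.empty()) {
        std::vector<std::size_t> deferred;
        for (std::size_t id : pending) {
            const Cluster& c = clusters[id];
            assert(c.target_x >= 0 && c.target_x < W && c.target_y >= 0 && c.target_y < H);
            std::optional<Loc> best;
            int best_dist = std::numeric_limits<int>::max();
            // Scan this pass's window for the closest free tile.
            for (int x = std::max(0, c.target_x - max_disp);
                 x <= std::min(W - 1, c.target_x + max_disp); ++x) {
                for (int y = std::max(0, c.target_y - max_disp);
                     y <= std::min(H - 1, c.target_y + max_disp); ++y) {
                    const int d = std::max(std::abs(x - c.target_x), std::abs(y - c.target_y));
                    if (!occupied[x][y] && d < best_dist) {
                        best = Loc{x, y};
                        best_dist = d;
                    }
                }
            }
            if (best) {
                occupied[best->x][best->y] = true;
                result[id] = best;
            } else {
                deferred.push_back(id);  // retry in the next, wider pass
            }
        }
        pending.swap(deferred);
        if (max_disp >= device_size) break;
        max_disp = (max_disp == 0) ? 1 : max_disp * 2;
    }
    return result;
}
```

Deferring a cluster instead of immediately settling for a nearby tile lets every cluster first compete for its exact target. This is the reason given above for moving away from one-at-a-time placement.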
[STA] Updated SDF File Generation to Include Min Delays
[AP][InitialPlacement] Created Isolated AP Flow
Override edge attributes in RR graph
Fixed a couple of small known issues in the AP flow related to how fixed blocks are handled.
Offset the fixed block locations by 0.5 so that they are no longer on the tile edge. Previously, fixed blocks were placed at the root location of their tiles. This was a problem since atoms generally want to be close to the fixed block and could be biased toward the tiles below and to the left of the fixed block's tile. This does not handle large tiles, but helps in general.
If no fixed blocks are provided, the AP solver always produces the trivial solution (all blocks placed on top of one another anywhere on the device). Time was being wasted running bound2bound to find this solution, and the result was probably being placed in the bottom-left corner (0, 0), which is not ideal. Instead, when there are no fixed blocks, the first iteration skips bound2bound and places all blocks in the center of the device. This greatly speeds up the first iteration when no fixed blocks are provided.
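Both fixes amount to small coordinate choices, sketched below. `Pos`, `fixed_block_pos` and `first_iteration_without_fixed_blocks` are hypothetical names. The `+ 0.5` offset assumes a 1x1 tile, which matches the stated limitation that large tiles are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Pos {
    double x;
    double y;
};

// A fixed block sits at the center of its (assumed 1x1) tile instead of the
// tile's bottom-left root corner, so connected atoms are not pulled toward
// the tiles below and to the left of it.
Pos fixed_block_pos(int tile_root_x, int tile_root_y) {
    return {tile_root_x + 0.5, tile_root_y + 0.5};
}

// With no fixed blocks, the solver's answer is trivial (all blocks on one
// point), so the first iteration skips the solve and puts every block at the
// center of the device.
std::vector<Pos> first_iteration_without_fixed_blocks(std::size_t num_blocks,
                                                     double device_width,
                                                     double device_height) {
    assert(device_width > 0.0 && device_height > 0.0);
    return std::vector<Pos>(num_blocks, Pos{device_width / 2.0, device_height / 2.0});
}
```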
…ve_ctx Remove PlacerMoveContext
…l_packer Remove atom_net global context mutation from packer
[AP] General Fixed/Unfixed Blocks Cleanup
…rilog-to-routing into openfpga_update
Labels
- build (Build system)
- docs (Documentation)
- external_libs
- infra (Project Infrastructure)
- lang-cpp (C/C++ code)
- lang-hdl (Hardware Description Language (Verilog/VHDL))
- lang-make (CMake/Make code)
- lang-python (Python code)
- lang-shell (Shell scripts (bash etc.))
- libarchfpga (Library for handling FPGA Architecture descriptions)
- liblog
- libpugiutil
- libvtrutil
- Odin (Odin II Logic Synthesis Tool: Unsorted item)
- Parmys
- scripts (Utility & Infrastructure scripts)
- VPR (VPR FPGA Placement & Routing Tool)