Update OpenFPGA #2983

Merged
merged 569 commits into from
Apr 24, 2025

amin1377
Contributor

No description provided.

amin1377 and others added 30 commits March 18, 2025 16:24
The original implementation of APPack was focused on reconstructing a
given flat placement. This can cause issues if the given flat placement
disagrees with the decisions of the packer.

Instead, updated APPack so that it treats the flat placement as a hint
to help guide how it performs clustering.

Added the following new features:
- APPack computes the location of clusters based on the centroid of the
  molecules packed within.
- APPack attenuates the gain terms of candidates based on their distance
  from the cluster.
- APPack drops candidates which are too far from the cluster being
  created.
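
The three features above can be sketched roughly as follows. This is a minimal illustration, not APPack's actual code: the function names, the Manhattan distance metric, the linear falloff, and the `max_distance` threshold are all assumptions made for the example.

```python
def cluster_centroid(molecule_positions):
    """Mean (x, y) of the molecules already packed into the cluster."""
    n = len(molecule_positions)
    x = sum(p[0] for p in molecule_positions) / n
    y = sum(p[1] for p in molecule_positions) / n
    return (x, y)


def attenuated_gain(base_gain, candidate_pos, centroid, max_distance=10.0):
    """Return the attenuated gain, or None if the candidate is dropped.

    Assumptions (not taken from the source): Manhattan distance, a linear
    falloff, and the hypothetical max_distance threshold.
    """
    dist = abs(candidate_pos[0] - centroid[0]) + abs(candidate_pos[1] - centroid[1])
    if dist > max_distance:
        # Too far from the cluster: drop the candidate entirely.
        return None
    # Scale the gain down as the candidate moves away from the cluster.
    return base_gain * (1.0 - dist / max_distance)
```

Under these assumptions, a candidate sitting on the centroid keeps its full gain, one halfway to the threshold keeps half of it, and anything beyond the threshold is never considered.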

Removed the step that added molecules near the position of the cluster.
This had similar effects to unrelated clustering and should be
investigated separately later.

With these changes to APPack, the AP flow now improves the wirelength
(WL) of circuits by 1-3% at the expense of up to 15% more runtime
compared to the default VPR flow.
[APPack] Updated How APPack Adheres to Given Placement
…_packing

Remove usage of atom to pb lookup from packing
Updated the partial legalizer to now take into account block types when
spreading blocks.

This will create windows around overfilled bins that are aware of which
block types are overfilled and how large the window needs to be to
accommodate them. It also takes these block types into account when
spreading, so that blocks only spread into sub-windows that they can
exist in.

This improves quality but was detrimental to performance, so some
performance improvements were needed.

To improve the performance of the partial legalizer, I split the problem
into groups of models which must be spread together. This allows us to
create tighter windows and can make some parts of the legalizer more
efficient. Created a model grouper class which forms the model pack
patterns into a graph and finds the disconnected sub-graphs to form the
model groups.
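
As a rough sketch of the grouping idea, models can be treated as graph nodes, with an edge between any two models that appear in the same pack pattern; each connected component then becomes one model group. The function name and the list-of-lists input format are hypothetical and do not reflect the actual model grouper class.

```python
from collections import defaultdict


def group_models(models, pack_patterns):
    """Map each model to a group id; models that share a pack pattern,
    directly or transitively, end up in the same group."""
    adjacency = defaultdict(set)
    for pattern in pack_patterns:
        if not pattern:
            continue
        first = pattern[0]
        # Linking every model to the first one is enough to connect
        # all models in the pattern.
        for other in pattern[1:]:
            adjacency[first].add(other)
            adjacency[other].add(first)

    group_of = {}
    next_group = 0
    for model in models:
        if model in group_of:
            continue
        # Depth-first search to collect one disconnected sub-graph.
        group_of[model] = next_group
        stack = [model]
        while stack:
            current = stack.pop()
            for neighbour in adjacency[current]:
                if neighbour not in group_of:
                    group_of[neighbour] = next_group
                    stack.append(neighbour)
        next_group += 1
    return group_of
```

For example, with pack patterns linking `lut` to `ff` and `ff` to `adder`, those three models share one group, while unrelated models such as `dsp` or `ram` each form their own and can be spread independently.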

Also improved the window generation by pre-clustering the overfilled
bins before creating the windows. This sped up the window generation
code since fewer windows overlap.
…lizer

[AP][GlobalPlacement] Improved Partial Legalizer Legality
When no fixed blocks are provided by the user, the AP flow can still
work. Currently, in the first iteration, the solver will put all blocks
at 0,0 and use the legalized solution in the next iteration as fixed
points. Instead of (0,0), it makes more sense to put the blocks in the
center of the device.

Also added a guess to the solver to help CG converge faster each
iteration.
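
To illustrate why an initial guess helps, here is a small, self-contained conjugate gradient (CG) routine that accepts a starting point. It is a generic textbook sketch in pure Python, not the solver used in the AP flow; the function name, tolerance, and dense-matrix representation are assumptions for the example.

```python
def conjugate_gradient(A, b, x0, tol=1e-10, max_iters=100):
    """Solve A x = b for a symmetric positive-definite A, starting at x0.

    Returns (x, iterations_used).
    """
    n = len(b)

    def matvec(M, v):
        return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(a * c for a, c in zip(u, v))

    x = list(x0)
    Ax = matvec(A, x)
    r = [b[i] - Ax[i] for i in range(n)]  # initial residual
    p = list(r)                           # initial search direction
    rs_old = dot(r, r)
    iters = 0
    while rs_old ** 0.5 > tol and iters < max_iters:
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = dot(r, r)
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
        iters += 1
    return x, iters
```

Starting from a guess that is already close to the answer (for instance, the previous iteration's solution) leaves a smaller initial residual, so the loop typically exits after fewer iterations than it would when starting from all zeros.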

Added a regression test to ensure that not describing the fixed blocks
is supported.
[AP][Solver] Supporting Unfixed Blocks
amin1377 requested a review from tangxifan on April 20, 2025 16:00
amin1377 and others added 23 commits April 20, 2025 09:05
The SDF file generated by the post-implementation netlist writer was
only using the max delays of timing connections in the timing graph. In
the SDF file, it set all values of the rising and falling triples to the
max delay. When using this SDF file for external timing analysis, the
minimum timing (hold) paths were incorrect.

Updated the netlist writer to work with triples instead of bare delays.
This allows (minimum, typical, maximum) delays to be passed through the
different functions and be printed cleanly. For standard delay signals
in the circuit (not setup / hold times) Tatum provides the minimum
delays. These are now being printed in the SDF file and the minimum
timing paths are being found correctly in the external timing analyzer.
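
A minimal sketch of the triple idea is shown below. It assumes delays are held in seconds and converted to picoseconds only when printed; the `DelayTriple` class, the `iopath_entry` helper, and the three-decimal number format are hypothetical and are not the netlist writer's actual code. The `(min:typ:max)` notation and the `IOPATH` keyword are standard SDF syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DelayTriple:
    """A (minimum, typical, maximum) delay, stored in seconds."""
    minimum: float
    typical: float
    maximum: float

    def to_sdf(self):
        # Convert to picoseconds only at print time.
        ps = [d * 1e12 for d in (self.minimum, self.typical, self.maximum)]
        return "({:.3f}:{:.3f}:{:.3f})".format(*ps)


def iopath_entry(in_port, out_port, rise, fall):
    """Format one IOPATH entry with separate rising and falling triples."""
    return f"(IOPATH {in_port} {out_port} {rise.to_sdf()} {fall.to_sdf()})"
```

Keeping the three values together in one object means the minimum delay travels with the maximum through every function, rather than being lost before the SDF file is written.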

Cleaned up some parts of the netlist printing code as well.
1) netlist_writer.cpp declared many functions in the global scope which
   may cause conflicts at link time in VTR. Put all of these functions
   in an anonymous namespace to prevent this.
2) The code was converting the delays from seconds to picoseconds in
   strange places. This was tricky to work with since both are stored
   as doubles. Changed all of the code to only work with delays in
   seconds, and to only convert to picoseconds when printing.
3) General cleanup of the header file and the include files.
Thank you to Fred Tombs for pointing out this issue!
The old Initial Placer used in the AP flow was constructed within the
initial placer of the non-AP flow. This forced the AP flow to try to
place blocks one at a time with minimum displacement. This is non-ideal
since blocks that were placed earlier got first pick of locations,
which may displace a later cluster that would have been a better fit
for that location.

Separated out the AP initial placement code. For AP, initial placement
is done in passes.

The first pass will try to place each cluster exactly at the tile where
the centroid of all atoms within the cluster wants to be placed
(according to the global placement). Any clusters that could not be
placed are reserved for the next pass.

The second pass will allow clusters to be placed within 1 tile of their
centroid.

All subsequent passes will allow clusters to be placed exponentially
farther from their centroid.

The initial placement terminates when all clusters have been placed or
if the max displacement is the size of the entire device.

The clusters are sorted based on the size of the macro that contains
them and the variance of the placement of the atoms within the macro.
This allows large macro blocks with low variance to be placed first.
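
The pass structure described above might look roughly like the following. This is only a sketch: the function names, the doubling factor, the Manhattan distance metric, and the tie-breaking rule are assumptions, and the clusters are assumed to arrive already sorted in priority order.

```python
def max_displacement_schedule(device_size):
    """Yield the max displacement allowed in each pass: 0, 1, 2, 4, ...
    ending with the size of the whole device."""
    yield 0
    disp = 1
    while disp < device_size:
        yield disp
        disp *= 2  # doubling is an assumed growth factor
    yield device_size


def place_in_passes(clusters, free_tiles, device_size):
    """clusters: dict of name -> (x, y) centroid tile, in priority order.
    free_tiles: iterable of available (x, y) tiles.
    Returns (placement dict, list of clusters left unplaced)."""
    free = set(free_tiles)
    placement = {}
    unplaced = list(clusters)
    for max_disp in max_displacement_schedule(device_size):
        still_unplaced = []
        for name in unplaced:
            cx, cy = clusters[name]
            candidates = [t for t in free
                          if abs(t[0] - cx) + abs(t[1] - cy) <= max_disp]
            if candidates:
                # Pick the closest free tile within the allowed distance.
                tile = min(candidates,
                           key=lambda t: (abs(t[0] - cx) + abs(t[1] - cy), t))
                placement[name] = tile
                free.remove(tile)
            else:
                # Reserve this cluster for the next pass.
                still_unplaced.append(name)
        unplaced = still_unplaced
        if not unplaced:
            break
    return placement, unplaced
```

Because every cluster gets a chance at its exact centroid tile before any cluster is allowed to move farther away, an early cluster cannot take a distant tile that a later cluster would have fit perfectly.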
[STA] Updated SDF File Generation to Include Min Delays
[AP][InitialPlacement] Created Isolated AP Flow
Fixed a couple of small known issues around the AP flow related to how
we handle fixed blocks.

Offset the fixed block locations by 0.5 such that they are no longer on
the edge of a tile. Previously, fixed blocks were placed at the root
location of tiles. This was a problem since atoms generally want to be
close to the fixed block and may be biased toward the tiles below or to
the left of the fixed-block tile. This does not handle large tiles, but
will help in general.

If no fixed blocks are provided, the AP solver will always produce the
trivial solution (all blocks placed on top of one another anywhere on
the device). We were wasting time running bound2bound to solve this and
the solution was probably being put on the bottom-left corner (0,0)
which is not ideal. Instead of running bound2bound during the first
iteration in this case, all blocks are now simply placed in the center
of the device. This greatly speeds up the first iteration when no fixed
blocks are provided.
…l_packer

Remove atom_net global context mutation from packer
[AP] General Fixed/Unfixed Blocks Cleanup
tangxifan merged commit cab1db1 into openfpga on Apr 24, 2025
36 checks passed
tangxifan deleted the openfpga_update branch on April 24, 2025 16:03
Labels
build Build system docs Documentation external_libs infra Project Infrastructure lang-cpp C/C++ code lang-hdl Hardware Description Language (Verilog/VHDL) lang-make CMake/Make code lang-python Python code lang-shell Shell scripts (bash etc.) libarchfpga Library for handling FPGA Architecture descriptions liblog libpugiutil libvtrutil Odin Odin II Logic Synthesis Tool: Unsorted item Parmys scripts Utility & Infrastructure scripts VPR VPR FPGA Placement & Routing Tool
7 participants