Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Push result from FUs #792

Merged
merged 8 commits into from
Jan 25, 2025
Merged

Push result from FUs #792

merged 8 commits into from
Jan 25, 2025

Conversation

tilk
Copy link
Member

@tilk tilk commented Jan 23, 2025

This is my third attempt at changing the result direction in FUs from pull to push, and I think this is finally it. The change is very incremental - just enough to achieve the goal and nothing more. The change reduces the number of lines of code and even improves performance a little, thanks to removing some FIFOs. This is a good start for tuning the announcement part of the core.

Depends on #784. Merged.

@tilk tilk added refactor Doesn't change functionality, but makes stuff nicer benchmark Benchmarks should be run for this change labels Jan 23, 2025
Copy link

Benchmarks summary

Performance benchmarks

aha-mont64 crc32 minver nettle-sha256 nsichneu slre statemate ud
▲ 0.417 (+0.000) ▲ 0.532 (+0.007) ▲ 0.372 (+0.002) 0.631 (0.000) 0.359 (0.000) ▼ 0.291 (-0.000) 0.328 (0.000) ▲ 0.443 (+0.003)

You can view all the metrics here.

Synthesis benchmarks (basic)

Device utilisation: (ECP5) LUTs used as DFF: (ECP5) LUTs used as carry: (ECP5) LUTs used as ram: (ECP5) Max clock frequency (Fmax)
▼ 13690 (-153) ▼ 4389 (-9) 1456 (0) ▼ 1068 (-96) ▲ 53 (+0)

Synthesis benchmarks (full)

Device utilisation: (ECP5) LUTs used as DFF: (ECP5) LUTs used as carry: (ECP5) LUTs used as ram: (ECP5) Max clock frequency (Fmax)
▼ 21176 (-2096) ▼ 7141 (-56) ▲ 1914 (+32) ▼ 1120 (-96) ▼ 42 (-2)

Copy link

Benchmarks summary

Performance benchmarks

aha-mont64 crc32 minver nettle-sha256 nsichneu slre statemate ud
▲ 0.417 (+0.000) ▲ 0.532 (+0.007) ▲ 0.372 (+0.002) 0.631 (0.000) 0.359 (0.000) ▼ 0.291 (-0.000) 0.328 (0.000) ▲ 0.443 (+0.003)

You can view all the metrics here.

Synthesis benchmarks (basic)

Device utilisation: (ECP5) LUTs used as DFF: (ECP5) LUTs used as carry: (ECP5) LUTs used as ram: (ECP5) Max clock frequency (Fmax)
▼ 14438 (-3404) ▼ 4389 (-9) 1456 (0) ▼ 1068 (-96) ▼ 48 (-1)

Synthesis benchmarks (full)

Device utilisation: (ECP5) LUTs used as DFF: (ECP5) LUTs used as carry: (ECP5) LUTs used as ram: (ECP5) Max clock frequency (Fmax)
▼ 21324 (-1271) ▼ 7141 (-56) ▲ 1914 (+32) ▼ 1120 (-96) ▲ 45 (+7)

scripts/synthesize.py Outdated Show resolved Hide resolved
@Hazardu
Copy link
Contributor

Hazardu commented Jan 25, 2025

It appears that a lot of code in FuncUnit derived classes is duplicated,
things like assignment of

self.gen_params = gen_params
self.layouts = layouts = gen_params.get(FuncUnitLayouts)
self.issue = Method(i=layouts.issue)
self.push_result = Method(i=layouts.push_result)

are the exact same in almost all cases, with one exception being dummyLsu.py that renamed layouts to fu_layouts

@tilk
Copy link
Member Author

tilk commented Jan 25, 2025

@Hazardu

It appears that a lot of code in FuncUnit derived classes is duplicated,

Indeed. This could probably be resolved by adding a base class for FUs, which would include a common constructor. This is not the subject of this PR, however - maybe you would be interested in doing such a refactor?

@Hazardu
Copy link
Contributor

Hazardu commented Jan 25, 2025

Yes, i'll do it. Seems quite simple.

Copy link

Benchmarks summary

Performance benchmarks

aha-mont64 crc32 minver nettle-sha256 nsichneu slre statemate ud
▲ 0.417 (+0.000) ▲ 0.532 (+0.007) ▲ 0.373 (+0.002) ▲ 0.632 (+0.001) ▼ 0.359 (-0.000) ▼ 0.291 (-0.000) ▲ 0.328 (+0.000) ▲ 0.443 (+0.003)

You can view all the metrics here.

Synthesis benchmarks (basic)

Device utilisation: (ECP5) LUTs used as DFF: (ECP5) LUTs used as carry: (ECP5) LUTs used as ram: (ECP5) Max clock frequency (Fmax)
▼ 14682 (-3160) ▼ 4331 (-67) 1456 (0) ▼ 1136 (-28) ▲ 53 (+3)

Synthesis benchmarks (full)

Device utilisation: (ECP5) LUTs used as DFF: (ECP5) LUTs used as carry: (ECP5) LUTs used as ram: (ECP5) Max clock frequency (Fmax)
▲ 26783 (+4188) ▼ 6821 (-376) ▲ 1940 (+58) ▲ 1664 (+448) ▼ 37 (-1)

Copy link

Benchmarks summary

Performance benchmarks

aha-mont64 crc32 minver nettle-sha256 nsichneu slre statemate ud
▲ 0.417 (+0.000) ▲ 0.532 (+0.007) ▲ 0.373 (+0.002) ▲ 0.632 (+0.001) ▼ 0.359 (-0.000) ▼ 0.291 (-0.000) ▲ 0.328 (+0.000) ▲ 0.443 (+0.003)

You can view all the metrics here.

Synthesis benchmarks (basic)

Device utilisation: (ECP5) LUTs used as DFF: (ECP5) LUTs used as carry: (ECP5) LUTs used as ram: (ECP5) Max clock frequency (Fmax)
▼ 14655 (-3187) ▼ 4331 (-67) 1456 (0) ▼ 1136 (-28) ▼ 43 (-6)

Synthesis benchmarks (full)

Device utilisation: (ECP5) LUTs used as DFF: (ECP5) LUTs used as carry: (ECP5) LUTs used as ram: (ECP5) Max clock frequency (Fmax)
▲ 27330 (+4735) ▼ 6821 (-376) ▲ 1940 (+58) ▲ 1664 (+448) ▲ 39 (+1)

Copy link

Benchmarks summary

Performance benchmarks

aha-mont64 crc32 minver nettle-sha256 nsichneu slre statemate ud
▲ 0.417 (+0.000) ▲ 0.532 (+0.007) ▲ 0.373 (+0.002) 0.632 (0.000) 0.359 (0.000) ▼ 0.291 (-0.000) 0.328 (0.000) ▲ 0.443 (+0.003)

You can view all the metrics here.

Synthesis benchmarks (basic)

Device utilisation: (ECP5) LUTs used as DFF: (ECP5) LUTs used as carry: (ECP5) LUTs used as ram: (ECP5) Max clock frequency (Fmax)
▲ 13464 (+124) ▼ 4331 (-9) ▼ 1424 (-32) ▼ 1136 (-96) ▲ 49 (+0)

Synthesis benchmarks (full)

Device utilisation: (ECP5) LUTs used as DFF: (ECP5) LUTs used as carry: (ECP5) LUTs used as ram: (ECP5) Max clock frequency (Fmax)
▲ 25212 (+942) ▼ 6821 (-56) 1940 (0) ▼ 1664 (-96) ▲ 43 (+6)

@tilk tilk merged commit e200b5b into kuznia-rdzeni:master Jan 25, 2025
14 checks passed
github-actions bot pushed a commit that referenced this pull request Jan 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
benchmark Benchmarks should be run for this change refactor Doesn't change functionality, but makes stuff nicer
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants