-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce subassembly offset output artifact #15710
base: develop
Are you sure you want to change the base?
Changes from all commits
c5408cd
4ac9132
6edeb8e
e4f8896
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1203,6 +1203,7 @@ LinkerObject const& Assembly::assembleLegacy() const | |
|
||
size_t subTagSize = 1; | ||
std::map<u256, LinkerObject::ImmutableRefs> immutableReferencesBySub; | ||
std::vector<LinkerObject> subAssemblies; | ||
for (auto const& sub: m_subs) | ||
{ | ||
auto const& linkerObject = sub->assemble(); | ||
|
@@ -1215,6 +1216,7 @@ LinkerObject const& Assembly::assembleLegacy() const | |
); | ||
immutableReferencesBySub = linkerObject.immutableReferences; | ||
} | ||
subAssemblies.push_back(linkerObject); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The linker object is already available as |
||
for (size_t tagPos: sub->m_tagPositionsInBytecode) | ||
if (tagPos != std::numeric_limits<size_t>::max() && numberEncodingSize(tagPos) > subTagSize) | ||
subTagSize = numberEncodingSize(tagPos); | ||
|
@@ -1465,6 +1467,16 @@ LinkerObject const& Assembly::assembleLegacy() const | |
bytesRef r(ret.bytecode.data() + pos, bytesPerDataRef); | ||
toBigEndian(ret.bytecode.size(), r); | ||
} | ||
|
||
std::vector<LinkerObject::SubAssembly> nestedSubAssemblies{}; | ||
for (auto const& subAssembly: subAssemblies) | ||
nestedSubAssemblies.insert(nestedSubAssemblies.end(), subAssembly.subAssemblyData.begin(), subAssembly.subAssemblyData.end()); | ||
|
||
size_t const currentBytecodeSize = ret.bytecode.size(); | ||
updateSubAssemblyStartOffsets(nestedSubAssemblies, currentBytecodeSize); | ||
|
||
ret.subAssemblyData.push_back({0, ret.bytecode.size(), isCreation(), ret.toHex(), nestedSubAssemblies}); | ||
Comment on lines
+1471
to
+1478
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Calculating subassembly locations like this seems inherently flawed to me. You're not accounting for data or any future changes of the code above that might result in something new being inserted between assemblies. You're also ignoring deduplication that may result in overlapping assemblies (that needs a test case as well). I think this calculation should be instead done above, in the loop that goes over |
||
|
||
return ret; | ||
} | ||
|
||
|
@@ -1776,3 +1788,12 @@ Assembly::OptimiserSettings Assembly::OptimiserSettings::translateSettings(front | |
asmSettings.evmVersion = _evmVersion; | ||
return asmSettings; | ||
} | ||
|
||
void Assembly::updateSubAssemblyStartOffsets(std::vector<LinkerObject::SubAssembly>& _subAssemblies, size_t const _currentBytecodeSize) const | ||
{ | ||
for (auto& subAssembly: _subAssemblies) | ||
{ | ||
subAssembly.start = _currentBytecodeSize - subAssembly.length; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm either missing something or this assumes that all subassemblies overlap and extend to the end of their parent assembly. Does it even work with more than one subassembly? If you have two subassemblies of the same length then you will end up with the same start location for both. And even if they're of different lengths it will be wrong. EDIT: Yeah, you even have a test showing this overlap ( {
"isCreation": false,
"length": 780,
"start": 1007
},
{
"isCreation": true,
"length": 130,
"start": 1657,
"subs": [ There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You should also have some asserts here. At the very least that a subassembly does not stick outside of its parent assembly. |
||
updateSubAssemblyStartOffsets(subAssembly.subs, _currentBytecodeSize); | ||
} | ||
} |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -58,6 +58,16 @@ struct LinkerObject | |||||
/// Bytecode offsets of named tags like function entry points. | ||||||
std::map<std::string, FunctionDebugData> functionDebugData; | ||||||
|
||||||
struct SubAssembly { | ||||||
size_t start; | ||||||
size_t length; | ||||||
bool isCreation; | ||||||
std::string bytecode; | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's not the same thing as
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually, why do you even store it? It's relatively large (on the same order as the size of the whole |
||||||
std::vector<SubAssembly> subs {}; | ||||||
}; | ||||||
|
||||||
std::vector<SubAssembly> subAssemblyData; | ||||||
|
||||||
/// Appends the bytecode of @a _other and incorporates its link references. | ||||||
void append(LinkerObject const& _other); | ||||||
|
||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please add the feature to the CLI too. We should keep them at parity (and it's a pain for testing/development if the only way to access a feature is through StandardJSON). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also, Yul compilation is not covered. Aside from general inconsistency, this makes two-step compilation less powerful, which may be a problem for future parallelization. |
Original file line number | Diff line number | Diff line change | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -237,7 +237,7 @@ bool isArtifactRequested(Json const& _outputSelection, std::string const& _file, | |||||||||||||||||
std::vector<std::string> evmObjectComponents(std::string const& _objectKind) | ||||||||||||||||||
{ | ||||||||||||||||||
solAssert(_objectKind == "bytecode" || _objectKind == "deployedBytecode", ""); | ||||||||||||||||||
std::vector<std::string> components{"", ".object", ".opcodes", ".sourceMap", ".functionDebugData", ".generatedSources", ".linkReferences"}; | ||||||||||||||||||
std::vector<std::string> components{"", ".object", ".opcodes", ".sourceMap", ".functionDebugData", ".generatedSources", ".linkReferences", ".subAssemblyOffsets"}; | ||||||||||||||||||
if (_objectKind == "deployedBytecode") | ||||||||||||||||||
components.push_back(".immutableReferences"); | ||||||||||||||||||
return util::applyMap(components, [&](auto const& _s) { return "evm." + _objectKind + _s; }); | ||||||||||||||||||
|
@@ -374,6 +374,24 @@ Json formatImmutableReferences(std::map<u256, evmasm::LinkerObject::ImmutableRef | |||||||||||||||||
return ret; | ||||||||||||||||||
} | ||||||||||||||||||
|
||||||||||||||||||
Json formatSubAssemblyOffsets(std::vector<evmasm::LinkerObject::SubAssembly> const& _subAssemblyOffsets) | ||||||||||||||||||
{ | ||||||||||||||||||
Json subs = Json::array(); | ||||||||||||||||||
|
||||||||||||||||||
for (auto const& subAssembly: _subAssemblyOffsets) | ||||||||||||||||||
{ | ||||||||||||||||||
Json subAssemblyInfo; | ||||||||||||||||||
subAssemblyInfo["start"] = Json::number_unsigned_t(subAssembly.start); | ||||||||||||||||||
subAssemblyInfo["length"] = Json::number_unsigned_t(subAssembly.length); | ||||||||||||||||||
subAssemblyInfo["isCreation"] = Json::boolean_t(subAssembly.isCreation); | ||||||||||||||||||
if (!subAssembly.subs.empty()) | ||||||||||||||||||
subAssemblyInfo["subs"] = formatSubAssemblyOffsets(subAssembly.subs); | ||||||||||||||||||
subs.emplace_back(subAssemblyInfo); | ||||||||||||||||||
} | ||||||||||||||||||
|
||||||||||||||||||
return subs; | ||||||||||||||||||
} | ||||||||||||||||||
|
||||||||||||||||||
std::optional<Json> checkKeys(Json const& _input, std::set<std::string> const& _keys, std::string const& _name) | ||||||||||||||||||
{ | ||||||||||||||||||
if (!_input.empty() && !_input.is_object()) | ||||||||||||||||||
|
@@ -1235,6 +1253,12 @@ Json StandardCompiler::importEVMAssembly(StandardCompiler::InputsAndSettings _in | |||||||||||||||||
creationJSON["functionDebugData"] = formatFunctionDebugData(stack.object(sourceName).functionDebugData); | ||||||||||||||||||
if (evmCreationArtifactRequested("linkReferences")) | ||||||||||||||||||
creationJSON["linkReferences"] = formatLinkReferences(stack.object(sourceName).linkReferences); | ||||||||||||||||||
if (evmCreationArtifactRequested("subAssemblyOffsets")) | ||||||||||||||||||
{ | ||||||||||||||||||
Json ret = Json::object(); | ||||||||||||||||||
ret["subs"] = formatSubAssemblyOffsets(stack.runtimeObject(sourceName).subAssemblyData); | ||||||||||||||||||
creationJSON["subAssemblyOffsets"] = std::move(ret); | ||||||||||||||||||
} | ||||||||||||||||||
evmData["bytecode"] = creationJSON; | ||||||||||||||||||
} | ||||||||||||||||||
|
||||||||||||||||||
|
@@ -1503,6 +1527,12 @@ Json StandardCompiler::compileSolidity(StandardCompiler::InputsAndSettings _inpu | |||||||||||||||||
creationJSON["linkReferences"] = formatLinkReferences(compilerStack.object(contractName).linkReferences); | ||||||||||||||||||
if (evmCreationArtifactRequested("generatedSources")) | ||||||||||||||||||
creationJSON["generatedSources"] = compilerStack.generatedSources(contractName, /* _runtime */ false); | ||||||||||||||||||
if (evmCreationArtifactRequested("subAssemblyOffsets")) | ||||||||||||||||||
{ | ||||||||||||||||||
Json ret = Json::object(); | ||||||||||||||||||
ret["subs"] = formatSubAssemblyOffsets(compilerStack.object(contractName).subAssemblyData); | ||||||||||||||||||
creationJSON["subAssemblyOffsets"] = std::move(ret); | ||||||||||||||||||
} | ||||||||||||||||||
Comment on lines
+1531
to
+1535
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just FYI, with nlohmann-json you can create JSON objects in place:
Suggested change
We should generally use that style whenever possible, because the intermediate variables does not add anything of value, especially one with a horrible name like |
||||||||||||||||||
evmData["bytecode"] = creationJSON; | ||||||||||||||||||
} | ||||||||||||||||||
|
||||||||||||||||||
|
@@ -1533,6 +1563,12 @@ Json StandardCompiler::compileSolidity(StandardCompiler::InputsAndSettings _inpu | |||||||||||||||||
deployedJSON["immutableReferences"] = formatImmutableReferences(compilerStack.runtimeObject(contractName).immutableReferences); | ||||||||||||||||||
if (evmDeployedArtifactRequested("generatedSources")) | ||||||||||||||||||
deployedJSON["generatedSources"] = compilerStack.generatedSources(contractName, /* _runtime */ true); | ||||||||||||||||||
if (evmDeployedArtifactRequested("subAssemblyOffsets")) | ||||||||||||||||||
{ | ||||||||||||||||||
Json ret = Json::object(); | ||||||||||||||||||
ret["subs"] = formatSubAssemblyOffsets(compilerStack.object(contractName).subAssemblyData); | ||||||||||||||||||
deployedJSON["subAssemblyOffsets"] = std::move(ret); | ||||||||||||||||||
} | ||||||||||||||||||
evmData["deployedBytecode"] = deployedJSON; | ||||||||||||||||||
} | ||||||||||||||||||
|
||||||||||||||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,7 +8,16 @@ | |
"linkReferences": {}, | ||
"object": "<BYTECODE REMOVED>", | ||
"opcodes":"<OPCODES REMOVED>", | ||
"sourceMap":"<SOURCEMAP REMOVED>" | ||
"sourceMap":"<SOURCEMAP REMOVED>", | ||
"subAssemblyOffsets": { | ||
"subs": [ | ||
{ | ||
"isCreation": false, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. By the way, I wonder why this isn't |
||
"length": 87, | ||
"start": 0 | ||
} | ||
] | ||
} | ||
Comment on lines
+12
to
+20
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The input file has two assemblies but the output shows only one. Is this a bug in your feature or an instance of #15725 (because the nested assembly is unreferenced)? |
||
}, | ||
"deployedBytecode": { | ||
"functionDebugData": {}, | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,7 +9,23 @@ | |
"linkReferences": {}, | ||
"object": "<BYTECODE REMOVED>", | ||
"opcodes":"<OPCODES REMOVED>", | ||
"sourceMap":"<SOURCEMAP REMOVED>" | ||
"sourceMap":"<SOURCEMAP REMOVED>", | ||
"subAssemblyOffsets": { | ||
"subs": [ | ||
{ | ||
"isCreation": true, | ||
"length": 130, | ||
"start": 0, | ||
"subs": [ | ||
{ | ||
"isCreation": false, | ||
"length": 104, | ||
"start": 26 | ||
} | ||
] | ||
} | ||
] | ||
} | ||
Comment on lines
+13
to
+28
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually, is Sourcify ok with getting only subassembly locations? I've always been thinking about metadata as a separate subassembly myself, because it's separate I think we may need to include the start and length of CC @kuzdogan. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. By the way, we should have some tests that include There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also, if we had the exact location of metadata, we could create a more robust, Boost-based test that would get the structure info and the bytecode and diff the CBOR bit. It's not easy to spot a problem just looking at the values in command-line tests and we're not even shown the bytecode. |
||
} | ||
} | ||
} | ||
|
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -9,7 +9,23 @@ | |||||||||||||||||||||||||
"linkReferences": {}, | ||||||||||||||||||||||||||
"object": "<BYTECODE REMOVED>", | ||||||||||||||||||||||||||
"opcodes":"<OPCODES REMOVED>", | ||||||||||||||||||||||||||
"sourceMap":"<SOURCEMAP REMOVED>" | ||||||||||||||||||||||||||
"sourceMap":"<SOURCEMAP REMOVED>", | ||||||||||||||||||||||||||
"subAssemblyOffsets": { | ||||||||||||||||||||||||||
"subs": [ | ||||||||||||||||||||||||||
{ | ||||||||||||||||||||||||||
"isCreation": true, | ||||||||||||||||||||||||||
"length": 27, | ||||||||||||||||||||||||||
"start": 0, | ||||||||||||||||||||||||||
"subs": [ | ||||||||||||||||||||||||||
Comment on lines
+13
to
+19
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The top-level assembly is not nested in anything. Seems weird to put its data under Finally, we might end up including more info there later so maybe
Suggested change
I'd use that name for |
||||||||||||||||||||||||||
{ | ||||||||||||||||||||||||||
"isCreation": false, | ||||||||||||||||||||||||||
"length": 3, | ||||||||||||||||||||||||||
"start": 24 | ||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||
] | ||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||
] | ||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, we also need some coverage for optimized compilation. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
{ | ||
"language": "Solidity", | ||
"sources": { | ||
"contract.sol": { | ||
"content": "// SPDX-License-Identifier: GPL-3.0\npragma solidity >=0.7.0 <0.9.0;\ncontract A {}\ncontract Storage { uint256 public number; bytes public code = type(A).creationCode; function store(uint256 num) public { number = num; }}" | ||
} | ||
}, | ||
"settings": { | ||
"evmVersion": "cancun", | ||
"outputSelection": { | ||
"*": { | ||
"Storage": [ | ||
"evm.bytecode.subAssemblyOffsets" | ||
] | ||
} | ||
} | ||
Comment on lines
+10
to
+16
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You're not testing There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also, bad indentation. And I'd put the contract in a separate file. |
||
} | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
{ | ||
"contracts": { | ||
"contract.sol": { | ||
"Storage": { | ||
"evm": { | ||
"bytecode": { | ||
"subAssemblyOffsets": { | ||
"subs": [ | ||
{ | ||
"isCreation": true, | ||
"length": 1787, | ||
"start": 0, | ||
"subs": [ | ||
{ | ||
"isCreation": false, | ||
"length": 780, | ||
"start": 1007 | ||
}, | ||
{ | ||
"isCreation": true, | ||
"length": 130, | ||
"start": 1657, | ||
"subs": [ | ||
{ | ||
"isCreation": false, | ||
"length": 104, | ||
"start": 1683 | ||
} | ||
] | ||
} | ||
] | ||
} | ||
] | ||
} | ||
} | ||
} | ||
} | ||
} | ||
}, | ||
"sources": { | ||
"contract.sol": { | ||
"id": 0 | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be implemented for EOF as well (i.e. in
assembleEOF()
, or, if possible, just inassemble()
covering both with the same code).Since we're at the stage where EOF is passing semantic tests (and they will be enabled by default quite soon), we should start requiring all new features for work on EOF as well.