
Draft APIv2 request and responses #1545

Closed
1 task done
Tracked by #1470
kuzdogan opened this issue Aug 12, 2024 · 17 comments
Comments

@kuzdogan
Member

kuzdogan commented Aug 12, 2024

See #1470

Currently in the process of drafting the API via Spotlight: https://sourcify.stoplight.io/docs/sourcify-apiv2/branches/main

Open questions:

  • Should we have separate "single-file" and "multi-file" endpoints? IMO no, there's no point in providing these APIs if the server will convert them to JSON anyway. The UI using this API can still provide a converter/wrapper for that functionality.
  • In the /verify/metadata endpoint, should we have a separate "metadata" field (required) or just put everything in "files" field?
  • In /verify/etherscan should we make the API key optional or required? Optional meaning by default it uses our own API key. Keep in mind we are easily reaching the rate limits of the API key.
  • What HTTP code to return at the GET verify/{verificationId} endpoint if the verification failed. Still 200 or 500 etc?
  • Biggest Q IMO: What fields to provide the user at the GET /contract/{chainId}/{address} endpoint and in which format?

Action Items:

  • Kaan to check how many of the Etherscan calls are done via our API key vs custom keys.
@manuelwedler
Collaborator

Some comments on this:

Should we have separate "single-file" and "multi-file" endpoints? IMO no, there's no point in providing these APIs if the server will convert them to JSON anyway. The UI using this API can still provide a converter/wrapper for that functionality.

I agree here. The API should not support it, but the UI should have a converter to provide nice UX. If you are programmatically calling our API you are probably also able to create the standard json.

In the /verify/metadata endpoint, should we have a separate "metadata" field (required) or just put everything in "files" field?

In my opinion, to make it unambiguous, we should add a "metadata" field. We would avoid errors in case of multiple metadata files and make it explicit that it is a required parameter.

Then, the question would be if we pass the metadata either as pure JSON, or as string encoded (which would be parsed by the server as it is now). I'm leaning more towards the second option as I see this endpoint mostly as a backward compatibility feature.

What HTTP code to return at the GET verify/{verificationId} endpoint if the verification failed. Still 200 or 500 etc?

500 only if the server experiences an error while handling the request. So it should be a 200 if the server is able to query the status of the provided verificationId. I would also not use the 202 here in case it is pending, and rather return 200 for all statuses. The reason is that you query the status of the job, not the contract as a resource.

Also, I would not return null in case the verification failed. Rather I would return "failed". I think we should avoid mixing types in the same field.
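A minimal sketch of that behavior (assuming an Express-style server; the handler shape and the getVerificationJob lookup are made-up names, not from the draft):

import express from "express";

const app = express();

// Hypothetical job store lookup; the real storage layer is not specified here.
async function getVerificationJob(
  id: string
): Promise<{ status: "pending" | "completed" | "failed" } | null> {
  return null; // placeholder
}

app.get("/verify/:verificationId", async (req, res) => {
  const job = await getVerificationJob(req.params.verificationId);
  if (!job) {
    // Unknown id: the request is wrong, not the server.
    return res.status(404).json({ customCode: "job_not_found" });
  }
  // 200 for every known job; the outcome ("pending" | "completed" | "failed")
  // lives in the body, never in the HTTP status code.
  return res.status(200).json({
    verificationId: req.params.verificationId,
    status: job.status,
  });
});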

What fields to provide the user at the GET /contract/{chainId}/{address} endpoint and in which format?

We can probably mostly orient ourselves by the Blockscout API. The biggest question here is whether we want to return the source code in the response, which will make the response payload quite big. However, in my opinion, it is easier to have just one endpoint that returns everything for a contract. We can still have a status endpoint that only returns the verification status of an address.

@kuzdogan
Member Author

kuzdogan commented Sep 3, 2024

Some discussion points during our call:

  • How do we handle Hardhat output files?

    • Let's have a look at the usage metrics to see how many people use it.
    • We should keep the API lean and offload this parsing to the UI or a client.
  • Should the /metadata endpoint accept an object or a string?

    • Object

To add:

  • An endpoint to get the list of verified contracts on a chain, similar to /files/contracts/any/{chain}

@kuzdogan
Member Author

kuzdogan commented Sep 4, 2024

One important question is what to return in the /contract/{chainId}/{address} endpoint. This endpoint will be the main endpoint to check if a contract is verified on Sourcify, and also to retrieve detailed information about the contract.

By default, the response will contain only the minimal fields:

{
  match: "exact_match" | "match" | null,
  creationMatch: "exact_match" | "match" | null,
  runtimeMatch: "exact_match" | "match" | null,
  chainId: "11155111",
  address: "0xfe34567890123456789012345678901234567890",
  verifiedAt: "2024-07-24T12:00:00Z"
}

In addition, users can ask for other detailed fields via a query parameter, e.g. /contracts/{chainId}/{address}?fields=abi,metadata,name, or get everything but omit some fields: /contracts/{chainId}/{address}?omit=sources
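For illustration, client calls could look like this (the base URL and exact paths are assumptions, not from the draft):

const base = "https://sourcify.dev/server/v2"; // hypothetical base URL
const addr = "0xfe34567890123456789012345678901234567890";

// Minimal response only:
const minimal = await fetch(`${base}/contract/11155111/${addr}`).then((r) => r.json());

// Explicitly request heavier fields:
const detailed = await fetch(`${base}/contract/11155111/${addr}?fields=abi,metadata,name`).then((r) => r.json());

// Or get everything except the sources:
const noSources = await fetch(`${base}/contract/11155111/${addr}?omit=sources`).then((r) => r.json());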

We decided to define each field ourselves and not return a 1-to-1 reflection of the DB schema.

Now the tricky part is how to organize the bytecode information. We have the following data about the contract's bytecodes:

  • onchain runtime bytecode
  • onchain creation bytecode
  • recompiled runtime bytecode
  • recompiled creation bytecode
  • transformations
    (i.e. operations that need to be applied on top of the recompiled bytecodes to reach their onchain counterparts, explained here; see the sketch after this list)
    keep in mind the "positions" and the "values" objects of each of these have different fields in the DB schema
    • runtime transformations and their values:
      • cborAuxdata
      • library
      • immutable
      • callProtection
    • creation transformations and their values:
      • constructorArguments
      • cborAuxdata
      • library
  • artifacts for the runtime bytecode
    • sourceMap
    • cborAuxdata
    • linkReferences
    • immutableReferences
  • artifacts for the creation bytecode
    • sourceMap
    • cborAuxdata
    • linkReferences

The question is how to organize this data in an understandable way. Considering the available information, here's my suggested output structure:

{
  // minimal info, returned always, can't be omitted
  match: "exact_match" | "match" | null,
  creationMatch: "exact_match" | "match" | null,
  runtimeMatch: "exact_match" | "match" | null,
  chainId: "11155111",
  address: "0xfe34567890123456789012345678901234567890",
  verifiedAt: "2024-07-24T12:00:00Z", 
  // bytecode details
  creationBytecode: { // not available if creationMatch is `null`
    onchainBytecode: "0x608060405234801561001057...610150806100206000396000f3",
    recompiledBytecode: "0x608060405234801561001057...610150806100206000396000f3",
    // artifacts from the compiler output
    sourceMap: "368:7042:14:-:0;;;;;;;;;;;;;;;;;;;",
    linkReferences: {
      "/home/home/dotfiles/poof-core/contracts/MerkleTreeWithHistory.sol": {
        "Hasher": [
          {
            "start": 910,
            "length": 20
          },
        ],
      },
    },
    cborAuxdata: {
      "1": {
        "value": "0xa26469706673582212201e80049ede18eadf4ab7f0dea2c32f2375c33b5aef0b1a16cc5223dbc681559364736f6c63430007060033",
        "offset": 5471
      }
    },
    transformations: [
      {
        "id": "sources/lib/MyLib.sol:MyLib",
        "type": "replace",
        "offset": 582,
        "reason": "library"
      },
      {
        "id": "0",
        "type": "replace",
        "offset": 1269,
        "reason": "cborAuxdata"
      },
      {
        "type": "insert",
        "offset": 1322,
        "reason": "constructorArguments"
      }
    ],
    transformationValues: {
      "library": {
        "sources/lib/MyLib.sol:MyLib": "0x40b70a4904fad0ff86f8c901b231eac759a0ebb0"
      },
      "constructorArguments": "0x00000000000000000000000085fe79b998509b77bf10a8bd4001d58475d29386",
      "cborAuxdata": {
        "0": "0xa26469706673582212201c37bb166aa1bc4777a7471cda1bbba7ef75600cd859180fa30d503673b99f0264736f6c63430008190033"
      }
    }
  }, 
  runtimeBytecode: { // not available if runtimeMatch is `null`
    onchainBytecode: "0x608060405234801561001057...610150806100206000396000f3",
    recompiledBytecode: "0x608060405234801561001057...610150806100206000396000f3",
    // artifacts from the compiler output
    sourceMap: "340:4705:14:-:0;;;;685:37..........59:1;4355;:5;;;;;;;4217:150;-1:-1:-1;;;4217:150:8:o;3136:155::-;3194:7;3226:1;3221;:6;;3213:49;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;-1:-1:-1;3279:5:8;;;3136:155::o",
    linkReferences: {
      "contracts/AmplificationUtils.sol": {
        "AmplificationUtils": [
          {
            "start": 3078,
            "length": 20
          },
        ],
      },
      "contracts/SwapUtils.sol": {
        "SwapUtils": [
          {
            "start": 2931,
            "length": 20
          },
        ],
      },
    },
    cborAuxdata: {
      "1": {
        "value": "0xa2646970667358221220fc5c3958995f3020259e8da7b7cfc439316c92fecdab62bef66bd9e7b1c0582864736f6c634300060c0033",
        "offset": 2026
      }
    },
    immutableReferences: {
      "1050": [
        {
          "start": 312,
          "length": 32
        },
        {
          "start": 2631,
          "length": 32
        }
      ]
    },
    // transformations applied to the recompiled bytecode to reach the onchain bytecode
    transformations: [
      {
        "id": "20",
        "type": "replace",
        "offset": 137, // in bytes
        "reason": "immutable"
      },
      {
        "id": "0",
        "type": "replace",
        "offset": 1002,
        "reason": "auxdata"
      }
    ],
    transformationValues: {
      "immutables": {
        "20": "0x000000000000000000000000b5d83c2436ad54046d57cd48c00d619d702f3814"
      },
      "cborAuxdata": {
        "0": "0xa26469706673582212205817b060c918294cc8107069c4fd0a74dbfc35b0617214043887fe9d4e17a4a864736f6c634300081a0033"
      }
    }
  },
  deployment: {
    transactionHash: "0xa0483d0b361b247883268883842e66b248952148898849884884884884884884",
    blockNumber: 35453621,
    transactionIndex: 2,
    deployer: "0x1f9840a85d5aF5bf1D1762F925BDADdC4201F984",
  },
  compilation: {
    sources: {
      "contracts/Storage.sol": { 
        id: 0, // The AST identifier of the source
        content: "// SPDX-License-Identifier: MIT\npragma solidity ^0.8.0;\n\ncontract Storage {\n    uint256 number;\n\n    function setNumber(uint256 newNumber) public {\n        number = newNumber;\n    }\n\n    function getNumber() public view returns (uint256) {\n        return number;\n    }\n}\n",
      },
      "contracts/Owner.sol": {
        id: 1,
        content: "// SPDX-License-Identifier: MIT\npragma solidity ^0.8.0;\n\ncontract Owner {\n    address public owner;\n\n    constructor() {\n        owner = msg.sender;\n    }\n}\n"
      }
    },
    language: "Solidity",
    compilerVersion: "v0.8.12+commit.f00d7308",
    compilerSettings: {},
    name: "Storage",
    fullyQualifiedName: "contracts/Storage.sol:Storage",
  },
  abi: [],
  userDoc: {},
  devDoc: {},
  storageLayout: {},
  metadata: {},
}

@kuzdogan
Member Author

Please do a final short review on the Draft here: https://sourcify.stoplight.io/docs/sourcify-apiv2/branches/main

@marcocastignoli @manuelwedler

@marcocastignoli
Member

marcocastignoli commented Sep 11, 2024

  1. The verificationJob object will not last forever, we have to clean the completed ones. Would it make sense to also include the "expiration date" of a verificationJob?
  2. proxy support is missing (Handling "Minimal Proxy Contract" pattern #1624); I think we should include it
  3. should we include also sources.hashes and ipfs/swarm links in the /v2/contract/chainId/address response?
  4. minor: I see custom_code is used everywhere with the same example "unsupported_chain", but I think we can create two different custom_code: one for /v2/verify/chainId/address fail statuses and another for /v2/verify/verificationId

@marcocastignoli
Member

Also another big doubt: will we support RepositoryVXStorageService as valid read services with apiV2? If that's the case then many fields of the /v2/contract/chainId/address response will not be available

@manuelwedler
Collaborator

To me, this looks good to get external feedback after solving Marco's questions.

  1. The verificationJob object will not last forever, we have to clean the completed ones. Would it make sense to also include the "expiration date" of a verificationJob?

I would have expected that the verification job is kept forever actually.

  3. should we include also sources.hashes and ipfs/swarm links in the /v2/contract/chainId/address response?

I like that

  4. minor: I see custom_code is used everywhere with the same example "unsupported_chain", but I think we can create two different custom_code: one for /v2/verify/chainId/address fail statuses and another for /v2/verify/verificationId

I agree that we can give more detailed error codes, but this was probably out of scope for this draft.

Also another big doubt: will we support RepositoryVXStorageService as valid read services with apiV2? If that's the case then many fields of the /v2/contract/chainId/address response will not be available

Very good point. I think either we disable them as read services, or things will get more complicated.


Lastly, a comment regarding the 500s returned from the /verify endpoints:

I don't think we should return 500s for compiler errors. Compilation should be part of the verification, and therefore it would be shown in the result of the job. I think it is fine to remove the 500s from this specification, as these would be thrown as fallbacks when we encounter bugs on our end. Not sure if we would be able to show custom codes in these cases.

In the end, the verify endpoints should do some basic input validation, and throw 400 errors if this fails. Otherwise they should just start the job and return the job id.
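Sketched out, that flow could look like this (continuing the hypothetical Express app from the earlier sketch; isValidAddress, isSupportedChain, and startVerificationJob are assumed helper names):

// Basic input validation only; compiler errors surface in the job result instead.
app.post("/verify/:chainId/:address", async (req, res) => {
  if (!isValidAddress(req.params.address) || !isSupportedChain(req.params.chainId)) {
    return res.status(400).json({ customCode: "invalid_parameter" });
  }
  // Start the async job and hand the id back right away;
  // clients poll GET /verify/{verificationId} for the outcome.
  const verificationId = await startVerificationJob(req.params, req.body);
  return res.status(202).json({ verificationId });
});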

@marcocastignoli
Member

marcocastignoli commented Sep 11, 2024

I would have expected that the verification job is kept forever actually.

I thought of the job as something ephemeral because in my mind it was related to an entity in a queue (it gets in the queue, it's processed, then deleted). I see your point of keeping it as a database entity, but what are the pros? There is no additional information that we are not already storing.

@manuelwedler
Collaborator

I would have expected that the verification job is kept forever actually.

I thought of the job as something ephemeral because in my mind it was related to an entity in a queue (it gets in the queue, it's processed, then deleted). I see your point of keeping it as a database entity, but what are the pros? There is no additional information that we are not already storing.

It's just difficult to find a point in time when the job info is not needed anymore. Let's imagine someone verifies a contract via the Remix plugin, but closes Remix before the verification job status returned successfully. If this person only opens Remix one month later and the job info has been removed by then, the verification would be shown as failed in the plugin.

@marcocastignoli
Member

Are Etherscan and Blockscout keeping that information forever?

@manuelwedler
Collaborator

I checked Etherscan for a receipt id from one and a half months ago, and it is still returned.

@marcocastignoli
Member

Ok, then I just had the wrong idea about that 😆

@kuzdogan
Member Author

kuzdogan commented Sep 13, 2024

  1. The verificationJob object will not last forever, we have to clean the completed ones. Would it make sense to also include the "expiration date" of a verificationJob?

Let's keep the verificationJobs indefinitely if not forever. The amount of data should be trivial. If it grows big, we can start deleting, e.g. jobs older than 1 year.

  2. proxy support is missing (Handling "Minimal Proxy Contract" pattern #1624); I think we should include it

I actually think there should not be the concept of minimal proxies in the DB. See #1624 and the VerA TG chat.

  3. should we include also sources.hashes and ipfs/swarm links in the /v2/contract/chainId/address response?

I'd say no:

  1. This is already inside the metadata
  2. If someone desperately needs them and doesn't want to use the metadata, the hashes are deterministic and they can calculate them (see the sketch below).

Mayyybe we can return the keccak after we implement the #1615
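For example, recomputing a source file's keccak256 locally is a one-liner (a minimal sketch using ethers v6; the file content is a placeholder):

import { keccak256, toUtf8Bytes } from "ethers";

// The metadata's "sources" section records a keccak256 per file path,
// so a client holding the file content can derive or verify it locally:
const content = "// SPDX-License-Identifier: MIT\npragma solidity ^0.8.0;\n// ...";
const sourceHash = keccak256(toUtf8Bytes(content));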

  4. minor: I see custom_code is used everywhere with the same example "unsupported_chain", but I think we can create two different custom_code: one for /v2/verify/chainId/address fail statuses and another for /v2/verify/verificationId

For now let's keep it like that, but yes, the custom_codes should be an enum.

Also another big doubt: will we support RepositoryVXStorageService as valid read services with apiV2? If that's the case then many fields of the /v2/contract/chainId/address response will not be available

It'd be really difficult to maintain the APIv2 for a filesystem. One, we have the whole concept of the verificationId and its status, so we have to keep it somewhere. Two, all the fields in /v2/contract/{chainId}/{address} are based on the database schema. We don't even have the runtimeMatch/creationMatch concept directly in the current FS structure.

However, the recent use case that came up with the VSCode extension for Sourcify might be a good reason to have filesystem support. Still, I think this will overcomplicate things for a small use case, and we can start thinking about this maybe after we complete the APIv2 MVP.

@RaoulSchaffranek

Hi,

I’ve reviewed the drafts and appreciate the efforts in the new API design, particularly the shift toward standard JSON. Here’s some feedback from a developer tooling perspective:

It would be fantastic if the GET /contract/{chainId}/{address} endpoint could return data in a standard JSON-compliant format. I understand there is additional data you want to include, so this could be a conservative extension of the standard JSON.

This enhancement would be highly beneficial for developer tools. Many start with local development environments directly consuming output from the Solidity compiler or tools like Foundry/Hardhat. As they expand to support remote environments, they must fetch source code from external providers, which introduces complexity. Developers often face challenges with unavailable data due to unverified contracts, network issues, or missing compilation flags for sourcemaps. Sometimes, we even need to recompile contracts to gather necessary information.

Additionally, dealing with the differences between data from external sources like Sourcify or Blockscout and the standard JSON output from solc requires implementing compatibility layers. This process is resource-intensive due to the variety of APIs and the subtle differences in all their dialects that accumulate over time. It seems to me that standard JSON is really the lowest common denominator from which to start.

If Sourcify could provide solc-compliant output directly, it would address these issues by ensuring a consistent format across local and remote environments. This would eliminate the need for extensive compatibility layers, reducing development time and complexity. This would make dev tool integration with Sourcify way simpler.

Just to give you a better feeling of what I mean, here is a small example:

// Response to: GET /contract/{chainId}/{address}
{
  "verifyInput": { 
    // Standard JSON input that was used to verify the contract
    // It's important that this input strictly follows the standard JSON schema
    // and can be directly fed back into the compiler
  },
  "input": {
    // Standard JSON input that was used to recompile the contract 
    // The input should contain additional flags for sourcemaps, and function debug information
    // that is commonly needed by developer tools
  },
  "output": {
    // Standard JSON output that was produced during recompilation
    // including the recompiled bytecodes, source maps, and function debug information
  },
  // Everything below is additional information that cannot be computed from the input/output
  // Redundant information, such as listing recompiled bytecode, should be avoided
  "match": "match",
  "address": "0xDFEBAd708F803af22e81044aD228Ff77C83C935c",
  "chainId": "11155111",
  "deployment": {
    "deployer": "0xDFEBAd708F803af22e81044aD228Ff77C83C935c",
    "blockNumber": 0,
    "transactionHash": "0xb6ee9d528b336942dd70d3b41e2811be10a473776352009fd73f85604f5ed206",
    "transactionIndex": 0
  },
  "verifiedAt": "2024-07-24T12:00:00Z",
  "compilation": {
    // No need to repeat sources, language here. It should be part of the standard json
    "name": "MyContract",
    "compilerVersion": "v0.8.12+commit.f00d7308",
    "fullyQualifiedName": "contracts/MyContract.sol:MyContract"
  },
  "runtimeMatch": "match", // Should this be moved under runtimeBytecode?
  "creationMatch": "match", // Should this be moved under creationBytecode?
  "runtimeBytecode": {
    // No need to repeat source map or recompiled bytecode here. It should be part of the standard json output
    "onchainBytecode": "...",
    "transformations": [ /* ... */],
    "transformationValues": {
  },
  "creationBytecode": {
    // No need to repeat source map or recompiled bytecode here. It should be part of the standard json output
    "onchainBytecode": "...",
    "transformations": [ /* ... */],
    "transformationValues" []
  }
}

@kuzdogan
Member Author

kuzdogan commented Sep 19, 2024

@RaoulSchaffranek Thanks for the suggestion! I really liked the direction of the comment and agree with it. We also had a difficult time figuring out how to organize the fields, and we hadn't thought about it in the input/output way.

For me it's not totally clear what the differences might be between verifyInput and input. Even though we don't save the full std-json input, we should be able to generate it reliably. You are mentioning "JSON input...to verify the contract" vs "JSON input...to recompile the contract", which are essentially the same thing in our pipeline.

Also

The input should contain additional flags for sourcemaps, and function debug information

By additional flags do you mean the outputSelection fields? We actually don't store it inside our DB's compiled_contracts.compilerSettings, but it is consistent across all compilations/verifications, and it is not a required input field (although, if omitted, it will generate empty output).

@RaoulSchaffranek

For me it's not totally clear what the differences might be between verifyInput and input.

By verifyInput I meant a verbatim copy of the standard JSON input that was originally submitted as a request body to the POST /verify/{chainId}/{address} endpoint. In contrast, by input, I was referring to the already instrumented standard JSON you feed to the solidity compiler for recompilation. I'm not sure if there are more differences besides the outputSelection-section. Giving it another thought, it might be confusing and mostly redundant to include both fields in the output.

Even though we don't save the full std-json input, we should be able to generate it reliably.

That would be great.

By additional flags do you mean the outputSelection fields?

Yes, I meant the outputSelection and all other instrumentation you apply to the std-json for the purpose of recompilation (I'm not sure if there are any more transformations involved). Essentially, it would be super handy if I could just take the response from Sourcify and feed it back into the Solidity compiler locally. Maybe it helps to understand our use case: for a source-level Solidity debugger it's not sufficient to fetch the source code from Sourcify; we sometimes want to recompile locally with different optimization settings to get more tractable source mapping. Currently, that's a bit of a hassle because we need to manually convert Sourcify's responses to valid std-json first. My hope is to avoid this extra step.

Another argument in favor of the std-json approach is that it ensures long-term alignment with the Solidity compiler output. If the solc team decides to add new fields to the format, it should be relatively straightforward to add them to the Sourcify response. This might become relevant very soon, when the compiler starts outputting the EthDebug annotations.
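For reference, the round trip this enables might look like the following (a sketch assuming the response shape proposed in this thread and solc-js; the URL is illustrative):

import solc from "solc";

const { input, compilation } = await fetch(
  "https://sourcify.dev/server/v2/contract/11155111/0xDFEBAd708F803af22e81044aD228Ff77C83C935c"
).then((r) => r.json());

// Tweak settings locally, e.g. disable the optimizer for more tractable source maps:
input.settings.optimizer = { enabled: false };

// Load the exact compiler version recorded by Sourcify and recompile locally:
solc.loadRemoteVersion(compilation.compilerVersion, (err, solcAtVersion) => {
  if (err) throw err;
  const output = JSON.parse(solcAtVersion.compile(JSON.stringify(input)));
  // output.contracts[...] now holds locally generated bytecode, source maps, etc.
});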

@kuzdogan
Member Author

Ok, even though we don't save the

verbatim copy of the standard JSON input that was originally submitted as a request body

we will be extracting the fields from there without touching anything, except the outputSelection, which (in theory, lol 😄) should not affect the contents of the output like the bytecode. The compiler can generate huge outputs, so we won't let the user pass their own outputSelection IMO.

But yes, point taken: in our response, the input will be exactly what is fed to the compiler, and the output will contain the compiler's fields as is. Anything else we derive, we'll put in other fields.

Just a small side note for future us (and in case anyone is informed about this): compatibility with the Vyper format. AFAIK the formats are fairly compatible, and the user can expect a different, Vyper-conforming format if language: Vyper.
