Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: sanity checker for network config #279

Open
wants to merge 15 commits into
base: develop
Choose a base branch
from

Conversation

AlexanderLukin
Copy link
Contributor

Implement a sanity checker that checks that the information stored in the DB corresponds the chain for which the app is run. If the app is run for one chain, and the DB contains data for another chain, the sanity checker prevents the app from start and throws the error.

Implement a sanity checker that checks that the information stored in
the DB corresponds the chain for which the app is run. If the app is run
for one chain, and the DB contains data for another chain, the sanity
checker prevents the app from start and throws the error.
@@ -0,0 +1,4 @@
export const MODULE_ADDRESSES_FOR_CHAINS = {
Copy link
Contributor

@Amuhar Amuhar Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear which module is it . also we use contract addresses from lido-nestjs-modules

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Will rename it to CURATED_MODULE_ADDRESSES_FOR_CHAINS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to get rid of this custom app-level constant and use the REGISTRY_CONTRACT_ADDRESSES constant from the "@lido-nestjs/contracts".


const [dbKey, dbCuratedModule, dbOperator] = await Promise.all([
await this.keyStorageService.find({}, { limit: 1 }),
await this.moduleStorageService.findOneById(1),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to use constant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we agreed, I renamed the findOneById method from the SRModuleStorageService to findOneByModuleId. With the new method name, I don't see any problems with passing just "1" as the argument of this method. I think now it should be obvious that it is a module ID.

@@ -0,0 +1,84 @@
import { MikroORM, UseRequestContext } from '@mikro-orm/core';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets write tests on this service ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, good idea. Will do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented unit tests for the NetworkValidation service.

) {}

@UseRequestContext()
public async validate(): Promise<void> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's break the method into smaller methods

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the logic of checking that chin ID from the env variables, EL chain ID, and CL chain ID match each other to the separate private method. Despite this change, I feel that it doesn't make much sense to split the logic of the validate method into smaller methods and functions. I think that the logic of this method is holistic and it would be harder to understand it if it is artificially split into separate methods and functions.
If you have any particular ideas about what parts of this code should be extracted to a separate method, let's discuss.


if (appInfo != null) {
if (appInfo.chainId !== configChainId || appInfo.locatorAddress !== this.contract.address) {
throw new Error(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets customize this error ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree. I created two new error types: ChainMismatchError and InconsistentDataInDBErrorTypes. The first one is thrown if a chain ID from config, EL chain ID, and CL chain ID don't match each other. The second one is thrown if the chain ID for which the app is run doesn't match the info stored in the DB or data in the DB are corrupted.

}
}

const [dbKey, dbCuratedModule, dbOperator] = await Promise.all([
Copy link
Contributor

@Amuhar Amuhar Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dbKeys, dbOperators?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed, thansk.

export class NetworkValidationService {
constructor(
protected readonly orm: MikroORM,
@Inject(LIDO_LOCATOR_CONTRACT_TOKEN) protected readonly contract: LidoLocator,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets rename contract to locatorContract ?
ALso we have LidoLocatorService , maybe we should get address from there

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed the contract to locatorContract.

As far as using the LidoLocatorService instead of the LidoLocator from "@lido-nestjs/contracts". It has only one getStakingRouter method to get the staking router address. And this method internally logs the contract.address from LidoLocator as the locator address.

I'm not sure, do you think that it is better to use the getStakingRouter method to get the locator address instead of the contract.address? If yes, why?

return;
}

if (dbCuratedModule == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't understand why we check only one table

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we didn't get into this if
if (dbKey.length === 0 && dbCuratedModule == null && dbOperator.length === 0)
it means the DB has non-empty keys, modules, or operators table. If the dbCuratedModule turns out null, the later code where we get dbCuratedModule.stakingModuleAddress will fail with an error. To prevent this we check that the dbCuratedModule is not null before getting the stakingModuleAddress key of this object.

I agree that having only the check for the dbCuratedModule emptiness and not checking dbKey and dbOperator emptiness may be confusing. Will add checkings of non-emptiness for these variables as well and throw the specific error in each of these cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added two extra checks that keys and operators tables are not empty.

);
}

this.logger.log('DB is not empty and chain info is not found in DB, write chain info into DB');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's write a comment explaining in what cases this situation is possible

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I agree. I've added more explanation notes to each important case of the validate method.

AlexanderLukin and others added 14 commits June 8, 2024 00:46
Rename the `findOneById` method of the `SRModuleStorageService` to
`findOneByModuleId`.
Replace the internal `CURATED_MODULE_ADDRESSES_FOR_CHAINS` constant with
curated module addresses with the appropriate
`REGISTRY_CONTRACT_ADDRESSES` constant from `@lido-nestjs` repo.
Remove unnecessary `consensusProviderService` variable and fix lint
errors in unit tests for `NetworkValidation` service.
Move checking that the network ID specified in .env config matches the
EL chain ID one level up in the `validate` method of the
`NetworkValidation` service.
Move checking that chain ID from the config, EL chain ID and CL chain ID
match each other to the separate `checkChainIdMismatch` private method
of the `NetworkValidation` service.
Add two new custom error types (`ChainMismatchError` and
`InconsistentDataInDBError`) with properties that help better indicate
specific of the error. Update test cases to make sure that correct error
types with correct properties are returned in each test case.
Rename `dbKey` local constant to `dbKeys`;
Rename `dbOperator` local constant to `dbOperators`.
Add new specific error types for cases when keys or operators table is
empty. Change error messages to better reflect the essence of the error.
Add new tests for testing these new error types.
Fix test that tests the case when chain ID and EL ID don't match each
other. This test should not depend on the value of the
`VALIDATOR_REGISTRY_ENABLE` env variable.
@Ziars Ziars requested a review from a team as a code owner June 18, 2024 12:23
@Amuhar
Copy link
Contributor

Amuhar commented Jul 9, 2024

I think this solution is mostly good and valid. The code is well-structured and isolated. Changes in the sanity checker will not influence other parts of the code. It contains a lot of checks, but they are straightforward.

It looks valid to check for empty keys, operators, and module tables. Currently, we write data in the table in one transaction, but in future implementations, this can be changed, and checking only the ElMeta table will not be sufficient.

It is important to note that any of the tables can't be empty on Holesky or mainnet. But if we run Lido from scratch on a new network or on devnet, in the beginning, the keys API may have data only in the ElMeta table. In this case, if we stop the Keys API and run it again, the first check of the sanity checker will not be correct as it doesn't check the ElMeta table. It is proposed to add the ElMeta check too, or prioritize the AppInfo values check (will it be valid?). If we prioritize the AppInfo table check during the second run, we can check that the chainId and locator address were not changed.

Is it possible to have an empty AppInfo table and empty keys, operators, modules tables, and a non-empty ElMeta? Only if we run an old release for new Lido deployment.

Also, we need to pay attention to the test coverage of the sanity checker. Are all possible cases well covered?

Why do we need such a difficult solution? If the database already has data, we somehow need to check if it is consistent with the chain we want to run the Keys API. For this case, we check the moduleAddress in the 'key' table.
Also, this solution checks consistency for the case where the database was partially cleaned accidentally.

@AlexanderLukin
Copy link
Contributor Author

Hey-hey! Thank you for your strong attention to this sanity checker and such a detailed and comprehensive review! I believe it's very important for the good product quality.

I agree that in case of the scratch deployment of the Lido protocol on a devnet, it is possible that the keys, operators, and modules table might be empty, but the ElMeta table might not. In this case, the sanity checker may not work correctly and this case should be covered.

I also agree with your point regarding the test coverage. Currently, all cases of the sanity checker code are covered by unit tests. But new implementation with the additional ElMeta emptiness checking must also be covered by new tests.

Also, this solution checks consistency for the case where the database was partially cleaned accidentally.

Just to be clear here. This sanity checker is not dedicated to handling cases when the DB is corrupted by side interventions (i. e. when other applications and processes, not Keys API, corrupt data in the DB somehow). You're correct that the sanity checker in its current form is able to prevent some cases of such corruption, but there are way more edge cases, that are not covered (and the code is not designed to cover them).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants