Skip to content
This repository has been archived by the owner on Jan 19, 2025. It is now read-only.

LZ_security - topic_rewards/SafeApplyFuncOnAllActiveEpochEndingTopics used the wrong parameters #97

Open
sherlock-admin4 opened this issue Jul 19, 2024 · 22 comments
Labels
Escalation Resolved This issue's escalations have been approved/rejected Medium A Medium severity issue. Reward A payout will be made for this issue Sponsor Confirmed The sponsor acknowledged this issue is valid Will Fix The sponsor confirmed this issue will be fixed

Comments

@sherlock-admin4
Copy link
Contributor

sherlock-admin4 commented Jul 19, 2024

LZ_security

Medium

topic_rewards/SafeApplyFuncOnAllActiveEpochEndingTopics used the wrong parameters

Summary

SafeApplyFuncOnAllActiveEpochEndingTopics used the wrong parameters, can lead to acbi deal with too little or too much topic at a time,
handling too many topics, in the worst case, in a halt of the chain.

Vulnerability Detail

topic_rewards.go GetAndUpdateActiveTopicWeights function calls in the SafeApplyFuncOnAllActiveEpochEndingTopics function,

SafeApplyFuncOnAllActiveEpochEndingTopics function in the last two parameters for topicPageLimit and maxTopicPages

The number of topics that can be processed at a time is limited to topicPageLimit * maxTopicPages :

    if topicsActive == nil || i > maxTopicPages {
        break
    }

But the problem is that call SafeApplyFuncOnAllActiveEpochEndingTopics function used the wrong parameters:

	err = SafeApplyFuncOnAllActiveEpochEndingTopics(ctx, k, block, fn, moduleParams.DefaultPageLimit, moduleParams.DefaultPageLimit)

The caller USES moduleParams.DefaultPageLimit as maxTopicPages

This limits the number of topics to be processed each time: topicPageLimit * topicPageLimit

This can cause acbi/EndBlocker to process too many topics at a time.

acbi/EndBlocker -> rewards.GetAndUpdateActiveTopicWeights -> SafeApplyFuncOnAllActiveEpochEndingTopics -> with error parameters

Another problem is that if DefaultPageLimit > MaxPageLimit, CalcAppropriatePaginationForUint64Cursor function, can let the limit = MaxPageLimit:

func (k Keeper) CalcAppropriatePaginationForUint64Cursor(ctx context.Context, pagination *types.SimpleCursorPaginationRequest) (uint64, uint64, error) {
	moduleParams, err := k.GetParams(ctx)
	if err != nil {
		return uint64(0), uint64(0), err
	}
@>  limit := moduleParams.DefaultPageLimit
	cursor := uint64(0)

	if pagination != nil {
		if len(pagination.Key) > 0 {
			cursor = binary.BigEndian.Uint64(pagination.Key)
		}
		if pagination.Limit > 0 {
			limit = pagination.Limit
		}
		if limit > moduleParams.MaxPageLimit {
@>			limit = moduleParams.MaxPageLimit
		}
	}

	return limit, cursor, nil
}

However, if maxTopicPages = DefaultPageLimit, there is no such restriction,since maxTopicPages is in the outer for loop, the problem is made worse.

Impact

acbi/EndBlocker handling too many topics, in the worst case, in a halt of the chain.

Code Snippet

https://github.com/sherlock-audit/2024-06-allora/blob/main/allora-chain/x/emissions/module/rewards/topic_rewards.go#L206-L213

Tool used

Manual Review

Recommendation

Use the correct parameters

@github-actions github-actions bot added the Medium A Medium severity issue. label Jul 21, 2024
@sherlock-admin3 sherlock-admin3 added Sponsor Confirmed The sponsor acknowledged this issue is valid Will Fix The sponsor confirmed this issue will be fixed labels Aug 7, 2024
@sherlock-admin4 sherlock-admin4 changed the title Tame Tan Seagull - topic_rewards/SafeApplyFuncOnAllActiveEpochEndingTopics used the wrong parameters LZ_security - topic_rewards/SafeApplyFuncOnAllActiveEpochEndingTopics used the wrong parameters Aug 9, 2024
@sherlock-admin4 sherlock-admin4 added the Reward A payout will be made for this issue label Aug 9, 2024
@sherlock-admin3
Copy link
Contributor

Escalate

This issue should be high :

  1. If the DefaultPageLimit is set to a large value, this issue could potentially cause a halt of the chain, which is as severe as issue imsrybr0 - Attacker can slow down / halt the chain by queuing multiple stake removals or delegate stake removals #56.
  2. If the DefaultPageLimit is set to a small value, the remaining topics might not be processed, which would have an impact and severity similar to issue defsec - Pagination method fails to return complete pages for non-consecutive active topic IDs #60. Therefore, this issue should also be considered high severity.

The escalation could not be created because you are not exceeding the escalation threshold.

You can view the required number of additional valid issues/judging contest payouts in your Profile page,
in the Sherlock webapp.

@ZeroTrust01
Copy link

Escalate

This issue should be high :

  1. If the DefaultPageLimit is set to a large value, this issue could potentially cause a halt of the chain, which is as severe as issue imsrybr0 - Attacker can slow down / halt the chain by queuing multiple stake removals or delegate stake removals #56.

  2. If the DefaultPageLimit is set to a small value, the remaining topics might not be processed, which would have an impact and severity similar to issue defsec - Pagination method fails to return complete pages for non-consecutive active topic IDs #60. Therefore, this issue should also be considered high severity.

@sherlock-admin3
Copy link
Contributor

Escalate

This issue should be high :

  1. If the DefaultPageLimit is set to a large value, this issue could potentially cause a halt of the chain, which is as severe as issue imsrybr0 - Attacker can slow down / halt the chain by queuing multiple stake removals or delegate stake removals #56.

  2. If the DefaultPageLimit is set to a small value, the remaining topics might not be processed, which would have an impact and severity similar to issue defsec - Pagination method fails to return complete pages for non-consecutive active topic IDs #60. Therefore, this issue should also be considered high severity.

You've created a valid escalation!

To remove the escalation from consideration: Delete your comment.

You may delete or edit your escalation comment anytime before the 48-hour escalation window closes. After that, the escalation becomes final.

@sherlock-admin4 sherlock-admin4 added the Escalated This issue contains a pending escalation label Aug 11, 2024
@mystery0x
Copy link
Collaborator

Escalate

This issue should be high :

  1. If the DefaultPageLimit is set to a large value, this issue could potentially cause a halt of the chain, which is as severe as issue imsrybr0 - Attacker can slow down / halt the chain by queuing multiple stake removals or delegate stake removals #56.
  2. If the DefaultPageLimit is set to a small value, the remaining topics might not be processed, which would have an impact and severity similar to issue defsec - Pagination method fails to return complete pages for non-consecutive active topic IDs #60. Therefore, this issue should also be considered high severity.

The problem depends heavily on the DefaultPageLimit being configured to a large/small value, which isn't as directly exploitable and straightforward as #56 & #60, which both have more direct attacks making them higher severity. I would classify as medium severity.

@WangSecurity
Copy link

Excuse me for a silly question, but want to confirm since not very familiar with Go, but DefaultPageLimit, MaxPageLimit and topicPageLimit are set by the admin? If not, then by whom?

@ZeroTrust01
Copy link

Excuse me for a silly question, but want to confirm since not very familiar with Go, but DefaultPageLimit, MaxPageLimit and topicPageLimit are set by the admin? If not, then by whom?

Yes. DefaultPageLimit, MaxPageLimit and topicPageLimit are set by the admin.
But there is a mistake in codebase. There should be topicPageLimit and maxTopicPages, instead of topicPageLimit and topicPageLimit. Typically, topicPageLimit is set relatively large, while TopicPages is set relatively small.​

@WangSecurity
Copy link

WangSecurity commented Aug 16, 2024

Fair enough, but as I understand, the code can function perfectly depending on the values set by the admin. Hence, it should be invalid, based on the following rule:

(External) Admin trust assumptions: When a function is access restricted, only values for specific function variables mentioned in the README can be taken into account when identifying an attack path.
If no values are provided, the (external) admin is trusted to use values that will not cause any issues.
Note: if the attack path is possible with any possible value, it will be a valid issue.

Correct me if I'm wrong and the issue is with any admin value.

For now, I believe it's not, hence, planning to reject the escalation cause it asks to increase the severity and invalidate the issue.

@zhaojio
Copy link

zhaojio commented Aug 16, 2024

Fair enough, but as I understand, the code can function perfectly depending on the values set by the admin. Hence, it should be invalid, based on the following rule:

(External) Admin trust assumptions: When a function is access restricted, only values for specific function variables mentioned in the README can be taken into account when identifying an attack path.
If no values are provided, the (external) admin is trusted to use values that will not cause any issues.
Note: if the attack path is possible with any possible value, it will be a valid issue.

Correct me if I'm wrong and the issue is with any admin value.

For now, I believe it's not, hence, planning to reject the escalation cause it asks to increase the severity and invalidate the issue.

admin is certainly trusted, but the code uses the wrong parameter, and the trusted parameter is used twice.

@ZeroTrust01
Copy link

Fair enough, but as I understand, the code can function perfectly depending on the values set by the admin. Hence, it should be invalid, based on the following rule:

(External) Admin trust assumptions: When a function is access restricted, only values for specific function variables mentioned in the README can be taken into account when identifying an attack path.
If no values are provided, the (external) admin is trusted to use values that will not cause any issues.
Note: if the attack path is possible with any possible value, it will be a valid issue.

Correct me if I'm wrong and the issue is with any admin value.

For now, I believe it's not, hence, planning to reject the escalation cause it asks to increase the severity and invalidate the issue.

I cannot agree that it is invalid.

First, this issue does not assume that the admin did anything wrong; rather, it is about discovering an error in the code.

Secondly, I understand your point—despite the error in the code, the admin could potentially mitigate the issue by adjusting the parameters. However, since the number of topics is continuously increasing, the admin cannot fix the problem simply by modifying the DefaultPageLimit parameter.

@WangSecurity
Copy link

Firstly, excuse me for the confusion, I didn't imply any mistake from the admin.

Secondly:

However, since the number of topics is continuously increasing, the admin cannot fix the problem simply by modifying the DefaultPageLimit parameter

Can you share such a scenario?

@ZeroTrust01
Copy link

Firstly, excuse me for the confusion, I didn't imply any mistake from the admin.

Secondly:

However, since the number of topics is continuously increasing, the admin cannot fix the problem simply by modifying the DefaultPageLimit parameter

Can you share such a scenario?

According to the comments in the code(https://github.com/sherlock-audit/2024-06-allora/blob/main/allora-chain/x/emissions/module/rewards/topic_rewards.go#L206-L213):

// default page limit for the max because default is 100 and max is 1000
// 1000 is excessive for the topic query

Let’s assume that the initial value of DefaultPageLimit is 100, and the number of topics has already reached 9990.

The admin notices that the number of topics will exceed DefaultPageLimit * DefaultPageLimit = 100 * 100 = 10,000.
Then, the admin sets the DefaultPageLimit to 1000(max).
However, when the number of topics reaches 100,000, this may start to cause a halt of the chain.

@WangSecurity
Copy link

Oh, excuse me for not noticing the code comments. But the code comments indicate that it's the exact intention to use Default limit instead of Max limit. In this case, I believe it's not an issue or a mistake, it's the exact design of the protocol.

Hence, this should remain invalid, planning to reject the escalation, since it asked to increase the severity, but invalidate the severity.

@ZeroTrust01
Copy link

ZeroTrust01 commented Aug 19, 2024

We’re back to the starting point of the issue.

The issue we pointed out is not about using the default limit instead of the max limit.
PageLimit refers to how many items (topics) are on one page, and Pages refers to how many pages there are. The mistake is that both parameters were using DefaultPageLimit when calling SafeApplyFuncOnAllActiveEpochEndingTopics() in codebase.

err=SafeApplyFuncOnAllActiveEpochEndingTopics(...,moduleParams.DefaultPageLimit,moduleParams.DefaultPageLimit)

This is also why the sponsor confirmed the issue and will fix it.

When I gave the scenario example, I mentioned the max limit and pointed out that even if DefaultPageLimit is set to the max limit, the issue could still arise.
I agree with the leader judge’s decision. This issue may not qualify as “high,” but it is definitely valid.

@WangSecurity
Copy link

But doesn't the report say that the exact problem is that the Default limit is used as Max limit, no?

SafeApplyFuncOnAllActiveEpochEndingTopics function in the last two parameters for topicPageLimit and maxTopicPages
But the problem is that call SafeApplyFuncOnAllActiveEpochEndingTopics function used the wrong parameters
The caller USES moduleParams.DefaultPageLimit as maxTopicPages

Then the report describes different scenarios where the Default limit equals or is bigger than the max limit. So, after reading the report I believe the problem is exactly in the Default limit being used instead of the max limit. The report doesn't say that the problem in both parameters, only the problem in one is mentioned.

Additionally, if we use the max instead of the default, i.e. the correct parameter, the issue is even more likely to occur since max is expected to be 10 times larger than the default.

Hence, my decision remains the same. The code comments indicate that the default instead of Max is used intentionally and the report talks about this problem, not about another parameter. Hence, planning to reject the escalation since it asked for higher severity, but will invalidate this issue.

@ZeroTrust01
Copy link

I think you misunderstood the issue a bit.
Let’s take a look at the definition of the function SafeApplyFuncOnAllActiveEpochEndingTopics().
https://github.com/sherlock-audit/2024-06-allora/blob/4e1bc73db32873476f8b0a88945815d3978d931c/allora-chain/x/emissions/module/rewards/topic_rewards.go#L53

func SafeApplyFuncOnAllActiveEpochEndingTopics(
	ctx sdk.Context,
	k keeper.Keeper,
	block BlockHeight,
	fn func(sdkCtx sdk.Context, topic *types.Topic) error,
	topicPageLimit uint64,
	maxTopicPages uint64,
)

There are two parameters related to the page.
PageLimit refers to how many items (topics) are in one page, and Pages refers to how many pages there are.
So, using DefaultPageLimit for one of the parameters is correct, but using DefaultPageLimit for the other parameter is wrong. This is the issue we pointed out.

@WangSecurity
Copy link

To clarify, I believe using DefaultPageLimit for topicPageLimit is appropriate and works correctly, because we just use the default amount of topics in one page.

I see how you saying that using DefaultPageLimit for maxTopicPages is not correct. But, that's the intention as evidenced in these comments. I understand your concern that it uses the default number of topics in one page as the max number of pages. But, I believe it is exactly what the comments mean. Hence, I still believe it's intended, planning to reject the escalation since it asked for higher severity, but will invalidate this issue.

@ZeroTrust01
Copy link

ZeroTrust01 commented Aug 20, 2024

I see how you saying that using DefaultPageLimit for maxTopicPages is not correct. But, that's the intention as evidenced in these comments. I understand your concern that it uses the default number of topics in one page as the max number of pages. But, I believe it is exactly what the comments mean. Hence, I still believe it's intended, planning to reject the escalation since it asked for higher severity, but will invalidate this issue.

I cannot agree with that point.

1 DefaultPageLimit is 100, when the number of topics are or greater than 10001, GetAndUpdateActiveTopicWeights() will miss active topics.

2 Let’s take another look at what happens inside the SafeApplyFuncOnAllActiveEpochEndingTopics() function.

// Apply a function on all active topics that also have an epoch ending at this block
// Active topics have more than a globally-set minimum weight, a function of revenue and stake
// "Safe" because bounded by max number of pages and apply running, online operations.
func SafeApplyFuncOnAllActiveEpochEndingTopics(
	ctx sdk.Context,
	k keeper.Keeper,
	block BlockHeight,
	fn func(sdkCtx sdk.Context, topic *types.Topic) error,
	topicPageLimit uint64,
	maxTopicPages uint64,
) error {
	topicPageKey := make([]byte, 0)
	i := uint64(0)
	for {
@>>		topicPageRequest := &types.SimpleCursorPaginationRequest{Limit: topicPageLimit, Key: topicPageKey}
@>>		topicsActive, topicPageResponse, err := k.GetIdsOfActiveTopics(ctx, topicPageRequest)
		if err != nil {
			Logger(ctx).Warn(fmt.Sprintf("Error getting ids of active topics: %s", err.Error()))
			continue
		}

		for _, topicId := range topicsActive {
			topic, err := k.GetTopic(ctx, topicId)
			if err != nil {
				Logger(ctx).Warn(fmt.Sprintf("Error getting topic: %s", err.Error()))
				continue
			}

			if k.CheckCadence(block, topic) {
				// All checks passed => Apply function on the topic
				err = fn(ctx, &topic)
				if err != nil {
					Logger(ctx).Warn(fmt.Sprintf("Error applying function on topic: %s", err.Error()))
					continue
				}
			}
		}

		// if pageResponse.NextKey is empty then we have reached the end of the list
		if topicsActive == nil || i > maxTopicPages {
			break
		}
		topicPageKey = topicPageResponse.NextKey
		i++
	}
	return nil
}

DefaultPageLimit is mainly applied in the GetIdsOfActiveTopics() function, which also has an O(n²) sorting operation based on the size of the limit within the function.

func (k Keeper) GetIdsOfActiveTopics(ctx context.Context, pagination *types.SimpleCursorPaginationRequest) ([]TopicId, *types.SimpleCursorPaginationResponse, error) {
	limit, start, err := k.CalcAppropriatePaginationForUint64Cursor(ctx, pagination)
	if err != nil {
		return nil, nil, err
	}

	startKey := make([]byte, binary.MaxVarintLen64)
	binary.BigEndian.PutUint64(startKey, start)
	nextKey := make([]byte, binary.MaxVarintLen64)
	binary.BigEndian.PutUint64(nextKey, start+limit)

@>>	rng, err := k.activeTopics.IterateRaw(ctx, startKey, nextKey, collections.OrderAscending)
	if err != nil {
		return nil, nil, err
	}
	activeTopics, err := rng.Keys()
	if err != nil {
		return nil, nil, err
	}
	defer rng.Close()

	// If there are no topics, we return the nil for next key
	if activeTopics == nil {
		nextKey = make([]byte, 0)
	}

	return activeTopics, &types.SimpleCursorPaginationResponse{
		NextKey: nextKey,
	}, nil
}

This is also why the team dev doesn’t directly sort all totalTopics at once, but instead breaks them into multiple pages for processing—because the pagelimit is relatively small.
Therefore, I believe the comment refers to the first parameter(max means moduleParams.MaxPageLimit which is 1000).

The second parameter, maxTopicPages, can be set relatively large(like 1000) because it does not increase the time complexity of the sorting algorithm. Additionally, when the number of topics is insufficient, topicsActive = nil will directly break the loop.

if topicsActive == nil || i > maxTopicPages {
			break
		}

@relyt29
Copy link

relyt29 commented Aug 21, 2024

We did intentionally set it to moduleParams.DefaultPageLimit, but it was a half-assed hotfix to a real problem to limit it that way. I marked it as "will fix" because this issue highlights bad code that should be fixed because the code is overly complex and poorly written, regardless of whether the issue is a real security bug or not

I would flag it as a confirmed bug, low or medium severity issue, it is true that our administrative parameters can control this, but it also conflates the meaning of DefaultPageLimit, it's bad practice and we should be more carefully setting this with a separate parameter that is just tunable for this particular processing loop alone

@WangSecurity
Copy link

In that case I agree that it should be a valid bug, but I still believe it's medium severity. The admin can partially control the situation, but the issue would arise regularly based on this comment. Planning to reject the escalation and leave the issue as it is.

@WangSecurity
Copy link

Result:
Medium
Unique

@sherlock-admin2 sherlock-admin2 removed the Escalated This issue contains a pending escalation label Aug 22, 2024
@sherlock-admin3 sherlock-admin3 added the Escalation Resolved This issue's escalations have been approved/rejected label Aug 22, 2024
@sherlock-admin4
Copy link
Contributor Author

Escalations have been resolved successfully!

Escalation status:

@sherlock-admin2
Copy link

The protocol team fixed this issue in the following PRs/commits:
allora-network/allora-chain#542

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Escalation Resolved This issue's escalations have been approved/rejected Medium A Medium severity issue. Reward A payout will be made for this issue Sponsor Confirmed The sponsor acknowledged this issue is valid Will Fix The sponsor confirmed this issue will be fixed
Projects
None yet
Development

No branches or pull requests

8 participants