feat(throttle): extend throttling to API requests #11325

Merged: 1 commit, merged on Sep 12, 2024
1 change: 1 addition & 0 deletions datahub-upgrade/src/main/resources/application.properties
@@ -3,3 +3,4 @@ management.health.neo4j.enabled=false
ingestion.enabled=false
spring.main.allow-bean-definition-overriding=true
entityClient.impl=restli
metadataChangeProposal.throttle.updateIntervalMs=0
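
The property added above sets the throttle refresh interval to zero for the `datahub-upgrade` job, which does not serve API traffic. As a minimal sketch of how a zero interval can be read as "disable the periodic refresh" (the class and method below are hypothetical illustrations, not DataHub's actual implementation):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a scheduler that treats updateIntervalMs <= 0 as "throttle refresh disabled".
public class ThrottleRefreshScheduler {
  private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();

  public void start(long updateIntervalMs, Runnable refreshThrottleState) {
    if (updateIntervalMs <= 0) {
      // Matches the intent of metadataChangeProposal.throttle.updateIntervalMs=0 above:
      // no periodic throttle-state refresh is scheduled for datahub-upgrade.
      return;
    }
    executor.scheduleAtFixedRate(refreshThrottleState, 0, updateIntervalMs, TimeUnit.MILLISECONDS);
  }
}
```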
UpgradeCliApplicationTest.java (datahub-upgrade)
@@ -1,12 +1,18 @@
package com.linkedin.datahub.upgrade;

import static org.testng.AssertJUnit.*;
import static org.testng.Assert.assertEquals;
import static org.testng.Assert.assertFalse;
import static org.testng.Assert.assertNotNull;
import static org.testng.Assert.assertTrue;

import com.linkedin.datahub.upgrade.restoreindices.RestoreIndices;
import com.linkedin.datahub.upgrade.system.BlockingSystemUpgrade;
import com.linkedin.metadata.dao.throttle.NoOpSensor;
import com.linkedin.metadata.dao.throttle.ThrottleSensor;
import com.linkedin.metadata.search.elasticsearch.indexbuilder.ESIndexBuilder;
import javax.inject.Named;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.ActiveProfiles;
import org.springframework.test.context.testng.AbstractTestNGSpringContextTests;
@@ -28,6 +34,10 @@ public class UpgradeCliApplicationTest extends AbstractTestNGSpringContextTests

@Autowired private ESIndexBuilder esIndexBuilder;

@Qualifier("kafkaThrottle")
@Autowired
private ThrottleSensor kafkaThrottle;

@Test
public void testRestoreIndicesInit() {
/*
@@ -46,4 +56,10 @@ public void testBuildIndicesInit() {
assertFalse(
esIndexBuilder.getElasticSearchConfiguration().getBuildIndices().isAllowDocCountMismatch());
}

@Test
public void testNoThrottle() {
assertEquals(
new NoOpSensor(), kafkaThrottle, "No kafka throttle controls expected in datahub-upgrade");
}
}
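
The added `testNoThrottle` case asserts that the bean qualified as `kafkaThrottle` in `datahub-upgrade` is a `NoOpSensor`, i.e. the upgrade job wires no Kafka-lag throttle controls. A hedged sketch of Spring wiring that would satisfy this assertion (the configuration class below is an assumption for illustration, not the PR's actual factory):

```java
import com.linkedin.metadata.dao.throttle.NoOpSensor;
import com.linkedin.metadata.dao.throttle.ThrottleSensor;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Hypothetical fallback wiring: expose a no-op throttle sensor when Kafka throttling is not configured.
@Configuration
public class NoOpThrottleConfiguration {

  @Bean(name = "kafkaThrottle")
  public ThrottleSensor kafkaThrottle() {
    // The test above compares the injected bean to a fresh NoOpSensor instance.
    return new NoOpSensor();
  }
}
```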
37 changes: 19 additions & 18 deletions docs/authorization/policies.md
@@ -146,15 +146,15 @@ These privileges are for DataHub operators to access & manage the administrative

#### Access & Credentials

| Platform Privileges | Description |
|---------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Generate Personal Access Tokens | Allow actor to generate personal access tokens for use with DataHub APIs. |
| Manage Policies | Allow actor to create and remove access control policies. Be careful - Actors with this privilege are effectively super users. |
| Manage Secrets | Allow actor to create & remove Secrets stored inside DataHub. |
| Manage Users & Groups | Allow actor to create, remove, and update users and groups on DataHub. |
| Manage All Access Tokens | Allow actor to create, list and revoke access tokens on behalf of users in DataHub. Be careful - Actors with this privilege are effectively super users that can impersonate other users. |
| Manage User Credentials         | Allow actor to manage credentials for native DataHub users, including inviting new users and resetting passwords.                                                                          |
| Manage Connections | Allow actor to manage connections to external DataHub platforms. |

#### Product Features

@@ -191,15 +191,16 @@ These privileges are for DataHub operators to access & manage the administrative

#### System Management

| Platform Privileges | Description |
|-----------------------------------------------|------------------------------------------------------------------------|
| Restore Indices API[^1]                       | Allow actor to use the Restore Indices API.                              |
| Get Timeseries index sizes API[^1] | Allow actor to use the get Timeseries indices size API. |
| Truncate timeseries aspect index size API[^1] | Allow actor to use the API to truncate a timeseries index. |
| Get ES task status API[^1] | Allow actor to use the get task status API for an ElasticSearch task. |
| Enable/Disable Writeability API[^1] | Allow actor to enable or disable GMS writeability for data migrations. |
| Apply Retention API[^1] | Allow actor to apply retention using the API. |
| Analytics API access[^1] | Allow actor to use API read access to raw analytics data. |
| Manage System Operations | Allow actor to manage system operation controls. |

[^1]: Only active if REST_API_AUTHORIZATION_ENABLED is true
[^2]: DataHub Cloud only
22 changes: 11 additions & 11 deletions docs/deploy/environment-vars.md
@@ -14,21 +14,21 @@ DataHub works.
| `UI_INGESTION_ENABLED` | `true` | boolean | [`GMS`, `MCE Consumer`] | Enable UI based ingestion. |
| `DATAHUB_ANALYTICS_ENABLED` | `true` | boolean | [`Frontend`, `GMS`] | Collect DataHub usage to populate the analytics dashboard. |
| `BOOTSTRAP_SYSTEM_UPDATE_WAIT_FOR_SYSTEM_UPDATE` | `true` | boolean | [`GMS`, `MCE Consumer`, `MAE Consumer`] | Do not wait for the `system-update` to complete before starting. This should typically only be disabled during development. |
| `ER_MODEL_RELATIONSHIP_FEATURE_ENABLED`          | `false` | boolean | [`Frontend`, `GMS`]                     | Enable ER Model Relation Feature that shows Relationships Tab within a Dataset UI.                                          |


## Ingestion

| Variable | Default | Unit/Type | Components | Description |
|-----------------------------------|---------|-----------|-------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `ASYNC_INGEST_DEFAULT` | `false` | boolean | [`GMS`] | Asynchronously process ingestProposals by writing the ingestion MCP to Kafka. Typically enabled with standalone consumers. |
| `MCP_CONSUMER_ENABLED` | `true` | boolean | [`GMS`, `MCE Consumer`] | When running in standalone mode, disabled on `GMS` and enabled on separate `MCE Consumer`. |
| `MCL_CONSUMER_ENABLED` | `true` | boolean | [`GMS`, `MAE Consumer`] | When running in standalone mode, disabled on `GMS` and enabled on separate `MAE Consumer`. |
| `PE_CONSUMER_ENABLED` | `true` | boolean | [`GMS`, `MAE Consumer`] | When running in standalone mode, disabled on `GMS` and enabled on separate `MAE Consumer`. |
| `ES_BULK_REQUESTS_LIMIT` | 1000 | docs | [`GMS`, `MAE Consumer`] | Number of bulk documents to index. `MAE Consumer` if standalone. |
| `ES_BULK_FLUSH_PERIOD` | 1 | seconds | [`GMS`, `MAE Consumer`] | How frequently indexed documents are made available for query. |
| `ALWAYS_EMIT_CHANGE_LOG`          | `false` | boolean   | [`GMS`]                 | Enables always emitting an MCL even when no changes are detected. Used for Time Based Lineage when no changes occur.                                                                 |
| `GRAPH_SERVICE_DIFF_MODE_ENABLED` | `true`  | boolean   | [`GMS`]                 | Enables diff mode for graph writes, using a code path that computes a diff from previous to next state and writes only the changed relationships instead of wholesale deleting and re-adding edges. |

## Caching

Expand Down
1 change: 1 addition & 0 deletions lombok.config
@@ -1,2 +1,3 @@
config.stopBubbling = true
lombok.addLombokGeneratedAnnotation = true
lombok.copyableAnnotations += org.springframework.beans.factory.annotation.Qualifier
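
The added `copyableAnnotations` entry makes Lombok copy `@Qualifier` from annotated fields onto the parameters of its generated constructors, so beans built with `@RequiredArgsConstructor` still receive the correctly qualified dependency (such as the `kafkaThrottle` sensor above). A small illustrative example of the pattern this enables (the service class below is not part of the PR):

```java
import com.linkedin.metadata.dao.throttle.ThrottleSensor;
import lombok.RequiredArgsConstructor;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Component;

// Illustrative only: with copyableAnnotations, @Qualifier is carried onto the generated constructor parameter.
@Component
@RequiredArgsConstructor
public class ThrottleAwareService {

  // Without the lombok.config entry, the qualifier would remain on the field and be ignored
  // during constructor injection, so Spring could inject the wrong ThrottleSensor bean.
  @Qualifier("kafkaThrottle")
  private final ThrottleSensor kafkaThrottle;
}
```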