Add bigquery subscription type support #476

Open
wants to merge 5 commits into master

Conversation

@POD666 commented Oct 11, 2022

Description of your changes

Trying to add support for BigQuery subscriptions. See feature request #469.

Fixes #469

I have:

  • Read and followed Crossplane's contribution process.
  • Run make reviewable test to ensure this PR is ready for review.

How has this code been tested?

✅ make reviewable test
🔴 make e2e
fails with timeouts; it seems to expect a Linux build, but I'm on macOS
🔴 build/run make -j4
fails with unknown flag: --load here https://github.com/crossplane-contrib/provider-gcp/blob/master/cluster/images/provider-gcp/Makefile#L16

I'm not sure how to proceed with testing.

@Feggah (Collaborator) commented Oct 11, 2022

Hey @POD666, thanks for opening the PR!

I'm not sure how to proceed with testing.

Did you mean testing with the unit tests or manual testing in GCP? Can you give me more details so I can help?

@POD666 (Author) commented Oct 11, 2022

@Feggah I failed to run the e2e tests locally.
make e2e fails with a timeout and looks like it expects a Linux build, but a plain make gives me a Darwin build.
I tried running the build in Docker with build/run make -j4, but it fails with a different error.
It looks like your CI skipped e2e too.

Should I check it manually in GCP somehow?

@Feggah (Collaborator) commented Oct 11, 2022

Got it, I think running make reviewable test is enough TBH.


Testing manually is a good idea. Have you tried to use BigQueryConfig in the PubSub subscription?

You can test the following cases:

  1. Create a PubSub subscription specifying BigQueryConfig in your Crossplane cluster and check in GCP that it was configured
  2. Update the BigQueryConfig configuration in the Crossplane cluster and check that it was also updated in GCP
  3. Delete the PubSub subscription in your Crossplane cluster and check that it was deleted in GCP

You can also check if the import is working:

  1. Create a PubSub subscription by hand in GCP; do not forget to configure the BigQueryConfig
  2. Create a PubSub subscription with the same name in your Crossplane cluster (it should import instead of creating a new one)
  3. Check with kubectl describe that the BigQueryConfig field was populated
  4. Delete the PubSub subscription in your Crossplane cluster and check that it was deleted in GCP

@POD666 (Author) commented Oct 12, 2022

I have tried to install it in my GCP project, but it fails with:

cannot initialize parser backend: failed to open package stream file: EOF

I'm not sure what it means.

Here are my steps:

  1. I published my local build to my Docker registry:
docker image tag build-ddce7843/provider-gcp-amd64 us.gcr.io/my-dev/crossplane-danyliuk-gcp-provider:test
docker push us.gcr.io/my-dev/crossplane-danyliuk-gcp-provider:test
  2. kubectl apply -f providers/test.yaml:
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-gcp-danyliuk
spec:
  package: us.gcr.io/my-dev/crossplane-danyliuk-gcp-provider:test
  controllerConfigRef:
    name: default
  3. Now I see it with kubectl get pkg:
NAME                                                         INSTALLED   HEALTHY   PACKAGE                                                  AGE
provider.pkg.crossplane.io/crossplane-provider-gcp-v0-16-0   True        True      crossplane/provider-gcp:v0.20.0                          239d
provider.pkg.crossplane.io/provider-gcp-danyliuk             True        False     us.gcr.io/my-dev/crossplane-danyliuk-gcp-provider:test   18m

Note the HEALTHY=False.
  4. In the logs I see:

cannot initialize parser backend: failed to open package stream file: EOF

I also tried to set this image in the provider.packages list of Crossplane's values.yaml.

It feels like I'm using the wrong build, or it should be published somewhere other than my Docker registry?

Are there any simple instructions I could follow to install my local build? Maybe I missed them.

@Feggah (Collaborator) commented Oct 12, 2022

I find it easier to just run the code locally when I'm developing. You can build, push, and install the controller image, but it's slower than just running locally.

Try this:

  1. Create a local cluster
  2. Install Crossplane
  3. In your develop branch, apply all CRDs generated in this provider: kubectl apply -f package/crds
  4. Run the code: make run

Now you can interact normally with your local cluster (creating, editing and deleting manifests) and the code will be running in your terminal.


If you prefer to build the image and install it, try this:

  1. Create a cluster
  2. Install Crossplane
  3. Run make -j2 build.all in your develop branch
  4. A .xpkg file will be created in the _output folder; change to the directory where the generated file lives and use the Crossplane CLI: kubectl crossplane push provider <image-name>
  5. Install the pushed image to your cluster: kubectl crossplane install provider <image-name>

@POD666 (Author) commented Oct 17, 2022

Well, that was pretty challenging.

I had to create a new cluster because it's impossible to install two different GCP providers 😅
I hit a lot of permission issues, and I also had to publish my Docker image to a public registry.
I also faced issues with kubectl getting stuck when I tried to remove the PubSub topic after removing Crossplane, so I force-removed the resources by deleting the finalizers from their definitions.

In the end, I found a bug, so it was worth testing manually.

I had to rename BigQueryConfig -> BigqueryConfig because of

cannot update Subscription: googleapi: Error 400: Invalid update_mask provided in the UpdateSubscriptionRequest: 'big_query_config' is not a known Subscription field. Note that field paths must be of the form 'push_config' rather than 'pushConfig'., badRequest

to avoid undesired underscores in the CamelCase to snake_case conversion.
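
A minimal illustration of why the rename matters, assuming the update-mask field paths come from a generic CamelCase to snake_case conversion of the Go field name (the converter below is a stand-in, not the provider's actual code):

package main

import (
	"fmt"
	"strings"
	"unicode"
)

// snakeCase stands in for a generic CamelCase-to-snake_case conversion:
// every upper-case letter except the first starts a new underscore-separated word.
func snakeCase(s string) string {
	var b strings.Builder
	for i, r := range s {
		if unicode.IsUpper(r) {
			if i > 0 {
				b.WriteByte('_')
			}
			b.WriteRune(unicode.ToLower(r))
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(snakeCase("BigQueryConfig")) // big_query_config -- rejected by the API
	fmt.Println(snakeCase("BigqueryConfig")) // bigquery_config  -- the path Pub/Sub expects
}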

Signed-off-by: Viktor Danyliuk <[email protected]>
@POD666 (Author) commented Oct 19, 2022

Seems to be ready to merge. Is anything else required?

@Feggah (Collaborator) left a comment

Thanks for your effort, @POD666!

I added some small comments in the code.


In the test file, I don't think you need to create new Test functions, nor a new function to create the Parameters or the Subscription struct. You can just add the BigQuery fields to the existing functions and the tests will cover the rest, the same way all the other fields are tested. See the sketch below.
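
For instance, if the test file builds its fixtures with functional options, a new field only needs a new modifier in an existing case (a sketch of the pattern with hypothetical helper names, not the actual test code):

// sParams builds SubscriptionParameters for a test case; each modifier
// tweaks a shared base fixture, so new fields need no new Test functions.
func sParams(m ...func(*v1alpha1.SubscriptionParameters)) *v1alpha1.SubscriptionParameters {
	p := &v1alpha1.SubscriptionParameters{
		Topic: "projects/example-project/topics/example-topic",
	}
	for _, f := range m {
		f(p)
	}
	return p
}

// A table-driven case can then opt in to the new BigQuery fields:
//
//	sParams(func(p *v1alpha1.SubscriptionParameters) {
//		p.BigqueryConfig = &v1alpha1.BigqueryConfig{Table: "project.dataset.table"}
//	})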

apis/pubsub/v1alpha1/subscription_types.go (resolved thread)
Comment on lines 151 to 168
type BigqueryConfig struct {
// Bigquery table to deliver messages to.
Table string `json:"table,omitempty"`

// When enabled, the topic schema will be used when writing to BigQuery. Else,
// the message bytes are written to a column called data in BigQuery.
UseTopicSchema bool `json:"useTopicSchema,omitempty"`

// When enabled, the metadata of each message is written to additional columns in
// the BigQuery table. Else, the metadata is not written to the BigQuery table.
// https://cloud.google.com/pubsub/docs/bigquery?hl=ru#write-metadata
WriteMetadata bool `json:"writeMetadata,omitempty"`

// When enabled along with the "Use topic schema" option, any field that is present in
// the topic schema but not in the BigQuery schema will be dropped. Else, messages with extra fields are not written
// and remain in the subscription backlog.
DropUnknownFields bool `json:"dropUnknownFields,omitempty"`
}
Feggah (Collaborator):

All these fields are optional, right?

So I think we should add the // +optional marker and change all types to be pointers.
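
A sketch of what that could look like for the three boolean fields (comments abbreviated; Table's requiredness is discussed later in the review):

type BigqueryConfig struct {
	// Bigquery table to deliver messages to.
	Table string `json:"table,omitempty"`

	// When enabled, the topic schema will be used when writing to BigQuery.
	// +optional
	UseTopicSchema *bool `json:"useTopicSchema,omitempty"`

	// When enabled, the metadata of each message is written to additional
	// columns in the BigQuery table.
	// +optional
	WriteMetadata *bool `json:"writeMetadata,omitempty"`

	// When enabled along with UseTopicSchema, fields present in the topic
	// schema but not in the BigQuery schema are dropped.
	// +optional
	DropUnknownFields *bool `json:"dropUnknownFields,omitempty"`
}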

POD666 (Author):

Agreed, I will check.

POD666 (Author):

It works, but it changes existing values to "false". I mean, I had this:

bigqueryConfig:
      table: "..."
      useTopicSchema: true
      writeMetadata: true
      dropUnknownFields: false

changed to

bigqueryConfig:
      table: "..."
      useTopicSchema: false

and all 3 flags became disabled:

[screenshot: all three flags shown as disabled in the GCP console]

Is that expected? I expected writeMetadata to stay enabled, but maybe I'm wrong.

Feggah (Collaborator):

I didn't get it. Do you mean that you specified true in the managed resource manifest and it was changed to Disabled (false) in the console?

@POD666 (Author) commented Oct 20, 2022

you don't need to create new Test functions, nor a new function to create the Parameters or the Subscription struct
You can just add the BigQuery fields to the existing functions and the tests will cover the rest.

I'm not sure. The existing parameters define PushConfig, and it should not be used together with BigQueryConfig.

Am I missing anything?

@POD666 (Author) commented Oct 20, 2022

So, from the 3 comments:

  1. I would keep the BigqueryConfig casing untouched (changing only the JSON tag didn't work, and this casing is used in other codebases anyway)
  2. +optional seems to work, LMK if it's expected; the changes are pushed already.
  3. Tests would need some separation, if I'm not missing anything, because a subscription should not define push and bigquery configs at the same time. Actually, it seems to work.

@Feggah (Collaborator) commented Oct 20, 2022

I would keep the BigqueryConfig casing untouched (changing only the JSON tag didn't work, and this casing is used in other codebases anyway)

Agreed. Looks good to me.

+optional seems to work, LMK if it's expected

I saw that you added the //+optional marker, but you also need to change the types to be pointers -- for example, from bool to *bool. All optional fields have to be pointers; take a look at this example.

@POD666 (Author) commented Oct 23, 2022

Hmm, that seems to make sense.

There are other places where pointers are not used for optional fields:

// +optional
// +kubebuilder:validation:Pattern=^[0-9]*s$
MessageRetentionDuration string `json:"messageRetentionDuration,omitempty"`
// PushConfig is a parameter which configures push delivery. An empty
// `pushConfig` signifies that the subscriber will pull and ack messages
// using API methods.
// +optional
PushConfig *PushConfig `json:"pushConfig,omitempty"`
// RetainAckedMessages is a message which indicates whether to retain acknowledged
// messages. If true, then messages are not expunged from the
// subscription's backlog, even if they are acknowledged, until they
// fall out of the `message_retention_duration` window.
// +optional
RetainAckedMessages bool `json:"retainAckedMessages,omitempty"`

You might need to add a linter rule to prevent that.

I tried using pointers, and it forces the use of gcp.BoolValue/gcp.BoolPtr in some places, but make reviewable test fails with:

pkg/clients/subscription/subscription.go:26: File is not `goimports`-ed with -local github.com/crossplane-contrib/provider-gcp (goimports)
        gcp "github.com/crossplane-contrib/provider-gcp/pkg/clients"
pkg/clients/subscription/subscription_test.go:22: File is not `goimports`-ed with -local github.com/crossplane-contrib/provider-gcp (goimports)
        gcp "github.com/crossplane-contrib/provider-gcp/pkg/clients"
16:27:10 [FAIL]
make[2]: *** [go.lint] Error 1
make[1]: *** [lint] Error 2
make: *** [reviewable] Error 2

Is using pointers here just a memory optimization, or is it required?
I would prefer to merge instead of trying to optimize memory for 3 bool variables 😅
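
For reference, those helpers are thin conversions between *bool on the CRD side and plain bool on the API side; a self-contained sketch of what they presumably do (local stand-ins with assumed signatures, not the actual pkg/clients code):

package main

import "fmt"

// BoolValue mirrors what a helper like gcp.BoolValue presumably does:
// dereference a *bool, treating nil (field unset) as false.
func BoolValue(b *bool) bool {
	if b == nil {
		return false
	}
	return *b
}

// BoolPtr mirrors gcp.BoolPtr: wrap a plain bool in a pointer.
func BoolPtr(b bool) *bool { return &b }

func main() {
	var unset *bool
	fmt.Println(BoolValue(unset)) // false -- unset and explicitly-false look the same after conversion
	fmt.Println(*BoolPtr(true))   // true
}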

@Feggah (Collaborator) commented Oct 23, 2022

Is using pointers here just a memory optimization, or is it required?

That's a pattern we try to follow for all providers, and it isn't only about memory: a pointer lets the controller distinguish an unset field from an explicit false. There are some fields that don't follow this specification, but that could just be a mistake by the reviewer of that PR.

Could you please fix it? It is pretty straightforward 🙏🏼


The error you are getting from make reviewable test is just about the order of the imports in the import ( ) block of the code; it isn't a memory problem.

Check whether your code follows the example below. You can't have package groups mixed together.

[screenshot: import block grouped into separate sections]
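
For example, with goimports -local github.com/crossplane-contrib/provider-gcp, the block is expected to be grouped into standard library, external, and local packages (a sketch; the exact imports depend on the file):

import (
	"context"
	"testing"

	pubsub "google.golang.org/api/pubsub/v1"

	gcp "github.com/crossplane-contrib/provider-gcp/pkg/clients"
)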

@POD666 (Author) commented Oct 24, 2022

Ah, that was just import ordering! I thought it was some kind of cyclic import. My VS Code was sorting import statements by itself, so I was sure they were fine.

Is make reviewable test enough to verify the switch to pointers, or do I need manual testing again?

@Feggah (Collaborator) left a comment

LGTM. I just added a quick suggestion that would be helpful for users (I needed to search myself to get this working).

Comment on lines +152 to +153
// Bigquery table to deliver messages to.
Table string `json:"table,omitempty"`
@Feggah (Collaborator) commented Oct 25, 2022

To make the Table field required (when bigqueryConfig is specified), you need to remove the omitempty JSON tag.

Suggested change:
-// Bigquery table to deliver messages to.
-Table string `json:"table,omitempty"`
+// Bigquery table to deliver messages to.
+// Must be in the format {projectId}.{datasetId}.{tableId}
+Table string `json:"table"`

@Feggah (Collaborator) left a comment

Actually, I was testing a little more and I ran into these strange behaviors:

[screenshot: repeated Update calls and fields not being late-initialized]

Regarding the fields DropUnknownFields, UseTopicSchema and WriteMetadata not being LateInitialized: I believe that's because the if condition (screenshot below) is only entered when bigqueryConfig is nil. If you define any value inside it, the rest won't be LateInitialized. (And we should LateInitialize everything that is not defined in the manifest but is defined in the external API.) See the sketch after the screenshot.

[screenshot: the LateInitialize condition that only runs when bigqueryConfig is nil]
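
A self-contained sketch of late-initializing each nested field independently, using stand-in types rather than the provider's actual signatures:

// SpecBigqueryConfig stands in for the CRD struct (pointer fields, so an
// unset value is distinguishable from false); APIBigQueryConfig stands in
// for the external GCP API struct (plain values).
type SpecBigqueryConfig struct {
	Table             string
	UseTopicSchema    *bool
	WriteMetadata     *bool
	DropUnknownFields *bool
}

type APIBigQueryConfig struct {
	Table             string
	UseTopicSchema    bool
	WriteMetadata     bool
	DropUnknownFields bool
}

// lateInitialize fills in every field the user left unset with the value
// observed in the external API, instead of copying the whole struct only
// when spec is nil (the condition in the screenshot above).
func lateInitialize(spec *SpecBigqueryConfig, observed *APIBigQueryConfig) *SpecBigqueryConfig {
	if observed == nil {
		return spec
	}
	if spec == nil {
		spec = &SpecBigqueryConfig{}
	}
	if spec.Table == "" {
		spec.Table = observed.Table
	}
	if spec.UseTopicSchema == nil {
		v := observed.UseTopicSchema
		spec.UseTopicSchema = &v
	}
	if spec.WriteMetadata == nil {
		v := observed.WriteMetadata
		spec.WriteMetadata = &v
	}
	if spec.DropUnknownFields == nil {
		v := observed.DropUnknownFields
		spec.DropUnknownFields = &v
	}
	return spec
}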

Regarding the PushConfig field being LateInitialized: that's an error, right? It shouldn't exist if the subscription type is BigQuery. Maybe it's the cause of the Update loop; I suggest you start investigating there.


You can make run your code locally with some fmt.Println() calls to understand what is going wrong and which conditions aren't behaving as expected.

Or you can watch the container logs as well. See this documentation about how to run the provider container in debug mode.

@POD666 (Author) commented Nov 1, 2022

I have little context around the expected behavior, but an Update loop of 7 iterations feels wrong.
The first screenshot you shared looks OK to me, though (perhaps the field is LateInitialized with a nil value).

I guess this is something that should be covered by TestLateInitialize. I extended its test cases; see my latest commit.
Please check them and let me know if some test case doesn't match expectations, or if you think I should add other variations.

Anyway, I copied the PushConfig code when implementing BigqueryConfig, so the issue might not be related to my PR (it might require refactoring the whole LateInitialize function).

Successfully merging this pull request may close these issues:

Missing bigqueryConfig for pubsub subscriptions