Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Windows runner for CI workflows #1275

Merged
merged 77 commits into from
Feb 3, 2023

Conversation

RyanL1997
Copy link
Collaborator

Description

Support Windows runner for CI workflows of integration test and Cypress test

Category

Enhancement

Issues Resolved

Testing

GitHub Actions

Check List

  • New functionality includes testing
  • New functionality has been documented
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@RyanL1997 RyanL1997 changed the title Support Windows runner for CI workflows [WIP] Support Windows runner for CI workflows Dec 20, 2022
@RyanL1997
Copy link
Collaborator Author

RyanL1997 commented Dec 20, 2022

This PR is still a WIP. The major blockers are:

  1. The SAML Integration test is failing due to unknown reason.[1]
  2. For the Cypress test, the yarn start --no-base-path & command cannot make the dashboards server up running after the step "Configure and Run OpenSearch Dashboards".[2]

Reference

[1]: https://github.com/RyanL1997/security-dashboards-plugin/actions/runs/3716119921/jobs/6333900876
[2]: https://github.com/RyanL1997/security-dashboards-plugin/actions/runs/3732417653/jobs/6331827892#step:13:84

@cwperks
Copy link
Member

cwperks commented Dec 20, 2022

Is the test getting stuck somewhere? It looks like its reaching the 10min timeout limit:

Timeout - Async callback was not invoked within the 600000 ms timeout specified by jest.setTimeout.Timeout - Async callback was not invoked within the 600000 ms timeout specified by jest.setTimeout.Error:

on this test:

> 287 |   it('Tenancy persisted after Logout in SAML', async () => {

@RyanL1997
Copy link
Collaborator Author

Updates

Here is the log of using:

...
Start-Process yarn -ArgumentList 'pretest:jest_server'
node .\test\run_jest_tests.js --config .\test\jest.config.server.js
...

https://github.com/RyanL1997/security-dashboards-plugin/actions/runs/3752295912/jobs/6374283816#step:17:101.
I think by running like this the IDP server is not running it up, because me and Stephen received the same log when we only used the command: node .\test\run_jest_tests.js --config .\test\jest.config.server.js

@RyanL1997 RyanL1997 force-pushed the windows-support-for-ci branch 2 times, most recently from c95647d to 076b2d8 Compare January 6, 2023 22:47
@RyanL1997
Copy link
Collaborator Author

Updates on 01/06/2023

There are two major issues blocking this PR:

  1. IDP server of SAML integration test couldn't be started on windows runner so that will lead to the failing of integration test, and the workflow will stuck on the running of integration test without any logs by using yarn command. [1]
  2. OS Dashboards server couldn't be started on windows runner, so that will lead to the failing of Cypress test. [2]

For the first issue of IDP server of integration test workflow, instead of using yarn, I also tried to use node .\test\jest_integration\runIdpServer.js & with node .\test\run_jest_tests.js --config .\test\jest.config.server.js [3] and it seems like it will performs the same, but with logs to show that SAML test is the only test case failed [4]. According to these, I will first try to refactor the initialization of IDP server into the SAML test case. If this approach is not feasible, I will set a bypass of the SAML test case in window runner only, and create a follow up issue and documentation to this for future fix.

For the second issue of Cypress workflow, I was assuming that it may need longer time to set up OSD on windows runner. However, according to the experiments me and @scrawfor99 did for using the original yarn command, even we have extended the sleep time to 800 seconds [5], the dashboard server still could't be started on windows runner (see [2]).

On behalf the similarity of the above findings, I guess the way we are using of starting severs on windows runner is wrong. I'm still doing research about this and I will also keep updating the findings.

Reference

[1] : https://github.com/opensearch-project/security-dashboards-plugin/actions/runs/3859072362/jobs/6578260549
[2] : https://github.com/opensearch-project/security-dashboards-plugin/actions/runs/3859072379/jobs/6578260760#step:13:62
[3] : https://github.com/RyanL1997/security-dashboards-plugin/actions/runs/3716119921/workflow#L109-L110
[4] : https://github.com/RyanL1997/security-dashboards-plugin/actions/runs/3716119921/jobs/6333900876#step:17:61
[5] : https://github.com/opensearch-project/security-dashboards-plugin/actions/runs/3859072379/workflow#L89

@peternied
Copy link
Member

As written the SAML IDP is started before the tests run, but this seems backwards to me. The SAML server should be started by the tests that need it so endpoint, state, and other properties can be reviewed and confirmed by the tests. Can we migrate the IDP start up logic into the tests - this side-steps issues of how to start the server in Windows vs Linux environments.

As there is considerable progress towards getting the rest of the tests running, can we handle the SAML tests running on windows in a separate PR to keep up speed, what do you think?

@stephen-crawford
Copy link
Contributor

@RyanL1997 thank you for filing this. I think @peternied's suggestion to migrate the server launch into the tests themselves is a good one. You and I have both tried about every other way of getting the server to launch and I think it is just a matter of something being different on the window's runner versus the Linux which may not be configurable. That being said, great job with all this.

@RyanL1997
Copy link
Collaborator Author

Update 01/09/2023

@cwperks, Here is the findings after we tried to curl the http://localhost:5601 after checkout to the functional test repo [1]:

curl: (7) Failed to connect to localhost port 5601 after 2229 ms: Connection refused

It seems like we lost the server after the checkout.
I also tried to curl the http://0.0.0.0:5601, and it returned me this [2]:

curl: (7) Failed to connect to 0.0.0.0 port 5601 after 0 ms: Address not available

Reference

[1] : https://github.com/RyanL1997/security-dashboards-plugin/actions/runs/3880154925/jobs/6617977487#step:12:30.
[2] : https://github.com/RyanL1997/security-dashboards-plugin/actions/runs/3880184753/jobs/6618034164#step:12:28

@RyanL1997
Copy link
Collaborator Author

RyanL1997 commented Jan 12, 2023

@opensearch-project/security Guys I think I need your support on this. If you can take a look of this comment (#1275 (comment)) I think the server of OSD cannot be carried after the checkout of functional test repo for Cypress Test.

@DarshitChanpura
Copy link
Member

DarshitChanpura commented Jan 12, 2023

Looks like both, Cypress and Integration test, CI checks failed with java.lang.IllegalStateException: failed to load plugin class [org.opensearch.security.OpenSearchSecurityPlugin] during Run Opensearch with Single plugin stage

@stephen-crawford
Copy link
Contributor

@DarshitChanpura, I was trying to help @RyanL1997 remove the SAML test from the Windows side and saw this too. I am looking into it now and will hopefully identify a cause soon.

@peternied peternied mentioned this pull request Jan 12, 2023
3 tasks
@DarshitChanpura
Copy link
Member

DarshitChanpura commented Jan 12, 2023

Seems like this could be a potential root cause:

Caused by: java.lang.NoClassDefFoundError: com/fasterxml/jackson/databind/InjectableValues$Std
	at org.opensearch.security.ssl.OpenSearchSecuritySSLPlugin.<init>(OpenSearchSecuritySSLPlugin.java:195) ~[?:?]

@stephen-crawford
Copy link
Contributor

stephen-crawford commented Jan 12, 2023

@DarshitChanpura, thanks for the help. I see that too on mine. I am actually running the 2.x and main versions of the install workflow on the backend side to see what the results are. It may be as simple as updating the fasterxml dependency if it fails on the backend too.

Update: 2.x is fine but main fails saying a bunch of jackson stuff does not exist. I am going to look at the changes that impacted jackson recently and see where we will need to add it back in.

Update: Now it seems to be working on main again. I am not sure if it is intermittent build changes on the distribution build that is causing the behavior difference but even when the backend is fixed the dashboards repo is still running into the issue. I am going to look more closely at the specific error mentioned above. For the jackson issue I am unsure what we want to elect as the course of action. It seems like we are running into the issue as a result of the core snapshot having moved jackson around (my best guess without following the path further). We can directly depend on jackson ourselves in order to make sure we have the correct version where we expect but this would add an entire dependency and I know that jackson has been a topic for discussion as of late.

@DarshitChanpura
Copy link
Member

Update: Now it seems to be working on main again. I am not sure if it is intermittent build changes on the distribution build that is causing the behavior difference but even when the backend is fixed the dashboards repo is still running into the issue. I am going to look more closely at the specific error mentioned above. For the jackson issue I am unsure what we want to elect as the course of action. It seems like we are running into the issue as a result of the core snapshot having moved jackson around (my best guess without following the path further). We can directly depend on jackson ourselves in order to make sure we have the correct version where we expect but this would add an entire dependency and I know that jackson has been a topic for discussion as of late.

The class is present in jackson-databind 2.14.1 which is what we use: https://fasterxml.github.io/jackson-databind/javadoc/2.14/com/fasterxml/jackson/databind/InjectableValues.Std.html
So I'm not sure where exactly the issue lies. I was able to reproduce it locally. Will look into possible solutions now

@DarshitChanpura
Copy link
Member

DarshitChanpura commented Jan 12, 2023

The class is present in jackson-databind 2.14.1 which is what we use: https://fasterxml.github.io/jackson-databind/javadoc/2.14/com/fasterxml/jackson/databind/InjectableValues.Std.html So I'm not sure where exactly the issue lies. I was able to reproduce it locally. Will look into possible solutions now

Looks like jackson-databind is not shipped with the packaging distribution, neither in core nor in security which is most likely the root cause. Investigating now

@stephen-crawford
Copy link
Contributor

stephen-crawford commented Jan 13, 2023

I also saw that core and security seemed to not be building with the jackson dependency. However, it looks like the classes can be found when you just focus on the backend tests--if you run plugin install workflow, everything is successful on main. I suspect this means that dashboards may be getting a different version of either the security plugin or the core SNAPSHOT.

@DarshitChanpura
Copy link
Member

DarshitChanpura commented Jan 13, 2023

I also saw that core and security seemed to not be building with the jackson dependency. However, it looks like the classes can be found when you just focus on the backend tests--if you run plugin install workflow, everything is successful on main. I suspect this means that dashboards may be getting a different version of either the security plugin or the core SNAPSHOT.

Interesting. I did confirm that the building the package using main branch of security doesn't include jackson-databind, which makes me think that the way we are building artifacts might be a little skewed.

@RyanL1997 RyanL1997 force-pushed the windows-support-for-ci branch 2 times, most recently from d565352 to 549afe5 Compare January 20, 2023 19:57
@codecov-commenter
Copy link

codecov-commenter commented Jan 20, 2023

Codecov Report

Merging #1275 (0c95b14) into main (82c27c7) will not change coverage.
The diff coverage is n/a.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@           Coverage Diff           @@
##             main    #1275   +/-   ##
=======================================
  Coverage   71.78%   71.78%           
=======================================
  Files          88       88           
  Lines        2027     2027           
  Branches      274      274           
=======================================
  Hits         1455     1455           
  Misses        509      509           
  Partials       63       63           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@RyanL1997 RyanL1997 force-pushed the windows-support-for-ci branch 4 times, most recently from 4dd8a09 to 833cdff Compare January 24, 2023 05:51
@RyanL1997
Copy link
Collaborator Author

Update 01/23/2023

I have successfully fixed the Cypress workflow [1]. However, since the original github action for checkout is broken for windows runner [2]. I have came out a work around: instead of checking-out to functional test repo to run the Cypress test, I clone the FTR into current working directory and directly setup and run Cypress in the same step of OSD setup [3].

Reference

[1] : https://github.com/opensearch-project/security-dashboards-plugin/actions/runs/3993203040/workflow
[2] : actions/checkout#702
[3] : https://github.com/opensearch-project/security-dashboards-plugin/pull/1275/files#diff-1c29b5bd518bbbdff17cb10f614dd430171102ccd8c78497bebe5da820657dbbR91

@RyanL1997 RyanL1997 force-pushed the windows-support-for-ci branch 4 times, most recently from 916f111 to 429c71e Compare January 31, 2023 19:20
@RyanL1997
Copy link
Collaborator Author

RyanL1997 commented Jan 31, 2023

Update 01/31/2023

Since the latest distribution build does not have security plugin included, so I'm testing on build version 7028 which includes all the latest changes we made for core and security plugin, and I will change it back to latest once everything is ready.

@cwperks, @peternied, @scrawfor99, and @DarshitChanpura, I have finished some of the refactoring so some of the old reviews of you guys disappeared on the latest change. I also replied some of the comments in the old commits but for better view and understanding I'm manually updating with you guys following:

  1. I have created a action for plugin install and script setup, so that it reduces the redundancy of the commands.
  2. I have deleted all the conditional commands in the action of dashboard installation, since everything is compatible with windows runner.
  3. In the plugin install action I switch to use peternied/download-file@v2 for plugin downloading
  4. For the gecko action, I was checking their repo and it seems like it only has latest as their version control, so that I cannot change it.
  5. For the Firefox action, I have change the latest into v1. However, I cannot change the action for windows, otherwise it will return:

Screenshot 2023-01-31 at 11 17 25 AM

Since it doesn't support for windows, and I can only use my fork for the windows support. (However, due to the IDP server issue, we do not have saml_auth test included in windows runner).

@stephen-crawford
Copy link
Contributor

Great work @RyanL1997. I know this has been a major thorn and it is appreciated that you continue to be diligent about it. All of your comments make sense to me. I would also mention that if you were really motivated, you could fork the gecko action and firefox action for windows and then make your own versioned action which you could refer to (this is not necessary but is an option should you want to go that route). Great job overall!

@RyanL1997
Copy link
Collaborator Author

Great work @RyanL1997. I know this has been a major thorn and it is appreciated that you continue to be diligent about it. All of your comments make sense to me. I would also mention that if you were really motivated, you could fork the gecko action and firefox action for windows and then make your own versioned action which you could refer to (this is not necessary but is an option should you want to go that route). Great job overall!

Thanks for the comment @scrawfor99. That makes sense. The reason we were trying to avoid using @latest is that we are trying to avoid to having breaking changes for their update. Using our own fork is actually a good idea for avoiding this.

uses: peternied/download-file@v2
if: ${{ runner.os == 'Linux' }}
with:
url: https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/${{ inputs.opensearch-version }}/7028/linux/x64/tar/builds/opensearch/plugins/${{ inputs.plugin-name }}-${{ inputs.plugin-version }}.zip
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded build 7028?

Copy link
Collaborator Author

@RyanL1997 RyanL1997 Feb 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was checking with infra team this morning and seems like the latest doesnt include security plugin yet and that's also the reason for the failing of pre-requisite test. I will change it back to latest once everything is ready. This is only for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please link the issue where this is being worked on.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just linked issue in the general comment section.

Copy link
Member

@peternied peternied Feb 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the latest distribution build of 3.0.0 does not have security plugin included, so I'm testing on build version 7028 which includes all the latest changes we made for core and security plugin, and I will change it back to latest once this issue gets fixed opensearch-project/cross-cluster-replication#696

If the security plugin is building on 3.0.0 why don't we have a fresh build of it available? Should we be publishing our own builds to maven for consumption so we aren't stuck because of a separate plugin, or should the build process automatically publish binaries for components that do function?

@RyanL1997 Please create a new issue in opensearch-build and start a thread in our release channel linking that issue, we need to be pushing for a solution to the systemic problem.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: They have fixed the build

.github/actions/install-dashboards/action.yml Outdated Show resolved Hide resolved
.github/workflows/cypress-test.yml Outdated Show resolved Hide resolved
.github/workflows/integration-test.yml Outdated Show resolved Hide resolved
.github/workflows/integration-test.yml Outdated Show resolved Hide resolved
# Browser-action version does not work on Windows
- name: Set up Firefox browser for Windows
if: ${{ runner.os == 'Windows' }}
uses: RyanL1997/setup-browser@main
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any changes in your fork, can you link the commit with the changes you needed? I'm fine using your fork and should use the same fork for both linux/windows workflows

@RyanL1997
Copy link
Collaborator Author

RyanL1997 commented Feb 1, 2023

I don't see any changes in your fork, can you link the commit with the changes you needed? I'm fine using your fork and should use the same fork for both linux/windows workflows

I was forking from the other's fork. I think this is the commit from the original author to fix firefox browser on arm64 on windows runner: RyanL1997/setup-browser@4e4eb11#diff-75000eb32e68d8d07d14a37b002c16b1920c849082945fbaf79c4d263a182e87

@RyanL1997
Copy link
Collaborator Author

Since the latest distribution build of 3.0.0 does not have security plugin included, so I'm testing on build version 7028 which includes all the latest changes we made for core and security plugin, and I will change it back to latest once this issue gets fixed opensearch-project/cross-cluster-replication#696

Copy link
Member

@cwperks cwperks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One qq on yarn add sha.js, but otherwise this looks good to me 🙌

.github/actions/install-dashboards/action.yml Show resolved Hide resolved
Copy link
Member

@peternied peternied left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets merge it, thanks for your hardwork @RyanL1997 !

Copy link
Contributor

@stephen-crawford stephen-crawford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woot woot! Way to go Ryan.

@cwperks
Copy link
Member

cwperks commented Feb 3, 2023

Updating branch protection rules to make integ tests on Ubuntu + Windows a requirement

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Passing Integration tests CI on Windows
6 participants