[RFC] CI setups for metrics measurements #6
@grahamwhaley - FYI, I talked with Clark @ openstack and he pointed me to http://lists.openstack.org/pipermail/openstack-dev/2018-March/127972.html The TL;DR is that the existing OpenStack infrastructure doesn't have a set solution at the moment. IMO our best bet is the solution we're currently using/migrating to (that is, a bare-metal slave).
Basically, the current resources donated to the OpenStack community CI system are all virtual machines in (frequently oversubscribed) public cloud providers, so performance across providers, within a single provider, or even on the same virtual machine from one minute to the next, can vary significantly. To have comparable performance between runs we need an entirely different kind of donated server resource (which basically reduces to dedicated bare metal).
@grahamwhaley the proposal looks good to me! And I am sure it's not gonna be a huge deal to set up, but I agree with @fungi that we need to find (gently offered :)) those dedicated bare-metal machines and pin them to the metrics CI.
Hi all (@annabellebertooch @fungi @mnaser @cboylan @egernst @sboeuf et al.).
First, can I check whether the situation with the OSF/Zuul build hosts has changed at all: do we presently have any dedicated bare-metal machines, for instance? (See the thread referenced above: http://lists.openstack.org/pipermail/openstack-dev/2018-March/127972.html)

If not, can we discuss whether there is any method by which we can allocate or schedule any of the cloud hardware/systems to enable a 'whole machine' allocation or 'single VM' allocation for a job, so that we can avoid any noisy-neighbor situations? I will note that internally we are effectively doing this: running the jobs within a fresh-VM-per-build on a bare-metal machine, whilst ensuring there is only ever a single VM running on that host at any one time.

Referencing the previous OpenStack thread above, sure, I understand that the cloud hosts might be oversubscribed and thus such an idea may not have been feasible before, but I thought I'd check the status at least.

And for our other cloud partners (@jessfraz @jon): if you were able to source some dedicated hosts or hardware in your clouds for the Kata metrics CI, that would be great :-)

Thanks everybody!
The OpenStack Zuul instance does not currently have access to dedicated bare-metal instances. @mnaser is probably in the best position to know how feasible setting up "single VM" hypervisors is (as he manages the hypervisors). From Zuul's perspective it is largely just a matter of plugging whatever resources we end up with into Nodepool and consuming them from there. The currently tested drivers (known to work) can support this by talking to OpenStack APIs (specifically Nova, for VMs or bare metal) or via a simple static setup that speaks ssh to a preconfigured server. There is an untested Azure driver as well, but I have no way to gauge how well it would work until we find a way to test and use it.
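The static-driver approach mentioned above (a preconfigured server that Nodepool reaches over ssh) could look roughly like the following. This is a hypothetical sketch only: the provider name, hostname, label, and username are all invented placeholders, not a real deployment.

```yaml
# Hypothetical Nodepool sketch (all names/hosts are placeholders): register
# a donated bare-metal box as a static node so Zuul jobs can consume it.
labels:
  - name: kata-metrics-baremetal

providers:
  - name: static-metrics-provider
    driver: static
    pools:
      - name: main
        nodes:
          - name: metrics-host-01.example.com   # preconfigured server
            labels:
              - kata-metrics-baremetal          # jobs request this label
            username: zuul                      # ssh user for job access
            host-key-checking: false
```

A metrics job would then request the `kata-metrics-baremetal` label, and Nodepool would hand it the whole machine rather than a shared cloud VM.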
@grahamwhaley there is an existing sponsored Kata Containers CI project at Packet, perhaps we can fold this all into one shared effort. Reference https://github.com/WorksOnArm/cluster/issues/31 |
Ah, thanks @vielmetti, I'd not seen that thread/request itself.
@grahamwhaley if you can put a request (this text is fine) in the Works on Arm cluster referenced above, then I can invite you to that project and you can continue on there. Also, if you could cc that request to [email protected] to my attention, that will speed up a whole bunch of other processes to get you on board. Thanks!
For completeness, things have been a bit quiet, so I have also applied through the CNCF program for access to packet.net hardware: |
An update then.
\o/ - great news! 😄 |
Update time. We'll monitor those three PR hook/builds for the next few days, and then, provided they are producing stable results, also enable the runtime and tests repos.
🎆 🎈 Awesome news! 🎆 🎈 Thanks again to @vielmetti and packet.net! 😄 |
Thanks @jodh-intel, and I want to make sure that the CNCF also gets proper credit for this!
@vielmetti - let's discuss offline with @annabellebertooch what can be done to attribute credit appropriately etc. |
I think... we can close this Issue. We have the packet.net machine running the CI metrics. I still have some work to do to try and understand some results 'noise' and slowness on that machine, but that does not require this Issue to be open. Closing now... |
Add a script that will be the **single** source of all static tests run before building Kata Containers components. Initially, it simply runs the `checkcommits` tool from this repository, but it will be extended later to run linters, etc. All other Kata Containers repositories should invoke this script to avoid a proliferation of (different) static-check scripts. Fixes kata-containers#6. Signed-off-by: James O. D. Hunt <[email protected]>
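Such a single entry point might be sketched as below. Only `checkcommits` is named in the PR description; the `kata-lint` name and the overall structure are assumptions for illustration, not the actual script.

```shell
#!/bin/bash
# static-checks.sh: hypothetical sketch of a single entry point for all
# static tests. Only `checkcommits` is mentioned in the PR description;
# `kata-lint` is an invented placeholder for linters to be added later.
set -e

run_check() {
    local tool="$1"; shift
    if command -v "$tool" >/dev/null 2>&1; then
        echo "Running static check: $tool"
        "$tool" "$@"
    else
        echo "Skipping $tool (not installed)"
    fi
}

run_check checkcommits   # commit checker from this repository
run_check kata-lint      # placeholder for future linters
```

Centralizing the checks this way means other repositories only need to invoke one script, so adding a new linter updates every repository's CI at once.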
Hi.
For Clear Containers we have a Jenkins CI system running (inside Intel) that executes a set of metrics to try and detect and report performance regressions. We'd like to discuss options for setting up such a system for Kata.
There are a few requirements and features for such a system - I'll list them out here to help discussion:
/cc @emonty @gnawux for any input from your side, wrt any setups you presently have or infra available etc.