x86/defconfig: Add kata containers defconfig #5
Conversation
Hi @devimc - as 2000+ lines of kernel config is quite tricky and takes some time to review ;-), can you give us a little background:
Also, I think you may have submitted this PR against the wrong branch. @gnawux - if you have a kernel config expert, please /cc them in here - we'd welcome input to share kernel config knowledge (but I also understand it can be quite a task to do a config compare or review, so maybe we will have to schedule that as a task for later). Generally though, presuming we have tested this kernel in some form, and we get this PR aimed at the correct branch:
Hi @grahamwhaley
(branch updated from 6d45e5d to 4feec8b)
The list is quite long to review. I might have missed something, but I already see the following differences between the cc and hyper kernels:
- initramfs support
- loadable module support
- XEN PV guest
- veth support
We need to get consensus on them before this can be merged.
CONFIG_CGROUP_PERF=y
# CONFIG_CGROUP_DEBUG is not set
CONFIG_SOCK_CGROUP_DATA=y
CONFIG_CHECKPOINT_RESTORE=y
why do we need checkpoint/restore?
@devimc, did we need this as part of the investigation of implementing docker checkpoint using CRIU?
https://docs.docker.com/engine/reference/commandline/checkpoint/
https://github.com/clearcontainers/runtime/blob/master/docs/limitations.md#checkpointandrestore
https://criu.org/Main_Page
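For reference, the experimental docker checkpoint flow that investigation covers looks roughly like this; it needs the daemon's experimental mode and CRIU installed on the host, and the container/checkpoint names here are purely illustrative:

```
# Start a long-running container, checkpoint it, then restore it from the checkpoint.
docker run -d --name looper busybox \
    sh -c 'i=0; while true; do echo $i; i=$((i + 1)); sleep 1; done'
docker checkpoint create looper cp1    # dump the container state via CRIU
docker start --checkpoint cp1 looper   # resume from the saved state
```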
I actually don't think we need it.
@grahamwhaley we have two options to support live migration:
- using libcontainer to migrate only the process
- using qemu micro checkpoint to migrate all the VM
probably neither option will work because of 9p gaps
anyway, I'll disable it
CONFIG_NET_NS=y
CONFIG_SCHED_AUTOGROUP=y
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
and relay? Do you want debugfs?
I suspect we don't need this - anybody know for sure?
We don't explicitly need it, but it may come in as a dependency from something else.
I'll try to disable it
CONFIG_SCHED_AUTOGROUP=y
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
# CONFIG_BLK_DEV_INITRD is not set
Can you please enable CONFIG_BLK_DEV_INITRD support? We want to use initramfs.
Yes, @devimc please enable INITRD.
yes
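For reference, one way to flip this on in the common fragment is the kernel's in-tree scripts/config helper; the path and the choice of RD_GZIP below are assumptions based on this PR's layout, not something decided in the thread:

```
# Enable initramfs support (and gzip-compressed initramfs images) in the fragment.
./scripts/config --file arch/x86/configs/kata_containers_common_defconfig \
    --enable BLK_DEV_INITRD \
    --enable RD_GZIP
```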
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
# CONFIG_MODULES is not set
Can you please enable module support? We'd like to use loadable modules to keep the kernel small.
Ah, here we can have an interesting discussion :-)
We have deliberately not enabled modules before for a couple of reasons:
- size, speed, probing etc.
- where will you store the modules (in the container images or on a volume??), and how will you ensure the modules that live on the host match the kernel that you are running inside the VM?
We have debated this a few times. Quite often a new network or filesystem or feature will be needed by a specific client, and having modules available would be one way to fix that.
If we turn on modules, then we should (see the fragment sketch after this list):
- assess any size/speed impact
- turn on kernel API checking (I believe) so we can try to avoid loading a module into a kernel with a mismatched API
- discuss signing of modules as well?
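For illustration only, the options under discussion map to a Kconfig fragment roughly like the following; the names are from mainline Kconfig, but whether to set them is exactly what is being debated here:

```
# Hypothetical fragment if we decide to turn modules on
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# "kernel API checking": embed symbol CRCs so a mismatched module is refused
CONFIG_MODVERSIONS=y
# module signing, if we want to go that far
CONFIG_MODULE_SIG=y
# CONFIG_MODULE_SIG_FORCE is not set
```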
@grahamwhaley All fair points; that should be handled by osbuilder or the kata-containers user tools for building the image. Provided that kernel module support does not impact boot time for kernels that do not load any modules at all, I think we should enable it.
Currently we store modules in the sandbox rootfs and bind mount it into the container rootfs images. Modules are built together with the guest kernel and compressed into the initramfs.
Optional features are built as modules. A sample list includes crypto, nfs, vsock, ipv6, l2tp etc. They are not essential to run a pod and are not loaded if unneeded.
And providing loadable modules opens up the opportunity to ship guest modules if the default kernel config misses something important to a user. In such a case, we only need to provide additional kernel modules rather than rebuild the entire kernel.
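As a sketch of that flow, assuming an osbuilder-style rootfs staging directory (the path below is illustrative):

```
# Build modules alongside the guest kernel, then stage them into the rootfs
# that later gets packed into the initramfs.
make -j"$(nproc)" bzImage modules
make INSTALL_MOD_PATH=./guest-rootfs modules_install
```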
CONFIG_PARAVIRT=y
# CONFIG_PARAVIRT_DEBUG is not set
# CONFIG_PARAVIRT_SPINLOCKS is not set
# CONFIG_XEN is not set
We'd like to have XEN PV guest support.
ok :)
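A rough fragment for XEN PV guest support could look like the following; option names are from the 4.14 Kconfig, and the exact set still needs to be confirmed against this defconfig:

```
# Illustrative additions for a XEN PV guest
CONFIG_HYPERVISOR_GUEST=y
CONFIG_PARAVIRT=y
CONFIG_XEN=y
CONFIG_XEN_PV=y
```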
# CONFIG_NET_POLL_CONTROLLER is not set
CONFIG_TUN=y
# CONFIG_TUN_VNET_CROSS_LE is not set
# CONFIG_VETH is not set
we need CONFIG_VETH.
@bergwolf You need veth in the guest?
@sameo not for the common case. But we got quite a few docker-in-hyperd and hyperd-in-hyperd requests, which is why we enabled KVM/cgroups support in the guest kernel as well.
@bergwolf Fair enough, thanks.
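For the nested-container case described above, the guest-side change is a one-line fragment:

```
# veth pair support for running containers inside the guest
CONFIG_VETH=y
```

A quick sanity check inside the guest is to create a pair with `ip link add v0 type veth peer name v1` (the interface names are illustrative).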
Since we have https://github.com/kata-containers/osbuilder, IMO, it would be better to maintain the kernel config in it.
@laijs that's a good question. My only concern would be separating the config away from any patches we are carrying. Having said that, if we end up supporting multiple configs for different performance characteristics/feature sets, for instance, then maybe holding those in osbuilder makes sense.
@laijs I believe we should keep a reference config file in the kernel and maintain variants in osbuilder, as you suggest.
@grahamwhaley @sameo Thanks, I got it.
Hi @devimc, we need CONFIG_VMAP_STACK. Thanks,
Hi @devimc, I think the kata config should not be included in the kernel repo, because the kernel repo should just keep the patches that we want to upstream. Thanks,
Have you benchmarked this? Having stacks vmalloc'ed can potentially introduce TLB misses from stack accesses and could hurt performance. cc @mcastelino
(branch updated from 4feec8b to 41e0e79)
After applying the changes suggested by @bergwolf:
@sameo, Thanks,
@teawater so with vmap stack the performance degrades then? Did you check the KVM stats to see if you are seeing much higher EPT violations?
Sorry, I made a language mistake. I meant to say that the performance increased, but I typed that the execution time increased. Here is the complete sysbench memory test result inside a qemu guest: With vmap stack:
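(The raw numbers were not captured in this excerpt. For context, a sysbench memory run of the kind referenced here looks like the following; the block and total sizes are assumptions, since the exact flags used were not posted in the thread.)

```
# Illustrative in-guest memory benchmark invocation
sysbench memory --memory-block-size=1K --memory-total-size=10G run
```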
The following is what we need: Thanks,
@sameo @mcastelino @bergwolf do you agree with @teawater's changes?
(branch updated from 41e0e79 to 6647ef5)
Update
@teawater Why do you need all the serial options? Same question for NFS.
The SCSI changes make sense for sure. We plan to move to SCSI for all block devices. The VMAP change also makes sense, as the data from @teawater seems to indicate that it does not degrade performance. It will give us some density benefits. However, we still need to run I/O benchmarks. @grahamwhaley @GabyCT we should run PnP with VMAP enabled for storage and network I/O. We really care about I/O performance more than anything else.
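As an illustration of the kind of storage I/O spot-check meant here (the actual PnP job definitions are not part of this thread; mount point and job parameters below are made up):

```
# Random read/write job run with and without CONFIG_VMAP_STACK for comparison
fio --name=randrw --directory=/mnt/test --rw=randrw --bs=4k --size=1G \
    --numjobs=4 --time_based --runtime=60 --group_reporting
```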
@sameo We use these kernel options with hyperstart. Thanks,
@teawater I don't think this will be needed with the Kata agent. Could you please double check?
@sameo Yes, we are sure we need them. hyperstart supports NFS volumes. We'd expect the kata agent to support NFS volumes as well. And we need serial ports in case vsock is unavailable, right?
@bergwolf Yes, I figured you'd need NFS volumes, and that is fine with me, especially since it's going to be a module.
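A rough fragment covering the NFS and serial requirements mentioned above might look like this; the exact option list requested was not quoted in the thread, so treat these as assumptions:

```
# NFS client as a module, serial console as a fallback when vsock is unavailable
CONFIG_NFS_FS=m
CONFIG_NFS_V4=m
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_VIRTIO_CONSOLE=y
```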
@egernst Yes, that's the idea.
@sameo - I think that's a very good idea. I've raised kata-containers/packaging#8 (and kata-containers/packaging#7 for qemu config). /cc @grahamwhaley.
@sameo a script in packaging or here?
Personally, I think such scripts should live in the packaging repo as that's where they will be used. It also allows repos like this (and qemu) to remain as "vanilla" as possible.
@devimc @jodh-intel In packaging, yes. Also, see:
(branch updated from f95bdc7 to f4e94c0)
@kata-containers/linux can we merge this PR?
Looks like this is mergeable @devimc, although the discussion about having a config-generating script in packaging did not seem to conclude.
Any update on this?
btw it would be good to have the steps described in this PR documented somewhere.
tbh, I think we could start by simply having the kernel config in the packaging area, as we did for Clear Containers:
I am wondering if it's better to put the kernel config in the kernel repo, and the qemu config in the qemu repo. This would allow us to compile our own kernel/qemu without the packaging repo.
I can understand that viewpoint too. But I think as long as we clearly document the process, either would work. It's worth pointing out that we already have precedent for the hypervisor though: https://github.com/kata-containers/packaging/blob/master/scripts/configure-hypervisor.sh
That's reasonable. However, as linux and qemu are forked from upstream, we'd better keep the stuff that is not intended to go upstream outside the repo. A packaging script works too, as mentioned by @jodh-intel.
@gnawux @jodh-intel That's fine. I believe we can eliminate the confusion by providing some good documentation 😄
kata_containers_common_defconfig is based on the Clear Containers config
https://github.com/clearcontainers/packaging/blob/master/kernel/kernel-config-4.9.x
and was adapted to the 4.14.13 kernel by just accepting the default configurations.

For example, to compile an x86 kernel with KVM enabled:

```
cat arch/x86/configs/kata_containers_common_defconfig arch/x86/configs/kata_containers_kvm_defconfig > arch/x86/configs/kata_containers_defconfig
make ARCH=x86 kata_containers_defconfig
make -j8
```

and to compile an x86 kernel with XEN enabled:

```
cat arch/x86/configs/kata_containers_common_defconfig arch/x86/configs/kata_containers_xen_defconfig > arch/x86/configs/kata_containers_defconfig
make ARCH=x86 kata_containers_defconfig
make -j8
```

fixes kata-containers#3

Signed-off-by: Julio Montes <[email protected]>
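As a side note, an alternative to concatenating the fragments with `cat` is the kernel's in-tree merge script, which warns when fragments redefine the same symbol. This is only a sketch of that approach, not what the PR itself uses:

```
# Merge the fragments into .config without running a full make, then fill in defaults.
scripts/kconfig/merge_config.sh -m \
    arch/x86/configs/kata_containers_common_defconfig \
    arch/x86/configs/kata_containers_kvm_defconfig
make ARCH=x86 olddefconfig
```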
(branch updated from f4e94c0 to 1355353)
@grahamwhaley @jodh-intel @gnawux @laijs what about the PR proposed by @devimc in kata-containers/packaging#18?
Thanks for the pointer @jcvenegas. @devimc - do we need to add the