sprint 2012 07 26

Sprint 0726

Objective: Establish process for future sprints

Following Sprint: sprint 2012-08-02

Crowbar Refactoring Coordination Efforts

Tuesday 7/31 @ 10am CDT. Voice & Screen Share: https://join.me/dellcrowbar

Unfortunately, this session was not recorded :(

Database & Rails3 Background

Nav & Documentation
Barclamps
- replace catalog
- Proposals attributes & elements (roles)
Nodes
Networking Models (future brief)

Using migrations to create the models - migrations will show up in each barclamp's db/migrate directory

CMDB Abstraction Layer

So far, we've built good practices w/ Chef and been able to create orchestration models. However, Chef is not alone in the CMDB space. Now that we understand the problem better, we're able to start decoupling from Chef and drive any CMDB.

"Clawhammer" can yank out and bash in things from CMDB

Attribute maps will help figure out which Crowbar information needs to be synchronized into each CMDB

for example, the number of cores on a box would be yanked out of the CMDB and stored in Crowbar DB
there is the concept of generic (default attribute map) and custom attributes (injected attribute map)
each barclamp needs to be able to enhance the attribute maps
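
A minimal sketch of how attribute maps could work, assuming a map is simply a dictionary from a Crowbar attribute name to a slash-separated path inside the CMDB's node data. The map layout, the paths, and the function names (build_attribute_map, dig, pull_attributes) are all hypothetical illustrations, not Crowbar's actual design.

```python
from typing import Any, Dict, Optional

# Hypothetical generic (default) attribute map:
# Crowbar attribute name -> path inside the CMDB's node data.
DEFAULT_ATTRIBUTE_MAP: Dict[str, str] = {
    "cpu_cores": "cpu/total",
    "memory_kb": "memory/total",
}


def build_attribute_map(default_map: Dict[str, str],
                        *injected_maps: Dict[str, str]) -> Dict[str, str]:
    """Combine the default map with maps injected by barclamps.

    Later maps win on a name clash, so a barclamp can both add
    attributes and redirect an existing one.
    """
    merged = dict(default_map)
    for injected in injected_maps:
        merged.update(injected)
    return merged


def dig(data: Dict[str, Any], path: str) -> Optional[Any]:
    """Follow a slash-separated path through nested dicts; None if missing."""
    current: Any = data
    for key in path.split("/"):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def pull_attributes(cmdb_node: Dict[str, Any],
                    attribute_map: Dict[str, str]) -> Dict[str, Any]:
    """Yank only the mapped values out of the CMDB data."""
    return {name: dig(cmdb_node, path) for name, path in attribute_map.items()}


# Made-up CMDB node data and a made-up map injected by a RAID barclamp.
cmdb_node = {
    "cpu": {"total": 8},
    "memory": {"total": 16384},
    "crowbar_wall": {"raid": {"status": "ok"}},
    "lots_of_other_data": {"ignored": True},
}
raid_map = {"raid_status": "crowbar_wall/raid/status"}

attribute_map = build_attribute_map(DEFAULT_ATTRIBUTE_MAP, raid_map)
crowbar_record = pull_attributes(cmdb_node, attribute_map)
```

Only the three mapped values end up in crowbar_record; everything else stays in the CMDB, which fits the goal of storing the minimum amount of data in Crowbar.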

Does this create a problem with data synchronization? How much data should Crowbar store?

We want to store the minimum amount of data.
In the past, we needed very little information from Chef to make good decisions
Challenge for us with Chef has been that "last write wins" and the node always owns the attributes
- We had to push everything into Roles because that was the only thing that we could manage
- This allowed us to manage the node lists, but caused problems
We had to store a lot of Node information in the Roles. We thought this was a problem
Crowbar has a need to become the source of truth - that's the key element of Orchestration
- This is an important factor (Rob spoke on this, did not write it down)
Orchestration Machine
- Asynch queue is core element
- Notifications
- Late-binding is critical to orchestration (http://robhirschfeld.com/2012/07/25/ops-late-binding-is-critical-best-practice-and-key-to-crowbar-differentiation/ )
Message Queue
- running w/ a database
- could run from multiple nodes
- serviced by a daemon - not related to the web UI, so it does not request state changes
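
As a rough illustration of a queue that runs w/ a database and is drained by a daemon, here is a sketch using Python's standard sqlite3 module. The jobs table, its columns, and the function names are assumptions made for this example only; the notes do not say how the real queue is built.

```python
import sqlite3


def open_queue(path=":memory:"):
    """Open (and create if needed) a hypothetical jobs table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               payload TEXT NOT NULL,
               state TEXT NOT NULL DEFAULT 'queued'
           )"""
    )
    conn.commit()
    return conn


def enqueue(conn, payload):
    """Producer side: record the request and return its id."""
    cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    conn.commit()
    return cur.lastrowid


def claim_next(conn):
    """Daemon side: claim the oldest queued job, or return None.

    The UPDATE re-checks state='queued', so if another worker claimed
    the row between our SELECT and UPDATE, we back off instead of
    running the job twice.
    """
    with conn:  # commits on success, rolls back on exception
        row = conn.execute(
            "SELECT id, payload FROM jobs "
            "WHERE state = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        updated = conn.execute(
            "UPDATE jobs SET state = 'running' "
            "WHERE id = ? AND state = 'queued'",
            (row[0],),
        )
        if updated.rowcount != 1:
            return None
        return row


def complete(conn, job_id):
    """Daemon side: mark a claimed job as finished."""
    with conn:
        conn.execute("UPDATE jobs SET state = 'done' WHERE id = ?", (job_id,))


conn = open_queue()
first = enqueue(conn, "apply proposal: ntp")
second = enqueue(conn, "apply proposal: glance")
claimed = claim_next(conn)
complete(conn, claimed[0])
```

Because the state lives in the database rather than in one process, workers on several nodes could run claim_next against a shared database; sqlite3 is used here only to keep the sketch self-contained.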

Greg Althaus on how Crowbar doesn't use chef in the "normal" way.

Crowbar is automated orchestration. This is different from how most people use chef. Rundeck and similar tools do some of this, but they are more an automation of what people do to drive chef.

Crowbar has two problems dealing with chef.

Chef does not have synchronized access to its data elements. Last writer wins. In "normal" operations, I edit a node, then later it runs chef-client or I run chef-client and all is fine. In crowbar, we, at times, almost guarantee writing the node object as a chef-client run is going on. This creates "badness".
This badness usually takes the form of losing run_list elements or losing attributes set on the node. So, this leads to the first automation trick we use to control chef runs - the node-role. This is a role named for the node. It contains attributes and run_lists for that node. Crowbar edits this freely because it is the owner of this object. The only time the crowbar app edits the node object is to create the first run list for the node at discovery time. This adds the node-role to the node and the crowbar app never touches the node again.
Some recipes (ohai, ipmi, raid, provisioner with UEFI changes) write on the node object from recipes. This is used for status and inventory. The attribute spaces reserved for these operations are crowbar_wall and crowbar_ohai.
Regardless of names, the recipes write on the node object to return recipe status (crowbar_wall => raid status and ipmi status live here) or ohai extensions (crowbar_ohai => Switch config lives here).
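
The lost-update problem and the node-role workaround can be sketched with a toy store. LastWriteWinsStore is a stand-in written for this example, not the Chef server API, and the node name d00 and role name crowbar-d00 are made up.

```python
import copy


class LastWriteWinsStore:
    """Toy object store with no locking or versioning.

    save() replaces the whole object, so whoever saves last wins.
    """

    def __init__(self):
        self._objects = {}

    def load(self, name):
        return copy.deepcopy(self._objects.get(name, {}))

    def save(self, name, obj):
        self._objects[name] = copy.deepcopy(obj)


store = LastWriteWinsStore()
store.save("node/d00", {"run_list": ["role[crowbar-d00]"], "crowbar_wall": {}})
store.save("role/crowbar-d00", {"run_list": [], "attributes": {}})

# The "badness": Crowbar edits the node while a chef-client run is in flight.
client_copy = store.load("node/d00")       # chef-client run starts
crowbar_copy = store.load("node/d00")      # Crowbar edits at the same time
crowbar_copy["run_list"].append("role[ntp-client]")
store.save("node/d00", crowbar_copy)
client_copy["crowbar_wall"]["raid"] = "ok"
store.save("node/d00", client_copy)        # run ends, overwriting Crowbar's edit
lost = "role[ntp-client]" not in store.load("node/d00")["run_list"]

# The node-role trick: Crowbar only writes the role it owns.
role = store.load("role/crowbar-d00")
role["run_list"].append("role[ntp-client]")
store.save("role/crowbar-d00", role)
client_copy = store.load("node/d00")       # a later chef-client run
client_copy["crowbar_wall"]["ipmi"] = "ok"
store.save("node/d00", client_copy)        # touches the node, not the role
kept = "role[ntp-client]" in store.load("role/crowbar-d00")["run_list"]
```

In the first half, lost ends up True because the chef-client save replaces the node Crowbar had just edited. In the second half, kept is True because the two writers never save the same object.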
The second usage problem is that most people edit the attributes.rb of the recipe and upload that into their cookbook for customization. While we could do this, it doesn't allow for multiple instances of the cookbook being deployed in the cloud. It also means that you have to upload/change recipes for configuration changes. This felt "bad" for a system that wanted to use unmodified recipes (yes, that was a goal in the olden days, though not really realized, hopefully with attribute driven recipes shortly).
This leads to the proposal role. This role basically overrides the attributes in the attributes.rb file of a recipe, which allows customization. The role's attributes are overlaid on the recipe's defaults.
This also has the side effect of treating recipes/roles/crowbar as C++ style objects. <tilt head just right>
The recipes are the class methods with the proposal roles being the data for an instance of that class and the crowbar barclamp pieces being the factory for those instances. <squint, no I mean it, squint>
Add on to that the element roles (e.g. glance-server, nova-multi-compute, ....), you get reasonable orchestration.
The element roles are mini-run-lists for subsets of function.
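
To make the object analogy concrete, here is a sketch in which a class plays the unmodified recipe, a small object plays the proposal role, and a factory plays the barclamp. Every class name, attribute key, and role name below is invented for illustration; none of it is Crowbar or Chef code.

```python
import copy


def deep_merge(base, override):
    """Return base with override laid on top, recursing into nested dicts."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


class GlanceRecipe:
    """Plays the recipe: shared behaviour plus attributes.rb-style defaults."""

    DEFAULT_ATTRIBUTES = {"glance": {"api": {"port": 9292}, "verbose": False}}

    @classmethod
    def effective_attributes(cls, proposal_role):
        # The proposal role overlays the defaults; the defaults stay untouched.
        return deep_merge(cls.DEFAULT_ATTRIBUTES, proposal_role.attributes)


class ProposalRole:
    """Plays the proposal role: the data for one instance."""

    def __init__(self, name, attributes):
        self.name = name
        self.attributes = attributes


class GlanceBarclamp:
    """Plays the barclamp: a factory for proposal roles."""

    def create_proposal(self, instance_name, attributes):
        return ProposalRole("glance-config-" + instance_name, attributes)


barclamp = GlanceBarclamp()
prod = barclamp.create_proposal("prod", {"glance": {"verbose": True}})
test = barclamp.create_proposal("test", {"glance": {"api": {"port": 9393}}})

prod_attrs = GlanceRecipe.effective_attributes(prod)
test_attrs = GlanceRecipe.effective_attributes(test)
```

Two instances are configured from one unmodified "recipe", which is the multiple-instance case that editing attributes.rb directly could not cover.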

So, nova-multi-controller is a run-list that pulls in the correct recipes or other roles to build a nova controller node. These are short-cut lists so that the master run-list in the node-role can be easily updated to add desired function.

nova-multi-controller is a role that has a run-list ....
The element roles are reflected in the UI and APIs as things that nodes can be added to when creating or updating a proposal. They are logical sub-functions of a barclamp service. e.g. glance-server, ntp-client, nova-multi-compute, swift-proxy ....
The crowbar framework has some centralized routines to manage the run_list and the attributes in the node-role; the proposal roles are just the current holders of config (big jsonized hashes and lists).
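
The way element roles act as mini-run-lists can be sketched as a recursive expansion. The role contents, recipe names, and the node-role name crowbar-d00 are invented for this example, and the expansion is a simplification written here, not Chef's own implementation.

```python
# Hypothetical role definitions: role name -> run-list.
ROLES = {
    "crowbar-d00": ["role[nova-multi-controller]", "role[ntp-client]"],
    "nova-multi-controller": [
        "recipe[nova::api]",
        "recipe[nova::scheduler]",
        "role[ntp-client]",
    ],
    "ntp-client": ["recipe[ntp]"],
}


def expand_run_list(run_list, roles):
    """Flatten a run-list into an ordered list of recipes.

    Roles are expanded in place; each role and each recipe is only
    taken the first time it appears.
    """
    recipes = []
    seen_roles = set()

    def visit(items):
        for item in items:
            kind, _, rest = item.partition("[")
            name = rest.rstrip("]")
            if kind == "recipe":
                if name not in recipes:
                    recipes.append(name)
            elif kind == "role":
                if name not in seen_roles:
                    seen_roles.add(name)
                    visit(roles[name])
            else:
                raise ValueError("unknown run-list item: " + item)

    visit(run_list)
    return recipes


def add_element_role(roles, node_role, element_role):
    """Add desired function to a node by editing only its node-role."""
    entry = "role[" + element_role + "]"
    if entry not in roles[node_role]:
        roles[node_role].append(entry)


before = expand_run_list(["role[crowbar-d00]"], ROLES)

ROLES["glance-server"] = ["recipe[glance::api]"]
add_element_role(ROLES, "crowbar-d00", "glance-server")
after = expand_run_list(["role[crowbar-d00]"], ROLES)
```

The node's own run-list stays at the single entry role[crowbar-d00]; adding glance-server changes only the node-role, in line with the rule that the crowbar app does not touch the node object after discovery.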

What's changing now

Clawhammer has the real possibility of adding even more scalpel-like slicing to the problem of directed chef runs. I could see us building run_lists on the fly so that only certain recipes are run at only certain times. This helps with faster and more controlled bring-up of nodes.

So, where did this come from? Defensive coding because chef sucks. To clarify: Chef sucks when used as a relational database. BECAUSE IT ISN'T ONE! But we use it that way anyway. This is why we are doing some of the database work.

Some of the things were designed from the beginning (or near beginning).

Proposals stored as roles (for persistence and consumption).
Node-roles are a side-effect of reactive coding to race condition hells that come about as chef-client runs write the node object out from under crowbar.

Planned Work

Coordination between Dell and Suse for DevTool changes to support Suse's integration

Participants

The following people are available for development this sprint (time zone)

Dell: Rob @zehicle Hirschfeld (-6), Greg Althaus (-6), Victor Lowther (-6), Andi Abes (+1), Judd Maltin (-5), ...

Engineering Efforts

DevTool Changes

Dell: Victor Lowther
Suse: Tim or Adam

Notes:

Location of SUSE source: https://github.com/SUSE-Cloud/crowbar/tree/release/essex-hack-suse/master
