This repository was archived by the owner on Nov 6, 2023. It is now read-only.

Add CiFile - Parser and converter for streamlined CI pipeline/job config files #61

Closed · wants to merge 2 commits

Conversation

@gsantner (Contributor) commented Jun 29, 2018

This adds the base for resolving issues #24, #50 and #55.
It provides the often-mentioned CI-file parser I have been working on for many nights.
I have tested it with simple and complex CI files in GitLab CI format, which is also FOSS, has an open spec, and is one of the most powerful formats currently available.

The goal of this PR is not to integrate it into Flux CI yet; that is a bigger task, for which I lack in-depth knowledge of Flux CI.
You can see an example at the very bottom and try things out (it is also included in the file as __main__).

  file_to_load = "user/project/.ci.yml"
  cifile = CiFile(variables={"BUILD_COOL_FEATURE": "1"}).read_cifile(fyaml=file_to_load)
  cifile.prepare_run_with("master", "trigger", "Very cool\nSome more descriptive commit text", "commitsha5ar55f56s87ng8z98z4ß9z1t98gr3", ci_commit_tag=None)

  # Example usage for CIs
  matrix = cifile.convert_to_matrix()
  for i, stagename in enumerate(matrix[MATRIXKEY_STAGES]):
    for jobname, job in matrix[MATRIXKEY_JOBS][i].items():
      # We now have the details of a job; execute its script,
      # which is a list of commands to run one after the other in a shell / cmd
      for cmd in job[JOBKEY_SCRIPT]:
        exitcode = 0  # placeholder: e.g. exitcode = shell.execute(cmd)
        if exitcode != 0:
          break  # don't start anything else once one command has failed

This is basically what we have to do:

Overall preparation step:

  • (At this point nothing gets run; this is for creating and filtering the job list, e.g. checking whether BUILD_FEATURE_X=1 was supplied by the Web API, which adds or removes jobs. This must always be pre-evaluated.)
  • Create a new instance of the CiFile object and pass in the variables we got from the Web API call (when available). Also pass in an IdCounter with an initial value, so a job ID can be generated. It is thread-safe; just keep the per-project counter instance around for future jobs until the pipeline has finished (see the sketch after this list).
  • Parse/read the CI file from the filesystem using read_cifile()
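
A minimal sketch of a thread-safe counter as described above (illustrative only; the real IdCounter lives in cifile.py and may differ in naming and API):

  import threading

  class IdCounter:
    # Thread-safe, monotonically increasing counter for generating job IDs.
    def __init__(self, initial=0):
      self._value = initial
      self._lock = threading.Lock()

    def next_id(self):
      with self._lock:
        self._value += 1
        return self._value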

Run preparation:

  • When we are about to actually start building the pipeline, we call prepare_run_with() and pass in information from the CI itself, like pipeline ID, branch, etc. This step is purely about CI and repository data; no user values here.
  • This has to happen when the request to start comes in; this schedules the pipeline.

Run:

  • Get a matrix using convert_to_matrix()
  • Basically just: for each job in jobs, for each cmd in job[script]. That's all we need at first (see the sketch after this list).
  • Other things like artifact specifications, cache directories and Docker image info are also already supported and handled by the script, but I suggest not going for those at the beginning.
  • In general you can be sure that things are there; they may be empty, but never None. You always have a jobs: element in the matrix, with jobs grouped by stage, and each job is guaranteed to have a script: list of strings.
  • This may happen at any time; it is not required that this runs directly after the preparation step. (This is with multiple slaves and waiting for availability in mind.)
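
A minimal sketch of that run loop, assuming the MATRIXKEY_*/JOBKEY_* constants from cifile.py and using subprocess as a stand-in for Flux's real executor:

  import subprocess

  def run_pipeline(matrix):
    # Walk stages in order; a later stage only starts when every job in the
    # previous stage succeeded.
    for i, stagename in enumerate(matrix[MATRIXKEY_STAGES]):
      for jobname, job in matrix[MATRIXKEY_JOBS][i].items():
        for cmd in job[JOBKEY_SCRIPT]:
          # Every script line must succeed for the job to go green.
          if subprocess.call(cmd, shell=True) != 0:
            return False  # abort, no further jobs/stages are started
    return True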

One big note: the CiFile logic already implements conditional job filtering internally; this is something that doesn't need to be implemented separately.

--
License note: I have this class over at https://github.com/gsantner/opoc/blob/master/python/services/cifile.py, but hereby grant the project the right to use it freely, fully re-licensed under the Flux project's MIT license.

@gsantner (Contributor, Author):

BTW, I have added many comments so this is understandable for everybody. Take a look at those too; they basically cover most of the PR message as well :)

@NiklasRosenstein (Owner):

Thanks a lot for your work @gsantner. This will take some time to check out and review. I hope to have that time next week or the week after!

@gsantner (Contributor, Author) commented Jul 28, 2018

Ehm, can I expect this to get merged and worked on? It took quite a lot of time to get it into this working state :D

@NiklasRosenstein (Owner) left a comment:

I've applied the changes that I made comments about. Things that I noticed (list may get longer):

  • The except: [<branch>] doesn't seem to be evaluated yet (however, only: [<branch>] is)
  • I think the variables in the artifact:name value should maybe already be rendered by the CiFile when using convert_to_matrix(), so that the user of that matrix doesn't have to (see the sketch below)
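
For illustration, rendering ${VAR} placeholders could look roughly like this (a sketch, not the CiFile implementation; function and parameter names are made up):

  import re

  def render_variables(value, variables):
    # Replace $VAR and ${VAR} placeholders with values from `variables`,
    # leaving unknown variables untouched.
    def repl(match):
      name = match.group(1) or match.group(2)
      return variables.get(name, match.group(0))
    return re.sub(r"\$\{(\w+)\}|\$(\w+)", repl, value)

  # render_variables("artifacts-${CI_COMMIT_REF_NAME}", {"CI_COMMIT_REF_NAME": "master"})
  # -> "artifacts-master"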

flux/cifile.py Outdated
# - jobs: Dict of jobs grouped by stage, ordered by order in `stages`
# - stages: Name and order of all stages
@classmethod
def convert_to_matrix(self):
NiklasRosenstein (Owner):

It doesn't look like this method should be a classmethod?

gsantner (Contributor, Author):

it takes its own internal state

NiklasRosenstein (Owner) commented Aug 2, 2018:

I'm not quite sure I understand what you are saying, sorry. convert_to_matrix() does indeed take the internal state of self, specifically, to produce the matrix from self._cifile. However, self is supposed to be an instance of the CiFile class, and not the class itself; that is why the @classmethod decorator is inappropriate.
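
For illustration (names made up), the difference in question:

  class Example:
    def __init__(self):
      self._state = {"jobs": []}

    @classmethod
    def as_classmethod(cls):
      # `cls` is the Example class itself; instance members are not available.
      return cls

    def as_instance_method(self):
      # `self` is a concrete instance, so members like self._state work.
      return self._state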

gsantner (Contributor, Author) commented Aug 2, 2018:

ah, I see, didn't understand before :D you're right

flux/cifile.py Outdated

# Print given matrix (from `convert_to_matrix`) to command line
@classmethod
def print_matrix(self, matrix):
NiklasRosenstein (Owner):

Same as above and for others below.

flux/cifile.py Outdated

## Don't move between areas unless knowing consequences

## Work area 10
NiklasRosenstein (Owner):

What's the work area thing?

gsantner (Contributor, Author):

"Don't move between areas unless knowing consequences" - it's about this

flux/cifile.py Outdated
# pip deps: dateparser
import os, sys, threading
import yaml, re
import dateparser
NiklasRosenstein (Owner):

The module seems to be unused.

flux/cifile.py Outdated
stages.append(stage)

# Add default stage when nothing found
stages=stages if stages else [preference_siai_default_stage]
NiklasRosenstein (Owner):

preference_siai_default_stage is not defined anywhere.

NiklasRosenstein (Owner):

Should probably be self._default_stage

gsantner (Contributor, Author):

yep

flux/cifile.py Outdated
@classmethod
def _parse_cidict(self, cidict):
# Store dictionary as member
if not cidict or not isinstance(cidict, dict):
NiklasRosenstein (Owner):

This doesn't look right -- it should raise an exception if the value passed is invalid rather than just ignoring the fact and using an empty dictionary instead.
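
A minimal sketch of the suggested change (the member name is assumed for illustration):

  def _parse_cidict(self, cidict):
    # Fail loudly on bad input instead of silently falling back to {}.
    if not cidict or not isinstance(cidict, dict):
      raise ValueError("expected a non-empty dict, got {!r}".format(cidict))
    self._cidict = cidict  # hypothetical member name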

gsantner (Contributor, Author):

doable

@NiklasRosenstein (Owner) commented Aug 2, 2018

With the example CI file that you linked (https://gitlab.com/gsantner/kimai-android/raw/master/.gitlab-ci.yml) on branch release you get these stages and jobs:

Stage: before_script
  Job: before:script
    bash -c '(while [ 1 ]; do sleep 5; echo y; done) | ${ANDROID_HOME}/tools/android update sdk -u -a -t "build-tools-${ANDROID_BUILD_TOOLS},android-${ANDROID_TARGET_SDK}" ; exit 0'
Stage: build
  Job: debug
    ./gradlew assembleDebug
  Job: release
    ./gradlew assembleRelease

Just so I understand this right, we concatenate the jobs in every stage with every other job to get a single pipeline?

So the above would give you two jobs running in Flux CI, running the following stages' jobs:

  • before:script into debug
  • before:script into release

If we had another job in the before_script stage (which we usually wouldn't have, but just assuming that there could be another job; it could be in any other stage as well), e.g. before:script2, we would get

  • before:script into debug
  • before:script into release
  • before:script2 into debug
  • before:script2 into release

?

@gsantner (Contributor, Author) commented Aug 2, 2018

before_script and after_script are basically two special things - run before anything else and after everything else. What we are doing here is reshaping them into jobs just like everything else (so you can iterate over them and put them into a common hierarchy).

> Just so I understand this right, we concatenate the jobs in every stage with every other job to get a single pipeline?

We put all the job information on the same level and add global information (like CI_JOB_ID submitted from Flux CI) to all jobs. We don't put one job into another.
The other thing is the hierarchy (matrix view) pipeline -> stage -> jobs, which is how it is supposed to run / be looked at during a CI run.

  • So one pipeline is the whole big thing that gets started
  • A stage is e.g. test, build, deploy
    • And each stage has its jobs. Only when all jobs in a stage complete successfully is the next stage allowed to start
    • If all stages complete, then the pipeline is successful

All of the above is the matrix view of the data; at the core it's just an array of jobs that all have the same properties. That was the main point of this CiFile work: everything is streamlined. (A sketch of the matrix shape follows below.)
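
For the kimai-android example from above, the matrix would look roughly like this (an assumed shape based on this description; the real keys are the MATRIXKEY_*/JOBKEY_* constants from cifile.py):

  matrix = {
    "stages": ["before_script", "build"],
    "jobs": [
      # index 0: jobs of stage "before_script"
      {"before:script": {"script": ["bash -c '...'"]}},
      # index 1: jobs of stage "build"
      {"debug": {"script": ["./gradlew assembleDebug"]},
       "release": {"script": ["./gradlew assembleRelease"]}},
    ],
  }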

> So the above would give you two jobs running in Flux CI, running the following stages' jobs

No, it will be a stage called before_script with one job before:script, and after that (when successful) a stage build with debug and release jobs (which could theoretically be executed in parallel).

> If we had another job in the before_script stage (which we usually wouldn't have, but just assuming that there could be another job; it could be in any other stage as well), e.g. before:script2, we would get

No, it would be:

  • stage before_script
    • job before:script
    • job before:script2
  • stage build
    • job debug
    • job release

Happy to tell more if something is unclear.

@NiklasRosenstein (Owner) commented Aug 2, 2018

> before_script and after_script are basically two special things - run before anything else and after everything else. What we are doing here is reshaping them into jobs just like everything else (so you can iterate over them and put them into a common hierarchy).

I understand that they are special because they are actually executed around any other job, and not independent jobs. But then I'm not sure why they are put into their own stages and jobs -- wouldn't it make more sense to put them into a Job directly, as in

before_script:
- echo "before_script"

after_script:
- echo "after_script"

build:
  script:
  - echo "build:script"

The before_script and after_script would then basically be inserted into the build Job.

build:
  script:
  - echo "before_script"
  - echo "build:script"
  - echo "after_script"

Maybe I also misunderstand the way you were planning on accessing and executing the before and after scripts per job.
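
A sketch of that merging, treating scripts as plain lists of command strings (illustrative; not the CiFile implementation):

  def merge_scripts(job_script, before_script=None, after_script=None):
    # Prepend the global before_script and append the global after_script
    # to a job's own script lines.
    return list(before_script or []) + list(job_script) + list(after_script or [])

  # merge_scripts(['echo "build:script"'], ['echo "before_script"'], ['echo "after_script"'])
  # -> ['echo "before_script"', 'echo "build:script"', 'echo "after_script"']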

@gsantner (Contributor, Author) commented Aug 2, 2018

> I understand that they are special because they are actually executed around any other job, and not independent jobs. But then I'm not sure why they are put into their own stages and jobs -- wouldn't it make more sense to put them into a Job directly, as in

Yes! But other CIs usually have these two special cases, and we are doing it more cleverly by just converting them to be like normal jobs :D. (As in, to make existing CI files easily reusable without big modifications.)

Why their own stages? Because of this: "Only when all jobs in a stage complete successfully is the next stage allowed to start". So they are guaranteed to run before/after everything else, and nothing else in the same stage gets executed.

Currently there would be one special case, where you really manually write stage: before_script, but well, in that case I think the user really, really wants to have it like this :'D.

> The before_script and after_script would then basically be inserted into the build Job.

I don't quite understand this one. One job has as many script lines as it wants, but this way we would have a concatenated job with no differentiation between what has to run before and after:

before:script:
  stage: before_script
  script:
     - bash "cool_script.sh"
     - rm -rf log.txt
     - mail -s "everything succeed"

How would we then know that bash and rm belong to the start, and mail to the very end of all jobs?

Sorry, maybe I didn't describe that: one script section is an ordered list of shell executions that all have to succeed for the job to go green. (Here's a little better example than mine: https://docs.gitlab.com/ee/ci/examples/test-and-deploy-ruby-application-to-heroku.html)

@NiklasRosenstein (Owner) commented Aug 2, 2018

> Why their own stages? Because of this: "Only when all jobs in a stage complete successfully is the next stage allowed to start". So they are guaranteed to run before/after everything else, and nothing else in the same stage gets executed.

This only makes sense to me if you assume that all stages are executed in the same folder and without resetting or flushing the folder between jobs. Does that apply?

> How would we then know that bash and rm belong to the start, and mail to the very end of all jobs?

Well we know it from the original CI input file -- the global before_script would be a shortcut to adding the commands before the actual commands of a job.

Once we have a full description of the commands for a job, it is no longer important whether a command was part of the before_script or after_script section, as they have been placed in the semantically correct place in the Job's scripts.

@gsantner (Contributor, Author) commented Aug 3, 2018

> This only makes sense to me if you assume that all stages are executed in the same folder and without resetting or flushing the folder between jobs. Does that apply?

Exactly. When a job starts, the pwd should be the git root folder; when calling - cd docs and then - bash makedocumentation.sh, the second script line will run in docs (because of the chdir before). But all other jobs should still independently start in the git root folder (otherwise it's impossible to use jobs :D). See the sketch below.
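
One way to get that behavior (a sketch, not necessarily how Flux would do it): run all of a job's script lines in a single shell invocation, so state like the working directory persists between lines while every job still starts fresh in the git root:

  import subprocess

  def run_job_script(script_lines, git_root):
    # Chain the lines with '&&' so they share one shell: `cd docs` affects
    # later lines, and the job fails as soon as any line fails.
    command = " && ".join(script_lines)
    return subprocess.call(command, shell=True, cwd=git_root)

  # run_job_script(["cd docs", "bash makedocumentation.sh"], "/path/to/checkout")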

At least until jobs etc. work, that's exactly what I thought. When we have script/jobs/stages/pipeline working, we could then go on to respect the dependencies: and artifacts: tags, which control what gets shared between jobs. But that might get quite tricky. I would not go for / think about that in the current state.


> Well we know it from the original CI input file -- the global before_script would be a shortcut to adding the commands before the actual commands of a job.

Aah, I think we are both talking about different things! :D Oh, I see, GitLab CI actually does that differently than I thought. In GitLab CI it works like you said, putting all the lines before and after each job's script (I haven't used that recently :D). Sorry! Forget what I said before about before/after script! :D

I see now. We do want before/after script around each job instead of "before/after pipeline" (which is the current behavior), right? I can change that.


@NiklasRosenstein (Owner):

All jobs should be run independently from each other with no leftovers from a previous job (so basically a fresh checkout from the repository). That is why I thought it was a good idea to have the before_script and after_script prepended/appended to the actual job's script. And that is also why it didn't make sense to me to put the before_script and after_script into their own stages.

> At least until jobs etc. work, that's exactly what I thought. When we have script/jobs/stages/pipeline working, we could then go on to respect the dependencies: and artifacts: tags, which control what gets shared between jobs. But that might get quite tricky. I would not go for / think about that in the current state.

I think sharing between jobs is an interesting feature, but is that actually a feature in the GitLab pipelines? I can't seem to find info about that anywhere. Artifacts are usually what you can download from the CI after the Job has run.

@gsantner (Contributor, Author) commented Jul 7, 2019

Closing, as it looks like the project isn't really going on. Even though this one is mostly complete, there is still stuff left to do to actually integrate it into the whole, which I assume will not happen.

@gsantner closed this Jul 7, 2019