Merge pull request #11 from t4d-gmbh/8-ci-cd-for-reproducibility
raw content - ci cd for reprod
Showing 4 changed files with 340 additions and 3 deletions.
@@ -0,0 +1,225 @@
{% raw %}
## Using GitLab and GitHub CI/CD for Scientific Analysis

Both **GitLab** and **GitHub CI/CD** can be used for running scientific analyses, automating workflows, ensuring reproducibility, and enhancing collaboration. These tools offer automation capabilities tailored for complex, repetitive tasks, and can be customized to support various scientific applications.

---

### Why Use CI/CD for Scientific Analysis?

Scientific workflows often include processes that are ideal for automation, such as:

- **Data preprocessing**: Cleaning, normalizing, and structuring data.
- **Simulations**: Running computational models based on data or parameter sets.
- **Reproducibility**: Ensuring results can be reliably reproduced by others.
- **Collaboration**: Allowing collaborators to share and reuse workflows.

Both **GitLab** and **GitHub** pipelines help you:

- Automate repetitive tasks.
- Ensure experiments are performed in consistent environments.
- Track changes to both data and code for transparency.
- Collaborate and share results with ease.

---

### How GitLab CI/CD Can Be Used for Scientific Analysis

#### 1. Running Data Analysis Pipelines

In GitLab CI/CD, runners can execute data analysis scripts written in Python, R, or other languages. Jobs do not share a working directory by default, so any file that a later stage needs must be declared under `artifacts`.

**Example: Data Analysis Pipeline with Python**

```yaml
stages:
  - data_preprocessing
  - analysis

preprocess_data:
  stage: data_preprocessing
  script:
    - python preprocess_data.py raw_data.csv cleaned_data.csv
  artifacts:
    paths:
      - cleaned_data.csv  # hand the cleaned file to the analysis stage

run_analysis:
  stage: analysis
  script:
    - python analyze_data.py cleaned_data.csv results.csv
  artifacts:
    paths:
      - results.csv  # keep the results downloadable after the job ends
```
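The pipeline above calls `preprocess_data.py` without showing it. As a rough illustration only, such a script might look like the sketch below; the cleaning rules (strip whitespace, drop rows with empty or surplus fields) and the function names `clean_rows` and `preprocess` are assumptions, not part of the original material.

```python
import csv
import sys


def clean_rows(rows):
    """Strip whitespace from every field and drop incomplete rows."""
    cleaned = []
    for row in rows:
        if None in row:  # more fields than header columns: treat as malformed
            continue
        stripped = {key: (value or "").strip() for key, value in row.items()}
        if all(stripped.values()):  # keep only rows where every field is non-empty
            cleaned.append(stripped)
    return cleaned


def preprocess(in_path, out_path):
    """Read a CSV with a header row, clean it, and write the result."""
    with open(in_path, newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames or []
        rows = clean_rows(reader)
    with open(out_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python preprocess_data.py raw_data.csv cleaned_data.csv
    kept = preprocess(sys.argv[1], sys.argv[2])
    print(f"kept {kept} rows")
```

Keeping the cleaning logic in a plain function makes it easy to unit-test in the same pipeline before the analysis stage runs.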
#### 2. Running Simulations

You can set up GitLab pipelines to run simulations automatically whenever data or configurations change.

**Example: Running a Simulation in GitLab**

```yaml
stages:
  - simulation

run_simulation:
  stage: simulation
  script:
    - python run_simulation.py input_data.csv output_results.csv
```
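A simulation is only reproducible in CI if its randomness is controlled. The sketch below is a hypothetical core for a script like `run_simulation.py`: a one-dimensional random walk driven by an explicitly seeded generator. The model and the name `simulate_random_walk` are illustrative assumptions.

```python
import random


def simulate_random_walk(steps, seed):
    """Return the positions of a 1-D random walk with unit steps."""
    rng = random.Random(seed)  # local generator: no hidden global state
    position = 0
    path = [position]
    for _ in range(steps):
        position += rng.choice((-1, 1))
        path.append(position)
    return path


if __name__ == "__main__":
    # Two runs with the same seed produce the same trajectory.
    first = simulate_random_walk(steps=10, seed=42)
    second = simulate_random_walk(steps=10, seed=42)
    print(first == second)
```

Passing the seed as a parameter (for example from a CI variable) lets the pipeline log it next to the results, so a run can be repeated later.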
#### 3. Using Docker for Reproducibility

With GitLab CI/CD, you can run jobs inside Docker containers to ensure reproducibility and consistent environments for scientific analyses.

**Example: Running a Job in a Docker Container in GitLab**

```yaml
stages:
  - test

run_in_docker:
  stage: test
  image: python:3.9
  script:
    - pip install -r requirements.txt
    - python analyze_data.py cleaned_data.csv
```
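Pinning the image fixes the Python version, but the installed package versions still depend on `requirements.txt`. One complementary approach, sketched here under the assumption that results are stored next to a small provenance file, is to record the environment from inside the job. The helper name `environment_record` and the package names in the usage example are hypothetical.

```python
import json
import platform
from importlib import metadata


def environment_record(packages):
    """Collect interpreter, platform, and package versions as a plain dict."""
    record = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": {},
    }
    for name in packages:
        try:
            record["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            record["packages"][name] = None  # not installed in this environment
    return record


if __name__ == "__main__":
    # Example package names; replace them with the ones your analysis imports.
    print(json.dumps(environment_record(["numpy", "pandas"]), indent=2))
```

Writing this output to a file and declaring it as a job artifact keeps the environment description together with the results it produced.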
#### 4. Scheduling Scientific Workflows

GitLab CI/CD allows you to schedule recurring jobs (e.g., running analyses or simulations at regular intervals). The schedule itself is created in the project's pipeline schedule settings; the `only: schedules` rule below restricts the job to those scheduled pipelines.

**Example: Scheduling a Daily Data Analysis Job in GitLab**

```yaml
stages:
  - analysis

run_daily_analysis:
  stage: analysis
  script:
    - python daily_analysis.py
  only:
    - schedules
```
---

### How GitHub Actions Can Be Used for Scientific Analysis

#### 1. Running Data Analysis Pipelines

GitHub Actions can automate the execution of data analysis workflows, triggered by events such as new data uploads or code changes. Each job runs on a fresh runner, so the example passes `cleaned_data.csv` from one job to the next as an artifact.

**Example: Automating a Data Analysis Workflow in GitHub**

```yaml
name: Data Analysis

on: [push]

jobs:
  preprocess:
    runs-on: ubuntu-latest
    steps:
      - name: Check out repository
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.x"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Preprocess Data
        run: python preprocess_data.py raw_data.csv cleaned_data.csv
      - name: Upload cleaned data
        uses: actions/upload-artifact@v4
        with:
          name: cleaned-data
          path: cleaned_data.csv

  analyze:
    runs-on: ubuntu-latest
    needs: preprocess
    steps:
      - name: Check out repository
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.x"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Download cleaned data
        uses: actions/download-artifact@v4
        with:
          name: cleaned-data
      - name: Run Analysis
        run: python analyze_data.py cleaned_data.csv results.csv
```
#### 2. Running Simulations

GitHub Actions can trigger simulations to run on GitHub-hosted runners or custom environments, useful for automating experiments.

**Example: Running a Simulation with GitHub Actions**

```yaml
name: Simulation Run

on: [push]

jobs:
  run_simulation:
    runs-on: ubuntu-latest
    steps:
      - name: Check out repository
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.x"
      - name: Run Simulation
        run: python run_simulation.py input_data.csv output_results.csv
```
#### 3. Using Docker for Reproducibility

Like GitLab, GitHub Actions supports Docker containers, which can help ensure that analyses are performed in a consistent, reproducible environment. The repository must be checked out first so that the `Dockerfile` is available to `docker build`.

**Example: Running a Dockerized Analysis in GitHub Actions**

```yaml
name: Dockerized Analysis

on: [push]

jobs:
  analysis:
    runs-on: ubuntu-latest
    steps:
      - name: Check out repository
        uses: actions/checkout@v4
      - name: Set up Docker
        uses: docker/setup-buildx-action@v3
      - name: Build and run Docker container
        run: |
          docker build -t analysis .
          docker run analysis python analyze_data.py
```
#### 4. Scheduling Scientific Jobs

You can use GitHub Actions to schedule jobs that run periodically, such as weekly data analyses or simulations. Cron expressions in `schedule` are evaluated in UTC.

**Example: Scheduling a Weekly Job in GitHub Actions**

```yaml
name: Weekly Data Processing

on:
  schedule:
    - cron: "0 0 * * 0" # Every Sunday at midnight (UTC)

jobs:
  process_data:
    runs-on: ubuntu-latest
    steps:
      - name: Check out repository
        uses: actions/checkout@v4
      - name: Process Data
        run: python process_data.py
```
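To make the cron expression `0 0 * * 0` concrete, the sketch below computes the next Sunday at 00:00 UTC after a given moment. It only illustrates this one expression and is not a general cron parser; the function name `next_sunday_midnight_utc` is hypothetical, and actual start times on hosted runners may lag behind the nominal schedule.

```python
from datetime import datetime, timedelta, timezone


def next_sunday_midnight_utc(now):
    """Return the first Sunday 00:00 UTC strictly after `now` (timezone-aware)."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    days_ahead = (6 - midnight.weekday()) % 7  # Monday is 0, Sunday is 6
    candidate = midnight + timedelta(days=days_ahead)
    if candidate <= now:  # already at or past this Sunday's midnight
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    # Wednesday 2024-01-03 15:30 UTC -> Sunday 2024-01-07 00:00 UTC
    print(next_sunday_midnight_utc(datetime(2024, 1, 3, 15, 30, tzinfo=timezone.utc)))
```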

---

### Benefits of Using GitLab/GitHub CI/CD for Scientific Analysis

- **Automation**: Eliminate manual execution of repetitive tasks such as data cleaning, analysis, or model training.
- **Reproducibility**: Use Docker containers and version control to ensure that all jobs run in the same environment, making it easier to replicate analyses.
- **Collaboration**: Collaborators can easily replicate, review, and contribute to workflows by accessing the pipelines.
- **Scalability**: Use custom or cloud-based runners to handle large, resource-intensive scientific workflows.

---

### Conclusion

Both **GitLab** and **GitHub CI/CD** are excellent tools for automating scientific analysis workflows, ensuring reproducibility, and improving collaboration. Whether you're running simulations, analyzing data, or automating machine learning workflows, CI/CD pipelines provide a powerful framework to streamline research and make it more robust, transparent, and scalable.
{% endraw %}
@@ -0,0 +1,110 @@
{% raw %}
## Using GitLab/GitHub CI/CD Pipelines to Create and Distribute Docker Images

Both **GitLab** and **GitHub** provide robust CI/CD capabilities that can be leveraged to automate the creation and distribution of Docker images. Below is a high-level overview of how to set up CI/CD pipelines in both platforms for this purpose.

### 1. Prerequisites

#### Docker Installation
- Ensure that Docker is installed on the machine where the CI/CD runner will execute the jobs.

#### Docker Registry
- Set up a Docker registry to store your images. You can use:
  - **Docker Hub**: A public registry for sharing images.
  - **GitLab Container Registry**: A built-in private registry for GitLab users.
  - **GitHub Container Registry**: A built-in registry for GitHub users.

### 2. Creating Docker Images

#### GitLab CI/CD

##### Step 1: Define the `.gitlab-ci.yml` File
Create a `.gitlab-ci.yml` file in the root of your repository to define the CI/CD pipeline. Jobs do not share a Docker daemon, so the `build` job pushes a commit-tagged image and the `push` job pulls it and publishes it as `latest`. Here’s a basic example:

```yaml
stages:
  - build
  - push

# Both jobs talk to a Docker daemon, so both need the dind service and a login.
default:
  image: docker:latest
  services:
    - docker:dind
  before_script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"

build:
  stage: build
  script:
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"

push:
  stage: push
  script:
    - docker pull "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
    - docker tag "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" "$CI_REGISTRY_IMAGE:latest"
    - docker push "$CI_REGISTRY_IMAGE:latest"
```

##### Step 2: Configure Variables
- When you use the GitLab Container Registry, `CI_REGISTRY`, `CI_REGISTRY_USER`, `CI_REGISTRY_PASSWORD`, and `CI_REGISTRY_IMAGE` are predefined for each job. For an external registry, define equivalent CI/CD variables in the project settings and use them to authenticate.

#### GitHub Actions

##### Step 1: Define the Workflow File
Create a workflow file in the `.github/workflows` directory (e.g., `docker-build.yml`). Here’s a basic example:

```yaml
name: Build and Push Docker Image
on:
  push:
    branches:
      - main
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Log in to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}
      - name: Build the Docker image
        run: |
          docker build -t myapp:latest .
      - name: Push the Docker image
        run: |
          docker tag myapp:latest myusername/myapp:latest
          docker push myusername/myapp:latest
```

##### Step 2: Configure Secrets
- In your GitHub repository settings, add secrets for `DOCKER_USERNAME` and `DOCKER_PASSWORD` to authenticate with your Docker registry. For Docker Hub, an access token can be stored as `DOCKER_PASSWORD` instead of the account password.

### 3. Distributing Docker Images

#### Using Docker Registries
Once the Docker images are built and pushed to the registry, they can be easily distributed and pulled by other developers or deployment environments. Here’s how:

- **Pulling Images**: Users can pull the images from the registry using the `docker pull` command:

  ```bash
  docker pull myusername/myapp:latest
  ```

- **Deployment**: The images can be deployed to various environments (e.g., staging, production) using orchestration tools like Kubernetes or Docker Compose.

### 4. Best Practices

- **Versioning**: Tag your Docker images with version numbers (e.g., `myapp:v1.0.0`) to keep track of changes and ensure reproducibility.
- **Automated Testing**: Include automated tests in your CI/CD pipeline to validate the Docker image before pushing it to the registry.
- **Security Scans**: Use tools to scan your Docker images for vulnerabilities before distribution.
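The versioning practice above can be scripted so that every build gets the same kind of tags. The sketch below is one possible convention, not an established standard: a hypothetical helper `image_tags` that derives a version tag and a short-commit tag from a `MAJOR.MINOR.PATCH` version string and a commit SHA.

```python
import re

# Assumed version format for this sketch: plain MAJOR.MINOR.PATCH.
_VERSION = re.compile(r"^\d+\.\d+\.\d+$")


def image_tags(repository, version, commit_sha):
    """Return the full image references to push for one build."""
    if not _VERSION.match(version):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    if len(commit_sha) < 7:
        raise ValueError("commit SHA must have at least 7 characters")
    return [
        f"{repository}:v{version}",           # human-readable release tag
        f"{repository}:sha-{commit_sha[:7]}",  # pins the exact source revision
    ]


if __name__ == "__main__":
    for tag in image_tags("myusername/myapp", "1.0.0", "0123456789abcdef"):
        print(tag)
```

A CI job could call such a helper and pass each result to `docker tag` and `docker push`, so that a published analysis can cite an image tag that never moves, unlike `latest`.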

### Conclusion

By leveraging the CI/CD capabilities of **GitLab** and **GitHub**, you can automate the process of creating and distributing Docker images. This not only streamlines your development workflow but also ensures that your applications are consistently built and deployed across different environments. 🚀🐳
{% endraw %}