Merge branch 'main' of github.com:csc-training/geocomputing_course
samumantha committed Oct 10, 2023
2 parents b704af1 + 18e7a22 commit d4e8f4f
Showing 18 changed files with 207 additions and 139 deletions.
11 changes: 5 additions & 6 deletions index.rst
@@ -13,17 +13,16 @@ Welcome to CSC Geocomputing course!
* Are you curious about how you can take your geospatial data processing and analysis to the next level?
* Or maybe you have been using a supercomputer already, but would like to make sure you are getting the most out of it?

**This course is intended for you!**
→ This course is intended for you!

In this course we will learn the basics of geocomputing on a supercomputer through a combination of lectures and hands-on activities. The main focus of the course is Puhti supercomputer, where all hands-on exercises will be done. The CSC services discussed in this course are free-of-charge for academic research, education and training purposes for Finnish higher education institutions and state research institutes (subsidized by the Ministry of Education and Culture, Finland).
**In this course we will learn the basics of geocomputing on a supercomputer through a combination of lectures and hands-on activities.** The main focus of the course is Puhti supercomputer, where all hands-on exercises will be done. The CSC services discussed in this course are free-of-charge for academic research, education and training purposes for Finnish higher education institutions and state research institutes (subsidized by the Ministry of Education and Culture, Finland).

Most of the course content also applies to LUMI supercomputer, which is available for academic users **and companies**.
Most of the course content also applies to LUMI supercomputer, which is available for academic users and companies.

The course is meant both for academic researchers planning to use Puhti supercomputer and for data analysts from private companies planning to use LUMI.

.. warning::
THIS MATERIAL IS WORK IN PROGRESS, do not trust anything ! ;)

Table of contents
###################

.. toctree::
:maxdepth: 2
7 changes: 5 additions & 2 deletions materials/account_project.md
@@ -6,7 +6,9 @@ Every CSC account must be linked with a **CSC project**, which enables you to sh

CSC services are **free of charge** for open science at Finnish higher education institutions and research institutes.

[CSC Docs: Accounts and projects](https://docs.csc.fi/accounts/)
* [CSC Docs: Accounts and projects](https://docs.csc.fi/accounts/)
* [LUMI, get started](https://lumi-supercomputer.eu/get-started/)
* [CSC, LUMI high performance computing services offer companies a competitive advantage](https://csc.fi/web/guest/solutions-for-business)

:::{admonition} Course project
:class: hint
@@ -47,7 +49,8 @@ Your first steps into many CSC services goes via [`https://my.csc.fi`](https://m
- Amount of resources allocated: All requested resources are billed, i.e. the number of cores and the amount of memory
- Time allocated: Resources are billed based on the actual (wall) _time_ a job has **used**, not the reserved maximum time
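
To see how the two billing rules above combine, here is a rough sketch that estimates the billing units (BU) of one CPU job. The rates `CORE_RATE` and `MEM_RATE` and the helper `estimate_bu` are assumptions made for this illustration only, not CSC's official numbers; check the current rates in CSC Docs: Billing units.

```bash
#!/bin/bash
# Illustrative rates only (assumed, not official): check CSC Docs: Billing units.
CORE_RATE=1      # assumed BU per reserved core per hour
MEM_RATE=0.1     # assumed BU per reserved GiB of memory per hour

# Hypothetical helper: estimate_bu CORES MEM_GIB HOURS_USED
# All *reserved* cores and memory are billed, but only for the wall time
# the job actually *used*, not for the --time limit it requested.
estimate_bu() {
  awk -v c="$1" -v m="$2" -v h="$3" -v cr="$CORE_RATE" -v mr="$MEM_RATE" \
    'BEGIN { printf "%.2f\n", (c * cr + m * mr) * h }'
}

# Reserved 4 cores and 8 GiB with a 2-hour time limit;
# the job finished after 30 minutes, so 0.5 hours are billed.
estimate_bu 4 8 0.5   # prints 2.40
```

With these assumed rates the job is billed 2.40 BU; had it run for the full 2 hours, it would have been billed 9.60 BU. The reserved time limit itself does not change the result, while every extra reserved core or GiB of memory does.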

[CSC Docs: Billing units](https://docs.csc.fi/accounts/billing/)
* [CSC Docs: Billing units](https://docs.csc.fi/accounts/billing/)
* [LUMI Docs: Billing policy](https://docs.lumi-supercomputer.eu/runjobs/lumi_env/billing/)

### Applying for billing units

15 changes: 13 additions & 2 deletions materials/csc.md
@@ -8,17 +8,28 @@

![Kajaani](images/kajaani.png)

## [Geoportti](https://www.geoportti.fi)
## Geoportti

Geoportti Research Infrastructure (RI) is a shared service for researchers, teachers and students using geospatial data and geocomputing tools. Geoportti RI helps the researchers in Finland to use, to refine, to preserve and to share their geospatial resources.

* [GeoPortti web portal](https://www.geoportti.fi)
* GeoPortti services:
* [GeoPortti GeoCubes](https://vm0160.kaj.pouta.csc.fi/geocubes/) - a harmonised, multi-resolution raster geodata repository containing several national datasets
* [GeoPortti GeoPrivacy](https://geoprivacy.fi/#/) - a service where cyclists and pedestrians can donate GPS tracking data for science.
* [UEF Drone Lab](https://www.geoportti.fi/tools/drones/)
* [Geospatial Challenge Camp](https://challenge-camp.geoportti.fi/en/latest/) - a 10-week long challenge-based course (5 ECTS) that aims to provide participants a chance to tackle relevant real-world challenges in cross-disciplinary teams
* At CSC: supercomputer geospatial installations, support and documentation, STAC, GIS training.

![](./images/geoportti.png)


## [Location Innovation Hub](https://locationinnovationhub.eu)
## Location Innovation Hub

The Location Innovation Hub (LIH) is a centre of excellence in location information coordinated by the Finnish Geospatial Research Institute. Our services are produced in conjunction with a partner network. We help companies to grow their business with location information. We also serve the public sector.

* [Location Innovation Hub](https://locationinnovationhub.eu)
* At CSC: introducing LUMI to geospatial companies

![](./images/lih.png)


8 changes: 7 additions & 1 deletion materials/csc_services.md
@@ -20,4 +20,10 @@
:::{admonition} Want to know more?
:class: seealso
See also [CSC service catalog](https://research.csc.fi/en/service-catalog)
:::

:::{admonition} Sensitive data
:class: important

Sensitive data should be saved and processed only in services for sensitive data: [SD services](https://research.csc.fi/sensitive-data-services-for-research) and [ePouta](https://research.csc.fi/-/epouta). Encrypted files can also be stored in [Allas](https://research.csc.fi/-/allas). Supercomputers and cPouta should not be used for sensitive data.
:::
4 changes: 2 additions & 2 deletions materials/examples.md
@@ -45,7 +45,7 @@ Arttu Kivimäki, FGI/NLS: [Mosaicking Sentinel-2 data in Puhti](https://a3s.fi/g
Tapio Friberg, ICEYE: [LUMI usecase](https://gis-seminars.a3s.fi/2023-06-08-lumi-for-gis-iceye-use-case.pdf)
```
You can find all CSC seminar presentations on [CSC geocomputing research pages](https://research.csc.fi/geocomputing-seminars).
You can find more use case presentations from [CSC: geocomputing seminars page](https://research.csc.fi/geocomputing-seminars).

## Some publications from Finland that used Puhti

@@ -83,4 +83,4 @@ Samantha Wittke et al, FGI/Aalto [EODIE - Earth Observation Data Information Ext
```


Know more? -> Please let us know :)
4 changes: 2 additions & 2 deletions materials/exercise_basics.md
@@ -35,7 +35,7 @@ In an interactive batch job, an interactive shell session is launched on a compu

### Launching an interactive job / compute node shell

Observe how now you need to define the resources you want to reserve now.
Note that you now need to define the resources you want to reserve.
Let's reserve 10 minutes.

:::{admonition} Other ways of starting an interactive session
@@ -201,4 +201,4 @@ gdalinfo /appl/data/geo/luke/forest_wind_damage_sensitivity/2017/windmap2017_int
* Resource request lines start with `#SBATCH`
* You can find the jobs output, errors and prints in `slurm-jobid.out`

:::
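
The key points above can be seen in a minimal batch job script. This is a sketch assuming Puhti, the example project `project_200xxxx` (replace with your own) and the `small` partition; the job name and resource values are illustrative, not recommendations.

```bash
#!/bin/bash
#SBATCH --account=project_200xxxx
#SBATCH --job-name=hello_puhti
#SBATCH --partition=small
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=1G

# Slurm reads only the #SBATCH lines above; to bash they are plain comments.
# Everything below runs on the compute node. Anything printed here ends up
# in slurm-<jobid>.out in the directory the job was submitted from.
msg="Job ${SLURM_JOB_ID:-outside-slurm} running on $(uname -n)"
echo "$msg"
```

Save it, for example, as `hello.sh` (hypothetical name) and submit it with `sbatch hello.sh`; `squeue -u $USER` shows whether it is pending or running. Because the `#SBATCH` lines are ordinary comments to bash, the script can also be run directly with `bash hello.sh` for a quick local test.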
29 changes: 24 additions & 5 deletions materials/exercise_r.md
@@ -1,5 +1,12 @@
# Exercise: R

## R in supercomputers
* `r-env` is the only R module in Puhti with ~1300 packages for all fields of science.
* Mahti does not have R.
* LUMI has only an [EasyBuild recipe for R](https://lumi-supercomputer.github.io/LUMI-EasyBuild-docs/r/R/)
* [CSC Docs: r-env](https://docs.csc.fi/apps/r-env/)
* [CSC Docs: R for GIS](https://docs.csc.fi/apps/r-env-for-gis/)

:::{admonition} Timing
:class: note

@@ -10,23 +17,35 @@
:::{admonition} Goals
:class: note

*
* Get to know the `r-env` R environment on Puhti
* Run R code interactively and as a batch job
* Try out different ways of parallelizing R code


:::

:::{admonition} Prerequisites
:class: important

* ...
* [CSC user account](https://docs.csc.fi/accounts/how-to-create-new-user-account/) and [project](https://docs.csc.fi/accounts/how-to-create-new-project/) with [access to Puhti](https://docs.csc.fi/accounts/how-to-add-service-access-for-project/)
* Some experience with R spatial
* Basic Linux skills

:::



[R exercise materials in Geocomputing GitHub](https://github.com/csc-training/geocomputing/tree/master/R/puhti)

* Interactive working
* Simple batch job
* Parallel job
* Optional: array job

:::{admonition} Key points
:class: important

* ...

* The Puhti web interface enables working with RStudio interactively
* The `future` package can be used for parallelization

:::
23 changes: 11 additions & 12 deletions materials/exercise_webinterface.md
@@ -27,38 +27,37 @@

* Open [Puhti web interface](https://puhti.csc.fi) and log in

:::{admonition} Change the default project and username

* `project_200xxxx` is an example project name; replace it with your own CSC project name.
* `cscusername` is an example username; replace it with your own username.
:::

### Info
* Puhti general status: bottom of front page
* When the `Disk lag` shown here is high, reading and writing files may be slow.
* Own projects, remaining billing units: `Tools` -> `Project view`
* Disk usage of own projects: `Tools` -> `Disk quotas`
* Running jobs: `Jobs` -> `Active jobs`

:::{admonition} Change the default project and username

* `project_200xxxx` is an example project name; replace it with your own CSC project name.
* `cscusername` is an example username; replace it with your own username.
:::

### Files
* Open home directory: `Files` -> `Home Directory`
* Create new directory and open it
* Create new `.txt` file with your name
* Move the new file under scratch:
* Create new `myfile.txt` file and add some text to it.
* Create new directory `mydata`
* Move the new file under `mydata`:
* Mark check-box in front of the file
* Click `Copy/Move`
* Open `/scratch/project_200xxxx`
* Open `mydata`
* Click `Move`
* Open your scratch folder
* Open your `mydata` folder
* Download your file to your local computer
* Delete the file
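
The same file operations can also be done on the command line, for example in a terminal logged in to Puhti. A minimal sketch, assuming you run it inside your own folder under `/scratch/project_200xxxx` (example project name); downloading to your local computer is left to the web interface.

```bash
#!/bin/bash
# Run in your own working folder, e.g. under /scratch/project_200xxxx
# (example project name, replace with your own).
echo "Some text" > myfile.txt   # create a new file and add some text to it
mkdir -p mydata                 # create a new directory (-p: no error if it already exists)
mv myfile.txt mydata/           # move the file into the new directory
ls -l mydata                    # check that myfile.txt is now in mydata
```

Deleting the file afterwards corresponds to `rm mydata/myfile.txt`.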

:::{admonition} Moving data

The web interface is meant for moving up to 10 GB of data; if you have more data, use other tools. More info in [moving data](moving_data.md)

:::


### Graphical applications
#### Jupyter
4 changes: 2 additions & 2 deletions materials/job_types.md
@@ -15,9 +15,9 @@ Apart from interactive jobs, a job can be classified as **serial, parallel or GP

## Serial jobs

Serial jobs means that the computer works on only one task at a time following a sequence of instructions, while only using one core.
In a serial job, the computer works on only one task at a time, following a sequence of instructions and using only one core.

Why could your serial job benefit from being executed using CSC's resources instead of on your own computer?
Why would your serial job benefit from being executed using CSC's resources instead of on your own computer?
- Part of a larger workflow
- Avoid data transfer between CSC and your own computer
- Data sharing among other project members
22 changes: 14 additions & 8 deletions materials/moving_data.md
@@ -1,12 +1,10 @@
# Moving data

## Local computer <-> supercomputer

* [CSC Docs: Moving files between a local computer and a supercomputer](https://docs.csc.fi/data/moving/)

### Puhti Web Interface

- Very easy, no installations needed.
- Graphical, no installations needed.
- Limited functionality compared to other options.
- For smaller amounts of data, < 10 GB.
- Uploading, downloading and moving files, creating folders.
- [Puhti Web Interface](https://puhti.csc.fi) -> Files
@@ -16,14 +14,15 @@

- For example: **FileZilla**, **WinSCP** and **CyberDuck**
- For medium amounts of data, < 1 TB.
- Very easy, but installation required.
- Easy drag-and-drop for moving, but installation required.
- WinSCP is slower than others.
- [CSC Docs: Graphical data transfer tools](https://docs.csc.fi/data/moving/graphical_transfer/)

!["FileZilla"](./images/filezilla.jpg "FileZilla")

### Command line tools on local computer
- For any amount of data; practically required if the data size is > 1 TB.
- Requires knowing the commands.

#### scp

@@ -55,7 +54,7 @@ rsync --info=progress2 -a /path/to/a_file cscusername@puhti.csc.fi:/scratch/proj
# One folder:
rsync --info=progress2 -a /path/to/directory cscusername@puhti.csc.fi:/scratch/project_200xxxx/directory
```
* `progress2` shows time left and percentage
* `--info=progress2` shows time left and percentage


:::{admonition} Firewall limitations
@@ -70,8 +69,9 @@ Some organizations, for example research institutes with IT-services from Valtor

- When downloading from external services, try to download directly to CSC, not via your local computer
- Check what APIs/tools the service supports:
- OGC APIs, [STAC](https://csc-training.github.io/geocomputing_course/materials/stac.html)
- ftp, rsync
- Standard APIs: OGC APIs, [STAC](https://csc-training.github.io/geocomputing_course/materials/stac.html)
- Custom service APIs
- ftp, rsync
- wget/curl if HTTP URLs are available

### wget
Expand All @@ -86,6 +86,12 @@ wget http://wwwd3.ymparisto.fi/d3/gis_data/spesific/syvyyskayra.zip
wget -r -nc ftp://ftp.aineistot.metsaan.fi/Metsamaski/Maakunta/ --cut-dirs=2
```

:::{admonition} More options
:class: note

* [CSC Docs: Moving files between a local computer and a supercomputer](https://docs.csc.fi/data/moving/)

:::



:::{admonition} Possible trouble with file transfer between Windows and Linux
5 changes: 3 additions & 2 deletions materials/partitions.md
@@ -2,9 +2,10 @@

Partitions are logical sets of nodes. Resource limitations for a job are defined by the partition (or queue) the job is submitted to. The limitations affect the maximum run time, the amount of memory, and the number of available CPU cores (which are called CPUs in Slurm). In addition, partitions may also define default resources that are automatically allocated for jobs if nothing has been specified.

Jobs should be submitted to the partition that best matches the required resources. That way, as few resources as possible are blocked and another user with a higher demand in RAM can run a job earlier. Of course, other considerations may also influence the choice of a partition.
Jobs should be submitted to the partition that best matches the required resources. That way, as few resources as possible are blocked and another user with a higher demand in memory can run a job earlier. Of course, other considerations may also influence the choice of a partition.

- [CSC Docs: Available batch job partitions](https://docs.csc.fi/computing/running/batch-job-partitions/)
- [LUMI Docs: Slurm partitions](https://docs.lumi-supercomputer.eu/runjobs/scheduled-jobs/partitions/)
- In order to use the resources in an efficient way, it is important to estimate the request as accurately as possible
- By avoiding an excessive "just-in-case" request, the job will start earlier

4 changes: 2 additions & 2 deletions materials/prerequisites.md
@@ -11,8 +11,8 @@ To make this course as enjoyable as possible for you and to make sure you can ge
* [UNIX tutorial for beginners](http://www.ee.surrey.ac.uk/Teaching/Unix/) (the first two topics are a good start, try also some editor)
* [Basic Linux Commands 10 min tutorial video](https://www.youtube.com/watch?v=uFPly_nGBMg) (sit back and watch)
* [CSC and Linux Cheat Sheet](./cheatsheet.md) (one page summary of the most important Linux commands – handy to have near you during the course)
* [Terminal intro](terminal.md)

## CSC account

For the exercises, a [CSC account](https://docs.csc.fi/accounts/how-to-create-new-user-account/) is needed.
For self-learning you will also need a project.
For the exercises, you need a [CSC user account](https://docs.csc.fi/accounts/how-to-create-new-user-account/) and a [project](https://docs.csc.fi/accounts/how-to-create-new-project/) with [access to Puhti](https://docs.csc.fi/accounts/how-to-add-service-access-for-project/). For the Allas exercise, the Allas service must also be enabled for the project.