Commit

Merge pull request #95 from jhudsl/kweav-nameChunks
Update 01-intro.Rmd with code chunk names and fix urls
carriewright11 authored Jan 13, 2025
2 parents 682f37e + 4749b1a commit 6e5aff7
Showing 9 changed files with 31 additions and 26 deletions.
6 changes: 3 additions & 3 deletions 01-intro.Rmd
@@ -21,15 +21,15 @@ One of the key challenges in cancer informatics is dealing with and managing the
This course is intended for researchers, including postdocs and students, with limited to intermediate experience with informatics research. The conceptual material will also be useful for those in management roles who are collecting data and using informatics pipelines.


```{r, fig.align='center', echo = FALSE, fig.alt= "For individuals whom: Have no formal training in informatics. Are relatively new to informatics. Want to learn the basics of computers and shared computing resources. Want guidance for choosing computing options", out.width= "100%"}
```{r for_individuals_who, fig.align='center', echo = FALSE, fig.alt= "For individuals whom: Have no formal training in informatics. Are relatively new to informatics. Want to learn the basics of computers and shared computing resources. Want guidance for choosing computing options", out.width= "100%"}
ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.g11db82d2864_1_65")
```

## Topics covered:

```{r, fig.align='center', echo = FALSE, fig.alt= "Concepts discussed in the Computing for Cancer Informatics course: How computer hardware and software work. Computing resources designed for research Data sizes and computational capacity. Guidance about computing resource decisions. How shared computing resources work. Etiquette for shared computing resources.", out.width= "100%"}
```{r topics_covered, fig.align='center', echo = FALSE, fig.alt= "Concepts discussed in the Computing for Cancer Informatics course: How computer hardware and software work. Computing resources designed for research Data sizes and computational capacity. Guidance about computing resource decisions. How shared computing resources work. Etiquette for shared computing resources.", out.width= "100%"}
ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.g11db82d2864_1_81")
```

@@ -38,6 +38,6 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHE
The course will cover key underlying principles and concepts in computing. We will go over concrete discussions of the differences between cloud and local computing. The course will also highlight a number of computing options and describe etiquette basics for using shared resources.


```{r, fig.align='center', echo = FALSE, fig.alt= "Overall Course Learning Objectives. This course will demonstrate how to: 1.Recognize various data management systems especially for cancer research related data, 2.Compare and make informed decisions about computation platforms (including economic considerations),3.Implement best practices for data security and privacy, 4. Share data safely and securely in a variety of contexts,5.Handle IRB and data access requests,6.Apply ethical consideration in data management workflows", out.width= "100%"}
```{r learning_objectives, fig.align='center', echo = FALSE, fig.alt= "Overall Course Learning Objectives. This course will demonstrate how to: 1.Recognize various data management systems especially for cancer research related data, 2.Compare and make informed decisions about computation platforms (including economic considerations),3.Implement best practices for data security and privacy, 4. Share data safely and securely in a variety of contexts,5.Handle IRB and data access requests,6.Apply ethical consideration in data management workflows", out.width= "100%"}
ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.gf5f8818810_1_5")
```
4 changes: 2 additions & 2 deletions 03-Binary_data_to_computations.Rmd
@@ -113,7 +113,7 @@ Previously, back when a university might have one single computer, as they were
ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.gf96b1d997a_0_1")
```

There were many [different kinds](https://www.jkmscott.net/data/Punched%20Cards.html) of punch cards over time, see @scott_collection_2016 for a collection.
There were many [different kinds](https://www.jkmscott.net/data/PunchedCards/PunchedCards.html) of punch cards over time, see @scott_collection_2016 for a collection.



@@ -125,7 +125,7 @@ Also check out @hardware_history_2021 for really interesting and more extensive

Also, here is some fascinating additional reading on the role of women as computer operators starting in the 1940s. Initially computer science was actually thought of as a field for women, however this changed over time (and now women and gender minorities are hopefully becoming more represented) :

* [Article titled: Woman pioneered computer programming. Then men took their industry over](https://timeline.com/women-pioneered-computer-programming-then-men-took-their-industry-over-c2959b822523) [@visions_women_2017]
* [Article titled: Woman pioneered computer programming. Then men took their industry over](https://pages.memoryoftheworld.org/library/Josh%20O%27Connor/Women%20pioneered%20computer%20programming.%20Then%20men%20took%20their%20industry%20over_%20%28321%29/Women%20pioneered%20computer%20programming.%20Then%20-%20Josh%20O%27Connor.pdf) [@visions_women_2017]
* [Article titled: Untold History of AI: Invisible Women Programmed America's First Electronic Computer The “human computers” who operated ENIAC have received little credit](https://spectrum.ieee.org/untold-history-of-ai-invisible-woman-programmed-americas-first-electronic-computer) [@untold_2019]


27 changes: 15 additions & 12 deletions 04-Computing_Systems.Rmd
@@ -295,7 +295,7 @@ Many of us use cloud storage regularly for Google Docs and backing up photos usi
Furthermore, this also allows for more opportunity to scale your work to a larger extent, as there is generally more computing capacity possible with most cloud resources [@cloudvstrad].


Companies like Amazon, Google, Microsoft Azure, and others provide cloud computing resources. **Somewhere these companies have clusters of computers that paying customers use through the internet.** In addition to these commercial options, there are newer national government funded resource options like [Jetstream](https://portal.xsede.org/jetstream) (described in the next section). We will compare computing options in another chapter coming up.
Companies like Amazon, Google, Microsoft Azure, and others provide cloud computing resources. **Somewhere these companies have clusters of computers that paying customers use through the internet.** In addition to these commercial options, there are occasionally national government funded resource options like Texas Advanced Computing Center (TACC) and others previously funded by the former project called [XSEDE](https://portal.xsede.org/) (described in the next section). We will compare computing options in another chapter coming up.



@@ -308,7 +308,7 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHE

It's important to remember that all of the shared computing options that we previously described involve a [data center](https://en.wikipedia.org/wiki/Data_center) where a large number of computers are physically housed.

```{r, fig.align='center', echo = FALSE, fig.alt= "Examples of servers or shared computers include clusters that may exist at your institution or national computing resources like Xsede.", out.width= "100%"}
```{r, fig.align='center', echo = FALSE, fig.alt= "Examples of servers or shared computers include clusters that may exist at your institution or national computing resources", out.width= "100%"}
ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.gf9c252d058_0_23")
```

@@ -319,26 +319,30 @@ You may have access to a [HPC (which stands for High Performance Computing) clus
If your university or institution has a HPC [cluster](https://en.wikipedia.org/wiki/Computer_cluster), this means that they have a group of computers acting like a server that people can use to store data or assist with intensive computations. Often institutions can support the cost of many computers within an HPC cluster. This means that multiple computers will simultaneously perform different parts of the computing required for a given task, thus significantly speeding up the process compared to you trying to perform the task on just your computer!


If your institute doesn't have a shared computing resource like the HPCs we just described, you could also consider a national resource option like [Xsede](https://www.xsede.org/).
[Xsede](https://www.xsede.org/) is led by the University of Illinois National Center for Supercomputing Applications (NCSA) and includes 18 other partnering institutions (which are mostly other universities). Through this partnership, they currently support 16 supercomputers. Universities and non-profit researchers in the United States can request access to their computational and data storage resources. See [here](https://portal.xsede.org/allocations/resource-info) for descriptions of the available resources.
If your institute doesn't have a shared computing resource like the HPCs we just described, you could also consider a national resource option like the [Texas Advanced Computing Center (TACC)](https://en.wikipedia.org/wiki/Texas_Advanced_Computing_Center) which was funded by the National Science Foundation (NSF) [XSEDE](https://www.xsede.org/) program.
Universities and non-profit researchers in the United States can request access to their computational and data storage resources. Other resource options include:

- [San Diego Supercomputer Center (SDSC)](https://www.sdsc.edu/) at the University of California, San Diego
- [National Institute for Computational Sciences (NICS)](https://www.nics.tennessee.edu/), at the University of Tennessee, Knoxville
- [Pittsburgh Supercomputing Center (PSC)](https://www.psc.edu/) at the Carnegie Mellon University and University of Pittsburgh

Here you can see a photo of Stampede2, one of the supercomputers that members of Xsede can utilize.

```{r, fig.align='center', echo = FALSE, fig.alt= "An image of Stampede2 one of the supercomputers that members of Xsede can use.", out.width= "100%"}
Here you can see a photo of Stampede2, one of the supercomputers that members of TACC could utilize (it has now been replaced with Stampede3).

```{r, fig.align='center', echo = FALSE, fig.alt= "An image of Stampede2 one of the supercomputers that members of TACC could use.", out.width= "100%"}
ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.gf9c252d058_0_63")
```


[[source](https://www.xsede.org/ecosystem/resources)]
[[source](https://www.xsede.org/)]

> Stampede2, generously funded by the National Science Foundation (NSF) through award ACI-1134872, is one of the Texas Advanced Computing Center (TACC), University of Texas at Austin's flagship supercomputers.
See [here](https://portal.xsede.org/tacc-stampede2) for more information about how you could possibly connect to and utilize Stampede2.
See [this article about Stampede2 and the transition to Stampede3](https://tacc.utexas.edu/news/latest-news/2023/07/24/taccs-new-stampede3-advances-nsf-supercomputing-ecosystem/) for more information about their resources and see [their getting started website](https://tacc.utexas.edu/use-tacc/getting-started) on how you could possibly use their resources.

Importantly when you use shared computers like national resources like Stampede2 available through Xsede, as well as institutional HPCs, you will share these resources with many other people and so you need to learn the proper etiquette for using and sharing these resources. We will discuss this more in a coming chapter.
Importantly when you use shared computers like national resources like [Stampede2](https://tacc.utexas.edu/systems/stampede2/) and [Stampede3](https://docs.tacc.utexas.edu/hpc/stampede3/), as well as institutional HPCs, you will share these resources with many other people and so you need to learn the proper etiquette for using and sharing these resources. We will discuss this more in a coming chapter.

However, there is also now an option to access the different XSEDE computing resources through a cloud environment option called [Jetstream2](https://jetstream-cloud.org/).
There is also an option to access national computing resources through a cloud environment option called [Jetstream2](https://jetstream-cloud.org/).

Here is a video about Jetstream2:

@@ -348,7 +352,6 @@ knitr::include_url("https://www.youtube.com/embed/NQ3flxJANTw")




We will also discuss how the use of these various computing options differ in the next chapters. Importantly there are also some computing platforms that have been especially designed for scientists and specific types of researchers, so it is also useful to know about these options.


@@ -367,6 +370,6 @@ In conclusion, here are some of the major take-home messages:
7) A supercomputer is a computer that has much more storage, memory, and computing capacity than a typical personal computer. Supercomputers are generally much more expensive than using a group of more typical computers that together would have the same collective computing and storage capacity.
8) There are two general types of servers: clusters and grids. Cluster approaches work by having several computers working on pieces of the same task simultaneously in a method called parallel computing. Grid approaches work by having different types of computers working on different tasks.
9) Cloud computing is essentially the use of many servers accessed through the internet. This is often more reliable because there are many servers to use, even if other users are performing large tasks or if a server goes down. We will talk more about the pros and cons of this option in the coming chapters.
10) If your institute doesn't provide you access to a shared computing resource and you don't want to use a commercial cloud option, you could consider options like [Xsede](https://www.xsede.org/) and or [Jetstream2](https://jetstream-cloud.org/), which is a national resource that you can request access to.
10) If your institute doesn't provide you access to a shared computing resource and you don't want to use a commercial cloud option, you could consider options like [TACC](https://en.wikipedia.org/wiki/Texas_Advanced_Computing_Center) and or [Jetstream2](https://jetstream-cloud.org/), which is a national resource that you can request access to.


4 changes: 2 additions & 2 deletions 05-Shared_computing_etiquette.Rmd
@@ -30,7 +30,7 @@ Each cluster or other shared computing resource will have different rules and re

One major aspect to consider is keeping the computers in the cluster safe from harm. You wouldn't want to lose your precious data stored on the cluster and neither would your colleagues!

- Use a good [secure password](https://its.lafayette.edu/policies/strongpasswords/) that is not easy for someone else to guess.
- Use a good [secure password](https://help.lafayette.edu/guidelines-for-strong-passwords/) that is not easy for someone else to guess.

Some people suggest using sentences that are easy for you to remember. You could consider a line of lyrics from a song or poem that you like, or maybe a line from a movie. Modify part of it to include symbols and numbers [@passwords].

Expand Down Expand Up @@ -138,7 +138,7 @@ Typically a program is used to schedule jobs. Remember that jobs are the individ

Such job scheduling programs assign jobs to available node resources as they become available and if they have the required resources to meet the job. These programs have their own commands for running jobs, checking resources, and checking jobs. Remember to use the management system to run your jobs using the compute nodes not the login nodes (nodes for users to log in). There are often nodes set up for transferring files as well.

In the case of the JHPCE, a program called Sun Grid Engine (SGE) is used, but there are others job management programs. See [here](https://jhpce.jhu.edu/wp-content/uploads/2021/06/JHPCE-Overview-2021-10.pdf) for more information on how people use SGE for the JHPCE shared resource.
In the case of the JHPCE, a program called Sun Grid Engine (SGE) is used, but there are others job management programs. See [here](https://jhpce.jhu.edu/orient/images/sge-orient.pdf) for more information on how people use SGE for the JHPCE shared resource.

### Specifying memory (RAM) needs
