diff --git a/README.md b/README.md index 270009c..2a57175 100644 --- a/README.md +++ b/README.md @@ -1,2 +1,7 @@ -# programming-in-python -Materials for the CDH course 'Programming in Python' +# Programming in Python + +The entry-level course 'Programming in Python', by the [Utrecht Centre for Digital Humanities](https://cdh.uu.nl/), aims to teach the basics of the Python programming language. Special attention is given to best practices in coding, such as writing clean code and documentation. + +The course was first taught on 15-16 November 2021. + +This repository contains all teaching materials, exercises, solutions and extra resources relevant to the course. Where possible, all files are provided as Jupyter Notebooks. \ No newline at end of file diff --git a/exercises/2_exercise.ipynb b/exercises/2_exercise.ipynb new file mode 100755 index 0000000..6ecdb2d --- /dev/null +++ b/exercises/2_exercise.ipynb @@ -0,0 +1 @@ +{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"Exercise 2.ipynb","provenance":[],"collapsed_sections":[],"authorship_tag":"ABX9TyP0DJevbsf+LA115QvuuYbc"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","metadata":{"id":"QXRFmxivqk3J"},"source":["# Exercise 2\n","\n","1. \n"," 1. Loop over `exercise_list` and find the word `\"stressed\"`. Assign it to a new variable.\n"," 2. Reverse the word you found in 1.1, and print the reversed word."]},{"cell_type":"code","metadata":{"id":"loap3-NPqqGD"},"source":["exercise_list = [1, 12.9, \"stressed\", \"bar\", True]"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"3leg-2-oqsmU"},"source":["2. Make a list of numbers. Print the average. See what happens when you try to add strings to the list."]},{"cell_type":"code","metadata":{"id":"qfYQgZ6UtGYX"},"source":[""],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"yL3K8Yygte30"},"source":["3. 
For each of the following lists, check if the first and last values are the same. Print `\"yay\"` if so, `\"nay\"` if not. Try to first combine them into a list of lists. Are there any results you did not expect?"]},{"cell_type":"code","metadata":{"id":"EZvk0hKluPNy"},"source":["list1 = [1, 2, 3, 5, 1]\n","list2 = [1, 2, 3, 5, \"1\"]\n","list3 = [1, 2, 3, 5, True]\n","list4 = [False, \"True\", \"False\", \"False\"]\n","list5 = [\"true\", False, \"True\"]"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"FMySjPAXx4v1"},"source":["4. *FizzBuzz* (advanced)\n","\n"," 1. Print all numbers from 1 to 100. The numbers are provided by the code below.\n"," 2. If a number is divisible by 3, print `\"Fizz\"` instead.\n"," 3. If a number is divisible by 5, print `\"Buzz\"` instead.\n"," 4. If a number is divisible by 3 AND divisible by 5, print only `\"FizzBuzz\"`, not `\"Fizz\"` or `\"Buzz\"`.\n","\n","Hint: use the modulo operator: `number1 % number2`. Try to see if you can figure out what it does.\n","\n"]},{"cell_type":"code","metadata":{"id":"Z3CaG-uhyngV"},"source":["numbers = list(range(1, 101))\n","print(2 % 3)\n","print(3 % 3)\n"],"execution_count":null,"outputs":[]}]} \ No newline at end of file diff --git a/exercises/2_solutions.ipynb b/exercises/2_solutions.ipynb new file mode 100755 index 0000000..06f53ab --- /dev/null +++ b/exercises/2_solutions.ipynb @@ -0,0 +1 @@ +{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"Exercise 2 - solutions.ipynb","provenance":[],"collapsed_sections":[]},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","metadata":{"id":"v1wakPAeqvhA"},"source":["# Exercise 2 - Solutions"]},{"cell_type":"markdown","metadata":{"id":"iPcFeOQjuxE4"},"source":["## 
1"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"0ZaiO9o6q7VO","executionInfo":{"status":"ok","timestamp":1636915394327,"user_tz":-60,"elapsed":5,"user":{"displayName":"Jelte van Boheemen","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GheLnTZKvy8P4D-Jl7hmsBhPGYHfOa1IeSLeP-Ynw=s64","userId":"10826796817228738014"}},"outputId":"261578b6-8d55-42f7-943a-38a0c27f8cb7"},"source":["exercise_list = [1, 12.9, \"stressed\", \"bar\", True]\n","\n","for element in exercise_list:\n"," if element == \"stressed\":\n"," word = element\n","\n","reversed_word = \"\"\n","for letter in word:\n"," reversed_word = letter + reversed_word\n","print(word + \" in reverse is: \" + reversed_word)"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["stressed in reverse is: desserts\n"]}]},{"cell_type":"markdown","metadata":{"id":"KYfCrRt1rbbd"},"source":["## 2"]},{"cell_type":"code","metadata":{"id":"4_FAlJ5irdkI"},"source":["numbers = [1, 5, -12, 38987, 0, 24]\n","\n","total = 0\n","length = 0\n","for num in numbers:\n"," total = total + num\n"," length = length + 1\n","average = total/length\n","print(average)\n","\n","# Adding a string to the list makes the program fail: you cannot add a string to a number!\n","\n","# Bonus: using built-in functions\n","total = sum(numbers)\n","length = len(numbers)\n","print(total/length)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"Eu7Idfj7uyv1"},"source":["## 3"]},{"cell_type":"code","metadata":{"id":"XOKAZfR0uzoy"},"source":["list1 = [1, 2, 3, 5, 1]\n","list2 = [1, 2, 3, 5, \"1\"]\n","list3 = [1, 2, 3, 5, True]\n","list4 = [False, \"True\", \"False\", \"False\"]\n","list5 = [\"true\", False, \"True\"]\n","\n","master_list = [list1, list2, list3, list4, list5]\n","for sub_list in master_list:\n"," if sub_list[0] == sub_list[-1]:\n"," print(\"yay\")\n"," else:\n"," print(\"nay\")\n","\n","# You might not expect list 3 to be a \"yay\"\n","# 
However, in Python False and True are equal to 0 and 1, respectively\n","print(False == 0)\n","print(True == 1)\n","\n","# list5 is a nay, because strings are case sensitive\n","# if you want to compare them case-insensitively, first make them all lowercase\n","print(\"true\".lower() == \"True\".lower())"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"nY1j9g2dyoqc"},"source":["## 4"]},{"cell_type":"code","metadata":{"id":"qC2qt-sPypwM"},"source":["# Many optimizations are possible!\n","numbers = list(range(1, 101))\n","\n","for num in numbers:\n"," if num % 3 == 0 and num % 5 == 0:\n"," print(\"FizzBuzz\")\n"," elif num % 3 == 0:\n"," print(\"Fizz\")\n"," elif num % 5 == 0:\n"," print(\"Buzz\")\n"," else:\n"," print(num)"],"execution_count":null,"outputs":[]}]} \ No newline at end of file diff --git a/exercises/3_exercise.ipynb b/exercises/3_exercise.ipynb new file mode 100644 index 0000000..1bd53f7 --- /dev/null +++ b/exercises/3_exercise.ipynb @@ -0,0 +1,76 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "Exercise3.ipynb", + "provenance": [], + "collapsed_sections": [] + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "Fj68DvsUltDP" + }, + "source": [ + "# Exercise 3\n", + "\n", + "This is the third hands-on exercise of the 2021 CDH entry level Python course at Utrecht University.\n", + "\n", + "For this exercise, you may choose whether you use your own code from exercise 2, or the code below. The exercise consists of two parts:\n", + "\n", + "1. Refactor the code into functions, so that the logic is reusable and non-repetitive and the purpose of each line of code is clear.\n", + "2. 
Identify any hard-coded parameters and replace them with constants that you define at the top.\n", + "\n", + "If you finish the exercise quickly, try to generalize your functions to make them more powerful. For bonus points!\n", + "\n", + "Good luck!" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "--nCEHqYlmKw", + "outputId": "e1294fd0-7353-4205-e887-f21e6a82719b" + }, + "source": [ + "name = 'Python' # could be your own name\n", + "print('We have a: ' + name[0])\n", + "print('We have a: ' + name[1])\n", + "print('We have a: ' + name[2])\n", + "print('We have a: ' + name[3])\n", + "print('We have a: ' + name[4])\n", + "print('We have a: ' + name[5])\n", + "print('Go go ' + name.upper() + '!!!')" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "We have a: P\n", + "We have a: y\n", + "We have a: t\n", + "We have a: h\n", + "We have a: o\n", + "We have a: n\n", + "Go go PYTHON!!!\n" + ] + } + ] + } + ] +} \ No newline at end of file diff --git a/exercises/3_solutions.ipynb b/exercises/3_solutions.ipynb new file mode 100644 index 0000000..030e004 --- /dev/null +++ b/exercises/3_solutions.ipynb @@ -0,0 +1,120 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "Exercise3-answer.ipynb", + "provenance": [], + "collapsed_sections": [] + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "Fj68DvsUltDP" + }, + "source": [ + "# Exercise 3\n", + "\n", + "This is the third hands-on exercise of the 2021 CDH entry level Python course at Utrecht University.\n", + "\n", + "For this exercise, you may choose whether you use your own code from exercise 2, or the code below. The exercise consists of two parts:\n", + "\n", + "1. 
Refactor the code into functions, so that the logic is reusable and non-repetitive and the purpose of each line of code is clear.\n", + "2. Identify any hard-coded parameters and replace them with constants that you define at the top.\n", + "\n", + "Good luck!" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "--nCEHqYlmKw", + "outputId": "e1294fd0-7353-4205-e887-f21e6a82719b" + }, + "source": [ + "name = 'Python' # could be your own name\n", + "print('We have a: ' + name[0])\n", + "print('We have a: ' + name[1])\n", + "print('We have a: ' + name[2])\n", + "print('We have a: ' + name[3])\n", + "print('We have a: ' + name[4])\n", + "print('We have a: ' + name[5])\n", + "print('Go go ' + name.upper() + '!!!')" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "We have a: P\n", + "We have a: y\n", + "We have a: t\n", + "We have a: h\n", + "We have a: o\n", + "We have a: n\n", + "Go go PYTHON!!!\n" + ] + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "sc8aY_ZrT-Gh", + "outputId": "4b53cf51-4124-495a-e498-2bdb97079f4e" + }, + "source": [ + "ANNOUNCE_START = 'We have a: '\n", + "YELL_START = 'Go go '\n", + "YELL_END = '!!!'\n", + "\n", + "def announce(letter):\n", + " return ANNOUNCE_START + letter\n", + "\n", + "def yell(name):\n", + " return YELL_START + name.upper() + YELL_END\n", + "\n", + "def cheerlead(name):\n", + " lines = []\n", + " for letter in name:\n", + " lines.append(announce(letter))\n", + " lines.append(yell(name))\n", + " return '\\n'.join(lines)\n", + "\n", + "print(cheerlead('Python'))" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "We have a: P\n", + "We have a: y\n", + "We have a: t\n", + "We have a: h\n", + "We have a: o\n", + "We have a: n\n", + "Go go PYTHON!!!\n" + ] + } + ] + } + ] +} \ No 
newline at end of file diff --git a/exercises/4_exercise.ipynb b/exercises/4_exercise.ipynb new file mode 100644 index 0000000..71c0c9d --- /dev/null +++ b/exercises/4_exercise.ipynb @@ -0,0 +1,69 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "Exercise4.ipynb", + "provenance": [] + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "bVfoqS2eXjOV" + }, + "source": [ + "# Exercise 4\n", + "\n", + "This is the fourth hands-on exercise of the 2021 CDH entry level Python course at Utrecht University.\n", + "\n", + "The following program takes in a binary string of ones and zeros. It returns `True` if all bits are ones and `False` otherwise. However, there is a bug in the program. In the example below, the program returns `True` while there is clearly a zero in the string.\n", + "\n", + "1. Insert a breakpoint and step through the program to find out what is going wrong. Step multiple times through the program if necessary and do not stop until you fully understand what is going on.\n", + "2. Try to think of at least two ways in which you might solve the problem. Which is the safest way to be *very* sure that the problem will *never* happen again?\n", + "\n", + "If you have written a program with a bug during exercise 2 or 3, you may choose to use that program for this exercise instead." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "3N0mSvSyXWe4", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "f0c90396-c2e0-45ed-d502-73b8e59f16e4" + }, + "source": [ + "def only_ones(binary_string):\n", + " for bit in binary_string:\n", + " if bit == '0':\n", + " return False\n", + " return True\n", + "\n", + "only_ones('1O1')" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "True" + ] + }, + "metadata": {}, + "execution_count": 1 + } + ] + } + ] +} \ No newline at end of file diff --git a/exercises/4_solutions.ipynb b/exercises/4_solutions.ipynb new file mode 100644 index 0000000..9d4cf2e --- /dev/null +++ b/exercises/4_solutions.ipynb @@ -0,0 +1,85 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "Exercise4-answer.ipynb", + "provenance": [], + "collapsed_sections": [] + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "bVfoqS2eXjOV" + }, + "source": [ + "# Exercise 4\n", + "\n", + "This is the fourth hands-on exercise of the 2021 CDH entry level Python course at Utrecht University.\n", + "\n", + "The following program takes in a binary string of ones and zeros. It returns `True` if all bits are ones and `False` otherwise. However, there is a bug in the program. In the example below, the program returns `True` while there is clearly a zero in the string.\n", + "\n", + "1. Insert a breakpoint and step through the program to find out what is going wrong. Step multiple times through the program if necessary and do not stop until you fully understand what is going on.\n", + "2. Try to think of at least two ways in which you might solve the problem. 
Which is the safest way to be *very* sure that the problem will *never* happen again?\n", + "\n", + "If you have written a program with a bug during exercise 2 or 3, you may choose to use that program for this exercise instead." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "3N0mSvSyXWe4" + }, + "source": [ + "def only_ones(binary_string):\n", + " import pdb; pdb.set_trace()\n", + " for bit in binary_string:\n", + " if bit == '0':\n", + " return False\n", + " return True\n", + "\n", + "only_ones('1O1')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "M0OZwOYEb2S1" + }, + "source": [ + "## Answer\n", + "\n", + "The problem is that the string contains the capital letter `'O'` instead of the digit `'0'`. Stepping through the program with the debugger, you would eventually reach the line\n", + "\n", + "```python\n", + " if bit == '0':\n", + "```\n", + "\n", + "and running the command `p bit`, you would find that it prints `'O'`. At this point, you *might* notice that `bit` looks slightly different from the character in the condition.\n", + "\n", + "Otherwise, after going over this line a couple of times and starting to doubt yourself, you might try `p bit == '0'` and you would find that it prints `False`. At this point, you would certainly conclude that the string does not contain a zero, but something that looks similar to a zero.\n", + "\n", + "A solution that might quickly come to mind is to fix the string. After all, a binary string of ones and zeros is not supposed to contain a capital letter `'O'`. This is indeed a mistake that needs to be fixed.\n", + "\n", + "However, this is not a complete solution. Next time somebody calls `only_ones`, they might again mistakenly include a character other than `'1'` or `'0'`, and the function would give the wrong answer again. 
This can be prevented by changing the condition:\n", + "\n", + "```python\n", + " if bit != '1':\n", + "```\n", + "\n", + "Now, if the string contains *anything* other than a `'1'`, whether it's a `'0'` or something else, the function will return `False`, as it should." + ] + } + ] +} \ No newline at end of file diff --git a/exercises/5_exercise.ipynb b/exercises/5_exercise.ipynb new file mode 100755 index 0000000..1b30355 --- /dev/null +++ b/exercises/5_exercise.ipynb @@ -0,0 +1 @@ +{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"Exercise 5.ipynb","provenance":[],"collapsed_sections":[],"authorship_tag":"ABX9TyN61O3wVJQW63lhTuCcO6hc"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","metadata":{"id":"vJPGrtLJMaJd"},"source":["# Exercise 5 \n","\n","1. Upload your own dataset, or use `sample_data/california_housing_test.csv`\n","2. Use the example code to read the data from the CSV file\n","3. Print the following items:\n"," 1. The last row\n"," 2. The last column\n"," 3. The first row. 
If this is a header row, see if you can come up with a way to separate it from the data rows.\n","\n","Use best practices (write functions, document the code, etc.)"]},{"cell_type":"code","metadata":{"id":"mIvIGXjLNFTp"},"source":[""],"execution_count":null,"outputs":[]}]} \ No newline at end of file diff --git a/exercises/5_solutions.ipynb b/exercises/5_solutions.ipynb new file mode 100755 index 0000000..ad5a6e1 --- /dev/null +++ b/exercises/5_solutions.ipynb @@ -0,0 +1 @@ +{"cells":[{"cell_type":"markdown","metadata":{"id":"EeUBZA5JNW9j"},"source":["# Exercise 5 - Solutions"]},{"cell_type":"code","execution_count":18,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":898,"status":"ok","timestamp":1637060409073,"user":{"displayName":"Jelte van Boheemen","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GheLnTZKvy8P4D-Jl7hmsBhPGYHfOa1IeSLeP-Ynw=s64","userId":"10826796817228738014"},"user_tz":-60},"id":"gVgEcMXaNHOz","outputId":"21cc2676-fb2e-4519-c1b3-758c6bb945d5"},"outputs":[{"name":"stdout","output_type":"stream","text":["[['Abkhazia', 'Sukhumi'], ['Afghanistan', 'Kabul'], ['Akrotiri and Dhekelia', 'Episkopi Cantonment'], ['Albania', 'Tirana'], ['Algeria', 'Algiers'], ['American Samoa', 'Pago Pago'], ['Andorra', 'Andorra la Vella'], ['Angola', 'Luanda'], ['Anguilla', 'The Valley'], ['Antigua and Barbuda', \"St. 
John's\"], ['Argentina', 'Buenos Aires'], ['Armenia', 'Yerevan'], ['Aruba', 'Oranjestad'], ['Ascension Island', 'Georgetown'], ['Australia', 'Canberra'], ['Austria', 'Vienna'], ['Azerbaijan', 'Baku'], ['Bahamas', 'Nassau'], ['Bahrain', 'Manama'], ['Bangladesh', 'Dhaka'], ['Barbados', 'Bridgetown'], ['Belarus', 'Minsk'], ['Belgium', 'Brussels'], ['Belize', 'Belmopan'], ['Benin', 'Porto-Novo'], ['Bermuda', 'Hamilton'], ['Bhutan', 'Thimphu'], ['Bolivia', 'Sucre'], ['Bolivia', 'La Paz'], ['Bosnia and Herzegovina', 'Sarajevo'], ['Botswana', 'Gaborone'], ['Brazil', 'Brasília'], ['British Virgin Islands', 'Road Town'], ['Brunei', 'Bandar Seri Begawan'], ['Bulgaria', 'Sofia'], ['Burkina Faso', 'Ouagadougou'], ['Burundi', 'Bujumbura'], ['Cambodia', 'Phnom Penh'], ['Cameroon', 'Yaoundé'], ['Canada', 'Ottawa'], ['Cape Verde', 'Praia'], ['Cayman Islands', 'George Town'], ['Central African Republic', 'Bangui'], ['Chad', \"N'Djamena\"], ['Chile', 'Santiago'], ['China', 'Beijing'], ['Christmas Island', 'Flying Fish Cove'], ['Cocos (Keeling) Islands', 'West Island'], ['Colombia', 'Bogotá'], ['Comoros', 'Moroni'], ['Cook Islands', 'Avarua'], ['Costa Rica', 'San José'], ['Croatia', 'Zagreb'], ['Cuba', 'Havana'], ['Curaçao', 'Willemstad'], ['Cyprus', 'Nicosia'], ['Czech Republic', 'Prague'], [\"Côte d'Ivoire\", 'Yamoussoukro'], ['Democratic Republic of the Congo', 'Kinshasa'], ['Denmark', 'Copenhagen'], ['Djibouti', 'Djibouti'], ['Dominica', 'Roseau'], ['Dominican Republic', 'Santo Domingo'], ['East Timor (Timor-Leste)', 'Dili'], ['Easter Island', 'Hanga Roa'], ['Ecuador', 'Quito'], ['Egypt', 'Cairo'], ['El Salvador', 'San Salvador'], ['Equatorial Guinea', 'Malabo'], ['Eritrea', 'Asmara'], ['Estonia', 'Tallinn'], ['Ethiopia', 'Addis Ababa'], ['Falkland Islands', 'Stanley'], ['Faroe Islands', 'Tórshavn'], ['Federated States of Micronesia', 'Palikir'], ['Fiji', 'Suva'], ['Finland', 'Helsinki'], ['France', 'Paris'], ['French Guiana', 'Cayenne'], ['French Polynesia', 'Papeete'], 
['Gabon', 'Libreville'], ['Gambia', 'Banjul'], ['Georgia', 'Tbilisi'], ['Germany', 'Berlin'], ['Ghana', 'Accra'], ['Gibraltar', 'Gibraltar'], ['Greece', 'Athens'], ['Greenland', 'Nuuk'], ['Grenada', \"St. George's\"], ['Guam', 'Hagåtña'], ['Guatemala', 'Guatemala City'], ['Guernsey', 'St. Peter Port'], ['Guinea', 'Conakry'], ['Guinea-Bissau', 'Bissau'], ['Guyana', 'Georgetown'], ['Haiti', 'Port-au-Prince'], ['Honduras', 'Tegucigalpa'], ['Hungary', 'Budapest'], ['Iceland', 'Reykjavík'], ['India', 'New Delhi'], ['Indonesia', 'Jakarta'], ['Iran', 'Tehran'], ['Iraq', 'Baghdad'], ['Ireland', 'Dublin'], ['Isle of Man', 'Douglas'], ['Israel', 'Jerusalem'], ['Italy', 'Rome'], ['Jamaica', 'Kingston'], ['Japan', 'Tokyo'], ['Jersey', 'St. Helier'], ['Jordan', 'Amman'], ['Kazakhstan', 'Astana'], ['Kenya', 'Nairobi'], ['Kiribati', 'Tarawa'], ['Kosovo', 'Pristina'], ['Kuwait', 'Kuwait City'], ['Kyrgyzstan', 'Bishkek'], ['Laos', 'Vientiane'], ['Latvia', 'Riga'], ['Lebanon', 'Beirut'], ['Lesotho', 'Maseru'], ['Liberia', 'Monrovia'], ['Libya', 'Tripoli'], ['Liechtenstein', 'Vaduz'], ['Lithuania', 'Vilnius'], ['Luxembourg', 'Luxembourg'], ['Macedonia', 'Skopje'], ['Madagascar', 'Antananarivo'], ['Malawi', 'Lilongwe'], ['Malaysia', 'Kuala Lumpur'], ['Maldives', 'Malé'], ['Mali', 'Bamako'], ['Malta', 'Valletta'], ['Marshall Islands', 'Majuro'], ['Mauritania', 'Nouakchott'], ['Mauritius', 'Port Louis'], ['Mexico', 'Mexico City'], ['Moldova', 'Chisinau'], ['Monaco', 'Monaco'], ['Mongolia', 'Ulaanbaatar'], ['Montenegro', 'Podgorica'], ['Montserrat', 'Plymouth'], ['Morocco', 'Rabat'], ['Mozambique', 'Maputo'], ['Myanmar', 'Naypyidaw'], ['Nagorno-Karabakh Republic', 'Stepanakert'], ['Namibia', 'Windhoek'], ['Nauru', 'Yaren'], ['Nepal', 'Kathmandu'], ['Netherlands', 'Amsterdam'], ['New Caledonia', 'Nouméa'], ['New Zealand', 'Wellington'], ['Nicaragua', 'Managua'], ['Niger', 'Niamey'], ['Nigeria', 'Abuja'], ['Niue', 'Alofi'], ['Norfolk Island', 'Kingston'], ['North Korea', 'Pyongyang'], 
['Northern Cyprus', 'Nicosia'], ['United Kingdom Northern Ireland', 'Belfast'], ['Northern Mariana Islands', 'Saipan'], ['Norway', 'Oslo'], ['Oman', 'Muscat'], ['Pakistan', 'Islamabad'], ['Palau', 'Ngerulmud'], ['Palestine', 'Jerusalem'], ['Panama', 'Panama City'], ['Papua New Guinea', 'Port Moresby'], ['Paraguay', 'Asunción'], ['Peru', 'Lima'], ['Philippines', 'Manila'], ['Pitcairn Islands', 'Adamstown'], ['Poland', 'Warsaw'], ['Portugal', 'Lisbon'], ['Puerto Rico', 'San Juan'], ['Qatar', 'Doha'], ['Republic of China (Taiwan)', 'Taipei'], ['Republic of the Congo', 'Brazzaville'], ['Romania', 'Bucharest'], ['Russia', 'Moscow'], ['Rwanda', 'Kigali'], ['Saint Barthélemy', 'Gustavia'], ['Saint Helena', 'Jamestown'], ['Saint Kitts and Nevis', 'Basseterre'], ['Saint Lucia', 'Castries'], ['Saint Martin', 'Marigot'], ['Saint Pierre and Miquelon', 'St. Pierre'], ['Saint Vincent and the Grenadines', 'Kingstown'], ['Samoa', 'Apia'], ['San Marino', 'San Marino'], ['Saudi Arabia', 'Riyadh'], ['Scotland', 'Edinburgh'], ['Senegal', 'Dakar'], ['Serbia', 'Belgrade'], ['Seychelles', 'Victoria'], ['Sierra Leone', 'Freetown'], ['Singapore', 'Singapore'], ['Sint Maarten', 'Philipsburg'], ['Slovakia', 'Bratislava'], ['Slovenia', 'Ljubljana'], ['Solomon Islands', 'Honiara'], ['Somalia', 'Mogadishu'], ['Somaliland', 'Hargeisa'], ['South Africa', 'Pretoria'], ['South Georgia and the South Sandwich Islands', 'Grytviken'], ['South Korea', 'Seoul'], ['South Ossetia', 'Tskhinvali'], ['South Sudan', 'Juba'], ['Spain', 'Madrid'], ['Sri Lanka', 'Sri Jayawardenapura Kotte'], ['Sudan', 'Khartoum'], ['Suriname', 'Paramaribo'], ['Swaziland', 'Mbabane'], ['Sweden', 'Stockholm'], ['Switzerland', 'Bern'], ['Syria', 'Damascus'], ['São Tomé and Príncipe', 'São Tomé'], ['Tajikistan', 'Dushanbe'], ['Tanzania', 'Dodoma'], ['Thailand', 'Bangkok'], ['Togo', 'Lomé'], ['Tonga', 'Nukuʻalofa'], ['Transnistria', 'Tiraspol'], ['Trinidad and Tobago', 'Port of Spain'], ['Tristan da Cunha', 'Edinburgh of the Seven 
Seas'], ['Tunisia', 'Tunis'], ['Turkey', 'Ankara'], ['Turkmenistan', 'Ashgabat'], ['Turks and Caicos Islands', 'Cockburn Town'], ['Tuvalu', 'Funafuti'], ['Uganda', 'Kampala'], ['Ukraine', 'Kiev'], ['United Arab Emirates', 'Abu Dhabi'], ['United Kingdom; England', 'London'], ['United States', 'Washington, D.C.'], ['United States Virgin Islands', 'Charlotte Amalie'], ['Uruguay', 'Montevideo'], ['Uzbekistan', 'Tashkent'], ['Vanuatu', 'Port Vila'], ['Vatican City', 'Vatican City'], ['Venezuela', 'Caracas'], ['Vietnam', 'Hanoi'], ['Wales', 'Cardiff'], ['Wallis and Futuna', 'Mata-Utu'], ['Western Sahara', 'El Aaiún'], ['Yemen', 'Sanaá'], ['Zambia', 'Lusaka'], ['Zimbabwe', 'Harare']]\n"]}],"source":["# please note, the solutions can vary depending on your own dataset\n","\n","import csv\n","\n","FILENAME = 'capitals.csv'\n","\n","def read_data(filename):\n"," ''' Read data from a CSV file.\n"," Return a list (rows) of lists (columns)\n"," '''\n"," with open(filename, newline='') as csv_file:\n"," reader = csv.reader(csv_file, delimiter=',')\n"," return list(reader)\n","\n","\n","def get_column(data, column_index):\n"," ''' \n"," Return column at index as a list\n"," '''\n"," entire_column = []\n"," for row in data:\n"," column = row[column_index]\n"," entire_column.append(column)\n"," return entire_column\n","\n","def separate_headers(data):\n"," '''\n"," Separates headers from the data\n"," Returns both as separate lists\n"," '''\n"," headers = data[0]\n"," dataset = data[1:]\n"," return [headers, dataset]\n","\n","def inspect_data(filename):\n"," data = read_data(filename)\n"," \n"," last_row = data[-1]\n"," last_column = get_column(data, -1)\n"," first_row = data[0]\n","\n"," print(last_row)\n"," print(last_column)\n"," print(first_row)\n","\n"," # let's assume your csv file has headers\n"," headers, dataset = separate_headers(data)\n","\n"," # the first row with actual data\n"," 
print(dataset[0])\n","\n","inspect_data(FILENAME)"]}],"metadata":{"colab":{"collapsed_sections":[],"name":"Exercise 5 - Solutions.ipynb","provenance":[]},"kernelspec":{"display_name":"Python 3","name":"python3"},"language_info":{"name":"python"}},"nbformat":4,"nbformat_minor":0} diff --git a/teaching materials/0_Introduction.ipynb b/teaching materials/0_Introduction.ipynb new file mode 100755 index 0000000..1d1c268 --- /dev/null +++ b/teaching materials/0_Introduction.ipynb @@ -0,0 +1 @@ +{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"Introduction.ipynb","provenance":[],"collapsed_sections":[],"authorship_tag":"ABX9TyO6FzkfpZhr99ftIowKgUps"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","metadata":{"id":"_Hnez8dT2xWz"},"source":["# CDH course \"Programming in Python\"\n","## Who are we?\n","\n","- Julian Gonggrijp & Jelte van Boheemen\n","- Developers at the Digital Humanities Lab\n","\n","\n","## Who are you?\n","- Entry-level course, no previous experience required\n","- If you have some experience, parts of the course may be familiar. Challenge yourself with some difficult exercises!\n","\n","## Goals\n","- Introduce the basics of the Python programming language\n","- Write simple computer programs\n","- Simple data analysis on your own data\n","- Teach good practices that you can apply to all your future programming, and make you a self-reliant programmer\n","\n","\n","## Colab Notebooks\n","- Runs Python code in your web browser, no installation required\n","- If you do not have a Google account, please make one\n","- Presenting straight from Colab, so you can follow our example\n","\n","## Course materials\n","https://tinyurl.com/cdh-python\n","\n","## Please interrupt if you have any questions!\n","Let's get started! 
[To part 1](https://colab.research.google.com/drive/1Ip0vwlf22MNVWLJceCgrIMadtOLSlxdO?usp=sharing)"]}]} \ No newline at end of file diff --git a/teaching materials/1_1_python_basics.ipynb b/teaching materials/1_1_python_basics.ipynb new file mode 100755 index 0000000..0aab5c6 --- /dev/null +++ b/teaching materials/1_1_python_basics.ipynb @@ -0,0 +1 @@ +{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"Day 1, part 1 - Python Basics.ipynb","provenance":[],"collapsed_sections":[],"toc_visible":true,"authorship_tag":"ABX9TyOYJ9fb6wAUFQzhBVQ8WcHi"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","metadata":{"id":"8eZi1PYt1p-i"},"source":["# 1 - Python basics\n","\n"]},{"cell_type":"markdown","metadata":{"id":"mM1V09O2aBHD"},"source":["## 1.1 - Hello, world\n","- Run a simple program\n","- Output to the screen\n","- Lines starting with `#` are comments; they are ignored by the program"]},{"cell_type":"code","metadata":{"id":"gVfjgWZOY_fs","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1636967556322,"user_tz":-60,"elapsed":230,"user":{"displayName":"Jelte van Boheemen","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GheLnTZKvy8P4D-Jl7hmsBhPGYHfOa1IeSLeP-Ynw=s64","userId":"10826796817228738014"}},"outputId":"e7d8a2d4-436f-4fc6-cbf9-001d7d8d09a5"},"source":["# Let's make the program say hello. This comment is ignored\n","print(\"Hello, world!\")\n","print(\"Whatever you want\")"],"execution_count":4,"outputs":[{"output_type":"stream","name":"stdout","text":["Hello, world!\n","Whatever you want\n"]}]},{"cell_type":"markdown","metadata":{"id":"B-9asW2lw5Fk"},"source":["## Exercise 1\n","- Write your own hello world program and execute it"]},{"cell_type":"markdown","metadata":{"id":"AWR0hSuzbEmR"},"source":["## 1.2 - Types\n","\n","* Number\n"," * Integer: `0`, `1`, `2`, `-1`, `122803`\n"," * Floating point (decimal): `0.0`, `1.1`, 
`2.122803`\n","\n","\n","\n","\n","\n","\n"]},{"cell_type":"markdown","metadata":{"id":"mTn3zLre4U9S"},"source":[""]},{"cell_type":"code","metadata":{"id":"rJLSAQCr4KYw"},"source":["0\n","1\n","2\n","-1\n","\n","0.12\n","1.2\n","-0.1"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"zSPHuo44QPWM"},"source":["* String (text)\n"," * Double quote: `\"b\"`, `\"Who says I can't use an apostrophe?\"`, `\"\"` (empty string)\n"," * Single quote: `'a'`, `'here is some text'`, `'1'`, `'2.3'`"]},{"cell_type":"code","metadata":{"id":"ScPTgkiG4bWc"},"source":["\"this is a string\"\n","'this is a string'\n","\n","'1'\n","1\n"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"L04bFM7iQQOb"},"source":["* Boolean\n"," * `True` or `False`"]},{"cell_type":"code","metadata":{"id":"fjpthJ_e41j8"},"source":["True\n","False\n","\n","\"True\"\n","print(True)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"lsTxBVVedyqS"},"source":["## 1.3 - Operators\n","\n","- Arithmetic\n"," - `1 + 1`\n"," - `2 - 1`\n"," - `3 * 4`\n"," - `10 / 2`"]},{"cell_type":"code","metadata":{"id":"WaBTyfRZPLUX"},"source":["10 / 2"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"VfhH8_1cPLlp"},"source":["- String operators\n"," - Concatenation: `\"Hello \" + \"world!\"`\n"," - Repetition: `\"Hello\" * 3`\n"]},{"cell_type":"code","metadata":{"id":"fF924rHaPP8P","colab":{"base_uri":"https://localhost:8080/","height":35},"executionInfo":{"status":"ok","timestamp":1636968024556,"user_tz":-60,"elapsed":224,"user":{"displayName":"Jelte van Boheemen","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GheLnTZKvy8P4D-Jl7hmsBhPGYHfOa1IeSLeP-Ynw=s64","userId":"10826796817228738014"}},"outputId":"30e5a678-e06b-46be-b569-ee4a41ddf2bd"},"source":["\"hello \" + \"world\"\n","\"hello \" * 
3"],"execution_count":14,"outputs":[{"output_type":"execute_result","data":{"application/vnd.google.colaboratory.intrinsic+json":{"type":"string"},"text/plain":["'hello hello hello '"]},"metadata":{},"execution_count":14}]},{"cell_type":"markdown","metadata":{"id":"6E5l_WAXPTcQ"},"source":["- Comparison\n"," - Evaluates to `True` or `False`\n"," - Equals `1 == 1`, `\"hello\" == \"hello\"`\n"," - Not equals `1 != 2`, `\"hello\" != \"goodbye\"`\n"," - Greater/smaller than `1 < 2`, `\"b\" > \"a\"`"]},{"cell_type":"code","metadata":{"id":"eCsTzljPPXc9"},"source":[""],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"XKeTtLOGQpks"},"source":["## 1.4 - Variables\n","* \"Container\" for values\n"," * `a = 1`\n","* Can be reassigned to a new value\n"," * `a = 2`\n","* Python has no strict types: a variable can hold a value of any type\n"," * `a = 1`\n"," * `a = \"hello\"`\n"," \n"," \n","\n","\n"]},{"cell_type":"code","metadata":{"id":"qaCMxJMpRvAP","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1636968322355,"user_tz":-60,"elapsed":308,"user":{"displayName":"Jelte van Boheemen","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GheLnTZKvy8P4D-Jl7hmsBhPGYHfOa1IeSLeP-Ynw=s64","userId":"10826796817228738014"}},"outputId":"663281a9-51a3-436e-a73c-daa68d508a6e"},"source":[""],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"J5L_PjLYlhgR"},"source":["* A variable can also be assigned another variable, or an expression"]},{"cell_type":"code","metadata":{"id":"MI_JjAeDlh-w","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1636968450575,"user_tz":-60,"elapsed":302,"user":{"displayName":"Jelte van Boheemen","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GheLnTZKvy8P4D-Jl7hmsBhPGYHfOa1IeSLeP-Ynw=s64","userId":"10826796817228738014"}},"outputId":"753f519e-b00a-4bfe-b32a-4663741f4365"},"source":["a = 1\n","b =
a\n","print(b)\n","\n","a = 2\n","\n","print(b)\n","print(a)"],"execution_count":35,"outputs":[{"output_type":"stream","name":"stdout","text":["1\n","1\n","2\n"]}]},{"cell_type":"code","metadata":{"id":"ko6EZrIi9lVQ","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1636968394959,"user_tz":-60,"elapsed":313,"user":{"displayName":"Jelte van Boheemen","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GheLnTZKvy8P4D-Jl7hmsBhPGYHfOa1IeSLeP-Ynw=s64","userId":"10826796817228738014"}},"outputId":"797fde4e-7197-4f85-9c66-0487cb2fe5a2"},"source":["b = 1\n","c = b + 1\n","print(c)"],"execution_count":32,"outputs":[{"output_type":"stream","name":"stdout","text":["2\n"]}]},{"cell_type":"code","metadata":{"id":"zfqHHZ129ld3","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1636968419639,"user_tz":-60,"elapsed":311,"user":{"displayName":"Jelte van Boheemen","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GheLnTZKvy8P4D-Jl7hmsBhPGYHfOa1IeSLeP-Ynw=s64","userId":"10826796817228738014"}},"outputId":"309c73ae-e9f4-4383-fe92-81ce31dc0e71"},"source":["d = 5\n","e = 4\n","g = 8\n","f = d + e + 5 - g\n","print(f)"],"execution_count":33,"outputs":[{"output_type":"stream","name":"stdout","text":["6\n"]}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"QgVk2G187AFx","executionInfo":{"status":"ok","timestamp":1636968436948,"user_tz":-60,"elapsed":303,"user":{"displayName":"Jelte van Boheemen","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GheLnTZKvy8P4D-Jl7hmsBhPGYHfOa1IeSLeP-Ynw=s64","userId":"10826796817228738014"}},"outputId":"a841ec31-3376-4347-bf89-621917b9d0e4"},"source":["print(a)"],"execution_count":34,"outputs":[{"output_type":"stream","name":"stdout","text":["2\n"]}]},{"cell_type":"markdown","metadata":{"id":"QrJK8upSR9aR"},"source":["* Naming\n"," * Rules\n"," * contains only `a-z`, `A-Z`, `0-9`, `_`\n"," * does not start with a number\n"," * 
Conventions\n"," * meaningful name: `age = 33`, not `a = 33`\n"," * for multiple words, \"snake case\": `age_in_years`, not `ageInYears`\n"," * either all lowercase `age` or all uppercase `AGE` (for constants)\n"," * Case sensitive\n"," * `age_in_years != ageInYears`\n"]},{"cell_type":"code","metadata":{"id":"Tp64GSb1T4xX","executionInfo":{"status":"ok","timestamp":1636968676150,"user_tz":-60,"elapsed":312,"user":{"displayName":"Jelte van Boheemen","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GheLnTZKvy8P4D-Jl7hmsBhPGYHfOa1IeSLeP-Ynw=s64","userId":"10826796817228738014"}}},"source":[""],"execution_count":36,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"wv1N7FP-T5QQ"},"source":["## 1.5 - Lists\n","- Collection of values\n"," - `[1, 2, 3]`, `[\"hello\", \"world\"]`, `[]` (empty list)\n"," - Mixed types possible: `[\"hello\", 1, False]`"]},{"cell_type":"code","metadata":{"id":"0GWOcQgyUraG"},"source":["age = 33\n","age_list = [12, 20, age]\n","print(age_list)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"dpkcrvAe8Fqn","executionInfo":{"status":"ok","timestamp":1636968804868,"user_tz":-60,"elapsed":309,"user":{"displayName":"Jelte van Boheemen","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GheLnTZKvy8P4D-Jl7hmsBhPGYHfOa1IeSLeP-Ynw=s64","userId":"10826796817228738014"}},"outputId":"fbf409d8-ba04-4197-e4d9-46b7e660b7f4"},"source":["age = 33\n","age_someone_else = 34\n","list_of_ages = [age, age_someone_else, True, \"hello\", 2.0]\n","print(list_of_ages)"],"execution_count":40,"outputs":[{"output_type":"stream","name":"stdout","text":["[33, 34, True, 'hello', 2.0]\n"]}]},{"cell_type":"markdown","metadata":{"id":"YLKw03brU4It"},"source":["- Accessing list elements\n"," - *index*: from `0` to `length of list - 1`\n"," - Access by index: `list[0]`, `list[1]`, `list[-1]` (last
element)"]},{"cell_type":"code","metadata":{"id":"RZVOomcAVHej","colab":{"base_uri":"https://localhost:8080/","height":69},"executionInfo":{"status":"ok","timestamp":1636968936044,"user_tz":-60,"elapsed":301,"user":{"displayName":"Jelte van Boheemen","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GheLnTZKvy8P4D-Jl7hmsBhPGYHfOa1IeSLeP-Ynw=s64","userId":"10826796817228738014"}},"outputId":"e13c1d57-297d-4c8c-98f1-97ddef09a0cd"},"source":["word_list = [\"hello\", \"world\"]\n","print(word_list[0])\n","world = word_list[1]\n","print(world)\n","\n","word_list[-1]"],"execution_count":46,"outputs":[{"output_type":"stream","name":"stdout","text":["hello\n","world\n"]},{"output_type":"execute_result","data":{"application/vnd.google.colaboratory.intrinsic+json":{"type":"string"},"text/plain":["'world'"]},"metadata":{},"execution_count":46}]},{"cell_type":"markdown","metadata":{"id":"Vd3kfXOeVfYV"},"source":[" - Assign like any variable"]},{"cell_type":"code","metadata":{"id":"SEkFg6knVivk"},"source":["word_list = [\"hello\", \"world\"]\n","word_list[1] = \"class\"\n","print(word_list)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"XSTx1cnGV9zM"},"source":[" - Add elements: `list + list` or `list.append(element)`\n"]},{"cell_type":"code","metadata":{"id":"ZOIwkafiWAbZ","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1636901624757,"user_tz":-60,"elapsed":220,"user":{"displayName":"Jelte van Boheemen","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GheLnTZKvy8P4D-Jl7hmsBhPGYHfOa1IeSLeP-Ynw=s64","userId":"10826796817228738014"}},"outputId":"c578a3db-13e9-4d3a-ac44-a04c752c5cfa"},"source":["list1 = [\"hello\", \"world\"]\n","list1 = list1 + [\"!\"]\n","print(list1)"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["['hello', 'world', '!']\n"]}]},{"cell_type":"code","metadata":{"id":"cHgMRhnl9F46"},"source":["list1 = ['hello', 'world']\n","list2 = [1, 
2]\n","\n","big_list = list1 + list2\n","print(big_list)\n","\n","big_list + [\"another word\"]"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"3N5kz5yO8Gy0"},"source":["list2 = [1, 2]\n","list2.append(3)\n","print(list2)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"VC8TZusM9cxJ","executionInfo":{"status":"ok","timestamp":1636969126104,"user_tz":-60,"elapsed":245,"user":{"displayName":"Jelte van Boheemen","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GheLnTZKvy8P4D-Jl7hmsBhPGYHfOa1IeSLeP-Ynw=s64","userId":"10826796817228738014"}},"outputId":"aad4c92d-0f2d-40d8-d481-19aaf40e9c1e"},"source":["list2.append([\"hello\", \"world\"])\n","print(list2)\n","print(list2[-1])"],"execution_count":53,"outputs":[{"output_type":"stream","name":"stdout","text":["[1, 2, 3, ['hello', 'world']]\n","['hello', 'world']\n"]}]},{"cell_type":"code","metadata":{"id":"C6LVb5nL8G8q"},"source":["list3 = [True, False]\n","list3 = list3 + False\n","print(list3)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"uIdQ4lOfWnW4"},"source":[" - Accessing multiple values (\"slicing\")\n"," - `list[begin:end]`\n"," - *end* is **not** included\n"," - `list[begin:]` everything from begin to the end of the list (the last element is included)"]},{"cell_type":"code","metadata":{"id":"OpKtOpn7WtUn"},"source":["list1 = [\"hello\", \"world\", \"!\"]\n","hello_world = list1[0:2]\n","print(hello_world)\n","\n","list1[0:2] = [\"hello\", \"class\"]\n","print(list1)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":35},"id":"0ZhWcG_l92jE","executionInfo":{"status":"ok","timestamp":1636969327342,"user_tz":-60,"elapsed":230,"user":{"displayName":"Jelte van
Boheemen","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GheLnTZKvy8P4D-Jl7hmsBhPGYHfOa1IeSLeP-Ynw=s64","userId":"10826796817228738014"}},"outputId":"3c050851-ec5d-4d08-f3c2-dc0dff2a8602"},"source":["list1 = [\"hello\", \"world\", \"!\", \"hello\", \"world\", \"!\", \"hello\", \"world\", \"!\", \"hello\", \"world\", \"!\"]\n"],"execution_count":63,"outputs":[{"output_type":"execute_result","data":{"application/vnd.google.colaboratory.intrinsic+json":{"type":"string"},"text/plain":["'!'"]},"metadata":{},"execution_count":63}]},{"cell_type":"markdown","metadata":{"id":"cre-ptY-YTQO"},"source":["## 1.6 - Loops\n","- Go through a list, run some code for each element"]},{"cell_type":"code","metadata":{"id":"HmLA3LAMYnIK","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1636969479273,"user_tz":-60,"elapsed":298,"user":{"displayName":"Jelte van Boheemen","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GheLnTZKvy8P4D-Jl7hmsBhPGYHfOa1IeSLeP-Ynw=s64","userId":"10826796817228738014"}},"outputId":"6e9f8331-9722-435e-c193-a8f44f322fdb"},"source":["list1 = [\"hello\", \"world\", \"!\"]\n","\n","\n","for value in list1:\n"," print(value)"],"execution_count":66,"outputs":[{"output_type":"stream","name":"stdout","text":["hello\n","world\n","!\n"]}]},{"cell_type":"code","metadata":{"id":"7EcmE2hr8vqw","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1636969476108,"user_tz":-60,"elapsed":282,"user":{"displayName":"Jelte van Boheemen","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GheLnTZKvy8P4D-Jl7hmsBhPGYHfOa1IeSLeP-Ynw=s64","userId":"10826796817228738014"}},"outputId":"011d217e-85b1-437f-e137-293ac74ca5e9"},"source":["for word in list1:\n"," 
print(word)"],"execution_count":65,"outputs":[{"output_type":"stream","name":"stdout","text":["hello\n","world\n","!\n","hello\n","world\n","!\n","hello\n","world\n","!\n","hello\n","world\n","!\n"]}]},{"cell_type":"markdown","metadata":{"id":"KMq5PWx1Y-pM"},"source":["- Multiple lines of code go in an indented block"]},{"cell_type":"code","metadata":{"id":"9BVjK7BgZBrK"},"source":["list1 = [\"hello\", \"world\", \"!\"]\n","\n","for word in list1:\n"," new_word = word + \"_new\"\n"," print(\"printing the new word...\")\n"," print(new_word)\n"," print(new_word)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"Y7ZQZ4koiU2a"},"source":["- Loop over string\n"]},{"cell_type":"code","metadata":{"id":"IBbyqfM5iYhB","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1636969590895,"user_tz":-60,"elapsed":268,"user":{"displayName":"Jelte van Boheemen","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GheLnTZKvy8P4D-Jl7hmsBhPGYHfOa1IeSLeP-Ynw=s64","userId":"10826796817228738014"}},"outputId":"83cd316d-0f8f-43c9-83f0-d29b2c4434e2"},"source":["for letter in \"Hello, world!\":\n"," print(letter)"],"execution_count":69,"outputs":[{"output_type":"stream","name":"stdout","text":["H\n","e\n","l\n","l\n","o\n",",\n"," \n","w\n","o\n","r\n","l\n","d\n","!\n"]}]},{"cell_type":"markdown","metadata":{"id":"cuT-iJjXi_Oh"},"source":["- List of lists\n"]},{"cell_type":"code","metadata":{"id":"6eMzZgoKjBA5"},"source":["list_of_lists = [[\"hello\", \"world\"], [\"this\", \"is\", \"jelte\"]]\n","\n","for sublist in list_of_lists:\n"," print(sublist)\n"," for word in sublist:\n"," print(word)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"JcBr9iTs_mGc"},"source":["list_of_lists = [[\"hello\", \"world\"], [\"this\", \"is\", \"jelte\"]]\n","\n","\n","for sublist in list_of_lists:\n"," print(sublist)\n"," for word in sublist:\n"," 
print(word)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"K3HyYxuQZ4Dt"},"source":["## 1.7 - Conditionals\n","- Check whether a condition holds\n","- Run code only if the condition is `True`"]},{"cell_type":"code","metadata":{"id":"tacCSBrDaFL2","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1636969842284,"user_tz":-60,"elapsed":245,"user":{"displayName":"Jelte van Boheemen","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GheLnTZKvy8P4D-Jl7hmsBhPGYHfOa1IeSLeP-Ynw=s64","userId":"10826796817228738014"}},"outputId":"4692f7a9-af1a-458b-d101-70425595d59a"},"source":["a = 4\n","\n","if a == 2:\n","    print(\"a is two\")\n","\n","if a == 4:\n","    print(\"a is four\")\n","\n","if a != 4:\n","    print(\"this will not get printed\")\n","\n","if a > 2:\n","    print(\"a is greater than 2\")"],"execution_count":77,"outputs":[{"output_type":"stream","name":"stdout","text":["a is four\n","a is greater than 2\n"]}]},{"cell_type":"markdown","metadata":{"id":"-E_oCGVcauCO"},"source":["- Add an `else` clause"]},{"cell_type":"code","metadata":{"id":"I3_CLdl_ay_0"},"source":["a = 4\n","\n","if a == 2:\n","    print(\"a is two\")\n","else:\n","    print(\"a is not two\")"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"9G4hNJ7faeCR"},"source":["- Combining with lists"]},{"cell_type":"code","metadata":{"id":"8x-OiDOzakty"},"source":["list1 = [1, 2, 3, 4, 5]\n","\n","for element in list1:\n","    if element == 3:\n","        print(\"we have found 3!\")\n","    else:\n","        print(\"sadly, not 3\")"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"WqLkE5_RwG4R"},"source":["- `elif` (else-if) to chain multiple conditions with a shared `else`\n","\n"]},{"cell_type":"code","metadata":{"id":"cov_FjfzwPES"},"source":["x = 6\n","\n","if x == 3:\n","    print(\"x is 3\")\n","elif x == 2:\n","    print(\"x is 2\")\n","elif x == 5:\n","    print(\"x is 5\")\n","else:\n","    print(\"x is not 2, 5 or
3\")"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"oTE5rxAGmr_6"},"source":["## 1.8 - Logical operators\n","- condition1 `and` condition2\n","- condition1 `or` condition2\n","- `not`"]},{"cell_type":"code","metadata":{"id":"b5XWif5oBZjD"},"source":["not True"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"Zu5vgDPkmx8N"},"source":["a = 4\n","if a > 2 and a < 5:\n","    print(\"a is between 2 and 5\")\n","\n","word = \"foobar\"\n","if word == \"foo\" or word == \"bar\":\n","    print(\"foo or bar\")\n","\n","age = 25\n","if age > 20 and not age == 21:\n","    print(\"older than 20, but not 21\")"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"3BmPB2ubwDME"},"source":["## Exercise 2\n","- [To the exercise notebook](https://colab.research.google.com/drive/1P68w4Ewzptp5KLo5EMMowKOSywFMzoc8?usp=sharing)"]}]} \ No newline at end of file diff --git a/teaching materials/1_2_functions.ipynb b/teaching materials/1_2_functions.ipynb new file mode 100644 index 0000000..4f98bca --- /dev/null +++ b/teaching materials/1_2_functions.ipynb @@ -0,0 +1,296 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "Functions.ipynb", + "provenance": [] + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + } + }, + "cells": [ + { + "cell_type": "code", + "metadata": { + "id": "aK5-pZuG6KGP", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "137f10ac-c515-4cb2-cec8-197d1705f3a8" + }, + "source": [ + "print(' ' * 3 + '*' * 1)\n", + "print(' ' * 2 + '*' * 3)\n", + "print(' ' * 1 + '*' * 5)\n", + "print(' ' * 0 + '*' * 7)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + " *\n", + " ***\n", + "*****\n", + "*******\n" + ] + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "8VJ7_bvx_B3P", + "colab": { + "base_uri":
"https://localhost:8080/", + "height": 35 + }, + "outputId": "90649abf-d87c-48b7-bc4a-4f81fe819e99" + }, + "source": [ + "def centered_stars(center, width):\n", + " half_width = width // 2\n", + " padding = center - half_width\n", + " return ' ' * padding + '*' * width\n", + "\n", + "centered_stars(3, 5)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + }, + "text/plain": [ + "' *****'" + ] + }, + "metadata": {}, + "execution_count": 8 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "GjGFmJU3AX9U", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "2d7ce865-f64c-400d-fb99-80d5967820dc" + }, + "source": [ + "print(centered_stars(2, 1))\n", + "print(centered_stars(2, 3))\n", + "print(centered_stars(2, 5))" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + " *\n", + " ***\n", + "*****\n" + ] + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Jwfbo50wBLfF", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "059e8dbc-39e4-4dea-cbc4-c94c2d41b5d0" + }, + "source": [ + "widths = list(range(1, 6, 2))\n", + "for width in widths:\n", + " print(centered_stars(2, width))" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + " *\n", + " ***\n", + "*****\n" + ] + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "NOo4cSHXDsUM", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 789 + }, + "outputId": "2e65679e-db57-4b4a-d33a-a7a1cfdc1073" + }, + "source": [ + "def triangle(height, center=None):\n", + " import pdb; pdb.set_trace()\n", + " widths = list(range(1, height * 2, 2))\n", + " if center == None:\n", + " center = height - 1\n", + " lines = []\n", + " for width in widths:\n", + " lines.append(centered_stars(center, width))\n", + " 
return '\\n'.join(lines)\n", + "\n", + "triangle(3)" + ], + "execution_count": null, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "> (3)triangle()\n", + "-> widths = list(range(1, height * 2, 2))\n", + "(Pdb) n\n", + "> (4)triangle()\n", + "-> if center == None:\n", + "(Pdb) \n", + "> (5)triangle()\n", + "-> center = height - 1\n", + "(Pdb) \n", + "> (6)triangle()\n", + "-> lines = []\n", + "(Pdb) \n", + "> (7)triangle()\n", + "-> for width in widths:\n", + "(Pdb) \n", + "> (8)triangle()\n", + "-> lines.append(centered_stars(center, width))\n", + "(Pdb) s\n", + "--Call--\n", + "> (1)centered_stars()\n", + "-> def centered_stars(center, width):\n", + "(Pdb) \n", + "> (2)centered_stars()\n", + "-> half_width = width // 2\n", + "(Pdb) p width\n", + "1\n", + "(Pdb) p center\n", + "2\n", + "(Pdb) n\n", + "> (3)centered_stars()\n", + "-> padding = center - half_width\n", + "(Pdb) p half_width\n", + "0\n", + "(Pdb) n\n", + "> (4)centered_stars()\n", + "-> return ' ' * padding + '*' * width\n", + "(Pdb) p padding\n", + "2\n", + "(Pdb) p ' ' * padding + '*' * width\n", + "' *'\n", + "(Pdb) u\n", + "> (8)triangle()\n", + "-> lines.append(centered_stars(center, width))\n", + "(Pdb) c\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + }, + "text/plain": [ + "' *\\n ***\\n*****'" + ] + }, + "metadata": {}, + "execution_count": 27 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "rpqUgX6-foDt", + "outputId": "5c7da72b-38e4-4897-ffac-a15163d4aa50" + }, + "source": [ + "TRIANGLE_ALIGN_COLUMN = 0\n", + "\n", + "print(triangle(height=15, center=TRIANGLE_ALIGN_COLUMN))\n", + "\n", + "print(triangle(6, TRIANGLE_ALIGN_COLUMN))\n", + "\n", + "print(triangle(20, TRIANGLE_ALIGN_COLUMN))" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": 
[ + "*\n", + "***\n", + "*****\n", + "*******\n", + "*********\n", + "***********\n", + "*************\n", + "***************\n", + "*****************\n", + "*******************\n", + "*********************\n", + "***********************\n", + "*************************\n", + "***************************\n", + "*****************************\n", + "*\n", + "***\n", + "*****\n", + "*******\n", + "*********\n", + "***********\n", + "*\n", + "***\n", + "*****\n", + "*******\n", + "*********\n", + "***********\n", + "*************\n", + "***************\n", + "*****************\n", + "*******************\n", + "*********************\n", + "***********************\n", + "*************************\n", + "***************************\n", + "*****************************\n", + "*******************************\n", + "*********************************\n", + "***********************************\n", + "*************************************\n", + "***************************************\n" + ] + } + ] + } + ] +} \ No newline at end of file diff --git a/teaching materials/2_1_documentation_reading_data.ipynb b/teaching materials/2_1_documentation_reading_data.ipynb new file mode 100755 index 0000000..2af95be --- /dev/null +++ b/teaching materials/2_1_documentation_reading_data.ipynb @@ -0,0 +1 @@ +{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"Day 2 - Part 1: Documentation and reading data","provenance":[],"collapsed_sections":[]},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","metadata":{"id":"gQG7nLEn_SET"},"source":["# Readable code and documentation\n","> “Code is more often read than written.”\n","> — Guido van Rossum (Python creator) \n","\n","Readable code is very important. 
For yourself, and for others.\n","\n","Good practices:\n","- Use descriptive names for variables, arguments and functions.\n","- Keep functions small.\n","- Make sure functions have only one task.\n","- Add comments if the code is difficult to follow.\n","- Add *docstrings* to functions to describe what they do.\n","\n","\n","\n","\n"]},{"cell_type":"code","metadata":{"id":"Pu7jeXynAdRi"},"source":["def do_something(a, y):\n","    z = [a,y]\n","    for e in z:\n","        e[-1] = e[-1] + 1\n","        print(e)\n","\n","do_something([1,2,3], [4,5,6])"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"DHYIMRFqAM8f"},"source":["BY = 2\n","\n","def create_nested_list(list1, list2):\n","    return [list1, list2]\n","\n","def increment_last_el(numbers):\n","    numbers[-1] = numbers[-1] + BY\n","    return numbers\n","\n","def increment_last_element(list1, list2):\n","    \"\"\"\n","    Take two lists of numbers.\n","    Return them as a nested list, with the last value of each list increased by BY.\n","    \"\"\"\n","    nested_list = create_nested_list(list1, list2)\n","    for sublist in nested_list:\n","        increment_last_el(sublist)\n","    return nested_list\n","\n","answer = increment_last_element([1.0, 2.0], [3, 5])\n","answer"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"J-BlN7p1F-cG"},"source":["# Reading data\n","\n","- Write a program that works with an existing data source\n","- In this course, we will use *comma-separated values (CSV)* files"]},{"cell_type":"markdown","metadata":{"id":"z3QpNJC3LwCg"},"source":["## Upload and read a file"]},{"cell_type":"code","metadata":{"id":"oYF19uB-H2qW"},"source":["FILENAME = 'capitals.csv'\n","\n","def read_data():\n","    with open(FILENAME) as csv_file:\n","        return csv_file.read()\n","\n","print(read_data())"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"DTINkPwrIXHb"},"source":["## Use a library\n","- People before you have read CSV files\n","- No need to reinvent the wheel\n","- The `csv` module is part of the
*standard library*, so it comes with every Python installation\n","\n","\n"]},{"cell_type":"code","metadata":{"id":"591dGJTtJCz6"},"source":["import csv\n","\n","FILENAME = 'nogc_vietnamese.csv'\n","\n","def read_data():\n","    with open(FILENAME, newline='', encoding='utf-8') as csv_file:\n","        reader = csv.reader(csv_file, delimiter=',')\n","        return list(reader)\n","\n","data = read_data()\n","data[10][0]"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"fmaAa4v6mVaU"},"source":["- You can unpack a row into multiple variables. This is useful if you know what the columns are"]},{"cell_type":"code","metadata":{"id":"-jPkiLGdmbxA","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1637055787965,"user_tz":-60,"elapsed":216,"user":{"displayName":"Jelte van Boheemen","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GheLnTZKvy8P4D-Jl7hmsBhPGYHfOa1IeSLeP-Ynw=s64","userId":"10826796817228738014"}},"outputId":"917a53fd-f871-4b40-ab29-bda3f3512b91"},"source":["data = read_data()\n","\n","for row in data:\n","    first_column, *rest = row\n"],"execution_count":34,"outputs":[{"output_type":"stream","name":"stdout","text":["vietnamese\n","english natural_unnatural south middle north\n","Nam nhìn nó\n","Nam sees himself N 0.16 0.10 0.26\n","Nam nhìn nó\n","Nam sees himself UN 0.57 0.53 0.46\n","Nam nhìn mình\n","Nam sees himself’ N 0.37 0.17 0.33\n","Nam nhìn mình\n","Nam sees himself’ UN 0.30 0.48 0.44\n","Nam tự nhìn nó \n","Nam looks at himself N 0.50 0.50 0.56\n","Nam tự nhìn nó \n","Nam looks at himself UN 0.16 0.26 0.26\n","Tôi nhìn mình\n","I see myself N 0.49 0.36 0.40\n","Tôi nhìn mình\n","I see myself UN 0.49 0.37 0.38\n","Mày nhìn mình đấy à?\n","Are you looking at yourself? N 0.18 0.32 0.55\n","Mày nhìn mình đấy à?\n","Are you looking at yourself?
UN 0.54 0.42 0.30\n","Chúng tôi khen mình\n","We praise ourselves N 0.41 0.25 0.37\n","Chúng tôi khen mình\n","We praise ourselves UN 0.41 0.49 0.28\n","Chúng nó khen mình\n","They praise themselves N 0.10 0.20 0.30\n","Chúng nó khen mình\n","They praise themselves UN 0.70 0.45 0.39\n","Chúng mày khen mình đó hả?\n","Are you (all) praising yourselves? N 0.20 0.19 0.31\n","Chúng mày khen mình đó hả?\n","Are you (all) praising yourselves? UN 0.70 0.49 0.39\n","Ho khen mình\n","They praise themselves N 0.16 0.17 0.22\n","Ho khen mình\n","They praise themselves UN 0.82 0.59 0.59\n","Chúng tôi khen chúng tôi \n","We praise ourselves N 0.33 0.16 0.08\n","Chúng tôi khen chúng tôi \n","We praise ourselves UN 0.49 0.63 0.60\n","Chúng nó khen chúng nó \n","They praise themselves N 0.41 0.39 0.43\n","Chúng nó khen chúng nó \n","They praise themselves UN 0.16 0.34 0.34\n","Họ khen họ \n","They praise themselves N 0.34 0.19 0.19\n","Họ khen họ \n","They praise themselves UN 0.27 0.51 0.52\n","Chúng mày khen chúng mày đó hả? \n","Are you all praising myself ? N 0.19 0.39 0.56\n","Chúng mày khen chúng mày đó hả? \n","Are you all praising myself ? UN 0.30 0.28 0.24\n","Chúng tôi tự khen chúng tôi \n","We praise ourselves N 0.41 0.31 0.31\n","Chúng tôi tự khen chúng tôi \n","We praise ourselves UN 0.50 0.48 0.54\n","Chúng nó tự khen chúng nó \n","They praise themselves N 0.75 0.60 0.57\n","Chúng nó tự khen chúng nó \n","They praise themselves UN 0.16 0.21 0.17\n","Chúng mày tự khen chúng mày đó hả ? \n","Are you all praising yourselves ? N 0.70 0.61 0.67\n","Chúng mày tự khen chúng mày đó hả ? \n","Are you all praising yourselves ? 
UN 0.20 0.15 0.12\n","Họ tự hen họ \n","They praise themselves N 0.90 0.64 0.64\n","Họ tự hen họ \n","They praise themselves UN 0.09 0.17 0.24\n"]}]}]} \ No newline at end of file diff --git a/teaching materials/cheatsheet_basics.ipynb b/teaching materials/cheatsheet_basics.ipynb new file mode 100755 index 0000000..6424376 --- /dev/null +++ b/teaching materials/cheatsheet_basics.ipynb @@ -0,0 +1 @@ +{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"Basics - Cheatsheet.ipynb","provenance":[],"collapsed_sections":[]},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","metadata":{"id":"fliaJAWJqSo8"},"source":["# Python basics cheatsheet\n","## Types\n","#### numbers\n","```python\n","0\n","2.0\n","-8\n","```\n","#### strings (text)\n","```python\n","# string (text)\n","\"this is a string\"\n","'this is also a string'\n","```\n","#### booleans\n","```python\n","True\n","False\n","```\n","\n","\n","\n","\n","\n","\n","\n","\n"]},{"cell_type":"markdown","metadata":{"id":"0EXsKLzkuIjl"},"source":["## Operators\n","#### arithmetic\n","```python\n","1 + 1\n","8 - 2\n","10 * 4\n","12 / 3\n","```\n","#### string operators\n","```python\n","# concatenation\n","\"hello\" + \"world\"\n","# repetition\n","\"hello\" * 3\n","```\n","#### comparison\n","```python\n","1 == 1 # equals\n","\"hello\" != \"goodbye\" # not equals\n","2 > 1 # greater than\n","1 < 2 # smaller than\n","```"]},{"cell_type":"markdown","metadata":{"id":"yDVm1CyMuEfO"},"source":["## Variables\n","#### assigning and reassigning\n","```python\n","a = 20\n","a = \"hello\"\n","b = 2 + 1\n","c = 3 + b\n","```"]},{"cell_type":"markdown","metadata":{"id":"OO8XCq8_uCP_"},"source":["## Lists\n","```python\n","[]\n","[1, 2, 3]\n","[\"word\", True, 8.2]\n","```\n","#### accessing and setting elements\n","```python\n","list[index]\n","list[index] = \"new value\"\n","```\n","\n","#### adding elements\n","```python\n","list = list +
[element]\n","list.append(element)\n","list = list + [element1, element2]\n","```\n","\n","#### slicing (x to y)\n","```python\n","list[x:y]\n","```\n","\n","#### slicing (x to end)\n","```python\n","list[x:]\n","```"]},{"cell_type":"markdown","metadata":{"id":"bsbP2Rgkt8_A"},"source":["\n","## Loops\n","#### loop over a list\n","```python\n","for element in list:\n"," # execute some code\n"," # and some more code\n","```\n","#### loop over a string\n","```python\n","for letter in \"string\":\n"," # execute some code\n","```\n","#### list of lists\n","```python\n","nested_list = [[element1, element2], [element3, element4]]\n","sublist1 = nested_list[0]\n","sublist2 = nested_list[1]\n","\n","for sublist in nested_list: \n"," for element in sublist:\n"," # execute some code\n","```\n","\n","\n","\n","\n"]},{"cell_type":"markdown","metadata":{"id":"rTjNruPOu5BI"},"source":["## Conditionals\n","#### if-elif-else\n","```python\n","if condition1:\n"," # execute some code\n","elif condition2:\n"," # execute some other code\n","else:\n"," # execute code if no conditions are true\n","```\n","#### logical operators\n","```python\n","condition1 and condition2\n","condition1 or condition2\n","not condition\n","```"]}]} \ No newline at end of file diff --git a/teaching materials/cheatsheet_functions.ipynb b/teaching materials/cheatsheet_functions.ipynb new file mode 100644 index 0000000..b71f55e --- /dev/null +++ b/teaching materials/cheatsheet_functions.ipynb @@ -0,0 +1,108 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "Functions-cheatsheet.ipynb", + "provenance": [] + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "K10hhF1isHEh" + }, + "source": [ + "# Functions cheatsheet\n", + "\n", + "## Defining a function\n", + "\n", + "```python\n", + "def function_name(parameter1, parameter2):\n", + " # 
Statements that involve the parameters here.\n", + " # Variables defined here are only visible inside\n", + " # the function.\n", + " return # Expression with the end result here.\n", + "```\n", + "\n", + "## Calling a function\n", + "\n", + "```python\n", + "a_variable = function_name(argument1, argument2)\n", + "```\n", + "\n", + "## Optional parameters\n", + "\n", + "```python\n", + "# When defining the function\n", + "def function_name(required, optional=None):\n", + " if optional is None:\n", + " optional = # Insert expression here.\n", + " # Rest of function body.\n", + "\n", + "# When calling the function,\n", + "# ... including only the required parameter:\n", + "function_name(argument1)\n", + "# ... including the required and the optional parameter:\n", + "function_name(argument1, argument2)\n", + "```\n", + "\n", + "## Named arguments\n", + "\n", + "- Let you pass one optional argument while skipping many others.\n", + "- Let you pass arguments in a different order.\n", + "\n", + "```python\n", + "function_name(named_parameter=argument)\n", + "```\n", + "\n", + "## Constants\n", + "\n", + "```python\n", + "# Place this at the top of your program.\n", + "CONSTANT_NAME = 'value of the constant'\n", + "# Everywhere in your program, use CONSTANT_NAME instead\n", + "# of the literal 'value of the constant'.\n", + "# The value of CONSTANT_NAME must never change.\n", + "```\n", + "\n", + "## Setting a breakpoint\n", + "\n", + "```python\n", + "import pdb; pdb.set_trace()\n", + "```\n", + "\n", + "## Debugger commands\n", + "\n", + "```\n", + "p - print the value of the named variable\n", + "n - go to next line\n", + "s - step into next function call\n", + "r - continue until the current function returns\n", + "c - continue until the next breakpoint, or until the program ends\n", + "q - stop now without finishing the program\n", + "h - help, overview of commands\n", + "```" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "syk7ykOQiAZ7" + }, +
"source": [ + "" + ], + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file diff --git a/teaching materials/tips.ipynb b/teaching materials/tips.ipynb new file mode 100644 index 0000000..3267450 --- /dev/null +++ b/teaching materials/tips.ipynb @@ -0,0 +1,2250 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "Tips.ipynb", + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "eL_rkx7cIm77" + }, + "source": [ + "# Tips\n", + "\n", + "This notebook contains tips for the final exercise of the 2021 CDH entry level Python course at Utrecht University. Participants were asked to think of an analysis that they might want to perform and to submit a description of this analysis in advance of the course. The tips in this notebook were chosen based on those submissions.\n", + "\n", + "The tips in this notebook build on top of the [basics cheatsheet](https://colab.research.google.com/drive/1um6_5Pd6biIK0TPAP6bIGnx3qiM3iTRj?usp=sharing) and the [functions cheatsheet](https://colab.research.google.com/drive/1O0ARbDaX7_A7tNF1QWjQAVlu8jY4F24o?usp=sharing).\n", + "\n", + "While the notebook is written such that you can read it top to bottom, you are welcome to cherry-pick the topics that interest you. There is a table of contents in the left margin. 
Some sections contain internal links to other sections.\n", + "\n", + "As a general tip, you can get a complete overview of the Python standard library [over here][python-libs].\n", + "\n", + "[python-libs]: https://docs.python.org/3/library/index.html" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wBOMsp2twgbB" + }, + "source": [ + "## Converting between types\n", + "\n", + "String to int:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "1aU45qPvwntM" + }, + "source": [ + "int('123')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cfrMiRd3wy9d" + }, + "source": [ + "Integer to string:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "LNJEIXCtw6rq" + }, + "source": [ + "str(123)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RtGlRbICxf__" + }, + "source": [ + "Float to string:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ejGhZs8SxjUN" + }, + "source": [ + "str(0.5)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TdaVUNpBxmQ8" + }, + "source": [ + "String to float:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Nwk3D9VExoU_" + }, + "source": [ + "float('0.5')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6CYPccQYxwCm" + }, + "source": [ + "Boolean to string:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "JJf6fjNGxzvC" + }, + "source": [ + "str(True)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "LV7o-rkDx3MY" + }, + "source": [ + "String to boolean:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "UUPNXO4mx5eb" + }, + "source": [ + "print('Direct boolean from string does not work:', bool('False'))\n", + "\n", + "# So we have to write a function.\n", + "def 
boolean_from_string(string):\n", + " if string == 'False':\n", + " return False\n", + " else:\n", + " return True\n", + "\n", + "print(boolean_from_string('True'))\n", + "print(boolean_from_string('False'))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CfnNAUKmyhOj" + }, + "source": [ + "Integer to float:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "st9vZgf0yixm" + }, + "source": [ + "float(123)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Gmw_vdGoyl3c" + }, + "source": [ + "Float to integer:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "JZ_l3IdhynF-" + }, + "source": [ + "int(0.5)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "97z6FUGz8uAS" + }, + "source": [ + "## Strings\n", + "\n", + "Strings have a couple of tricks that may be useful for the final exercise. We illustrate them quickly below. A complete overview of all string methods can be found [here](https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str).\n", + "\n", + "[`str.startswith`](https://docs.python.org/3/library/stdtypes.html#str.startswith) and [`str.endswith`](https://docs.python.org/3/library/stdtypes.html#str.endswith) will tell you whether a string begins or ends with a particular other string, respectively." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "07ik0vd2-6tQ" + }, + "source": [ + "for word in ['magazine', 'kangaroo', 'rooster', 'broom']:\n", + " if word.startswith('roo'):\n", + " print('\"' + word + '\" starts with \"roo\"')\n", + " if word.endswith('roo'):\n", + " print('\"' + word + '\" ends with \"roo\"')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FVyAad6OAtNX" + }, + "source": [ + "[`str.lower`](https://docs.python.org/3/library/stdtypes.html#str.lower) and [`str.upper`](https://docs.python.org/3/library/stdtypes.html#str.upper) return a copy of a string that is converted to all lowercase or all uppercase, respectively. These functions are useful when you don't want case to influence comparisons." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "9sjgompVA_Qi" + }, + "source": [ + "word1 = 'banana'\n", + "word2 = 'Banana'\n", + "\n", + "print('case-sensitive:', word1 == word2)\n", + "print('case-insensitive:', word1.lower() == word2.lower())" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TyjWFAWR_0Dp" + }, + "source": [ + "[`str.join`](https://docs.python.org/3/library/stdtypes.html#str.join) can glue a sequence of strings together, as we have seen in [the Monday afternoon session](https://colab.research.google.com/drive/10H5QM0jBiDZJdjqPlvBia4oZMnIbxvuy?usp=sharing)." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "JlqAc5N8AQPu" + }, + "source": [ + "print(' + '.join(['1', '2', '3', '4']))\n", + "print(', '.join(['do', 're', 'mi', 'fa', 'sol', 'la', 'ti', 'do']))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "g0x1VvE5B71w" + }, + "source": [ + "[`str.split`](https://docs.python.org/3/library/stdtypes.html#str.split) is the opposite of `str.join`: it will split a string by a given separator and return a list with the fragments. 
If you don't specify a separator, it will split by whitespace." + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "R11wYmWFCVdb", + "outputId": "ca8804e9-80a5-4771-e3f0-1adee880f66b" + }, + "source": [ + "print('1 + 2 + 3 + 4'.split(' + '))\n", + "print('1 + 2 + 3 + 4'.split('+'))\n", + "print('1 + 2 + 3 + 4'.split())\n", + "print('1 2 3 4'.split())" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "['1', '2', '3', '4']\n", + "['1 ', ' 2 ', ' 3 ', ' 4']\n", + "['1', '+', '2', '+', '3', '+', '4']\n", + "['1', '2', '3', '4']\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0csn-TVPC8qG" + }, + "source": [ + "[`str.splitlines`](https://docs.python.org/3/library/stdtypes.html#str.splitlines) is basically `str.split('\\n')`, but it cleverly recognizes many types of line endings (which might differ between platforms) and it has an option to keep the line endings in the resulting fragments." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "k1xy1XmaED7X" + }, + "source": [ + "[`str.strip`](https://docs.python.org/3/library/stdtypes.html#str.strip) will return a new string with the whitespace removed from the beginning and end. You can also specify a different set of characters that should be removed. The default mode of removing whitespace is useful for cleaning up text that was downloaded from the internet or that was copypasted from some kind of document." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "djsFEC5DE6md" + }, + "source": [ + "\" This string isn't very tidy. \".strip()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "G4tC_OiFFth3" + }, + "source": [ + "### Escapes\n", + "\n", + "There are some characters that normally aren't allowed to appear in a literal string. 
We can shoehorn them in anyway by placing a backslash `\\` in front of the character, or in front of another letter that acts as a placeholder for it. This is called *escaping* the character. The combination of the backslash and the following character is called an *escape sequence*. Python also uses these escape sequences when it displays the representation of a string back to us, for example as the output of a notebook cell. The following escape sequences occur often:\n", + "\n", + "`\\n` - linefeed (\"newline\") character.\n", + "\n", + "`\\r\\n` - carriage return plus linefeed. For archeological reasons I will not go into, these two characters together count as a single line separator in Microsoft Windows. So you are likely to find this pair in files that were created in Windows.\n", + "\n", + "`\\t` - tab character.\n", + "\n", + "`\\'` - straight single quote (escape not needed in strings delimited by double quotes).\n", + "\n", + "`\\\"` - straight double quote (escape not needed in strings delimited by single quotes).\n", + "\n", + "`\\\\` - the backslash itself.\n", + "\n", + "You can get a complete overview of escape sequences if you scroll down from [here](https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mUmNSdNzMq_M" + }, + "source": [ + "### Format strings\n", + "\n", + "A format string is a perfectly normal string that contains some special notation, i.e., pairs of braces `{}`. While the presence of those braces does not oblige us to anything, we *can* use their presence (and insert them on purpose) in order to benefit from the [`str.format`](https://docs.python.org/3/library/stdtypes.html#str.format) method. 
For example, if I have the following string,\n", + "\n", + "```python\n", + "'Ta-da!'\n", + "```\n", + "\n", + "I can turn it into a format string simply by inserting a pair of braces, anywhere I like:\n", + "\n", + "```python\n", + "'Ta-da: {}'\n", + "```\n", + "\n", + "If I call `str.format` on a format string, it will interpret pairs of braces as placeholders and replace them by any arguments that I pass." + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 37 + }, + "id": "NQmg3z2cPPFW", + "outputId": "c6647728-a0ac-4f50-e5ed-36e9b0c94cbf" + }, + "source": [ + "'Ta-da: {}'.format('this is Python!')" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + }, + "text/plain": [ + "'Ta-da: this is Python!'" + ] + }, + "metadata": {}, + "execution_count": 29 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZtwiQhMJPcAd" + }, + "source": [ + "You can insert as many placeholders and pass as many arguments as you like, as long as there are at least as many arguments as placeholders. Of course, you usually want the number of arguments to exactly match the number of placeholders in the format string, but `str.format` will simply skip any arguments that remain." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "x_f2iRsQQNqr" + }, + "source": [ + "print('Ta-da: {}'.format(1, 2, 3))\n", + "print('Ta{}da{} {}'.format('-', ':', 'success!'))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7GOzHkgLRGYi" + }, + "source": [ + "Format strings are a great way to compose strings out of some fixed parts and some variable parts, especially if the fixed parts aren't entirely regular. 
Consider the following code, which you might have written in [exercise 3]():" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "mTVp_EOmR_nm" + }, + "source": [ + "YELL_START = 'Go go '\n", + "YELL_END = '!!!'\n", + "\n", + "def yell(name):\n", + " return YELL_START + name + YELL_END\n", + "\n", + "yell('Jelte')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fwTdCa4QSOuv" + }, + "source": [ + "Using a format string, this code would be a bit more explicit and a bit easier to read and write as well:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "KvbXfNykSmiB" + }, + "source": [ + "YELL_FORMAT = 'Go go {}!!!'\n", + "\n", + "def yell(name):\n", + " return YELL_FORMAT.format(name)\n", + "\n", + "yell('Jelte')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TbnEKRdlTGTj" + }, + "source": [ + "The above discussion addresses about 90% of all string formatting needs, but for the remaining 10%, `str.format` is chock-full of bells and whistles. You can use named arguments, reuse the same argument in multiple places, align placeholders to a particular width with filler characters of choice, specify how to format numbers, and so forth and so on. You can read all about it [here](https://docs.python.org/3/library/string.html#formatstrings)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0FJ_vXwlwT_5" + }, + "source": [ + "### Cross-platform file paths\n", + "\n", + "Consider the location of the current notebook (`Tips.ipynb`). It is located inside a folder called `Colab Notebooks`, which is inside a folder called `My Drive`, which is inside my Google Drive account. 
In Windows, we write such paths as follows:\n", + "\n", + " My Drive\\Colab Notebooks\\Tips.ipynb\n", + "\n", + "In macOS and Linux, on the other hand, we separate the path components with forward slashes:\n", + "\n", + " My Drive/Colab Notebooks/Tips.ipynb\n", + "\n", + "In Python, we often need to create a string with a file path, which would be different depending on whether the code is running on Windows or elsewhere (backslashes appear twice because they need to be [escaped](#scrollTo=Escapes)):\n", + "\n", + "```py\n", + "windows_path = 'My Drive\\\\Colab Notebooks\\\\Tips.ipynb'\n", + "maclinux_path = 'My Drive/Colab Notebooks/Tips.ipynb'\n", + "```\n", + "\n", + "We generally want our code to work on all platforms, without having to change the path notation or having to introduce `if`/`else` statements everywhere to choose between alternative file paths. This is where the standard module [`os.path`][os.path] comes to our rescue. It provides a function called `join`, which glues the path components together with the separator that is appropriate for whatever platform the code is running on:\n", + "\n", + "```py\n", + "import os.path as op\n", + "\n", + "crossplatform_path = op.join('My Drive', 'Colab Notebooks', 'Tips.ipynb')\n", + "```\n", + "\n", + "The above snippet is repeated in the code cell below. 
If you run it, you will see that Google Colab runs on a system that uses the same path convention as macOS and Linux.\n", + "\n", + "Besides glueing path components together, [`os.path`][os.path] also provides tools for taking paths apart again and for converting back and forth between absolute and relative paths.\n", + "\n", + "[os.path]: https://docs.python.org/3/library/os.path.html" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "AfuPbq5iwRsR", + "outputId": "48cb5576-1a6d-4b82-fcb2-46c76c0bbb3d" + }, + "source": [ + "import os.path as op\n", + "\n", + "crossplatform_path = op.join('My Drive', 'Colab Notebooks', 'Tips.ipynb')\n", + "\n", + "print(crossplatform_path)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "My Drive/Colab Notebooks/Tips.ipynb\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zHyB8HjpU9hv" + }, + "source": [ + "## Tuples\n", + "\n", + "When you have two things, you can call them a pair. When you have three things, you can call them a triple. Four things, a quadruple. We continue with quintuple, sextuple, septuple, octuple. See where this is going? ;-)\n", + "\n", + "The general mathematical term for small groups like those is *N*-tuple, commonly abbreviated as \"tuple\". Python has a data type for tuples, which you almost never need to be aware of. It's a feature that [silently](https://docs.python.org/3/library/functions.html#func-tuple) makes things work, just like cleaners silently and inconspicuously prevent our society from collapsing.\n", + "\n", + "You don't need to create your own tuples for the final exercise (at least not consciously), but since the terminology is bound to come up, I will give a short explanation of what they are. A tuple is similar to a list, with two main differences:\n", + "\n", + "1. 
Tuples are intended for small(ish) groups of values, while lists will happily store millions of values for you.\n", + "2. Tuples are *immutable*: you cannot change their contents after creation.\n", + "\n", + "When you see values separated by commas, and they are not in a list, a parameter list or an argument list, you are probably looking at a tuple. I will point tuples out when they appear in other tips. Here, I will just mention two situations where tuples may appear all by themselves.\n", + "\n", + "Tuples are sometimes used to return two or more values from a function at the same time. For example, the built-in function [`divmod`](https://docs.python.org/3/library/functions.html#divmod) can tell you the quotient and remainder of two numbers in one call:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "LrwtKmfpayq0" + }, + "source": [ + "quotient, remainder = divmod(29, 11)\n", + "\n", + "print('quotient:', quotient, 'check:', quotient == 29 // 11)\n", + "print('remainder:', remainder, 'check:', remainder == 29 % 11)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Qm-KzHTdbtCq" + }, + "source": [ + "`divmod(29, 11)` returned a tuple and we *unpacked* that tuple into the variables `quotient` and `remainder`. That's because their names were to the left of the assignment operator. Tuple unpacking was already briefly mentioned in [the Tuesday morning session](https://colab.research.google.com/drive/11mQhWcu5zLe05UhokHkD4s38SjTSqJjH#scrollTo=fmaAa4v6mVaU). On the other hand, when we put comma-separated values to the *right* of an assignment operator, it means that we *pack* those values into a new tuple. 
We can do a nice trick with this and swap two variables by packing them into a tuple and then immediately unpacking them again in reverse order:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ttECYgPLc6ra" + }, + "source": [ + "winner = 'Bert'\n", + "loser = 'Ernie'\n", + "\n", + "winner, loser = loser, winner\n", + "\n", + "print('winner:', winner)\n", + "print('loser:', loser)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hpnieEnwpmhK" + }, + "source": [ + "## Dictionaries\n", + "\n", + "On the first day of the course, we saw lists, which can hold an arbitrary number of values. The values are numbered sequentially, with the first having index `0`. You can create an empty list by calling `list()` or by writing an empty pair of square brackets, `[]`.\n", + "\n", + "A [dictionary][dict] is also a data structure that can hold an arbitrary number of values. The difference from a list is that the values are not numbered sequentially. Instead, every value has a unique **key** of your choosing, which you need to set explicitly. An empty dictionary is created by calling `dict()` or by writing an empty pair of braces, `{}`.\n", + "\n", + "Dictionaries are useful when you want to associate values with each other. For example, your dataset might have a nominal column called `fruit`, with the possible levels being `apple`, `banana` and `cherry`, and you might want to count in how many rows each level occurs. In this case you will be associating fruits with frequencies, and a dictionary is the appropriate data structure for storing those associations. 
In the code below, we illustrate how dictionaries work and how you might use one to tally frequencies.\n", + "\n", + "[dict]: https://docs.python.org/3/library/stdtypes.html#mapping-types-dict" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "7JVH7h_Bu6mS" + }, + "source": [ + "# In the general case, we can create a dictionary and\n", + "# immediately set key-value pairs with the brace notation:\n", + "example_dict = {\n", + " 'apple': 'juicy',\n", + " 'banana': 'fragrant',\n", + " 'cherry': 'sweet',\n", + "}\n", + "\n", + "# Retrieving a value associated with a key can be done by\n", + "# placing the key between square brackets, similar to\n", + "# indexing a list:\n", + "example_dict['cherry']" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "zhIMlt2TxoGC" + }, + "source": [ + "# If you try to read a key that isn't present in the \n", + "# dictionary, you will get a KeyError:\n", + "example_dict['date']" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "vOk3VAnIyWgz" + }, + "source": [ + "# If we want to know whether a key is present in a dict,\n", + "# we can use the `in` operator:\n", + "print('cherry' in example_dict)\n", + "print('date' in example_dict)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "yE8urPkiy-Iw" + }, + "source": [ + "# If we want to retrieve the value for a key if it exists,\n", + "# and fall back to a default value otherwise, we can use `get`:\n", + "print(example_dict.get('cherry', 'oops'))\n", + "print(example_dict.get('date', 'oops'))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "BSXQQZfs0OfU" + }, + "source": [ + "# You can update and add key-value pairs by assignment.\n", + "example_dict['banana'] = 'yellow'\n", + "example_dict['date'] = 'wrinkly'\n", + "\n", + "# You can remove keys with the `del` 
operator.\n", + "del example_dict['apple']\n", + "\n", + "# Let's see what we have now.\n", + "example_dict" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VisTnjOjxodE" + }, + "source": [ + "In the next two examples, we use [tuple unpacking](#scrollTo=Tuples)." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "XnZcOLDM1zM-" + }, + "source": [ + "# We can iterate over the keys and values of a dictionary.\n", + "for key, value in example_dict.items():\n", + " print('A', key, 'is', value)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "PdcV9FVm2X48" + }, + "source": [ + "# Now let's see how we can use a dictionary to tally.\n", + "# Suppose we have the following table of fruit orders:\n", + "orders = [\n", + " ['2021-11-15', 'banana', 100],\n", + " ['2021-11-16', 'apple', 33],\n", + " ['2021-11-17', 'banana', 150],\n", + "]\n", + "\n", + "# We will pretend we haven't already seen those data and\n", + "# start with an empty dict.\n", + "fruit_tally = {}\n", + "\n", + "# Now we iterate over the orders and fill our dict.\n", + "for date, fruit, quantity in orders:\n", + " # First we retrieve how many times we have seen `fruit`\n", + " # before. If we haven't seen it before, the key isn't in the\n", + " # dict, so we provide 0 as a fallback value.\n", + " tally = fruit_tally.get(fruit, 0)\n", + " # Now we can add or update the tally for this `fruit`.\n", + " fruit_tally[fruit] = tally + 1\n", + "\n", + "# Did we count correctly?\n", + "fruit_tally" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "g_90Pk6j4Plm" + }, + "source": [ + "As an exercise, you can try adapting the above code example to compute total ordered quantities per fruit, or a list of order dates per fruit."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhu9T00mFSHd" + }, + "source": [ + "## Iterables\n", + "\n", + "In the [FizzBuzz exercise](https://colab.research.google.com/drive/1P68w4Ewzptp5KLo5EMMowKOSywFMzoc8#scrollTo=FMySjPAXx4v1), we saw the following notation for creating a list with the integers from `0` (inclusive) to `101` (exclusive):\n", + "\n", + "```python\n", + "list(range(0, 101))\n", + "```\n", + "\n", + "We later also saw this notation in [the Monday afternoon session](https://colab.research.google.com/drive/10H5QM0jBiDZJdjqPlvBia4oZMnIbxvuy#scrollTo=Jwfbo50wBLfF&line=1&uniqifier=1). In that case, we passed a third argument to `range` to set the step size to `2`, so that only every second integer was included in the range. For completeness, I'll mention that if you pass only one argument, `range` will interpret that as the end value and assume that you want to start at `0`.\n", + "\n", + "It's clear what such a `list(range(begin, end))` expression does, but the curious might wonder what you get if you leave off the outer `list()`. What is a `range` just by itself? 
In any case, you can still loop over it:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "XTmiVuEAGJlL" + }, + "source": [ + "for number in range(0, 3):\n", + " print(number)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "axzQrGdBLWDF" + }, + "source": [ + "However, unlike with a list, we cannot add our own elements to it:" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 164 + }, + "id": "oHDOpj2GLevE", + "outputId": "2d0ae6bd-61b2-497a-e8da-b1a45d1c3b97" + }, + "source": [ + "range(0, 3).append(3)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "error", + "ename": "AttributeError", + "evalue": "ignored", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mAttributeError\u001b[0m: 'range' object has no attribute 'append'" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6q2Tt_4eKc_K" + }, + "source": [ + "If we try to print it, it remains mysterious:" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "-MtW-ANdKn_d", + "outputId": "39abd51e-e1bf-4ef9-997a-52d88f452448" + }, + "source": [ + "print(range(0, 3))" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "range(0, 3)\n" + ] + } + ] + }, + { + 
"cell_type": "markdown", + "metadata": { + "id": "X6KSp6FbMPRP" + }, + "source": [ + "I will no longer keep you in suspense. The value returned by [`range`][range] is not a list. Strictly speaking, it is a `range` object, a special kind of lazy sequence, but for our purposes it behaves much like a *generator*. You can think of a generator as a function that returns multiple times, each time producing the next value in some kind of sequence. This means that a generator, unlike a list, does not store all elements of its sequence at the same time; rather, it produces the next one when you ask for it.\n", + "\n", + "> For the final exercise, you don't need to write generators yourself, so we will not spend any text on how to do that. It is however useful to know how to use them, so we discuss that below.\n", + "\n", + "Python's `for` loop knows how to take one value at a time out of a generator, just like it knows how to take one value at a time out of a list. We call the more general class of things that `for` can loop over *iterables*. There are many more types of iterables besides lists and generators (including [tuples](#scrollTo=Tuples) and [dictionaries](#scrollTo=Dictionaries)) and `for` is able to deal with all of them, because all iterables follow the same **iterator convention**.\n", + "\n", + "In fact, this convention is not restricted to `for`. Most functions in the Python standard library that work with sequences accept not just lists but any iterable. We have already seen this in action when we did `list(range(0, 101))`: `list` accepted an iterable, `range(0, 101)`, took out its values one by one, stored all of them in a list, and finally returned that list to you.\n", + "\n", + "Since generators don't store all of their values at the same time and we might not always need all values in a sequence, they are potentially more efficient than lists. For this reason, many standard library functions also *return* generators (or similar lazy iterators) rather than lists.\n", + "\n", + "By combining functions that consume and return iterables, we can often replace loops by shorter expressions. 
In the next few subsections, we illustrate the most important functions on iterables.\n", + "\n", + "[range]: https://docs.python.org/3/library/functions.html#func-range" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RelVVKVzX9T7" + }, + "source": [ + "### `enumerate`\n", + "\n", + "The built-in [`enumerate`][enumerate] accepts any iterable and returns a generator. As the name suggests, it numbers the values in the input sequence, while also echoing back the values themselves (in [pairs](#scrollTo=Tuples)):\n", + "\n", + "[enumerate]: https://docs.python.org/3/library/functions.html#enumerate" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Zq8dHhH-Y-Cx" + }, + "source": [ + "example_list = ['Colorless', 'green', 'ideas', 'sleep', 'furiously']\n", + "\n", + "list(enumerate(example_list))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "LMPR_En8Zhn7" + }, + "source": [ + "This can be useful when you are looping over a sequence, and you need not only the values but also their index in the sequence:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ADqR2KG2Zdkc" + }, + "source": [ + "for index, value in enumerate(example_list):\n", + " print('Word number', index, 'is', value)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cSdPo0RGbvre" + }, + "source": [ + "For comparison, this is what the above loop would look like without `enumerate`:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "tEOM7Iwwbz9r" + }, + "source": [ + "index = 0\n", + "for value in example_list:\n", + " print('Word number', index, 'is', value)\n", + " index = index + 1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vX59S1uDaBLI" + }, + "source": [ + "### `filter`\n", + "\n", + "The built-in [`filter`][filter] accepts a function and an iterable. 
It passes each value in the iterable to the function in a separate call. It returns a generator with only the values in the input sequence for which the function returned `True` (or any other value that Python considers true, such as a non-empty string).\n", + "\n", + "[filter]: https://docs.python.org/3/library/functions.html#filter" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "YgQ2RCRbbJDY" + }, + "source": [ + "def odd(number):\n", + " return number % 2 == 1\n", + "\n", + "fibonacci_10 = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]\n", + "\n", + "list(filter(odd, fibonacci_10))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "30NBqwSZbqee" + }, + "source": [ + "For comparison, this is what the last line would look like without `filter`:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "UuzJGcjOcOL8" + }, + "source": [ + "result_list = []\n", + "for number in fibonacci_10:\n", + " if odd(number):\n", + " result_list.append(number)\n", + "result_list" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "81vNGdqgc1PI" + }, + "source": [ + "### `map`\n", + "\n", + "The built-in [`map`][map] accepts a function and an iterable. It passes each value in the iterable as the first argument to the function in a separate call.
It returns a generator with the return values of each of those calls.\n", + "\n", + "[map]: https://docs.python.org/3/library/functions.html#map" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Gdw4kFoCeQ4x" + }, + "source": [ + "def square(number):\n", + " return number ** 2\n", + "\n", + "list(map(square, range(10)))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CMWKdfZued1f" + }, + "source": [ + "For comparison, code without `map` that produces the same output as the last line:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "I13FNH_ZeptA" + }, + "source": [ + "result_list = []\n", + "for number in range(10):\n", + " result_list.append(square(number))\n", + "result_list" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "S5NtKmJ21L_u" + }, + "source": [ + "You can also pass multiple iterables to `map`. In that case, `map` will take one value from each iterable per iteration and pass the value from the first iterable as the first argument to the function, the corresponding value from the second iterable as the second argument, and so on. In the following example, we use a [format string](#scrollTo=Format_strings)." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "5W6NwA8D2Kz3" + }, + "source": [ + "sentence = '{} {} {} {}.'.format\n", + "\n", + "# The following lists have been shuffled.\n", + "# For fun, you can try reordering them so the correct words\n", + "# from each list match up again. 
:-)\n", + "# (But run the code first so you know what it does.)\n", + "properties = ['Gentle', 'Playful', 'Stubborn', 'Thirsty']\n", + "people = ['camels', 'plumbers', 'giants', 'children']\n", + "verbs = ['tighten', 'devour', 'ruin', 'capture']\n", + "objects = ['nightmares', 'massage chairs', 'cactuses', 'rusty fittings']\n", + "\n", + "phrases = map(sentence, properties, people, verbs, objects)\n", + "for phrase in phrases:\n", + " print(phrase)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "G7fup_SyBode" + }, + "source": [ + "Without `map` (and without `enumerate` or `range`), the last loop would have looked like this instead:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "9a1XKPuFBtpF" + }, + "source": [ + "index = 0\n", + "for prop in properties:\n", + " group = people[index]\n", + " verb = verbs[index]\n", + " obj = objects[index]\n", + " print(sentence(prop, group, verb, obj))\n", + " index = index + 1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9DCBUR61DNRr" + }, + "source": [ + "If you pass iterables of unequal lengths, `map` will stop at the end of the shortest iterable. The code below demonstrates this; in the subsection on `itertools.repeat`, we will see how this behavior can be useful." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "pR-4fFPZDMOi" + }, + "source": [ + "# operator.mul is a function that multiplies two numbers.
It\n", + "# does exactly the same thing as the `*` operator, but as a\n", + "# function so you can pass it as an argument to other functions.\n", + "# More about the operator module in the Operators subsection below.\n", + "from operator import mul\n", + "\n", + "small = [1, 2, 3]\n", + "large = [5, 7, 11, 13, 17, 19, 23, 29]\n", + "\n", + "list(map(mul, small, large))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yBYdJR9VHLQU" + }, + "source": [ + "### More built-in functions\n", + "\n", + "`range`, `enumerate`, `filter` and especially `map` are the functions on iterables that you'll be using the most. However, there are more built-in functions on iterables worth knowing about. Below, we briefly mention a few.\n", + "\n", + "- [`all`](https://docs.python.org/3/library/functions.html#all) consumes an iterable. If the sequence contains at least one false value (such as `False`, `0` or an empty string), it returns `False`. Otherwise it returns `True` (including when it is empty). It's like the `and` operator, but operating on an arbitrary number of operands instead of exactly two.\n", + "- [`any`](https://docs.python.org/3/library/functions.html#any) is the counterpart of `all`: it returns `True` if the sequence contains at least one true value, otherwise `False` (including when it is empty). It can be thought of as the \"long form\" of the `or` operator.\n", + "- [`len`](https://docs.python.org/3/library/functions.html#len) can tell you the length of lists, strings, and some other iterables that can \"know\" their size in advance, including `range`.\n", + "- [`list`](https://docs.python.org/3/library/functions.html#func-list), as we have seen in previous examples, can store the values from any iterable into a list.\n", + "- [`max`](https://docs.python.org/3/library/functions.html#max) can be passed a single iterable argument, in which case it will return the maximum value of the sequence.
However, you can also pass multiple arguments, in which case it will simply compare them directly (e.g. `max(range(10))` will return `9` and `max(3, 4)` will return `4`). Likewise for [`min`](https://docs.python.org/3/library/functions.html#min).\n", + "- [`str.join`](https://docs.python.org/3/library/stdtypes.html#str.join), which I covered in more detail in [Strings](#scrollTo=Strings), can take the strings that it will glue together from any iterable sequence.\n", + "- [`sum`](https://docs.python.org/3/library/functions.html#sum), as the name suggests, will return the sum of all values in an iterable sequence." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "LugfHklV0b4C" + }, + "source": [ + "### Operators\n", + "\n", + "Given two lists of numbers, we might want to create a third list where each element is the sum of the corresponding elements of the first two lists. We cannot pass the `+` operator to `map`, because the `+` operator is not a function (running the next cell produces a `SyntaxError`):" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ddcs1QaK1APC" + }, + "source": [ + "first_list = [1, 2, 3]\n", + "second_list = [7, 7, 5]\n", + "\n", + "list(map(+, first_list, second_list))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ix-TPRvY1Pc-" + }, + "source": [ + "Fortunately, the [`operator`](https://docs.python.org/3/library/operator.html) standard module exports function versions of most operators, so we can do this instead:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "UG_UUx8S1jQw" + }, + "source": [ + "from operator import add\n", + "\n", + "first_list = [1, 2, 3]\n", + "second_list = [7, 7, 5]\n", + "\n", + "list(map(add, first_list, second_list))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ezmNWLLM2G4w" + }, + "source": [ + "The operators that you would most likely use this way, and their corresponding functions exported from
`operator`, are the following (full list [here](https://docs.python.org/3/library/operator.html#mapping-operators-to-functions)):\n", + "\n", + "`+` - `add` (for adding numbers)\n", + "\n", + "`+` - `concat` (for concatenating strings or lists)\n", + "\n", + "`-` - `neg` (unary minus to flip the sign of a number)\n", + "\n", + "`-` - `sub` (binary minus to subtract two numbers)\n", + "\n", + "`in` - `contains` (for checking whether a value appears in an iterable; note the reversed order: `contains(a, b)` means `b in a`)\n", + "\n", + "`*` - `mul`\n", + "\n", + "`/` - `truediv` (`//` is `floordiv`)\n", + "\n", + "`%` - `mod`\n", + "\n", + "`**` - `pow`\n", + "\n", + "`<` - `lt`\n", + "\n", + "`>` - `gt`\n", + "\n", + "`==` - `eq`\n", + "\n", + "`!=` - `ne`" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0SMYES0-gyBX" + }, + "source": [ + "### Bound methods\n", + "\n", + "In the `map` subsection, I used an [example](#scrollTo=5W6NwA8D2Kz3&line=1&uniqifier=1) with the notation `'{} {} {} {}.'.format`. I stored that in a variable and then passed that as a function to `map`. This turns out to be a general technique that works in many more situations, so let's briefly look at how it works.\n", + "\n", + "The essence is that" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "51cj58Pdogj_" + }, + "source": [ + "'{} {} {} {}.'.format(1, 2, 3, 4)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "O-0kegzaonFi" + }, + "source": [ + "is equivalent to" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "GqxgL5Rgorx3" + }, + "source": [ + "str.format('{} {} {} {}.', 1, 2, 3, 4)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3DWOZQHKpClX" + }, + "source": [ + "We generally use the first form because it is more convenient, but Python is actually translating it to the second form behind our backs.
The [format string](#scrollTo=Format_strings) `'{} {} {} {}.'` is being passed as the first argument to the function `str.format` in both cases.\n", + "\n", + "If we do `'{} {} {} {}.'.format` without actually calling the function and passing an argument list, Python understands that we want to use `'{} {} {} {}.'` as the first argument when we later make the call. It returns a special, *bound* version of `str.format` that already has our format string filled in, so we only need to supply the remaining arguments. This is a special form of *partial application* (we will see the more general form of partial application in the [next subsection](#scrollTo=partial)). Python supports this special form because `format` is a *method* of string objects; you can optionally read more about classes and objects in [a later section](#scrollTo=Classes_and_objects).\n", + "\n", + "With this theory out of the way, let's look at a couple more examples of how we can use both bound and unbound functions in `map` and similar functions. We use [string](#scrollTo=Strings) and [dictionary](#scrollTo=Dictionaries) methods in this example."
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "xSptat6auBDW" + }, + "source": [ + "# We can map the unbound str.lower to lowercase a sequence of strings.\n", + "strings = ['Hawaii', 'Iceland', 'Hokkaido', 'Vanuatu']\n", + "print(list(map(str.lower, strings)))\n", + "\n", + "# We can filter by the bound dict.get to check for associated values.\n", + "topography = {\n", + " 'Iceland': 'volcanic',\n", + " 'Vanuatu': 'Melanesia',\n", + "}\n", + "# Give me only the islands I know something about.\n", + "print(list(filter(topography.get, strings)))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "EqorhEmy6pxq" + }, + "source": [ + "With bound methods, we can achieve ultimate minimalism in our [example](#scrollTo=fwTdCa4QSOuv&line=1&uniqifier=1) from the format strings section, repeated here:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Q49H5CvR7DbK" + }, + "source": [ + "YELL_FORMAT = 'Go go {}!!!'\n", + "\n", + "def yell(name):\n", + " return YELL_FORMAT.format(name)\n", + "\n", + "yell('Jelte')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kJnvBskM7GvW" + }, + "source": [ + "because this is all we need:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "BHiKKKM77JKL" + }, + "source": [ + "yell = 'Go go {}!!!'.format\n", + "\n", + "yell('Jelte')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XO0Q3vhf74Nd" + }, + "source": [ + "### `itertools` and `functools`\n", + "\n", + "The [`itertools`](https://docs.python.org/3/library/itertools.html) and [`functools`](https://docs.python.org/3/library/functools.html) standard modules let you turn your iterable-fu up to 11. Most of the contents of these modules are a bit advanced, but there are a couple of tricks in there that might be useful during the final exercise.
We quickly illustrate them below." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucNV6xs0Tr8x" + }, + "source": [ + "#### `repeat`\n", + "\n", + "[`itertools.repeat`](https://docs.python.org/3/library/itertools.html#itertools.repeat), as the name suggests, will keep repeating a value that you specify. *Forever* (unless you pass the optional `times` argument). That means you should definitely not try to loop over it directly, because the loop would never end! However, you can use it when you need to `map` a function that takes multiple arguments, where some arguments come from a (finite) iterable sequence while at least one argument is the same every time. This works because `map` stops at the end of the shortest iterable. Consider the following excerpt of the example code from [the Monday afternoon session](https://colab.research.google.com/drive/10H5QM0jBiDZJdjqPlvBia4oZMnIbxvuy?usp=sharing) (edited for brevity):" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "HbbMbbNvckz7" + }, + "source": [ + "def centered_stars(center, width):\n", + " padding = center - width // 2\n", + " return ' ' * padding + '*' * width\n", + "\n", + "lines = []\n", + "for width in range(1, 6, 2):\n", + " lines.append(centered_stars(2, width))\n", + "\n", + "print('\\n'.join(lines))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YHOPuLOwh3Qx" + }, + "source": [ + "We can replace the loop with an expression using `map` and `repeat`:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "I8NMn7KUh-3S" + }, + "source": [ + "from itertools import repeat\n", + "\n", + "lines = map(centered_stars, repeat(2), range(1, 6, 2))\n", + "\n", + "print('\\n'.join(lines))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PW2498IlmijJ" + }, + "source": [ + "#### `partial`\n", + "\n", + "[`functools.partial`](https://docs.python.org/3/library/functools.html#functools.partial) takes a function plus any number of other arguments, and then returns a new version of the function to which those arguments have
already been applied." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "i_Pjs-rGnfH2" + }, + "source": [ + "from functools import partial\n", + "\n", + "# center_2_stars is a version of centered_stars in which the first\n", + "# parameter (`center`) is fixed to the value 2. This version\n", + "# accepts only one argument, `width`.\n", + "center_2_stars = partial(centered_stars, 2)\n", + "\n", + "center_2_stars(3)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yCkaJitHnrXk" + }, + "source": [ + "While `functools.partial` does not operate on iterables by itself, it can be really useful when you want to adjust functions before you pass them to `filter`, `map` and the like. We could have used it instead of `itertools.repeat` to eliminate the loop from our triangle example:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "jjr_D7jpoYul" + }, + "source": [ + "lines = map(center_2_stars, range(1, 6, 2))\n", + "\n", + "print('\\n'.join(lines))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IRRPl6piyQn8" + }, + "source": [ + "`partial` also works with keyword arguments. In some cases, this makes it possible to partially apply arguments out of order. This doesn't work for the functions in the `operator` module, but workarounds are possible. Consider writing a function that subtracts `3` from whatever number you pass to it:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "WT3OJFSr18MW" + }, + "source": [ + "def minus_3(number):\n", + " return number - 3\n", + "\n", + "minus_3(4)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MVmN8nSt2Ow1" + }, + "source": [ + "It would be nice if we could skip writing a function ourselves and instead just combine `operator.sub` with `functools.partial`."
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "DK8Y4dWHzY_J" + }, + "source": [ + "from operator import sub\n", + "from functools import partial\n", + "\n", + "minus_3 = partial(sub, b=3)\n", + "\n", + "minus_3(4)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pQpjPTbb2oNd" + }, + "source": [ + "The above code raises a `TypeError`, because the functions in `operator` accept only positional arguments, not keyword arguments. We can avoid the problem by adding `-3` instead of subtracting `3`:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "H3MwPyuF27vg" + }, + "source": [ + "from operator import add\n", + "from functools import partial\n", + "\n", + "minus_3 = partial(add, -3)\n", + "\n", + "minus_3(4)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-LH3TiwCpNbp" + }, + "source": [ + "#### `reduce`\n", + "\n", + "[`functools.reduce`](https://docs.python.org/3/library/functools.html#functools.reduce) is for when you want to combine (or *reduce*) all values from a sequence to a single value. It accepts a function, an iterable and an optional starting value. If you don't supply a starting value, it will use the first value in the sequence instead.\n", + "\n", + "`reduce` keeps a work-in-progress value of sorts, which is often called the *accumulator*. The accumulator is initially set to the starting value. For every remaining value in the iterable sequence, `reduce` calls the function with two arguments: the accumulator and the value itself. The return value from the function becomes the new accumulator.
After the last value in the sequence, the latest accumulator is returned as the final result.\n", + "\n", + "For illustration, here is how you might use `reduce` to reverse a string:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "kN5Uw_iB6QE6" + }, + "source": [ + "from functools import reduce\n", + "\n", + "def prepend_letter(accumulator, next_letter):\n", + " return next_letter + accumulator\n", + "\n", + "def reverse_string(string):\n", + " # In this case, we reduce a sequence of characters to a new string.\n", + " # Starting from '' makes this work for an empty string too.\n", + " return reduce(prepend_letter, string, '')\n", + "\n", + "reverse_string('abcdef')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AxfCOlrU7H75" + }, + "source": [ + "And here is how we could write our own implementations of the built-in functions `max` and `sum` using `reduce`:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "P1CltqPc7UvG" + }, + "source": [ + "from functools import reduce\n", + "from operator import add\n", + "\n", + "def greater(a, b):\n", + " if a < b:\n", + " return b\n", + " else:\n", + " return a\n", + "\n", + "# The my_ prefix avoids overwriting (shadowing) the built-in max and sum.\n", + "def my_max(iterable):\n", + " return reduce(greater, iterable)\n", + "\n", + "def my_sum(iterable):\n", + " return reduce(add, iterable)\n", + "\n", + "numbers = [3, 5, 4]\n", + "\n", + "print('max:', my_max(numbers))\n", + "print('sum:', my_sum(numbers))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_zR8E_94YHCv" + }, + "source": [ + "## Calculations\n", + "\n", + "As we have written in the course manual, Python is \"batteries included\": it comes with a lot of functionality out of the box. This is certainly also the case for numerical computations and even some basic statistics. We highlight some common tools below.\n", + "\n", + "- The built-in functions [`abs`][abs], [`max`][max], [`min`][min], [`pow`][pow], [`round`][round] and [`sum`][sum] do exactly what the names suggest.
You can also use the `**` operator for powers.\n", + "- The built-in [`range`][range] function, which we have seen before, lets you generate linear series of numbers.\n", + "- The [`math`][math] standard module contains a wealth of general mathematical functions and constants, such as [`log`][math.log], [`sqrt`][math.sqrt], [`cos`][math.cos], [`pi`][math.pi] and [`tau`][math.tau].\n", + "- The [`random`][random] standard module covers most random number generating needs, as well as random shuffling and sampling.\n", + "- The [`statistics`][statistics] standard module includes the common staples of statistical analysis, such as [`mean`][statistics.mean], [`median`][statistics.median], [`mode`][statistics.mode], [`stdev`][statistics.stdev], [`variance`][statistics.variance], [`covariance`][statistics.covariance], [`correlation`][statistics.correlation] and even a simple [`linear_regression`][statistics.linear_regression]. Unfortunately, the latter three are not available in Google Colab, because they were only added in Python 3.10 and Colab is running an older version of Python.
The next two subsections offer some alternatives.\n", + "\n", + "A complete overview of numerical functionality in the Python standard library can be found [here][python-numeric].\n", + "\n", + "[abs]: https://docs.python.org/3/library/functions.html#abs\n", + "[max]: https://docs.python.org/3/library/functions.html#max\n", + "[min]: https://docs.python.org/3/library/functions.html#min\n", + "[pow]: https://docs.python.org/3/library/functions.html#pow\n", + "[range]: https://docs.python.org/3/library/functions.html#func-range\n", + "[round]: https://docs.python.org/3/library/functions.html#round\n", + "[sum]: https://docs.python.org/3/library/functions.html#sum\n", + "[math]: https://docs.python.org/3/library/math.html\n", + "[math.log]: https://docs.python.org/3/library/math.html#math.log\n", + "[math.sqrt]: https://docs.python.org/3/library/math.html#math.sqrt\n", + "[math.cos]: https://docs.python.org/3/library/math.html#math.cos\n", + "[math.pi]: https://docs.python.org/3/library/math.html#math.pi\n", + "[math.tau]: https://docs.python.org/3/library/math.html#math.tau\n", + "[random]: https://docs.python.org/3/library/random.html\n", + "[statistics]: https://docs.python.org/3/library/statistics.html\n", + "[statistics.mean]: https://docs.python.org/3/library/statistics.html#statistics.mean\n", + "[statistics.median]: https://docs.python.org/3/library/statistics.html#statistics.median\n", + "[statistics.mode]: https://docs.python.org/3/library/statistics.html#statistics.mode\n", + "[statistics.stdev]: https://docs.python.org/3/library/statistics.html#statistics.stdev\n", + "[statistics.variance]: https://docs.python.org/3/library/statistics.html#statistics.variance\n", + "[statistics.covariance]: https://docs.python.org/3/library/statistics.html#statistics.covariance\n", + "[statistics.correlation]: https://docs.python.org/3/library/statistics.html#statistics.correlation\n", + "[statistics.linear_regression]: 
https://docs.python.org/3/library/statistics.html#statistics.linear_regression\n", + "[python-numeric]: https://docs.python.org/3/library/numeric.html" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "xoBLhOpvmu2P", + "outputId": "f5fedaf1-74d0-4382-e343-e717cf567fd9" + }, + "source": [ + "!python --version" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Python 3.7.12\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rKxMNbMMuMCw" + }, + "source": [ + "### Computing covariance and correlation yourself\n", + "\n", + "Given two equally long sequences of numbers and their averages (which you might have already computed because you needed them elsewhere), you can compute the sample covariance and correlation as follows using [iterables](#scrollTo=Iterables):" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "moprSI-g90tZ" + }, + "source": [ + "from itertools import repeat\n", + "from operator import sub, mul\n", + "from statistics import mean, stdev\n", + "\n", + "def differences(series, average):\n", + " return map(sub, series, repeat(average))\n", + "\n", + "def covariance(series1, series2, average1=None, average2=None):\n", + " differences1 = differences(series1, average1 or mean(series1))\n", + " differences2 = differences(series2, average2 or mean(series2))\n", + " products = map(mul, differences1, differences2)\n", + " return sum(products) / (len(series1) - 1)\n", + "\n", + "def correlation(series1, series2, average1=None, average2=None):\n", + " '''Pearson's correlation coefficient.'''\n", + " cov = covariance(series1, series2, average1, average2)\n", + " stdev1 = stdev(series1, average1)\n", + " stdev2 = stdev(series2, average2)\n", + " return cov / (stdev1 * stdev2)\n", + "\n", + "column1 = [1, 2, 3, 4, 5, 6, 7]\n", + "column2 = [4, 5, 6, 5, 5, 8, 9]\n", + "column3 = [8, 7, 6, 5, 4, 3, 
2]\n", + "\n", + "print('covariance 1-2:', covariance(column1, column2))\n", + "print('correlation 1-2:', correlation(column1, column2))\n", + "print('correlation 2-1:', correlation(column2, column1))\n", + "print('correlation 1-3:', correlation(column1, column3))\n", + "print('correlation 2-3:', correlation(column2, column3))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JYTbShRDBoS1" + }, + "source": [ + "### Using covariance and correlation from an external package\n", + "\n", + "[pandas](https://pandas.pydata.org/pandas-docs/stable/index.html) provides implementations of [covariance](https://pandas.pydata.org/pandas-docs/stable/user_guide/computation.html#covariance), [correlation](https://pandas.pydata.org/pandas-docs/stable/user_guide/computation.html#correlation) and many other statistical functions. If you want to do serious statistics, you should probably use pandas or some other third-party package that can do the heavy lifting for you. If you choose this path, head over to [dataframes](#scrollTo=pandas_dataframes_and_read_csv) first." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "F6mIIM3zLw1p" + }, + "source": [ + "## Classes and objects\n", + "\n", + "For the final exercise, you do not need to create your own classes. However, since you may encounter the terminology and the notation when using third-party packages, here is a *very* quick explanation.\n", + "\n", + "An **object** is a value with internal structure. For example, I might have a variable with the name `jack` that holds a description of my friend Jack. The parts of that description are called its **attributes**. In this example, `jack.name` might hold the string `'Jack'` and `jack.phone` might hold the string `'+31612345678'` (not his real phone number ;-). `jack` is the object and `name` and `phone` are its attributes.\n", + "\n", + "Attributes can hold all types of values, including functions. 
In the latter case, the attribute is also called a **method**. For example, `jack.call()` might dial Jack's number. Attributes might also themselves be objects with their own nested attributes, so you can have chains of names connected with dots, for example `house.kitchen.sink`. In the latter dotted name, `kitchen` and `sink` are both attributes, and `house` and `kitchen` must both be objects, though `sink` might be an object as well.\n", + "\n", + "Whenever you see two names connected with a dot, the left one must be an object. You have already encountered a few types of objects. Every list has an `.append()` method, so lists are objects. When you call `csv.reader()`, the `csv` module is an object (though we usually don't use the words \"attribute\" and \"method\" to refer to the parts of a module; we rather say that `csv.reader` is a *qualified name*). In fact, *nearly everything* in Python is an object; this is why it is called an *object-oriented* programming language.\n", + "\n", + "Objects are generally created using a **class**. You can think of a class as a template for creating objects of a particular shape. For example, our `jack` object might have been created using the `Person` class. We say that `jack` is an **instance** of `Person` and also that we **instantiated** `Person` when we created `jack`. Typically, you can expect all instances of a class to have the same attributes and methods. Packages often organize their documentation by class, listing the methods of its instances and their usage.\n", + "\n", + "Instantiating a class looks exactly like calling a function and storing the return value in a variable. 
The only visible clue that you're instantiating a class rather than calling a function is that classes usually have a name starting with an uppercase letter:\n", + "\n", + "```py\n", + "jack = Person(name='Jack')\n", + "```\n", + "\n", + "There is no danger in thinking of a class instantiation as a function call, or in instantiating a class without knowing it because its name starts with a lowercase letter. In reality, however, a class is not a function but an object!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MoyqHwBhvBTH" + }, + "source": [ + "## Working with times and calendar dates\n", + "\n", + "The [`datetime`][datetime] standard module provides tools for working with dates and times. It exports the `date`, `time` and `datetime` [classes](#scrollTo=Classes_and_objects), which let you represent a date, a time or both, exactly as the names suggest.\n", + "\n", + "[datetime]: https://docs.python.org/3/library/datetime.html" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "W5ZlB-uXkC6S" + }, + "source": [ + "### Parsing dates from text\n", + "\n", + "If you are working with dates for the final course exercise, you are most likely extracting date strings from a CSV. Such a string might, for example, look like `'2021/11/15'`. In some cases, you may need to extract specific information from those dates, such as the year, month, hour or even weekday. For such use cases, it is advisable to convert the date string to a `date`, `time` or `datetime` object first by *parsing* it.\n", + "\n", + "The [`datetime.strptime`][datetime.strptime] function can do this job for you. It takes two parameters: the date string that needs to be parsed, and a second string that describes the *format* of the date. Our example above consisted of four digits that identify the year, a slash, two digits that identify the month, another slash, and finally two digits that identify the day of the month. We write that as `'%Y/%m/%d'`.
In such a format string, a `%` with a letter after it is a placeholder for a particular piece of the date or time, while all other characters simply represent themselves. You can find a list of all available placeholders [here][datetime-formats].\n", + "\n", + "[datetime.strptime]: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior\n", + "[datetime-formats]: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "cOo_KnhWsR2m" + }, + "source": [ + "from datetime import datetime as dt\n", + "\n", + "yesterday_str = '2021/11/15'\n", + "date_format = '%Y/%m/%d'\n", + "\n", + "yesterday_obj = dt.strptime(yesterday_str, date_format)\n", + "print('datetime:', yesterday_obj)\n", + "\n", + "# dt.strptime always returns a full datetime, even if the input\n", + "# string and the format string contain only a date or only a time.\n", + "# You can reduce the datetime object to just a date or just a time\n", + "# by calling a method of the same name:\n", + "print('date: ', yesterday_obj.date())\n", + "print('time: ', yesterday_obj.time())" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iDp5QxvpxZIo" + }, + "source": [ + "### Extracting information from date and time objects\n", + "\n", + "Once you have a `date`, `time`, or `datetime` object, extracting information from it is very easy, as we demonstrate below. [`datetime`][datetime.datetime] objects have all date-related attributes and methods of `date` as well as all time-related attributes and methods of `time`. 
You can find the most important attributes [here][datetime-attributes] and the most important methods [here][datetime-methods] (you can also scroll up from the latter link for some additional methods related to time zones).\n", + "\n", + "[datetime.datetime]: https://docs.python.org/3/library/datetime.html#datetime-objects\n", + "[datetime-attributes]: https://docs.python.org/3/library/datetime.html#datetime.datetime.year\n", + "[datetime-methods]: https://docs.python.org/3/library/datetime.html#datetime.datetime.utcoffset" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "M9pQ2otg0EQU" + }, + "source": [ + "# Year, month etcetera attributes are all represented as numbers.\n", + "print('year: ', yesterday_obj.year)\n", + "print('month: ', yesterday_obj.month)\n", + "print('hour: ', yesterday_obj.hour)\n", + "\n", + "# Python starts the week on Monday and starts numbering at zero.\n", + "print('weekday: ', yesterday_obj.weekday())\n", + "# The ISO 8601 standard also starts on Monday, but starts numbering at one.\n", + "print('isoweekday:', yesterday_obj.isoweekday())" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GLBue3palj8M" + }, + "source": [ + "## Sorting\n", + "\n", + "Python has a built-in function [`sorted`][sorted], which can sort anything that you can loop over (including the key-value pairs of a [dictionary](#scrollTo=Dictionaries)). It always returns a list with the result.\n", + "\n", + "By default, it will sort ascending, i.e., by increasing order of magnitude. 
Numbers are compared by value, strings are compared lexicographically (with all uppercase letters sorting before all lowercase letters), and lists are compared by the first item, with subsequent items being used as tie breakers if previous items compared equal.\n", + "\n", + "[sorted]: https://docs.python.org/3/library/functions.html#sorted\n", + "[sorting-howto]: https://docs.python.org/3/howto/sorting.html#sortinghowto" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "FIw_4XyNn9UK" + }, + "source": [ + "list_of_numbers = [5, 3, 4]\n", + "list_of_strings = ['Good', 'day', 'to', 'you']\n", + "list_of_lists = [\n", + " [6, 'zucchini'],\n", + " [5, 'eggs'],\n", + " [6, 'broccoli'],\n", + "]\n", + "\n", + "print(sorted(list_of_numbers))\n", + "print(sorted(list_of_strings))\n", + "print(sorted(list_of_lists))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_uzxS1S1setr" + }, + "source": [ + "### Sorting in descending order\n", + "\n", + "If you want to sort descending instead, pass the named argument `reverse=True`." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "e2RSQaXFszpT" + }, + "source": [ + "sorted(list_of_numbers, reverse=True)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YKx3ObxltgXz" + }, + "source": [ + "### Custom comparison\n", + "\n", + "If you want `sorted` to perform a different kind of comparison in order to decide on the order of the items, you can pass a function as the named `key` argument. 
This function should take one item as its parameter and return a new value that `sorted` will use for the comparison instead.\n", + "\n", + "Below, we use the function `str.lower` to do a case-insensitive sort:" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "5_w-T2ryuknR", + "outputId": "d9f68252-3e93-48c0-fdf9-f001a107fdf2" + }, + "source": [ + "sorted(list_of_strings, key=str.lower)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "['day', 'Good', 'to', 'you']" + ] + }, + "metadata": {}, + "execution_count": 5 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "92yLh2OWvO-q" + }, + "source": [ + "The [`operator`][operator] standard module exports several useful functions that let you quickly create simple key functions for sorting. Most importantly, [`itemgetter`][operator.itemgetter] lets you sort sequences by a different item than the first, and [`attrgetter`][operator.attrgetter] lets you sort [objects](#scrollTo=Classes_and_objects) by a particular attribute. 
There is also [`methodcaller`][operator.methodcaller] which lets you sort by the result of a method call.\n", + "\n", + "Below, we use `itemgetter` to sort the key-value pairs of a [dictionary](#scrollTo=Dictionaries) by value instead of by key:\n", + "\n", + "[operator]: https://docs.python.org/3/library/operator.html#module-operator\n", + "[operator.itemgetter]: https://docs.python.org/3/library/operator.html#operator.itemgetter\n", + "[operator.attrgetter]: https://docs.python.org/3/library/operator.html#operator.attrgetter\n", + "[operator.methodcaller]: https://docs.python.org/3/library/operator.html#operator.methodcaller" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "HigJMisJydem" + }, + "source": [ + "from operator import itemgetter\n", + "\n", + "example_dict = {'banana': 'yellow', 'cherry': 'sweet', 'date': 'wrinkly'}\n", + "\n", + "sorted(example_dict.items(), key=itemgetter(1))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "P16V-uttzQXw" + }, + "source": [ + "And below, we use `attrgetter` to sort [dates](#scrollTo=Working_with_times_and_calendar_dates) by month:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "jG1RMiBzzpky" + }, + "source": [ + "from operator import attrgetter\n", + "from datetime import date\n", + "\n", + "list_of_dates = [\n", + " date(year=2021, month=11, day=16),\n", + " date(year=2022, month=3, day=17),\n", + " date(year=2020, month=5, day=18),\n", + "]\n", + "\n", + "sorted(list_of_dates, key=attrgetter('month'))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7rsvpuMn1kSl" + }, + "source": [ + "## `pandas` dataframes and `read_csv`\n", + "\n", + "[pandas][pandas] is a package that provides general data structures and data analysis tools for Python. 
If you venture into statistics or data mining with Python, you will likely encounter [`pandas.DataFrame`][pandas.DataFrame] (which is a [class](#scrollTo=Classes_and_objects)) sooner or later. It holds tabular data in which each column might have a different type. Other packages often use this data structure, too.\n", + "\n", + "If you encounter `pandas` during the final course exercise, it is probably because you are using a function from a third-party package that expects you to pass in the data as a `DataFrame`. In this case, it is useful to know that `pandas` provides the [`read_csv`][pandas.read_csv] function. It accepts a file or file name as its first parameter and returns the contents of the CSV file as a `DataFrame`. You can then pass this object to the function that expects a `DataFrame`. The `read_csv` function also accepts many optional parameters that let you specify details about the way the CSV file is formatted, its columns, etcetera.\n", + "\n", + "In the example code below, we illustrate how you can create a `DataFrame` using `read_csv`. Note that we define a [cross-platform path using `os.path`](#scrollTo=Cross_platform_file_paths). 
In the [next section](#scrollTo=Clustering_with_scikit_learn), we illustrate how the `data` object may be passed to a third-party analysis function.\n", + "\n", + "[pandas]: https://pandas.pydata.org/pandas-docs/stable/index.html\n", + "[pandas.DataFrame]: https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html#dataframe\n", + "[pandas.read_csv]: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-read-csv-table" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 142 + }, + "id": "1m-b-UVLF_rM", + "outputId": "0b542154-2e25-4f71-c445-856836f0a749" + }, + "source": [ + "#requires pandas\n", + "\n", + "import os.path as op\n", + "import pandas\n", + "\n", + "file_path = op.join('sample_data', 'california_housing_test.csv')\n", + "\n", + "data = pandas.read_csv(file_path)\n", + "# `data` is an instance of pandas.DataFrame with several columns\n", + "# containing geographic center points of neighborhoods in\n", + "# California as well as demographics about the inhabitants and\n", + "# their houses.\n", + "\n", + "# You may sometimes want to pass only a subset of the dataset\n", + "# to a function. For this purpose, dataframes can be sliced in\n", + "# a way that is similar to lists. 
The following example will\n", + "# contain only the 'total_rooms' column:\n", + "data.loc[:, 'total_rooms']\n", + "\n", + "# The following example will include two columns, in a different\n", + "# order than they appeared in the CSV:\n", + "data.loc[:, ['households', 'population']]\n", + "# Passing a list of column names like this selects any subset of\n", + "# columns, in the order in which you list them.\n", + "\n", + "# For slicing rows by position, you use the `iloc` attribute\n", + "# instead of `loc`:\n", + "data.iloc[0:3]\n", + "\n", + "# Both ways of slicing can be combined. A notebook cell only\n", + "# displays the result of its last expression, so only this final\n", + "# slice appears in the output:\n", + "data.loc[:, ['households', 'population']].iloc[0:3]" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " households population\n", + "0 606.0 1537.0\n", + "1 277.0 809.0\n", + "2 495.0 1484.0" + ] + }, + "metadata": {}, + "execution_count": 20 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "EYVhwfCP2oEK" + }, + "source": [ + "## Clustering with scikit-learn\n", + "\n", + "[scikit-learn][sklearn] is a package that provides many data mining and machine learning tools, including cluster analysis. You can find the documentation [here][sklearn.cluster]. We give a very minimal example of hierarchical clustering with Ward linkage below. You can find a more extensive example [here][sklearn-example]. Note that we are using the `data` object, which is a `pandas.DataFrame`, from the [previous section](#scrollTo=pandas_dataframes_and_read_csv).\n", + "\n", + "[sklearn]: https://scikit-learn.org/stable/index.html\n", + "[sklearn.cluster]: https://scikit-learn.org/stable/modules/clustering.html\n", + "[sklearn-example]: https://scikit-learn.org/stable/auto_examples/cluster/plot_digits_linkage.html#sphx-glr-auto-examples-cluster-plot-digits-linkage-py" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "upo6gPd46sVt", + "outputId": "6938e3d2-8104-4f8c-fca6-f10b70d36dbb" + }, + "source": [ + "#requires sklearn\n", + "\n", + "from sklearn.cluster import AgglomerativeClustering\n", + "\n", + "# We start by creating the object that will *eventually* contain\n", + "# the clustering.\n", + "clustering = AgglomerativeClustering()\n", + "# Next, we feed in the data through the `fit` method. We will\n", + "# cluster the neighborhoods by geographical location.\n", + "clustering.fit(data.loc[:, ['latitude', 'longitude']])\n", + "# The clustering is now established. 
We illustrate below by\n", + "# printing the cluster assignment of the first 20 rows in the\n", + "# dataset.\n", + "print(clustering.labels_[:20])\n", + "# In a more serious application, we would probably inspect the\n", + "# cluster dendrogram or plot the data with a coloring to indicate\n", + "# the cluster assignment of each data point. The scikit-learn\n", + "# documentation explains how to do such things." + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "[1 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 1 1 0 1]\n" + ] + } + ] + } + ] +} \ No newline at end of file
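The sorting section mentions `operator.methodcaller` without showing it in use. Below is a minimal sketch; the sample dates are illustrative and mirror the `attrgetter` example. `methodcaller('weekday')` builds a function that calls `.weekday()` on each item, so `sorted` orders the dates by day of the week, starting with Monday.

```python
from datetime import date
from operator import methodcaller

# Illustrative sample data, similar to the attrgetter example.
list_of_dates = [
    date(year=2021, month=11, day=16),
    date(year=2022, month=3, day=17),
    date(year=2020, month=5, day=18),
]

# methodcaller('weekday') returns a function that, given a date d,
# calls d.weekday() (0 = Monday, 6 = Sunday). sorted uses that number
# as the comparison key.
by_weekday = sorted(list_of_dates, key=methodcaller('weekday'))
print(by_weekday)
```

`methodcaller` also passes any extra arguments on to the method, so `methodcaller('count', 'a')` would, for instance, sort strings by how often they contain the letter `'a'`.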