Propagate individual sections with titles and prerequisites as needed #19

Closed
wants to merge 29 commits into from
29 commits
382b9f9
Obtain terms from new endpoint (#4)
nathangong Mar 15, 2023
97922d5
Fetch Data of a Given Term with Banner 9 API (#9)
samarth52 Mar 15, 2023
06c4a3b
Migrate Fetch Course Description & Prerequisites (#10)
nathangong Mar 15, 2023
7b39b89
Temporarily change NUM_TERMS for first load
Mar 15, 2023
c2f6d7c
Scrape summer and fall 22
Mar 15, 2023
971115d
Scrape summer and fall 2021
Mar 15, 2023
babf2b3
Scrape fall 20 and spring 21
Mar 15, 2023
6aef1d1
Revert back to default
Mar 15, 2023
03abbc7
Fix error where building was "null null" (#11)
nathangong Mar 16, 2023
7443eaa
Finals Time Migration to Crawler v2 (#12)
samarth52 Mar 16, 2023
b546bde
Create manual workflow to scrape only a specified term
Mar 17, 2023
7a3de13
Update Final Parsing Logic (#14)
nhatnghiho Mar 24, 2023
a8a3027
Fixed location bug (#15)
yatharth-b Mar 28, 2023
ccb9261
Group crawling workflows into the same concurrency group to prevent r…
Apr 22, 2023
0686405
Edit specified crawling to allow multiple terms
Apr 22, 2023
b26b21d
Fixed Error in Fetching Courses From Banner (#17)
samarth52 Apr 30, 2023
ca136ed
Update cookie-generating url
Jun 5, 2023
ccd1b7f
Change workflow's variable name to avoid overlapping with system's va…
Aug 3, 2023
b2db8f7
Update endpoint that fetches list of available terms
nhatnghiho Aug 3, 2023
f0f5f1f
Update README links
nhatnghiho Aug 15, 2023
cff88e9
Add Fall 2023 Final Exam matrix (#20)
twixupmysleeve Aug 22, 2023
8724dd9
Fixed finals matrix parsing bug (#21)
samarth52 Aug 22, 2023
a88b9fc
Update mkl-fft version
nhatnghiho Aug 31, 2023
bc7d458
Fix SSL Issue (#24)
yatharth-b Nov 2, 2023
10d2153
Fix ssl issue (#25)
yatharth-b Nov 2, 2023
3c90588
Fix SSL Issue Locally (#26)
samarth52 Nov 2, 2023
a795897
Completed section-specific prerequisite tracking - debugging required
aeluro1 Feb 4, 2024
991d3d5
Completed backend prerequisite fetching + attaching
aeluro1 Feb 7, 2024
539f5a2
Fixed minor typos
aeluro1 Feb 7, 2024
1 change: 1 addition & 0 deletions .eslintignore
@@ -0,0 +1 @@
src/steps/prereqs/grammar/**/*
3 changes: 3 additions & 0 deletions .github/workflows/crawling.yml
@@ -5,6 +5,8 @@ on:
      - main
  schedule:
    - cron: "*/30 * * * *"
concurrency:
  group: crawling

jobs:
  crawling:
@@ -42,6 +44,7 @@ jobs:
          ALWAYS_SCRAPE_CURRENT_TERM: 1
          DETAILS_CONCURRENCY: 256
          DATA_FOLDER: ./data
          NODE_EXTRA_CA_CERTS: ${{ github.workspace }}/intermediate.pem

      - name: Revision
        run: python ./src/Revise.py
64 changes: 64 additions & 0 deletions .github/workflows/specified_crawling.yml
@@ -0,0 +1,64 @@
name: Specified Crawling
on:
  workflow_dispatch:
    inputs:
      term:
        description: 'Enter terms to scrape, separated by commas'
        type: string
        required: true
concurrency:
  group: crawling

jobs:
  crawling:
    concurrency: ci-${{ github.ref }}
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v2
        with:
          persist-credentials: false

      - name: Checkout data
        uses: actions/checkout@v2
        with:
          persist-credentials: false
          ref: gh-pages
          path: ./data

      - name: Install
        run: yarn install --frozen-lockfile

      - name: Pip
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
          cache: 'pip' # caching pip dependencies

      - name: Pip Install
        run: pip install -r requirements.txt

      - name: Crawling
        run: yarn start
        env:
          LOG_FORMAT: json
          NUM_TERMS: 1
          SPECIFIED_TERM: ${{ inputs.term }}
          ALWAYS_SCRAPE_CURRENT_TERM: 0
          DETAILS_CONCURRENCY: 256
          DATA_FOLDER: ./data
          NODE_EXTRA_CA_CERTS: ${{ github.workspace }}/intermediate.pem

      - name: Revision
        run: python ./src/Revise.py

      - name: Upload
        uses: JamesIves/github-pages-deploy-action@releases/v4
        with:
          token: ${{ secrets.CRAWLER_DEPLOY_PERSONAL_ACCESS_TOKEN }}
          branch: gh-pages
          folder: ./data
          clean: true
          single-commit: true
          git-config-name: gt-scheduler-bot
          git-config-email: [email protected]
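
The `term` input of this workflow accepts several terms separated by commas and hands them to the crawler through `SPECIFIED_TERM`. The crawler's own parsing is not part of this diff, so the following is only a minimal Python sketch of how such a value could be split and validated. The helper name `parse_specified_terms` and the six-digit term check are assumptions made for illustration.

```python
import os
import re

# Hypothetical helper, not taken from the crawler source.
# Assumption: terms are six-digit codes such as 202302.
TERM_PATTERN = re.compile(r"^\d{6}$")


def parse_specified_terms(raw: str) -> list[str]:
    """Split a comma-separated term list, dropping blanks and duplicates."""
    terms: list[str] = []
    for piece in raw.split(","):
        term = piece.strip()
        if not term:
            continue  # tolerate trailing commas and stray whitespace
        if not TERM_PATTERN.match(term):
            raise ValueError(f"unexpected term format: {term!r}")
        if term not in terms:
            terms.append(term)
    return terms


if __name__ == "__main__":
    # SPECIFIED_TERM is the variable the workflow fills from inputs.term.
    print(parse_specified_terms(os.environ.get("SPECIFIED_TERM", "")))
```

Rejecting malformed entries early would keep a typo in the manual dispatch form from triggering a crawl of a nonexistent term; whether the real crawler does this is not shown here.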
2 changes: 2 additions & 0 deletions .gitignore
@@ -4,3 +4,5 @@ data
.DS_Store
.antlr
*.log
.vscode
*/__pycache__
4 changes: 2 additions & 2 deletions README.md
@@ -2,7 +2,7 @@

> A periodic web crawler to feed course data into [GT Scheduler](https://bitsofgood.org/scheduler).

-Sample: [202008.json](https://gt-scheduler.github.io/crawler/202008.json)
+Sample: [202302.json](https://gt-scheduler.github.io/crawler-v2/202302.json)

To report a bug or request a new feature, please [create a new Issue in the GT Scheduler website repository](https://github.com/gt-scheduler/website/issues/new/choose).

@@ -93,7 +93,7 @@ The Registrar publishes a PDF with the Finals schedule at the start of each seme
The page with the PDF for the Fall 2022 semester can be found [here](https://registrar.gatech.edu/info/final-exam-matrix-fall-2022)

The `matrix.json` file contains a mapping from term to the pdf file.
-<br>The key is one of the terms identified by the scraper [here](https://gt-scheduler.github.io/crawler/index.json).
+<br>The key is one of the terms identified by the scraper [here](https://gt-scheduler.github.io/crawler-v2/index.json).
<br>The value is the direct address for the PDF file such as [this](https://registrar.gatech.edu/files/202208%20Final%20Exam%20Matrix.pdf)

This mapping needs to be updated each semester when a new schedule is posted
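
The term-to-PDF mapping described in this README hunk can be pictured with a small Python sketch. The sample entry pairs the term `202208` with the example PDF address quoted above; that pairing, the `SAMPLE_MATRIX` name and the `finals_pdf_for` helper are assumptions for illustration and not the real contents of `matrix.json` or the crawler's lookup code.

```python
import json

# Illustrative content only: one assumed entry built from the example
# URL quoted in the README, not a copy of the real matrix.json.
SAMPLE_MATRIX = json.loads("""
{
  "202208": "https://registrar.gatech.edu/files/202208%20Final%20Exam%20Matrix.pdf"
}
""")


def finals_pdf_for(term, matrix):
    """Return the finals-matrix PDF URL for a term, or None if no entry exists."""
    return matrix.get(term)


if __name__ == "__main__":
    print(finals_pdf_for("202208", SAMPLE_MATRIX))
    # A term without an entry yields None, which is the situation the
    # README warns about when the mapping has not been updated yet.
    print(finals_pdf_for("202302", SAMPLE_MATRIX))
```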
133 changes: 133 additions & 0 deletions intermediate.pem
@@ -0,0 +1,133 @@
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
83:5b:76:15:20:6d:2d:6e:09:7e:0b:6e:40:9f:ef:c0
Signature Algorithm: sha384WithRSAEncryption
Issuer: C = US, ST = New Jersey, L = Jersey City, O = The USERTRUST Network, CN = USERTrust RSA Certification Authority
Validity
Not Before: Nov 16 00:00:00 2022 GMT
Not After : Nov 15 23:59:59 2032 GMT
Subject: C = US, O = Internet2, CN = InCommon RSA Server CA 2
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
RSA Public-Key: (3072 bit)
Modulus:
00:89:f0:5c:c4:38:ba:d0:34:57:af:97:55:a0:f4:
22:43:fc:3e:18:11:3a:db:6d:7a:52:21:06:31:d6:
d4:b7:b7:92:88:86:85:8f:f8:99:ff:18:85:a2:9d:
2b:5a:e1:f8:04:21:49:de:44:af:40:5f:9a:22:11:
2c:3a:7b:97:47:a9:95:89:2a:54:c7:9d:c7:33:90:
29:23:31:48:55:b7:78:1a:a6:3a:b6:0c:1a:3f:3b:
bf:5d:12:3f:e0:39:b3:fa:1a:0b:5b:f8:bf:cc:3d:
7d:89:7b:d2:f7:9a:9f:35:4f:2a:3f:bf:f7:fd:44:
9f:db:f5:4d:49:43:66:b8:c2:a5:69:18:30:92:8b:
ae:7b:4b:ac:89:d6:0a:ed:5f:16:df:37:be:ad:31:
6f:59:1d:89:b5:62:8d:4c:89:dc:37:25:83:dc:68:
55:cb:fe:c6:d3:d3:f0:4c:0b:bb:87:4a:aa:47:24:
e4:11:32:df:fb:3e:c5:5a:d7:3c:73:5d:9f:f9:27:
ef:98:a1:ca:15:5a:8a:a4:d3:ed:80:c9:2b:c2:ac:
1a:3a:03:8e:0f:84:34:d0:08:a1:55:3f:94:cc:9e:
8c:9a:13:4f:1a:0f:bf:5d:fd:01:6a:f9:97:28:21:
83:4e:fe:6e:cd:07:8e:74:3d:f9:a3:f6:70:d7:a5:
78:0b:82:78:b6:88:f5:58:b6:3b:86:45:61:af:32:
86:f9:45:89:89:29:fc:1e:fd:dd:51:38:f8:76:49:
de:24:13:50:ad:47:dc:21:f4:c7:57:78:02:b4:ac:
17:9f:57:97:9a:bc:61:1f:eb:56:bb:d4:55:c2:c0:
de:81:11:b4:b3:6f:0e:31:d5:5e:3b:09:63:66:f6:
2b:52:34:68:9a:eb:4d:3b:91:b3:ca:7b:de:57:12:
55:0a:7d:c2:6e:7e:da:73:82:fe:e6:fc:0f:36:0b:
34:e0:37:4e:00:6c:cd:61:d1:b9:b7:aa:f2:c9:83:
e8:b1:22:c7:d8:1f:2a:0c:dc:f1
Exponent: 65537 (0x10001)
X509v3 extensions:
X509v3 Authority Key Identifier:
keyid:53:79:BF:5A:AA:2B:4A:CF:54:80:E1:D8:9B:C0:9D:F2:B2:03:66:CB

X509v3 Subject Key Identifier:
EF:4C:00:92:A6:FB:76:2E:5E:95:E2:C9:5F:87:1B:19:D5:4D:E2:D9
X509v3 Key Usage: critical
Digital Signature, Certificate Sign, CRL Sign
X509v3 Basic Constraints: critical
CA:TRUE, pathlen:0
X509v3 Extended Key Usage:
TLS Web Server Authentication, TLS Web Client Authentication
X509v3 Certificate Policies:
Policy: 1.3.6.1.4.1.6449.1.2.2.103
Policy: 2.23.140.1.2.2

X509v3 CRL Distribution Points:

Full Name:
URI:http://crl.usertrust.com/USERTrustRSACertificationAuthority.crl

Authority Information Access:
CA Issuers - URI:http://crt.usertrust.com/USERTrustRSAAAACA.crt
OCSP - URI:http://ocsp.usertrust.com

Signature Algorithm: sha384WithRSAEncryption
26:80:0d:34:e4:1e:ae:22:be:af:3e:a6:e2:84:f9:c6:b7:25:
b1:f7:db:2f:a8:75:c2:6a:82:ac:c3:b6:ce:5b:82:c6:a9:06:
cc:11:63:2a:63:99:72:de:97:5d:50:d9:4e:b0:af:24:a5:76:
52:23:05:10:d9:f0:08:7c:34:eb:3c:e4:0e:8c:28:94:0b:69:
4f:6a:1f:34:72:1b:ac:36:51:04:f3:47:0c:76:b1:e6:37:d0:
c9:2c:dd:97:48:7b:da:e3:b3:9a:c4:62:58:88:3a:1f:43:c3:
2f:30:51:32:71:5f:39:98:7f:f0:35:1a:4a:78:24:9a:74:c4:
88:42:55:1d:60:09:23:97:e4:95:ba:d7:ce:64:c2:27:76:e3:
66:ec:2e:6d:2f:09:00:40:03:fa:d0:83:1b:cb:a4:8b:59:84:
2f:54:4b:fa:f7:de:58:2d:5f:f7:18:17:30:78:8c:63:9d:f9:
7b:36:b0:40:14:94:6c:ae:f2:0a:cb:a2:16:21:92:05:8d:ea:
1a:b2:a0:57:4e:a6:6a:e5:f3:2b:bb:09:21:95:ee:09:95:41:
ff:6f:8b:05:41:0c:82:a6:fb:6c:cb:0e:8f:e7:85:19:24:f3:
10:34:05:bd:41:a8:fc:f2:6c:f1:12:49:58:78:cb:9a:d9:e5:
bc:c1:e0:ba:36:60:dd:3a:d4:75:7d:f8:70:e7:9c:80:c1:7d:
f3:48:89:c0:02:76:fe:09:1b:21:9f:a5:b4:ba:c6:c8:b7:50:
23:75:e7:2a:5a:1b:8d:cf:26:a4:34:52:70:50:0e:e4:7a:d2:
2a:35:02:97:92:36:46:21:91:a1:d0:f5:39:3f:d0:2e:00:f8:
43:37:31:6f:ca:16:e5:39:dd:e1:cb:56:55:fd:b2:cd:62:1b:
60:09:7d:59:2d:69:9d:a5:fd:26:d8:ee:9c:bc:25:46:0c:90:
bf:e3:a9:90:51:8c:d9:03:ea:ca:ec:9a:92:7a:ba:d5:0c:98:
09:6d:ee:6d:7e:71:35:fc:eb:f5:44:05:ce:43:a7:d5:5f:b8:
3e:a1:35:b3:4a:0d:28:3b:63:1c:84:55:a0:6a:04:4b:4d:e5:
da:69:8f:8c:52:88:2a:ec:e8:bc:4b:1e:73:68:de:b1:bc:54:
94:5f:35:54:1d:80:56:cc:6f:b7:4e:20:1a:24:92:5c:df:99:
4e:bd:95:2d:24:83:2c:f6:99:93:09:99:6d:86:fe:18:44:75:
d7:49:58:78:77:15:c2:e2:d8:c6:9e:62:23:95:44:5a:cb:1e:
d2:6f:5c:47:5f:d9:a1:1a:67:42:ce:6f:65:e8:df:33:ba:04:
9b:e3:5e:57:6f:db:0a:0d
-----BEGIN CERTIFICATE-----
MIIGSjCCBDKgAwIBAgIRAINbdhUgbS1uCX4LbkCf78AwDQYJKoZIhvcNAQEMBQAw
gYgxCzAJBgNVBAYTAlVTMRMwEQYDVQQIEwpOZXcgSmVyc2V5MRQwEgYDVQQHEwtK
ZXJzZXkgQ2l0eTEeMBwGA1UEChMVVGhlIFVTRVJUUlVTVCBOZXR3b3JrMS4wLAYD
VQQDEyVVU0VSVHJ1c3QgUlNBIENlcnRpZmljYXRpb24gQXV0aG9yaXR5MB4XDTIy
MTExNjAwMDAwMFoXDTMyMTExNTIzNTk1OVowRDELMAkGA1UEBhMCVVMxEjAQBgNV
BAoTCUludGVybmV0MjEhMB8GA1UEAxMYSW5Db21tb24gUlNBIFNlcnZlciBDQSAy
MIIBojANBgkqhkiG9w0BAQEFAAOCAY8AMIIBigKCAYEAifBcxDi60DRXr5dVoPQi
Q/w+GBE62216UiEGMdbUt7eSiIaFj/iZ/xiFop0rWuH4BCFJ3kSvQF+aIhEsOnuX
R6mViSpUx53HM5ApIzFIVbd4GqY6tgwaPzu/XRI/4Dmz+hoLW/i/zD19iXvS95qf
NU8qP7/3/USf2/VNSUNmuMKlaRgwkouue0usidYK7V8W3ze+rTFvWR2JtWKNTInc
NyWD3GhVy/7G09PwTAu7h0qqRyTkETLf+z7FWtc8c12f+SfvmKHKFVqKpNPtgMkr
wqwaOgOOD4Q00AihVT+UzJ6MmhNPGg+/Xf0BavmXKCGDTv5uzQeOdD35o/Zw16V4
C4J4toj1WLY7hkVhrzKG+UWJiSn8Hv3dUTj4dkneJBNQrUfcIfTHV3gCtKwXn1eX
mrxhH+tWu9RVwsDegRG0s28OMdVeOwljZvYrUjRomutNO5GzynveVxJVCn3Cbn7a
c4L+5vwPNgs04DdOAGzNYdG5t6ryyYPosSLH2B8qDNzxAgMBAAGjggFwMIIBbDAf
BgNVHSMEGDAWgBRTeb9aqitKz1SA4dibwJ3ysgNmyzAdBgNVHQ4EFgQU70wAkqb7
di5eleLJX4cbGdVN4tkwDgYDVR0PAQH/BAQDAgGGMBIGA1UdEwEB/wQIMAYBAf8C
AQAwHQYDVR0lBBYwFAYIKwYBBQUHAwEGCCsGAQUFBwMCMCIGA1UdIAQbMBkwDQYL
KwYBBAGyMQECAmcwCAYGZ4EMAQICMFAGA1UdHwRJMEcwRaBDoEGGP2h0dHA6Ly9j
cmwudXNlcnRydXN0LmNvbS9VU0VSVHJ1c3RSU0FDZXJ0aWZpY2F0aW9uQXV0aG9y
aXR5LmNybDBxBggrBgEFBQcBAQRlMGMwOgYIKwYBBQUHMAKGLmh0dHA6Ly9jcnQu
dXNlcnRydXN0LmNvbS9VU0VSVHJ1c3RSU0FBQUFDQS5jcnQwJQYIKwYBBQUHMAGG
GWh0dHA6Ly9vY3NwLnVzZXJ0cnVzdC5jb20wDQYJKoZIhvcNAQEMBQADggIBACaA
DTTkHq4ivq8+puKE+ca3JbH32y+odcJqgqzDts5bgsapBswRYypjmXLel11Q2U6w
rySldlIjBRDZ8Ah8NOs85A6MKJQLaU9qHzRyG6w2UQTzRwx2seY30Mks3ZdIe9rj
s5rEYliIOh9Dwy8wUTJxXzmYf/A1Gkp4JJp0xIhCVR1gCSOX5JW6185kwid242bs
Lm0vCQBAA/rQgxvLpItZhC9US/r33lgtX/cYFzB4jGOd+Xs2sEAUlGyu8grLohYh
kgWN6hqyoFdOpmrl8yu7CSGV7gmVQf9viwVBDIKm+2zLDo/nhRkk8xA0Bb1BqPzy
bPESSVh4y5rZ5bzB4Lo2YN061HV9+HDnnIDBffNIicACdv4JGyGfpbS6xsi3UCN1
5ypaG43PJqQ0UnBQDuR60io1ApeSNkYhkaHQ9Tk/0C4A+EM3MW/KFuU53eHLVlX9
ss1iG2AJfVktaZ2l/SbY7py8JUYMkL/jqZBRjNkD6srsmpJ6utUMmAlt7m1+cTX8
6/VEBc5Dp9VfuD6hNbNKDSg7YxyEVaBqBEtN5dppj4xSiCrs6LxLHnNo3rG8VJRf
NVQdgFbMb7dOIBokklzfmU69lS0kgyz2mZMJmW2G/hhEdddJWHh3FcLi2MaeYiOV
RFrLHtJvXEdf2aEaZ0LOb2Xo3zO6BJvjXldv2woN
-----END CERTIFICATE-----
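
`intermediate.pem` places a human-readable certificate dump ahead of the PEM-encoded block. For anyone who wants to inspect such a file from Python, here is a minimal sketch that pulls out only the PEM blocks and decodes them to DER bytes. The function name `extract_der_certificates` is hypothetical, and nothing here reflects how the crawler itself consumes the file (the workflows only point `NODE_EXTRA_CA_CERTS` at it).

```python
import base64
import re

# Matches each PEM certificate block and captures its base64 payload.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----\s*(.*?)\s*-----END CERTIFICATE-----",
    re.DOTALL,
)


def extract_der_certificates(text: str) -> list[bytes]:
    """Return the DER bytes of every PEM certificate block found in text."""
    certs = []
    for match in PEM_BLOCK.finditer(text):
        # Remove line breaks and indentation inside the base64 payload.
        payload = "".join(match.group(1).split())
        certs.append(base64.b64decode(payload, validate=True))
    return certs


if __name__ == "__main__":
    # Assumes the file sits in the current directory, as in the repo root.
    with open("intermediate.pem", encoding="ascii") as handle:
        print(len(extract_der_certificates(handle.read())), "certificate(s) found")
```

For local Python requests against the same hosts, the standard library's `ssl.SSLContext.load_verify_locations(cafile=...)` is one way to trust an extra CA file; whether the project's Python code needs that is not shown in this diff.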