Upgrade to Python 3.11 #70

Open · wants to merge 75 commits into base: staging

Commits (75)
1e0d83b
fixed initial python versioning + dependency issues
zacrh Feb 18, 2025
4319a12
in a decently working state
zacrh Feb 22, 2025
94542f1
fix CONTRIBUTING.md admin panel instructions to reflect Django 5+
zacrh Feb 22, 2025
303495c
Fix static files (on dev at least)
zacrh Feb 22, 2025
439678b
fix hijacking
zacrh Feb 22, 2025
011a701
remove comment
zacrh Feb 24, 2025
6ee293f
more fixes
zacrh Feb 24, 2025
d849f9e
django 5 migrations + added testing domain to allowed_hosts
zacrh Feb 24, 2025
8fb1414
update test domain
zacrh Feb 24, 2025
78d26bf
try to fix static
zacrh Feb 24, 2025
8aa78e3
fix hijack stylesheet
zacrh Feb 24, 2025
524a8b7
revert pipeline setting
zacrh Feb 24, 2025
af3be9e
pls work now
zacrh Feb 24, 2025
51574f3
add yuglify to pipeline config
zacrh Feb 24, 2025
fdc9fe0
try again
zacrh Feb 24, 2025
0cc9857
this should not work
zacrh Feb 24, 2025
abadf89
deploy w nodejs buildpack
zacrh Feb 24, 2025
c30a459
try this
zacrh Feb 24, 2025
693b68c
try again
zacrh Feb 24, 2025
f9b3fb1
pls
zacrh Feb 24, 2025
61aa723
(slightly) giving up on getting pipeline to work — assuming the issue…
zacrh Feb 25, 2025
9c3f55e
fix old pyreact to work w/ python 3
zacrh Feb 25, 2025
436304f
try this
zacrh Feb 25, 2025
256376b
try django's updated storage instructions for v5.1
zacrh Feb 25, 2025
c7d9b05
update nodejs version being installed
zacrh Feb 25, 2025
6044d16
move nodejs installation to pre_compile script
zacrh Feb 25, 2025
89fb626
update jquery highlight to work with node 18 and yuglify in strict mode
zacrh Feb 25, 2025
8f8bcb6
try updating jquery version to 3.7.1
zacrh Feb 25, 2025
d143d80
check if it's a jquery issue
zacrh Feb 25, 2025
e366711
Revert "check if it's a jquery issue"
zacrh Feb 25, 2025
862560e
delete sourcemappingURL line from bootstrap.min.css
zacrh Feb 25, 2025
e9fc85e
clean up setting.py
zacrh Feb 25, 2025
67b6065
try to use thinner pyreact fork + freeze version at commit hash
zacrh Feb 25, 2025
248f288
revert back to other fork of pyreact since it includes a newer JSXTra…
zacrh Feb 25, 2025
220fc20
for some reason that broke everything
zacrh Feb 25, 2025
aa9a7e4
upgrade to bootstrap v5.3.3
zacrh Feb 25, 2025
898eae0
remove sourcemappings for bootstrap files
zacrh Feb 25, 2025
7d5315f
Revert "remove sourcemappings for bootstrap files"
zacrh Feb 25, 2025
65bd848
Revert "upgrade to bootstrap v5.3.3"
zacrh Feb 25, 2025
15f872a
Reapply "upgrade to bootstrap v5.3.3"
zacrh Feb 25, 2025
d693b38
Reapply "remove sourcemappings for bootstrap files"
zacrh Feb 25, 2025
14e9d8f
Moved UI to Bootstrap 5
zacrh Feb 25, 2025
7847f68
fix dropdowns and collapsing not working
zacrh Feb 26, 2025
3161008
Revert "fix dropdowns and collapsing not working"
zacrh Feb 26, 2025
eb7a667
Revert "Moved UI to Bootstrap 5"
zacrh Feb 26, 2025
82c7340
Reapply "Moved UI to Bootstrap 5"
zacrh Feb 26, 2025
6b4470c
try this?
zacrh Feb 26, 2025
dbd1067
Reapply "fix dropdowns and collapsing not working"
zacrh Feb 26, 2025
1dd7f98
try to see if we can cache requirements like this
zacrh Feb 26, 2025
ef57d9e
bring back post_compile step? idk
zacrh Feb 26, 2025
0938e1b
last try
zacrh Feb 26, 2025
464b1cb
fix settings / post_compile
zacrh Feb 26, 2025
5d8b6cc
mayve?
zacrh Feb 26, 2025
de629aa
try this again
zacrh Feb 26, 2025
0aaa49e
go back to normal
zacrh Feb 26, 2025
cff5d95
make bin/python dir at pre_compile step bc compile hasn't run yet and…
zacrh Feb 26, 2025
cadb8ff
try to fix npm issue
zacrh Feb 26, 2025
8afb585
try this?
zacrh Feb 26, 2025
b799e11
try adding binary
zacrh Feb 26, 2025
ac924e8
this should work?
zacrh Feb 26, 2025
49eb964
maybe
zacrh Feb 26, 2025
421bb25
clean up heroku build files
zacrh Feb 26, 2025
58a5f9b
now test build step without explicitly setting binary
zacrh Feb 26, 2025
a6a508f
add explicit path back
zacrh Feb 26, 2025
0043718
clean up comments in timetable crawler
zacrh Feb 26, 2025
78b69ec
add pub sub back in preparation for pr on main
zacrh Feb 26, 2025
4e45924
update contributing guidelines / make installation instructions clearer
zacrh Feb 26, 2025
6ff604a
remove (occasionally) inaccurate comments about timetable structure i…
zacrh Feb 26, 2025
4629591
try explicitly setting yuglify location one more time
zacrh Feb 26, 2025
37681db
Revert "add pub sub back in preparation for pr on main"
zacrh Feb 26, 2025
94d687b
Revert "try explicitly setting yuglify location one more time"
zacrh Feb 26, 2025
f5a728b
Reapply "add pub sub back in preparation for pr on main"
zacrh Feb 26, 2025
cdc51a6
fix root_files 404ing
zacrh Feb 28, 2025
8cd3a2b
comment out pub sub for new change on prod (temporary)
zacrh Feb 28, 2025
dbabfcb
Revert "comment out pub sub for new change on prod (temporary)"
zacrh Feb 28, 2025
12 changes: 6 additions & 6 deletions CONTRIBUTING.md
@@ -9,7 +9,7 @@ Feel free to email <a href="mailto:[email protected]">[email protected]<
Local Setup (macOS or OS X)
-----------------
#### Installation
* Use Python 2.7.16
* Use Python 3.11.4 (any 3.11 version should work).
* Install [Homebrew](http://brew.sh/), [node.js](https://nodejs.org/en/), and Postgres (we recommend [Postgres.app](http://postgresapp.com/) with their [CLI Tools](http://postgresapp.com/documentation/cli-tools.html)).
* Install the [Heroku CLI](https://cli.heroku.com). You don't need a Heroku account, they just offer good tools for configuration.
* Install Redis using `brew install redis`.
@@ -18,23 +18,23 @@ Local Setup (macOS or OS X)
* Run `easy_install pip` if you do not have pip.
* Run `pip install virtualenv` if you do not have virtualenv.
* Run `virtualenv venv` to create a Python virtual environment.
* Run `createdb layuplist`.
* Run `createdb layuplist` (`createdb` is installed with Postgres.app's [CLI Tools](https://postgresapp.com/documentation/cli-tools.html) from step 2).
* [Clone](https://help.github.com/articles/cloning-a-repository/) the main repository. `git clone https://github.com/layuplist/layup-list.git`.
* Create a `.env` file in the root directory of the repository (fill out the items in brackets):

```bash
DATABASE_URL=postgres://[YOUR_USERNAME]@localhost:5432/layuplist
REDIS_URL=redis://[YOUR_USERNAME]@localhost:6379
SECRET_KEY=[SOME_LONG_RANDOM_STRING]
SECRET_KEY=[SOME_LONG_RANDOM_STRING] # generate one at https://generate-secret.vercel.app/64
DEBUG=True
CURRENT_TERM=20X
CURRENT_TERM=25S
OFFERINGS_THRESHOLD_FOR_TERM_UPDATE=100
```
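If Django is already installed in your environment, one quick way to generate a value for `SECRET_KEY` locally (a sketch; any sufficiently long random string works):

```python
# Sketch: print a random string suitable for SECRET_KEY.
# Uses the same helper Django's startproject uses internally.
from django.core.management.utils import get_random_secret_key

print(get_random_secret_key())
```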

* Run `source ./scripts/dev/environment.sh` to set up the heroku development environment.
* Run `source ./scripts/dev/virtualize.sh` to activate the virtual environment.
* Install Python dependencies using `pip install -r requirements.txt`.
* Initialize the database with `python manage.py migrate`.
* Initialize the database with `python manage.py migrate` (must have Postgres running, see below).

Developing
----------
@@ -84,7 +84,7 @@ u.save()
If you'd like to have access to the admin panel at `/admin`, also run these before `u.save()`:
```python
u.is_staff = True
u.is_admin = True
u.is_superuser = True
```
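For reference, the whole shell session under Django 5 might look like the sketch below; the username, email, and password are placeholders:

```python
# python manage.py shell -- sketch of creating a staff superuser.
from django.contrib.auth.models import User

u = User.objects.create_user("admin", "admin@example.com", "change-me")
u.is_staff = True       # required to log in at /admin
u.is_superuser = True   # grants all permissions (replaces the old is_admin)
u.save()
```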

Linting code and Running Tests
9 changes: 8 additions & 1 deletion apps/analytics/templates/eligible_for_recommendations.html
@@ -21,7 +21,14 @@ <h1> Eligible For Recommendations ({{ users_and_votes | length }})</h1>
<tr>
<td>{{ vote_count }}</td>
<td>{{ user }}</td>
<td><a href="/hijack/{{ user_id }}">Hijack {{ user }}</a></td>
<td>
<form action="{% url 'hijack:acquire' %}" method="POST">
{% csrf_token %}
<input type="hidden" name="user_pk" value="{{ user_id }}">
<button type="submit">Hijack {{ user }}</button>
<input type="hidden" name="next" value="{{ request.path }}">
</form>
</td>
</tr>
{% endfor %}
</tbody>
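Reviewer note: django-hijack 3.x replaced GET links like `/hijack/<id>` with a POST-only `hijack:acquire` view, which is why the anchor above became a form with a CSRF token and a `user_pk` field. The URL side, assuming the standard hijack 3.x setup, is a one-liner:

```python
# urls.py -- sketch; hijack 3.x routes acquire/release through hijack.urls.
from django.urls import include, path

urlpatterns = [
    path("hijack/", include("hijack.urls")),
]
```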
2 changes: 1 addition & 1 deletion apps/recommendations/admin.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
from django.contrib import admin
from models import Recommendation
from apps.recommendations.models import Recommendation

admin.site.register(Recommendation)
4 changes: 2 additions & 2 deletions apps/recommendations/migrations/0001_initial.py
@@ -21,8 +21,8 @@ class Migration(migrations.Migration):
('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
('creator', models.CharField(choices=[('docsim', 'Document Similarity')], max_length=16)),
('weight', models.FloatField(null=True)),
('course', models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, related_name='recommendations', to='web.Course')),
('recommendation', models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, related_name='recommenders', to='web.Course')),
('course', models.ForeignKey(on_delete=models.CASCADE, related_name='recommendations', to='web.Course')),
('recommendation', models.ForeignKey(on_delete=models.CASCADE, related_name='recommenders', to='web.Course')),
],
),
]
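Reviewer note: `on_delete` has been a required `ForeignKey` argument since Django 2.0, in migrations as well as models. A trimmed illustration (hypothetical, not the full model):

```python
# Sketch: on_delete is mandatory; CASCADE removes recommendations
# together with the course they point at.
from django.db import models

class Recommendation(models.Model):  # trimmed for illustration
    course = models.ForeignKey(
        "web.Course",
        related_name="recommendations",
        on_delete=models.CASCADE,
    )
```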
7 changes: 3 additions & 4 deletions apps/recommendations/migrations/0002_auto_20160810_0506.py
@@ -4,8 +4,7 @@

import datetime
from django.db import migrations, models
from django.utils.timezone import utc

from datetime import timezone  # Django 5.0 removed the django.utils.timezone.utc alias; Python 3's built-in datetime.timezone.utc replaces it

class Migration(migrations.Migration):

@@ -17,13 +16,13 @@ class Migration(migrations.Migration):
migrations.AddField(
model_name='recommendation',
name='created_at',
field=models.DateTimeField(auto_now_add=True, default=datetime.datetime(2016, 8, 10, 5, 6, 8, 21139, tzinfo=utc)),
field=models.DateTimeField(auto_now_add=True, default=datetime.datetime(2016, 8, 10, 5, 6, 8, 21139, tzinfo=timezone.utc)),
preserve_default=False,
),
migrations.AddField(
model_name='recommendation',
name='updated_at',
field=models.DateTimeField(auto_now=True, default=datetime.datetime(2016, 8, 10, 5, 6, 14, 645022, tzinfo=utc)),
field=models.DateTimeField(auto_now=True, default=datetime.datetime(2016, 8, 10, 5, 6, 14, 645022, tzinfo=timezone.utc)),
preserve_default=False,
),
]
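The two spellings are interchangeable, since `django.utils.timezone.utc` was only an alias for the stdlib object before Django 5.0 removed it. A quick check:

```python
# Sketch: datetime.timezone.utc is the stdlib replacement for the
# django.utils.timezone.utc alias removed in Django 5.0.
import datetime
from datetime import timezone

aware = datetime.datetime(2016, 8, 10, 5, 6, 8, 21139, tzinfo=timezone.utc)
print(aware.isoformat())  # 2016-08-10T05:06:08.021139+00:00
```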
6 changes: 3 additions & 3 deletions apps/recommendations/models.py
@@ -71,16 +71,16 @@ class Recommendation(models.Model):
(DOCUMENT_SIMILARITY, "Document Similarity"),
)

course = models.ForeignKey("web.Course", related_name="recommendations")
course = models.ForeignKey("web.Course", related_name="recommendations", on_delete=models.CASCADE)
recommendation = models.ForeignKey(
"web.Course", related_name="recommenders")
"web.Course", related_name="recommenders", on_delete=models.CASCADE)

creator = models.CharField(max_length=16, choices=CREATORS)
weight = models.FloatField(null=True)

created_at = models.DateTimeField(auto_now_add=True)
updated_at = models.DateTimeField(auto_now=True)

def __unicode__(self):
def __str__(self):
return "{} {} -> {}".format(
self.weight, self.course.short_name(), self.recommendation)
30 changes: 15 additions & 15 deletions apps/recommendations/tasks.py
@@ -27,7 +27,7 @@
@task_utils.email_if_fails
def generate_course_description_similarity_recommendations():
t0 = time()
print "loading word jumbles into memory..."
print("loading word jumbles into memory...")
course_ids = []
reverse_course_ids = {}
course_descriptions = []
@@ -47,36 +47,36 @@ def generate_course_description_similarity_recommendations():
word_jumble.append(_clean_text_to_raw_words(review.comments))
course_descriptions.append(" ".join(word_jumble))
i += 1
print "finished in {}".format(time() - t0)
print("finished in {}".format(time() - t0))

t0 = time()
print "fitting to count vectorizer..."
print("fitting to count vectorizer...")
count_vect = CountVectorizer()
corpus = count_vect.fit_transform(course_descriptions)
print "shape is {}".format(corpus.shape)
print "finished in {}".format(time() - t0)
print("shape is {}".format(corpus.shape))
print("finished in {}".format(time() - t0))

# words -> indices
# print count_vect.vocabulary_

if PERFORM_TFIDF:
t0 = time()
print "tfidf transform..."
print("tfidf transform...")
tfidf_transformer = TfidfTransformer()
corpus = tfidf_transformer.fit_transform(corpus)
print "shape is {}".format(corpus.shape)
print "finished in {}".format(time() - t0)
print("shape is {}".format(corpus.shape))
print("finished in {}".format(time() - t0))

# TODO: try applying PCA, see if it improves performance

t0 = time()
print "compute cosine similarity "
print("compute cosine similarity ")
pairwise_similarity = corpus * corpus.T
print "shape is {}".format(pairwise_similarity.shape)
print "finished in {}".format(time() - t0)
print("shape is {}".format(pairwise_similarity.shape))
print("finished in {}".format(time() - t0))

t0 = time()
print "calculating and creating recommendations..."
print("calculating and creating recommendations...")
psarray = pairwise_similarity.toarray()

# zero out columns corresponding to thesis, research, independent, and grad
@@ -98,7 +98,7 @@ def generate_course_description_similarity_recommendations():
# zero out crosslistings and same titles, so only one rep for each
# crosslisting
covered_ids = set()
for i in xrange(psarray.shape[1]):
for i in range(psarray.shape[1]):
if i in covered_ids:
continue
course_id = course_ids[i]
@@ -116,7 +116,7 @@
covered_ids.add(i)

recommendations_to_create = []
for i in xrange(psarray.shape[0]):
for i in range(psarray.shape[0]):
current_class = Course.objects.get(id=course_ids[i])

# zero out the diagonal
@@ -155,7 +155,7 @@
creator=Recommendation.DOCUMENT_SIMILARITY).delete()
Recommendation.objects.bulk_create(recommendations_to_create)

print "finished in {}".format(time() - t0)
print("finished in {}".format(time() - t0))


def _clean_text_to_raw_words(text):
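For reviewers unfamiliar with this task: it vectorizes course descriptions, optionally applies TF-IDF, and uses `corpus * corpus.T` as pairwise cosine similarity (TF-IDF rows are L2-normalized by default, so the dot products are cosines). A self-contained sketch of the same idea on toy data:

```python
# Sketch of the similarity pipeline on hypothetical toy documents.
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

docs = [
    "intro to programming and algorithms",
    "advanced algorithms and data structures",
    "studio art drawing and painting",
]
counts = CountVectorizer().fit_transform(docs)    # docs x vocab counts
tfidf = TfidfTransformer().fit_transform(counts)  # rows L2-normalized
similarity = tfidf * tfidf.T                      # pairwise cosine similarity
print(similarity.toarray().round(2))              # docs 0 and 1 score highest
```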
10 changes: 6 additions & 4 deletions apps/spider/crawlers/medians.py
@@ -1,5 +1,7 @@
import sys
import urllib2
import urllib.request
import urllib.parse
import functools
from bs4 import BeautifulSoup

from apps.web.models import Course, CourseMedian
@@ -25,7 +27,7 @@ def crawl_median_page_urls():

def _retrieve_term_medians_urls_from_soup(soup):
return [
urllib2.urlparse.urljoin("http://www.dartmouth.edu", a["href"])
urllib.parse.urljoin("http://www.dartmouth.edu", a["href"])
for a in soup.find_all("a", href=True)
if _is_term_page_url(a["href"])
]
@@ -41,7 +43,7 @@ def crawl_term_medians_for_url(url):
table_rows = soup.find("table").find("tbody").find_all("tr")
medians = [
_convert_table_row_to_dict(table_row) for table_row in table_rows]
medians.sort(cmp=_median_dict_sorter)
medians.sort(key=functools.cmp_to_key(_median_dict_sorter))
return medians


@@ -103,7 +105,7 @@ def import_median(median_data):
subnumber=median_data["course"]["subnumber"],
)
except Course.DoesNotExist:
print "Could not find course for {}".format(median_data["course"])
print("Could not find course for {}".format(median_data["course"]))
return
median, _ = CourseMedian.objects.update_or_create(
course=course,
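Python 3 dropped `list.sort(cmp=...)`, so the old comparator has to be wrapped with `functools.cmp_to_key`, as done above. A standalone sketch with a hypothetical comparator:

```python
# Sketch: adapting an old cmp-style comparator for Python 3 sorting.
import functools

def compare_terms(a, b):
    # Old-style comparator: negative, zero, or positive, like cmp() was.
    return (a["term"] > b["term"]) - (a["term"] < b["term"])

medians = [{"term": "21S"}, {"term": "19F"}, {"term": "20X"}]
medians.sort(key=functools.cmp_to_key(compare_terms))
print([m["term"] for m in medians])  # ['19F', '20X', '21S']
```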
14 changes: 7 additions & 7 deletions apps/spider/crawlers/orc.py
@@ -1,6 +1,6 @@
from datetime import datetime
import re
from urllib2 import urlparse
from urllib.parse import urlparse, urljoin

from apps.web.models import Course
from apps.spider.utils import (
@@ -11,11 +11,11 @@


BASE_URL = "http://dartmouth.smartcatalogiq.com/"
ORC_BASE_URL = urlparse.urljoin(BASE_URL, "/en/current/orc/")
ORC_BASE_URL = urljoin(BASE_URL, "/en/current/orc/")
ORC_UNDERGRAD_SUFFIX = "Departments-Programs-Undergraduate"
ORC_GRADUATE_SUFFIX = "Departments-Programs-Graduate"
UNDERGRAD_URL = urlparse.urljoin(ORC_BASE_URL, ORC_UNDERGRAD_SUFFIX)
GRADUATE_URL = urlparse.urljoin(ORC_BASE_URL, ORC_GRADUATE_SUFFIX)
UNDERGRAD_URL = urljoin(ORC_BASE_URL, ORC_UNDERGRAD_SUFFIX)
GRADUATE_URL = urljoin(ORC_BASE_URL, ORC_GRADUATE_SUFFIX)
INSTRUCTOR_TERM_REGEX = re.compile("^(?P<name>\w*)\s?(\((?P<term>\w*)\))?")

SUPPLEMENT_URL = (
@@ -51,7 +51,7 @@ def crawl_program_urls():
def _get_department_urls_from_url(url):
soup = retrieve_soup(url)
linked_urls = [
urlparse.urljoin(BASE_URL, a["href"])
urljoin(BASE_URL, a["href"])
for a in soup.find_all("a", href=True)
]
return set(
@@ -73,7 +73,7 @@ def _is_department_url(candidate_url, base_url):
def _get_program_urls_from_department_url(url):
soup = retrieve_soup(url)
linked_urls = [
urlparse.urljoin(BASE_URL, a["href"])
urljoin(BASE_URL, a["href"])
for a in soup.find_all("a", href=True)
]
program_urls = set()
@@ -101,7 +101,7 @@ def _is_program_url(candidate_url, department_url):
def crawl_courses_from_program_page_url(url, program_code):
soup = retrieve_soup(url)
linked_urls = [
urlparse.urljoin(BASE_URL, a["href"])
urljoin(BASE_URL, a["href"])
for a in soup.find_all("a", href=True)
]
course_urls = sorted(
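The `urllib2.urlparse` module path is gone in Python 3; the same helpers live in `urllib.parse`. A quick equivalence check:

```python
# Sketch: Python 3 home of the URL helpers used in this crawler.
from urllib.parse import urljoin, urlparse

print(urljoin("http://dartmouth.smartcatalogiq.com/", "/en/current/orc/"))
# -> http://dartmouth.smartcatalogiq.com/en/current/orc/
print(urlparse("http://www.dartmouth.edu/a/b").netloc)  # -> www.dartmouth.edu
```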
20 changes: 10 additions & 10 deletions apps/spider/crawlers/timetable.py
@@ -23,11 +23,10 @@

DATA_TO_SEND = (
"distribradio=alldistribs&depts=no_value&periods=no_value&"
"distribs=no_value&distribs_i=no_value&distribs_wc=no_value&deliverymodes=no_value&pmode=public&"
"distribs=no_value&distribs_i=no_value&distribs_wc=no_value&distribs_lang=no_value&deliverymodes=no_value&pmode=public&" # added distribs_lang required parameter (orc returns invalid params if not passed)
"term=&levl=&fys=n&wrt=n&pe=n&review=n&crnl=no_value&classyear=2008&"
"searchtype=Subject+Area%28s%29&termradio=selectterms&terms=no_value&"
"searchtype=Subject+Area%28s%29&termradio=selectterms&terms=no_value&terms={term}&" # terms with proper value must be right after `&terms=no_value` or else timetable will be empty
"deliveryradio=selectdelivery&subjectradio=selectsubjects&hoursradio=allhours&sortorder=dept"
"&terms={term}"
)

COURSE_TITLE_REGEX = re.compile(
@@ -55,16 +54,16 @@ def crawl_timetable(term):
preprocess=lambda x: re.sub("</tr>", "", x),
)
num_columns = len(soup.find(class_="data-table").find_all("th"))
assert num_columns == 20
assert num_columns == 21 # more than 20 after dartmouth added the lang requirement column

tds = soup.find(class_="data-table").find_all("td")
assert len(tds) % num_columns == 0

td_generator = (td for td in tds)
for _ in xrange(len(tds) / num_columns):
tds = [next(td_generator) for _ in xrange(num_columns)]
for _ in range(len(tds) // num_columns): # switch to range instead of xrange and `//` instead of `/` for integer division in python 3
tds = [next(td_generator) for _ in range(num_columns)]

number, subnumber = parse_number_and_subnumber(tds[3].get_text())
number, subnumber = parse_number_and_subnumber(tds[3].get_text()) # -1 from og index
crosslisted_courses = _parse_crosslisted_courses(
tds[7].get_text(strip=True))

@@ -92,9 +91,10 @@
"instructor": _parse_instructors(tds[12].get_text(strip=True)),
"world_culture": tds[13].get_text(strip=True),
"distribs": _parse_distribs(tds[14].get_text(strip=True)),
"limit": int_or_none(tds[15].get_text(strip=True)),
# "enrollment": int_or_none(tds[16].get_text(strip=True)),
"status": tds[17].get_text(strip=True),
# "langreq": tds[15].get_text(strip=True)), # language requirement, new in the timetable, haven't added to models yet
"limit": int_or_none(tds[16].get_text(strip=True)), # +1 from og index
# "enrollment": int_or_none(tds[17].get_text(strip=True)),
"status": tds[18].get_text(strip=True), # +1 from og index
})
return course_data

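The loop above chunks the flat list of `<td>` cells into fixed-width rows; `range` plus true floor division (`//`) is the Python 3 idiom for this. A toy version:

```python
# Sketch: chunking a flat cell list into rows of num_columns (toy data).
cells = list(range(10))  # stand-in for the flat list of <td> elements
num_columns = 5
cell_iter = iter(cells)
for _ in range(len(cells) // num_columns):  # // is integer division in py3
    row = [next(cell_iter) for _ in range(num_columns)]
    print(row)  # [0, 1, 2, 3, 4], then [5, 6, 7, 8, 9]
```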
4 changes: 2 additions & 2 deletions apps/spider/migrations/0001_initial.py
@@ -20,8 +20,8 @@ class Migration(migrations.Migration):
('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
('resource', models.CharField(db_index=True, max_length=128, unique=True)),
('data_type', models.CharField(choices=[(b'medians', b'Medians')], max_length=32)),
('pending_data', django.contrib.postgres.fields.jsonb.JSONField()),
('current_data', django.contrib.postgres.fields.jsonb.JSONField(null=True)),
('pending_data', models.JSONField()),
('current_data', models.JSONField(null=True)),
('created_at', models.DateTimeField(auto_now_add=True)),
('updated_at', models.DateTimeField(auto_now=True)),
],
18 changes: 18 additions & 0 deletions apps/spider/migrations/0004_alter_crawleddata_data_type.py
@@ -0,0 +1,18 @@
# Generated by Django 5.1.6 on 2025-02-24 19:08

from django.db import migrations, models


class Migration(migrations.Migration):

dependencies = [
('spider', '0003_add_timetable_data_type'),
]

operations = [
migrations.AlterField(
model_name='crawleddata',
name='data_type',
field=models.CharField(choices=[('medians', 'Medians'), ('orc_department_courses', 'ORC Department Courses'), ('course_timetable', 'Course Timetable')], max_length=32),
),
]
6 changes: 3 additions & 3 deletions apps/spider/models.py
@@ -48,13 +48,13 @@ class CrawledData(models.Model):

resource = models.CharField(max_length=128, db_index=True, unique=True)
data_type = models.CharField(max_length=32, choices=DATA_TYPE_CHOICES)
pending_data = JSONField()
current_data = JSONField(null=True)
pending_data = models.JSONField()
current_data = models.JSONField(null=True)

created_at = models.DateTimeField(auto_now_add=True)
updated_at = models.DateTimeField(auto_now=True)

def __unicode__(self):
def __str__(self):
return "[{data_type}] {resource}".format(
data_type=self.data_type,
resource=self.resource,
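Reviewer note: the postgres-specific `JSONField` import was removed in Django 4.0; `models.JSONField` (added in 3.1) is the portable replacement, and `__str__` replaces the Python 2-era `__unicode__`. A minimal sketch:

```python
# Sketch: portable JSONField plus a Python 3 __str__ (hypothetical model).
from django.db import models

class CrawlSnapshot(models.Model):  # illustrative only, not from this PR
    payload = models.JSONField(null=True)  # works on any supported backend

    def __str__(self):
        return str(self.payload)
```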
2 changes: 2 additions & 0 deletions apps/spider/tasks.py
@@ -31,6 +31,8 @@ def import_pending_crawled_data(crawled_data_pk):
@task_utils.email_if_fails
def crawl_medians():
median_page_urls = medians.crawl_median_page_urls()
print("Crawling median pages: {0}".format(median_page_urls))
print("Length of median pages: {0}".format(len(median_page_urls)))
assert len(median_page_urls) == 10  # the registrar's medians page always lists links to the past ten academic terms
for url in median_page_urls:
crawl_term_median_page.delay(url)