This is the shared git repo for our Project 3 Team.
Group members include John Ferrara, Alinzon Simon, Akeem Lawrence, Anthony Roman, and Ben Wolin. Members are listed in no particular order.
As a group, our main communication tools are iMessage and Slack. We may use other channels as needed, but these have been our primary methods so far.
Beyond Slack, we use GitHub for code sharing and project documentation.
For this project, our data will live in a MySQL database hosted on CloudSQL. The languages used to analyze this data will be R and SQL.
Our group has chosen to work with a Kaggle-sourced dataset of LinkedIn job postings. The data includes the locations of the hiring entities, the companies doing the hiring, and the job titles of the open positions, along with additional details about each position. More information, along with the dataset itself, can be found here. Lastly, the dataset files and their respective column names are listed in Table 1 below.
File Name | Columns |
---|---|
job_postings | job_link, last_processed_time, last_status, got_summary, got_ner, is_being_worked, job_title, company, job_location, first_seen, search_city, search_country, search_position, job_level, job_type |
job_skills | job_link, job_skills |
job_summary | job_link, job_summary |
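All three files in Table 1 share the `job_link` column, so it can act as the key that ties a posting to its skills and summary. The minimal sketch below shows that relationship with a join. It uses Python's built-in `sqlite3` module as an in-memory stand-in for our MySQL database, creates only a few of the Table 1 columns, and inserts made-up placeholder rows rather than real data.

```python
import sqlite3

# In-memory stand-in for the MySQL database (illustration only).
# Only a subset of the Table 1 columns is created; the rows are placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job_postings (
    job_link  TEXT PRIMARY KEY,
    job_title TEXT,
    company   TEXT
);
CREATE TABLE job_summary (
    job_link    TEXT PRIMARY KEY REFERENCES job_postings(job_link),
    job_summary TEXT
);
""")
conn.execute("INSERT INTO job_postings VALUES (?, ?, ?)",
             ("https://example.com/job/1", "Data Analyst", "Example Corp"))
conn.execute("INSERT INTO job_summary VALUES (?, ?)",
             ("https://example.com/job/1", "Analyze hiring data."))

# job_link appears in every file, so it serves as the join key.
rows = conn.execute("""
    SELECT p.job_title, p.company, s.job_summary
    FROM job_postings AS p
    JOIN job_summary AS s ON s.job_link = p.job_link
""").fetchall()
print(rows)  # [('Data Analyst', 'Example Corp', 'Analyze hiring data.')]
```

The same `SELECT ... JOIN ... ON s.job_link = p.job_link` query should run unchanged in MySQL or through an R database connection.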
The proposed normalized tables for structuring the data within the MySQL database are shown in Figure 1 below. The image is also available here, along with the actual source file here.
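As one illustration of what normalization can look like for this data, the sketch below splits the `job_skills` column into a `skill` lookup table and a `job_skill` bridge table, so each skill name is stored once and linked to postings by `job_link`. It assumes each `job_skills` value is a comma-separated list of skills; the table names `skill` and `job_skill` are hypothetical, and the actual design in Figure 1 may differ. Python's built-in `sqlite3` stands in for MySQL, and the rows are placeholders.

```python
import sqlite3

# Hypothetical normalization, ASSUMING job_skills holds a comma-separated list.
# Table names skill / job_skill are illustrative, not the Figure 1 design.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE skill (
    skill_id   INTEGER PRIMARY KEY,
    skill_name TEXT UNIQUE NOT NULL
);
CREATE TABLE job_skill (
    job_link TEXT NOT NULL,
    skill_id INTEGER NOT NULL REFERENCES skill(skill_id),
    PRIMARY KEY (job_link, skill_id)
);
""")

# Placeholder rows shaped like the job_skills file (job_link, job_skills).
raw_job_skills = [
    ("https://example.com/job/1", "SQL, R, Communication"),
    ("https://example.com/job/2", "R, Statistics"),
]

for job_link, skills in raw_job_skills:
    for name in (s.strip() for s in skills.split(",")):
        if not name:
            continue  # skip empty entries caused by stray commas
        # Store each distinct skill once in the lookup table...
        conn.execute("INSERT OR IGNORE INTO skill (skill_name) VALUES (?)", (name,))
        skill_id = conn.execute(
            "SELECT skill_id FROM skill WHERE skill_name = ?", (name,)
        ).fetchone()[0]
        # ...and record in the bridge table which postings list it.
        conn.execute("INSERT OR IGNORE INTO job_skill VALUES (?, ?)",
                     (job_link, skill_id))

# Number of postings that list the skill "R".
count_r = conn.execute("""
    SELECT COUNT(*) FROM job_skill AS js
    JOIN skill AS s ON s.skill_id = js.skill_id
    WHERE s.skill_name = 'R'
""").fetchone()[0]
print(count_r)  # 2
```

`INSERT OR IGNORE` is SQLite syntax; the MySQL equivalent is `INSERT IGNORE`.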
The current data loading process can be found in the following file in our shared public GitHub repo for this project.
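The loading file itself is not reproduced here. As a rough illustration of one common approach, the sketch below reads a CSV with a header row and inserts it in fixed-size batches using parameterized statements. The function name `load_csv` is hypothetical, Python's built-in `sqlite3` stands in for MySQL, and a tiny inline CSV stands in for one of the Kaggle files.

```python
import csv
import io
import sqlite3
from itertools import islice


def load_csv(conn, table, columns, source, batch_size=1000):
    """Insert rows from a CSV file object into `table` in batches.

    `table` and `columns` are interpolated into the SQL string, so they must be
    trusted identifiers; the row values themselves go through placeholders.
    """
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    reader = csv.DictReader(source)
    total = 0
    while True:
        # islice pulls the next batch_size rows from the same reader each pass.
        batch = [tuple(row[c] for c in columns) for row in islice(reader, batch_size)]
        if not batch:
            break
        conn.executemany(sql, batch)
        total += len(batch)
    conn.commit()
    return total


# Hypothetical usage: an in-memory database and an inline CSV sample.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_skills (job_link TEXT PRIMARY KEY, job_skills TEXT)")
sample = io.StringIO(
    "job_link,job_skills\n"
    'https://example.com/job/1,"SQL, R"\n'
    "https://example.com/job/2,Statistics\n"
)
loaded = load_csv(conn, "job_skills", ["job_link", "job_skills"], sample, batch_size=1)
print(loaded)  # 2
```

Batching keeps memory use bounded for large files. With a MySQL driver such as PyMySQL or mysql-connector, the placeholder style is typically `%s` rather than `?`.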