Create function that reformats data into correct form for calculate_emissions function #14

Open
lilyclements opened this issue Jul 19, 2022 · 0 comments

lilyclements commented Jul 19, 2022

We want an automated system that creates a data set which can be used to calculate all emissions.

This data could come from expense report data. A big issue is the nuance in how people write up their expenses, and the differences between software. This is only based on Zoho so far.

  1. Take the description column and recategorise each expense as a flight, train, etc. (see example below).
  2. Failing that, use the category_name column to recategorise.
  3. If it cannot be categorised automatically, run checks (see example below).
  4. Take the description column and extract other details, e.g., where the flight is from, to and via, the number of people on the flight, etc. How can we do this? Automatically read in the "description" column, or run checks? If the description says "return flight", can we assume it is a return? What if the description says "non-return flight"?
    If the description says "for me and X", how do we read that?
    If the description says "for two nights", do we check for words around "nights"?
    Translating qualitative data into quantitative data requires thought.
  5. Relabel into the correct format; for example, "London Heathrow" to "LHR" for airport data (similarly for ferry and train emissions).
  6. For vehicle distances, mileage is given in the mileage_distance variable in Zoho. To what extent should we depend on this? If it is empty, do we have another check?
  7. Office emissions: how should we offer this? Perhaps read in a data set with each employee's number of days/hours.
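A rough sketch of part of step 4, in R to match the examples in this issue. It only covers two of the cases above: "return" versus "non-return"/"one way" wording, and a number of nights written as digits or as a word up to ten. The names `number_words`, `parse_return` and `parse_nights` are hypothetical, the patterns assume English descriptions, and anything they cannot read comes back as NA so it can go through the checks in step 3. Phrases like "for me and X" are not handled here.

```r
# Hypothetical helpers for step 4: read a few details out of free-text descriptions.
# Assumes English descriptions; anything not recognised is returned as NA.
number_words <- c(one = 1, two = 2, three = 3, four = 4, five = 5,
                  six = 6, seven = 7, eight = 8, nine = 9, ten = 10)

# TRUE for return trips, FALSE for one-way trips, NA if the description does not say
parse_return <- function(description) {
  d <- tolower(ifelse(is.na(description), "", description))
  out <- rep(NA, length(d))
  # Check the one-way wording first, because "non-return" also contains "return"
  out[grepl("one[- ]way|non[- ]?return", d, perl = TRUE)] <- FALSE
  out[is.na(out) & grepl("\\breturn\\b", d, perl = TRUE)] <- TRUE
  out
}

# Number of nights from phrases such as "for two nights" or "3 nights"
parse_nights <- function(description) {
  d <- tolower(ifelse(is.na(description), "", description))
  # Capture the word (or digits) just before "night"/"nights"
  m <- regmatches(d, regexec("([a-z0-9]+)\\s+nights?\\b", d, perl = TRUE))
  vapply(m, function(x) {
    if (length(x) < 2) return(NA_real_)
    n <- suppressWarnings(as.numeric(x[2]))
    # Fall back on spelled-out numbers such as "two"
    if (is.na(n) && x[2] %in% names(number_words)) n <- number_words[[x[2]]]
    n
  }, numeric(1))
}
```

Returning NA rather than guessing keeps the uncertain rows visible, so they can be routed to the same kind of manual check as in step 3.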

E.g., for step 1, with the expense_report data:

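library(dplyr)

# Keywords that identify each emission type in the free-text description
plane_match <- c("flight", "plane", "airport", "airplane", "aeroplane")
# Include the correct spelling of "accommodation" as well as the common misspelling
hotel_match <- c("hotel", "accommodation", "accomodation", "night", "stay", "guesthouse", "airbnb")
taxi_match <- c("taxi", "cab")

# Checks run in order, so a description matching several lists gets the first label.
# Unmatched rows are marked "99" for the manual checks in step 3.
expense_report <- expense_report %>%
  dplyr::mutate(emission = dplyr::case_when(
    grepl("train", description, ignore.case = TRUE) ~ "Train",
    grepl(paste(plane_match, collapse = "|"), description, ignore.case = TRUE) ~ "Flight",
    grepl(paste(hotel_match, collapse = "|"), description, ignore.case = TRUE) ~ "Hotel",
    grepl(paste(taxi_match, collapse = "|"), description, ignore.case = TRUE) ~ "Taxi",
    TRUE ~ "99"
  ))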
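For step 2, one option is a lookup from category_name to emission labels, applied only to rows that the description keywords left unmatched (marked "99" or NA). This is a sketch, not tested against real Zoho exports: the category names in `category_lookup` are made up for illustration and would need replacing with the categories the expense software actually uses, and `fill_from_category` is a hypothetical helper.

```r
# Hypothetical lookup from expense category names (lower case) to emission labels.
# These category names are illustrative only, not Zoho's actual list.
category_lookup <- c("air travel" = "Flight", "lodging" = "Hotel", "hotel" = "Hotel",
                     "taxi" = "Taxi", "rail" = "Train", "train" = "Train")

fill_from_category <- function(emission, category_name) {
  # Rows the description keywords could not categorise
  unresolved <- is.na(emission) | emission == "99"
  from_category <- unname(category_lookup[tolower(trimws(category_name))])
  # Only overwrite rows that are unresolved and have a known category
  fill <- unresolved & !is.na(from_category)
  emission[fill] <- from_category[fill]
  emission
}
```

Used after the step 1 code, this would be `expense_report$emission <- fill_from_category(expense_report$emission, expense_report$category_name)`. Rows that are still unmatched keep their marker and go on to the manual checks.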

E.g., for step 3:

# Rows that could not be categorised automatically in step 1
na_emission <- which(is.na(expense_report$emission) | expense_report$emission == "99")

# Ask the user to categorise one description. Returns the chosen label, or NA for
# "None of the above" (menu() returns 0 if the user cancels, which also gives NA).
which_emission <- function(description_var) {
  input <- menu(c("Accommodation", "Materials", "Office", "Transport", "None of the above"),
                title = paste("What do you want to assign to", description_var))
  if (input == 1) return("Hotel")
  if (input == 2) return("Materials")
  if (input == 3) return("Office")
  if (input == 4) {
    transport <- c("Ferry", "Flight", "Train")
    input2 <- menu(c(transport, "Vehicle", "None of the above"),
                   title = "Which transport type?")
    if (input2 %in% seq_along(transport)) return(transport[input2])
    if (input2 == 4) {
      vehicles <- c("Bus", "Car", "Coach", "Taxi", "Tube")
      input3 <- menu(c(vehicles, "None of the above"), title = "Which vehicle type?")
      if (input3 %in% seq_along(vehicles)) return(vehicles[input3])
    }
  }
  NA_character_
}

# A function only modifies its own copy of expense_report, so assign the returned
# label back to the data here (menu() needs an interactive session)
for (i in na_emission) {
  expense_report$emission[i] <- which_emission(expense_report$description[i])
}
expense_report
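For step 5, relabelling could be a plain lookup table. The sketch below uses a small hand-written subset of airport names and their IATA codes purely for illustration; a real version would read in a complete airport table. `airport_codes` and `to_iata` are hypothetical names, and the same pattern would apply to ferry ports or train stations.

```r
# Illustrative subset of airport names mapped to IATA codes. A real version would
# read in a complete airport table instead of this hand-written list.
airport_codes <- c("london heathrow" = "LHR", "heathrow" = "LHR",
                   "london gatwick" = "LGW", "gatwick" = "LGW",
                   "manchester" = "MAN", "edinburgh" = "EDI")

# Convert airport names to IATA codes; names not in the table come back as NA
to_iata <- function(place) {
  cleaned <- trimws(place)
  out <- unname(airport_codes[tolower(cleaned)])
  # Keep values that already look like a three-letter code, upper-cased
  already_code <- !is.na(cleaned) & grepl("^[A-Za-z]{3}$", cleaned)
  out[already_code] <- toupper(cleaned[already_code])
  out
}
```

Unknown names are returned as NA rather than guessed, so they can be checked by hand in the same way as step 3.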
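For step 6, one possible check when mileage_distance is empty is to look for a distance written in the description itself. This is a sketch under assumptions: `get_vehicle_miles` is a hypothetical helper, it only recognises a number followed by "mile(s)", "mi" or "km", and it treats an empty, non-numeric or zero mileage_distance as missing.

```r
# Hypothetical helper for step 6: prefer mileage_distance, and fall back on a
# distance written in the description (e.g. "42 miles" or "16 km") when it is missing.
get_vehicle_miles <- function(mileage_distance, description) {
  d <- tolower(ifelse(is.na(description), "", description))
  # Groups: 1 = number (optionally with decimals), 2 = unit
  m <- regmatches(d, regexec("([0-9]+\\.?[0-9]*)\\s*(miles?|mi|km)\\b", d, perl = TRUE))
  from_text <- vapply(m, function(x) {
    if (length(x) < 3) return(NA_real_)
    value <- as.numeric(x[2])
    # Convert kilometres to miles (1 mile = 1.609344 km) so both sources share a unit
    if (x[3] == "km") value / 1.609344 else value
  }, numeric(1))
  # Treat empty, non-numeric or zero recorded mileage as missing
  miles <- suppressWarnings(as.numeric(mileage_distance))
  miles[!is.na(miles) & miles <= 0] <- NA
  ifelse(is.na(miles), from_text, miles)
}
```

This keeps mileage_distance as the primary source and only uses the description when it is missing; rows where neither gives a distance stay NA for a manual check.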