title | description |
---|---|
Analyzing Results |
This guide offers detailed guidelines and code examples to assist users in effectively categorizing survey responses. These categories are based on predefined flags from automated checks. By classifying responses into "Accept", "Review", or "Reject", users can streamline the review process and maintain consistency in response handling.
Survey responses are categorized according to the nature of the flags assigned:
- Accept: Responses are accepted if they have no flags or flags that do not compromise the response’s quality.
- Review: Responses flagged for "Cross-duplicate", "Off-topic", "Low-effort", or "GPT" require further human review as they may still contain useful insights.
- Reject: Responses flagged as "Gibberish", "Programmatic Entry", "Response Pasted", or "Self-duplicate" are typically deemed unreliable or irrelevant and should be excluded from further analysis.
Below are example code snippets in Python and R to automate the decision-making process based on the flags. Here we
import pandas as pd
# Load the dataset
data = pd.read_csv('/path/to/your/file.csv')
# Define the categorization function based on flags
def categorize_response(flag):
review_flags = [
"Automated test: Off-topic", "Automated test: Low-effort",
"Automated test: GPT", "GPT paste artifacts", "Cross-duplicate response"
]
reject_flags = [
"Automated test: Gibberish", "Programmatic entry",
"Response pasted", "Self-duplicate response"
]
if pd.isna(flag) or flag == "None":
return "Accept"
elif flag in review_flags:
return "Review"
elif flag in reject_flags:
return "Reject"
# Apply the function to each flag column
data['Q1_decision'] = data['Q1-flags'].apply(categorize_response)
data['Q2_decision'] = data['Q2-flags'].apply(categorize_response)
# Display the modified dataset
print(data[['participantID', 'Q1', 'Q1-flags', 'Q1_decision', 'Q2', 'Q2-flags', 'Q2_decision']].head())
library(dplyr)
# Load the dataset
data <- read.csv('/path/to/your/file.csv')
# Define the categorization function based on flags
categorize_response <- function(flag) {
review_flags <- c(
"Automated test: Off-topic", "Automated test: Low-effort",
"Automated test: GPT", "GPT paste artifacts", "Cross-duplicate response"
)
reject_flags <- c(
"Automated test: Gibberish", "Programmatic entry",
"Response pasted", "Self-duplicate response"
)
if (is.na(flag) || flag == "None") {
return("Accept")
} else if (flag %in% review_flags) {
return("Review")
} else if (flag %in% reject_flags) {
return("Reject")
}
}
# Apply the function to each flag column
data <- data %>%
mutate(
Q1_decision = sapply(Q1-flags, categorize_response),
Q2_decision = sapply(Q2-flags, categorize_response)
)
# Print the updated dataset
print(select(data, participantID, Q1, Q1-flags, Q1_decision, Q2, Q2-flags, Q2_decision))
For more complex scenarios, such as reviewing any questions where at least two responses are flagged, consider the following:
import pandas as pd
# Load the dataset
data = pd.read_csv('/path/to/your/file.csv')
# Define the categorization function based on flags
def categorize_response(flag):
review_flags = ["Automated test: Off-topic", "Automated test: Low-effort", "Automated test: GPT",
"GPT paste artifacts", "Cross-duplicate response"]
reject_flags = ["Automated test: Gibberish", "Programmatic entry", "Response pasted", "Self-duplicate response"]
if pd.isna(flag) or flag == "None":
return "Accept"
elif any(flag == rf for rf in review_flags):
return "Review"
elif any(flag == rf for rf in reject_flags):
return "Reject"
# Apply the function to each flag column for three questions
data['Q1_decision'] = data['Q1-flags'].apply(categorize_response)
data['Q2_decision'] = data['Q2-flags'].apply(categorize_response)
data['Q3_decision'] = data['Q3-flags'].apply(categorize_response)
# Add a column to count the number of "Review" decisions
data['Review_count'] = data[['Q1_decision', 'Q2_decision', 'Q3_decision']].apply(lambda x: (x == 'Review').sum(), axis=1)
# Filter participants with at least two responses flagged for review
review_participants = data[data['Review_count'] >= 2]
# Save or display the filtered dataset
print(review_participants[['participantID', 'Q1', 'Q1_decision', 'Q2', 'Q2_decision', 'Q3', 'Q3_decision', 'Review_count']].head())
library(dplyr)
library(tidyr)
# Load the dataset
data <- read.csv('/path/to/your/file.csv')
# Define the categorization function based on flags
categorize_response <- function(flag) {
review_flags <- c("Automated test: Off-topic", "Automated test: Low-effort", "Automated test: GPT",
"GPT paste artifacts", "Cross-duplicate response")
reject_flags <- c("Automated test: Gibberish", "Programmatic entry", "Response pasted", "Self-duplicate response")
if (is.na(flag) || flag == "None") {
return("Accept")
} else if (flag %in% review_flags) {
return("Review")
} else if (flag %in% reject_flags) {
return("Reject")
}
}
# Apply the function to each flag column for three questions
data <- data %>%
mutate(Q1_decision = sapply(Q1-flags, categorize_response),
Q2_decision = sapply(Q2-flags, categorize_response),
Q3_decision = sapply(Q3-flags, categorize_response))
# Add a column to count the number of "Review" decisions
data <- data %>%
mutate(Review_count = rowSums(select(., Q1_decision, Q2_decision, Q3_decision) == "Review"))
# Filter participants with at least two responses flagged for review
review_participants <- filter(data, Review_count >= 2)
# Print the filtered dataset
print(select(review_participants, participantID, Q1, Q1_decision, Q2, Q2_decision, Q3, Q3_decision, Review_count))