Analyzing Results

Overview

This guide provides guidelines and code examples for categorizing survey responses based on the flags assigned by automated checks. Classifying each response as "Accept", "Review", or "Reject" streamlines the review process and keeps response handling consistent.

Sample Decision Procedure

Survey responses are categorized according to the nature of the flags assigned:

  • Accept: Responses with no flags, or with flags that do not compromise the response’s quality, are accepted.
  • Review: Responses flagged as "Cross-duplicate", "Off-topic", "Low-effort", or "GPT" (including "GPT paste artifacts") require human review, as they may still contain useful insights.
  • Reject: Responses flagged as "Gibberish", "Programmatic entry", "Response pasted", or "Self-duplicate" are typically unreliable or irrelevant and should be excluded from further analysis.
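
The procedure above can also be sketched as a lookup table. This is a minimal illustration rather than a definitive implementation: the flag strings are the ones used in the snippets in this guide, while `FLAG_DECISIONS`, `decide`, and the sample data are hypothetical names introduced here. Empty cells and flags that are not in the table are accepted, following the "Accept" rule.

```python
import pandas as pd

# Hypothetical lookup table built from the flag strings used in this guide.
FLAG_DECISIONS = {
    "Cross-duplicate response": "Review",
    "Automated test: Off-topic": "Review",
    "Automated test: Low-effort": "Review",
    "Automated test: GPT": "Review",
    "GPT paste artifacts": "Review",
    "Automated test: Gibberish": "Reject",
    "Programmatic entry": "Reject",
    "Response pasted": "Reject",
    "Self-duplicate response": "Reject",
}


def decide(flags: pd.Series) -> pd.Series:
    """Map a column of flags to decisions; missing or unlisted flags become "Accept"."""
    return flags.map(FLAG_DECISIONS).fillna("Accept")


# Toy data standing in for one flag column of a real export (assumption).
sample = pd.DataFrame({
    "participantID": [1, 2, 3, 4],
    "Q1-flags": [None, "Automated test: GPT", "Response pasted", "None"],
})
sample["Q1_decision"] = decide(sample["Q1-flags"])
print(sample)
```

A table keeps the flag-to-decision policy in one place, so moving a flag between "Review" and "Reject" only requires editing a single entry.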

Code Snippets

Below are example code snippets in Python and R that automate the decision procedure based on the flags. The Python version comes first, followed by the R equivalent.

import pandas as pd

# Load the dataset
data = pd.read_csv('/path/to/your/file.csv')

# Define the categorization function based on flags
def categorize_response(flag):
    review_flags = [
        "Automated test: Off-topic", "Automated test: Low-effort",
        "Automated test: GPT", "GPT paste artifacts", "Cross-duplicate response"
    ]
    reject_flags = [
        "Automated test: Gibberish", "Programmatic entry",
        "Response pasted", "Self-duplicate response"
    ]

    # No flag: accept the response
    if pd.isna(flag) or flag == "None":
        return "Accept"
    elif flag in review_flags:
        return "Review"
    elif flag in reject_flags:
        return "Reject"
    # Any other flag is treated as not compromising quality (see the "Accept" rule)
    return "Accept"

# Apply the function to each flag column
data['Q1_decision'] = data['Q1-flags'].apply(categorize_response)
data['Q2_decision'] = data['Q2-flags'].apply(categorize_response)

# Display the modified dataset
print(data[['participantID', 'Q1', 'Q1-flags', 'Q1_decision', 'Q2', 'Q2-flags', 'Q2_decision']].head())

# R version of the same procedure
library(dplyr)

# Load the dataset; check.names = FALSE keeps column names such as "Q1-flags" unchanged
data <- read.csv('/path/to/your/file.csv', check.names = FALSE)

# Define the categorization function based on flags
categorize_response <- function(flag) {
  review_flags <- c(
    "Automated test: Off-topic", "Automated test: Low-effort",
    "Automated test: GPT", "GPT paste artifacts", "Cross-duplicate response"
  )
  reject_flags <- c(
    "Automated test: Gibberish", "Programmatic entry",
    "Response pasted", "Self-duplicate response"
  )

  # No flag (missing, empty, or "None"): accept the response
  if (is.na(flag) || flag %in% c("", "None")) {
    return("Accept")
  } else if (flag %in% review_flags) {
    return("Review")
  } else if (flag %in% reject_flags) {
    return("Reject")
  }
  # Any other flag is treated as not compromising quality (see the "Accept" rule)
  "Accept"
}

# Apply the function to each flag column; the backticks are required because of the hyphen
data <- data %>%
  mutate(
    Q1_decision = sapply(`Q1-flags`, categorize_response, USE.NAMES = FALSE),
    Q2_decision = sapply(`Q2-flags`, categorize_response, USE.NAMES = FALSE)
  )

# Print the updated dataset
print(select(data, participantID, Q1, `Q1-flags`, Q1_decision, Q2, `Q2-flags`, Q2_decision))
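
With more than two questions, listing each column by hand becomes tedious. The sketch below applies the same rule to every column whose name ends in `-flags`. It assumes that the `Q<n>-flags` naming used in the snippets above holds for the whole export; the helper name `add_decisions` and the sample data are hypothetical.

```python
import pandas as pd

REVIEW_FLAGS = {
    "Automated test: Off-topic", "Automated test: Low-effort",
    "Automated test: GPT", "GPT paste artifacts", "Cross-duplicate response",
}
REJECT_FLAGS = {
    "Automated test: Gibberish", "Programmatic entry",
    "Response pasted", "Self-duplicate response",
}


def categorize_response(flag):
    """Return "Accept", "Review", or "Reject" for a single flag value."""
    if pd.isna(flag) or flag == "None":
        return "Accept"
    if flag in REVIEW_FLAGS:
        return "Review"
    if flag in REJECT_FLAGS:
        return "Reject"
    # Unlisted flags are treated as not compromising quality.
    return "Accept"


def add_decisions(data: pd.DataFrame, suffix: str = "-flags") -> pd.DataFrame:
    """Hypothetical helper: add a `<question>_decision` column per flag column."""
    out = data.copy()
    for col in data.columns:
        if col.endswith(suffix):
            question = col[: -len(suffix)]  # "Q1-flags" -> "Q1"
            out[f"{question}_decision"] = data[col].apply(categorize_response)
    return out


# Toy data standing in for a real export (assumption).
sample = pd.DataFrame({
    "participantID": [1, 2],
    "Q1-flags": [None, "Automated test: Gibberish"],
    "Q2-flags": ["Automated test: Off-topic", None],
    "Q3-flags": [None, None],
})
result = add_decisions(sample)
print(result.filter(like="_decision"))
```

Because the helper returns a copy, the original data frame is left untouched and the decision columns can be regenerated after the flag lists change.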

Advanced Code Snippets

For more complex scenarios, such as identifying participants who have at least two responses marked for review, consider the following:

import pandas as pd

# Load the dataset
data = pd.read_csv('/path/to/your/file.csv')

# Define the categorization function based on flags
def categorize_response(flag):
    review_flags = ["Automated test: Off-topic", "Automated test: Low-effort", "Automated test: GPT",
                    "GPT paste artifacts", "Cross-duplicate response"]
    reject_flags = ["Automated test: Gibberish", "Programmatic entry", "Response pasted", "Self-duplicate response"]

    # No flag: accept the response
    if pd.isna(flag) or flag == "None":
        return "Accept"
    elif flag in review_flags:
        return "Review"
    elif flag in reject_flags:
        return "Reject"
    # Any other flag is treated as not compromising quality (see the "Accept" rule)
    return "Accept"

# Apply the function to each flag column for three questions
data['Q1_decision'] = data['Q1-flags'].apply(categorize_response)
data['Q2_decision'] = data['Q2-flags'].apply(categorize_response)
data['Q3_decision'] = data['Q3-flags'].apply(categorize_response)

# Add a column to count the number of "Review" decisions per participant
decision_cols = ['Q1_decision', 'Q2_decision', 'Q3_decision']
data['Review_count'] = data[decision_cols].eq('Review').sum(axis=1)

# Filter participants with at least two responses flagged for review
review_participants = data[data['Review_count'] >= 2]

# Save or display the filtered dataset
print(review_participants[['participantID', 'Q1', 'Q1_decision', 'Q2', 'Q2_decision', 'Q3', 'Q3_decision', 'Review_count']].head())

# R version of the same procedure
library(dplyr)

# Load the dataset; check.names = FALSE keeps column names such as "Q1-flags" unchanged
data <- read.csv('/path/to/your/file.csv', check.names = FALSE)

# Define the categorization function based on flags
categorize_response <- function(flag) {
  review_flags <- c("Automated test: Off-topic", "Automated test: Low-effort", "Automated test: GPT",
                    "GPT paste artifacts", "Cross-duplicate response")
  reject_flags <- c("Automated test: Gibberish", "Programmatic entry", "Response pasted", "Self-duplicate response")

  # No flag (missing, empty, or "None"): accept the response
  if (is.na(flag) || flag %in% c("", "None")) {
    return("Accept")
  } else if (flag %in% review_flags) {
    return("Review")
  } else if (flag %in% reject_flags) {
    return("Reject")
  }
  # Any other flag is treated as not compromising quality (see the "Accept" rule)
  "Accept"
}

# Apply the function to each flag column for three questions;
# the backticks are required because of the hyphen in the column names
data <- data %>%
  mutate(Q1_decision = sapply(`Q1-flags`, categorize_response, USE.NAMES = FALSE),
         Q2_decision = sapply(`Q2-flags`, categorize_response, USE.NAMES = FALSE),
         Q3_decision = sapply(`Q3-flags`, categorize_response, USE.NAMES = FALSE))

# Add a column to count the number of "Review" decisions per participant
data <- data %>%
  mutate(Review_count = rowSums(select(., Q1_decision, Q2_decision, Q3_decision) == "Review"))

# Filter participants with at least two responses flagged for review
review_participants <- filter(data, Review_count >= 2)

# Print the filtered dataset
print(select(review_participants, participantID, Q1, Q1_decision, Q2, Q2_decision, Q3, Q3_decision, Review_count))
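
Before working through the flagged responses, it can help to see how many responses fall into each category per question. The sketch below tallies decision columns like the ones produced above. The sample data and the name `decision_summary` are hypothetical stand-ins for a real export.

```python
import pandas as pd

# Toy decision columns standing in for the output of the snippets above (assumption).
sample = pd.DataFrame({
    "participantID": [1, 2, 3],
    "Q1_decision": ["Accept", "Review", "Reject"],
    "Q2_decision": ["Review", "Review", "Accept"],
    "Q3_decision": ["Accept", "Review", "Accept"],
})
decision_cols = ["Q1_decision", "Q2_decision", "Q3_decision"]

# Count each decision per question; categories that never occur are shown as 0.
decision_summary = (
    sample[decision_cols]
    .apply(lambda col: col.value_counts())
    .reindex(["Accept", "Review", "Reject"])
    .fillna(0)
    .astype(int)
)
print(decision_summary)
```

Each column of the result corresponds to one question and each row to one decision, which makes it easy to spot questions with an unusually high share of "Review" or "Reject" responses.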