kurichon/dataengineertask


Prerequisites

  1. Install the required libraries using pip:
    pip install pandas openpyxl

Task 1: Event Type Classification from Text Files

Purpose

This script processes a series of text files containing timestamped data and classifies each event based on a metadata file that defines event types with start and end times. The classified event data is saved to an Excel file for further analysis.

  1. Place the following files in a directory called ./Task1/:
    • metadata.xlsx: Contains the metadata that defines the start and end times for events.
    • Multiple .txt files: Contain timestamped data that will be classified based on the metadata.

Script Flow

  1. The script loads the metadata from the metadata.xlsx file. The metadata includes:

    • StartTime: The start time of the event.
    • EndTime: The end time of the event.
    • EventType: The event classification.
  2. It processes each .txt file in the ./Task1/ folder and reads the data, which should include a Timestamp column and other columns representing data for that event.

  3. The script compares the timestamp of each row with the start and end times from the metadata to classify the event type.

  4. The classified data is then saved to an Excel file named classified_events.xlsx (a rough sketch of this flow follows below).
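
A minimal sketch of this flow, assuming comma-separated .txt files with a header row and the column names described in this README; the actual Task1.py may differ in details:

    import glob
    import os

    import pandas as pd

    TASK1_DIR = "./Task1/"

    # Load the metadata that defines each event window.
    metadata = pd.read_excel(os.path.join(TASK1_DIR, "metadata.xlsx"))
    metadata["StartTime"] = pd.to_datetime(metadata["StartTime"])
    metadata["EndTime"] = pd.to_datetime(metadata["EndTime"])

    def classify(ts):
        # Return the EventType whose [StartTime, EndTime] window contains ts, else None.
        hit = metadata[(metadata["StartTime"] <= ts) & (ts <= metadata["EndTime"])]
        return hit["EventType"].iloc[0] if not hit.empty else None

    frames = []
    for path in glob.glob(os.path.join(TASK1_DIR, "*.txt")):
        df = pd.read_csv(path)  # assumption: comma-separated with a header row
        # The real timestamp format may need an explicit format= argument.
        df["Timestamp"] = pd.to_datetime(df["Timestamp"], errors="coerce")
        df["EventType"] = df["Timestamp"].apply(classify)
        frames.append(df)

    # Write all classified rows to a single Excel file.
    pd.concat(frames, ignore_index=True).to_excel(
        os.path.join(TASK1_DIR, "classified_events.xlsx"), index=False
    )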

How to Test

Data Setup

  1. Place your metadata in the metadata.xlsx file in the ./Task1/ folder. The metadata should be structured with columns:

    • StartTime: The event start time.
    • EndTime: The event end time.
    • EventType: The classification for that event.
  2. Place your .txt files in the ./Task1/ folder. The data in these files should have the following structure:

    • Timestamp: The timestamp in the format YYYY-DD-MM-DD-hh:mm:ss:fff.
    • Other columns such as x, y, and z, which will be processed for event classification (a quick structural check is sketched after this list).
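
Before running the script, you can sanity-check that each .txt file exposes the expected columns. This is only an illustrative sketch; it assumes comma-separated files and the column names listed above:

    import glob

    import pandas as pd

    REQUIRED = {"Timestamp", "x", "y", "z"}  # columns described above

    for path in glob.glob("./Task1/*.txt"):
        cols = set(pd.read_csv(path, nrows=0).columns)  # read the header row only
        missing = REQUIRED - cols
        if missing:
            print(f"{path}: missing columns {sorted(missing)}")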

Testing Process

  1. Ensure the necessary packages (pandas, openpyxl) are installed.
  2. Run the script to classify the data and save it in an Excel file.
    python Task1.py

Task 2: Data Integrity Check and Missing Data Detection

  1. Load Data
  • The dataset is expected to be in the ./Task2/ folder with the filename data.xlsx (a basic loading and column check is sketched after this list). The dataset should contain columns named:
    • Index
    • Timestamp
    • Incremental_Index
    • Acceleration_X
    • Acceleration_Y
    • Acceleration_Z
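
A minimal sketch of loading the dataset and verifying that these columns are present (not the script itself):

    import pandas as pd

    EXPECTED = [
        "Index", "Timestamp", "Incremental_Index",
        "Acceleration_X", "Acceleration_Y", "Acceleration_Z",
    ]

    df = pd.read_excel("./Task2/data.xlsx")
    missing = [c for c in EXPECTED if c not in df.columns]
    if missing:
        raise ValueError(f"data.xlsx is missing expected columns: {missing}")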

Purpose

This script processes a dataset to check for missing data points based on timestamps and incremental index values. The key objectives are to:

  • Identify timestamps with fewer data points than the specified fault tolerance.
  • Detect gaps in the incremental index (0-255) sequence.
  • Generate a report detailing missing data points for each identified timestamp.

Parameters

  • Fault Tolerance: Defined by fault_tolerance (default: 20). Any timestamp with fewer data points than this threshold is flagged as having missing data.
  • Incremental Index Range: 0 to 255. The script checks for gaps in this range and flags missing index values (see the sketch after this list).
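
Both checks can be sketched roughly as below, assuming the column names listed under Load Data; the actual Task2.py may implement them differently:

    import pandas as pd

    fault_tolerance = 20  # minimum expected number of data points per timestamp

    df = pd.read_excel("./Task2/data.xlsx")

    # Check 1: flag timestamps with fewer data points than the fault tolerance.
    counts = df.groupby("Timestamp").size()
    for ts, n in counts[counts < fault_tolerance].items():
        print(f"Timestamp at {ts} only has {n} data points.")

    # Check 2: detect gaps in the 0-255 incremental index sequence.
    idx = df["Incremental_Index"].astype(int)
    steps = idx.diff().dropna()
    # A healthy step is +1, or -255 when the counter wraps from 255 back to 0.
    gaps = steps[(steps != 1) & (steps != -255)]
    for row, step in gaps.items():
        lost = int(step - 1) % 256  # skipped index values, accounting for wrap-around
        print(f"Data loss found at index {row}, total data loss: {lost}")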

Run the Script

Execute the script by running:

python Task2.py

Output

The script will print reports detailing:

  • Timestamps with missing data (where the number of data points for a timestamp falls below the fault tolerance threshold).
  • Gaps in the incremental index sequence, indicating data loss.

Example Output

Missing data detected from index 105, Timestamp at 2023-11-06 12:45:10 only has 18 data points.
Data loss found at index 210, total data loss: 4
