- Source: Kaggle Titanic Dataset
- Direct Download: `train.csv`
- Goal: Predict whether a passenger survived (`1`) or died (`0`).
- Features (input columns):
  - `Pclass`: Passenger class (1 = 1st, 2 = 2nd, 3 = 3rd)
  - `Sex`: Gender (`male` or `female`)
  - `Age`: Age of the passenger
  - `SibSp`: Number of siblings/spouses aboard
  - `Parch`: Number of parents/children aboard
  - `Fare`: Ticket fare
  - `Embarked`: Port of embarkation (`C`, `Q`, `S`)
- Label (output column):
  - `Survived`: `0` (died) or `1` (survived)
Download the dataset using the direct URL:

```bash
wget https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv -O titanic.csv
```

This will save the file as `titanic.csv`.
The Titanic dataset contains both numerical and categorical features. Neural networks work best with numerical data, so we’ll preprocess it:
- Convert categorical features (e.g., `Sex`, `Embarked`) into numbers:
  - `Sex`: `male` → `1`, `female` → `0`.
  - `Embarked`: `C` → `0`, `Q` → `1`, `S` → `2`.
- Fill missing `Age` and `Fare` values with the column median, and missing `Embarked` values with the most common port.
- Standardize the features, then split the data into training and testing datasets.
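As a quick illustration of how these transformations behave, here is a toy sketch on a made-up three-row DataFrame (not the real dataset); note that `map` turns any unmapped or missing value into `NaN`, which `fillna` then patches:

```python
import pandas as pd

# Toy frame standing in for the Titanic columns (made-up values)
df = pd.DataFrame({
    "Sex": ["male", "female", "male"],
    "Embarked": ["C", None, "S"],
    "Age": [22.0, None, 30.0],
})

# map() replaces known categories; missing/unmapped values become NaN
df["Sex"] = df["Sex"].map({"male": 1, "female": 0})
df["Embarked"] = df["Embarked"].map({"C": 0, "Q": 1, "S": 2})

# fillna() patches the gaps: median for Age, most common port (S -> 2) for Embarked
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Embarked"] = df["Embarked"].fillna(2)

print(df)
```

The same pattern scales to the full dataset, which is exactly what the script below does.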
Save the following Python script as `prepare_titanic_data.py`:
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the Titanic dataset
df = pd.read_csv("titanic.csv")

# Select useful columns
features = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]
label = "Survived"

# Preprocess the dataset
df["Sex"] = df["Sex"].map({"male": 1, "female": 0})            # Convert Sex to 1 (male) and 0 (female)
df["Embarked"] = df["Embarked"].map({"C": 0, "Q": 1, "S": 2})  # Convert Embarked to numbers
df["Age"] = df["Age"].fillna(df["Age"].median())               # Fill missing ages with the median
df["Embarked"] = df["Embarked"].fillna(2)                      # Fill missing Embarked with the most common port (S -> 2)
df["Fare"] = df["Fare"].fillna(df["Fare"].median())            # Fill missing fares with the median

# Extract features and label
X = df[features]
y = df[label]

# Standardize the features for better performance
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Save the training and test data
train_data = pd.DataFrame(X_train, columns=features)
train_data["Survived"] = y_train.values  # Add label column
train_data.to_csv("titanic_train.csv", index=False)

test_data = pd.DataFrame(X_test, columns=features)
test_data.to_csv("titanic_test.csv", index=False)

test_labels = pd.DataFrame(y_test, columns=["Survived"])
test_labels.to_csv("titanic_test_labels.csv", index=False)

print("Prepared training data saved to 'titanic_train.csv'")
print("Prepared test data saved to 'titanic_test.csv'")
print("Test labels saved to 'titanic_test_labels.csv'")
```
Run the script to generate:

- `titanic_train.csv`: Training data (features + label).
- `titanic_test.csv`: Test data (features only, no labels).
- `titanic_test_labels.csv`: Test labels for evaluation.
We’ll create a neural network to classify passengers as survived or not.
```bash
sygnals-nn create \
  --layers 7,16,8,1 \
  --activation relu,relu,sigmoid \
  --loss binary_crossentropy \
  --optimizer adam \
  --output titanic_model.keras
```
- `--layers 7,16,8,1`:
  - 7 input neurons (one for each feature: `Pclass`, `Sex`, `Age`, `SibSp`, `Parch`, `Fare`, `Embarked`).
  - Hidden layer 1: 16 neurons (ReLU activation).
  - Hidden layer 2: 8 neurons (ReLU activation).
  - Output layer: 1 neuron (sigmoid activation for binary classification).
- `--activation relu,relu,sigmoid`:
  - ReLU activation for the hidden layers.
  - Sigmoid activation for the output layer.
- `--loss binary_crossentropy`:
  - Binary cross-entropy is the standard loss for binary classification problems.
- `--optimizer adam`:
  - Adam optimizer for efficient training.
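These flags describe a standard feed-forward classifier. As a rough sketch of the computation a 7-16-8-1 network performs, here is a NumPy forward pass with random untrained weights and a made-up standardized passenger row (purely illustrative, not the tool's internals):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random, untrained weights for a 7 -> 16 -> 8 -> 1 network (illustration only)
W1, b1 = rng.normal(size=(7, 16)) * 0.1, np.zeros(16)
W2, b2 = rng.normal(size=(16, 8)) * 0.1, np.zeros(8)
W3, b3 = rng.normal(size=(8, 1)) * 0.1, np.zeros(1)

relu = lambda z: np.maximum(z, 0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def forward(x):
    h1 = relu(x @ W1 + b1)        # hidden layer 1: 16 neurons, ReLU
    h2 = relu(h1 @ W2 + b2)       # hidden layer 2: 8 neurons, ReLU
    return sigmoid(h2 @ W3 + b3)  # output: 1 neuron, sigmoid -> probability in (0, 1)

# One made-up standardized 7-feature passenger row
x = np.array([[0.5, -1.0, 0.2, 0.0, 0.0, -0.3, 1.1]])
p = forward(x)

# Binary cross-entropy for a true label y in {0, 1}: -(y*log(p) + (1-y)*log(1-p))
y = 1.0
bce = -(y * np.log(p) + (1 - y) * np.log(1 - p))
print(p, bce)
```

Training adjusts the weights so the sigmoid output moves toward the true label, which is exactly what the binary cross-entropy loss penalizes.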
```
Model saved to titanic_model.keras
```
Use the prepared training data (`titanic_train.csv`) to train the model.
```bash
sygnals-nn train \
  --model titanic_model.keras \
  --data titanic_train.csv \
  --epochs 100 \
  --batch-size 32 \
  --learning-rate 0.001
```
- `--model titanic_model.keras`: Load the model created in Step 3.
- `--data titanic_train.csv`: Use the Titanic training dataset.
- `--epochs 100`: Train for 100 full passes over the dataset.
- `--batch-size 32`: Process 32 samples at a time during training.
- `--learning-rate 0.001`: Set a small learning rate for stable convergence.
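A quick sanity check on what these numbers mean in combination (assuming the standard 891-row Kaggle train set and the 80/20 split from the preparation script): a batch size of 32 over 712 training rows gives 23 weight updates per epoch, so 100 epochs is 2,300 updates in total.

```python
import math

n_rows = 891      # rows in the Kaggle Titanic train set
train_frac = 0.8  # test_size=0.2 leaves 80% for training
batch_size = 32
epochs = 100

n_train = int(n_rows * train_frac)                   # 712 training samples
batches_per_epoch = math.ceil(n_train / batch_size)  # 23 batches per epoch
total_updates = batches_per_epoch * epochs           # 2300 weight updates overall

print(n_train, batches_per_epoch, total_updates)
```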
```
Epoch 1/100
1/1 [==============================] - 0s 20ms/step - loss: 0.690 - accuracy: 0.500
...
Epoch 100/100
1/1 [==============================] - 0s 2ms/step - loss: 0.410 - accuracy: 0.850
Trained model saved to titanic_model.keras
```
Use the trained model to predict survival probabilities for passengers in the test set (`titanic_test.csv`).
```bash
sygnals-nn run \
  --model titanic_model.keras \
  --input titanic_test.csv \
  --output titanic_predictions.csv
```
- `--model titanic_model.keras`: Load the trained model.
- `--input titanic_test.csv`: Use the test data (features only, no labels).
- `--output titanic_predictions.csv`: Save predictions to a CSV file.
```
Predictions saved to titanic_predictions.csv
```
Open `titanic_predictions.csv`. It contains survival probabilities:

```
0.89
0.12
0.75
0.05
...
```

- `0.89`: High probability of survival.
- `0.12`: Low probability of survival.
Compare the predictions in `titanic_predictions.csv` with the true labels in `titanic_test_labels.csv`. Use the following Python script to calculate accuracy:
```python
import pandas as pd
import numpy as np

# Load predictions (no header row) and true labels
predictions = pd.read_csv("titanic_predictions.csv", header=None)
true_labels = pd.read_csv("titanic_test_labels.csv")

# Convert probabilities to binary predictions (1 if p > 0.5 else 0)
predicted_classes = (predictions.values > 0.5).astype(int)

# Calculate accuracy; ravel both sides so the comparison is element-wise
# (comparing an (n, 1) array against an (n,) array would broadcast to (n, n))
accuracy = np.mean(predicted_classes.ravel() == true_labels.values.ravel())
print(f"Accuracy: {accuracy * 100:.2f}%")
```
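The thresholding-and-comparison logic can be sanity-checked on toy arrays (made-up probabilities and labels, not real model output) before running it on the real files:

```python
import numpy as np

# Made-up probabilities and true labels (illustration only)
probs = np.array([0.89, 0.12, 0.75, 0.05])
truth = np.array([1, 0, 1, 1])

predicted = (probs > 0.5).astype(int)   # threshold at 0.5 -> [1, 0, 1, 0]
accuracy = np.mean(predicted == truth)  # 3 of 4 correct -> 0.75
print(accuracy)
```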
The trained model should achieve around 80-85% accuracy.
- Download the Titanic dataset:

  ```bash
  wget https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv -O titanic.csv
  ```

- Prepare the data:

  ```bash
  python prepare_titanic_data.py
  ```

- Create the model:

  ```bash
  sygnals-nn create \
    --layers 7,16,8,1 \
    --activation relu,relu,sigmoid \
    --loss binary_crossentropy \
    --optimizer adam \
    --output titanic_model.keras
  ```

- Train the model:

  ```bash
  sygnals-nn train \
    --model titanic_model.keras \
    --data titanic_train.csv \
    --epochs 100 \
    --batch-size 32 \
    --learning-rate 0.001
  ```

- Run inference:

  ```bash
  sygnals-nn run \
    --model titanic_model.keras \
    --input titanic_test.csv \
    --output titanic_predictions.csv
  ```

- Evaluate the results: Use the provided Python script to calculate accuracy.
- Select data for training/testing: Choose a dataset with inputs (features) and outputs (labels).
- Transform the data: Clean, normalize, and format the data into CSV files.
- Create the neural network: Define the structure with `sygnals-nn create`.
- Train the neural network: Teach the network using training data.
- Test the neural network: Evaluate it with testing data.
- Infer with real data: Use the trained model for predictions.