
The Real-Time Expense Tracker is a distributed big data project designed to process, analyze, and store credit card expense data in real time. Built on Apache Kafka, Apache Spark, Apache Cassandra, and Hadoop HDFS, the system is designed to move data efficiently and to scale to high volumes of financial transactions.


omerfeyzioglu/real-time-expense-tracker


Real-Time Expense Tracker

A robust real-time expense tracking system built with Apache Kafka, Apache Spark, and Apache Cassandra, featuring a Spring Boot web interface. This project implements a complete data pipeline for tracking and analyzing credit card expenses in real time.

🚀 System Architecture

The system consists of several interconnected components:

Data Flow

[Figure: data flow diagram]

Components

  • Data Generator: Generates synthetic credit card transaction data every second

  • Apache Kafka: Distributed event-streaming platform that carries the generated transactions between components through dedicated topics (v3.9.0)

  • Apache Spark: Real-time data processing and analytics (v3.5.3)

  • Apache Cassandra: NoSQL database for storing transaction data (v4.1.7)

  • Spring Boot: Web interface for viewing employee data and expenses (v3.2.0)

  • PostgreSQL: Stores employee and department information

  • HDFS: Stores employee images (AWS S3 or Google Drive can be used alternatively)
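The path from the Data Generator into Kafka can be pictured as a scheduled producer that emits one synthetic transaction per employee every second. The sketch below makes several assumptions and uses only the JDK:

- A `BlockingQueue` stands in for the Kafka topic.
- The classes `Transaction` and `SyntheticExpenseGenerator` are hypothetical.
- The expense types and the 0 to 500 amount range are invented for illustration. They are not the project's actual values.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical synthetic transaction; fields mirror the Expense model. */
final class Transaction {
    final String empno;
    final Instant dateTime;
    final String description;
    final String type;
    final int count;
    final double payment;

    Transaction(String empno, Instant dateTime, String description,
                String type, int count, double payment) {
        this.empno = empno;
        this.dateTime = dateTime;
        this.description = description;
        this.type = type;
        this.count = count;
        this.payment = payment;
    }
}

final class SyntheticExpenseGenerator {
    // Hypothetical expense types; the project's real catalogue is not documented.
    private static final String[] TYPES = {"food", "travel", "office", "fuel"};

    /** Builds one synthetic transaction per employee for a single tick. */
    static List<Transaction> generateBatch(List<String> empnos, Random rnd, Instant now) {
        List<Transaction> batch = new ArrayList<>();
        for (String empno : empnos) {
            String type = TYPES[rnd.nextInt(TYPES.length)];
            int count = 1 + rnd.nextInt(5);                                 // 1..5 items
            double payment = Math.round(rnd.nextDouble() * 50_000) / 100.0; // 0.00..500.00
            batch.add(new Transaction(empno, now, type + " purchase", type, count, payment));
        }
        return batch;
    }

    public static void main(String[] args) throws InterruptedException {
        // In-memory stand-in for the Kafka topic; a real producer would send to expenses-topic.
        BlockingQueue<Transaction> topic = new LinkedBlockingQueue<>();
        List<String> empnos = List.of("7369", "7499", "7521");
        Random rnd = new Random();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Emit one batch every second, matching the 1-second interval described above.
        scheduler.scheduleAtFixedRate(
                () -> topic.addAll(generateBatch(empnos, rnd, Instant.now())),
                0, 1, TimeUnit.SECONDS);
        Thread.sleep(3_500);
        scheduler.shutdownNow();
        System.out.println("Queued transactions: " + topic.size());
    }
}
```

Keeping `generateBatch` separate from the scheduler makes the record-building logic testable without waiting on real time.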

💡 Features

  • Real-time expense tracking and processing (1-second intervals)
  • Automated data generation for all employees
  • Per-user expense tracking and storage
  • Instant cumulative expense reporting
  • Employee information display with images
  • Department-wise expense tracking
  • Comprehensive logging with SLF4J
  • Employee management (CRUD operations)
  • Image storage and retrieval via HDFS
  • Manager hierarchy tracking
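Per-user tracking combined with instant cumulative reporting amounts to a running total per employee that is updated as each transaction arrives. This is a minimal sketch under two assumptions:

- The totals are held in memory and keyed by `empno`. The project may instead compute them in Spark or query them from Cassandra.
- The class name `CumulativeExpenseTracker` is hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory running totals keyed by empno. */
final class CumulativeExpenseTracker {
    private final Map<String, Double> totals = new ConcurrentHashMap<>();

    /** Adds one transaction's payment to the employee's running total (atomic per key). */
    void record(String empno, double payment) {
        totals.merge(empno, payment, Double::sum);
    }

    /** Cumulative total so far, or 0.0 if the employee has no transactions yet. */
    double total(String empno) {
        return totals.getOrDefault(empno, 0.0);
    }

    public static void main(String[] args) {
        CumulativeExpenseTracker tracker = new CumulativeExpenseTracker();
        tracker.record("7369", 12.5);
        tracker.record("7369", 7.5);
        tracker.record("7499", 3.0);
        System.out.println(tracker.total("7369")); // prints 20.0
    }
}
```

`ConcurrentHashMap.merge` updates each key atomically, so a consumer thread can record payments while a web request reads the current total.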

🛠 Technical Requirements

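  • Java JDK 17 or later for the Spring Boot application (Spring Boot 3.x requires Java 17); Cassandra 4.1.x itself runs on JDK 11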
  • Apache Kafka 3.9.0
  • Apache Spark 3.5.3
  • Apache Cassandra 4.1.7
  • Spring Boot 3.2.0
  • PostgreSQL
  • Maven
  • Lombok

📦 Installation & Setup

  1. Clone the Repository

    git clone https://github.com/omerfeyzioglu/real-time-expense-tracker.git
    cd real-time-expense-tracker
  2. Database Setup

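    -- Cassandra setup (run in cqlsh)
    CREATE KEYSPACE IF NOT EXISTS your_keyspace
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
    
    -- Create the table in that keyspace (must match spring.cassandra.keyspace-name)
    CREATE TABLE IF NOT EXISTS your_keyspace.expenses (
      empno text,             -- partition key: one partition per employee
      date_time timestamp,    -- clustering column: rows sorted by time
      description text,
      type text,
      count int,
      payment double,
      PRIMARY KEY (empno, date_time)
    );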
  3. Start Services

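    # Start ZooKeeper (commands run from the Kafka installation directory)
    bin/zookeeper-server-start.sh config/zookeeper.properties
    
    # Start Kafka
    bin/kafka-server-start.sh config/server.properties
    
    # Create the topic configured under kafka.topic (optional if topic auto-creation is enabled)
    bin/kafka-topics.sh --create --topic expenses-topic --bootstrap-server localhost:9092
    
    # Start Cassandra in the foreground
    cassandra -f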
  4. Build and Run

    mvn clean install
    java -jar target/project3-0.0.1-SNAPSHOT.jar
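Before running the application, it can help to confirm that Kafka, Cassandra and PostgreSQL are accepting connections. This minimal sketch only attempts plain TCP connects:

- The ports are the ones used in the Configuration section: Kafka 9092, Cassandra 9042 and PostgreSQL 5432.
- The class name `ServiceCheck` is hypothetical.
- An open port shows only that something is listening, not that the service is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical pre-flight check for the local services. */
final class ServiceCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable, or timed out
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> services = new LinkedHashMap<>();
        services.put("Kafka", 9092);
        services.put("Cassandra (CQL)", 9042);
        services.put("PostgreSQL", 5432);
        services.forEach((name, port) ->
                System.out.printf("%-16s localhost:%d %s%n", name, port,
                        isListening("localhost", port, 1000) ? "up" : "DOWN"));
    }
}
```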

🔧 Configuration

Application Properties

spring:
  datasource:
    url: jdbc:postgresql://localhost:5432/your_database
    username: your_username
    password: your_password
  
  cassandra:
    keyspace-name: your_keyspace
    contact-points: localhost
    local-datacenter: datacenter1
    port: 9042

  servlet:
    multipart:
      max-file-size: 10MB
      max-request-size: 10MB

kafka:
  bootstrap-servers: localhost:9092
  topic: expenses-topic
  group-id: expense-group
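The top-level `kafka.*` keys above sit outside Spring Boot's own `spring.kafka.*` namespace. The application therefore presumably reads them itself (for example via `@Value`) and passes them to the Kafka client. Below is a minimal sketch of that mapping:

- `bootstrap.servers`, `group.id`, `key.deserializer`, `value.deserializer` and `auto.offset.reset` are standard Kafka consumer settings.
- The class name `KafkaConsumerConfig`, the String (de)serialization and the `earliest` offset policy are assumptions.

```java
import java.util.Properties;

/** Hypothetical mapping from the custom kafka.* keys to Kafka consumer settings. */
final class KafkaConsumerConfig {
    private static final String STRING_DESERIALIZER =
            "org.apache.kafka.common.serialization.StringDeserializer";

    static Properties consumerProperties(String bootstrapServers, String groupId) {
        requireNonBlank(bootstrapServers, "kafka.bootstrap-servers");
        requireNonBlank(groupId, "kafka.group-id");
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers); // from kafka.bootstrap-servers
        props.setProperty("group.id", groupId);                    // from kafka.group-id
        // Assumes message keys and values are plain strings.
        props.setProperty("key.deserializer", STRING_DESERIALIZER);
        props.setProperty("value.deserializer", STRING_DESERIALIZER);
        // Read from the oldest retained record when the group has no committed offset.
        props.setProperty("auto.offset.reset", "earliest");
        return props;
    }

    private static void requireNonBlank(String value, String key) {
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException(key + " must be set");
        }
    }
}
```

With the `kafka-clients` library on the classpath, the resulting `Properties` could be passed to `new KafkaConsumer<String, String>(props)`. The consumer would then call `consumer.subscribe(List.of("expenses-topic"))`.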

🌐 API Endpoints

Employee Management

  • GET /employee: List all employees with their expenses
    • Optional query parameter: empnos (a list of employee numbers) to return only those employees
  • POST /employee/add: Add new employee
    • Requires employee details and optional image file
  • POST /employee/update: Update existing employee
    • Requires employee details and optional new image
  • POST /employee/delete: Delete employee and related data
    • Requires empno
  • GET /employee/image: Retrieve employee image
    • Requires imageName
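A client can call the GET endpoints with the JDK's built-in `java.net.http.HttpClient`. This sketch makes the following assumptions:

- The application listens on `http://localhost:8080`. That is Spring Boot's default port; the README does not state a port.
- `empnos` is sent as a repeated query parameter, which Spring can bind to a `List`.
- `EmployeeApiClient` is a hypothetical name.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical client for the employee endpoints. */
final class EmployeeApiClient {
    private final String baseUrl; // e.g. "http://localhost:8080" (assumed default port)
    private final HttpClient http = HttpClient.newHttpClient();

    EmployeeApiClient(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    /** GET /employee, optionally filtered with repeated empnos parameters. */
    URI employeesUri(List<Integer> empnos) {
        if (empnos.isEmpty()) {
            return URI.create(baseUrl + "/employee");
        }
        String query = empnos.stream()
                .map(n -> "empnos=" + n)
                .collect(Collectors.joining("&"));
        return URI.create(baseUrl + "/employee?" + query);
    }

    /** GET /employee/image with the file name URL-encoded. */
    URI imageUri(String imageName) {
        return URI.create(baseUrl + "/employee/image?imageName="
                + URLEncoder.encode(imageName, StandardCharsets.UTF_8));
    }

    /** Sends a GET request and returns the response body as text. */
    String getText(URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return http.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

For example, `client.getText(client.employeesUri(List.of(7369, 7499)))` would fetch two employees. An image would be better read with `HttpResponse.BodyHandlers.ofByteArray()` instead of as text.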

📝 Outputs

Kafka Consumer

The data produced by the data generator is consumed by the Kafka consumers in consumer-sh.

📊 Data Models

Employee

public class Employee {
    private Integer empno;   // Employee number (auto-generated)
    private String ename;    // Employee name
    private String job;      // Job title
    private Integer mgr;     // Manager's employee number
    private Double sal;      // Salary
    private Double comm;     // Commission
    private Integer deptno;  // Department number
    private String img;      // Image filename
}
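The `mgr` field is what makes the manager hierarchy trackable: following it upward from an employee yields the chain of managers. This minimal sketch has these assumptions:

- It works over an in-memory map from `empno` to `mgr`.
- A top-level employee has no `mgr` entry.
- The method `managerChain` is hypothetical, and the sample employee numbers are illustrative.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for walking mgr links. */
final class ManagerHierarchy {
    /**
     * Returns the managers above empno, nearest first.
     * Stops at an employee with no manager, or when a cycle is detected.
     */
    static List<Integer> managerChain(Map<Integer, Integer> mgrByEmpno, int empno) {
        List<Integer> chain = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        seen.add(empno);
        Integer current = mgrByEmpno.get(empno);
        while (current != null && seen.add(current)) { // seen.add is false on a repeat
            chain.add(current);
            current = mgrByEmpno.get(current);
        }
        return chain;
    }
}
```

The cycle guard matters because `mgr` is ordinary editable data: a bad update could make two employees each other's manager.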

Expense

public class Expense {
    private String empno;        // Employee number
    private String dateTime;     // Transaction timestamp
    private String description;  // Expense description
    private String type;         // Expense type
    private Integer count;       // Quantity
    private Double payment;      // Amount paid
}
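In the `expenses` table from the setup step, `PRIMARY KEY (empno, date_time)` makes `empno` the partition key and `date_time` a clustering column. Each employee's expenses are therefore stored together, sorted by time, and a time-range read for one employee touches a single partition. The in-memory sketch below only mimics that layout to illustrate it:

- It is not how the project or the Cassandra driver stores data.
- `ExpensePartitions` is a hypothetical name.
- It uses `Instant` where the model above holds `dateTime` as a `String`.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical in-memory model of the expenses table's partition/clustering layout. */
final class ExpensePartitions {
    // Partition key (empno) -> rows sorted by clustering column (date_time).
    private final Map<String, NavigableMap<Instant, Double>> partitions = new HashMap<>();

    /** Mirrors an INSERT: writing the same (empno, date_time) again overwrites the row. */
    void insert(String empno, Instant dateTime, double payment) {
        partitions.computeIfAbsent(empno, k -> new TreeMap<>()).put(dateTime, payment);
    }

    /** Mirrors SELECT payment ... WHERE empno = ? AND date_time >= ? AND date_time < ?. */
    List<Double> paymentsBetween(String empno, Instant fromInclusive, Instant toExclusive) {
        NavigableMap<Instant, Double> rows = partitions.get(empno);
        if (rows == null) {
            return List.of();
        }
        return new ArrayList<>(rows.subMap(fromInclusive, true, toExclusive, false).values());
    }
}
```

The overwrite behaviour mirrors Cassandra's upsert semantics. Two transactions for the same employee with an identical timestamp would collapse into one row.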

ExpenseSummary

import java.util.List;

public class ExpenseSummary {
    private Double totalAmount;        // Total expenses
    private Integer transactionCount;  // Number of transactions
    private List<Expense> expenses;    // Detailed expense records
}
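An `ExpenseSummary` can be derived from a list of `Expense` records. This sketch makes three assumptions:

- `totalAmount` is the sum of the `payment` values. The README does not say whether `payment` already accounts for `count`.
- `Expense` is trimmed down to the two fields the sketch needs.
- `summarize` is a hypothetical helper.

```java
import java.util.List;

/** Trimmed-down Expense with only the fields this sketch needs. */
final class Expense {
    final String empno;
    final Double payment; // Amount paid; may be null in this sketch

    Expense(String empno, Double payment) {
        this.empno = empno;
        this.payment = payment;
    }
}

final class ExpenseSummary {
    final Double totalAmount;
    final Integer transactionCount;
    final List<Expense> expenses;

    ExpenseSummary(Double totalAmount, Integer transactionCount, List<Expense> expenses) {
        this.totalAmount = totalAmount;
        this.transactionCount = transactionCount;
        this.expenses = expenses;
    }

    /** Hypothetical helper: sums payments (null counts as 0) and counts records. */
    static ExpenseSummary summarize(List<Expense> expenses) {
        double total = expenses.stream()
                .mapToDouble(e -> e.payment == null ? 0.0 : e.payment)
                .sum();
        return new ExpenseSummary(total, expenses.size(), List.copyOf(expenses));
    }
}
```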

Error Handling

The application includes comprehensive error handling:

  • Image upload/download validation
  • Department validation
  • Employee existence checks
  • Cascade deletion (employee, expenses, images)
  • Transaction logging
  • Proper error responses with meaningful messages
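In the `POST /employee/delete` flow, the existence check and cascade deletion work together: confirm the employee exists, then remove dependent data before the employee record. This minimal sketch uses in-memory maps as stand-ins:

- One map each stands in for PostgreSQL (employees), Cassandra (expenses) and HDFS (images).
- The class name, field names and deletion order are assumptions, not the project's code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical deletion service with in-memory stand-ins for the three stores. */
final class EmployeeDeletionService {
    final Map<Integer, String> employees = new HashMap<>();             // empno -> image file name
    final Map<String, List<Double>> expensesByEmpno = new HashMap<>();  // empno (String) -> payments
    final Map<String, byte[]> images = new HashMap<>();                 // image file name -> bytes

    /** Deletes an employee together with their expenses and image. */
    void delete(int empno) {
        if (!employees.containsKey(empno)) {
            // Existence check: fail with a meaningful message instead of silently succeeding.
            throw new NoSuchElementException("Employee " + empno + " not found");
        }
        expensesByEmpno.remove(String.valueOf(empno)); // Expense.empno is a String
        String img = employees.get(empno);
        if (img != null) {
            images.remove(img);
        }
        employees.remove(empno); // remove the parent record last
    }
}
```

Deleting the employee record last means a failure partway through leaves the employee in place. A retry can then find it and finish the cleanup.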

👥 Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE.md file for details.
