An SQL engine on top of Hadoop

The project consists of three files:

  1. selectreducer.py: performs the reduce step.
  2. selectmapper.py: performs the map step.
  3. implementation.py: contains the implementations of the SQL queries and the load function, and invokes the mapper and reducer.
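implementation.py is described as calling the mapper and reducer. One common way to run Python mapper and reducer scripts on Hadoop is Hadoop Streaming; the sketch below assumes that approach, which the README does not confirm. It only builds the command line. The jar path, the function name `build_streaming_command` and the HDFS paths are hypothetical placeholders.

```python
import shlex

# Hypothetical jar location: the real path depends on the Hadoop version and install.
STREAMING_JAR = "/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming.jar"


def build_streaming_command(input_path, output_path,
                            mapper="selectmapper.py",
                            reducer="selectreducer.py"):
    """Return a Hadoop Streaming command line as a list of arguments (sketch)."""
    return [
        "hadoop", "jar", STREAMING_JAR,
        # Generic options such as -files must precede the streaming options.
        "-files", f"{mapper},{reducer}",
        "-mapper", f"python3 {mapper}",
        "-reducer", f"python3 {reducer}",
        "-input", input_path,
        "-output", output_path,
    ]


cmd = build_streaming_command("bigdata/csv_file_name.csv", "bigdata/output")
print(shlex.join(cmd))
```

To actually launch the job, the list can be passed to `subprocess.run(cmd, check=True)`. Hadoop refuses to write into an output directory that already exists, so each query needs a fresh output path.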

Currently supported SQL operations:

  1. select
  2. project
  3. load
  4. min
  5. max
  6. sum
  7. where
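To illustrate how where, project and the min/max/sum aggregates fit the MapReduce model, here is a local, in-memory simulation. It assumes rows are dicts, that the mapper filters with the where condition and emits the projected column under a single constant key, and that the reducer folds those values. The names `mapper`, `reducer` and `AGGREGATES` are hypothetical and are not taken from selectmapper.py or selectreducer.py.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical aggregate table; the real reducer may be organised differently.
AGGREGATES = {"min": min, "max": max, "sum": sum}


def mapper(rows, column, predicate):
    """Map phase: keep rows that satisfy WHERE and emit the projected column."""
    for row in rows:
        if predicate(row):
            # One constant key routes every value to the same reduce call,
            # which is what an aggregate without GROUP BY needs.
            yield "all", row[column]


def reducer(pairs, func_name):
    """Reduce phase: fold all values that share a key with MIN, MAX or SUM."""
    aggregate = AGGREGATES[func_name]
    # Hadoop sorts map output by key before reducing; sorted() stands in for that shuffle.
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, aggregate(value for _, value in group)


def is_adult(row):
    return row["age"] >= 18


rows = [
    {"name": "a", "age": 31},
    {"name": "b", "age": 17},
    {"name": "c", "age": 45},
]
for func_name in ("min", "max", "sum"):
    print(func_name, dict(reducer(mapper(rows, "age", is_adult), func_name)))
```

If no row matches the condition, the mapper emits nothing and the reducer returns no result instead of calling `min` or `max` on an empty sequence.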

What are the requirements?

  1. Hadoop
  2. Python 3

How to run the interpreter?

python3 implementation.py
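The interpreter reads statements and runs them. A minimal sketch of such a loop, dispatching each statement on its leading keyword, is shown below. The handler functions are hypothetical stubs that only echo the statement; they are not the functions in implementation.py.

```python
import sys


# Hypothetical stub handlers; implementation.py's real functions will differ.
def handle_load(statement):
    return f"LOAD: {statement}"


def handle_select(statement):
    return f"SELECT: {statement}"


HANDLERS = {"load": handle_load, "select": handle_select}


def dispatch(statement):
    """Route one statement to a handler chosen by its first keyword."""
    words = statement.split(maxsplit=1)
    if not words:
        return None  # blank input line: nothing to run
    handler = HANDLERS.get(words[0].lower())
    if handler is None:
        return f"Unsupported statement: {words[0]}"
    return handler(statement.strip())


def repl(stream=sys.stdin):
    """Read one statement per line until end of input (Ctrl-D on a terminal)."""
    for line in stream:
        result = dispatch(line)
        if result is not None:
            print(result)
```

Calling `repl()` reads from standard input; passing an `io.StringIO` instead makes the loop easy to test without a terminal.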

Syntax

  1. LOAD

load bigdata/csv_file_name.csv AS (column_name1:data_type,column_name2:data_type,...)
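A statement in the LOAD form above can be split into the CSV path and a list of (column, type) pairs. The following is a sketch assuming a regular-expression parser; `LOAD_RE`, `parse_load` and the example file `bigdata/people.csv` are hypothetical, and the allowed types follow the note that data_type can be int, float or str.

```python
import re

LOAD_RE = re.compile(
    r"^\s*load\s+(?P<path>\S+)\s+as\s*\((?P<schema>[^)]*)\)\s*$",
    re.IGNORECASE,
)
ALLOWED_TYPES = {"int", "float", "str"}


def parse_load(statement):
    """Split a LOAD statement into the CSV path and a list of (column, type) pairs."""
    match = LOAD_RE.match(statement)
    if match is None:
        raise ValueError(f"not a LOAD statement: {statement!r}")
    schema = []
    for item in match.group("schema").split(","):
        name, _, dtype = item.partition(":")
        name, dtype = name.strip(), dtype.strip().lower()
        if not name or dtype not in ALLOWED_TYPES:
            raise ValueError(f"bad column spec: {item.strip()!r}")
        schema.append((name, dtype))
    return match.group("path"), schema


path, schema = parse_load("load bigdata/people.csv AS (name:str,age:int,score:float)")
print(path)    # bigdata/people.csv
print(schema)  # [('name', 'str'), ('age', 'int'), ('score', 'float')]
```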

  2. SELECT

select column_name1 from table_name where condition
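The SELECT form above can be parsed into the projected columns, the table name and an optional where condition, and a simple condition of the form `column op value` can be turned into a row predicate. This is a sketch under stated assumptions: comma-separated columns, a single comparison in the where clause, and rows stored as dicts with already-typed values. All names here are hypothetical.

```python
import operator
import re

SELECT_RE = re.compile(
    r"^\s*select\s+(?P<columns>.+?)\s+from\s+(?P<table>\S+)"
    r"(?:\s+where\s+(?P<condition>.+?))?\s*$",
    re.IGNORECASE,
)
# Longer operators come first so that ">=" is not read as ">".
COND_RE = re.compile(r"^\s*(\w+)\s*(<=|>=|!=|=|<|>)\s*(.+?)\s*$")
OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def parse_select(statement):
    """Return (columns, table, condition); condition is None without WHERE."""
    match = SELECT_RE.match(statement)
    if match is None:
        raise ValueError(f"not a SELECT statement: {statement!r}")
    columns = [c.strip() for c in match.group("columns").split(",")]
    return columns, match.group("table"), match.group("condition")


def parse_literal(text):
    """Quoted text becomes a str; otherwise try int, then float, else keep the text."""
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "'\"":
        return text[1:-1]
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def make_predicate(condition):
    """Build a row -> bool function from a single 'column op value' comparison."""
    match = COND_RE.match(condition)
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    column, op, literal = match.groups()
    value = parse_literal(literal)
    return lambda row: OPS[op](row[column], value)


columns, table, condition = parse_select("select name, age from people where age >= 18")
is_match = make_predicate(condition)
print(columns, table, condition)
print(is_match({"name": "bob", "age": 17}))
```

Comparing values of different types (for example a str column against an int literal with `<`) raises `TypeError` in Python 3, which is one reason to cast columns to their declared types at load time.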

Note: in LOAD, data_type can be int, float or str.
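The declared types only matter once each CSV field is converted. A sketch of that conversion, assuming the CSV has no header row and the schema is a list of (column, type) pairs with types int, float or str; `CASTS` and `typed_rows` are hypothetical names.

```python
import csv
import io

CASTS = {"int": int, "float": float, "str": str}


def typed_rows(csv_text, schema):
    """Yield dicts whose values are converted according to the (column, type) schema."""
    for fields in csv.reader(io.StringIO(csv_text)):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(schema):
            raise ValueError(f"expected {len(schema)} fields, got {len(fields)}: {fields}")
        yield {name: CASTS[dtype](value) for (name, dtype), value in zip(schema, fields)}


schema = [("name", "str"), ("age", "int"), ("score", "float")]
rows = list(typed_rows("alice,31,88.5\nbob,17,92\n", schema))
print(rows)
```

A malformed value such as `abc` in an int column raises `ValueError` from `int()`, so bad input is reported at load time rather than during a later comparison.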

What has been done?

  1. Implemented select and project.
  2. Implemented the aggregate functions MAX, COUNT and SUM.
  3. Implemented loading of CSV files into Hadoop.