Internet Traffic
There are two scripts for loading the data, both of which are under the load/ directory.
They expect the two CSV files (1) and (2) to be located at /var/lib/mysql-files.
However, if you are running this on Windows, then by default the CSV files must be in C:\ProgramData\MySQL\MySQL Server 8.0\Uploads.
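As a small illustration of the two default locations mentioned above, the sketch below picks the expected directory from the operating system. The helper name default_upload_dir is hypothetical and is not part of the load scripts; the two paths are the ones given in the text.

```python
import platform
from pathlib import PurePosixPath, PureWindowsPath

# Default directories taken from the text above.
LINUX_UPLOAD_DIR = PurePosixPath("/var/lib/mysql-files")
WINDOWS_UPLOAD_DIR = PureWindowsPath(r"C:\ProgramData\MySQL\MySQL Server 8.0\Uploads")


def default_upload_dir(system=None):
    """Return the directory where the load scripts expect the CSV files.

    `system` defaults to the value reported by platform.system().
    """
    system = system or platform.system()
    return WINDOWS_UPLOAD_DIR if system == "Windows" else LINUX_UPLOAD_DIR


print(default_upload_dir())
```

If your server was installed with a non-default configuration, the directory may differ, so treat these paths as defaults rather than guarantees.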
To load the data, first create a database (e.g. CREATE DATABASE InternetTraffic; followed by USE InternetTraffic;).
Then run the script load_base_tables.sql first and, finally, run load_specialized_tables.sql.
For example:
CREATE DATABASE InternetTraffic;
USE InternetTraffic;
SOURCE load_base_tables.sql;
SOURCE load_specialized_tables.sql;
After these commands, the data should be loaded without errors or warnings.
The client application was created using Python and the MySQL Python connector library. Documentation for the library can be found here. Furthermore, this application requires that you have set up the database previously discussed and that it is running on 'localhost'. All the source code can be found in the "main.py" file. Lastly, certain libraries must be installed for the application to run. Please install them with pip as follows:
pip install mysql-connector-python
The csv and datetime modules ship with the Python standard library, so they do not need to be installed with pip.
Note that if any other libraries are required, you can most likely install them by running "pip install" followed by the library name.
After setting up the database and installing the libraries, you can begin using the application. By default, your MySQL server should have a root user. When you run the application for the first time, it will prompt you for your username and password. You must enter "root" for the username and the corresponding password when asked. Once you have done so, you can proceed to use the application as an admin, because the "root" user by default has unrestricted access to the whole database. There are two user types in this application, described below.
Admins are able to use all the features of this application. They can create new users (admin or regular users); query, update, insert, and delete data; and edit the permissions of regular users. These commands will be discussed later in the documentation.
Users are regular users who cannot create new users. When they are first created by an admin, they do not have any privilege to access the database. The admin must manually grant these permissions. However, if given the permission to do so, regular users may query, update, insert, and delete data from the database.
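The first-run login described above can be sketched roughly as follows. This is not the code from main.py: the function names are hypothetical, and the database name is assumed to be the InternetTraffic name used in the loading example. The connector is imported inside the function so the configuration helper can be used even where the library is not installed.

```python
import getpass


def build_connection_config(username, password, database="InternetTraffic"):
    """Collect the connection settings described above into one dict."""
    return {
        "host": "localhost",   # the application expects a local server
        "user": username,
        "password": password,
        "database": database,  # assumed: name from the CREATE DATABASE example
    }


def prompt_and_connect():
    """Prompt for credentials and open a connection (needs a running server)."""
    import mysql.connector  # lazy import: only needed when actually connecting

    username = input("Username: ")
    password = getpass.getpass("Password: ")  # does not echo the password
    return mysql.connector.connect(**build_connection_config(username, password))
```

On the first run you would answer the prompt with "root" and the root password; later runs can use any account an admin has created.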
In this section we will outline all the different commands that are available in this application.
The show command is used to display network statistics to the user. In addition, a CSV file containing the results will be generated.
-a: show all data
-flowbytes: show flow bytes data (cannot be used with -a)
-flowflags: show flow flags data (cannot be used with -a)
-flowiat: show inter-arrival times (cannot be used with -a)
-flowinfo: show additional flow info such as time active, time idle, etc (cannot be used with -a)
-flowpackets: show flow packets info (cannot be used with -a)
-protocol: show flow protocol info (cannot be used with -a)
-clients: show all clients (cannot be used with any other options)
-limit <int>: specify the limit on the number of results returned (default set to 50)
-filename <string>: specify the output filename (default set to out.csv)
-d <start date> <end date>: show network data between a specified start date and end date
-sortasc <sort parameter>: sort data in ascending order based on sort parameter (one of 'timestamp', 'protocol_id', 'label', 'bytes_per_second', 'syn_flag_count', 'duration')
NOTE: you can only use bytes_per_second in conjunction with -a or -flowbytes
NOTE: you can only use syn_flag_count in conjunction with -a or -flowflags
-sortdesc <sort parameter>: sort data in descending order based on sort parameter (one of 'timestamp', 'protocol_id', 'label', 'bytes_per_second', 'syn_flag_count', 'duration')
NOTE: you can only use bytes_per_second in conjunction with -a or -flowbytes
NOTE: you can only use syn_flag_count in conjunction with -a or -flowflags
-ddos: show network data that corresponds to a DDoS attack
-source <source ip>: show network data corresponding to a particular source
-dest <destination ip>: show network data corresponding to a particular destination
#Here are some example show commands
show -a -limit 100 -filename test.csv #shows the earliest 100 flow entries in the database
show -clients #shows all the clients
show -flowbytes -flowflags -sortasc timestamp -d 2017-04-26 11:11:11 2017-04-27 11:11:11 #shows flow byte and flag data between two dates sorted in ascending order based on timestamp
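To make the option rules above concrete, here is a minimal sketch of a parser that applies them: the defaults for -limit and -filename, the restriction on combining -a or -clients with other options, and the sort parameter constraints. It is an illustration written for this document, not the parser in main.py, and the function and dictionary key names are assumptions.

```python
from datetime import datetime

VIEWS = {"-flowbytes", "-flowflags", "-flowiat", "-flowinfo", "-flowpackets", "-protocol"}
SORT_KEYS = {"timestamp", "protocol_id", "label", "bytes_per_second",
             "syn_flag_count", "duration"}
# Sort keys that are only valid together with -a or the named view.
SORT_NEEDS_VIEW = {"bytes_per_second": "-flowbytes", "syn_flag_count": "-flowflags"}
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_show(command):
    """Parse a show command into a dict of options; raise ValueError on misuse."""
    tokens = command.split()
    if not tokens or tokens[0] != "show":
        raise ValueError("expected a show command")
    args = tokens[1:]
    opts = {"all": False, "clients": False, "ddos": False, "views": set(),
            "limit": 50, "filename": "out.csv"}

    def take(count):
        # Remove and return the next `count` tokens, failing if too few remain.
        if len(args) < count:
            raise ValueError("missing argument value")
        taken = args[:count]
        del args[:count]
        return taken

    while args:
        flag = take(1)[0]
        if flag == "-a":
            opts["all"] = True
        elif flag in ("-clients", "-ddos"):
            opts[flag[1:]] = True
        elif flag in VIEWS:
            opts["views"].add(flag)
        elif flag == "-limit":
            opts["limit"] = int(take(1)[0])
        elif flag == "-filename":
            opts["filename"] = take(1)[0]
        elif flag in ("-source", "-dest"):
            opts[flag[1:]] = take(1)[0]
        elif flag in ("-sortasc", "-sortdesc"):
            opts["sort"] = (take(1)[0], "ASC" if flag == "-sortasc" else "DESC")
        elif flag == "-d":
            # Each date spans two tokens: the date and the time of day.
            start, end = " ".join(take(2)), " ".join(take(2))
            datetime.strptime(start, DATE_FORMAT)  # raises ValueError if malformed
            datetime.strptime(end, DATE_FORMAT)
            opts["start"], opts["end"] = start, end
        else:
            raise ValueError(f"unknown option: {flag}")

    if opts["clients"] and tokens[1:] != ["-clients"]:
        raise ValueError("-clients cannot be combined with other options")
    if opts["all"] and opts["views"]:
        raise ValueError("-a cannot be combined with the individual view options")
    if "sort" in opts:
        key = opts["sort"][0]
        if key not in SORT_KEYS:
            raise ValueError(f"unknown sort parameter: {key}")
        needed = SORT_NEEDS_VIEW.get(key)
        if needed and not (opts["all"] or needed in opts["views"]):
            raise ValueError(f"{key} requires -a or {needed}")
    return opts


print(parse_show("show -a -limit 100 -filename test.csv"))
```

Note that the sketch does not strip trailing # comments, so the example commands above would need their comments removed before being passed to it.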
The update command is used to update the information for a pre-existing flow.
flowid: the id for the flow you are trying to update
Following the flowid are the series of attributes followed by the update values.
-source_ip : update the source ip for the given flow
-source_port : update the source port for the given flow
-destination_ip : update the destination ip for the given flow
-destination_port : update the destination port for the given flow
-protocol_id : update the protocol id for the given flow
-timestamp : update the timestamp (eg datetime 2017-04-26 11:11:11) for the given flow
-duration : update the duration for the given flow
-label : update the label for the given flow
-bytes_per_second : update the bytes per second for the given flow
-fwd_bytes_bulk_avg : update the average bytes bulk rate in the forward direction for the given flow
-bwd_bytes_bulk_avg : update the average bytes bulk rate in the backward direction for the given flow
-fwd_subflow_bytes_avg : update the average bytes in a subflow in the forward direction for the given flow
-bwd_subflow_bytes_avg : update the average bytes in a subflow in the backward direction for the given flow
-fwd_init_win_bytes : update the total bytes sent in the initial window in the forward direction for the given flow
-bwd_init_win_bytes : update the total bytes sent in the initial window in the backward direction for the given flow
-fwd_psh_flags : update the number of packets sent in the forward direction that had the PSH flag set to 1 for the given flow
-bwd_psh_flags : update the number of packets sent in the backward direction that had the PSH flag set to 1 for the given flow
-fwd_urg_flags : update the number of packets sent in the forward direction that had the URG flag set to 1 for the given flow
-bwd_urg_flags : update the number of packets sent in the backward direction that had the URG flag set to 1 for the given flow
-fin_flag_count : update the number of packets sent in the flow that had the FIN flag set to 1 for the given flow
-syn_flag_count : update the number of packets sent in the flow that had the SYN flag set to 1 for the given flow
-rst_flag_count : update the number of packets sent in the flow that had the RST flag set to 1 for the given flow
-psh_flag_count : update the number of packets sent in the flow that had the PSH flag set to 1 for the given flow
-ack_flag_count : update the number of packets sent in the flow that had the ACK flag set to 1 for the given flow
-urg_flag_count : update the number of packets sent in the flow that had the URG flag set to 1 for the given flow
-cwe_flag_count : update the number of packets sent in the flow that had the CWE flag set to 1 for the given flow
-ece_flag_count : update the number of packets sent in the flow that had the ECE flag set to 1 for the given flow
-iat_mean : update the mean inter-arrival time of the flow
-iat_std : update the standard deviation of the inter-arrival time of the flow
-iat_max : update the maximum inter-arrival time of the flow
-iat_min : update the minimum inter-arrival time of the flow
-fwd_iat_total : update the total inter-arrival time in the forward direction of the flow
-bwd_iat_total : update the total inter-arrival time in the backward direction of the flow
-fwd_iat_mean : update the mean inter-arrival time in the forward direction of the flow
-bwd_iat_mean : update the mean inter-arrival time in the backward direction of the flow
-fwd_iat_std : update the standard deviation of the inter-arrival time in the forward direction of the flow
-bwd_iat_std : update the standard deviation of the inter-arrival time in the backward direction of the flow
-fwd_iat_max : update the maximum inter-arrival time in the forward direction of the flow
-bwd_iat_max : update the maximum inter-arrival time in the backward direction of the flow
-fwd_iat_min : update the minimum inter-arrival time in the forward direction of the flow
-bwd_iat_min : update the minimum inter-arrival time in the backward direction of the flow
-fwd_header_length : update the forward header length for the given flow
-bwd_header_length : update the backward header length for the given flow
-down_up_ratio : update the download/upload ratio for the given flow
-fwd_segment_size_avg : update the average segment size in the forward direction for the given flow
-bwd_segment_size_avg : update the average segment size in the backward direction for the given flow
-fwd_bulk_rate_avg : update the average bulk rate in the forward direction for the given flow
-bwd_bulk_rate_avg : update the average bulk rate in the backward direction for the given flow
-fwd_segment_size_min : update the minimum segment size in the forward direction for the given flow
-active_time_mean : update the mean time the flow was active before becoming idle for the given flow
-active_time_std : update the standard deviation of the time the flow was active before becoming idle for the given flow
-active_time_max : update the maximum time the flow was active before becoming idle for the given flow
-active_time_min : update the minimum time the flow was active before becoming idle for the given flow
-idle_time_mean : update the mean time the flow was idle for the given flow
-idle_time_std : update the standard deviation of the time the flow was idle for the given flow
-idle_time_max : update the maximum time the flow was idle for the given flow
-idle_time_min : update the minimum time the flow was idle for the given flow
-fwd_packets : update the number of packets in the forward direction for the given flow
-bwd_packets : update the number of packets in the backward direction for the given flow
-fwd_packets_bytes : update the total size in bytes of the packets in the forward direction for the given flow
-bwd_packets_bytes : update the total size in bytes of the packets in the backward direction for the given flow
-fwd_packets_bytes_max : update the maximum value in bytes of a packet in the forward direction for the given flow
-fwd_packets_bytes_min : update the minimum value in bytes of a packet in the forward direction for the given flow
-fwd_packets_bytes_mean : update the mean value in bytes of a packet in the forward direction for the given flow
-fwd_packets_bytes_std : update the standard deviation in bytes of packet size in the forward direction for the given flow
-bwd_packets_bytes_max : update the maximum value in bytes of a packet in the backward direction for the given flow
-bwd_packets_bytes_min : update the minimum value in bytes of a packet in the backward direction for the given flow
-bwd_packets_bytes_mean : update the mean value in bytes of a packet in the backward direction for the given flow
-bwd_packets_bytes_std : update the standard deviation in bytes of packet size in the backward direction for the given flow
-packets_per_second : update the packets per second for the given flow
-fwd_packets_per_second : update the packets per second in the forward direction for the given flow
-bwd_packets_per_second : update the packets per second in the backward direction for the given flow
-packet_length_min : update the minimum packet length for the given flow
-packet_length_max : update the maximum packet length for the given flow
-packet_length_mean : update the mean packet length for the given flow
-packet_length_std : update the standard deviation of the packet length for the given flow
-packet_length_variance : update the packet length variance for the given flow
-packet_size_avg : update the average packet size for the given flow
-fwd_packets_bulk_avg : update the average packets bulk in the forward direction for the given flow
-bwd_packets_bulk_avg : update the average packets bulk in the backward direction for the given flow
-fwd_subflow_packets_avg : update the average packets in subflow in the forward direction for the given flow
-bwd_subflow_packets_avg : update the average packets in subflow in the backward direction for the given flow
-fwd_act_data_packets : update the number of packets in the forward direction with at least one byte of TCP data payload for the given flow
#Here are some sample update commands
update 420 -duration 9000 #updates the duration for flow 420 with 9000
update 888 -fwd_psh_flags 2 -bwd_psh_flags 2 -psh_flag_count 4 #updates fwd_psh_flags, bwd_psh_flags, and psh_flag_count with 2, 2, and 4 respectively
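One way to picture how an update command maps to SQL is the sketch below. It splits the command into a flow id and attribute/value pairs, then builds a parameterized UPDATE statement. The table name, the flow_id column name, and the assumption that a timestamp value spans two tokens are all illustrative; the real schema spreads these attributes over several tables, and main.py may handle this differently.

```python
import re

ATTRIBUTE = re.compile(r"^[a-z_]+$")  # attribute names are lowercase with underscores


def parse_update(command):
    """Split 'update <flowid> -<attr> <value> ...' into (flow_id, changes)."""
    tokens = command.split()
    if len(tokens) < 4 or tokens[0] != "update":
        raise ValueError("usage: update <flowid> -<attribute> <value> ...")
    flow_id = int(tokens[1])
    rest = tokens[2:]
    changes = {}
    i = 0
    while i < len(rest):
        flag = rest[i]
        if not flag.startswith("-") or not ATTRIBUTE.match(flag[1:]):
            raise ValueError(f"expected an attribute option, got: {flag}")
        name = flag[1:]
        # Assumption: a timestamp is written as two tokens (date and time).
        width = 2 if name == "timestamp" else 1
        value = rest[i + 1:i + 1 + width]
        if len(value) != width:
            raise ValueError(f"missing value for {flag}")
        changes[name] = " ".join(value)
        i += 1 + width
    return flow_id, changes


def build_update_sql(table, flow_id, changes):
    """Build a parameterized UPDATE; `table` and flow_id column are hypothetical."""
    assignments = ", ".join(f"{column} = %s" for column in changes)
    sql = f"UPDATE {table} SET {assignments} WHERE flow_id = %s"
    return sql, list(changes.values()) + [flow_id]


flow_id, changes = parse_update("update 420 -duration 9000")
print(build_update_sql("Flow", flow_id, changes))
```

Passing the values as parameters, rather than formatting them into the SQL string, leaves quoting to the connector; only the column names are interpolated, and those are restricted by the ATTRIBUTE pattern.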
The delete command deletes all data pertaining to a particular flow id
flowid: the id for the flow you are trying to delete
#Here are example delete commands
delete 333 #delete flow with id 333
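Since delete removes all data for a flow id, it presumably touches every table that stores per-flow information. The sketch below shows one possible shape for that, producing one parameterized DELETE per table. The table names and the flow_id column are assumptions made for illustration and may not match the schema created by the load scripts.

```python
# Hypothetical table names, one per information type named in this document.
# Detail tables come first and the main flow table last, in case the detail
# tables reference it through foreign keys.
FLOW_TABLES = ["FlowBytes", "FlowFlags", "FlowIAT", "FlowInfo", "FlowPackets", "Flow"]


def build_delete_statements(command):
    """Turn 'delete <flowid>' into a list of (sql, params) pairs."""
    tokens = command.split()
    if len(tokens) != 2 or tokens[0] != "delete":
        raise ValueError("usage: delete <flowid>")
    flow_id = int(tokens[1])  # raises ValueError if the id is not an integer
    return [(f"DELETE FROM {table} WHERE flow_id = %s", (flow_id,))
            for table in FLOW_TABLES]


for sql, params in build_delete_statements("delete 333"):
    print(sql, params)
```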
The insert command is used to insert new information into the database
infoType: specify the type of data being inserted (one of "flow", "flowbytes", "flowflags", "flowiat", "flowinfo", "flowpackets")
flowdata: depending on the information type that is being inserted, the arguments following the infoType must include all of the information pertaining to the information type (see the update section for the parameters required and note that they must follow the same order)
#Here are example insert commands
insert flow 10.200.7.7 6969 172.19.1.46 42069 1 2017-04-26 11:11:11 4234444 BENIGN #inserts a new flow
insert flowbytes 3830175 12000000.0000000000000000 0.0000 0.0000 12.0000000000000000 0.0000000000000000 490 -1 #inserts information for flow with id 3830175
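Because the arguments are positional, it can help to see how the first example lines up with the attribute names. The sketch below pairs the values of an insert flow command with the first eight attributes from the update section, on the assumption that this is the expected order and that the timestamp occupies two tokens. The function name is hypothetical.

```python
# Assumed order: the first eight attributes listed in the update section.
FLOW_FIELDS = ["source_ip", "source_port", "destination_ip", "destination_port",
               "protocol_id", "timestamp", "duration", "label"]


def parse_insert_flow(command):
    """Map the positional values of 'insert flow ...' to attribute names."""
    tokens = command.split()
    if tokens[:2] != ["insert", "flow"]:
        raise ValueError("expected an 'insert flow' command")
    values = tokens[2:]
    # The timestamp is split into a date token and a time token.
    if len(values) != len(FLOW_FIELDS) + 1:
        raise ValueError(f"expected {len(FLOW_FIELDS) + 1} values, got {len(values)}")
    index = FLOW_FIELDS.index("timestamp")
    values[index:index + 2] = [" ".join(values[index:index + 2])]
    return dict(zip(FLOW_FIELDS, values))


row = parse_insert_flow(
    "insert flow 10.200.7.7 6969 172.19.1.46 42069 1 2017-04-26 11:11:11 4234444 BENIGN"
)
for name, value in row.items():
    print(f"{name}: {value}")
```

The other information types would follow the same pattern with their own field lists, starting with the flow id as in the flowbytes example.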
The createuser command is used to create a new regular user
username: the username of the new user (must be unique)
password: the password of the new user
#example createuser command
createuser martinguo iloveECE356EvenMoreThanECE240 #creates a new user
The createadmin command is the same as createuser but instead creates an admin
username: the username of the new admin (must be unique)
password: the password of the new admin
#example createadmin command
createadmin Huanyi best356TA2021 #creates a new admin
The grantuserpermission command grants permissions to a regular user (must be an admin to use this command)
privilege type: one of 'select' (allows viewing), 'update' (allows use of update command), 'insert' (allows use of insert command), and 'delete' (allows use of delete command)
information types: type of information that permission pertains to (one or more of 'flow', 'flowbytes', 'flowflags', 'flowiat', 'flowinfo', 'flowpackets', 'source', 'protocol')
username: username of the user that you are granting permission to
#example grantuserpermission command
grantuserpermission select flow flowbytes MartiniGuo #should allow MartiniGuo to query on flowbytes and flow given the user exists
The revokeuserpermission command works the same way as the grantuserpermission command but instead it revokes permission for that user
privilege type: one of 'select' (revokes viewing), 'update' (revokes use of update command), 'insert' (revokes use of insert command), and 'delete' (revokes use of delete command)
information types: type of information that permission pertains to (one or more of 'flow', 'flowbytes', 'flowflags', 'flowiat', 'flowinfo', 'flowpackets', 'source', 'protocol')
username: username of the user that you are revoking permission from
#example revokeuserpermission command
revokeuserpermission select flow flowbytes MartiniGuo #should remove MartiniGuo's permissions to view flow and flowbytes
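The grant and revoke commands above correspond naturally to MySQL GRANT and REVOKE statements. The following sketch shows one plausible translation. It assumes each information type maps to a table of the same name in the InternetTraffic database and that users are created for 'localhost'; neither detail is confirmed by this document, and the function name is hypothetical.

```python
PRIVILEGES = {"select", "update", "insert", "delete"}
INFO_TYPES = {"flow", "flowbytes", "flowflags", "flowiat", "flowinfo",
              "flowpackets", "source", "protocol"}
TEMPLATES = {
    "grantuserpermission": "GRANT {priv} ON {db}.{table} TO '{user}'@'localhost'",
    "revokeuserpermission": "REVOKE {priv} ON {db}.{table} FROM '{user}'@'localhost'",
}


def build_permission_statements(command, database="InternetTraffic"):
    """Translate a grant/revoke command into one SQL statement per info type."""
    tokens = command.split()
    if len(tokens) < 4 or tokens[0] not in TEMPLATES:
        raise ValueError("usage: <grant|revoke>userpermission <privilege> <types...> <username>")
    action, privilege, *info_types, username = tokens
    if privilege not in PRIVILEGES:
        raise ValueError(f"unknown privilege type: {privilege}")
    unknown = [t for t in info_types if t not in INFO_TYPES]
    if unknown:
        raise ValueError(f"unknown information types: {unknown}")
    if not username.isalnum():
        # The username is formatted into the statement, so keep it strict.
        raise ValueError("username must be alphanumeric")
    return [TEMPLATES[action].format(priv=privilege.upper(), db=database,
                                     table=table, user=username)
            for table in info_types]


for statement in build_permission_statements(
        "grantuserpermission select flow flowbytes MartiniGuo"):
    print(statement)
```

Validating the privilege, information types, and username against fixed sets matters here because account names in GRANT and REVOKE are written directly into the statement text.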
All the data mining files can be found in the "Data-Mining" folder. We have included the output results in the folder as well because running the data mining code can be time consuming. Note that running the data mining code should yield the same results. The data mining program requires that the database previously discussed has been created. In addition, the following Python libraries need to be available: pandas, numpy, matplotlib, sklearn, mysql.connector, csv, six, IPython.display, and pydotplus. These are import names: csv is part of the Python standard library, and on pip sklearn is published as scikit-learn and mysql.connector as mysql-connector-python. All of the other libraries were installed using pip, with a command of the form "pip install" followed by the library name. However, it was noted that there was an issue with graphviz, which is used by the sklearn library. Please follow the instructions on this link to fix the issue (assumes that you are running Windows).
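Before running the data mining code, it may be convenient to check which of the libraries listed above are importable in the current environment. The sketch below does this with the standard library only; the function name missing_modules is hypothetical and the script is not part of the "Data-Mining" folder.

```python
import importlib.util

# Import names as listed in the paragraph above.
REQUIRED_MODULES = ["pandas", "numpy", "matplotlib", "sklearn", "mysql.connector",
                    "csv", "six", "IPython.display", "pydotplus"]


def missing_modules(names):
    """Return the subset of `names` that cannot be located for import."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised for dotted names when the parent package is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


print("Missing libraries:", missing_modules(REQUIRED_MODULES))
```

Anything reported as missing can then be installed with pip before starting the data mining program.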