Freely available data for machine learning applied to telecommunications

Ailton Oliveira edited this page Jan 5, 2021 · 21 revisions

Reference

If you use any data or code, please cite:

[1] "5G MIMO Data for Machine Learning: Application to Beam-Selection using Deep Learning", Aldebaro Klautau, Pedro Batista, Nuria Gonzalez-Prelcic, Yuyang Wang and Robert W. Heath Jr, ITA'2018 (available at http://ita.ucsd.edu/workshop/18/files/paper/paper_3313.pdf).

Bibtex entry:
@inproceedings{Klautau18,
  author    = {Aldebaro Klautau and Pedro Batista and Nuria Gonzalez-Prelcic and Yuyang Wang and Robert W. {Heath Jr.}},
  title     = {{5G} {MIMO} Data for Machine Learning: Application to Beam-Selection using Deep Learning},
  booktitle = {2018 Information Theory and Applications Workshop, San Diego},
  pages     = {1--1},
  year      = {2018},
  url       = {http://ita.ucsd.edu/workshop/18/files/paper/paper_3313.pdf}
}

Research databases

Read about the datasets in Raymobtime. Other datasets can be downloaded here. This directory contains a table with the specifications of the simulations and figures of the scenarios colored by height. Information and code to interpret the HDF5 files: download here.

5GMdata as SQLAlchemy/SQLite databases (outputs of Stage 2)

urban_cannyon_v2i.5gmv1

Name: urban_cannyon_v2i.5gmv1 (download here). Creation date: Feb. 02, 2018.

Information about dataset

This dataset (see reference [1]) contains simulated mmWave (60 GHz) channel data for a V2I setup. It has 116 episodes with 50 scenes each, totaling 5,800 ray-tracing simulations that yielded 41,023 channels between the transmitter and a mobile receiver. There is 1 transmitter (Tx, at the RSU) and up to 10 receivers (Rx, antennas on top of cars) per scene. When a receiver is not present in the analysis region, its data is NaN. If all 116 episodes had 50 scenes with 10 valid receivers, the total number of valid Tx / Rx pairs (each with a channel characterized by 25 rays) would be 116 x 50 x 10 = 58,000. However, there are only 41,023 valid pairs in this dataset because in some cases the receiver of interest is not within the analysis region (e.g., it turned and is no longer on the observed street). The sampling time (interval between consecutive scenes in an episode) was 100 ms. Tx and Rx antennas are omnidirectional. For each Tx / Rx pair, the 25 strongest rays were collected.
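The counts quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the numbers stated in the description; the constant names are chosen here for illustration and do not come from the dataset code.

```python
# Dataset sizes as stated in the description above (names are illustrative).
EPISODES = 116
SCENES_PER_EPISODE = 50
MAX_RECEIVERS_PER_SCENE = 10
VALID_PAIRS = 41_023          # valid Tx / Rx pairs reported for this dataset
SAMPLING_INTERVAL_MS = 100    # interval between consecutive scenes

# One ray-tracing simulation per scene.
simulations = EPISODES * SCENES_PER_EPISODE

# Upper bound on Tx / Rx pairs if every receiver were always in the analysis region.
max_pairs = simulations * MAX_RECEIVERS_PER_SCENE

# Share of the possible pairs that actually have channel data (the rest are NaN).
valid_fraction = VALID_PAIRS / max_pairs

# Time between the first and last scene of an episode (49 intervals).
episode_span_ms = (SCENES_PER_EPISODE - 1) * SAMPLING_INTERVAL_MS

print(simulations)               # 5800
print(max_pairs)                 # 58000
print(f"{valid_fraction:.1%}")   # 70.7%
print(episode_span_ms)           # 4900
```

Roughly 29% of the receiver slots are therefore NaN and should be filtered out before training.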

5GMdata as files in formats friendly to machine learning software or further processing (outputs of Stage 3)

urban_canyon_v2i_5gmv1_rays

Name: urban_canyon_v2i_5gmv1_rays. Creation date: Mar. 05, 2018.

The urban_canyon_v2i_5gmv1_rays dataset was created from urban_cannyon_v2i.5gmv1 using convert5gmv1ToChannels.py. See the description of urban_cannyon_v2i.5gmv1 for more information. The urban_canyon_v2i_5gmv1_rays dataset contains the information about the 25 rays for each Tx / Rx pair, so that one can, e.g., estimate MIMO channels. The information in urban_canyon_v2i_5gmv1_rays is split into 116 files, with one episode per file. The code parse_urban_canyon_v2i_5gmv1_rays.m shows how to interpret the dataset.
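As an illustration of how per-ray information can be turned into a MIMO channel, the sketch below builds a narrowband geometric channel matrix from ray gains and angles. It is not the code used to create the dataset. It assumes uniform linear arrays with half-wavelength spacing at both ends, azimuth angles only, and that each ray supplies a complex gain, an angle of departure and an angle of arrival. The function names and array sizes are hypothetical; the actual field layout of the files is the one documented in parse_urban_canyon_v2i_5gmv1_rays.m.

```python
import numpy as np

def ula_steering(num_antennas, angle_rad):
    """Steering vector of a half-wavelength-spaced uniform linear array (assumed geometry)."""
    n = np.arange(num_antennas)
    return np.exp(1j * np.pi * n * np.cos(angle_rad))

def narrowband_mimo_channel(gains, aod_rad, aoa_rad, num_tx, num_rx):
    """Sum the rank-one contributions of all rays into a (num_rx, num_tx) channel matrix."""
    H = np.zeros((num_rx, num_tx), dtype=complex)
    for gain, aod, aoa in zip(gains, aod_rad, aoa_rad):
        a_rx = ula_steering(num_rx, aoa)
        a_tx = ula_steering(num_tx, aod)
        H += gain * np.outer(a_rx, a_tx.conj())
    return H

# Example with 25 synthetic rays (random values, NOT taken from the dataset).
rng = np.random.default_rng(0)
num_rays = 25
gains = (rng.normal(size=num_rays) + 1j * rng.normal(size=num_rays)) / np.sqrt(2)
aod = rng.uniform(0, np.pi, size=num_rays)   # angles of departure, radians
aoa = rng.uniform(0, np.pi, size=num_rays)   # angles of arrival, radians

H = narrowband_mimo_channel(gains, aod, aoa, num_tx=8, num_rx=4)
print(H.shape)
```

Other array geometries (e.g., planar arrays using both azimuth and elevation) only require a different steering-vector function.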

Version using HDF5 files (e.g., for Matlab):

urban_canyon_v2i_5gmv1_rays.hdf5.zip download here

Version using NPZ files (NumPy arrays) for Python:

urban_canyon_v2i_5gmv1_rays.npz.zip download here
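A minimal sketch of reading one NPZ file and discarding the receivers whose data is NaN is shown below. The array key ("rays"), the array layout (scenes x receivers x rays x features), and the number of features per ray are assumptions made for this example. To keep the sketch runnable without the download, it writes a small synthetic file first. For the real files, inspect data.files to find the actual key names.

```python
import os
import tempfile

import numpy as np

def valid_receiver_mask(rays):
    """Return a (scenes, receivers) boolean mask: True where a receiver has ray data.

    Assumes rays has shape (scenes, receivers, rays, features) and that an
    absent receiver is stored as all-NaN.
    """
    return ~np.isnan(rays).all(axis=(2, 3))

# Synthetic stand-in for one episode file (hypothetical layout, random values).
rays = np.random.default_rng(0).normal(size=(50, 10, 25, 8))
rays[:, 7:] = np.nan  # pretend receivers 7..9 are outside the analysis region
path = os.path.join(tempfile.mkdtemp(), "episode_demo.npz")
np.savez(path, rays=rays)

# Load the file, list the stored arrays, and pick the one of interest.
with np.load(path) as data:
    print(data.files)        # ['rays']
    episode = data["rays"]

mask = valid_receiver_mask(episode)
print(mask.sum())            # 350

# Keep only valid Tx / Rx pairs: shape (num_valid_pairs, rays, features).
valid_pairs = episode[mask]
print(valid_pairs.shape)     # (350, 25, 8)
```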

Raw data from ray-tracing (Remcom Wireless InSite) simulations

Some of the ray-tracing simulator outputs (the .p2m output files, etc.) corresponding to urban_cannyon_v2i.5gmv1 are available here: allresults-09022017.zip.