GraphBIG
GraphBIG provides a set of datasets with various data types and sizes (refer to the wiki for more details). However, you can also use your own datasets. In most cases, using an external graph dataset takes very little effort.
There are two major requirements for the datasets:
- data format: the data has to be a standard CSV file with a header.
- graph representation: in the data files, the graph should be represented as an edge list, that is, each line holds a pair of vertices representing one edge.
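As a rough illustration of the edge-list representation described above, the sketch below reads a small header-first CSV and groups the edges by source vertex. The column names "src" and "dest" and the helper `edges_to_adjacency` are made up for this example and are not part of GraphBIG; the separator is a parameter because the file's separator may vary.

```python
import csv
import io
from collections import defaultdict

def edges_to_adjacency(stream, separator=","):
    """Read a header-first edge list and return {source: [destinations]}."""
    reader = csv.reader(stream, delimiter=separator)
    next(reader)  # the first row is the header, not an edge
    adjacency = defaultdict(list)
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        source, destination = row[0], row[1]  # one edge per line
        adjacency[source].append(destination)
    return dict(adjacency)

# Hypothetical three-edge graph; "src" and "dest" are illustrative names only.
sample = "src,dest\n1,2\n1,3\n2,3\n"
adjacency = edges_to_adjacency(io.StringIO(sample))
print(adjacency)  # {'1': ['2', '3'], '2': ['3']}
```

Vertex IDs are kept as strings here, since the text above does not say what type they must have.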
With the default configuration, GraphBIG has a few additional requirements (each of which can be changed):
- vertex list: besides the edge list, a vertex list named "vertex.csv" is also needed.
- separator: the default CSV field separator is "|".
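If a dataset ships only an edge list, one way to meet the default requirements is to derive the vertex list from it. The sketch below does that under several assumptions: vertex IDs are integers, the vertex file needs only a single ID column, and the header name "id" is acceptable. None of these are stated above, so compare the result with the datasets bundled with GraphBIG before relying on it. The helper name `write_vertex_list` is hypothetical.

```python
import csv
import os
import tempfile

def write_vertex_list(dataset_dir, separator="|"):
    """Create vertex.csv next to edge.csv, listing every vertex ID once."""
    edge_path = os.path.join(dataset_dir, "edge.csv")
    vertex_path = os.path.join(dataset_dir, "vertex.csv")
    vertices = set()
    with open(edge_path, newline="") as edge_file:
        reader = csv.reader(edge_file, delimiter=separator)
        next(reader)  # skip the header row
        for row in reader:
            if len(row) >= 2:
                vertices.update(row[:2])  # source and destination IDs
    with open(vertex_path, "w", newline="") as vertex_file:
        writer = csv.writer(vertex_file, delimiter=separator, lineterminator="\n")
        writer.writerow(["id"])  # assumed header name, not confirmed by GraphBIG
        for vertex in sorted(vertices, key=int):  # assumes integer IDs
            writer.writerow([vertex])
    return vertex_path

# Demo on a throwaway directory with a hypothetical three-edge graph.
dataset_dir = tempfile.mkdtemp()
with open(os.path.join(dataset_dir, "edge.csv"), "w") as f:
    f.write("src|dest\n1|2\n1|3\n2|3\n")
vertex_path = write_vertex_list(dataset_dir)
with open(vertex_path) as f:
    vertex_text = f.read()
print(vertex_text)
```

The alternative, as noted in the steps for datasets without a vertex list, is to skip the vertex file and build with the "EDGES=1" compile flag.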
To use an external dataset:
- rename your edge file to "edge.csv". If you have a vertex file, rename it to "vertex.csv".
- put "edge.csv" and/or "vertex.csv" in the same directory and pass that directory's path to the "--dataset" argument. If your dataset has no vertex list, enable the "EDGES=1" compile flag.
- if your dataset files do not use "|" as the separator, specify your separator with the "--separator" argument.
- make sure that in the edge file, the source vertex is in column 0 and the destination vertex is in column 1.
- GPU workloads need a compact dataset format. GraphBIG provides a tool (csr_bench/tool_genCSR) that can generate the required data from the standard CSV format (the previously mentioned "edge.csv" and/or "vertex.csv").
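When an external edge file stores the source and destination in other columns, it has to be reordered before use. The following is a minimal sketch of such a reordering, not a GraphBIG utility: the helper `reorder_edge_columns` and the sample column names are hypothetical. It moves the chosen source and destination columns to positions 0 and 1 and appends any remaining columns unchanged. Whether GraphBIG accepts extra columns in "edge.csv" is not stated above, so treat that part as an assumption.

```python
import csv
import io

def reorder_edge_columns(in_stream, out_stream, src_col, dst_col, separator="|"):
    """Copy an edge list so source is column 0 and destination is column 1."""
    reader = csv.reader(in_stream, delimiter=separator)
    writer = csv.writer(out_stream, delimiter=separator, lineterminator="\n")
    for row in reader:  # the header row is reordered like any data row
        if not row:
            continue  # skip blank lines
        rest = [value for i, value in enumerate(row) if i not in (src_col, dst_col)]
        writer.writerow([row[src_col], row[dst_col]] + rest)

# Hypothetical input with the source in column 2 and the destination in column 1.
external = "weight|to|from\n0.5|2|1\n0.7|3|1\n"
out = io.StringIO()
reorder_edge_columns(io.StringIO(external), out, src_col=2, dst_col=1)
converted = out.getvalue()
print(converted)
```

The separator is left as it was in the input, since a different separator can be declared with "--separator" rather than rewritten.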