This repo contains the Java and Python code used for evaluation in "Finding Proofs for Critical Situations Over Sensor Data Streams".
├── LICENSE
├── proof_extractor
│ ├── pom.xml
│ ├── proof_output
│ └── src
│ └── main
└── meteor
├── meteor_reasoner
│ ├── __init__.py
│ ├── __pycache__
│ ├── canonical
│ ├── classes
│ ├── datagenerator
│ ├── entailment
│ ├── graphutil
│ ├── materialization
│ ├── stream
│ ├── utils
│ └── version.py
└── tu_dresden
├── data
├── generate_facts.py
├── generate_owl.py
├── json_1
├── output
├── pipeline.py
├── programs
├── proof_extractor.jar
├── results
└── scripts
- The proof_extractor folder contains build information and the Java code for constructing proofs
- meteor/ contains the MeTeoR reasoner's modules:
- meteor/meteor_reasoner: modified code of the original MeTeoR reasoner to enable tracing. To install the required dependencies and the MeTeoR reasoner, run the commands below (a quick sanity check is sketched after this list):
pip install "numpy==1.16.0" "pandas==0.24.0" "urllib3==1.24.0" "scikit-learn==0.20.0" "networkx==2.6.3" "rdflib==4.2.2" "outdated==0.2.0"
pip install -U meteor_reasoner
- meteor/tu_dresden: Code used in the experiments of the paper
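As a quick sanity check that the installation above succeeded, the snippet below (illustrative only) queries the installed package versions via setuptools' pkg_resources:
import pkg_resources  # ships with setuptools
# Print the installed versions of the pinned dependencies and of meteor_reasoner.
for package in ["numpy", "pandas", "networkx", "rdflib", "meteor_reasoner"]:
    print(package, pkg_resources.get_distribution(package).version)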
Make sure that the "drone-ontologies" repo (https://gitlab.perspicuous-computing.science/project-e2/drone-ontologies.git) is in the same parent directory as "meteor-proofs".
From the meteor/tu_dresden directory run:
./scripts/test.sh
We follow steps similar to those proposed in the original MeTeoR project for generating temporal data:
- Download the data generator (UBA) from the SWAT Projects - Lehigh University Benchmark (LUBM) website. In particular, we used UBA1.7.
export CLASSPATH="$CLASSPATH:your package path"
==================
USAGES
==================
command:
edu.lehigh.swat.bench.uba.Generator
[-univ <univ_num>]
[-index <starting_index>]
[-seed <seed>]
[-daml]
-onto <ontology_url>
options:
-univ number of universities to generate; 1 by default
-index starting index of the universities; 0 by default
-seed seed used for random data generation; 0 by default
-daml generate DAML+OIL data; OWL data by default
-onto url of the univ-bench ontology
We found some naming and storage issues when using the above command from the official documentation. To provide a more user-friendly alternative, we wrote a script that can be used directly to generate the required OWL files by passing a few simple arguments. An example is shown below:
from meteor_reasoner.datagenerator import generate_owl
univ_num = 1  # number of universities to generate
dir_name = "./data"  # directory for the generated OWL files
generate_owl.generate(univ_num, dir_name)
In ./data, you should obtain a series of OWL files like the ones below:
University0_0.owl
University0_12.owl
University0_1.owl
University0_4.owl
.....
Then, we need to convert the OWL files to Datalog-like facts. We also provide a script that can be used directly for this conversion:
from meteor_reasoner.datagenerator import generate_datalog
owl_path = "data"  # directory where the OWL files are located
out_dir = "./output"  # directory for the converted Datalog triples
generate_datalog.extract_triplet(owl_path, out_dir)
In ./output, you should see a file ./output/data containing facts of the form:
UndergraduateStudent(ID0)
undergraduateDegreeFrom(ID1,ID2)
takesCourse(ID3,ID4)
undergraduateDegreeFrom(ID5,ID6)
UndergraduateStudent(ID7)
name(ID8,ID9)
......
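To get a quick overview of the converted data (purely for inspection, not part of the pipeline; the path ./output/data is the one described above):
from collections import Counter
# Count how many facts each predicate has in the converted Datalog file.
counts = Counter()
with open("./output/data") as f:
    for line in f:
        line = line.strip()
        if line:
            counts[line.split("(", 1)[0]] += 1
for predicate, number in counts.most_common(10):
    print(predicate, number)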
To add intervals to these facts, use meteor/meteor_reasoner/datagenerator/add_intervals.py, which takes the following arguments:
--datalog_file_path DATALOG_FILE_PATH
--factnum FACTNUM
--intervalnum INTERVALNUM
--unit UNIT
--punctual specify whether we only add punctual intervals
--openbracket specify whether we need to add open brackets
--min_val MIN_VAL
--max_val MAX_VAL
For example, to create a dataset containing 10000 facts, where each fact has at most 2 intervals, run the following command:
python add_intervals.py --datalog_file_path your_folder/your_datalog_data.txt --factnum 10000 --intervalnum 2
In datalog/10000.txt, there should be 10000 facts, each of the form P(a,b)@ϱ, where ϱ is an interval; a sample of the facts is shown below:
UndergraduateStudent(ID0)@[1,18]
undergraduateDegreeFrom(ID1,ID2)@[7,18]
takesCourse(ID3,ID4)@[12,46]
undergraduateDegreeFrom(ID5,ID6)@[21,24]
UndergraduateStudent(ID7)@[3,10]
name(ID8,ID9)@[5,22]
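For reference, a fact of the form P(a,b)@ϱ can be split into predicate, arguments, and interval with a few lines of standard Python. This is only an illustrative parser for the sample format above, not the parser used by meteor_reasoner:
import re
# Matches fact lines such as "takesCourse(ID3,ID4)@[12,46]"; open brackets
# (produced with --openbracket) are also accepted.
FACT_RE = re.compile(r"^(\w+)\(([^)]*)\)@([\[(])(-?\d+),(-?\d+)([\])])$")
def parse_fact(line):
    match = FACT_RE.match(line.strip())
    if match is None:
        raise ValueError("Unrecognised fact: " + line)
    predicate, args, left, start, end, right = match.groups()
    return {
        "predicate": predicate,
        "arguments": [a.strip() for a in args.split(",")],
        "interval": (int(start), int(end)),
        "closed": (left == "[", right == "]"),
    }
print(parse_fact("undergraduateDegreeFrom(ID1,ID2)@[7,18]"))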
Note: The generate_facts.py script only generates the new facts of the last materialisation round. Assuming you have a data file containing 10000 facts under output/10000.txt, to generate the facts that are only seen in the 6th round, use:
python generate_facts.py --facts 10000 --iteration 6
The script will write the facts entailed in that last materialisation round to data/10000.txt.
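Conceptually, the facts that are only seen in the 6th round are those derived by round 6 but not by round 5. A minimal sketch of this set difference (the sets below are placeholders for illustration only; this is not how generate_facts.py is implemented internally):
def new_facts_in_round(rounds):
    # rounds[i] is the set of all facts derived up to and including round i;
    # the facts first seen in round i are rounds[i] minus rounds[i-1].
    diffs = [set(rounds[0])]
    for previous, current in zip(rounds, rounds[1:]):
        diffs.append(current - previous)
    return diffs
rounds = [{"A(a)@[0,1]"}, {"A(a)@[0,1]", "B(a)@[1,2]"}, {"A(a)@[0,1]", "B(a)@[1,2]", "C(a)@[2,3]"}]
print(new_facts_in_round(rounds)[2])  # facts first seen in the 3rd round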
For simplicity, the script uses the first fact in data/10000.txt for evaluation. Assuming your data file is under output/ and the facts for entailment checking are under data/, to run an experiment with tracing enabled, use:
python pipeline.py --facts 10000 --glassbox 1
Depending on whether tracing is enabled or disabled, the script will append to trace_time.txt or time.txt, respectively, a new line of the following format (times in seconds):
nr of facts | load time | reasoning time | parse time | total time | Size of D after entailment checking | Size of the Parser
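Since each line is "|"-separated, the result files are easy to post-process. A small sketch that loads trace_time.txt into a list of records (the column names below simply mirror the format above):
# Load '|'-separated result lines into dictionaries keyed by the columns above.
COLUMNS = ["facts", "load_time", "reasoning_time", "parse_time",
           "total_time", "size_of_D", "parser_size"]
def load_results(path):
    records = []
    with open(path) as f:
        for line in f:
            fields = [field.strip() for field in line.split("|")]
            if len(fields) == len(COLUMNS):
                records.append(dict(zip(COLUMNS, fields)))
    return records
for row in load_results("trace_time.txt"):
    print(row["facts"], row["total_time"])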
If tracing is enabled and the entailment holds, the script will also create a new JSON file under json_1/10000.json, which will be used by the JAR file to output the proof in .svg format. Assuming you have built the JAR from the proof_extractor folder (using mvn clean package; Evee is needed as a dependency) and placed it in the same directory, run the following to output the proof:
java -jar proof_extractor-1.0-SNAPSHOT-jar-with-dependencies.jar json_1/10000.json output/proof_10000.json output/proof_10000.svg
The above command will write the proof extracted from the input JSON file into output/proof_10000.json and an image of the proof into output/proof_10000.svg. It will also write the time taken to find the proof to a separate file, proof_time.txt.
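If you prefer to run this last step from Python (for example, inside your own scripts), a minimal wrapper around the documented JAR invocation could look as follows; the file names and arguments are exactly those of the command above:
import subprocess
def extract_proof(n, jar="proof_extractor-1.0-SNAPSHOT-jar-with-dependencies.jar"):
    # Same invocation as the documented command, parameterised by the fact count n.
    subprocess.run(
        ["java", "-jar", jar,
         "json_1/{}.json".format(n),
         "output/proof_{}.json".format(n),
         "output/proof_{}.svg".format(n)],
        check=True)
extract_proof(10000)  # also produces proof_time.txt, as described above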