A static snapshot of the code used for the paper is available here for reproducibility purposes. The code uses two packages under "Apache License 2.0": Apache CLI and Guava. The code also uses the NxParser under a New BSD licence. The most recent version of the code is released as the BLabel library on GitHub under Apache License 2.0.
The synthetic graphs used are available here in a ZIP file (these graphs were selected from here). If you wish to run these experiments locally:
- The class to run is cl.uchile.dcc.blabel.cli.RunSyntheticEvaluation.
- A build.xml file is available if you wish to build a .jar (ant dist).
- Pass the directory containing the synthetic graphs with -d, e.g., -d ~/skolem/test/eval/.
- Set the timeout with -t, passing a value in seconds (the default is 600 seconds, per the paper).
- We set 1 GB of heap (-Xmx1G). We also set 100M for stack (-Xss100M).
- The commands are as follows (assuming the .jar is in the dist folder relative to the script/batch, with the synthetic graphs in the eval folder):

:: Test framework (correctness checks)
java -jar -Xmx1G -Xss10M dist/blabel.jar RunSyntheticEvaluation -b 4 -d eval/ -t 600 > test-t600.tsv 2> test-t600.err

:: Label
java -jar -Xmx1G -Xss10M dist/blabel.jar RunSyntheticEvaluation -b 1 -d eval/ -t 600 -s 1 > label-t600-s1.tsv 2> label-t600-s1.err

:: Label-NoPrune
java -jar -Xmx1G -Xss10M dist/blabel.jar RunSyntheticEvaluation -b 1 -d eval/ -t 600 -s 1 -nlabel > label-t600-s1-n.tsv 2> label-t600-s1-n.err

:: DFS+Label
java -jar -Xmx1G -Xss10M dist/blabel.jar RunSyntheticEvaluation -b 2 -d eval/ -t 600 -s 1 -l 0 > both-t600-s1-dfs.tsv 2> both-t600-s1-dfs.err

:: DFS-Rand+Label
java -jar -Xmx1G -Xss10M dist/blabel.jar RunSyntheticEvaluation -b 2 -d eval/ -t 600 -s 1 -l 0 -r > both-t600-s1-dfs-r.tsv 2> both-t600-s1-dfs-r.err

:: DFS-NoPrune+Label
java -jar -Xmx1G -Xss10M dist/blabel.jar RunSyntheticEvaluation -b 2 -d eval/ -t 600 -s 1 -l 0 -r -nlean > both-t600-s1-dfs-n.tsv 2> both-t600-s1-dfs-n.err

:: BFS+Label
java -jar -Xmx1G -Xss10M dist/blabel.jar RunSyntheticEvaluation -b 2 -d eval/ -t 600 -s 1 -l 1 > both-t600-s1-bfs.tsv 2> both-t600-s1-bfs.err
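The seven synthetic invocations differ only in the -b value, the extra flags, and the output prefix, so they can be generated from a small table. The following is a minimal sketch in POSIX shell and is not part of the released code: the function names synthetic_cmd and synthetic_all, and the choice to print the commands rather than execute them, are assumptions made for illustration. The flags and file names are copied from the commands above.

```shell
# Hypothetical helper (not part of BLabel): print one RunSyntheticEvaluation
# command line. Arguments: <output-prefix> <value for -b> [extra flags...]
synthetic_cmd() {
    prefix=$1
    bench=$2
    shift 2
    # "$*" joins any remaining flags with single spaces.
    extra=$*
    # ${extra:+ $extra} adds a leading space only when extra flags were given.
    printf 'java -jar -Xmx1G -Xss10M dist/blabel.jar RunSyntheticEvaluation -b %s -d eval/ -t 600%s > %s.tsv 2> %s.err\n' \
        "$bench" "${extra:+ $extra}" "$prefix" "$prefix"
}

# Print all seven configurations listed above (dry run, nothing is executed).
synthetic_all() {
    synthetic_cmd test-t600          4
    synthetic_cmd label-t600-s1      1 -s 1
    synthetic_cmd label-t600-s1-n    1 -s 1 -nlabel
    synthetic_cmd both-t600-s1-dfs   2 -s 1 -l 0
    synthetic_cmd both-t600-s1-dfs-r 2 -s 1 -l 0 -r
    synthetic_cmd both-t600-s1-dfs-n 2 -s 1 -l 0 -r -nlean
    synthetic_cmd both-t600-s1-bfs   2 -s 1 -l 1
}
```

Calling synthetic_all prints the command lines for review; piping that output to sh would run them one after another.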
The real-world experiments were run over the BTC-2014 dataset, available here. Since this requires processing 4 billion triples (over 1 TB of uncompressed data), you will need a moderately sized machine to replicate these experiments:
- Concatenate the data* files (e.g., zcat */data*gz | gzip -c > data-all.nq.gz).
- Sort the concatenated file (java -jar -Xmx$$G nxparser-1.2.4.jar Sort -i data-all.nq.gz -igz -o data-all.3012.nq.gz -ogz -so 3012 2> sort.log where you should make sure to replace $$ with a large amount of RAM to avoid creating too many intermediate batch files).
- Run cl.uchile.dcc.blabel.cli.RunNQuadsTest; you can call the class with -h to get an explanation of all arguments.

# Test framework
java -jar -Xmx30G -Xss100M dist/blabel.jar RunNQuadsTest -i btc14/data-all.3012.nq.gz -igz -b 4 -t 600 -e btc14/test/error/ > test.tsv 2> test.err

# Control
java -jar -Xmx30G -Xss100M dist/blabel.jar RunNQuadsTest -i btc14/data-all.3012.nq.gz -igz -b 3 -t 600 > control.tsv 2> control.err

# Labelling MD5
java -jar -Xmx30G -Xss100M dist/blabel.jar RunNQuadsTest -i btc14/data-all.3012.nq.gz -igz -b 1 -i data-all.3012.nq.gz -igz -s 0 -t 600 -e btc14/label-s0/ > label-s0.tsv 2> label-s0.err

# Labelling Murmur
java -jar -Xmx30G -Xss100M dist/blabel.jar RunNQuadsTest -i btc14/data-all.3012.nq.gz -igz -b 1 -i data-all.3012.nq.gz -igz -s 1 -t 600 -e btc14/label-s1/ > label-s1.tsv 2> label-s1.err

# Labelling Sha1
java -jar -Xmx30G -Xss100M dist/blabel.jar RunNQuadsTest -i btc14/data-all.3012.nq.gz -igz -b 1 -i data-all.3012.nq.gz -igz -s 2 -t 600 -e btc14/label-s2/ > label-s2.tsv 2> label-s2.err

# Labelling Murmur w/o pruning
java -jar -Xmx30G -Xss100M dist/blabel.jar RunNQuadsTest -i btc14/data-all.3012.nq.gz -igz -b 1 -i data-all.3012.nq.gz -igz -s 1 -t 600 -e btc14/label-s1-np/ -nlabel > label-s1-np.tsv 2> label-s1-np.err

# DFS Standard
java -jar -Xmx30G -Xss100M dist/blabel.jar RunNQuadsTest -i btc14/data-all.3012.nq.gz -igz -b 0 -i data-all.3012.nq.gz -igz -t 600 -e btc14/lean-s1-dfs/ -l 0 > lean-s1-dfs.tsv 2> lean-s1-dfs.err

# DFS Random order
java -jar -Xmx30G -Xss100M dist/blabel.jar RunNQuadsTest -i btc14/data-all.3012.nq.gz -igz -b 0 -i data-all.3012.nq.gz -igz -t 600 -e btc14/lean-s1-dfs-r/ -l 0 -r > lean-s1-dfs-r.tsv 2> lean-s1-dfs-r.err

# DFS w/o pruning
java -jar -Xmx30G -Xss100M dist/blabel.jar RunNQuadsTest -i btc14/data-all.3012.nq.gz -igz -b 0 -i data-all.3012.nq.gz -igz -t 600 -e btc14/lean-s1-dfs-n/ -l 0 -nlean > lean-s1-dfs-n.tsv 2> lean-s1-dfs-n.err

# BFS
java -jar -Xmx30G -Xss100M dist/blabel.jar RunNQuadsTest -i btc14/data-all.3012.nq.gz -igz -b 0 -i data-all.3012.nq.gz -igz -t 600 -e btc14/lean-s1-bfs/ -l 1 > lean-s1-bfs.tsv 2> lean-s1-bfs.err

# DFS Standard + Label
java -jar -Xmx30G -Xss100M dist/blabel.jar RunNQuadsTest -i btc14/data-all.3012.nq.gz -igz -b 2 -i data-all.3012.nq.gz -igz -s 1 -t 600 -e btc14/both-s1-dfs/ -l 0 > both-s1-dfs.tsv 2> both-s1-dfs.err

# DFS Standard + Label (count duplicate equiv graphs)
java -jar -Xmx30G -Xss100M dist/blabel.jar RunNQuadsTest -i btc14/data-all.3012.nq.gz -igz -b 2 -i data-all.3012.nq.gz -igz -s 1 -t 600 -e btc14/both-s1-dfs-d/ -d -l 0 > both-s1-dfs-d.tsv 2> both-s1-dfs-d.err

# Label (count duplicate iso graphs)
java -jar -Xmx30G -Xss100M dist/blabel.jar RunNQuadsTest -i btc14/data-all.3012.nq.gz -igz -b 1 -i data-all.3012.nq.gz -igz -s 1 -t 600 -e btc14/both-s1-dfs-d/ -d > label-s1-d.tsv 2> label-s1-d.err
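Because the concatenation and sort steps each take a long time at this scale, it may be worth checking that the merged file holds as many lines as the parts before moving on. The following is a minimal sketch in POSIX shell, not part of BLabel or NxParser: the function names count_gz_lines and check_merge are hypothetical, and the check only compares line counts, so it would not detect reordered or corrupted lines.

```shell
# Hypothetical helper: count the lines in one or more gzip-compressed files,
# decompressing to a pipe rather than to disk.
count_gz_lines() {
    gzip -dc "$@" | wc -l | tr -d '[:space:]'
}

# Succeed only if the merged file has as many lines as the parts combined.
# Arguments: <merged.gz> <part.gz>...
check_merge() {
    merged=$1
    shift
    [ "$(count_gz_lines "$merged")" = "$(count_gz_lines "$@")" ]
}
```

For example, check_merge data-all.nq.gz */data*gz && echo "line counts match" would compare the merged file against the original parts. Note that this decompresses all the data twice, so on the full dataset it is itself a long-running step.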