A static snapshot of the code used for the paper is available here for reproducibility purposes. The code uses two packages under the Apache License 2.0: Apache CLI and Guava. The code also uses the NxParser under a New BSD licence. The most recent version of the code is released as the BLabel library on GitHub under the Apache License 2.0.
The synthetic graphs used are available here in a ZIP file (these graphs were selected from here). If you wish to run these experiments locally:
- The main class for running the synthetic experiments is cl.uchile.dcc.skolem.cli.RunSyntheticEvaluation (you will first need to build the code, e.g., with ant dist).
- Unzip the synthetic graphs to a local directory and pass that directory using -d, e.g., -d ~/skolem/test/eval/
- You can set a timeout using -t, passing a value in seconds (the default is 600 seconds, per the paper).
- We ran the experiments with 1 GB of heap space (-Xmx1G). We also set 100M for stack (-Xss100M). A full example invocation is sketched after this list.
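For concreteness, a full run might look as follows. This is only a sketch: it assumes that ant dist produces a jar at dist/blabel.jar (adjust the classpath to wherever your build places the jar); the -d, -t, -Xmx and -Xss values are those mentioned above.

    # build the executable jar (output location assumed)
    ant dist
    # run the synthetic evaluation: 1 GB heap, 100 MB stack, 600 s timeout (the default)
    java -Xmx1G -Xss100M -cp dist/blabel.jar cl.uchile.dcc.skolem.cli.RunSyntheticEvaluation -d ~/skolem/test/eval/ -t 600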
The real-world experiments were run over the BTC-2014 dataset, available here. Since this requires processing 4 billion triples (over 1 TB of uncompressed data), you will need a moderately sized machine to replicate these experiments:
- First download the data* files and concatenate them into a single file (e.g., zcat */data*gz | gzip -c > data-all.nq.gz).
- Next, sort the data: java -jar -Xmx$$G nxparser-1.2.4.jar -i data-all.nq.gz -igz -o data-all.3012.nq.gz -ogz -so 3012 2> sort.log (where you should make sure to replace $$ with a large amount of RAM to avoid creating too many intermediate batch files).
- Then run cl.uchile.dcc.skolem.cli.Control -i data-all.3012.nq.gz -igz 2> control.log > control.std (remember to set a reasonable amount of RAM; we used 30G, but that much is not necessary).
- Then run cl.uchile.dcc.skolem.cli.ComputeCanonicalGraphs -i data-all.3012.nq.gz -igz -s $$ 2> canon.log > canon.std (replacing $$ with the ID for the hashing scheme; run with -h to see the options; again, remember to set a reasonable amount of RAM; we used 30G, but that much is not necessary). Worked invocations for these steps are sketched below.
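Putting the preprocessing steps together, the concatenate-and-sort phase might look as follows. The commands are those given above; the 30 GB heap substituted for $$ is purely illustrative (use whatever suits your machine, keeping in mind that more RAM means fewer intermediate batch files).

    # concatenate the downloaded data* chunks into a single gzipped file
    zcat */data*gz | gzip -c > data-all.nq.gz
    # sort the data, logging progress to sort.log (30G heap is illustrative)
    java -jar -Xmx30G nxparser-1.2.4.jar -i data-all.nq.gz -igz -o data-all.3012.nq.gz -ogz -so 3012 2> sort.log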
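The two experiment runs might then be invoked as follows. As before, the dist/blabel.jar classpath is an assumption about the build output; the 30G heap is the value mentioned above (less suffices); and the hashing-scheme ID 0 passed to -s is purely illustrative (run with -h to list the valid IDs).

    # control run over the sorted data
    java -Xmx30G -cp dist/blabel.jar cl.uchile.dcc.skolem.cli.Control -i data-all.3012.nq.gz -igz 2> control.log > control.std
    # canonical labelling run (-s 0 is an illustrative hashing-scheme ID; see -h)
    java -Xmx30G -cp dist/blabel.jar cl.uchile.dcc.skolem.cli.ComputeCanonicalGraphs -i data-all.3012.nq.gz -igz -s 0 2> canon.log > canon.std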