A static snapshot of the code used for the paper is available here for reproducibility purposes. The code uses two packages under the Apache License 2.0: Apache CLI and Guava. The code also uses the NxParser under a New BSD license. The most recent version of the code is released as the BLabel library on GitHub under the Apache License 2.0.
The synthetic graphs used are available here in a ZIP file (these graphs were selected from here). If you wish to run these experiments locally:
- A timeout can be set with -t, passing a value in seconds (the default is 600 seconds, per the paper).
- We ran the JVM with 1 GB of heap (-Xmx1G). We also set 100 MB for the stack (-Xss100M).
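For example, a single local run combining these settings might look as follows (this is an illustrative sketch: the jar name and the -i input flag are assumptions, not taken from the source; only -t, -Xmx, and -Xss are documented above):

```shell
# Hypothetical invocation (jar name and -i flag assumed for illustration):
# 1 GB heap, 100 MB stack, and a 600-second timeout, as in the paper.
java -Xmx1G -Xss100M -jar blabel-experiments.jar -i graph.nt -t 600
```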
The real-world experiments were run over the BTC-2014 dataset, available here. Since this requires processing 4 billion triples (over 1 TB of uncompressed data), to replicate these experiments you will need a moderately sized machine:
First concatenate all of the data into a single file (zcat */data*gz | gzip -c > data-all.nq.gz).
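As a toy check of this concatenation step, one can generate two tiny gzipped files in the same layout and verify the combined output (directory and file names here are illustrative, not from the dataset):

```shell
# Create two small gzipped files in subdirectories, mimicking the layout
# matched by the */data*gz glob, then concatenate and count lines.
mkdir -p dir1 dir2
printf '<s> <p> "1" <g> .\n' | gzip > dir1/data-1.gz
printf '<s> <p> "2" <g> .\n' | gzip > dir2/data-2.gz
zcat */data*gz | gzip -c > data-all.nq.gz
zcat data-all.nq.gz | wc -l   # expect 2 lines
```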
Next sort the data: java -Xmx$$G -jar nxparser-1.2.4.jar -i data-all.nq.gz -igz -o data-all.3012.nq.gz -ogz -so 3012 2> sort.log (where you should make sure to replace $$ with a large amount of RAM, in GB, to avoid creating too many intermediate batch files).
Then run the control experiment: cl.uchile.dcc.skolem.cli.Control -i data-all.3012.nq.gz -igz 2> control.log > control.std (remember to set a reasonable amount of RAM; we used 30G, but that much is not necessary).
Finally compute the canonical graphs: cl.uchile.dcc.skolem.cli.ComputeCanonicalGraphs -i data-all.3012.nq.gz -igz -s $$ 2> canon.log > canon.std (replacing $$ with the ID of the hashing scheme; run with -h to see the options; again, remember to set a reasonable amount of RAM; we used 30G, but that much is not necessary).
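Putting the steps together, the full BTC-2014 pipeline might be scripted roughly as follows. This is a sketch, not a definitive script: it assumes the NxParser jar and the skolem classes are available on the Java classpath, and the $$ placeholders must be filled in as described above (GBs of RAM for sorting; the hashing-scheme ID for -s).

```shell
# Sketch of the full pipeline over BTC-2014 (assumes jars/classes on the
# classpath; replace $$ with GBs of RAM, and -s's $$ with a scheme ID).
zcat */data*gz | gzip -c > data-all.nq.gz
java -Xmx$$G -jar nxparser-1.2.4.jar -i data-all.nq.gz -igz \
  -o data-all.3012.nq.gz -ogz -so 3012 2> sort.log
java -Xmx30G cl.uchile.dcc.skolem.cli.Control \
  -i data-all.3012.nq.gz -igz 2> control.log > control.std
java -Xmx30G cl.uchile.dcc.skolem.cli.ComputeCanonicalGraphs \
  -i data-all.3012.nq.gz -igz -s $$ 2> canon.log > canon.std
```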