A collection of RDF libraries for JavaScript
graphy
The graphy CLI is available from npm:
$ npm i -g graphy
The graphy CLI works by pushing RDF data through a series of internal transforms, starting with a single input on stdin (or, instead, multiple inputs) and ending with a single output on stdout. This internal pipeline allows for efficient, high-bandwidth transformations of RDF data.
Usage: graphy [OPTIONS] COMMAND [ / COMMAND]* [--inputs FILE...]
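One way to picture the `/`-separated pipeline is as a chain of lazy stages, where each stage consumes the previous stage's output. The JavaScript sketch below is only a mental model, not graphy's implementation. `skipStage`, `headStage`, and `pipeline` are hypothetical names, and the quads are stand-in strings.

```javascript
// Hypothetical stages standing in for `skip N` and `head N`; each one
// lazily consumes an iterable of quads and yields quads.
function* skipStage(quads, n) {
  let seen = 0;
  for (const quad of quads) {
    if (seen++ < n) continue; // drop the first n quads
    yield quad;
  }
}

function* headStage(quads, n) {
  if (n <= 0) return;
  let emitted = 0;
  for (const quad of quads) {
    yield quad;
    if (++emitted >= n) return; // stop pulling once n quads have passed
  }
}

// Compose stages left to right, like `graphy read / skip 1 / head 2 / write`.
function pipeline(source, ...stages) {
  return stages.reduce((iterable, stage) => stage(iterable), source);
}

const quads = ['q1', 'q2', 'q3', 'q4', 'q5'];
const result = [...pipeline(
  quads,
  (it) => skipStage(it, 1),
  (it) => headStage(it, 2)
)];
console.log(result); // [ 'q2', 'q3' ]
```

Generators pull one item at a time, so `headStage` can stop early without the earlier stages materializing the whole input.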
Table of Contents:
tree – Put all quads into a tree data structure to remove duplicates
canonical – Canonicalize a set of quads using RDF Dataset Normalization Algorithm (URDNA2015) [alias: canonicalize]
union – Compute the set union of 1 or more inputs
intersect – Compute the set intersection of 1 or more inputs [alias: intersection]
diff – Compute the set difference between 2 inputs [alias: difference]
minus – Subtract the second input from the first: A - B [alias: subtraction]
equals – Test if 2 inputs are equivalent [alias: equal]
disjoint – Test if 2 inputs are completely disjoint from one another
contains – Test if the first input completely contains the second [alias: contain]
-e, --examples – Print some examples and exit
-h, --help – Print a help message and exit
-v, --version – Print the version info and exit
--show-stack-trace – Show the stack trace when printing error messages
read [OPTIONS]
Read RDF content, i.e., deserialize it.
Stream Multiplicity:
N-to-N<string, QuadStream> – maps 1 or more input streams of utf-8 encoded strings into 1 or more output streams of Quad objects.
Options:
-c, --content-type – either an RDF Content-Type or format selector (defaults to ‘trig’).
-b, --base, --base-uri – sets the starting base URI for the RDF document.
-r, --relax – relax validation of tokens for trusted input sources to improve read speeds.
Examples:
# validate an N-Triples document
$ graphy read -c nt < input.nt > /dev/null
# print line-delimited JSON of quads in N-Quads document
$ graphy read -c nq < input.nq
# validate a Turtle document
$ graphy read -c ttl < input.ttl > /dev/null
# print line-delimited JSON of quads in TriG document while validating it
$ graphy read -c trig < input.trig
scan [OPTIONS]
Scan RDF content, i.e., deserialize it and process it using multiple threads.
EXPERIMENTAL! The scan verb is currently experimental.
Stream Multiplicity:
N-to-N<string, QuadStream> – maps 1 or more input streams of utf-8 encoded strings into 1 or more output streams of Quad objects.
Options:
-c, --content-type – either an RDF Content-Type or format selector (defaults to ‘trig’).
-r, --relax – relax validation of tokens for trusted input sources to improve read speeds.
--threads – manually set the total number of threads to use (including the main thread).
Examples:
# validate an N-Triples document
$ graphy scan -c nt < input.nt > /dev/null
# print line-delimited JSON of quads in N-Quads document
$ graphy scan -c nq < input.nq
# count the number of statements in an N-Triples document (bypass validation)
$ graphy scan -c nt --relax / count < input.nt
# convert an N-Triples document into Turtle
$ graphy scan -c nt / scribe -c ttl < input.nt > output.ttl
scribe [OPTIONS]
Scribe RDF content, i.e., serialize it, fast (and possibly ugly).
Stream Multiplicity:
N-to-N<QuadStream, string> – maps 1 or more input streams of Quad objects into 1 or more output streams of utf-8 encoded strings.
Options:
-c, --content-type – either an RDF Content-Type or format selector (defaults to ‘trig’).
Examples:
# convert a Turtle document into N-Triples
$ cat input.ttl | graphy read -c ttl / scribe -c nt > output.nt
# convert a TriG document into N-Quads
$ cat input.trig | graphy read -c trig / scribe -c nq > output.nq
# convert an N-Triples document into Turtle
$ cat input.nt | graphy read -c nt / scribe -c ttl > output.ttl
# convert an N-Quads document into TriG
$ cat input.nq | graphy read -c nq / scribe -c trig > output.trig
# convert an N-Triples document into RDF/XML
$ cat input.nt | graphy read -c nt / scribe -c xml > output.rdf
write [OPTIONS]
Write RDF content, i.e., serialize it, in style (pretty-print).
NOTE: If no serialization format is specified with the -c option, the output format will default to TriG with the simplified default graph option enabled, meaning that the output will also be Turtle-compatible if all quads written belong to the default graph.
Stream Multiplicity:
N-to-N<QuadStream, string> – maps 1 or more input streams of Quad objects into 1 or more output streams of utf-8 encoded strings.
Options:
-c, --content-type – either an RDF Content-Type or format selector (defaults to ‘trig’).
-i, --indent – sets the whitespace string to use for indentation. Writers use '\t' by default.
-g, --graph-keyword – sets the string to use when serializing the optional 'GRAPH' keyword in TriG. Writers omit this keyword by default. Using this flag as a boolean (i.e., by passing 'true' or nothing) is shorthand for the all-caps 'GRAPH' keyword.
-s, --simplify-default-graph – if enabled, omits serializing the surrounding optional graph block for the default graph in TriG.
-f, --first – c1 string: sets the predicate to use for the ‘first’ relation when serializing list structures.
-r, --rest – c1 string: sets the predicate to use for the ‘rest’ relation when serializing list structures.
-n, --nil – c1 string: sets the term to use for the ‘nil’ node that terminates list structures.
Examples:
# convert a Turtle document into N-Triples
$ cat input.ttl | graphy read -c ttl / write -c nt > output.nt
# convert a TriG document into N-Quads
$ cat input.trig | graphy read -c trig / write -c nq > output.nq
# convert an N-Triples document into Turtle
$ cat input.nt | graphy read -c nt / write -c ttl > output.ttl
# convert an N-Triples document into Turtle (equivalent to above)
$ cat input.nt | graphy read -c nt / write > output.ttl
# convert an N-Quads document into TriG
$ cat input.nq | graphy read -c nq / write -c trig > output.trig
# convert an N-Quads document into TriG (equivalent to above)
$ cat input.nq | graphy read -c nq / write > output.trig
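The toy serializer below illustrates what -s/--simplify-default-graph and -g/--graph-keyword change in TriG output. It is a hypothetical sketch, and `toTrig` is not graphy's writer. Terms arrive as pre-formatted N-Triples strings and the default graph is written as ''. It prints one full triple per line, without the prefix or `;`/`,` abbreviations a real pretty-printer would use.

```javascript
// Toy TriG serializer; graph '' denotes the default graph.
function toTrig(quads, { simplifyDefaultGraph = false, graphKeyword = '', indent = '\t' } = {}) {
  // Group triples by graph, keeping graphs in first-seen order.
  const graphs = new Map();
  for (const { subject, predicate, object, graph = '' } of quads) {
    if (!graphs.has(graph)) graphs.set(graph, []);
    graphs.get(graph).push(`${subject} ${predicate} ${object} .`);
  }

  const blocks = [];
  for (const [graph, triples] of graphs) {
    if (graph === '' && simplifyDefaultGraph) {
      // No surrounding braces: this part is also valid Turtle.
      blocks.push(triples.join('\n'));
      continue;
    }
    // The optional keyword only ever precedes a named graph's label.
    const label = graph === '' ? '' : `${graphKeyword ? graphKeyword + ' ' : ''}${graph} `;
    blocks.push(`${label}{\n${triples.map((t) => indent + t).join('\n')}\n}`);
  }
  return blocks.join('\n\n') + '\n';
}

const quads = [
  { subject: '<http://ex.org/a>', predicate: '<http://ex.org/p>', object: '"x"' },
  { subject: '<http://ex.org/b>', predicate: '<http://ex.org/p>', object: '"y"', graph: '<http://ex.org/g>' },
];
console.log(toTrig(quads, { simplifyDefaultGraph: true, graphKeyword: 'GRAPH' }));
```

Set `simplifyDefaultGraph: true` and pass only default-graph quads, and the output contains no braces. This matches the NOTE above about the default write output being Turtle-compatible.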
skip [size=1] [OPTIONS]
Skip over some amount of data (quads by default) for each input stream before piping the remainder as usual.
Stream Multiplicity:
N-to-N<QuadStream, QuadStream> – maps 1 or more input streams of Quad objects into 1 or more output streams of Quad objects, or WritableDataEvent objects, depending on the capabilities of the destination stream(s).
Arguments:
size – the number of things to skip
Options:
-q, --quads, -t, --triples – skip the given number of quads
-s, --subjects – skip quads until the given number of distinct subjects have been encountered
Examples:
# skip the first 1 million quads
$ graphy read / skip 1e6 / write < in.ttl > out.ttl
# skip the first 50 subjects
$ graphy read / skip 50 --subjects / write < in.ttl > out.ttl
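The --subjects mode counts distinct subjects rather than quads. Below is a minimal sketch of that counting. It assumes quads for the same subject arrive together, as in typical Turtle. It also assumes that once the limit is passed, every remaining quad is emitted even if an earlier subject reappears. `skipSubjects` is a hypothetical name, and subjects are plain strings here.

```javascript
// Hypothetical sketch of `skip N --subjects`: drop quads until a subject
// outside the first N distinct subjects shows up, then pass everything on.
function* skipSubjects(quads, n) {
  const seen = new Set();
  let passing = n <= 0;
  for (const quad of quads) {
    if (!passing) {
      seen.add(quad.subject); // Set.add ignores subjects already seen
      passing = seen.size > n;
    }
    if (passing) yield quad;
  }
}

const quads = [
  { subject: 'ex:a', object: '1' },
  { subject: 'ex:a', object: '2' },
  { subject: 'ex:b', object: '3' },
  { subject: 'ex:c', object: '4' },
];
console.log([...skipSubjects(quads, 1)].map((q) => q.object)); // [ '3', '4' ]
```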
head [size=1] [OPTIONS]
Limit the number of quads that pass through by counting from the top of the stream.
Stream Multiplicity:
N-to-N<QuadStream, QuadStream> – maps 1 or more input streams of Quad objects into 1 or more output streams of Quad objects, or WritableDataEvent objects, depending on the capabilities of the destination stream(s).
Arguments:
size – the number of things to emit
Options:
-q, --quads, -t, --triples – emit only the given number of quads from the top of a stream
-s, --subjects – emit quads until the given number of distinct subjects have been encountered from the top of a stream
Examples:
# skim the first 1 million quads from the top
$ graphy read / head 1e6 / write < in.ttl > out.ttl
# skim the first 50 subjects from the top
$ graphy read / head 50 --subjects / write < in.ttl > out.ttl
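For head, counting by subjects means the stream can end as soon as one subject too many appears. Below is a minimal sketch under the same assumptions as any skip sketch: subjects are plain strings, and quads for one subject arrive together. `headSubjects` is a hypothetical name.

```javascript
// Hypothetical sketch of `head N --subjects`: emit quads until an (N+1)th
// distinct subject appears, then stop reading.
function* headSubjects(quads, n) {
  const seen = new Set();
  for (const quad of quads) {
    seen.add(quad.subject);
    if (seen.size > n) return; // stop without consuming the rest of the input
    yield quad;
  }
}

const quads = [
  { subject: 'ex:a', object: '1' },
  { subject: 'ex:a', object: '2' },
  { subject: 'ex:b', object: '3' },
  { subject: 'ex:c', object: '4' },
];
console.log([...headSubjects(quads, 2)].map((q) => q.object)); // [ '1', '2', '3' ]
```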
tail [size=1] [OPTIONS]
Limit the number of quads that pass through by counting from the bottom of the stream.
Stream Multiplicity:
N-to-N<QuadStream, QuadStream> – maps 1 or more input streams of Quad objects into 1 or more output streams of Quad objects, or WritableDataEvent objects, depending on the capabilities of the destination stream(s).
Arguments:
size – the number of things to emit
Options:
-q, --quads, -t, --triples – emit only the given number of quads from the bottom of a stream
-s, --subjects – emit quads contained by the given number of distinct subjects from the bottom of a stream
Examples:
# tail the last 1 million quads
$ graphy read / tail 1e6 / write < in.ttl > out.ttl
# tail the last 50 subjects
$ graphy read / tail 50 --subjects / write < in.ttl > out.ttl
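Unlike head and skip, tail cannot emit anything until its input ends, because it does not yet know where the bottom is. The two hypothetical helpers below, `tailQuads` and `tailSubjects`, sketch both modes over in-memory quads with string subjects. `tailQuads` keeps only the last N quads. It uses `Array.prototype.shift` for brevity, where a ring buffer would avoid that method's linear cost. `tailSubjects` buffers everything, since the last N distinct subjects are only known at the end.

```javascript
// Hypothetical sketch of `tail N` (quads): a sliding window of the last N
// quads, so memory is bounded by N rather than by the input size.
function tailQuads(quads, n) {
  const lastQuads = [];
  for (const quad of quads) {
    lastQuads.push(quad);
    if (lastQuads.length > n) lastQuads.shift(); // drop the oldest quad
  }
  return lastQuads;
}

// Hypothetical sketch of `tail N --subjects`: buffer every quad, find the
// last N distinct subjects, then emit their quads in input order.
function tailSubjects(quads, n) {
  const buffered = [...quads];
  const lastSubjects = new Set();
  for (let i = buffered.length - 1; i >= 0 && lastSubjects.size < n; i--) {
    lastSubjects.add(buffered[i].subject);
  }
  return buffered.filter((quad) => lastSubjects.has(quad.subject));
}

const quads = [
  { subject: 'ex:a', object: '1' },
  { subject: 'ex:a', object: '2' },
  { subject: 'ex:b', object: '3' },
  { subject: 'ex:c', object: '4' },
];
console.log(tailSubjects(quads, 1).map((q) => q.object)); // [ '4' ]
```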
filter [OPTIONS]
Filter quads using either a Quad Filter Expression or a JavaScript expression.
Stream Multiplicity:
N-to-N<QuadStream, QuadStream> – maps 1 or more input streams of Quad objects into 1 or more output streams of Quad objects, or WritableDataEvent objects, depending on the capabilities of the destination stream(s).
Options:
-x, --expression – filter quads using the given Quad Filter Expression
-j, --javascript – filter quads using the given JavaScript expression, which will be evaluated as a callback function that is passed the quad and the current prefix map as arguments
Examples:
# filter by subject: 'dbr:Banana_split' using prefix mappings embedded in document
$ curl http://dbpedia.org/data/Banana.ttl | graphy read / filter -x 'dbr:Banana_split'
# filter by predicate: 'rdf:type' alias
$ curl http://dbpedia.org/data/Banana.ttl | graphy read / filter -x '; a'
# select quads that have neither the predicate 'owl:sameAs' nor 'dbo:wikiPageRedirects'
$ curl http://dbpedia.org/data/Banana.ttl | graphy read / filter -x '; !(owl:sameAs or dbo:wikiPageRedirects)'
# filter by object: '"Banana"@en'
$ curl http://dbpedia.org/data/Banana.ttl | graphy read / filter -x ';; "Banana"@en'
# filter by graph using absolute IRI ref
$ curl http://dbpedia.org/data/Banana.ttl | graphy read / filter -x ';;; <http://ex.org/some-absolute-graph-iri>'
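The -j option takes a callback instead of a Quad Filter Expression. The sketch below writes the owl:sameAs / dbo:wikiPageRedirects exclusion from the examples above as such a callback, then runs it against minimal mock quads. It assumes quads expose RDF/JS-style `.predicate.value` IRIs. It also assumes the prefix map is a plain object from prefix id to IRI, as the `hash<PrefixID, IriString>` signature in the transform section suggests. `keep` and `mockQuad` are hypothetical names, and how graphy evaluates the expression is not shown.

```javascript
// Keep a quad only if its predicate is neither owl:sameAs nor dbo:wikiPageRedirects.
// Assumes `quad.predicate.value` holds the full IRI and `prefixes` maps ids to IRIs.
const keep = (quad, prefixes) => ![
  prefixes.owl + 'sameAs',
  prefixes.dbo + 'wikiPageRedirects',
].includes(quad.predicate.value);

// Minimal mock data standing in for parsed quads and the document's prefix map.
const prefixes = {
  owl: 'http://www.w3.org/2002/07/owl#',
  dbo: 'http://dbpedia.org/ontology/',
};
const mockQuad = (predicateIri) => ({ predicate: { value: predicateIri } });
const sample = [
  mockQuad(prefixes.owl + 'sameAs'),
  mockQuad('http://www.w3.org/2000/01/rdf-schema#label'),
];

console.log(sample.filter((quad) => keep(quad, prefixes)).length); // 1
```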
transform [OPTIONS]
Apply a custom transform function to each quad in the stream(s). For each quad it is applied to, the transform function may yield zero, one, or many quads as output (i.e., the function is one-to-many).
Stream Multiplicity:
N-to-N<QuadStream, QuadStream> – maps 1 or more input streams of Quad objects into 1 or more output streams of Quad objects, or WritableDataEvent objects, depending on the capabilities of the destination stream(s).
Options:
-j, --javascript – transform quads using the given JavaScript expression, which will be evaluated as a callback function that is passed the quad and the current prefix map as arguments
The callback function has the signature: callback(ConvenientQuad, hash<PrefixID, IriString>)
Where ConvenientQuad extends Quad with the following properties:
.s – shorthand for the .subject property
.p – shorthand for the .predicate property
.o – shorthand for the .object property
.g – shorthand for the .graph property
The callback return value can be any of the following types:
null, undefined, false or otherwise falsy (e.g., 0, empty string, etc.) – ignore this quad
Array<SomeTerm> – with the subject at position [0], the predicate at position [1], the object at position [2] and optionally the graph at position [3]. SomeTerm is either an AnyTerm or a #string/c1.
Quad – simply a quad object
WritableDataEvent<#hash/c3 | #hash/c4> – using the function identifier c3() or c4() (defined for you in the upper scope) to wrap the return value
#string/trig – return any valid TriG string (TriG is a superset of N-Triples and Turtle)
Examples:
# materialize the inverse owl:sameAs relations
$ graphy read / filter -x '; owl:sameAs' / transform -j 't => [t.o, t.p, t.s]'
# reify all statements
$ graphy read / transform -j 'triple => c3({
[">http://demo.org/"+factory.hash(triple)]: {
a: "rdf:Statement",
"rdf:subject": triple.subject,
"rdf:predicate": triple.predicate,
"rdf:object": triple.object,
},
})' / write
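The sketch below drives a callback over in-memory quads to make the return-value contract concrete. It is a simplified, hypothetical stand-in, and `applyTransform` is not part of graphy. Terms are plain strings, and the driver adds the documented .s/.p/.o/.g shorthands. It handles only two of the listed return types: falsy values, which drop the quad, and Array<SomeTerm>. It also assumes that a missing graph position means the default graph, written here as ''.

```javascript
// Hypothetical driver for a transform callback; terms are plain strings.
function applyTransform(quads, callback, prefixes = {}) {
  const output = [];
  for (const quad of quads) {
    // ConvenientQuad-style shorthands alongside the full property names.
    const convenient = { ...quad, s: quad.subject, p: quad.predicate, o: quad.object, g: quad.graph };
    const result = callback(convenient, prefixes);
    if (!result) continue; // null, undefined, false, 0, '' => ignore this quad
    if (Array.isArray(result)) {
      // [subject, predicate, object, graph?]; assumption: no graph => default graph ('')
      const [subject, predicate, object, graph = ''] = result;
      output.push({ subject, predicate, object, graph });
      continue;
    }
    throw new TypeError('sketch only handles falsy and array return values');
  }
  return output;
}

const input = [
  { subject: 'dbr:Banana', predicate: 'owl:sameAs', object: 'ex:other', graph: '' },
  { subject: 'dbr:Banana', predicate: 'rdfs:label', object: '"Banana"@en', graph: '' },
];

// Mirrors `filter -x '; owl:sameAs' / transform -j 't => [t.o, t.p, t.s]'` in one step.
const inverted = applyTransform(input, (t) => t.p === 'owl:sameAs' && [t.o, t.p, t.s]);
console.log(inverted);
```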
concat
Concatenate quads from all input streams in order.
Stream Multiplicity:
N-to-1<QuadStream, QuadStream> – reduces 1 or more input streams of Quad objects into exactly 1 output stream of Quad objects.
merge
Merge quads from all input streams without order.
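The difference between concat and merge is ordering. The simplified simulation below fakes asynchronous delivery with a hypothetical `at` arrival time on each item, which is not how graphy schedules streams. Both reducers output the same quads, just in a different order.

```javascript
// Each "stream" is an array of { quad, at } items; `at` is a made-up arrival time.
function concat(streams) {
  // Every quad of input 0, then every quad of input 1, and so on.
  return streams.flat().map((item) => item.quad);
}

function merge(streams) {
  // Emit quads in arrival order, regardless of which input they came from.
  return streams.flat().sort((x, y) => x.at - y.at).map((item) => item.quad);
}

const a = [{ quad: 'a1', at: 1 }, { quad: 'a2', at: 4 }];
const b = [{ quad: 'b1', at: 2 }, { quad: 'b2', at: 3 }];
console.log(concat([a, b])); // [ 'a1', 'a2', 'b1', 'b2' ]
console.log(merge([a, b])); // [ 'a1', 'b1', 'b2', 'a2' ]
```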
Stream Multiplicity:
N-to-1<QuadStream, QuadStream> – reduces 1 or more input streams of Quad objects into exactly 1 output stream of Quad objects.
tree
Puts all quads thru a tree data structure to remove duplicates.
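Here is a minimal sketch of tree-based deduplication. It assumes a nested graph, subject, predicate, object layout. graphy's actual tree layout and output order are not described here. A repeated quad follows the same path and finds its object already present at the leaf. `dedupeWithTree` and `getOrCreate` are hypothetical names, and terms are plain strings.

```javascript
// Hypothetical sketch: deduplicate quads with a nested tree
// graph -> subject -> predicate -> Set of objects.
function dedupeWithTree(quads) {
  const tree = new Map();
  const unique = [];
  for (const quad of quads) {
    const subjects = getOrCreate(tree, quad.graph, Map);
    const predicates = getOrCreate(subjects, quad.subject, Map);
    const objects = getOrCreate(predicates, quad.predicate, Set);
    if (!objects.has(quad.object)) {
      objects.add(quad.object); // first time this exact quad is seen
      unique.push(quad);
    }
  }
  return unique;
}

// Return the child container stored under `key`, creating it if absent.
function getOrCreate(parent, key, Container) {
  let child = parent.get(key);
  if (child === undefined) {
    child = new Container();
    parent.set(key, child);
  }
  return child;
}

const quads = [
  { graph: '', subject: 'ex:a', predicate: 'ex:p', object: '"1"' },
  { graph: '', subject: 'ex:a', predicate: 'ex:p', object: '"1"' }, // duplicate
  { graph: 'ex:g', subject: 'ex:a', predicate: 'ex:p', object: '"1"' }, // other graph
];
console.log(dedupeWithTree(quads).length); // 2
```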
Stream Multiplicity:
N-to-N<QuadStream, QuadStream> – maps 1 or more input streams of Quad objects into 1 or more output streams of Quad objects, or WritableDataEvent objects, depending on the capabilities of the destination stream(s).
canonical
Canonicalize a set of quads using the RDF Dataset Normalization Algorithm (URDNA2015). canonicalize is also an alias.
Stream Multiplicity:
N-to-N<QuadStream, QuadStream> – maps 1 or more input streams of Quad objects into 1 or more output streams of Quad objects, or WritableDataEvent objects, depending on the capabilities of the destination stream(s).
Example:
# canonicalize
$ graphy read -c ttl / canonical / write -c ttl \
< input.ttl \
> output.ttl
union [OPTIONS]
Compute the union of all inputs.
Stream Multiplicity:
N-to-1<QuadStream, QuadStream> – reduces 1 or more input streams of Quad objects into exactly 1 output stream of Quad objects.
Options:
--strict – if true, forgoes canonicalization before the set operation
Example:
# perform a union on all *.ttl files inside `input/` directory
$ graphy read -c ttl / union / write -c ttl \
--inputs input/*.ttl \
> union.ttl
intersect [OPTIONS]
Performs the intersection of all inputs. intersection is also an alias.
Stream Multiplicity:
N-to-1<QuadStream, QuadStream> – reduces 1 or more input streams of Quad objects into exactly 1 output stream of Quad objects.
Options:
--strict – if true, forgoes canonicalization before the set operation
Example:
# perform an intersection on all *.ttl files inside `input/` directory
$ graphy read -c ttl / intersect / write -c ttl \
--inputs input/*.ttl \
> intersection.ttl
diff [OPTIONS]
Compute the difference between the two inputs. difference is also an alias.
Stream Multiplicity:
2-to-1<QuadStream, QuadStream> – joins exactly 2 input streams of Quad objects into exactly 1 output stream of Quad objects.
Options:
--strict – if true, forgoes canonicalization before the set operation
Example:
# compute the isomorphic difference between two files
$ graphy read -c ttl / diff / write -c ttl \
--inputs a.ttl b.ttl \
> canonical-difference.ttl
minus [OPTIONS]
Subtracts the second input from the first. subtract and subtraction are also aliases.
Stream Multiplicity:
2-to-1<QuadStream, QuadStream> – joins exactly 2 input streams of Quad objects into exactly 1 output stream of Quad objects.
Options:
--strict – if true, forgoes canonicalization before the set operation
Example:
# subtract `input/dead.ttl` from `union.ttl`
$ graphy read -c ttl / minus / write -c ttl \
--inputs union.ttl input/dead.ttl \
> leftover.ttl
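The set operations above can be sketched over quads held as strings in JavaScript Sets. Comparing strings directly mirrors --strict, with no canonicalization, so blank node labels must match exactly. The quad strings are stand-ins. Modeling `diff` as the symmetric difference, to tell it apart from `minus`, is an assumption and not confirmed by this page.

```javascript
// Quads are stand-in N-Triples-like strings; each input is a Set.
const union = (...inputs) => new Set(inputs.flatMap((input) => [...input]));

const intersect = (first, ...rest) =>
  new Set([...first].filter((quad) => rest.every((input) => input.has(quad))));

// A - B: quads in the first input that are not in the second.
const minus = (a, b) => new Set([...a].filter((quad) => !b.has(quad)));

// Assumption: `diff` modeled as the symmetric difference, (A - B) ∪ (B - A).
const diff = (a, b) => union(minus(a, b), minus(b, a));

const A = new Set(['<x> <p> "1" .', '<x> <p> "2" .']);
const B = new Set(['<x> <p> "2" .', '<x> <p> "3" .']);
console.log([...minus(A, B)]); // [ '<x> <p> "1" .' ]
```

With these definitions, `union(A, B)` has three quads and `intersect(A, B)` keeps only the shared `"2"` quad. `diff(A, B)` keeps the `"1"` and `"3"` quads.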
equals [OPTIONS]
Tests for equality between the two inputs. equal is also an alias.
Stream Multiplicity:
2-to-1<QuadStream, ResultValueStream<Boolean>> – joins exactly 2 input streams of Quad objects into exactly 1 output stream of a single boolean value.
Options:
--strict – if true, forgoes canonicalization before the set operation
Examples:
# test if `before.ttl` and `after.ttl` are strictly equal
$ graphy read -c ttl / equals --strict \
--inputs before.ttl after.ttl
# test if `before.ttl` and `after.ttl` are isomorphically equivalent
$ graphy read -c ttl / equals \
--inputs before.ttl after.ttl
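Strict mode compares quads verbatim, which matters for blank nodes. Blank node labels are local to a document, so the same graph can be written with different labels. The sketch below uses a hypothetical `strictEquals` over quad strings. It shows two isomorphic one-quad graphs that a strict comparison still reports as unequal. It does not implement URDNA2015, which the non-strict mode presumably relies on to canonicalize blank node labels first.

```javascript
// Strict equality: same size, and every quad of one set appears verbatim in the other.
function strictEquals(a, b) {
  if (a.size !== b.size) return false;
  for (const quad of a) {
    if (!b.has(quad)) return false;
  }
  return true;
}

const before = new Set(['_:b0 <http://ex.org/p> "x" .']);
const after = new Set(['_:c14n0 <http://ex.org/p> "x" .']);

// Isomorphic graphs (only the blank node label differs), yet strictly unequal.
console.log(strictEquals(before, after)); // false
```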
disjoint [OPTIONS]
Tests for disjointness between the two inputs.
Stream Multiplicity:
2-to-1<QuadStream, ResultValueStream<Boolean>> – joins exactly 2 input streams of Quad objects into exactly 1 output stream of a single boolean value.
Options:
--strict – if true, forgoes canonicalization before the set operation
Example:
# test if `apples.ttl` and `oranges.ttl` are strictly disjoint
$ graphy read -c ttl / disjoint --strict \
--inputs apples.ttl oranges.ttl
contains [OPTIONS]
Tests if the first input contains the second. contain is also an alias.
Stream Multiplicity:
2-to-1<QuadStream, ResultValueStream<Boolean>> – joins exactly 2 input streams of Quad objects into exactly 1 output stream of a single boolean value.
Options:
--strict – if true, forgoes canonicalization before the set operation
Example:
# test if `superset.ttl` strictly contains `subset.ttl`
$ graphy read -c ttl / contains --strict \
--inputs superset.ttl subset.ttl
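The two remaining boolean tests reduce to simple membership checks in strict mode. The sketch uses hypothetical `disjoint` and `contains` helpers over stand-in quad strings held in Sets, with no canonicalization.

```javascript
// disjoint: no quad appears in both inputs.
const disjoint = (a, b) => [...a].every((quad) => !b.has(quad));

// contains: every quad of the second input also appears in the first.
const contains = (superset, subset) => [...subset].every((quad) => superset.has(quad));

const apples = new Set(['<apple> <color> "red" .']);
const oranges = new Set(['<orange> <color> "orange" .']);
console.log(disjoint(apples, oranges)); // true
console.log(contains(apples, oranges)); // false
```

By these definitions, any input contains the empty set, and an input is never disjoint from itself unless it is empty.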
count
Count the number of events in each stream.
Stream Multiplicity:
N-to-N<QuadStream, ResultValueStream<Number>> – maps 1 or more input streams of Quad objects into 1 or more output streams of number values.
distinct [OPTIONS]
Count the number of distinct things, such as quads, triples, subjects, etc.
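Here is a minimal sketch of what these counts mean. It assumes quads are plain objects with string terms and that the default graph is ''. `countDistinct` and `projections` are hypothetical names, and the keys mirror the -q, -t, -s, -p, -o, and -g options listed below.

```javascript
// Project each quad to a key and count distinct keys.
// JSON.stringify keeps multi-term keys unambiguous even if terms contain spaces.
const projections = {
  quads: (q) => JSON.stringify([q.subject, q.predicate, q.object, q.graph]),
  triples: (q) => JSON.stringify([q.subject, q.predicate, q.object]), // ignores the graph
  subjects: (q) => q.subject,
  predicates: (q) => q.predicate,
  objects: (q) => q.object,
  graphs: (q) => q.graph,
};

function countDistinct(quads, what = 'quads') {
  return new Set(quads.map(projections[what])).size;
}

const quads = [
  { subject: 'ex:a', predicate: 'ex:p', object: '"1"', graph: '' },
  { subject: 'ex:a', predicate: 'ex:p', object: '"1"', graph: 'ex:g' },
  { subject: 'ex:b', predicate: 'ex:p', object: '"1"', graph: '' },
];
console.log(countDistinct(quads), countDistinct(quads, 'triples')); // 3 2
```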
Stream Multiplicity:
N-to-N<QuadStream, ResultValueStream<Number>> – maps 1 or more input streams of Quad objects into 1 or more output streams of number values.
Options:
-q, --quads – count the number of distinct quads (default)
-t, --triples – count the number of distinct triples by ignoring the graph component
-s, --subjects – count the number of distinct subjects
-p, --predicates – count the number of distinct predicates
-o, --objects – count the number of distinct objects
-g, --graphs – count the number of distinct graphs
help
Alias for $ graphy --help. Print the help message and exit.
version
Alias for $ graphy --version. Print the version info and exit.
examples
Alias for $ graphy --examples. Print some examples and exit.
Options you can pass to the main graphy command that print some information and exit:
-e, --examples – Print some examples and exit
-h, --help – Print the help message and exit
-v, --version – Print the version info and exit
Configure certain options for the process:
--show-stack-trace – Show the stack trace when printing error messages
By default, graphy expects a single input stream on stdin, which it will forward through the internal pipeline. Some commands may allow for or even expect multiple inputs (e.g., for computing the difference between two datasets).
--inputs FILE ...
If you simply need to pass in multiple input files, you can use the --inputs option like so:
$ graphy read -c ttl / diff / write -c ttl \
--inputs original.ttl modified.ttl \
> difference.ttl
Keep in mind that each command has its own restrictions on the number of inputs it accepts, which may also depend on the operation being performed (e.g., diff expects exactly 2 input streams while union accepts 1 or more).
If you need to execute other commands before passing in multiple inputs, you can use process substitution (supported in bash) like so:
$ DBPEDIA_EN_URL="http://downloads.dbpedia.org/2016-10/core-i18n/en"
$ graphy read -c ttl / union / write -c ttl \
--inputs \
<(curl "$DBPEDIA_EN_URL/topical_concepts_en.ttl.bz2" | bzip2 -d) \
<(curl "$DBPEDIA_EN_URL/uri_same_as_iri_en.ttl.bz2" | bzip2 -d) \
> union.ttl
StringStream – A stream of utf8-encoded strings. This always applies to stdin and stdout.
QuadStream – A stream of Quad objects.
WritableDataEventStream – A stream of WritableDataEvent objects.
Automatically determines which mode is best suited for the destination stream. Compatible with QuadStream, WritableDataEventStream and StringStream. In the case of StringStream, each object is converted to its JSON equivalent on a single line, followed by a newline '\n' (i.e., line-delimited JSON).
ResultValueStream – A stream that will emit a single 'data' event, which is the result of some test or computation (e.g., a single boolean or number value). Compatible with StringStream, in which case the value will be converted to JSON and then terminated by a newline '\n'.
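When objects or result values reach stdout, each one becomes a single line of JSON. Below is a minimal Node.js sketch for consuming that output. It uses two hypothetical helpers: `parseLdjson`, for a string already in memory, and `readLdjson`, for a readable stream such as process.stdin. The JSON shape of an individual quad is not described here, so each line is treated as an opaque JSON value.

```javascript
const readline = require('readline');

// Parse line-delimited JSON held in a string: one JSON value per line.
function parseLdjson(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '') // e.g., the empty piece after the final '\n'
    .map((line) => JSON.parse(line));
}

// Stream the same format from a readable stream (e.g., process.stdin)
// without buffering the whole output in memory.
async function* readLdjson(input) {
  const lines = readline.createInterface({ input, crlfDelay: Infinity });
  for await (const line of lines) {
    if (line.trim() !== '') yield JSON.parse(line);
  }
}

console.log(parseLdjson('true\n')); // [ true ]
```

For example, a script (hypothetically named consume.js) that loops `for await (const value of readLdjson(process.stdin))` could sit at the end of a shell pipe such as `graphy read -c nq < input.nq | node consume.js`. The same helper reads the single `true` or `false` line that `equals` prints.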