graphy.js

A collection of RDF libraries for JavaScript


Command Line Interface

This document describes the command-line interface of the graphy executable, which can be installed globally from npm:

   $ npm i -g graphy


Internal Pipeline

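The graphy CLI works by pushing RDF data through a series of internal transforms, starting with either a single input on stdin or multiple inputs (see Inputs) and ending with a single output on stdout. Because commands hand objects such as quads directly to one another within a single process, rather than re-serializing text between separate processes, this internal pipeline allows for efficient, high-bandwidth transformations of RDF data.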

Usage: graphy [OPTIONS] COMMAND [ / COMMAND]* [--inputs FILE...]
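The usage line above can be modeled with a small JavaScript function. splitPipeline is hypothetical: it is not part of graphy and is not its argument parser. It assumes global OPTIONS have already been removed, treats each standalone "/" token as the boundary between commands, and treats every token after --inputs as an input file.

```javascript
// Hypothetical model of the usage syntax; NOT graphy's real parser.
// Assumes global [OPTIONS] were already stripped from argv.
function splitPipeline(argv) {
  const stages = [];
  const inputs = [];
  let current = [];
  let readingInputs = false;

  for (const token of argv) {
    if (readingInputs) {
      inputs.push(token); // every token after --inputs names an input file
    } else if (token === '--inputs') {
      readingInputs = true;
    } else if (token === '/') {
      stages.push(current); // a standalone slash closes the current command
      current = [];
    } else {
      current.push(token);
    }
  }
  if (current.length > 0) stages.push(current);
  return { stages, inputs };
}

// A quoted argument (e.g. a filter expression containing ' / ') reaches the
// program as ONE argv token, so only a bare '/' token separates commands.
const argv = ['read', '-c', 'ttl', '/', 'union', '/', 'write', '-c', 'ttl',
  '--inputs', 'a.ttl', 'b.ttl'];
const { stages, inputs } = splitPipeline(argv);
console.log(JSON.stringify(stages)); // [["read","-c","ttl"],["union"],["write","-c","ttl"]]
console.log(JSON.stringify(inputs)); // ["a.ttl","b.ttl"]
```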



Commands

read [OPTIONS]

Read RDF content, i.e., deserialize it.

Stream Multiplicity:

Options:

Examples:

   # validate an N-Triples document
   $ graphy read -c nt < input.nt > /dev/null

   # print line-delimited JSON of the quads in an N-Quads document
   $ graphy read -c nq < input.nq

   # validate a Turtle document
   $ graphy read -c ttl < input.ttl > /dev/null

   # print line-delimited JSON of the quads in a TriG document while validating it
   $ graphy read -c trig < input.trig

scan [OPTIONS]

Scan RDF content, i.e., deserialize it and process it using multiple threads.

EXPERIMENTAL! The scan verb is currently experimental.

Stream Multiplicity:

Options:

Examples:

   # validate an N-Triples document
   $ graphy scan -c nt < input.nt > /dev/null

   # print line-delimited JSON of the quads in an N-Quads document
   $ graphy scan -c nq < input.nq

   # count the number of statements in an N-Triples document (bypass validation)
   $ graphy scan -c nt --relax / count < input.nt

   # convert an N-Triples document into Turtle
   $ graphy scan -c nt / scribe -c ttl < input.nt > output.ttl

scribe [OPTIONS]

Scribe RDF content, i.e., serialize it, fast (and possibly ugly).

Stream Multiplicity:

Options:

Examples:

   # convert a Turtle document into N-Triples
   $ cat input.ttl | graphy read -c ttl / scribe -c nt > output.nt

   # convert a TriG document into N-Quads
   $ cat input.trig | graphy read -c trig / scribe -c nq > output.nq

   # convert an N-Triples document into Turtle
   $ cat input.nt | graphy read -c nt / scribe -c ttl > output.ttl

   # convert an N-Quads document into TriG
   $ cat input.nq | graphy read -c nq / scribe -c trig > output.trig

   # convert an N-Triples document into RDF/XML
   $ cat input.nt | graphy read -c nt / scribe -c xml > output.rdf

write [OPTIONS]

Write RDF content, i.e., serialize it, in style (pretty-print).

NOTE: If no serialization format is specified with the -c option, the output format will default to TriG with the simplified default graph option enabled, meaning that the output will also be Turtle-compatible if all quads written belong to the default graph.

Stream Multiplicity:

Options:

Examples:

   # convert a Turtle document into N-Triples
   $ cat input.ttl | graphy read -c ttl / write -c nt > output.nt

   # convert a TriG document into N-Quads
   $ cat input.trig | graphy read -c trig / write -c nq > output.nq

   # convert an N-Triples document into Turtle
   $ cat input.nt | graphy read -c nt / write -c ttl > output.ttl

   # convert an N-Triples document into Turtle (equivalent to above)
   $ cat input.nt | graphy read -c nt / write > output.ttl

   # convert an N-Quads document into TriG
   $ cat input.nq | graphy read -c nq / write -c trig > output.trig

   # convert an N-Quads document into TriG (equivalent to above)
   $ cat input.nq | graphy read -c nq / write > output.trig

skip [size=1] [OPTIONS]

Skip over some amount of data (quads by default) for each input stream before piping the remainder as usual.

Stream Multiplicity:

Arguments:

Options:

Examples:

   # skip the first 1 million quads
   $ graphy read / skip 1e6 / write < in.ttl > out.ttl

   # skip the first 50 subjects
   $ graphy read / skip 50 --subjects / write < in.ttl > out.ttl
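The two counting modes of skip can be modeled with a JavaScript generator over a single stream. This is a sketch under stated assumptions, not graphy's implementation: quads are plain objects with string-valued subject, predicate, object and graph fields (not graphy's Quad class), and in subject mode a "subject" is assumed to be a run of consecutive quads sharing the same subject, as when a Turtle document groups statements under one subject.

```javascript
// Hypothetical sketch of `skip` for a single input stream. Quads are plain
// objects with string terms (an assumption, not graphy's Quad class).
// Subject mode ASSUMES a subject is a run of consecutive quads that share
// the same subject value.
function* skip(quads, size = 1, { subjects = false } = {}) {
  let units = 0; // quads (or subject runs) seen so far
  let lastSubject;
  for (const quad of quads) {
    if (subjects) {
      if (units === 0 || quad.subject !== lastSubject) {
        lastSubject = quad.subject; // a new subject run starts here
        units += 1;
      }
      if (units <= size) continue;
    } else if (units < size) {
      units += 1;
      continue;
    }
    yield quad; // everything after the skipped prefix passes through
  }
}

// hypothetical sample data
const data = [
  { subject: 'ex:a', predicate: 'ex:p', object: 'ex:o1', graph: '' },
  { subject: 'ex:a', predicate: 'ex:p', object: 'ex:o2', graph: '' },
  { subject: 'ex:b', predicate: 'ex:p', object: 'ex:o3', graph: '' },
  { subject: 'ex:c', predicate: 'ex:p', object: 'ex:o4', graph: '' },
];
console.log([...skip(data, 2)].map((q) => q.object).join(' ')); // ex:o3 ex:o4
console.log([...skip(data, 1, { subjects: true })].map((q) => q.object).join(' ')); // ex:o3 ex:o4
```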

head [size=1] [OPTIONS]

Limit the number of quads that pass through by counting from the top of the stream.

Stream Multiplicity:

Arguments:

Options:

Examples:

   # skim the first 1 million quads from the top
   $ graphy read / head 1e6 / write < in.ttl > out.ttl

   # skim the first 50 subjects from the top
   $ graphy read / head 50 --subjects / write < in.ttl > out.ttl
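A head stage only needs the beginning of its input. The hypothetical generator below sketches quad-counting mode over a plain JavaScript iterable; because it returns early, the enclosing for...of loop closes the upstream iterator, so even an endless source is read only as far as needed. This sketch makes no claim that graphy's head stops its upstream in exactly the same way.

```javascript
// Hypothetical sketch of `head` in quad-counting mode: yield at most `size`
// items and then stop pulling from the source.
function* head(quads, size = 1) {
  if (size <= 0) return;
  let taken = 0;
  for (const quad of quads) {
    yield quad;
    taken += 1;
    if (taken >= size) return; // leaving for...of also closes the source iterator
  }
}

// an endless stand-in for a quad stream
function* endless() {
  for (let i = 0; ; i += 1) yield { subject: `s${i}` };
}

console.log([...head(endless(), 3)].map((q) => q.subject).join(',')); // s0,s1,s2
```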

tail [size=1] [OPTIONS]

Limit the number of quads that pass through by counting from the bottom of the stream.

Stream Multiplicity:

Arguments:

Options:

Examples:

   # tail the last 1 million quads
   $ graphy read / tail 1e6 / write < in.ttl > out.ttl

   # tail the last 50 subjects
   $ graphy read / tail 50 --subjects / write < in.ttl > out.ttl
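Counting from the bottom of a stream means the last quads cannot be known until the stream ends, so a tail stage cannot emit anything early. A fixed-size ring buffer keeps memory proportional to size rather than to the stream length; the function below is a sketch of that technique over a plain JavaScript iterable, not graphy's implementation.

```javascript
// Sketch of `tail` in quad-counting mode using a fixed-size ring buffer:
// memory stays at `size` slots no matter how long the stream is, but
// nothing can be returned until the source has ended.
function tail(quads, size = 1) {
  if (size <= 0) return [];
  const ring = new Array(size);
  let count = 0;
  for (const quad of quads) {
    ring[count % size] = quad; // overwrite the oldest slot
    count += 1;
  }
  // read the surviving slots back in their original order
  const kept = Math.min(count, size);
  const out = [];
  for (let i = count - kept; i < count; i += 1) out.push(ring[i % size]);
  return out;
}

console.log(tail([1, 2, 3, 4, 5], 2)); // [ 4, 5 ]
```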

filter [OPTIONS]

Filter quads using either a Quad Filter Expression or a JavaScript expression.

Stream Multiplicity:

Options:

Examples:

   # filter by subject: 'dbr:Banana_split' using prefix mappings embedded in the document
   $ curl http://dbpedia.org/data/Banana.ttl | graphy read / filter -x 'dbr:Banana_split'

   # filter by predicate: 'rdf:type' alias
   $ curl http://dbpedia.org/data/Banana.ttl | graphy read / filter -x '; a'

   # select quads whose predicate is neither 'owl:sameAs' nor 'dbo:wikiPageRedirects'
   $ curl http://dbpedia.org/data/Banana.ttl | graphy read / filter -x '; !(owl:sameAs or dbo:wikiPageRedirects)'

   # filter by object: '"Banana"@en'
   $ curl http://dbpedia.org/data/Banana.ttl | graphy read / filter -x ';; "Banana"@en'

   # filter by graph using absolute IRI ref
   $ curl http://dbpedia.org/data/Banana.ttl | graphy read / filter -x ';;; <http://ex.org/some-absolute-graph-iri>'
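To make the positional layout of these expressions concrete, the following hypothetical JavaScript model (compileFilter is not a graphy function) treats an expression as up to four ';'-separated positions (subject; predicate; object; graph). In this model an empty position matches anything, 'a' in the predicate position stands for rdf:type, 'x or y' lists alternatives, and '!(...)' negates. These rules are inferred from the examples above; the model compares prefixed names as plain strings and ignores quoting, so it is not graphy's actual Quad Filter Expression grammar.

```javascript
// Hypothetical, heavily simplified model of positional quad filter
// expressions. NOT graphy's parser: quoting is ignored (a ';' or ' or '
// inside a literal would break it) and terms are compared as plain strings.
const POSITIONS = ['subject', 'predicate', 'object', 'graph'];

function compileFilter(expr) {
  const tests = expr.split(';').map((part, index) => {
    let text = part.trim();
    if (!text) return null; // an empty position acts as a wildcard
    let negate = false;
    const neg = /^!\((.*)\)$/.exec(text);
    if (neg) {
      negate = true;
      text = neg[1];
    }
    // 'x or y' lists alternatives; 'a' in predicate position means rdf:type
    const options = text.split(/\s+or\s+/).map((term) =>
      index === 1 && term === 'a' ? 'rdf:type' : term);
    return { key: POSITIONS[index], options, negate };
  });

  return (quad) => tests.every((test) => {
    if (!test) return true;
    const hit = test.options.includes(quad[test.key]);
    return test.negate ? !hit : hit;
  });
}

// hypothetical sample data
const quads = [
  { subject: 'ex:split', predicate: 'rdf:type', object: 'ex:Dessert', graph: '' },
  { subject: 'ex:split', predicate: 'owl:sameAs', object: 'ex:other', graph: '' },
  { subject: 'ex:split', predicate: 'rdfs:label', object: '"Banana split"@en', graph: '' },
];
console.log(quads.filter(compileFilter('; a')).length); // 1
console.log(quads.filter(compileFilter('; !(owl:sameAs or dbo:wikiPageRedirects)')).length); // 2
console.log(quads.filter(compileFilter(';; "Banana split"@en')).length); // 1
```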

transform [OPTIONS]

Apply a custom transform function to each quad in the stream(s). For each quad it receives, the transform function may yield zero, one, or many quads as output (i.e., the function is one-to-many).

Stream Multiplicity:

Options:

The callback function has the signature callback(ConvenientQuad, hash<PrefixID, IriString>), where ConvenientQuad extends Quad with the following properties:

The callback return value can be any of the following types:

Examples:

   # materialize the inverse owl:sameAs relations
   $ graphy read / filter -x '; owl:sameAs' / transform -j 't => [t.o, t.p, t.s]'

   # reify all statements
   $ graphy read / transform -j 'triple => c3({
       [">http://demo.org/"+factory.hash(triple)]: {
           a: "rdf:Statement",
           "rdf:subject": triple.subject,
           "rdf:predicate": triple.predicate,
           "rdf:object": triple.object,
       },
     })' / write
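The one-to-many behavior amounts to flattening each callback result into the output stream. In the hypothetical sketch below, quads are plain objects and, for simplicity, the callback must return an array of quads (possibly empty); graphy's real transform accepts other return types, such as the [subject, predicate, object] term array used in the owl:sameAs example above.

```javascript
// Hypothetical sketch of one-to-many transformation over plain-object quads.
// Simplification: the callback here must return an ARRAY OF QUADS.
function* transform(quads, callback) {
  for (const quad of quads) {
    // each input quad may expand into zero, one, or many output quads
    yield* callback(quad);
  }
}

// hypothetical sample data
const input = [
  { subject: 'ex:a', predicate: 'owl:sameAs', object: 'ex:b', graph: '' },
  { subject: 'ex:c', predicate: 'rdfs:label', object: '"C"', graph: '' },
];

// keep each owl:sameAs quad and add its inverse; drop everything else
const out = [...transform(input, (q) => (q.predicate === 'owl:sameAs'
  ? [q, { ...q, subject: q.object, object: q.subject }]
  : []))];

console.log(out.length); // 2
```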

concat

Concatenate quads from all input streams in order.

Stream Multiplicity:

merge

Merge quads from all input streams, without preserving their order.

Stream Multiplicity:
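The difference between concat and merge is about ordering. In the sketch below, plain async iterables stand in for graphy's streams and the function names are hypothetical: concat drains each source before starting the next, while merge races all sources and yields whichever value is ready first, so its output order depends on timing.

```javascript
// Hypothetical sketches of the two orderings over plain async iterables;
// these are not graphy's stream classes.
async function* concat(...sources) {
  for (const source of sources) {
    for await (const item of source) yield item; // drain one source fully
  }
}

async function* merge(...sources) {
  const iterators = sources.map((source) => source[Symbol.asyncIterator]());
  const pull = (i) => iterators[i].next().then((result) => ({ i, result }));
  // one outstanding next() per source that has not finished yet
  const pending = new Map(iterators.map((_, i) => [i, pull(i)]));
  while (pending.size > 0) {
    // whichever source settles first wins, so arrival order is kept
    const { i, result } = await Promise.race(pending.values());
    if (result.done) {
      pending.delete(i);
    } else {
      pending.set(i, pull(i));
      yield result.value;
    }
  }
}

// helper: emit each value after waiting `ms` milliseconds
async function* delayed(values, ms) {
  for (const value of values) {
    await new Promise((resolve) => setTimeout(resolve, ms));
    yield value;
  }
}

async function collect(iterable) {
  const out = [];
  for await (const item of iterable) out.push(item);
  return out;
}

(async () => {
  console.log(await collect(concat(delayed(['a1', 'a2'], 20), delayed(['b1'], 5))));
  // [ 'a1', 'a2', 'b1' ]
  console.log(await collect(merge(delayed(['a1', 'a2'], 20), delayed(['b1'], 5))));
  // order depends on timing; the faster source's 'b1' usually comes first
})();
```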

tree

Puts all quads thru a tree data structure to remove duplicates.

Stream Multiplicity:
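One way to remove duplicates with a tree is to nest indexes by position, so that a quad is new exactly when its object is missing from the leaf set reached through its graph, subject and predicate; shared prefixes (the same subject and predicate) are stored only once. This sketch uses JavaScript Map and Set over plain-object quads with string terms (an assumption); graphy's internal tree structure may be organized differently.

```javascript
// Sketch of de-duplication through a tree:
// graph -> subject -> predicate -> Set<object>. Plain-object quads assumed.
function dedupe(quads) {
  const tree = new Map();
  const unique = [];
  for (const q of quads) {
    let subjects = tree.get(q.graph);
    if (!subjects) tree.set(q.graph, (subjects = new Map()));
    let predicates = subjects.get(q.subject);
    if (!predicates) subjects.set(q.subject, (predicates = new Map()));
    let objects = predicates.get(q.predicate);
    if (!objects) predicates.set(q.predicate, (objects = new Set()));
    if (!objects.has(q.object)) {
      objects.add(q.object); // first time this exact quad has been seen
      unique.push(q);
    }
  }
  return unique;
}

// hypothetical sample data: the second quad duplicates the first
const sample = [
  { subject: 'ex:a', predicate: 'ex:p', object: 'ex:o', graph: '' },
  { subject: 'ex:a', predicate: 'ex:p', object: 'ex:o', graph: '' },
  { subject: 'ex:a', predicate: 'ex:p', object: 'ex:o', graph: 'ex:g' },
];
console.log(dedupe(sample).length); // 2
```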

canonical

Puts all quads thru a tree data structure to remove duplicates.

Stream Multiplicity:

Example:

   # canonicalize a Turtle document
   $ graphy read -c ttl / canonical / write -c ttl   \
       < input.ttl                                   \
       > output.ttl

union [OPTIONS]

Compute the union of all inputs.

Stream Multiplicity:

Options:

Example:

   # perform a union on all *.ttl files inside the `input/` directory
   $ graphy read -c ttl / union / write -c ttl   \
       --inputs input/*.ttl                      \
       > union.ttl

intersect [OPTIONS]

Compute the intersection of all inputs.

intersection is also an alias

Stream Multiplicity:

Options:

Example:

   # perform an intersection on all *.ttl files inside the `input/` directory
   $ graphy read -c ttl / intersect / write -c ttl   \
       --inputs input/*.ttl                          \
       > intersection.ttl
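Strict set semantics for union and intersection over any number of inputs can be sketched by reducing each quad to a string key. The functions below are hypothetical and operate on plain-object quads; because keys compare blank node labels literally, they model strict comparison only, and matching datasets up to blank node relabeling (isomorphism) is out of scope.

```javascript
// Hypothetical sketch of STRICT union and intersection over plain-object
// quads: each quad becomes a string key, so blank nodes only match when
// their labels are identical (no isomorphism check).
const keyOf = (q) => JSON.stringify([q.subject, q.predicate, q.object, q.graph]);

function union(...inputs) {
  const byKey = new Map(); // first occurrence of each distinct quad wins
  for (const quads of inputs) {
    for (const q of quads) {
      const key = keyOf(q);
      if (!byKey.has(key)) byKey.set(key, q);
    }
  }
  return [...byKey.values()];
}

function intersect(first, ...rest) {
  // a quad survives only if every other input also contains it
  const others = rest.map((quads) => new Set(quads.map(keyOf)));
  return union(first).filter((q) => others.every((keys) => keys.has(keyOf(q))));
}

const q = (s) => ({ subject: s, predicate: 'ex:p', object: 'ex:o', graph: '' });
const a = [q('ex:1'), q('ex:2')];
const b = [q('ex:2'), q('ex:3')];
console.log(union(a, b).length); // 3
console.log(intersect(a, b).length); // 1
```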

diff [OPTIONS]

Compute the difference between the two inputs.

difference is also an alias

Stream Multiplicity:

Options:

Example:

   # compute the isomorphic difference between two files
   $ graphy read -c ttl / diff / write -c ttl   \
       --inputs a.ttl b.ttl                     \
       > canonical-difference.ttl

minus [OPTIONS]

Subtracts the second input from the first.

subtract and subtraction are also aliases

Stream Multiplicity:

Options:

Example:

   # subtract `input/dead.ttl` from `union.ttl`
   $ graphy read -c ttl / minus / write -c ttl   \
       --inputs  union.ttl  input/dead.ttl       \
       > leftover.ttl
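Subtraction only needs an index of the second input: the hypothetical generator below builds a Set of keys from the subtrahend and then streams the first input past it, so in this sketch only the second input is held in memory. As with the other sketches, quads are plain objects and comparison is strict (blank node labels are compared literally).

```javascript
// Hypothetical sketch of strict `minus` over plain-object quads.
const keyOf = (q) => JSON.stringify([q.subject, q.predicate, q.object, q.graph]);

function* minus(first, second) {
  const remove = new Set();
  for (const q of second) remove.add(keyOf(q)); // index the subtrahend
  for (const q of first) {
    if (!remove.has(keyOf(q))) yield q; // keep quads absent from the second input
  }
}

const q = (s) => ({ subject: s, predicate: 'ex:p', object: 'ex:o', graph: '' });
const all = [q('ex:1'), q('ex:2'), q('ex:3')];
const dead = [q('ex:2')];
console.log([...minus(all, dead)].map((x) => x.subject).join(',')); // ex:1,ex:3
```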

equals [OPTIONS]

Tests for equality between the two inputs.

equal is also an alias

Stream Multiplicity:

Options:

Example:

   # test if `before.ttl` and `after.ttl` are strictly equal
   $ graphy read -c ttl / equals --strict   \
       --inputs before.ttl after.ttl

   # test if `before.ttl` and `after.ttl` are isomorphically equivalent
   $ graphy read -c ttl / equals   \
       --inputs before.ttl after.ttl

disjoint [OPTIONS]

Tests for disjointness between the two inputs.

Stream Multiplicity:

Options:

Example:

   # test if `apples.ttl` and `oranges.ttl` are strictly disjoint
   $ graphy read -c ttl / disjoint --strict   \
       --inputs apples.ttl oranges.ttl

contains [OPTIONS]

Tests if the first input contains the second.

Stream Multiplicity:

Options:

Example:

   # test if `superset.ttl` strictly contains `subset.ttl`
   $ graphy read -c ttl / contains --strict   \
       --inputs superset.ttl subset.ttl
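When comparison is strict, equals, disjoint and contains all reduce to checks on sets of quad keys. The hypothetical functions below compute them over plain-object quads and write a result the way a ResultValueStream is described in the Classes section: a JSON value followed by a newline. Isomorphic equivalence, the default in the equals example, additionally requires matching blank nodes up to relabeling and is not attempted here.

```javascript
// Hypothetical sketch of STRICT equals / disjoint / contains.
const keySet = (quads) =>
  new Set(quads.map((q) => JSON.stringify([q.subject, q.predicate, q.object, q.graph])));

function contains(superset, subset) {
  const big = keySet(superset);
  for (const key of keySet(subset)) if (!big.has(key)) return false;
  return true;
}

const equals = (a, b) => contains(a, b) && contains(b, a);

function disjoint(a, b) {
  const left = keySet(a);
  for (const key of keySet(b)) if (left.has(key)) return false;
  return true;
}

const q = (s) => ({ subject: s, predicate: 'ex:p', object: 'ex:o', graph: '' });
const before = [q('ex:1'), q('ex:2')];
const after = [q('ex:2'), q('ex:1')];

// emit the result as a JSON value followed by a newline
process.stdout.write(JSON.stringify(equals(before, after)) + '\n'); // true

// strict comparison: these two graphs are isomorphic, yet not strictly equal
console.log(equals([q('_:b0')], [q('_:b1')])); // false
```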

count

Count the number of events in each stream.

Stream Multiplicity:

distinct [OPTIONS]

Count the number of distinct things, such as quads, triples, subjects, etc.

Stream Multiplicity:

Options:
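Counting distinct things comes down to choosing which parts of a quad form the key. In the hypothetical tally function below (plain-object quads assumed), distinct quads include the graph, distinct triples ignore it, and distinct subjects use only the subject, while the raw event count includes duplicates.

```javascript
// Hypothetical sketch of `count` and `distinct` over plain-object quads.
function tally(quads) {
  const quadKeys = new Set();
  const tripleKeys = new Set();
  const subjects = new Set();
  let events = 0;
  for (const q of quads) {
    events += 1; // every quad is an event, duplicates included
    quadKeys.add(JSON.stringify([q.subject, q.predicate, q.object, q.graph]));
    tripleKeys.add(JSON.stringify([q.subject, q.predicate, q.object])); // graph ignored
    subjects.add(q.subject);
  }
  return { events, quads: quadKeys.size, triples: tripleKeys.size, subjects: subjects.size };
}

// hypothetical sample data
const sample = [
  { subject: 'ex:a', predicate: 'ex:p', object: 'ex:o', graph: 'ex:g1' },
  { subject: 'ex:a', predicate: 'ex:p', object: 'ex:o', graph: 'ex:g2' },
  { subject: 'ex:b', predicate: 'ex:p', object: 'ex:o', graph: 'ex:g1' },
  { subject: 'ex:b', predicate: 'ex:p', object: 'ex:o', graph: 'ex:g1' },
];
console.log(JSON.stringify(tally(sample)));
// {"events":4,"quads":3,"triples":2,"subjects":2}
```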

help

Alias for $ graphy --help. Print the help message and exit.

version

Alias for $ graphy --version. Print the version info and exit.

examples

Alias for $ graphy --examples. Print some examples and exit.


Informational Options

Options you can pass to the main graphy command that print some information and exit:

Process Options

Configure certain options for the process:


Inputs

By default, graphy expects a single input stream on stdin, which it will forward through the internal pipeline. Some commands may allow for or even expect multiple inputs (e.g., for computing the difference between two datasets).

--inputs FILE ...

If you are simply piping in multiple input files, you can use the --inputs option like so:

$ graphy read -c ttl / diff / write -c ttl   \
    --inputs original.ttl modified.ttl       \
    > difference.ttl

Keep in mind that each command has its own restrictions on the number of inputs it accepts, which may also depend on the operation being performed (e.g., diff expects exactly 2 input streams while union accepts 1 or more).

Process Substitution

If you need to execute other commands before passing in multiple inputs, you can use process substitution (supported in bash) like so:

$ DBPEDIA_EN_URL="http://downloads.dbpedia.org/2016-10/core-i18n/en"
$ graphy read -c ttl / union / write -c ttl   \
    --inputs \
      <(curl "$DBPEDIA_EN_URL/topical_concepts_en.ttl.bz2" | bzip2 -d) \
      <(curl "$DBPEDIA_EN_URL/uri_same_as_iri_en.ttl.bz2" | bzip2 -d) \
    > union.ttl


Classes

class StringStream

A stream of UTF-8 encoded strings. The process's stdin and stdout are always StringStreams.

class QuadStream

A stream of Quad objects.

class WritableDataEventStream

A stream of WritableDataEvent objects.

class AnyDestination adapts to QuadStream, WritableDataEventStream, StringStream

Automatically determines which mode is best suited for the destination stream. Compatible with QuadStream, WritableDataEventStream and StringStream. In the case of StringStream, each object is converted to its JSON equivalent on a single line, followed by a newline '\n' (i.e., Line-delimited JSON).
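Line-delimited JSON is simple to produce and consume because JSON.stringify escapes any newline inside a string, so each serialized object occupies exactly one line. The helpers below are a sketch with hypothetical names, and the object fields are illustrative rather than graphy's actual JSON shape for quads; fromLdjson would equally parse the single-line output of a ResultValueStream, such as "true\n".

```javascript
// Hypothetical helpers for line-delimited JSON: one JSON document per line,
// each terminated by '\n'.
function toLdjson(objects) {
  return objects.map((obj) => JSON.stringify(obj) + '\n').join('');
}

function fromLdjson(text) {
  return text
    .split('\n')
    .filter((line) => line.length > 0) // the final '\n' leaves an empty tail
    .map((line) => JSON.parse(line));
}

const text = toLdjson([{ subject: 'ex:a' }, { subject: 'ex:b' }]);
process.stdout.write(text);
// {"subject":"ex:a"}
// {"subject":"ex:b"}
console.log(fromLdjson(text).length); // 2
```

In practice, the output of a command such as $ graphy read -c nq < input.nq could be read line by line in the same way.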

class ResultValueStream adapts to StringStream

A stream that will emit a single 'data' event which is the result of some test or computation (e.g., a single boolean or number value). Compatible with StringStream, in which case the value will be converted to JSON and then terminated by a newline '\n'.