A collection of tools for working with files and filesystems.
Note: There is an oddity in the version numbering: version 22.214.171.124.1 was followed immediately by version 126.96.36.199.3 (skipping 188.8.131.52.2); version 184.108.40.206.2 was then released following version 220.127.116.11.3 (and is identical to version 18.104.22.168.3 aside from the version number); normal numbering then resumed with version 22.214.171.124.4, .5, .6 etc. following version 126.96.36.199.2 (skipping version 188.8.131.52.3, as that number was already used).
Crystallize provides the following primary user-facing command-line scripts.
- Archives a snapshot of a file or files and removes it/them (unless --keep is specified), providing an address and/or a pointer file that can be used to retrieve it/them. Synopsis:
crystallize ( ([--version]) | ([--keep] [--] [--leave-pointer] <file>...) )
- Backs up crystallized files with their associated metadata to the current directory. Synopsis:
- Retrieves files stored using crystallize to the current directory, given their address. Synopsis:
decrystallize <crystal-address> [--here]
- Retrieves files stored using crystallize to the current directory, given a path to a pointer directory. Does not handle single-file pointers; use sreg_read_stream instead. Synopsis:
decrystallize-pointer <file> [--here]
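As a rough illustration of the crystallize synopsis above, the following sketch wraps the command so that local copies are always kept. The wrapper name archive_but_keep is hypothetical and not part of Crystallize; only the --keep option and the file arguments are taken from the synopsis.

```shell
#!/usr/bin/env bash
# Sketch only: archive_but_keep is a hypothetical wrapper, not a Crystallize tool.
# It runs crystallize with --keep so that the local copies are not removed.
archive_but_keep() {
    if ! command -v crystallize >/dev/null 2>&1; then
        echo "crystallize is not installed" >&2
        return 127
    fi
    crystallize --keep "$@"
}

# Example invocations (commented out because they archive real data):
# archive_but_keep notes.txt report.pdf
# decrystallize <crystal-address>
```

The address passed to decrystallize is whatever crystallize provided for the archived files; how that address is reported is not covered by this sketch.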
The following public-facing scripts are either used internally by the other tools and are mainly useful for writing other shell scripts, or are not thoroughly tested.
- Retrieve a configuration value from the configuration file used by Crystallize tools. Synopsis:
- Generate and check against lists of file checksums. If no action is specified, save is assumed; if no checksum file name is specified, stdout is used for save and stdin for check. Synopsis:
csum [(save|check) [<checksum-file-name>]]
- Retrieve a file stored using quickliquid. Synopsis:
dequicklify (--no-trim-extensions|--trim-extensions) <URL>
- Create an "fcache" cache directory: fcache is a naïve caching layer for non-changing URLs. Synopsis:
fcache_init <directory-name> <cache-size-limit-in-bytes> (if the cache directory already exists, the current size limit will override the one provided as an argument)
- Get an item using the specified "fcache" cache. Synopsis:
fcache_request <cache-directory> <URL>
- Quickly upload a file to the Internet Archive. Synopsis:
- Not ready for production use! Provides tools for managing filesystems. Synopsis:
rubberfs ( ((create|mount|soft-mount|remount|rename|cd|unmount|soft-unmount|attach|check|save|freeze|gc|thaw|patch|status|list|usage-write|destroy|destroy-no-upload|historybak|historypull) [<RubberFS-name>]) | usage | whereami | stub | (stash <file>...) | (delta [<RubberFS-name> [--keep]]) )
- Not ready for production use! Streaming upload to Amazon S3–compatible endpoints, supporting some of the Internet Archive's extensions to S3. Synopsis:
s3-streaming-upload <host-name> <collection> <identifier> <remote-file-name> <file-size-estimate> <title> <description> <keywords> [access-key-id] [secret-access-key] (if the access keys are not provided, s3-streaming-upload will attempt to retrieve them from ia's configuration file)
- Make a clone of the "sreg" (stream registry) stream database where the bodies of the streams are stored instead of the pointers. Synopsis:
- Check whether streams that could not be read in the past and were moved to the Failed Fsck directory have become readable in the meantime, and return them to the database if so. Synopsis:
- Convert LocalStore pointers to finished pointers. Synopsis:
- Go through the hashpointers in the specified directory, make sure that they are present in the stream registry, optionally verify those streams, and optionally remove any streams from the stream registry that are not referenced by the hashpointers in the specified directory. If no directory is specified, the current directory is assumed. Synopsis:
sreg_folder_check [--verify] [--drop-unused] [<directory>]
- Verify that all entries in the sreg stream database can be read correctly. Synopsis:
sreg_fsck [--skip-cache] [--drop-failed]
- Verify that all sreg hash pointers in the specified directory can be read correctly (an alias for sreg_folder_check --verify). Synopsis:
- Accepts a sreg pointer on stdin, and outputs the corresponding data from the stream registry. If a checksum is provided on the command line, the retrieved data will be checked against it. Synopsis:
sreg_read_stream [--checksum <checksum>] [--disallow-hash-pointer] [--skip-cache]
- Stores data provided on stdin into the stream registry, and sends a pointer to it to stdout. A checksum, if one is known for the stream, can be provided on the command line for a slight performance improvement. Synopsis:
sreg_store_stream [--assume-checksum <checksum>]
- Mount and unmount a translation FUSE filesystem for a folder containing hash pointers. Synopsis:
srfs (mount|unmount) [<root-folder-name>]
- Copy the first argument into the second and replace any enclosed sreg pointers with their contents. The --replace option controls whether files that exist in the destination are overwritten. Due to the implementation, with --replace even files that do not need to be overwritten will be replaced. Files that exist in the destination but not in the source will not be removed. Synopsis:
srpull [--replace] <source-path> <destination-directory>
- Copy the first argument into the second and replace non-pointerized or out-of-date files in the destination with their pointers. Synopsis:
srsync [--verify|--no-verify] <source-path> <destination-directory>
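The save/check idea behind the checksum-list tool above can be approximated with plain GNU sha512sum. This is only a sketch: the helper names checksums_save and checksums_check are hypothetical, and the list format is sha512sum's own, which may differ from the format csum actually uses.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers built on GNU sha512sum, find, sort and xargs.
# They are not csum and may not match its list format.

# Write a checksum list for every regular file under directory $1 to stdout.
checksums_save() {
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha512sum -- )
}

# Read a checksum list from stdin and verify the files under directory $1.
# Returns non-zero if any file is missing or does not match.
checksums_check() {
    ( cd "$1" && sha512sum --check --quiet - )
}

# Example (keep the list outside the directory being checked):
# checksums_save ./data > data.sha512
# checksums_check ./data < data.sha512
```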
In addition, Crystallize provides the following scripts that it uses internally and that are not supported for independent use.
- Set up the bash environment shared by Crystallize tools. Synopsis:
- Part of crystallize that uploads the data body (not metadata). Synopsis:
crystallize-internal-ia (needs specific environment variables set)
- Part of crystallize that compresses a JSON metadata file. Synopsis:
crystallize-internal-xz-b (needs specific environment variables set)
- The main logic for crystallize. Synopsis:
crystallize-logsession <log-file> <crystal-address> <file>... (needs specific environment variables set)
- Create a "localstorecache" cache directory: localstorecache is a naïve caching layer for LocalStore crystals (variant of "fcache"). Synopsis:
localstorecache_init <directory-name> <cache-size-limit-in-bytes> (if the cache directory already exists, the current size limit will override the one provided as an argument)
- Get an item (returned as a file path) using the specified "localstorecache" cache. Synopsis:
localstorecache_request <cache-directory> <crystal-address>
- Drop old items from the specified (s/f/localstore)cache. Defaults to scache. Synopsis:
scache_gc <cache-directory> [s|f|localstore]
- Mount a FUSE filesystem overlay for sreg. Synopsis:
sreg_fuse <mount-point> <sreg-directory-to-mount>
- Given a LocalStore pointer, replace it with a remote pointer. Synopsis:
sregi_bundle_pointer <path-to-remote-pointer-data> <path-to-pointer-to-replace> <crystalWorkdir-config-value>
- Check that the specified pointer, which is not in the stream registry database, can be retrieved, and if so, move it into the stream registry database. If a tracking file (should contain only an integer) is specified, the file's value will be incremented. Synopsis:
sregi_check_failed_entry <path-to-pointer> [tracking-file] [--skip-cache]
- Copy the first argument (which must be a file) to the path formed by appending the first argument to the destination folder, and replace it with a sreg pointer. If a tracking file (should contain only an integer) is specified, the file's value will be incremented. characters-to-trim is the number of characters to remove from the source filename to give the location of the destination file relative to the enclosing destination directory. Synopsis:
sregi_copy_write [--no-verify] <path-to-file> <destination-folder> <tracking-file> <characters-to-trim>
- Remove the specified pointer from the stream registry if it is not listed in the specified ID list (newline-separated list of pointer IDs). Synopsis:
sregi_drop_single_unused <path-to-pointer> <path-to-ID-list> <tracking-file>
- For each file specified, if it is a sreg pointer, then replace it with the pointer's contents. Synopsis:
sregi_expand_pointers <tracking-file> <file>...
- Accepts a sreg database entry as an argument, and replaces the entry with the entry's contents (but does nothing if this has already been done). Synopsis:
- Check that the pointer corresponding to the specified hash pointer is present in the stream registry, and add the specified hash pointer to the specified ID list for use by sregi_drop_single_unused. Almost any file can be passed in: hash pointers, other pointers, and files that are not pointers at all. Pointers that are not hash pointers will have hash pointers generated on demand for testing, and non-pointer files will be ignored and reported as success, which allows checking folders containing a mix of file types. If "-" is given instead of a file name, the input to check will be read from standard input (use "./-" to check a file named "-" in the current directory). Synopsis:
sregi_hashpointer_sane [<path-to-hashpointer>|-] [<path-to-ID-list> <tracking-file> [--verify]]
- Check that the specified pointer can be retrieved. If a tracking file (should contain only an integer) is specified, the file's value will be incremented. Synopsis:
sregi_verify_entry <path-to-pointer> [tracking-file] [--quick] [--skip-cache|--skip-drop-failed|--drop-failed]
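Several of the scripts above increment a tracking file that should contain only an integer. A minimal sketch of that convention follows; tracking_increment is a hypothetical helper, and how the real scripts read and write the file (for example, whether they add a trailing line feed) is an assumption here.

```shell
#!/usr/bin/env bash
# Sketch only: tracking_increment is a hypothetical helper, not a Crystallize tool.
# It adds 1 to the integer stored in the file named by $1.
tracking_increment() {
    local file="$1" count
    count=$(cat -- "$file") || return 1
    case "$count" in
        # Reject empty content, non-digits, and leading zeros
        # (leading zeros would be read as octal by shell arithmetic).
        ''|*[!0-9]*|0?*)
            echo "not a plain non-negative integer: $file" >&2
            return 1
            ;;
    esac
    printf '%s\n' "$((count + 1))" > "$file"
}

# Example:
# printf '0\n' > processed.count
# tracking_increment processed.count   # processed.count now holds 1
```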
crystallize-bash_setup provides these bash functions.
- Wrapper around the rubberfs script: this function should be used instead. Synopsis:
rubberfs ( ((create|mount|soft-mount|remount|rename|cd|unmount|soft-unmount|attach|check|save|freeze|delta|gc|thaw|patch|status|list|usage-write|destroy|destroy-no-upload|historybak|historypull) [<RubberFS-name>]) | usage | whereami | stub | (stash <file>...) )
- ???? Synopsis:
- ???? Synopsis:
- Mount a FUSE filesystem overlay for sreg (wrapper around sreg_fuse.py). Synopsis:
- Basic sanity check for a stream registry entry given a preexisting variable knownChecksum corresponding to the entry to check. Synopsis:
- Retrieves and prints the checksum from a pointer. Synopsis:
For Wreathe 7.3, an ebuild (app-misc/crystallize) is available in the Wreathe overlay (this may also work for similar operating systems such as Ututo XS GNU/Linux).
For other operating systems, use the following installation instructions.
Instructions for installation without ebuild
Wreathe 7.3 is required for full support. The simple invocations of the 'crystallize' and 'decrystallize' commands (with filenames as the only arguments) are also supported on Ubuntu GNU/Linux and macOS 10.12 in the interest of promoting the preservation of knowledge (although Ember strongly advises not using non-libre software such as those operating systems), and will probably work on many other UNIX-like operating systems; these instructions only cover that basic support. This is not as well tested as using the software in Wreathe. Please report issues if you encounter them.
- ember-shared (required during Crystallize installation as well as at runtime)
- An account at the non-profit Internet Archive
- ia (https://pypi.python.org/pypi/internetarchive) 1.0.2 or later (before Crystallize can be used, ia must have been configured by running ia configure after version 1.0.2 or later was installed)
- a root login or administrator/sudo privileges for your local computer, both for installation and use of all tools except quickliquid and dequicklify.
- wget 1.14 or later
- sha512sum (from GNU coreutils)
- readlink (GNU coreutils version or compatible)
- sponge (from moreutils)
- A recent version of bash
- at least one of:
While using full functionality is not supported without using the ebuild, the following additional requirements are needed for it (this list is probably incomplete/incorrect).
- GNU userland (or compatible)
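The command-line prerequisites listed above can be checked before installing with a sketch like the following. missing_commands is a hypothetical helper; it only tests whether each command is on the PATH and does not check the version requirements (such as wget 1.14 or ia 1.0.2).

```shell
#!/usr/bin/env bash
# Sketch only: missing_commands is a hypothetical helper, not a Crystallize tool.
# It prints each named command that cannot be found and returns non-zero
# if at least one is missing.
missing_commands() {
    local cmd status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
            status=1
        fi
    done
    return "$status"
}

# Check the commands named in the requirements list.
missing_commands ia wget sha512sum readlink sponge bash \
    || echo "Install the commands printed above before continuing." >&2
```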
To download Crystallize, run:
git clone https://github.com/ethus3h/crystallize.git
To install the downloaded scripts, first run:
cd crystallize; make
Then edit the configuration file (see the "Configuration file format" section for documentation of this), and run:
sudo make install
Configuration file format
The configuration file is located in your system configuration directory (probably /etc), and is named crystallize.conf. It is a list of key-value pairs in the format Key,Value, separated by line feeds (0x0A), as follows:
- Installation UUID
- Internet Archive collection identifier (write access to the collection is required)
- Passphrase (must be a valid GPG passphrase)
- Directory for working data (should be writeable and have sufficient free space to hold approximately three times the amount of data being crystallized at any one time)
- Path to a directory tree of the format used by the Ember Library
Note that there is currently no facility for storing configuration values containing line feeds.
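To illustrate the Key,Value format described above, here is a minimal sketch of a lookup over such a file. conf_get is a hypothetical helper rather than the tool Crystallize itself uses, the key names in the demonstration are made up, and treating the first comma as the separator (so that values may contain commas) is an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: conf_get is a hypothetical helper, not a Crystallize tool.
# It prints the value stored under key $2 in the Key,Value file $1.
conf_get() {
    local conf_file="$1" wanted_key="$2" line key value
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            *,*) ;;            # only lines containing a comma are pairs
            *) continue ;;
        esac
        key=${line%%,*}        # text before the first comma
        value=${line#*,}       # text after the first comma
        if [ "$key" = "$wanted_key" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done < "$conf_file"
    return 1                   # key not found
}

# Demonstration with made-up keys in a temporary file.
demo_conf=$(mktemp)
printf 'ExampleKey,example value\nOtherKey,a,b\n' > "$demo_conf"
conf_get "$demo_conf" ExampleKey   # prints: example value
conf_get "$demo_conf" OtherKey     # prints: a,b
rm -f "$demo_conf"
```

Because each pair occupies exactly one line, a reader of this kind cannot represent values containing line feeds, which matches the limitation noted above.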
To learn about contributing to this project, visit the development page.