Warcdealer

Warcdealer is an app that uploads Web archive files to the Internet Archive.

Its source code is available on GitHub.

It can be used with grab-site for a complete on-demand Web archival system.

Documentation

Command syntax

Before running Warcdealer, cd to the directory that contains (or will contain) the files you want uploaded. Those files should be the only files in that directory.

warcdealer
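Because the directory should hold only the files you want uploaded, it can help to review everything in it first, including hidden files that a plain ls does not show. The following is a minimal sketch of such a check, assuming bash; list_upload_candidates is a hypothetical helper, not part of Warcdealer.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of Warcdealer): print every
# entry in a directory, hidden ones included, so you can confirm that only
# the files you want uploaded are present before running warcdealer.
# The body runs in a subshell ( ... ) so the shopt changes do not leak.
list_upload_candidates() (
    shopt -s nullglob dotglob   # dotglob: include hidden files; nullglob: empty dir prints nothing
    for entry in "${1:-.}"/*; do
        if [ -d "$entry" ]; then
            printf '%s/\n' "${entry##*/}"   # mark subdirectories with a trailing slash
        else
            printf '%s\n' "${entry##*/}"
        fi
    done
)
```

For example, run list_upload_candidates . in the target directory before starting warcdealer, and move anything unexpected elsewhere.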

Installation

Download

For Wreathe 7.3, an ebuild (app-misc/crystallize) is available in the Wreathe overlay (this may also work for similar operating systems such as Ututo XS GNU/Linux).

For other operating systems, use the following installation instructions.

  1. Ensure your system has the correct dependencies:
    • Only tested and supported on GNU/Linux operating systems. It may be possible to run Warcdealer on other UNIX-like operating systems, but that is not currently supported.
    • Requires network access to the Internet Archive, and to GitHub for installation and updates.
    • Requires ia (https://pypi.python.org/pypi/internetarchive) to be installed, and to have been configured since it was updated to version 1.0.2.
    • hashdeep (probably provided in the md5deep package for your system)
    • Python 2, runnable as python2
    • Git (for installing and updating)
    • Optionally, Warcdealer can be used with grab-site.
    Needed for the optional archive index viewer:
    • Git
    • Python 3, runnable as python3
    • Tornado
    • python-dateutil
  2. If you have a version of Warcdealer prior to version 3 installed, remove it.
  3. Decide which user Warcdealer should run as. If you're using grab-site, this is either the user grab-site runs as, or root. Otherwise, this is any user with write permission for the files you would like to upload (root works in either case). When you have chosen the user, substitute it in place of "userName" in the following command, and run the resulting command (output beginning with "mv: cannot stat" can be safely ignored):
    mv ~/.warcdealer.cfg ~/.warcdealer.bak; echo "userName" > ~/.warcdealer.cfg
  4. Decide at what threshold of free disk space, in gigabytes, you would like uploads to begin. A reasonable default for large disks is "80". When you have decided this, substitute that number in place of "diskThreshold" in the following command, and run the resulting command:
    echo "diskThreshold" >> ~/.warcdealer.cfg
  5. Decide which partition you would like to monitor for free disk space (e.g. / for the root partition). When you know this, substitute it in place of "diskMonitorPartition" in the following command, and run the resulting command:
    echo "diskMonitorPartition" >> ~/.warcdealer.cfg
  6. Decide how frequently, in seconds, you want the free disk space to be checked. A reasonable default is "60". When you have decided this, substitute it in place of "diskCheckInterval" in the following command, and run the resulting command:
    echo "diskCheckInterval" >> ~/.warcdealer.cfg
  7. Decide which Internet Archive collection you want to upload to (you need upload permission for that collection). If you are unsure, use "opensource". When you have decided, substitute it in place of "destinationCollection" in the following command, and run the resulting command:
    echo "destinationCollection" >> ~/.warcdealer.cfg
  8. If you want to disable the archive index feature, run the following command:
    echo "disableArchiveIndex" >> ~/.warcdealer.cfg; set +H

    Otherwise, run the following command:
    echo "enableArchiveIndex" >> ~/.warcdealer.cfg; set +H
  9. As the user you chose in step 3, run the following command:
    warcdealerPreInstallPath=$(pwd); echo "enableWARCBundling" >> ~/.warcdealer.cfg; echo "disableRemoteJobControl" >> ~/.warcdealer.cfg; warcdealerInstallationUUID=$(python2 -c 'import uuid; print str(uuid.uuid4())'); oldWarcdealerInstallationUUID=$(sed '7q;d' "$HOME"/.warcdealer.cfg); warcdealerUUIDRegex='^[0-9a-f]{8}-[0-9a-f]{4}-[4][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$'; if [[ $oldWarcdealerInstallationUUID =~ $warcdealerUUIDRegex ]]; then warcdealerInstallationUUID="$oldWarcdealerInstallationUUID"; fi; cd ~; mkdir "$warcdealerInstallationUUID"; cd "$warcdealerInstallationUUID"; git clone https://github.com/ethus3h/warcdealer.git; mkdir -p ~/.warcdealer/pending/multiseed; cp -f ./warcdealer/warcdealer ~/.warcdealers; chmod +x ~/.warcdealers; cp -f ./warcdealer/warcdealer-ia-upload ~/.warcdealer-ia-upload; chmod +x ~/.warcdealer-ia-upload; cp -f ./warcdealer/warcdealer-ia-upload-split ~/.warcdealer-ia-upload-split; chmod +x ~/.warcdealer-ia-upload-split; cp -f ./warcdealer/JobControlAPI.php ~/.warcdealer/JobControlApi.php; chmod +x ~/.warcdealer/JobControlApi.php; cp -f ./warcdealer/warcdealer-aliases ~/.warcdealer-aliases; chmod +x ~/.warcdealer-aliases; cp -f ./warcdealer/viewer.patch ~/.warcdealer-viewer.patch; echo $(date +%Y-%m-%d-%H-%M-%S-%N)-$(xxd -pu <<< "$(date +%z)") >> ~/.warcdealer.cfg; echo "$warcdealerInstallationUUID" >> ~/.warcdealer.cfg; echo "#"'!'"/bin/bash" >> ~/.bash_profile; echo ". ~/.warcdealer-aliases" >> ~/.bash_profile; echo "#"'!'"/bin/bash" >> ~/.bashrc; echo ". ~/.bash_profile" >> ~/.bashrc; chmod +x ~/.bash_profile; chmod +x ~/.bashrc; cd ..; rm -r ./"$warcdealerInstallationUUID"; cd "$warcdealerPreInstallPath"
  10. Choose a passphrase (to be safe, use only letters and digits, since it is inserted directly into a Perl substitution). When you have chosen one, substitute it in place of "passPhrase" in the following command, and run the resulting command:
    perl -0777 -p -i -e "s/ThisIsntEvenMyFinalPassword/passPhrase/g" ~/.warcdealer/JobControlApi.php
  11. Log out and log in again to update your shell aliases.
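Before installing, you may want to confirm that the commands named in step 1 are available. This is a minimal sketch, assuming bash; check_commands is a hypothetical helper, not part of Warcdealer, and it only checks that each command can be found on PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Warcdealer): report which of the given
# commands can be found on PATH. It does not check versions (such as
# ia 1.0.2) or whether ia has been configured.
check_commands() {
    local cmd missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            printf 'found:   %s\n' "$cmd"
        else
            printf 'missing: %s\n' "$cmd"
            missing=1
        fi
    done
    return "$missing"   # non-zero if anything was missing
}

# Warcdealer itself:
#   check_commands ia hashdeep python2 git
# Optional archive index viewer (Tornado and python-dateutil are Python
# modules, so they are checked by importing them):
#   check_commands python3 && python3 -c 'import tornado, dateutil'
```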

Updating

  1. Check for changes to the requirements.
  2. Check the Warcdealer version. If the major version (the part before the first "." in the version number) has changed, or the release notes say that it should be reinstalled since the version you have installed, follow the current installation instructions instead (unless you know you don't need to, e.g. if the release notes specify that the configuration file format is compatible with the major version you have installed).
  3. As the user you chose in step 3 of the installation process, run the following command:
    mkdir -p ~/.warcdealer/pending/multiseed; warcdealerPreInstallPath=$(pwd); warcdealerTemporaryUUID=$(python2 -c 'import uuid; print str(uuid.uuid4())'); cd ~; mkdir "$warcdealerTemporaryUUID"; cd "$warcdealerTemporaryUUID"; git clone https://github.com/ethus3h/warcdealer.git; cp -f ./warcdealer/warcdealer ~/.warcdealers; chmod +x ~/.warcdealers; cp -f ./warcdealer/warcdealer-ia-upload ~/.warcdealer-ia-upload; chmod +x ~/.warcdealer-ia-upload; cp -f ./warcdealer/warcdealer-ia-upload-split ~/.warcdealer-ia-upload-split; chmod +x ~/.warcdealer-ia-upload-split; cp -f ./warcdealer/JobControlAPI.php ~/.warcdealer/JobControlApi.php; chmod +x ~/.warcdealer/JobControlApi.php; cp -f ./warcdealer/warcdealer-aliases ~/.warcdealer-aliases; chmod +x ~/.warcdealer-aliases; cp -f ./warcdealer/viewer.patch ~/.warcdealer-viewer.patch; cd ..; rm -r ./"$warcdealerTemporaryUUID"; cd "$warcdealerPreInstallPath"
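The major-version comparison described in the update steps can be expressed as a small shell function. This is a sketch assuming bash and plain dotted version numbers; major_version and needs_reinstall are hypothetical names, and the check cannot tell whether the release notes call for a reinstall.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Warcdealer).
# The major version is the part of the version number before the first ".".
major_version() {
    printf '%s\n' "${1%%.*}"   # strip the longest suffix starting at "."
}

# Succeeds (exit status 0) when the major versions differ, i.e. when the
# installation instructions should be followed instead of the update command.
needs_reinstall() {
    local installed="$1" latest="$2"
    [ "$(major_version "$installed")" != "$(major_version "$latest")" ]
}

# Example with made-up version numbers:
#   needs_reinstall 3.4.1 4.0.0 && echo "reinstall"   # prints "reinstall"
#   needs_reinstall 3.4.1 3.5.0 || echo "update"      # prints "update"
```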

Release notes

See the dedicated page.

Configuration file format

Be careful when manually editing the configuration file. Specifically, do NOT use the "disableWARCBundling" option unless you know what you are doing.

The configuration file is located at ~/.warcdealer.cfg. It contains configuration values separated by a line feed (0x0A), as follows:

Note that there is currently no facility for storing configuration values containing line feeds.
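Since values cannot contain line feeds, each line of the file holds exactly one value, so the file can be read one line per array element. The sketch below assumes bash 4 or later (for mapfile) and assumes the line order in which the installation steps above append values; the cfg_* variable names and read_warcdealer_cfg are hypothetical and are not used by Warcdealer itself.

```shell
#!/usr/bin/env bash
# Hypothetical reader (not part of Warcdealer): load ~/.warcdealer.cfg,
# one value per line, into named global variables. The line order is an
# assumption inferred from the order the installation steps append values.
read_warcdealer_cfg() {
    local cfg="${1:-$HOME/.warcdealer.cfg}"
    local -a lines
    mapfile -t lines < "$cfg" || return 1   # -t drops the trailing line feed
    cfg_user="${lines[0]}"               # step 3: user Warcdealer runs as
    cfg_disk_threshold_gb="${lines[1]}"  # step 4: free-space threshold (GB)
    cfg_monitor_partition="${lines[2]}"  # step 5: partition to monitor
    cfg_check_interval_s="${lines[3]}"   # step 6: check interval (seconds)
    cfg_collection="${lines[4]}"         # step 7: destination collection
    cfg_archive_index="${lines[5]}"      # step 8: enable/disableArchiveIndex
    cfg_warc_bundling="${lines[6]}"      # step 9: enableWARCBundling
    cfg_remote_job_control="${lines[7]}" # step 9: disableRemoteJobControl
    cfg_installed_at="${lines[8]}"       # step 9: installation date/time
    cfg_installation_uuid="${lines[9]}"  # step 9: installation UUID
}
```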

Development

To learn about contributing to this project, visit the development page.