Commands

The Nautilus Librarian console commands.

Gold Images Processing

Gold images are a type of image, classified by purpose, as defined in the Nautilus Filename Specification.

This command handles changes to Gold images in an image dataset (which we call a "Library").

Description

This command allows you to process all the changes in a library repository affecting Gold images.

Sample usage:

nautilus-librarian gold-images-processing \
    --previous-ref PREVIOUS_REF \
    --current-ref  CURRENT_REF

Where PREVIOUS_REF and CURRENT_REF are the first and last Git commit references defining the range of commits you want to process.

Datasets that follow the Nautilus Filename Specification include a special type of image called a "Gold" image. It is the first artifact produced from the acquired media, for example by scanning.

Chinese Ideographs is a sample library which follows the Nautilus Filename Specification. It contains some Chinese drawings related to Chinese ideographs.

In the data folder you can see all the images:

data
├── 000000
│   ├── 32
│   │   ├── 000000-32.600.2.tif
│   │   └── 000000-32.600.2.tif.dvc
│   └── 52
│       ├── 000000-52.600.2.tif
│       └── 000000-52.600.2.tif.dvc
...
├── 000007
│   ├── 32
│   │   ├── 000007-32.600.2.tif
│   │   └── 000007-32.600.2.tif.dvc
│   └── 52
└── 000008
    ├── 32
    │   ├── 000008-32.600.2.tif
    │   └── 000008-32.600.2.tif.dvc
    └── 52

27 directories, 32 files

In fact, the repository does not contain the images themselves but "pointers" (files with the .dvc extension) to the files in the remote DVC storage. DVC is a wrapper on top of Git to version and store binary files. It is an alternative to Git LFS. You can get (pull) the real images from the remote DVC storage into your local file system, where they are ignored by Git.

DVC has a command similar to git diff called dvc diff, which gives you a list of the changes between two commits.

For example, if you clone that repo:

git clone https://github.com/Nautilus-Cyberneering/chinese-ideographs
cd chinese-ideographs

and you run this command:

dvc diff --json 420ea8d 6e9878f

You will obtain this JSON object (it has been truncated here):

{
  "added": [],
  "deleted": [
    {
      "path": "data/000000/32/000000-32.600.2.tif"
    },
    {
      "path": "data/000008/32/000008-32.600.2.tif"
    }
  ],
  "modified": [
    {
      "path": "data/000001/32/000001-32.600.2.tif"
    }
  ],
  "renamed": []
}

This means that some images were deleted and one image was modified.

The gold-images-processing command helps you handle all changes related to Gold images. Gold images are identified by their purpose code, 32. We know that the modified image is a Gold image because the second code in the name, following the artwork ID, is 32: 000001-32.600.2.tif.
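As an illustration of the idea, here is a minimal Python sketch that picks the added or modified Gold images out of the dvc diff JSON shown above. It is not the librarian's actual implementation: the function names are hypothetical, and it assumes the purpose code is simply the text between the first "-" and the next "." in the filename.

```python
import json
from pathlib import PurePosixPath
from typing import List

GOLD_PURPOSE_CODE = "32"  # purpose code for Gold images, as stated in this document


def purpose_code(path: str) -> str:
    """Return the purpose code from a name such as 000001-32.600.2.tif.

    Assumption: the code is the text between the first "-" and the next ".".
    """
    name = PurePosixPath(path).name
    _artwork_id, _, rest = name.partition("-")
    return rest.split(".", 1)[0]


def changed_gold_images(diff_json: str) -> List[str]:
    """List added or modified Gold image paths from `dvc diff --json` output."""
    diff = json.loads(diff_json)
    entries = diff.get("added", []) + diff.get("modified", [])
    return [e["path"] for e in entries if purpose_code(e["path"]) == GOLD_PURPOSE_CODE]


# The (truncated) sample output from this document.
sample = """
{
  "added": [],
  "deleted": [
    {"path": "data/000000/32/000000-32.600.2.tif"},
    {"path": "data/000008/32/000008-32.600.2.tif"}
  ],
  "modified": [
    {"path": "data/000001/32/000001-32.600.2.tif"}
  ],
  "renamed": []
}
"""

print(changed_gold_images(sample))
# prints ['data/000001/32/000001-32.600.2.tif']
```

Deleted and renamed entries are deliberately ignored in this sketch; the real command may treat them differently.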

The main goal of this command is to generate Base images automatically. When you add a new Gold image to a library, that image is usually too big, and very often you do not need a very high resolution image. Base images are a second type of image that has a lower resolution and can be used for many use cases. The gold-images-processing command automatically generates the set of Base images and keeps it in sync.

The gold-images-processing command also helps to keep the library clean and tidy. This is the list of all tasks.

  • Get new or modified Gold images using the dvc diff command.
  • Pull images that are going to be processed from DVC remote storage.
  • Validate filenames. Make sure the filename follows the Nautilus Filename Specification.
  • Validate filepaths. Make sure the file is in the right folder.
  • Validate image size.
  • Generate Base image from Gold (change size and ICC profile).
  • Auto-commit new or changed Base images.
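The "Validate filepaths" step can be pictured with a short Python sketch. It assumes, based only on the sample tree above, that an image belongs in data/<artwork ID>/<purpose code>/; the helper names are hypothetical and the real validation rules may be stricter.

```python
from pathlib import PurePosixPath


def expected_folder(filename: str, root: str = "data") -> PurePosixPath:
    """Build the folder a file should live in, e.g. data/000001/32.

    Assumption: the layout is <root>/<artwork ID>/<purpose code>/, as in
    the sample library tree shown in this document.
    """
    artwork_id, _, rest = filename.partition("-")
    code = rest.split(".", 1)[0]
    return PurePosixPath(root) / artwork_id / code


def is_in_right_folder(path: str, root: str = "data") -> bool:
    """Check that the file's parent folder matches its own filename."""
    p = PurePosixPath(path)
    return p.parent == expected_folder(p.name, root)


print(is_in_right_folder("data/000001/32/000001-32.600.2.tif"))  # prints True
print(is_in_right_folder("data/000001/52/000001-32.600.2.tif"))  # prints False
```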

This command is usually invoked from a GitHub workflow (or another CI/CD tool). You can see an example here.

Example of invoking the command in a GitHub workflow:

- name: Run librarian gold image processing command
  run: |
    nautilus-librarian gold-images-processing --previous-ref ${{ env.PREVIOUS_REF }} --current-ref ${{ env.CURRENT_REF }}
  env:
    AZURE_STORAGE_ACCOUNT: ${{ secrets.AZURE_STORAGE_ACCOUNT }}
    AZURE_STORAGE_SAS_TOKEN: ${{ secrets.AZURE_STORAGE_SAS_TOKEN }}

Arguments

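| Name         | Description                                                  | Env Var         |
|--------------|--------------------------------------------------------------|-----------------|
| previous_ref | The a_rev in the dvc diff command. See: dvc diff --help.     | NL_PREVIOUS_REF |
| current_ref  | The b_rev in the dvc diff command. See: dvc diff --help.     | NL_CURRENT_REF  |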

The previous_ref and current_ref are the positional arguments passed to the dvc diff command:

a_rev  Old Git commit to compare (defaults to HEAD)
b_rev  New Git commit to compare (defaults to the current workspace)

Options

| Name                | Optional | Description                                                                          | Default             | Env Var                |
|---------------------|----------|--------------------------------------------------------------------------------------|---------------------|------------------------|
| git_user_name       | yes      | Committer name for automatically generated Git commits                               | Git global config   | NL_GIT_USER_NAME       |
| git_user_email      | yes      | Committer email for automatically generated Git commits                              | Git global config   | NL_GIT_USER_EMAIL      |
| git_user_signingkey | yes      | GPG signing key ID to sign commits                                                   | Git global config   | NL_GIT_USER_SIGNINGKEY |
| git_repo_dir        | yes      | The directory where the Git repository is located                                    | Current working dir | NL_GIT_REPO_DIR        |
| min_image_size      | yes      | Minimum Gold image size in pixels for width and height                               | 256                 | NL_MIN_IMAGE_SIZE      |
| max_image_size      | yes      | Maximum Gold image size in pixels for width and height                               | 16384               | NL_MAX_IMAGE_SIZE      |
| base_image_size     | yes      | Size for the longer dimension of the Base image                                      | 512                 | NL_BASE_IMAGE_SIZE     |
| dvc_diff            | yes      | Alternative diff to override the previous_ref and current_ref arguments              |                     | NL_DVC_DIFF            |
| dvc_remote          | yes      | The name of the remote DVC storage in the .dvc/config file. See the dvc remote command. | None             | NL_DVC_REMOTE          |
| gnupghome           | yes      | GPG env var to override the default --homedir                                        | ~/.gnupg            | GNUPGHOME              |

GPG is used to sign commits: Git relies on GPG for commit signing, so you have to set up your GPG configuration first. Right now signing commits is mandatory, but we plan to make it optional.

The default Git configuration for commits is also obtained from the Git global configuration. We also plan to change that and let the user preset that configuration before calling this command.
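To make the min_image_size, max_image_size, and base_image_size options more concrete, here is a small Python sketch using the defaults listed above. The function names are hypothetical, and two behaviours are assumptions rather than documented facts: the size limits are treated as inclusive, and the Base image keeps the Gold image's aspect ratio with rounding to the nearest pixel.

```python
from typing import Tuple

# Defaults taken from the options listed in this document.
MIN_IMAGE_SIZE = 256
MAX_IMAGE_SIZE = 16384
BASE_IMAGE_SIZE = 512


def is_valid_gold_size(width: int, height: int,
                       min_size: int = MIN_IMAGE_SIZE,
                       max_size: int = MAX_IMAGE_SIZE) -> bool:
    """Check that both width and height fall within the allowed range.

    Assumption: the limits are inclusive.
    """
    return all(min_size <= side <= max_size for side in (width, height))


def base_dimensions(width: int, height: int,
                    base_size: int = BASE_IMAGE_SIZE) -> Tuple[int, int]:
    """Scale (width, height) so that the longer side equals base_size.

    Assumption: the aspect ratio is preserved and sides are rounded.
    """
    longer = max(width, height)
    new_width = max(1, round(width * base_size / longer))
    new_height = max(1, round(height * base_size / longer))
    return new_width, new_height


print(is_valid_gold_size(4000, 3000))  # prints True
print(is_valid_gold_size(200, 3000))   # prints False
print(base_dimensions(4000, 3000))     # prints (512, 384)
```

Note that this sketch would also upscale a Gold image whose longer side is below base_size; whether the real command does so is not stated here.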

Extra environment variables

| Name                    | Optional | Description                |
|-------------------------|----------|----------------------------|
| AZURE_STORAGE_ACCOUNT   | no       | Your Azure Storage Account |
| AZURE_STORAGE_SAS_TOKEN | no       | Your SAS token             |

You can use environment variables instead of arguments and options, but some settings are available only as environment variables.

Some of them, like AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_SAS_TOKEN, are used by DVC to access the remote storage.
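One common way to combine command-line values, environment variables, and defaults is sketched below in Python. This is only an illustration: resolve_option is a hypothetical helper, and the precedence shown (explicit value first, then the environment variable, then the default) is an assumption, not a documented behaviour of the librarian.

```python
import os
from typing import Mapping, Optional


def resolve_option(cli_value: Optional[str],
                   env_var: str,
                   default: Optional[str] = None,
                   environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Pick a value: explicit CLI value, else env var, else default.

    Assumption: a value given on the command line wins over the env var.
    """
    if cli_value is not None:
        return cli_value
    env = os.environ if environ is None else environ
    return env.get(env_var, default)


# A fake environment keeps the example independent of the real one.
fake_env = {"NL_BASE_IMAGE_SIZE": "1024"}

print(resolve_option(None, "NL_BASE_IMAGE_SIZE", "512", fake_env))   # prints 1024
print(resolve_option("768", "NL_BASE_IMAGE_SIZE", "512", fake_env))  # prints 768
print(resolve_option(None, "NL_MIN_IMAGE_SIZE", "256", fake_env))    # prints 256
```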

You can find instructions on how to set up storage for DVC in the DVC documentation.

There is also a specific tutorial for adding remote Azure Storage for DVC.