Using pepcli

The pepcli application is the primary command line interface (CLI) application to interact with the PEP system. It is available for multiple platforms, and is included in PEP's client Docker images and in the Windows client software installer. Among pepcli's functionalities are the ability to upload and download data, and to administer the PEP system.

The use of command line utilities such as pepcli is subject to details of the platform on which it is used. For example, a literal * (asterisk) parameter value must be escaped to \* on Linux to prevent shell expansion ("globbing"). Such details are not (extensively) covered in this documentation. Users are expected to be knowledgeable enough about their platforms to perform basic tasks and avoid common pitfalls.
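As a small illustration of the globbing pitfall mentioned above, the following POSIX shell sketch prints the arguments that a program actually receives. The show_args helper is hypothetical and stands in for any command line utility; it is not part of PEP.

```shell
# show_args is a hypothetical stand-in for any command line program:
# it prints each argument it receives on a separate line.
show_args() {
  for arg in "$@"; do
    printf '%s\n' "$arg"
  done
}

# Escaped or quoted, the asterisk reaches the program as a literal "*".
show_args -C \*
show_args -C '*'

# Unquoted, the shell would first try to replace * with the names of the
# files in the current directory:
#   show_args -C *
```

Other shells and platforms have their own quoting rules, so check the documentation of the shell you use.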

General usage

The pepcli utility must be invoked from a command line, with parameters telling it what to do. The general form of invocation is

pepcli [general flags] <COMMAND> [flags] [parameters...]

The various commands are documented in some detail on this page. The general flags are documented separately. Some commands have subcommands; some subcommands have sub-sub-commands, and so on:

pepcli [general flags] <COMMAND> <SUBCOMMAND>                 [flags] [parameters...]
pepcli [general flags] <COMMAND> <SUBCOMMAND> <SUBSUBCOMMAND> [flags] [parameters...]

The abilities and options of underlying commands are documented with the parent commands to which they apply.

Command line help

The pepcli application provides command line help if it is invoked without parameters, or with the --help switch:

pepcli         # Produces command line help
pepcli --help  # Produces command line help

The --help switch is also supported by most of pepcli's commands and subcommands. This can be used to "drill down" through command line help to construct an appropriate command line, e.g. by sequentially invoking:

pepcli --help                                    # Output includes the ama command, so we then issue:
pepcli ama --help                                # Output includes the query subcommand, so we then issue:
pepcli ama query --help                          # Output mentions the --column-group switch, so we then issue:
pepcli ama query --column-group ShortPseudonyms  # The completed command line

Enrollment

Most of pepcli's commands will connect to one or more of the PEP servers, and most server requests will require the user to be enrolled. There are two primary methods to get enrolled for pepcli usage:

  • Have an OAuth token issued and present it to pepcli's --oauth-token switch. Contact your PEP support contact for more information on obtaining such a prefab OAuth token.
  • From the directory where you intend to use pepcli, run the pepLogon utility and log on interactively. Your enrollment data will be stored to a file and will remain valid for a period of 12 hours. During this period, use pepcli without its --oauth-token switch to perform commands in the role for which you logged on.

Note that prefab OAuth tokens are usually issued for longer validity periods (think months rather than hours), e.g. making them usable to execute pepcli from automatic server processes.

General flags

The pepcli utility's general flags can be used to indicate how to connect to and enroll with the PEP system:

  • --client-working-directory specifies a directory containing the configuration files that tell pepcli how to connect to PEP's servers. If not specified, these configuration files are assumed to be located in the directory containing the pepcli executable file. Users of PEP client Docker images should specify --client-working-directory /config to use the configuration files included in the image.
  • --client-config-name specifies the name of the main (client) configuration file. If not specified, the file is assumed to be named ClientConfig.json.
  • --oauth-token specifies an OAuth token to be used for enrollment, or the path to a file containing such an OAuth token. If not specified, the user is assumed to have been enrolled prior to pepcli's invocation, e.g. by means of the pepLogon utility.

For brevity, these general flags will not be mentioned in the documentation of individual commands, or in the examples below. But they may be included with any command issued to pepcli, e.g.:

pepcli --oauth-token /PATH/TO/OAuthToken.json --client-working-directory /PATH/TO/config-directory list -C \* -P \*

Other general flags exist, but are intended for use by developers of the PEP system. While mentioned in pepcli's command line help, they are not further documented here.
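When the same general flags are needed for every invocation, e.g. in an automated process, a small shell wrapper can save typing. The following is a minimal sketch, not part of PEP: the function name pep and the variable names PEP_TOKEN_FILE and PEP_CONFIG_DIR are made up for illustration.

```shell
# pep is a hypothetical wrapper: it prepends the general flags to every
# invocation, so that individual commands stay short.
# PEP_TOKEN_FILE and PEP_CONFIG_DIR are illustrative variable names.
pep() {
  pepcli \
    --oauth-token "${PEP_TOKEN_FILE:?set PEP_TOKEN_FILE to the OAuth token file}" \
    --client-working-directory "${PEP_CONFIG_DIR:?set PEP_CONFIG_DIR to the configuration directory}" \
    "$@"
}

# Usage, equivalent to the full command line shown above:
#   PEP_TOKEN_FILE=/PATH/TO/OAuthToken.json
#   PEP_CONFIG_DIR=/PATH/TO/config-directory
#   pep list -C \* -P \*
```

The wrapper refuses to run when either variable is unset, which avoids silently falling back to a different enrollment or configuration.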

Commands

The pepcli utility supports commands for various tasks, aimed at different types of users:

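General purpose:

  • query retrieves information about your enrollment and your access to the system.

Data storage and retrieval:

  • get retrieves data from a specific cell.
  • list lists data available in PEP.
  • pull stores a data set in files on your local machine.
  • store stores data in a specific cell.

Administrative tasks:

  • ama performs administrative tasks for the Access Administrator and Data Administrator roles.
  • castor configures and inspects PEP's Castor integration.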

ama

The ama command's various sub-commands can be used to perform administrative tasks. While ama is short for "Access Manager Administration", it should be noted that ama provides subcommands for both the Access Administrator and Data Administrator roles. Users must be enrolled for the role appropriate for the subcommand they're invoking.

ama cgar

The cgar subcommand is short for "column group access rule". It allows Access Administrator to determine the types of access that access groups have to column groups:

pepcli ama cgar create <column group name> <access group name> <mode>
pepcli ama cgar remove <column group name> <access group name> <mode>

The column group must have been previously created by a Data Administrator using the pepcli ama columnGroup subcommand. The mode parameter must be either read or write, indicating the type of access to grant or revoke.

After using the cgar subcommand, the rule immediately takes effect. Users enrolled for the specified access group will immediately be granted (or denied) access to the specified column group.

ama column

The column subcommand allows Data Administrator to create and remove columns:

pepcli ama column create <column name>
pepcli ama column remove <column name>

Because of technical limitations, PEP column names may contain only printable ASCII characters. Additional restrictions apply to the names of columns into which Castor data are imported.

Note that column removal will not discard data present in those columns; it will merely make the column's contents inaccessible. Therefore:

  • when users retrieve data from an earlier moment in time, those data may include columns that have since been removed.
  • when a column is removed and later re-added, the newly added column will contain any data that had previously been stored in the same column name.

The column subcommand can also be used to group and un-group columns into column groups:

pepcli ama column addTo      <column name> <column group name>
pepcli ama column removeFrom <column name> <column group name>

When columns are added to a column group, those columns immediately become available to users who can access the column group.

Column groups can be created and removed using the pepcli ama columnGroup subcommand. Access Administrator can grant access to column groups using the pepcli ama cgar subcommand.

ama columnGroup

The columnGroup subcommand allows Data Administrator to create and remove column groups:

pepcli ama columnGroup create <column group name>
pepcli ama columnGroup remove <column group name>

Because of technical limitations, column group names may contain only printable ASCII characters. Note that some column groups are predefined and/or automatically managed by PEP software.

Once a column group has been created, use the pepcli ama column subcommand to determine which columns are included in the group. Access Administrator can grant access to column groups using the pepcli ama cgar subcommand.
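The steps described for the column and columnGroup subcommands can be combined in a script. The sketch below is a hypothetical helper, not part of PEP. It uses only the subcommands documented above, and it assumes that pepcli exits with a nonzero status when a command fails.

```shell
# create_column_group is a hypothetical helper for a Data Administrator:
# it creates a column group, then creates each listed column and adds it
# to the group. It assumes pepcli exits with a nonzero status on failure.
create_column_group() {
  group=$1
  shift
  pepcli ama columnGroup create "$group" || return 1
  for column in "$@"; do
    pepcli ama column create "$column" || return 1
    pepcli ama column addTo "$column" "$group" || return 1
  done
}

# Usage (group and column names are made up):
#   create_column_group Questionnaires Visit1.Intake Visit1.FollowUp
```

The helper stops at the first failing command, so a partially created group can be inspected with pepcli ama query before retrying.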

ama group

The group subcommand allows Data Administrator to create and remove participant groups:

pepcli ama group create <participant group name>
pepcli ama group remove <participant group name>

The group subcommand can also be used to add participants to, and remove them from, participant groups:

pepcli ama group addTo      <participant group name> <participant>
pepcli ama group removeFrom <participant group name> <participant>

There are multiple ways to specify <participant>:

  • A PEP-id
  • A Local Pseudonym. This is the pseudonym that is used as the directory name by pepcli pull. (It will be replaced by the shorter Participant Alias with the PEP release planned for February or March 2023.)
  • A Participant Alias (formerly called user pseudonym). This is a shorter identifier, derived by cropping the Local Pseudonym.

ama group auto-assign

The auto-assign subcommand automatically updates participant groups based on participants' assignment to study contexts, and on participants being marked as test participants. The command

  • assigns every non-test participant to the all participant group, and
  • for every study context that a non-test participant belongs to, assigns that participant to a participant group named all-[STUDY_CONTEXT], and
  • for every participant group whose name is all or starts with all-, removes participants from that group if they should not (or no longer) be assigned to that group, and
  • removes empty participant groups whose name is all or starts with all-.

pepcli ama group auto-assign

By default the command only outputs the (re-)assignments that would be performed, but doesn't update the configuration. To actually perform the (re-)assignments, invoke the command with the --wet switch (indicating that it's not a "dry" run).

Use the command's --mapname switch to create all-[MAPPED_REPLACEMENT] groups instead of groups whose names correspond with a raw context name. Multiple --mapname switches may be specified with a single invocation.

ama query

The query subcommand summarizes the current state of PEP's data structure and access rules. Both the Access Administrator and Data Administrator roles can invoke:

pepcli ama query

The output lists

  • The Columns that have been defined by Data Administrator.
  • The ColumnGroups that have been defined by Data Administrator, and the columns included in each column group.
  • The ColumnGroupAccessRules that have been defined by Access Administrator, i.e. which access groups have what type(s) of access to which column groups.
  • The (participant) Groups that have been defined by Data Administrator.
  • The (participant) GroupAccessRules that have been defined by Access Administrator, i.e. which access groups have what type(s) of access to which participant groups.

castor

The castor command's various sub-commands are used to interact with PEP's Castor integration functionality. Most commands are intended for (and restricted to) use by Data Administrator to configure the system for the import of Castor data.

castor column-name-mapping

To prevent the Castor import from using overly long and/or difficult to interpret column names, Data Administrator can use the pepcli castor column-name-mapping command to define column name mappings. PEP's import routine will then import Castor data into columns named after the mappings rather than after the raw Castor names. The command's subcommands provide basic CRUD operations. The "read" operations can be performed by any enrolled user:

  • pepcli castor column-name-mapping list lists all configured column name mappings.
  • pepcli castor column-name-mapping read <castor> reports the mapping that is applied to Castor entities whose name matches <castor>. If no such mapping exists, the output will be empty.

Manipulation of column name mappings requires enrollment as a Data Administrator:

  • pepcli castor column-name-mapping create <castor> <pep> creates a new column name mapping, causing PEP to use the specified <pep> replacement for Castor entities whose name matches <castor>.
  • pepcli castor column-name-mapping update <castor> <pep> specifies a new <pep> replacement for an existing column name mapping for the specified <castor> name.
  • pepcli castor column-name-mapping delete <castor> removes an existing column name mapping for the specified <castor> name. Subsequent import runs will revert to using the Castor name to determine the name of the column where data will be stored.
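Because create and update are separate operations, a script that wants to "set" a mapping regardless of whether it already exists can first use the read operation. The following is a minimal sketch, not part of PEP; the helper name is made up, and it relies only on the documented behavior that read produces empty output when no mapping exists.

```shell
# set_column_name_mapping is a hypothetical helper: it creates the mapping
# when "read" produces no output, and updates it otherwise.
# Requires enrollment as a Data Administrator.
set_column_name_mapping() {
  castor_name=$1
  pep_name=$2
  existing=$(pepcli castor column-name-mapping read "$castor_name")
  if [ -z "$existing" ]; then
    pepcli castor column-name-mapping create "$castor_name" "$pep_name"
  else
    pepcli castor column-name-mapping update "$castor_name" "$pep_name"
  fi
}

# Usage (both names are placeholders):
#   set_column_name_mapping <castor> <pep>
```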

castor create-import-columns

This command ensures that all columns exist that will be needed when the specified Castor data are imported into PEP. Since the import process cannot create columns itself, Data Administrator can use this command to create them beforehand. When invoked without further parameters, the command creates missing columns for all Castor data that will be imported. Each created column is also automatically added to the Castor column group, ensuring that the column is writable to the import process:

pepcli castor create-import-columns

The above switch-less invocation may lack information required to create import columns for some short pseudonym columns. These will be reported, and the command can then be invoked with switches to create columns for data import from affected Castor studies:

  • the --sp-column switch specifies the column containing short pseudonym values that correspond with the Castor study's record IDs.
  • when data are to be imported from surveys that are answered multiple times by participants, the --answer-set-count switch must be used to indicate how many answer sets are to be expected. For example, if participants will answer a survey a maximum of 10 times, specify --answer-set-count 10 to have the command create sufficient column names for all survey answers to be imported.

E.g. when a Castor study is bound to short pseudonym column ShortPseudonym.Covid.Castor.CovidQuestionnaires, and that study contains survey packages that are answered up to 30 times:

pepcli castor create-import-columns --sp-column ShortPseudonym.Covid.Castor.CovidQuestionnaires --answer-set-count 30

castor export

@@@ more here @@@

castor list-import-columns

This command lists the columns that are needed when the specified Castor data are imported into PEP. Since the import process cannot create columns itself, Data Administrator must create the required columns beforehand, and can use this command to find out which columns are needed. The command must be invoked with the --sp-column switch, specifying the column containing short pseudonym values that correspond with the Castor study's record IDs. E.g.:

pepcli castor list-import-columns --sp-column ShortPseudonym.Visit1.Castor.HomeQuestionnaires

While the output lists the columns that will be needed by the import process, missing columns are not created by this command. Use the castor create-import-columns command to have the columns created instead of just listed. Alternatively, Data Administrator can list the columns with castor list-import-columns, then invoke ama column's create and addTo commands manually to create the columns and group them into the Castor column group.

The castor list-import-columns command accepts an optional switch:

  • the --answer-set-count switch is used for Castor studies containing surveys that are answered multiple times by participants. PEP imports such survey data into numbered columns, and the switch indicates the number of answer sets to expect. For example, if participants will answer a survey a maximum of 10 times, specify --answer-set-count 10 to have the command list sufficient column names for all survey answers to be imported.

castor list-sp-columns

When invoked without further parameters, this command lists the names of all short pseudonym columns that refer to a Castor study. This means that the values in the listed columns are the Castor record IDs for the corresponding participant.

The command accepts an optional --imported-only switch. If specified, the command outputs only those columns that are processed when importing data from Castor into PEP. These are the column names that can be passed to the --sp-column switch of the castor list-import-columns and castor create-import-columns subcommands.

get

When the pepcli list command has produced IDs referring to data, the associated data can be retrieved using the pepcli get command:

pepcli get -t <ticket file> -i <identifier> -o <output file>

The flags are:

  • -t The ticket you stored with the -T flag of the list command
  • -i The identifier you got from pepcli list
  • -o The file to write the output to. - indicates stdout. This is the default.

Note that the pepcli get command is not capable of data re-pseudonymization. Data from columns requiring such processing should be retrieved using pepcli pull instead.

list

Use the pepcli list command to determine which data is available in PEP:

pepcli list -C <column group> -P <participant group> -T <ticket out file>

The command outputs its information as a JSON array with one entry per subject. Every such entry contains the subject's polymorphic pseudonym, plus an array listing the columns in which data is stored for the subject. Depending on switches passed to the command, small data may be inlined, i.e. included in the command's output. For larger entries, the output will include an ID instead of the data itself. Such IDs can then be passed to the pepcli get command to retrieve the associated data.

Important flags are:

  • -C Column group to list data for. Can be repeated if you want data for more than one column group. There is a special column group * that contains all columns.
  • -c Specific column to list data for. Can be repeated, and combined with -C if you want multiple columns and column groups
  • -P Participant group to list data for. Can be repeated if you want data for more than one participant group. There is a special participant group * that contains all participants.
  • -p Specific participant to list data for. Can be repeated, and combined with -P if you want multiple participants and participant groups
  • -l Include the local pseudonyms in the output. By default pepcli will only show polymorphic pseudonyms (PP). These are not constant, and cannot be used to see whether data belongs to the same participant. You need the local pseudonyms (LP) for that.
  • -T Path of a file in which to store the ticket. Before doing anything else, PEP checks whether you have access to the participants, participant groups, columns and column groups you request. If you do, it hands out a ticket. Storing this ticket with the -T flag lets you use it for later actions.
  • -t You can pass a ticket from an earlier request with the -t flag. The column(group)s and participant(group)s of this request must be a subset of the earlier request.
  • -s The size limit (in bytes) for data that should be inlined, i.e. be included in the list command's output. Currently defaults to 1000. Setting this to 0 means that data will ALWAYS be inlined.
  • --no-inline-data Never inline data
  • -g Data MAY show up grouped, when it belongs to the same participant. By default this depends on the order in which data comes in, so this grouping is not guaranteed. Use -g to force grouping of data. This may impact performance.

Note that the pepcli list command is not capable of data re-pseudonymization. Data from columns requiring such processing should be retrieved using pepcli pull instead.

pull

The pull command downloads a data set from PEP and stores the data in files. If you need more fine-grained control, use the list and get commands instead.

pepcli pull --all-accessible
pepcli pull -C <column group> -P <participant group>

This will by default store the data to the directory pulled-data.

Important flags are:

  • --all-accessible Download all the data the user has access to. Use this instead of the -c, -C, -p and -P flags to get everything that you have been granted access to.
  • -C Column group to download data for. Can be repeated if you want data for more than one column group. There is a special column group * that contains all columns.
  • -c Specific column to download data for. Can be repeated, and combined with -C if you want multiple columns and column groups
  • -P Participant group to download data for. Can be repeated if you want data for more than one participant group. There is a special participant group * that contains all participants.
  • -p Specific participant identifier or pseudonym to download data for. Can be repeated if you want data for more than one participant.
  • -o Directory to write files to. Default is pulled-data.
  • -f Overwrite or remove existing data in output directory
  • -r Resume an interrupted download
  • -u Updates an existing output directory, e.g. when new data is available. This will use the same participant(group)s and column(group)s as the original download, so -c, -C, -p and -P are not allowed with this flag
  • --report-progress Show progress updates

query

Use the pepcli query command to retrieve information about your access to the system. Note that such access depends on the access group for which you are enrolled. To inspect the access of a specific access group, enroll for that group first.

query column-access

The column-access subcommand lists the columns and column groups that are accessible to you.

pepcli query column-access

The output will include all accessible columns and column groups, as well as whether you have read and/or write access.

query participant-group-access

The participant-group-access subcommand lists the participant groups that are accessible to you.

pepcli query participant-group-access

The output will include whether you have access and/or enumerate privileges for each participant group.

  • the access privilege allows you to retrieve data from rows included in the group.
  • the enumerate privilege allows you to list the rows included in the group.

Most actions require the user to hold both privileges. E.g. when specifying a -P groupname to pepcli pull, you'll want to list the rows included in the group and retrieve data from them.

Members of the Data Administrator user group automatically have full access to all participant groups.

query enrollment

Use the enrollment subcommand to find out how you're enrolled into the PEP system.

pepcli query enrollment

The output will include your user name (ID) and the user group to which you belong. The command will produce an error if you haven't enrolled yet, or if your enrollment has expired.
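Scripts can use this subcommand as a guard before doing real work. The following is a minimal sketch, not part of PEP: the helper name is made up, and it assumes that pepcli exits with a nonzero status when it produces an error.

```shell
# require_enrollment is a hypothetical guard for scripts.
# Assumption: pepcli exits with a nonzero status when it reports an error.
require_enrollment() {
  if ! pepcli query enrollment >/dev/null 2>&1; then
    echo "Not enrolled, or enrollment expired: run pepLogon or pass --oauth-token." >&2
    return 1
  fi
}

# Usage:
#   require_enrollment && pepcli pull --all-accessible
```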

store

You can store data with this command:

pepcli store -c <column name> -p <participant> -i /PATH/TO/DATA/FILE

This will output the identifier of the stored entry.

The flags are:

  • -c The column to store the data in
  • -p This is either the participant identifier, or the polymorphic pseudonym to store data for. PPs can be obtained with pepcli list.
  • -i Path to the file to store. - means stdin, and is the default.
  • -d Data to store. Use either this or -i
  • -T By default, pepcli will request a write-only ticket. You can use -T and give a path to store the ticket in. If you use this flag, pepcli will also request read access to the entry that is stored. You can then use the ID in the output, together with this ticket for pepcli get. This way you can check whether the data was stored correctly. Note that pepcli also performs its own checks to see whether the data was stored correctly.

Specific usage scenarios

Uploading and downloading data

See Uploading and downloading data

Data administration

Creating participant groups, based on certain attributes

Let's say we want to create a participant group males, which contains all male participants. The sex of a participant can be found in a column Castor.GeneralInfo.

  1. We start by creating the participant group:

pepcli ama group create males

  2. We then download the data for the column Castor.GeneralInfo, for all participants:

pepcli pull -c "Castor.GeneralInfo" -P "*"

  3. The data administrator now filters the downloaded data: they take note of the directory names of those participants that, according to the downloaded data, are male. How exactly this is done is outside the scope of PEP. The result is a list of directory names. These directory names are the local pseudonyms of the participants.
  4. We can now use this list to add participants to the participant group:

pepcli ama group addTo males <local pseudonym>

Repeat this command for each <local pseudonym> in the list from step 3.
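Adding many participants by hand is tedious, so the last step lends itself to a loop. The sketch below is a hypothetical helper, not part of PEP. It assumes that the list from step 3 has been saved to a text file with one local pseudonym per line, and that pepcli exits with a nonzero status on failure.

```shell
# add_participants_to_group is a hypothetical helper. It reads one local
# pseudonym per line from a text file and adds each to a participant group.
add_participants_to_group() {
  group=$1
  list_file=$2
  while IFS= read -r pseudonym || [ -n "$pseudonym" ]; do
    [ -n "$pseudonym" ] || continue   # skip blank lines
    # </dev/null keeps pepcli from consuming the rest of the list file.
    pepcli ama group addTo "$group" "$pseudonym" </dev/null || return 1
  done < "$list_file"
}

# Usage (the file name is made up):
#   add_participants_to_group males male-pseudonyms.txt
```

The loop stops at the first failing addTo, so the remaining pseudonyms can be retried after the cause has been resolved.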