Using pepcli
The pepcli application is the primary command line interface (CLI) application for interacting with the PEP system. It is available for multiple platforms, and is included in PEP's client Docker images and in the Windows client software installer. Among pepcli's functionalities are the ability to upload and download data, and to administer the PEP system.
The use of command line utilities such as pepcli is subject to details of the platform on which it is used. For example, a literal * (asterisk) parameter value must be escaped to \* on Linux to prevent shell expansion ("globbing"). Such details are not (extensively) covered in this documentation. Users are expected to be knowledgeable enough about their platforms to perform basic tasks and avoid common pitfalls.
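To illustrate the escaping mentioned above, the following sketch shows three spellings that a POSIX shell such as bash should pass to pepcli as a literal asterisk. The function names are hypothetical and serve only to keep the variants apart; quoting rules differ on other platforms, such as the Windows command prompt.

```shell
# Each function passes a literal "*" for both the column group and the
# participant group. The function names are illustrative only.

# Backslash escaping, as used elsewhere in this documentation.
list_all_escaped() {
  pepcli list -C \* -P \*
}

# Single quotes prevent all shell expansion.
list_all_single_quoted() {
  pepcli list -C '*' -P '*'
}

# Double quotes also prevent filename expansion ("globbing").
list_all_double_quoted() {
  pepcli list -C "*" -P "*"
}
```

Without any escaping or quoting, the shell may replace the * with the names of files in the current directory before pepcli is started.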
General usage
The pepcli utility must be invoked from a command line, with parameters telling it what to do. The general form of invocation is:
pepcli [general flags] <COMMAND> [flags] [parameters...]
The various commands are documented in some detail on this page. The general flags are documented separately. Some commands have subcommands; some subcommands have sub-sub-commands, and so on:
pepcli [general flags] <COMMAND> <SUBCOMMAND> [flags] [parameters...]
pepcli [general flags] <COMMAND> <SUBCOMMAND> <SUBSUBCOMMAND> [flags] [parameters...]
The abilities and options of underlying commands are documented with the parent commands to which they apply.
Command line help
The pepcli application provides command line help if it is invoked without parameters, or with the --help switch:
pepcli # Produces command line help
pepcli --help # Produces command line help
The --help switch is also supported by most of pepcli's commands and subcommands. This can be used to "drill down" through command line help to construct an appropriate command line, e.g. by sequentially invoking:
pepcli --help # Output includes the ama command, so we then issue:
pepcli ama --help # Output includes the query subcommand, so we then issue:
pepcli ama query --help # Output mentions the --column-group switch, so we then issue:
pepcli ama query --column-group ShortPseudonyms # The completed command line
Enrollment
Most of pepcli's commands will connect to one or more of the PEP servers, and most server requests will require the user to be enrolled. There are two primary methods to get enrolled for pepcli usage:
- Have an OAuth token issued and present it to pepcli's --oauth-token switch. Contact your PEP support contact for more information on obtaining such a prefab OAuth token.
- From the directory where you intend to use pepcli, run the pepLogon utility and log on interactively. Your enrollment data will be stored to a file and will remain valid for a period of 12 hours. During this period, use pepcli without its --oauth-token switch to perform commands in the role for which you logged on.
Note that prefab OAuth tokens are usually issued for longer validity periods (think months rather than hours), which makes them usable, for example, to execute pepcli from automated server processes.
General flags
The pepcli utility's general flags can be used to indicate how to connect to and enroll with the PEP system:
- --client-working-directory specifies a directory containing configuration files specifying how to connect to PEP's servers. If not specified, these configuration files are assumed to be located in the directory containing the pepcli executable file. Users of PEP client Docker images should specify --client-working-directory /config to use the configuration files included in the image.
- --client-config-name specifies the name of the main (client) configuration file. If not specified, the file is assumed to be named ClientConfig.json.
- --oauth-token specifies an OAuth token to be used for enrollment, or the path to a file containing such an OAuth token. If not specified, the user is assumed to have been enrolled prior to pepcli's invocation, e.g. by means of the pepLogon utility.
For brevity, these general flags will not be mentioned in the documentation of individual commands, or in the examples below. They may, however, be included with any command issued to pepcli, e.g.:
pepcli --oauth-token /PATH/TO/OAuthToken.json --client-working-directory /PATH/TO/config-directory list -C \* -P \*
Other general flags exist, but are intended for use by developers of the PEP system. While mentioned in pepcli's command line help, they are not further documented here.
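Because the general flags precede the command, they lend themselves to a small wrapper. The sketch below assumes a POSIX shell; the function name pep and the variables PEP_TOKEN_FILE and PEP_CONFIG_DIR are hypothetical and not part of pepcli.

```shell
# Wrapper that prepends the general flags to every pepcli invocation.
# PEP_TOKEN_FILE and PEP_CONFIG_DIR are hypothetical variable names that
# must be set by the caller before the wrapper is used.
pep() {
  pepcli --oauth-token "$PEP_TOKEN_FILE" \
         --client-working-directory "$PEP_CONFIG_DIR" \
         "$@"
}
```

With both variables set, an invocation such as pep list -C \* -P \* would then expand to the full command line shown above.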
Commands
The pepcli utility supports commands for various tasks, aimed at different types of users:
General purpose:
- query provides information on your access to the PEP environment.
  - query column-access lists the columns and column groups accessible to the enrolled user.
  - query participant-group-access lists the participant groups accessible to the enrolled user.
  - query enrollment tells users how they're enrolled.
Data storage and retrieval:
- get retrieves data from a specific cell.
- list lists data available in PEP.
- pull stores a data set in files on your local machine.
- store stores data in a specific cell.
Administrative tasks:
- ama provides subcommands to perform administrative tasks related to the Access Manager service.
  - ama query summarizes the current data structure and access rules.
  - For users enrolled as a Data Administrator:
    - ama column can be used to create and remove columns, and to group and un-group them.
    - ama columnGroup can be used to create and remove column groups.
    - ama group can be used to create and remove participant groups, and to add participants to them.
    - ama group auto-assign can be used to automatically fill participant groups based on study contexts.
  - For users enrolled as an Access Administrator:
    - ama cgar can be used to manage the type(s) of access that access groups have to column groups.
- castor provides subcommands to perform administrative tasks related to PEP's Castor integration functionality.
  - castor export @@@ TODO @@@
  - castor list-sp-columns lists short pseudonym columns that are bound to Castor studies.
  - castor list-import-columns lists columns required by PEP's Castor import.
  - castor create-import-columns creates columns required by PEP's Castor import.
  - castor column-name-mapping allows a Data Administrator to configure column name mappings to have PEP import Castor data into appropriately named columns.
ama
The ama command's various subcommands can be used to perform administrative tasks. While ama is short for "Access Manager Administration", it should be noted that ama provides subcommands for both the Access Administrator and Data Administrator roles. Users must be enrolled for the role appropriate for the subcommand they're invoking.
ama cgar
The cgar subcommand is short for "column group access rule". It allows an Access Administrator to determine the types of access that access groups have to column groups:
pepcli ama cgar create <column group name> <access group name> <mode>
pepcli ama cgar remove <column group name> <access group name> <mode>
The column group must have been previously created by a Data Administrator using the pepcli ama columnGroup subcommand. The mode parameter must be either read or write, indicating the type of access to grant or revoke.
A rule takes effect as soon as the cgar subcommand has been used: users enrolled for the specified access group will immediately be granted (or denied) access to the specified column group.
ama column
The column subcommand allows a Data Administrator to create and remove columns:
pepcli ama column create <column name>
pepcli ama column remove <column name>
Because of technical limitations, PEP column names may contain only printable ASCII characters. Additional restrictions apply to the names of columns into which Castor data are imported @@@ link and/or describe @@@.
Note that column removal will not discard data present in those columns; it will merely make the column's contents inaccessible. Therefore:
- when users retrieve data from an earlier moment in time, those data may include columns that have since been removed.
- when a column is removed and later re-added, the newly added column will contain any data that had previously been stored in the same column name.
The column subcommand can also be used to group and un-group columns into column groups:
pepcli ama column addTo <column name> <column group name>
pepcli ama column removeFrom <column name> <column group name>
When columns are added to a column group, those columns immediately become available to users who can access the column group.
Column groups can be created and removed using the pepcli ama columnGroup subcommand. An Access Administrator can grant access to column groups using the pepcli ama cgar subcommand.
ama columnGroup
The columnGroup subcommand allows a Data Administrator to create and remove column groups:
pepcli ama columnGroup create <column group name>
pepcli ama columnGroup remove <column group name>
Because of technical limitations, column group names may contain only printable ASCII characters. Note that some column groups are predefined and/or automatically managed by PEP software.
Once a column group has been created, use the pepcli ama column subcommand to determine which columns are included in the group. An Access Administrator can grant access to column groups using the pepcli ama cgar subcommand.
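The Data Administrator steps described for ama column and ama columnGroup can be combined into one helper. This is a minimal sketch for a POSIX shell: the function name is hypothetical, and it assumes that pepcli returns a non-zero exit status when a step fails, which this documentation does not confirm.

```shell
# Create a column group, then create each given column and add it to the group.
# Usage: create_group_with_columns <column group name> <column name>...
# The function name is illustrative; it is not part of pepcli.
create_group_with_columns() {
  column_group=$1
  shift
  pepcli ama columnGroup create "$column_group" || return 1
  for column in "$@"; do
    # Stop at the first failing step (assumes a non-zero exit status on error).
    pepcli ama column create "$column" || return 1
    pepcli ama column addTo "$column" "$column_group" || return 1
  done
}
```

Granting access to the new column group remains a separate step, because pepcli ama cgar requires enrollment as an Access Administrator.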
ama group
The group subcommand allows a Data Administrator to create and remove participant groups:
pepcli ama group create <participant group name>
pepcli ama group remove <participant group name>
The group subcommand can also be used to add participants to participant groups and to remove them again:
pepcli ama group addTo <participant group name> <participant>
pepcli ama group removeFrom <participant group name> <participant>
There are multiple ways to specify <participant>:
- A PEP-id.
- A Local Pseudonym. This is the pseudonym that is used as the directory name when doing pepcli pull (this will be replaced by the shorter Participant Alias with the PEP release planned for February or March 2023).
- A Participant Alias (formerly called user pseudonym). This is a shorter version, derived by cropping the Local Pseudonym.
ama group auto-assign
The auto-assign subcommand automatically updates participant groups based on participant assignment to study contexts, and on participants being marked as test participants. The command
- assigns every non-test participant to the all participant group, and
- for every study context that a non-test participant belongs to, assigns that participant to a participant group named all-[STUDY_CONTEXT], and
- for every participant group whose name is all or starts with all-, removes participants from that group if they should not (or no longer) be assigned to that group, and
- removes empty participant groups whose name is all or starts with all-.
pepcli ama group auto-assign
By default the command only outputs the (re-)assignments that would be performed, but doesn't update the configuration. To actually perform the (re-)assignments, invoke the command with the --wet switch (indicating that it's not a "dry" run).
Use the command's --mapname switch to create all-[MAPPED_REPLACEMENT] groups instead of groups whose names correspond with a raw context name. Multiple --mapname switches may be specified with a single invocation.
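Since the default invocation is a dry run, one way to work is to review its output before repeating the command with --wet. The following sketch for a POSIX shell does so interactively; the function name and the confirmation prompt are hypothetical additions, not pepcli features.

```shell
# Show the planned (re-)assignments, then apply them only after confirmation.
# The function name and the prompt are illustrative only.
auto_assign_with_review() {
  # Dry run: reports what would be done without updating the configuration.
  pepcli ama group auto-assign || return 1
  printf 'Apply these (re-)assignments? [y/N] ' >&2
  read -r answer || answer=''
  case $answer in
    y|Y) pepcli ama group auto-assign --wet ;;
    *)   echo 'Nothing changed.' >&2 ;;
  esac
}
```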
ama query
The query subcommand summarizes the current state of PEP's data structure and access rules. Both the Access Administrator and Data Administrator roles can invoke:
pepcli ama query
The output lists:
- The Columns that have been defined by a Data Administrator.
- The ColumnGroups that have been defined by a Data Administrator, and the columns included in each column group.
- The ColumnGroupAccessRules that have been defined by an Access Administrator, i.e. which access groups have what type(s) of access to which column groups.
- The (participant) Groups that have been defined by a Data Administrator.
- The (participant) GroupAccessRules that have been defined by an Access Administrator, i.e. which access groups have what type(s) of access to which participant groups.
castor
The castor command's various subcommands are used to interact with PEP's Castor integration functionality. Most subcommands are intended for (and restricted to) use by Data Administrators, who use them to configure the system for the import of Castor data.
castor column-name-mapping
To prevent the Castor import from using overly long and/or difficult to interpret column names, a Data Administrator can use the pepcli castor column-name-mapping command to define column name mappings. PEP's import routine will then import Castor data into columns named after the mappings rather than after the raw Castor names. The command's subcommands provide basic CRUD operations. The "read" operations can be performed by any enrolled user:
- pepcli castor column-name-mapping list lists all configured column name mappings.
- pepcli castor column-name-mapping read <castor> reports the mapping that is applied to Castor entities whose name matches <castor>. If no such mapping exists, the output will be empty.
Manipulation of column name mappings requires enrollment as a Data Administrator:
- pepcli castor column-name-mapping create <castor> <pep> creates a new column name mapping, causing PEP to use the specified <pep> replacement for Castor entities whose name matches <castor>.
- pepcli castor column-name-mapping update <castor> <pep> specifies a new <pep> replacement for an existing column name mapping for the specified <castor> name.
- pepcli castor column-name-mapping delete <castor> removes an existing column name mapping for the specified <castor> name. Subsequent import runs will revert to using the Castor name to determine the name of the column where data will be stored.
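Because read produces empty output when no mapping exists, a script can choose between create and update. The sketch below, for a POSIX shell, relies on that documented behaviour; the function name is hypothetical, and it assumes that non-empty read output means a mapping for exactly the given <castor> name exists.

```shell
# Create the mapping if none exists yet, otherwise update the existing one.
# Usage: set_column_name_mapping <castor> <pep>
# The function name is illustrative; it is not part of pepcli.
set_column_name_mapping() {
  castor_name=$1
  pep_name=$2
  # Empty output from "read" means that no mapping exists for this name.
  existing=$(pepcli castor column-name-mapping read "$castor_name")
  if [ -z "$existing" ]; then
    pepcli castor column-name-mapping create "$castor_name" "$pep_name"
  else
    pepcli castor column-name-mapping update "$castor_name" "$pep_name"
  fi
}
```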
castor create-import-columns
This command ensures that all columns exist that will be needed when the specified Castor data are imported into PEP. Since the import process cannot create columns itself, a Data Administrator can use this command to create them beforehand. When invoked without further parameters, the command creates missing columns for all Castor data that will be imported. Each created column is also automatically added to the Castor column group, ensuring that the column is writable to the import process:
pepcli castor create-import-columns
The above switch-less invocation may lack information required to create import columns for some short pseudonym columns. These will be reported, and the command can then be invoked with switches to create columns for data import from affected Castor studies:
- The --sp-column switch specifies the column containing short pseudonym values that correspond with the Castor study's record IDs.
- When data are to be imported from surveys that are answered multiple times by participants, the --answer-set-count switch must be used to indicate how many answer sets are to be expected. For example, if participants will answer a survey a maximum of 10 times, specify --answer-set-count 10 to have the command create sufficient columns for all survey answers to be imported.
For example, when a Castor study is bound to short pseudonym column ShortPseudonym.Covid.Castor.CovidQuestionnaires, and that study contains survey packages that are answered up to 30 times:
pepcli castor create-import-columns --sp-column ShortPseudonym.Covid.Castor.CovidQuestionnaires --answer-set-count 30
castor export
@@@ more here @@@
castor list-import-columns
This command lists the columns that are needed when the specified Castor data are imported into PEP. Since the import process cannot create columns itself, a Data Administrator must create the required columns beforehand, and can use this command to find out which columns are needed. The command must be invoked with the --sp-column switch, specifying the column containing short pseudonym values that correspond with the Castor study's record IDs. E.g.:
pepcli castor list-import-columns --sp-column ShortPseudonym.Visit1.Castor.HomeQuestionnaires
While the output lists the columns that will be needed by the import process, missing columns are not created by this command. Use the castor create-import-columns command to have the columns created instead of just listed. Alternatively, a Data Administrator can list the columns with castor list-import-columns, then invoke ama column's create and addTo commands manually to create the columns and group them into the Castor column group.
The castor list-import-columns command accepts an optional switch:
- The --answer-set-count switch is used for Castor studies containing surveys that are answered multiple times by participants. PEP imports such survey data into numbered columns, and the switch indicates the number of answer sets to expect. For example, if participants will answer a survey a maximum of 10 times, specify --answer-set-count 10 to have the command list sufficient column names for all survey answers to be imported.
castor list-sp-columns
When invoked without further parameters, this command lists the names of all short pseudonym columns that refer to a Castor study. This means that the values in the listed columns are the Castor record IDs for the corresponding participant.
The command accepts an optional --imported-only switch. If specified, the command outputs only those columns that are processed when importing data from Castor into PEP. These are the column names that can be passed to the --sp-column switch of the castor list-import-columns and castor create-import-columns subcommands.
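The relation between list-sp-columns and list-import-columns can be sketched as a loop for a POSIX shell. The function name is hypothetical, and the sketch assumes that castor list-sp-columns prints exactly one column name per line, which this documentation does not specify; studies with repeated surveys would additionally need --answer-set-count.

```shell
# For every imported short pseudonym column, list the columns its import needs.
# Assumes one column name per output line (not confirmed by the documentation).
list_all_import_columns() {
  pepcli castor list-sp-columns --imported-only |
  while IFS= read -r sp_column; do
    [ -n "$sp_column" ] || continue   # skip empty lines
    echo "== $sp_column"
    # Redirect stdin so the inner command cannot consume the loop's input.
    pepcli castor list-import-columns --sp-column "$sp_column" </dev/null
  done
}
```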
get
When the pepcli list command has produced IDs referring to data, the associated data can be retrieved using the pepcli get command:
pepcli get -t <ticket file> -i <identifier> -o <output file>
The flags are:
- -t: The ticket you stored with the -T flag of the list command.
- -i: The identifier you got from pepcli list.
- -o: The file to write the output to. A value of - indicates stdout, which is the default.
Note that the pepcli get command is not capable of data re-pseudonymization. Data from columns requiring such processing should be retrieved using pepcli pull instead.
list
Use the pepcli list command to determine which data is available in PEP:
pepcli list -C <column group> -P <participant group> -T <ticket out file>
The command outputs its information as a JSON array with one entry per subject. Every such entry contains the subject's polymorphic pseudonym, plus an array listing the columns in which data is stored for the subject. Depending on switches passed to the command, small data may be inlined, i.e. included in the command's output. For larger entries, the output will include an ID instead of the data itself. Such IDs can then be passed to the pepcli get command to retrieve the associated data.
Important flags are:
- -C: Column group to list data for. Can be repeated if you want data for more than one column group. There is a special column group * that contains all columns.
- -c: Specific column to list data for. Can be repeated, and combined with -C if you want multiple columns and column groups.
- -P: Participant group to list data for. Can be repeated if you want data for more than one participant group. There is a special participant group * that contains all participants.
- -p: Specific participant to list data for. Can be repeated, and combined with -P if you want multiple participants and participant groups.
- -l: Include the local pseudonyms in the output. By default pepcli will only show polymorphic pseudonyms (PP). These are not constant, and cannot be used to see whether data belongs to the same participant. You need the local pseudonyms (LP) for that.
- -T: The first thing PEP does when you interact with it is check whether you have access to the participant(group)s and column(group)s you request. If you do have access, it will hand out a ticket. You can store this ticket with the -T flag, to use it for later actions.
- -t: You can pass a ticket from an earlier request with the -t flag. The column(group)s and participant(group)s of this request must be a subset of those of the earlier request.
- -s: The size limit (in bytes) for data that should be inlined, i.e. be included in the list command's output. Currently defaults to 1000. Setting this to 0 means that data will ALWAYS be inlined.
- --no-inline-data: Never inline data.
- -g: Data MAY show up grouped when it belongs to the same participant. By default this depends on the order in which data comes in, so this grouping is not guaranteed. Use -g to force grouping of data. This may impact performance.
Note that the pepcli list command is not capable of data re-pseudonymization. Data from columns requiring such processing should be retrieved using pepcli pull instead.
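The interplay between list and get can be sketched as two helpers for a POSIX shell: the first stores the ticket with -T, the second reuses it with -t. The function names are hypothetical. Because the exact layout of the JSON output is not described here, the ID is passed in by hand rather than extracted automatically.

```shell
# Step 1: list the available data and keep the ticket for later requests.
# Usage: list_and_keep_ticket <column group> <participant group> <ticket file>
list_and_keep_ticket() {
  column_group=$1
  participant_group=$2
  ticket_file=$3
  pepcli list -C "$column_group" -P "$participant_group" -T "$ticket_file"
}

# Step 2: retrieve one entry by an ID copied from the list output.
# Usage: get_with_ticket <ticket file> <identifier> <output file>
get_with_ticket() {
  ticket_file=$1
  entry_id=$2
  output_file=$3
  pepcli get -t "$ticket_file" -i "$entry_id" -o "$output_file"
}
```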
pull
The pull command downloads a data set from PEP and stores the data in files. If you need more fine-grained control, use the list and get commands instead.
pepcli pull --all-accessible
pepcli pull -C <column group> -P <participant group>
This will by default store the data to the directory pulled-data.
Important flags are:
- --all-accessible: Download all the data the user has access to. You can use this instead of the -c, -C, -p and -P flags; it will give you everything that you have been granted access to.
- -C: Column group to download data for. Can be repeated if you want data for more than one column group. There is a special column group * that contains all columns.
- -c: Specific column to download data for. Can be repeated, and combined with -C if you want multiple columns and column groups.
- -P: Participant group to download data for. Can be repeated if you want data for more than one participant group. There is a special participant group * that contains all participants.
- -p: Specific participant identifier or pseudonym to download data for. Can be repeated if you want data for more than one participant.
- -o: Directory to write files to. Default is pulled-data.
- -f: Overwrite or remove existing data in the output directory.
- -r: Resume an interrupted download.
- -u: Update an existing output directory, e.g. when new data is available. This will use the same participant(group)s and column(group)s as the original download, so -c, -C, -p and -P are not allowed with this flag.
- --report-progress: Show progress updates.
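The difference between an initial download and a later update can be sketched as follows for a POSIX shell. The function name is hypothetical, and the sketch assumes that -u without -o operates on the default pulled-data directory in the current working directory, which is not stated explicitly in this documentation.

```shell
# Download a data set on first use; update the existing download afterwards.
# Usage: pull_or_update <column group> <participant group>
pull_or_update() {
  column_group=$1
  participant_group=$2
  if [ -d pulled-data ]; then
    # -u reuses the selection of the original download, so -C and -P are omitted.
    pepcli pull -u
  else
    pepcli pull -C "$column_group" -P "$participant_group"
  fi
}
```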
query
Use the pepcli query command to retrieve information about your access to the system. Note that such access depends on the access group for which you are enrolled. Enroll for a specific access group:
- either by means of a prior call to pepLogon,
- or by passing an appropriate OAuth token to pepcli.
query column-access
The column-access subcommand lists the columns and column groups that are accessible to you.
pepcli query column-access
The output will include all accessible columns and column groups, as well as whether you have read and/or write access.
query participant-group-access
The participant-group-access subcommand lists the participant groups that are accessible to you.
pepcli query participant-group-access
The output will include whether you have access and/or enumerate privileges for each participant group.
- The access privilege allows you to retrieve data from rows included in the group.
- The enumerate privilege allows you to list the rows included in the group.
Most actions require the user to hold both privileges. E.g. when specifying a -P groupname to pepcli pull, you'll want to list the rows included in the group and retrieve data from them.
Members of the Data Administrator user group automatically have full access to all participant groups.
query enrollment
Use the enrollment subcommand to find out how you're enrolled into the PEP system.
pepcli query enrollment
The output will include your user name (ID) and the user group to which you belong. The command will produce an error if you haven't enrolled yet, or if your enrollment has expired.
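Scripts can use this subcommand as a guard before doing real work. The sketch below, for a POSIX shell, assumes that the error mentioned above is also signalled through a non-zero exit status, which this documentation does not state; the function name is hypothetical.

```shell
# Return success only if the current enrollment is (still) valid.
# Assumes that "pepcli query enrollment" exits non-zero on error.
ensure_enrolled() {
  if pepcli query enrollment >/dev/null 2>&1; then
    return 0
  fi
  echo 'Not enrolled, or enrollment expired: run pepLogon or pass --oauth-token.' >&2
  return 1
}
```

A script could then start with a line such as ensure_enrolled || exit 1.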
store
You can store data with this command:
pepcli store -c <column name> -p <participant> -i /PATH/TO/DATA/FILE
This will output the identifier of the stored entry.
The flags are:
- -c: The column to store the data in.
- -p: Either the participant identifier, or the polymorphic pseudonym (PP) to store data for. PPs can be obtained with pepcli list.
- -i: Path to the file to store. A value of - means stdin, which is the default.
- -d: Data to store. Use either this or -i.
- -T: By default, pepcli will request a write-only ticket. You can use -T and give a path to store the ticket in. If you use this flag, pepcli will also request read access to the entry that is stored. You can then use the ID in the output, together with this ticket, for pepcli get. This way you can check whether the data was stored correctly. Note that pepcli also performs its own checks to see whether the data was stored correctly.
Specific usage scenarios
Uploading and downloading data
See Uploading and downloading data
Data administration
Creating participant groups, based on certain attributes
Let's say we want to create a participant group males, which contains all male participants. The sex of a participant can be found in a column Castor.GeneralInfo.
1. We start by creating the participant group:
pepcli ama group create males
2. We then download the data for the column Castor.GeneralInfo, for all participants:
pepcli pull -c "Castor.GeneralInfo" -P "*"
3. The data administrator now filters the downloaded data, taking note of the directory names of those participants that, according to the downloaded data, are male. How exactly this is done is outside the scope of PEP. The result is a list of directory names. These directory names are the local pseudonyms of the participants.
4. We can now use this list to add participants to the participant group, invoking the following for each <local pseudonym> from the list of step 3:
pepcli ama group addTo males <local pseudonym>
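The final step of this scenario can be automated with a loop. The sketch below assumes a POSIX shell and a text file holding one local pseudonym per line, each line ending in a newline; the function name and the file are hypothetical, and the sketch assumes that pepcli exits non-zero on failure.

```shell
# Add every local pseudonym listed in a file to a participant group.
# Usage: add_all_to_group <participant group name> <file with one pseudonym per line>
add_all_to_group() {
  group=$1
  pseudonym_file=$2
  while IFS= read -r local_pseudonym; do
    [ -n "$local_pseudonym" ] || continue   # skip empty lines
    # Redirect stdin so pepcli cannot consume the remaining lines of the file.
    pepcli ama group addTo "$group" "$local_pseudonym" </dev/null || return 1
  done < "$pseudonym_file"
}
```

With the list saved as, say, males.txt, the call would be add_all_to_group males males.txt.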