PEP Command Line Interface
==========================

Before the command-line tool can be used, its environment has to be configured and the tool has to be authorized.

Configuration

pepcli requires several configuration files, of which ClientConfig.json is the most important. The configuration includes the addresses of the various servers, TLS certificate authority public keys, access tokens, private keys, the pseudonym public key, the data public key and other cryptographic settings.

By default pepcli will look for configuration files in the executable's own directory. Specify the --client-working-directory flag to use configuration from another directory.

Authorization

Authorization proceeds in two steps. First, one needs an access token, which is confusingly called an OAuth token. This token can either be retrieved from the authentication server via an OAuth flow, or a long-term OAuth token can be provided.

In the second step, pepcli uses this OAuth token to start a short-term session in a process called enrollment. During enrollment, a signing key pair, a pseudonym key pair and a data key pair are generated and stored in ClientKeys.json. These keys are typically valid for about a day.

By default, pepcli looks for ClientKeys.json (which contains the short-term session keys) in the configuration directory and checks whether the keys have expired. If it cannot find them or if they have expired, pepcli looks for an OAuth token in OAuthToken.json in the configuration directory. With this token it automatically enrolls to generate new short-term keys. The OAuth token can also be specified on the command line with --oauth-token.

OAuth tokens are generated using a secret shared between the authentication server and the key server, called the OAuth token secret. If this secret is available (which is the case in the local development environment, where it is stored in the OAuthTokenSecret.json file), pepcli generates an OAuth token itself using the shared secret.
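The lookup order described above can be summarized as a small decision function. This is only an illustrative model, not pepcli's actual code: the function name and its boolean parameters are hypothetical, and the relative precedence of --oauth-token, OAuthToken.json and OAuthTokenSecret.json is an assumption, since the text does not state it.

```python
from typing import Optional


def credential_source(keys_present: bool,
                      keys_expired: bool,
                      cli_token: Optional[str],
                      token_file_present: bool,
                      secret_file_present: bool) -> str:
    """Name the credential a client would fall back to (hypothetical model)."""
    # Unexpired short-term session keys are reused directly.
    if keys_present and not keys_expired:
        return "ClientKeys.json"
    # Assumption: a token given on the command line is preferred over the file.
    if cli_token is not None:
        return "--oauth-token"
    # Otherwise enroll with the token found in the configuration directory.
    if token_file_present:
        return "OAuthToken.json"
    # Assumption: the shared secret is the last resort (development setups).
    if secret_file_present:
        return "OAuthTokenSecret.json"
    return "none"
```

In every case except the first, the chosen token would then be used to enroll and write fresh keys to ClientKeys.json.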

PEP data structure recap

The data stored in PEP can be seen as one big table with a row for each participant. Examples of columns are:

  • ParticipantInfo contains the personalia of a participant.
  • DeviceHistory contains the history of watches used by the participant.
  • WatchData.Week23 contains watch data of the 23rd week.

A cell has at most one verified and valid value. A value (valid or not) is referred to as a file. Besides the actual data, a file has the following metadata associated with it:

  • Whether the file is valid. A file is marked invalid if the data turns out to contain an error, for instance due to a measurement error.
  • Whether the file is verified. When researchers upload data, it is initially marked as unverified; it is later verified by a data administrator.
  • A storage facility identifier
  • A timestamp
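As a rough mental model of a cell and its files, the following sketch may help. The class and field names are hypothetical and chosen only to mirror the metadata listed above; they do not reflect PEP's actual storage format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class File:
    """One value stored in a cell, together with its metadata (hypothetical names)."""
    data: bytes
    valid: bool              # False once the data is found to be erroneous
    verified: bool           # False until a data administrator verifies it
    storage_facility: str    # storage facility identifier
    timestamp: int           # unit is an assumption, e.g. milliseconds


def current_value(cell: List[File]) -> Optional[File]:
    """Return the single verified and valid file of a cell, if there is one."""
    candidates = [f for f in cell if f.valid and f.verified]
    if len(candidates) > 1:
        raise ValueError("a cell has at most one verified and valid value")
    return candidates[0] if candidates else None
```

A cell may thus hold several files over time, but at most one of them counts as its current value.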

Column groups and (participant) groups

Columns are organized into column groups. An example of a column group is ShortPseudonyms. A column may appear in several different column groups.

Participants are organized into (participant) groups. A participant may appear in several different groups (or in none at all). An example of a group is *, which contains all participants that have consented.

Access is given to entire column groups and participant groups, not to individual columns or participants.
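The group-level access rule can be sketched as follows. The group contents, the participant names and the "Pilot" group are made up for illustration; only the ShortPseudonyms, WatchData and * group names come from this document, and the real access manager may evaluate access differently (for instance, per access mode).

```python
from typing import Dict, Set

# Hypothetical group memberships; a column or participant may be in several groups.
COLUMN_GROUPS: Dict[str, Set[str]] = {
    "ShortPseudonyms": {"ShortPseudonym.Visit1"},          # made-up column name
    "WatchData": {"WatchData.Week22", "WatchData.Week23"},
}
PARTICIPANT_GROUPS: Dict[str, Set[str]] = {
    "*": {"alice", "bob"},                                 # made-up participants
    "Pilot": {"alice"},                                    # made-up group
}


def has_access(granted_column_groups: Set[str],
               granted_participant_groups: Set[str],
               column: str,
               participant: str) -> bool:
    """A cell is reachable only through a granted column group AND participant group."""
    column_ok = any(column in COLUMN_GROUPS.get(g, set())
                    for g in granted_column_groups)
    participant_ok = any(participant in PARTICIPANT_GROUPS.get(g, set())
                         for g in granted_participant_groups)
    return column_ok and participant_ok
```

To give someone access to a single column, one would therefore place that column in a column group of its own and grant access to that group.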

Polymorphic pseudonyms

The main "identifier" used to point to a participant is a polymorphic pseudonym. By design, there are many different polymorphic pseudonyms that point to the same participant. Given two polymorphic pseudonyms, it's intentionally infeasible to check whether they point to the same participant. In this way, two research groups cannot link pieces of data of the same participant (provided they have no column in common).
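To see how many different-looking values can point to the same participant, consider rerandomizable ElGamal encryption, a well-known construction with this property. The sketch below is a toy over a tiny multiplicative group; this document does not specify PEP's actual scheme or parameters, so the constants, function names and the single secret key are assumptions made purely for illustration.

```python
# Toy parameters: P is a small safe prime (2 * 1019 + 1) and G generates the
# subgroup of prime order 1019. Real systems use far larger groups.
P = 2039
G = 4


def encrypt(m: int, y: int, r: int) -> tuple:
    """Encrypt group element m under public key y with randomness r."""
    return (pow(G, r, P), (m * pow(y, r, P)) % P)


def rerandomize(pp: tuple, y: int, s: int) -> tuple:
    """Produce a different-looking ciphertext of the same plaintext."""
    b, c = pp
    return ((b * pow(G, s, P)) % P, (c * pow(y, s, P)) % P)


def decrypt(pp: tuple, x: int) -> int:
    """Recover the plaintext with secret key x (requires 0 < x < P - 1)."""
    b, c = pp
    # b^(P-1-x) equals b^(-x) modulo P by Fermat's little theorem.
    return (c * pow(b, P - 1 - x, P)) % P
```

With a key pair x and y = pow(G, x, P), encrypting a participant identity once and rerandomizing it yields two distinct values that decrypt to the same identity. Without the secret key, deciding whether two such values match is believed to be hard in suitably large groups.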

Tickets

When querying or retrieving data, the client requests a ticket from the access manager. A ticket serves two roles:

  • It is a signed statement by the access manager that the bearer has access to the given columns (or column groups) and participants (or participant groups) with the given modes.
  • It contains a timestamp and acts as a snapshot in the following sense: when retrieving data using a ticket, only data that was verified and valid at the time the ticket was issued is returned.

There are two reasons to temporarily store tickets:

  • When retrieving multiple files, it's convenient to use the same ticket to guarantee a consistent picture. (This is the snapshot role of a ticket.)
  • A ticket is relatively expensive to create. The time to create a ticket is proportional to the number of matched participants.
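The snapshot role of a ticket can be modelled as a simple filter. This is a simplified sketch with hypothetical class and field names: it ignores signatures, participants and access modes, and it treats the valid and verified flags as fixed rather than changing over time.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class FileVersion:
    column: str
    timestamp: int   # when this file was stored
    valid: bool
    verified: bool


@dataclass(frozen=True)
class Ticket:
    issued_at: int                # timestamp fixed when the ticket is created
    columns: FrozenSet[str]       # columns the ticket grants access to


def visible(ticket: Ticket, files: List[FileVersion]) -> List[FileVersion]:
    """Files a retrieval with this ticket would return under the simplified model."""
    return [f for f in files
            if f.column in ticket.columns
            and f.timestamp <= ticket.issued_at   # stored before the snapshot
            and f.valid and f.verified]
```

Because the filter depends only on the ticket, every retrieval made with the same stored ticket sees the same set of files, even if new data is uploaded in the meantime.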

list — listing files

With pepcli list one can query the currently valid and verified files for certain participants (-p), (participant) groups (-P), columns (-c) and column groups (-C). For each matched file, either the data is included (if it's not too large) or an identifier is included with which the data can be retrieved (with pepcli get).

(These identifiers are polymorphic: a single file has many different identifiers and you will get a different one on every request.)

What follows is an example query.

$ pepcli list -P '*' -c DeviceHistory -C WatchData -T ticket
[{
    "data": {
        "DeviceHistory": "{\n    \"entries\": [\n        {\n            \"type\": \"start\",\n            \"serial\": \"4545454545454545\",\n            \"date\": \"1539085557696\"\n        },\n        {\n            \"type\": \"stop\",\n            \"serial\": \"4545454545454545\",\n            \"date\": \"1539085559216\"\n        },\n        {\n            \"type\": \"start\",\n            \"serial\": \"7777777777777777\",\n            \"date\": \"1539085563332\"\n        }\n    ]\n}\n"
    },
    "ids": {
        "WatchData.Week23": "0A0F4DE8CA173079056F6FBA2CEAC00655121036C7CE0842144E4BC62F3922381A57501A10C6E41E3BBF84738DA9D39EA63F1DDD8B",
        "WatchData.Week22": "0A0F7AA6D4C68A4B7DCE44D813717E31051210F0F483A6C75B30FEDEBBCE013F07ABCF1A10C8FB326728DB027C2B705FB3FFCBC081"
    },
    "pp": "1AD0CB2F25F3F96232F26083B211AF47BA3546A627E2B07E801084121A6C2330:9E554CFCC4E7724AE565BB3120EBB0EC6EC743F2AA314AB585F4EEDB666B1C72:CAB97D7E594B0DCA79BDFBDB090B08C153E761CF964CDA32D829BB746934B34B"
}
,{
    "data": {
        "DeviceHistory": "{\n    \"entries\": [\n        {\n            \"type\": \"start\",\n            \"serial\": \"1212121212121212\",\n            \"date\": \"1539085521343\"\n        },\n        {\n            \"type\": \"stop\",\n            \"serial\": \"1212121212121212\",\n            \"date\": \"1539085524948\"\n        },\n        {\n            \"type\": \"start\",\n            \"serial\": \"1212121212121212\",\n            \"date\": \"1539085529762\"\n        },\n        {\n            \"type\": \"stop\",\n            \"serial\": \"1212121212121212\",\n            \"date\": \"1539085534882\"\n        },\n        {\n            \"type\": \"start\",\n            \"serial\": \"3434343434343434\",\n            \"date\": \"1539085538277\"\n        }\n    ]\n}\n"
    },
    "pp": "BEB675E48D18F039FCCBE8B8FCEA1D9F82CA17797C473EB44E7AA86D22CC4948:A6EB38130C6A6ECBFCC2B35632E105E67CC6C1A0380FBF9EB73010A44CA34673:CAB97D7E594B0DCA79BDFBDB090B08C153E761CF964CDA32D829BB746934B34B"
}
]

To retrieve the watch data of the first participant, we need a follow-up query. We therefore ask pepcli, with the -T flag, to store the ticket in a file named ticket.
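When processing the output of pepcli list in a script, note that inlined data such as DeviceHistory is itself a JSON document stored as a string, so it needs a second decoding step. The helper below is a sketch that assumes the output is valid JSON with the "data", "ids" and "pp" keys seen in the example; the function name is hypothetical.

```python
import json
from typing import Any, Dict, List


def parse_list_output(text: str) -> List[Dict[str, Any]]:
    """Decode `pepcli list` output into one dictionary per participant."""
    rows = []
    for entry in json.loads(text):
        inlined = {}
        for column, raw in entry.get("data", {}).items():
            try:
                # Inlined values may themselves contain JSON (e.g. DeviceHistory).
                inlined[column] = json.loads(raw)
            except ValueError:
                # Not JSON: keep the raw string as it was returned.
                inlined[column] = raw
        rows.append({
            "pp": entry["pp"],               # polymorphic pseudonym
            "data": inlined,                 # inlined column contents
            "ids": entry.get("ids", {}),     # identifiers for use with pepcli get
        })
    return rows
```

Each identifier found under "ids" can then be passed to pepcli get together with the stored ticket.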

The limit on the size of the inlined data can be controlled with the --inline-data-size-limit flag. Inlining can be disabled completely with the --no-inline-data flag.

get — retrieve a file

With pepcli get one retrieves a file by identifier and ticket. To retrieve the watch data of the 23rd week in the previous example, one would run:

$ pepcli get -i 0A0F4DE8CA173079056F6FBA2CEAC00655121036C7CE0842144E4BC62F3922381A57501A10C6E41E3BBF84738DA9D39EA63F1DDD8B -t ticket 
[ watch data written to stdout ]

store — store a file

pepcli store stores a file for the given participant under the given column. If permitted, it will automatically verify the file as well.

For instance, the following command stores the watch data in the file watchdata as the 24th week of the first participant in the pepcli list example.

$ pepcli store -p 1AD0CB2F25F3F96232F26083B211AF47BA3546A627E2B07E801084121A6C2330:9E554CFCC4E7724AE565BB3120EBB0EC6EC743F2AA314AB585F4EEDB666B1C72:CAB97D7E594B0DCA79BDFBDB090B08C153E761CF964CDA32D829BB746934B34B -c WatchData.Week24 -i watchdata
{
    "id": "0A0EB5BB6A7870339EC5C5933ACF66501210A65D1571B0FB6C8B43942861383170561A10CC08783286546545C66D4A4B8728ED5C"
}

To retrieve the stored data immediately without going through pepcli list, one can pass the --ticket-out option to pepcli store. It then requests a ticket with read/write permissions and stores it in the given file, which can in turn be passed to pepcli get with the -t option.
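A script could chain the two commands roughly as follows. The helpers only build argument lists from the flags shown in this document; their names are hypothetical, and the sketch assumes that pepcli store prints nothing but the JSON object with the "id" key on standard output.

```python
import json
from typing import List


def store_command(pp: str, column: str, input_path: str, ticket_path: str) -> List[str]:
    """Argument list for storing a file and saving a read/write ticket."""
    return ["pepcli", "store", "-p", pp, "-c", column,
            "-i", input_path, "--ticket-out", ticket_path]


def get_command(store_stdout: str, ticket_path: str) -> List[str]:
    """Argument list for fetching the file that `pepcli store` just reported."""
    file_id = json.loads(store_stdout)["id"]
    return ["pepcli", "get", "-i", file_id, "-t", ticket_path]
```

The first list could be run with subprocess.run(..., capture_output=True, text=True, check=True), and its captured stdout passed to get_command to build the follow-up call.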