Introduction to Formula Response Function Recur()

library(reda)
packageVersion("reda")
## [1] '0.5.4'

Overview

The Recur() function provides a flexible and widely applicable formula response interface for modeling recurrent event data, with careful data checking procedures. It combines the flexible interface of reSurv() (deprecated in reReg version 1.1.7) with the effective checking procedures embedded in Survr() (deprecated in reda version 0.5.0).

Function Interface

The function interface of Recur() is given below.

Recur(time, id, event, terminal, origin, check = c("hard", "soft", "none"), ...)

A high-level introduction to each argument is as follows:

  • time: event and censoring times
  • id: subject’s id
  • event: recurrent event indicator, cost, or type
  • terminal: event indicator of terminal events
  • origin: time origin of subjects
  • check: how to run the data checking procedure
    • "hard": throw errors if check_Recur() finds any issue with the data structure
    • "soft": throw warnings instead
    • "none": do not run the checking procedure

More details on the arguments are provided in the function documentation (see ?Recur).
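
As a minimal sketch of how the check argument changes behavior, the snippet below passes data with an event recorded after censoring and uses check = "soft", so that the issue is expected to surface as a warning rather than an error. The name soft_msg is only illustrative.

```r
library(reda)

## subject A1 has an event (at time 3) recorded after its censoring records
soft_msg <- tryCatch(
    {
        Recur(1:3, id = rep("A1", 3), event = c(0, 0, 1), check = "soft")
        NA_character_                   # reached only if no warning is raised
    },
    warning = function(w) conditionMessage(w)
)
soft_msg
```

With check = "hard" (the default), the same call would be expected to stop with an error instead.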

The Recur Object

The function Recur() returns an S4-class Recur object representing the model response for recurrent event data. The Recur object mainly contains a numeric matrix (in the .Data slot) that serves as the model response matrix. The other slots are

  • call: the function call that produced the object.
  • ID: a factor storing the original subject IDs, which can originally be a character vector, a numeric vector, or a factor. It is needed to pinpoint data issues for particular subjects by their original IDs.
  • ord: indices that sort the response matrix (by rows) increasingly by id, time2, and - event. Sorting is often done in the model-fitting steps, where the indices stored in this slot can be used directly.
  • rev_ord: indices that revert the response matrix sorted by ord to its original ordering. This slot is provided to easily revert the sorting.
  • first_idx: indices that indicate the first record of each subject in the sorted matrix. It helps in the data checking procedure and may be helpful in the model-fitting step, such as getting the origin time.
  • last_idx: indices that indicate the last record of each subject in the sorted matrix. Similar to first_idx, it helps in the data checking procedure and may be helpful in the model-fitting step, such as locating the terminal events.
  • check: a character string that records the value of the check argument specified by users.
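
The slots described above can be inspected directly with the @ operator. The following is a small sketch on toy data (two subjects with three records each); the object names x, sorted, and restored are only illustrative.

```r
library(reda)

x <- Recur(6:1, id = rep(1:2, 3))

## sort the response matrix by id, time2, and - event
sorted <- x[x@ord, ]

## revert the sorting to recover the original row order
restored <- sorted[x@rev_ord, ]
stopifnot(all.equal(restored, x[seq_along(x@ord), ],
                    check.attributes = FALSE))

## first and last record of each subject in the sorted matrix
sorted[x@first_idx, c("id", "time1", "origin")]
sorted[x@last_idx, c("id", "time2", "event", "terminal")]

## original subject IDs and the recorded check option
x@ID
x@check
```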

Usage

Among all the arguments, only time has no default value and thus has to be specified by users.

When only time is given

  • The function assumes that each time point belongs to a different subject.
  • The id takes its default value: seq_along(time).
  • The event takes its default values: 0 (censoring) at the last record of each subject, and 1 (event) before censoring.
  • Both terminal and origin take zero for all subjects by default.
ex1 <- Recur(3:5)
head(ex1)
     time1 time2 id event terminal origin
[1,]     0     3  1     0        0      0
[2,]     0     4  2     0        0      0
[3,]     0     5  3     0        0      0

When time and id are given

  • The event takes its default values: 0 (censoring) at the last record of each subject, and 1 (event) before censoring.
  • Both terminal and origin take zero for all subjects by default.
ex2 <- Recur(6:1, id = rep(1:2, 3))
head(ex2)
     time1 time2 id event terminal origin
[1,]     4     6  1     0        0      0
[2,]     3     5  2     0        0      0
[3,]     2     4  1     1        0      0
[4,]     1     3  2     1        0      0
[5,]     0     2  1     1        0      0
[6,]     0     1  2     1        0      0
## sort by id, time2, and - event
head(ex2[ex2@ord, ])
     time1 time2 id event terminal origin
[1,]     0     2  1     1        0      0
[2,]     2     4  1     1        0      0
[3,]     4     6  1     0        0      0
[4,]     0     1  2     1        0      0
[5,]     1     3  2     1        0      0
[6,]     3     5  2     0        0      0
  • The slot ord stores the indices that sort the response matrix by id, time2, and - event.

Helper %to% for recurrent episodes

The function Recur() allows users to input recurrent episodes by time1 and time2, which can be specified with the help of %to% (or its alias %2%) in Recur(). For example,

left <- c(1, 5, 7)
right <- c(3, 7, 9)
ex3 <- Recur(left %to% right, id = c("A1", "A1", "A2"))
head(ex3)
     time1 time2 id event terminal origin
[1,]     1     3  1     1        0      1
[2,]     5     7  1     0        0      1
[3,]     7     9  2     0        0      7

Internally, the function %to% returns a list with elements named "time1" and "time2". Therefore, specifying such a list directly is equivalent.

ex4 <- Recur(list(time1 = left, time2 = right), id = c("A1", "A1", "A2"))
stopifnot(all.equal(ex3, ex4, check.attributes = FALSE))
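
To make the return value of %to% concrete, the sketch below inspects it outside of Recur(). The object name episodes is only illustrative, and the comparison with %2% assumes that the alias behaves identically, as described above.

```r
library(reda)

left <- c(1, 5, 7)
right <- c(3, 7, 9)

episodes <- left %to% right
str(episodes)                           # a list with elements time1 and time2

## the alias %2% is expected to give the same result
stopifnot(identical(episodes, left %2% right))
```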

About origin and terminal

  • Both origin and terminal take a numeric vector.
  • The length of the specified vector can be one, the number of subjects, or the number of time points. Some simple examples are given below.
ex5 <- Recur(3:5, origin = 1, terminal = 1)
head(ex5)
     time1 time2 id event terminal origin
[1,]     1     3  1     0        1      1
[2,]     1     4  2     0        1      1
[3,]     1     5  3     0        1      1
ex6 <- Recur(3:5, id = c("A1", "A1", "A2"), origin = 1:2, terminal = c(0, 1))
head(ex6)
     time1 time2 id event terminal origin
[1,]     1     3  1     1        0      1
[2,]     3     4  1     0        0      1
[3,]     2     5  2     0        1      2
ex7 <- Recur(3:5, id = c("A1", "A1", "A2"),
             origin = c(1, 1, 2), terminal = c(0, 0, 1))
stopifnot(all.equal(ex6, ex7, check.attributes = FALSE))
  • An error is thrown if the length is inappropriate.
try(Recur(1:10, origin = c(1, 2)))
Error : Invalid length for 'origin'. See '?Recur' for details.
try(Recur(1:10, terminal = c(1, 2)))
Error : Invalid length for 'terminal'.  See '?Recur' for details.

Data Checking Rules

Recur() internally calls check_Recur() to check whether the specified data fit into the recurrent event data framework according to several rules if check = "hard" or check = "soft". The existing rules and the corresponding examples are given below.

  1. Every subject must have one censoring time, which cannot be earlier than any of its event times.
try(Recur(1:5, id = c(rep("A1", 3), "A2", "A3"), event = c(0, 0, 1, 0, 0)))
Error : Subjects having events at or after censoring: A1.
  2. Every subject must have at most one terminal event time.
try(Recur(1:3, id = rep("A1", 3), terminal = c(0, 1, 1)))
Error : Subjects having multiple terminal events: A1.
  3. Event or censoring times cannot be missing.
try(Recur(c(1:2, NA), id = rep("A1", 3)))
Error : Missing times! Please check subject: A1.
  4. Event times cannot be earlier than the origin time.
try(Recur(3:5, id = rep("A1", 3), origin = 10))
Error : Event times must be >= origin. Please check subject: A1.
try(Recur(3:5 %to% 1:3, id = rep("A1", 3)))
Error : Event times must be >= origin. Please check subject: A1.
  5. Recurrent episodes cannot overlap.
try(Recur(c(0, 3, 5) %to% c(1, 6, 10), id = rep("A1", 3)))
Error : Recurrent episodes cannot be overlapped. Please check subject: A1.
  6. However, recurrent episodes without events are allowed, to accommodate possible time-varying covariates and risk-free gaps.
Recur(c(0, 2, 6) %to% c(1, 3, 8), id = rep("A1", 3), event = c(0, 1, 0))
[1] A1: (0, 1+], (2, 3], (6, 8+]
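
When the checks are used programmatically, for example to screen several data sets, it may be convenient to capture the error message instead of stopping. The following is a minimal sketch; recur_issue() is a hypothetical helper written for this illustration and is not part of reda.

```r
library(reda)

## hypothetical helper: return the error message from Recur(), or NA if none
recur_issue <- function(...) {
    tryCatch(
        {
            Recur(..., check = "hard")
            NA_character_
        },
        error = function(e) conditionMessage(e)
    )
}

## overlapping episodes: expected to be flagged
overlap_msg <- recur_issue(c(0, 3, 5) %to% c(1, 6, 10), id = rep("A1", 3))
overlap_msg

## episodes with gaps but no overlap: expected to pass
gap_msg <- recur_issue(c(0, 2, 6) %to% c(1, 3, 8), id = rep("A1", 3),
                       event = c(0, 1, 0))
gap_msg
```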

The show() Method

A show() method is provided for Recur objects, with output similar to that of the function survival:::print.Surv(). Internally, it converts the input Recur object to character strings representing the recurrent episodes through a dedicated as.character() method. For each recurrent episode,

  • Censoring not due to terminal is indicated by a trailing + sign;
  • Censoring due to terminal is indicated by a trailing * sign;
  • Otherwise, an event happens at the end of the recurrent episode.

For concise printing, the show() method uses getOption("reda.Recur.maxPrint") to limit the maximum number of recurrent episodes printed for each process. By default, options(reda.Recur.maxPrint = 3) is set.
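
As a brief sketch of this option, the snippet below prints a toy process with five recurrent episodes under the current setting and then again after raising the limit. The toy data and the value 5 are arbitrary choices for illustration.

```r
library(reda)

x <- Recur(1:5, id = rep("A1", 5))

x                                       # printed with the current maxPrint setting

op <- options(reda.Recur.maxPrint = 5)  # allow up to five episodes per process
x
options(op)                             # restore the previous setting
```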

The Valve Seats Example

We may illustrate the results of the show() method with the example valve seats data, where terminal events are artificially added.

set.seed(123)
term_events <- rbinom(length(unique(valveSeats$ID)), 1, 0.5)
with(valveSeats, Recur(Days, ID, No., term_events))
 [1] 251: (0, 761+]                             
 [2] 252: (0, 759*]                             
 [3] 327: (0, 98], (98, 667*]                   
 [4] 328: (0, 326], (326, 653], ..., (653, 667*]
 [5] 329: (0, 665+]                             
 [6] 330: (0, 84], (84, 667*]                   
 [7] 331: (0, 87], (87, 663*]                   
 [8] 389: (0, 646], (646, 653*]                 
 [9] 390: (0, 92], (92, 653*]                   
[10] 391: (0, 651*]                             
[11] 392: (0, 258], (258, 328], ..., (621, 650+]
[12] 393: (0, 61], (61, 539], (539, 648*]       
[13] 394: (0, 254], (254, 276], ..., (640, 644*]
[14] 395: (0, 76], (76, 538], (538, 642*]       
[15] 396: (0, 635], (635, 641*]                 
[16] 397: (0, 349], (349, 404], ..., (561, 649+]
[17] 398: (0, 631*]                             
[18] 399: (0, 596*]                             
[19] 400: (0, 120], (120, 479], (479, 614+]     
[20] 401: (0, 323], (323, 449], (449, 582+]     
[21] 402: (0, 139], (139, 139], (139, 589*]     
[22] 403: (0, 593+]                             
[23] 404: (0, 573], (573, 589*]                 
[24] 405: (0, 165], (165, 408], ..., (604, 606+]
[25] 406: (0, 249], (249, 594*]                 
[26] 407: (0, 344], (344, 497], (497, 613*]     
[27] 408: (0, 265], (265, 586], (586, 595+]     
[28] 409: (0, 166], (166, 206], ..., (348, 389+]
[29] 410: (0, 601+]                             
[30] 411: (0, 410], (410, 581], (581, 601+]     
[31] 412: (0, 611+]                             
[32] 413: (0, 608+]                             
[33] 414: (0, 587*]                             
[34] 415: (0, 367], (367, 603+]                 
[35] 416: (0, 202], (202, 563], ..., (570, 585*]
[36] 417: (0, 587+]                             
[37] 418: (0, 578*]                             
[38] 419: (0, 578*]                             
[39] 420: (0, 586*]                             
[40] 421: (0, 585+]                             
[41] 422: (0, 582*]                             

On Missing Times

The updated show() method preserves NA’s when check = "none". However, NA’s will always appear last for each subject because times are sorted internally.

Recur(c(NA, 3:6, NA), id = rep(1:2, 3), check = "none")
[1] 1: (0, 4], (4, 6], (6, NA+] 2: (0, 3], (3, 5], (5, NA+]

Number of digits

The show() method takes the value of getOption("digits") - 3 to determine the largest number of digits for printing.

op <- options()
getOption("digits")
[1] 7
Recur(pi, 1)
[1] 1: (0.0000, 3.1416+]
options(digits = 10)
Recur(pi, 1)
[1] 1: (0.0000000, 3.1415927+]
options(op) # reset (all) initial options