Title: | Timing, Anatomical, Therapeutic and Chemical Based Medication Clustering |
---|---|
Description: | Agglomerative hierarchical clustering with a bespoke distance measure based on medication similarities in the Anatomical Therapeutic Chemical Classification System, medication timing and medication amount or dosage. Tools for summarizing, illustrating and manipulating the cluster objects are also available. |
Authors: | Anna Laksafoss [aut, cre] |
Maintainer: | Anna Laksafoss <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.1 |
Built: | 2024-11-07 04:57:46 UTC |
Source: | https://github.com/laksafoss/tame |
We use this data set in all the examples in the package.
complications
An object of class data.frame with 149 rows and 8 columns.
A Simulated Data Set About Eczema
eczema
An object of class data.frame with 50644 rows and 7 columns.
Employ a clustering to new data
employ(
  object,
  new_data,
  only = NULL,
  additional_data = NULL,
  assignment_method = "nearest_cluster",
  parallel = FALSE,
  ...
)
object |
A |
new_data |
A data frame in which to look for variables with |
only |
< |
additional_data |
A data frame with additional data that may be
(left-)joined onto the |
assignment_method |
A character naming the employment method. The
default assignment method The assignment method |
parallel |
A logical or an integer. If If |
... |
Additional arguments affecting the employment procedure. |
employ returns a medic object.
part1 <- complications[1:100, ]
part2 <- complications[101:149, ]

clust <- medic(part1, id = id, atc = atc, k = 3)

# Nearest cluster matching
employ(clust, part2)

# Only exact matching
employ(clust, part2, assignment_method = "exact_only")
Enrich the parameter information in a clustering with user-defined data.
enrich(object, additional_data = NULL, by = NULL)
object |
A |
additional_data |
A data frame with additional data that may be
(left-)joined onto the |
by |
A character vector of variables to join by. This variable is
passed to the If To join by different variables on To join by multiple variables, use a vector with length > 1. For example,
For example, |
The enrich() function is a joining function used for enriching the clustering characteristics with user-defined data. This function is used in all of the investigative functions with an additional_data argument, such as frequencies() and amounts().
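The joining idea can be sketched in base R. This is a minimal illustration, not the package's implementation: the parameters data frame below is a hypothetical stand-in for the parameter table held in a medic object, and its column names are assumptions.

```r
# Hypothetical stand-in for the parameter table of a clustering
# (column names are assumptions made for this sketch).
parameters <- data.frame(
  k = 3:5,
  linkage = "complete"
)

# User-defined data, as in the enrich() example.
new_parameters <- data.frame(k = 3:5, size = c("small", "small", "large"))

# A left join keeps every row of `parameters` and adds the matching columns.
enriched <- merge(parameters, new_parameters, by = "k", all.x = TRUE)
enriched
```

Every row of the parameter table is kept, and rows without a match in the user-defined data would get NA in the added columns.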
An object of class medic.
clust <- medic(
  complications,
  id = id,
  atc = atc,
  timing = first_trimester:third_trimester,
  k = 3:5
)

new_parameters <- data.frame(k = 3:5, size = c("small", "small", "large"))
enrich(clust, new_parameters)
Test if an object is a medic-object
is.medic(object)
object |
Any object. |
TRUE if the object inherits from the medic class and has the required elements.
clust <- medic(complications, id = id, atc = atc, k = 3)
is.medic(clust)
The medic method uses agglomerative hierarchical clustering with a bespoke distance measure based on medication ATC code similarities, medication timing and medication amount or dosage.
medic(
  data,
  k = 5,
  id,
  atc,
  timing,
  base_clustering,
  linkage = "complete",
  summation_method = "sum_of_minima",
  alpha = 1,
  beta = 1,
  gamma = 1,
  p = 1,
  theta = (5:0)/5,
  parallel = FALSE,
  return_distance_matrix = FALSE,
  set_seed = FALSE,
  ...
)

## S3 method for class 'medic'
print(x, ...)
data |
A data frame containing all the variables for the clustering. |
k |
a vector specifying the number of clusters to identify. |
id |
< |
atc |
< |
timing |
< |
base_clustering |
< |
linkage |
The agglomeration method to be used in the clustering. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC). See stats::hclust for more information. For a discussion of linkage criterion choice see details below. |
summation_method |
The summation method used in the distance measure. This should be either "double_sum" or "sum_of_minima". See details below for more information. |
alpha |
A number giving the tuning of the normalization. See details below for more information. |
beta |
A number giving the power of the individual medication combinations. See details below for more information. |
gamma |
A number giving the weight of the timing terms. See details below for more information. |
p |
The power of the Minkowski distance used in the timing-specific distance. See details below for more information. |
theta |
A vector of length 6 specifying the tuning of the ATC measure. See details below for more information. |
parallel |
A logical or an integer. If If |
return_distance_matrix |
A logical. |
set_seed |
A logical or an integer. |
... |
Additional arguments not currently in use. |
x |
A |
The medic method uses agglomerative hierarchical clustering with a bespoke distance measure based on medication ATC codes and timing similarities to assign medication pattern clusters to people.
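As a rough sketch of the agglomerative step only, the base R functions stats::hclust and stats::cutree can cluster people from any precomputed distance matrix. The toy distances below are made up for illustration and are not produced by the package's distance measure.

```r
# Toy symmetric distance matrix between four people (made-up numbers).
people <- paste0("person_", 1:4)
d <- matrix(
  c(0.0, 0.2, 0.9, 1.0,
    0.2, 0.0, 0.8, 0.9,
    0.9, 0.8, 0.0, 0.1,
    1.0, 0.9, 0.1, 0.0),
  nrow = 4, byrow = TRUE,
  dimnames = list(people, people)
)

# Agglomerative clustering with complete linkage, then cut into k = 2 clusters.
tree <- stats::hclust(stats::as.dist(d), method = "complete")
clusters <- stats::cutree(tree, k = 2)
clusters
```

Here the first two people end up in one cluster and the last two in another, because the within-pair distances are much smaller than the between-pair distances.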
Two versions of the distance measure are available:
The double sum:
and the sum of minima:
If the normalization tuning, alpha, is 0, then no normalization is performed and the distance measure becomes highly dependent on the number of distinct medications given. That is, people using more medications will have larger distances to others. If the normalization tuning, alpha, is 1 (the default), then the summation is normalized by the number of terms in the sum; in other words, the average is calculated.
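The effect of alpha can be illustrated with a small sketch. It assumes the normalization divides the sum by the number of terms raised to the power alpha, which reproduces the two cases described above; the helper normalized_sum() is hypothetical, and the exact form used by the package may differ.

```r
# Hypothetical helper: divide a sum of pairwise terms by n^alpha
# (an assumed form that matches the alpha = 0 and alpha = 1 cases).
normalized_sum <- function(terms, alpha = 1) {
  sum(terms) / length(terms)^alpha
}

terms <- c(0.2, 0.4, 0.6, 0.8)

no_normalization <- normalized_sum(terms, alpha = 0)  # the plain sum
averaged <- normalized_sum(terms, alpha = 1)          # the mean
```

With alpha = 0 the result grows with the number of terms, while with alpha = 1 it stays on the scale of a single term.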
The central idea of this method, namely the ATC distance, is given as
The ATC distance is tuned using the vector theta.
Note that two ATC codes are said to match at level i when they are identical at level i. E.g. the two codes N06AB01 and N06AA01 match on level 1, 2, and 3 as they are both "N" at level 1, "N06" at level 2, and "N06A" at level 3, but at level 4 they differ ("N06AB" and "N06AA" are not the same).
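The level matching can be sketched as follows. The level widths follow the ATC code layout used in the example above (1, 3, 4, 5 and 7 characters); the helper atc_match_level() is hypothetical, and the lookup into theta by number of matching levels is an assumption about how the tuning vector is indexed.

```r
# Number of characters in an ATC code at levels 1 to 5.
atc_level_widths <- c(1, 3, 4, 5, 7)

# Hypothetical helper: deepest level at which two ATC codes are identical
# (0 if they already differ at level 1).
atc_match_level <- function(code1, code2) {
  long_enough <- nchar(code1) >= atc_level_widths &
    nchar(code2) >= atc_level_widths
  same_prefix <- substring(code1, 1, atc_level_widths) ==
    substring(code2, 1, atc_level_widths)
  # Count the leading run of matching levels.
  sum(cumprod(long_enough & same_prefix))
}

level <- atc_match_level("N06AB01", "N06AA01")

# Assumption: theta[i + 1] is the distance for codes matching at i levels.
theta <- (5:0) / 5
atc_distance <- theta[level + 1]
```

For the two codes in the example, the deepest matching level is 3, and under the assumed indexing the default theta gives a distance of 0.4.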
The timing distance is a simple Minkowski distance:
When p is 1, the default, the Manhattan distance is used.
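For reference, the textbook Minkowski distance between two timing vectors can be written in a few lines of R. This is a sketch of the standard definition with a hypothetical helper name; any additional scaling the package applies is not shown.

```r
# Standard Minkowski distance of order p between two numeric vectors.
minkowski_distance <- function(x, y, p = 1) {
  sum(abs(x - y)^p)^(1 / p)
}

# Two made-up timing trajectories over three periods.
a <- c(1, 0, 0)
b <- c(0, 1, 1)

manhattan <- minkowski_distance(a, b, p = 1)
euclidean <- minkowski_distance(a, b, p = 2)
```

With p = 1 this is the sum of absolute differences (Manhattan distance), and with p = 2 it is the usual Euclidean distance.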
An object of class medic which describes the clusters produced by the hierarchical clustering process. The object is a list with components:
the inputted data frame data with the cluster assignments appended at the end.
a data frame with the person id as given by id, the .analysis_order and the clusters found.
a list of the variables used in the clustering.
a data frame with all the inputted clustering parameters and the corresponding method names. These method names correspond to the column names for each cluster in the clustering data frame described above.
a list of keys used internally in the function to keep track of simplified versions of the data.
the distance matrices for each method if return_distance_matrix is TRUE, otherwise NULL.
the matched call.
print(medic): Print method for medic-objects
summary.medic for summaries and plots.
employ for employing an existing clustering to new data.
enrich for enriching the meta data in the medic object with additional data.
bind for binding together two comparable lists of clusterings.
# A simple clustering based only on ATC
clust <- medic(complications, id = id, atc = atc, k = 3)

# A simple clustering with both ATC and timing
clust <- medic(
  complications,
  id = id,
  atc = atc,
  timing = first_trimester:third_trimester,
  k = 3
)
Given the input to medic(), this function checks the input and constructs a data frame with the analysis parameters specified by the user.
parameters_constructor(
  data,
  id,
  k = 5,
  atc,
  timing,
  base_clustering,
  linkage = "complete",
  summation_method = "sum_of_minima",
  alpha = 1,
  beta = 1,
  gamma = 1,
  p = 1,
  theta = (5:0)/5,
  ...
)
data |
A data frame containing all the variables for the clustering. |
id |
< |
k |
a vector specifying the number of clusters to identify. |
atc |
< |
timing |
< |
base_clustering |
< |
linkage |
The agglomeration method to be used in the clustering. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC). See stats::hclust for more information. For a discussion of linkage criterion choice see details below. |
summation_method |
The summation method used in the distance measure. This should be either "double_sum" or "sum_of_minima". See details below for more information. |
alpha |
A number giving the tuning of the normalization. See details below for more information. |
beta |
A number giving the power of the individual medication combinations. See details below for more information. |
gamma |
A number giving the weight of the timing terms. See details below for more information. |
p |
The power of the Minkowski distance used in the timing-specific distance. See details below for more information. |
theta |
A vector of length 6 specifying the tuning of the ATC measure. See details below for more information. |
... |
Additional arguments not currently in use. |
A data.frame with the parameters for clustering.
parameters_constructor(
  data = complications,
  k = 3,
  id = id,
  atc = atc
)
Refactor the levels of the chosen clusters.
refactor(object, ..., inheret_parameters = TRUE)
object |
A |
... |
< The name gives the name of the new clustering in the output. The value can be:
When a recoding uses the name of an existing clustering, this new clustering will overwrite the existing one. |
inheret_parameters |
A logical. If |
A medic object with relevant clusterings refactored.
clust <- medic(complications, id = id, atc = atc, k = 3:4)

# Refactor one clustering
refactor(
  clust,
  `cluster_1_k=4` = dplyr::recode(`cluster_1_k=4`, IV = "III")
)

# Refactor all clusterings
refactor(
  clust,
  dplyr::across(
    dplyr::everything(),
    ~dplyr::recode(., IV = "III")
  )
)
Make cluster characterizing summaries.
## S3 method for class 'medic'
summary(
  object,
  only = NULL,
  clusters = NULL,
  outputs = c("frequencies", "medications", "amounts", "trajectories", "interactions"),
  additional_data = NULL,
  ...
)

## S3 method for class 'summary.medic'
print(x, ...)

## S3 method for class 'summary.medic'
plot(x, by, facet, ...)
object |
An object for which a summary is desired. |
only |
< The default |
clusters |
< The default |
outputs |
A character vector naming the desired characteristics to output. The default names all possible output types. |
additional_data |
A data frame with additional data that may be
(left-)joined onto the |
... |
Additional arguments passed to the internal summary function. |
x |
A |
by |
|
facet |
A list of clustering characteristics of class summary.medic is returned. It can contain any of the following characteristics:
frequencies: The number of individuals assigned to each cluster and the associated frequency of assignment.
medications: The number of individuals with a specific ATC code within a cluster. Moreover, it calculates the percentage of people with this medication assigned to this cluster and the percentage of people within the cluster with this medication.
amounts: The number of ATC codes an individual has, and the number of individuals within a cluster that have that many ATC codes. Moreover, various relevant percentages are calculated.
trajectories: The number of unique timing trajectories in each cluster, and the average timing trajectories in each cluster.
interactions: The number of people with each unique timing trajectory and ATC group, as given by atc_groups, in each cluster.
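As an illustration of the first characteristic, cluster counts and assignment frequencies can be computed in base R as sketched below. The toy assignments and the output column names (n, percent) are assumptions for this example, not the package's actual output format.

```r
# Made-up cluster assignments for six people.
assignments <- data.frame(
  id = 1:6,
  cluster = c("I", "I", "II", "III", "III", "III")
)

# Count people per cluster and convert the counts to percentages.
counts <- table(assignments$cluster)
frequencies <- data.frame(
  cluster = names(counts),
  n = as.integer(counts),
  percent = 100 * as.numeric(prop.table(counts))
)
frequencies
```

The percentages sum to 100 across clusters, since each person is assigned to exactly one cluster.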
print(summary.medic): Print method for summary.medic objects
plot(summary.medic): Plot method for summary.medic objects
clust <- medic(complications, id = id, atc = atc, k = 3:5)