| Title: | Timing, Anatomical, Therapeutic and Chemical Based Medication Clustering |
|---|---|
| Description: | Agglomerative hierarchical clustering with a bespoke distance measure based on medication similarities in the Anatomical Therapeutic Chemical Classification System, medication timing and medication amount or dosage. Tools for summarizing, illustrating and manipulating the cluster objects are also available. |
| Authors: | Anna Laksafoss [aut, cre] |
| Maintainer: | Anna Laksafoss <[email protected]> |
| License: | GPL (>= 3) \| file LICENSE |
| Version: | 0.2.1 |
| Built: | 2026-05-12 08:12:02 UTC |
| Source: | https://github.com/laksafoss/tame |
The function cluster_frequency() calculates the number and frequency of
individuals assigned to each cluster.
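```r
cluster_frequency(
  object,
  only = NULL,
  clusters = NULL,
  additional_data = NULL,
  ...
)
```

| Argument | Description |
|---|---|
| `object` | An object for which a summary is desired. |
| `only` | An expression, defined in terms of the clustering parameters, that selects which clusterings to summarize (e.g. `k == 5`). The default `NULL` keeps all clusterings. |
| `clusters` | An unquoted selection of the clusters to summarize (e.g. `I:III`). The default `NULL` keeps all clusters. |
| `additional_data` | A data frame with additional data that may be (left-)joined onto the clustering parameters of `object` (see `enrich()`). |
| `...` | Additional arguments passed to the specific summary sub-function. |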
cluster_frequency() calculates the number of individuals assigned to
each cluster and the associated frequency of assignment.
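As a rough illustration of the two quantities, the base R sketch below tabulates a hypothetical vector of cluster assignments for ten individuals. It does not call the package and is not its implementation; the data and object names are made up for the example.

```r
# Hypothetical cluster assignments for ten individuals (illustrative data only)
assignments <- c("I", "I", "II", "III", "I", "II", "II", "I", "III", "I")

# Number of individuals per cluster
counts <- table(assignments)

# Count and percentage per cluster, mirroring the Count and Percent columns
freq <- data.frame(
  Cluster = names(counts),
  Count = as.integer(counts),
  Percent = 100 * as.integer(counts) / length(assignments)
)
print(freq)
```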
cluster_frequency() returns a data frame with class cluster_frequency and the columns:

- Clustering: the name of the clustering.
- Cluster: the cluster name.
- Count: the number of individuals assigned to the cluster.
- Percent: the percentage of individuals assigned to the cluster.
```r
clust <- medic(complications, id = id, atc = atc, k = 3:5)

# make frequency tables
cluster_frequency(clust, k == 5)
cluster_frequency(clust, k < 5, I:III)
```
The function comedication_count() calculates the number of unique
medications for each individual and presents the count frequencies by
cluster.
```r
comedication_count(
  object,
  only = NULL,
  clusters = NULL,
  count_grouper = function(x) {
    cut(x, breaks = c(0, 1, 2, Inf), labels = c("1", "2", "3+"))
  },
  additional_data = NULL,
  ...
)
```

| Argument | Description |
|---|---|
| `object` | An object for which a summary is desired. |
| `only` | An expression, defined in terms of the clustering parameters, that selects which clusterings to summarize (e.g. `k == 5`). The default `NULL` keeps all clusterings. |
| `clusters` | An unquoted selection of the clusters to summarize (e.g. `I:III`). The default `NULL` keeps all clusters. |
| `count_grouper` | A function for grouping counts. By default, it groups counts as 1 medication, 2 medications, and 3+ medications. |
| `additional_data` | A data frame with additional data that may be (left-)joined onto the clustering parameters of `object` (see `enrich()`). |
| `...` | Additional arguments passed to the specific summary sub-function. |
comedication_count() calculates the number of ATC codes each individual has,
and then outputs the number of individuals within each cluster that have that
many ATC codes. It also calculates several related percentages; see
Value below for more details on these percentages.
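The counting step can be sketched in base R on hypothetical long-format data, reusing the default `count_grouper` from the usage above. Counting distinct ATC codes per individual is our reading of "the number of ATC codes an individual has"; the data and the remaining object names are illustrative assumptions, not package internals.

```r
# Hypothetical long-format data: one row per (individual, ATC code) record
meds <- data.frame(
  id  = c(1, 1, 2, 3, 3, 3, 4, 4),
  atc = c("N06AB01", "N02BE01", "N06AB01",
          "A10BA02", "N06AA01", "N02BE01",
          "N06AB01", "N06AB01")
)

# Number of distinct ATC codes per individual (individual 4 has a duplicate record)
n_atc <- sapply(split(meds$atc, meds$id), function(x) length(unique(x)))

# The default count_grouper() from the usage above
count_grouper <- function(x) {
  cut(x, breaks = c(0, 1, 2, Inf), labels = c("1", "2", "3+"))
}

grouped <- count_grouper(n_atc)
print(table(grouped))
```

Because the breaks are `c(0, 1, 2, Inf)` with right-closed intervals, a count of 1 falls in `"1"`, 2 in `"2"`, and anything above 2 in `"3+"`.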
comedication_count() returns a data frame of class comedication_count with the columns:

- Clustering: the name of the clustering.
- Cluster: the name of the cluster.
- Medication Count: a number of medications. The numbers or groups are
  given by the count_grouper() function.
- Number of People: the number of individuals in the cluster who have
  Medication Count comedications.
- Number of medications: the total number of medications among individuals
  in the cluster who have Medication Count comedications.
- Percentage of All People: the percentage of all individuals in the study
  who are in the cluster and have Medication Count comedications.
- Percentage of People in Cluster: the percentage of individuals in the
  cluster who have Medication Count comedications.
- Percentage of All Medications: the percentage of all medications in the
  study that belong to individuals in the cluster with Medication Count
  comedications.
- Percentage of Medication in Cluster: the percentage of medications in the
  cluster that belong to individuals with Medication Count comedications.
- Percentage of People with the Same Medication Count: the percentage of
  individuals in this cluster among all individuals with Medication Count
  comedications.
- Percentage of Medication with the Same Medication Count: the percentage of
  medications in this cluster among all medications of individuals with
  Medication Count comedications.
```r
clust <- medic(complications, id = id, atc = atc, k = 3:5)
comedication_count(clust, k == 5, clusters = I:III)
```
We use this data set in all the examples in the package.
```r
complications
```
An object of class data.frame with 149 rows and 8 columns.
This function finds the default ATC groups for the summaries. It is used in
the summary.medic function.
```r
default_atc_groups(object, min_n = 2)
```

| Argument | Description |
|---|---|
| `object` | A `medic` object. |
| `min_n` | The minimum number of ATC groups to be found. |
A data frame with two columns: regex and atc_groups.
A Simulated Data Set About Eczema
```r
eczema
```
An object of class data.frame with 50644 rows and 7 columns.
Employ a clustering to new data
```r
employ(
  object,
  new_data,
  only = NULL,
  additional_data = NULL,
  assignment_method = "nearest_cluster",
  parallel = FALSE,
  ...
)
```

| Argument | Description |
|---|---|
| `object` | A `medic` object. |
| `new_data` | A data frame in which to look for the variables used in the clustering. |
| `only` | An expression, defined in terms of the clustering parameters, that selects which clusterings to employ. The default `NULL` keeps all clusterings. |
| `additional_data` | A data frame with additional data that may be (left-)joined onto the clustering parameters of `object` (see `enrich()`). |
| `assignment_method` | A character naming the employment method. The default, `"nearest_cluster"`, assigns individuals by nearest cluster matching; `"exact_only"` uses only exact matching. |
| `parallel` | A logical or an integer controlling parallel computation. |
| `...` | Additional arguments affecting the employment procedure. |
employ returns a medic object.
```r
part1 <- complications[1:100, ]
part2 <- complications[101:149, ]
clust <- medic(part1, id = id, atc = atc, k = 3)

# Nearest cluster matching
employ(clust, part2)

# Only exact matching
employ(clust, part2, assignment_method = "exact_only")
```
Enrich the parameter information in a clustering with user-defined data.
```r
enrich(object, additional_data = NULL, by = NULL)
```

| Argument | Description |
|---|---|
| `object` | A `medic` object for enrichment. |
| `additional_data` | A data frame with additional data that may be (left-)joined onto the clustering parameters of `object`. |
| `by` | A character vector of variables to join by, passed to the `by` argument of `dplyr::left_join()`. If `NULL`, the default, a natural join is performed using all variables in common. To join by different variables on the two sides, use a named vector, e.g. `by = c("a" = "b")`. To join by multiple variables, use a vector with length > 1, e.g. `by = c("a", "b")`. |
The enrich() function is a joining function used to enrich the
clustering characteristics with user-defined data. It is used in
all of the investigative functions that take an additional_data argument, such as
summary(), cluster_frequency() and medication_frequency().
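The left join itself can be sketched in base R with `merge(all.x = TRUE)` instead of dplyr. The `parameters` data frame and its `method` column below are hypothetical stand-ins for the parameter table of a `medic` object; only `new_parameters` is taken from the example.

```r
# Hypothetical stand-in for the parameter table of a medic object
parameters <- data.frame(
  k = 3:5,
  method = c("cluster_1_k=3", "cluster_1_k=4", "cluster_1_k=5")
)

# User-defined data, as in the enrich() example
new_parameters <- data.frame(k = 3:5, size = c("small", "small", "large"))

# Left join: every row of parameters is kept, size is added where k matches
enriched <- merge(parameters, new_parameters, by = "k", all.x = TRUE)
print(enriched)
```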
An object of class medic.
```r
clust <- medic(
  complications,
  id = id,
  atc = atc,
  timing = first_trimester:third_trimester,
  k = 3:5
)
new_parameters <- data.frame(k = 3:5, size = c("small", "small", "large"))
enrich(clust, new_parameters)
```
Test if an object is a medic-object
```r
is.medic(object)
```

| Argument | Description |
|---|---|
| `object` | Any object. |

TRUE if the object inherits from the medic class and has the required
elements.
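A check of this kind can be sketched with `inherits()` plus a test on element names. The required element names in this hypothetical `is_medic_sketch()` are an assumption for illustration, not the package's actual list.

```r
# Hypothetical version of the check; the required element names are assumed
is_medic_sketch <- function(object,
                            required = c("data", "clustering", "parameters")) {
  inherits(object, "medic") && all(required %in% names(object))
}

# A minimal object that passes the sketched check
good <- structure(
  list(data = data.frame(), clustering = data.frame(), parameters = data.frame()),
  class = "medic"
)
is_medic_sketch(good)
```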
```r
clust <- medic(complications, id = id, atc = atc, k = 3)
is.medic(clust)
```
The medic method uses agglomerative hierarchical clustering with a
bespoke distance measure based on medication ATC codes similarities,
medication timing and medication amount or dosage.
```r
medic(
  data,
  k = 5,
  id,
  atc,
  timing,
  base_clustering,
  linkage = "complete",
  summation_method = "sum_of_minima",
  alpha = 1,
  beta = 1,
  gamma = 1,
  p = 1,
  theta = (5:0)/5,
  parallel = FALSE,
  return_distance_matrix = FALSE,
  set_seed = FALSE,
  ...
)

## S3 method for class 'medic'
print(x, ...)
```

| Argument | Description |
|---|---|
| `data` | A data frame containing all the variables for the clustering. |
| `k` | A vector specifying the number of clusters to identify. |
| `id` | The column in `data` that identifies individuals, given unquoted (e.g. `id = id`). |
| `atc` | The column in `data` holding the ATC codes, given unquoted (e.g. `atc = atc`). |
| `timing` | The columns in `data` holding the medication timing, selected unquoted (e.g. `timing = first_trimester:third_trimester`). |
| `base_clustering` | < |
| `linkage` | The agglomeration method to be used in the clustering. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC). See stats::hclust for more information. For a discussion of linkage criterion choice see details below. |
| `summation_method` | The summation method used in the distance measure. This should be either "double_sum" or "sum_of_minima". See details below for more information. |
| `alpha` | A number giving the tuning of the normalization. See details below for more information. |
| `beta` | A number giving the power of the individual medication combinations. See details below for more information. |
| `gamma` | A number giving the weight of the timing terms. See details below for more information. |
| `p` | The power of the Minkowski distance used in the timing-specific distance. See details below for more information. |
| `theta` | A vector of length 6 specifying the tuning of the ATC measure. See details below for more information. |
| `parallel` | A logical or an integer controlling parallel computation. |
| `return_distance_matrix` | A logical. If `TRUE`, the distance matrices are included in the returned object. |
| `set_seed` | A logical or an integer. |
| `...` | Additional arguments not currently in use. |
| `x` | A `medic` object. |
The medic method uses agglomerative hierarchical
clustering with a bespoke distance measure based on medication ATC codes and
timing similarities to assign medication pattern clusters to people.
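The agglomerative step can be sketched with `stats::hclust()` and `stats::cutree()`. Here a plain Euclidean distance on random points stands in for the bespoke medication distance, and whether `medic()` cuts its tree exactly this way is an assumption; the sketch only shows how one tree yields a clustering for each requested `k`.

```r
# Stand-in distances between six hypothetical individuals; medic() would use
# its own ATC- and timing-based distance here instead.
set.seed(42)
points <- matrix(runif(12), ncol = 2)
d <- dist(points)

# Agglomerative clustering with the same default linkage as medic()
tree <- hclust(d, method = "complete")

# One column of cluster assignments per requested number of clusters
assignments <- cutree(tree, k = 3:5)
print(assignments)
```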
Two versions of the distance measure are available:
The double sum:
and the sum of minima:
If the normalization tuning alpha is 0, no normalization is
performed and the distance measure becomes highly dependent on the number of
distinct medications given: people using more medications will have
larger distances to others. If alpha is 1 (the default),
the sum is normalized by the number of terms in
the sum; in other words, the average is calculated.
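The role of alpha can be sketched on a hypothetical matrix of pairwise medication distances between two people. The text above only fixes the two endpoints (alpha = 0 gives the plain sum, alpha = 1 the average); dividing by the number of terms raised to the power alpha is an assumption for other values, and the full double-sum and sum-of-minima formulas are not reproduced here.

```r
# Hypothetical pairwise medication distances between person A (rows, 2 meds)
# and person B (columns, 3 meds); values are illustrative only.
D <- matrix(c(0.0, 0.4, 1.0,
              0.6, 0.2, 0.8), nrow = 2, byrow = TRUE)

# Assumed normalization: divide the sum by (number of terms)^alpha.
# alpha = 0 gives the plain sum, alpha = 1 gives the average.
normalized_sum <- function(terms, alpha) sum(terms) / length(terms)^alpha

normalized_sum(D, alpha = 0)  # plain sum over all 6 pairs
normalized_sum(D, alpha = 1)  # average over all 6 pairs
```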
The central idea of this method, namely the ATC distance, is given as
The ATC distance is tuned using the vector theta.
Note that two ATC codes are said to match at level i when they are identical at level i. E.g. the two codes N06AB01 and N06AA01 match on level 1, 2, and 3 as they are both "N" at level 1, "N06" at level 2, and "N06A" at level 3, but at level 4 they differ ("N06AB" and "N06AA" are not the same).
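The matching rule can be sketched by comparing prefixes of length 1, 3, 4, 5 and 7, which correspond to the five ATC levels in the example above. The mapping from match level to distance through theta (theta[i + 1] for a match at level i, so the default gives 1 for no match and 0 for identical codes) is an assumption made for illustration; the helper names are hypothetical.

```r
# Prefix lengths of the five ATC levels, e.g. "N", "N06", "N06A", "N06AB", "N06AB01"
atc_level_lengths <- c(1, 3, 4, 5, 7)

# Deepest level (0 to 5) at which two ATC codes match
atc_match_level <- function(a, b) {
  matches <- substring(a, 1, atc_level_lengths) ==
    substring(b, 1, atc_level_lengths)
  # Matching is nested: once two codes differ at a level, they differ below it
  sum(cumprod(matches))
}

# Assumed mapping: theta[i + 1] is the distance for codes matching at level i
atc_distance <- function(a, b, theta = (5:0) / 5) {
  theta[atc_match_level(a, b) + 1]
}

atc_match_level("N06AB01", "N06AA01")  # the codes from the example above
atc_distance("N06AB01", "N06AA01")
```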
The timing distance is a simple Minkowski distance:
When p is 1, the default, the Manhattan distance is used.
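For reference, the Minkowski distance of order p between two timing vectors can be written in a few lines of R. The trimester exposure vectors are hypothetical, and how medic() weights or combines this term with the ATC distance (via gamma) is not shown here.

```r
# Minkowski distance of order p between two timing vectors
minkowski <- function(x, y, p = 1) sum(abs(x - y)^p)^(1 / p)

# Hypothetical exposure indicators for the three trimesters
a <- c(1, 1, 0)
b <- c(0, 1, 1)

minkowski(a, b, p = 1)  # Manhattan distance
minkowski(a, b, p = 2)  # Euclidean distance
```

The result agrees with `stats::dist(rbind(a, b), method = "minkowski", p = p)`.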
An object of class medic which describes the clusters produced by the hierarchical clustering process. The object is a list with components:
the inputted data frame data with the cluster
assignments appended at the end.
a data frame with the person id as given by id,
the .analysis_order and the clusters found.
a list of the variables used in the clustering.
a data frame with all the inputted clustering
parameters and the corresponding method names. These method names
correspond to the column names for each cluster in the clustering
data frame described right above.
a list of keys used internally in the function to keep track of simplified versions of the data.
the distance matrices for each method if
return_distance_matrix is TRUE otherwise NULL.
the matched call.
print(medic): Print method for medic-objects
summary.medic for summaries and plots.
employ for employing an existing clustering to new data.
enrich for enriching the meta data in the medic object with additional
data.
```r
# A simple clustering based only on ATC
clust <- medic(complications, id = id, atc = atc, k = 3)

# A simple clustering with both ATC and timing
clust <- medic(
  complications,
  id = id,
  atc = atc,
  timing = first_trimester:third_trimester,
  k = 3
)
```
The function medication_frequency() calculates the frequency of each
unique ATC code within each cluster.
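```r
medication_frequency(
  object,
  only = NULL,
  clusters = NULL,
  additional_data = NULL,
  ...
)
```

| Argument | Description |
|---|---|
| `object` | An object for which a summary is desired. |
| `only` | An expression, defined in terms of the clustering parameters, that selects which clusterings to summarize (e.g. `k == 5`). The default `NULL` keeps all clusterings. |
| `clusters` | An unquoted selection of the clusters to summarize (e.g. `I:III`). The default `NULL` keeps all clusters. |
| `additional_data` | A data frame with additional data that may be (left-)joined onto the clustering parameters of `object` (see `enrich()`). |
| `...` | Additional arguments passed to the specific summary sub-function. |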
medication_frequency() calculates the number of individuals with a specific
ATC code within a cluster. Moreover, it calculates the percentage of people
with this medication assigned to this cluster and the percent of people
within the cluster with this medication.
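A minimal base R sketch of these counts and the two percentages, on hypothetical data with one row per individual and ATC code. The column names `count`, `pct_all` and `pct_cluster` are illustrative and only loosely mirror the Count, Percent of All Medication and Percent of Medication in Cluster columns.

```r
# Hypothetical data: one row per (individual, ATC code), with cluster labels
meds <- data.frame(
  id      = c(1, 1, 2, 3, 3, 4),
  cluster = c("I", "I", "I", "II", "II", "II"),
  atc     = c("N06AB01", "N02BE01", "N06AB01", "N02BE01", "A10BA02", "N02BE01")
)

# Count distinct individuals per (cluster, ATC code) combination
pairs <- unique(meds[, c("id", "cluster", "atc")])
freq <- aggregate(id ~ cluster + atc, data = pairs, FUN = length)
names(freq)[names(freq) == "id"] <- "count"

# Denominators: all individuals in the study, and individuals per cluster
n_total <- length(unique(meds$id))
n_cluster <- sapply(split(meds$id, meds$cluster), function(x) length(unique(x)))

# Share of the whole study and share of the cluster
freq$pct_all <- 100 * freq$count / n_total
freq$pct_cluster <- 100 * freq$count / n_cluster[as.character(freq$cluster)]
print(freq)
```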
medication_frequency() returns a data frame with class medication_frequency and the columns:

- Clustering: the name of the clustering.
- Cluster: the cluster name.
- atc: the ATC codes.
- Count: the number of individuals with this ATC code in this cluster.
- Percent of All Medication: the percentage of individuals in the study
  with this ATC code and cluster.
- Percent of Medication in Cluster: the percentage of individuals in the
  cluster with this ATC code.
```r
clust <- medic(complications, id = id, atc = atc, k = 3:5)
medication_frequency(clust, k == 5, clusters = I:III)
```
Given the input of medic(), this function checks the
input and constructs a data frame with the analysis parameters specified by
the user.
```r
parameters_constructor(
  data,
  id,
  k = 5,
  atc,
  timing,
  base_clustering,
  linkage = "complete",
  summation_method = "sum_of_minima",
  alpha = 1,
  beta = 1,
  gamma = 1,
  p = 1,
  theta = (5:0)/5,
  ...
)
```

| Argument | Description |
|---|---|
| `data` | A data frame containing all the variables for the clustering. |
| `id` | The column in `data` that identifies individuals, given unquoted (e.g. `id = id`). |
| `k` | A vector specifying the number of clusters to identify. |
| `atc` | The column in `data` holding the ATC codes, given unquoted (e.g. `atc = atc`). |
| `timing` | The columns in `data` holding the medication timing, selected unquoted (e.g. `timing = first_trimester:third_trimester`). |
| `base_clustering` | < |
| `linkage` | The agglomeration method to be used in the clustering. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC). See stats::hclust for more information. See `medic()` for details. |
| `summation_method` | The summation method used in the distance measure. This should be either "double_sum" or "sum_of_minima". See `medic()` for details. |
| `alpha` | A number giving the tuning of the normalization. See `medic()` for details. |
| `beta` | A number giving the power of the individual medication combinations. See `medic()` for details. |
| `gamma` | A number giving the weight of the timing terms. See `medic()` for details. |
| `p` | The power of the Minkowski distance used in the timing-specific distance. See `medic()` for details. |
| `theta` | A vector of length 6 specifying the tuning of the ATC measure. See `medic()` for details. |
| `...` | Additional arguments not currently in use. |
A data.frame with the parameters for clustering.
```r
parameters_constructor(data = complications, k = 3, id = id, atc = atc)
```
This function plots the cluster frequency.
```r
plot_cluster_frequency(object, ...)

## S3 method for class 'medic'
plot_cluster_frequency(object, ...)

## S3 method for class 'summary.medic'
plot_cluster_frequency(object, ...)

## S3 method for class 'cluster_frequency'
plot_cluster_frequency(object, scale = "percent", with_population = FALSE, ...)
```

| Argument | Description |
|---|---|
| `object` | The object containing the cluster frequency data. |
| `...` | Additional arguments passed to the plotting functions. |
| `scale` | The scale of the y-axis. Must be either "percent" or "count". |
| `with_population` | Logical value indicating whether to include the population cluster. |
A ggplot object.
```r
clust <- medic(complications, id = id, atc = atc, k = 3)
clust |> plot_cluster_frequency()
clust |> cluster_frequency() |> plot_cluster_frequency()
clust |> summary() |> plot_cluster_frequency()
```
This function plots the comedication count.
```r
plot_comedication_count(object, ...)

## S3 method for class 'medic'
plot_comedication_count(object, ...)

## S3 method for class 'summary.medic'
plot_comedication_count(object, ...)

## S3 method for class 'comedication_count'
plot_comedication_count(
  object,
  scale = "percent",
  scope = "cluster",
  focus = "people",
  with_population = FALSE,
  ...
)
```

| Argument | Description |
|---|---|
| `object` | The object containing the comedication count data. |
| `...` | Additional arguments passed to the plotting functions. |
| `scale` | The scale of the y-axis. Must be either "percent" or "count". |
| `scope` | The scope of the plot. Must be one of "cluster", "global" or "medication count". |
| `focus` | The focus of the plot. Must be either "people" or "medication". |
| `with_population` | Logical value indicating whether to include the population cluster. |
A ggplot object.
```r
clust <- medic(complications, id = id, atc = atc, k = 3)
clust |> plot_comedication_count()
clust |> comedication_count() |> plot_comedication_count()
clust |> summary() |> plot_comedication_count()
```
This function plots the medication frequency.
```r
plot_medication_frequency(object, ...)

## S3 method for class 'medic'
plot_medication_frequency(object, ...)

## S3 method for class 'summary.medic'
plot_medication_frequency(object, ...)

## S3 method for class 'medication_frequency'
plot_medication_frequency(
  object,
  scale = "percent",
  scope = "cluster",
  with_population = FALSE,
  ...
)
```

| Argument | Description |
|---|---|
| `object` | The object containing the medication frequency data. |
| `...` | Additional arguments passed to the plotting functions. |
| `scale` | The scale of the y-axis. Must be either "percent" or "count". |
| `scope` | The scope of the plot. Must be one of "cluster", "global" or "medication". |
| `with_population` | Logical value indicating whether to include the population cluster. |
A ggplot object.
```r
clust <- medic(complications, id = id, atc = atc, k = 3)
clust |> plot_medication_frequency()
clust |> medication_frequency() |> plot_medication_frequency()
clust |> summary() |> plot_medication_frequency()
```
This function plots the summary of the clustering results.
```r
plot_summary(object, ...)

## S3 method for class 'medic'
plot_summary(object, only = NULL, clusters = NULL, additional_data = NULL, ...)

## S3 method for class 'summary.medic'
plot_summary(
  object,
  n_breaks = 5,
  plot_individual = FALSE,
  labels = FALSE,
  alpha_individual = 0.1,
  label_y_value = 0.1,
  ...
)
```

| Argument | Description |
|---|---|
| `object` | The object containing the summary data. |
| `...` | Additional arguments passed to the plotting functions. |
| `only` | An expression, defined in terms of the clustering parameters, that selects which clustering to plot (e.g. `k == 4`). The default `NULL` keeps all clusterings. |
| `clusters` | An unquoted selection of the clusters to plot (e.g. `I:III`). The default `NULL` keeps all clusters. |
| `additional_data` | A data frame with additional data that may be (left-)joined onto the clustering parameters of `object` (see `enrich()`). |
| `n_breaks` | The number of breaks for the time scale. |
| `plot_individual` | Logical value indicating whether to plot individual trajectories. |
| `labels` | Logical value indicating whether to include labels. |
| `alpha_individual` | The alpha value for the individual trajectories. |
| `label_y_value` | A number between 0 and 1 that defines the height of the label text. |
A ggplot object.
```r
clust <- medic(
  complications,
  id = id,
  atc = atc,
  k = 3,
  timing = first_trimester:third_trimester
)
clust |> plot_summary()
clust |> summary() |> plot_summary()

# If the clustering object contains more than one clustering, it is necessary
# to filter the clustering, as only one clustering can be plotted at a time.
clust <- medic(
  complications,
  id = id,
  atc = atc,
  k = 3:5,
  timing = first_trimester:third_trimester
)
clust |> plot_summary(only = k == 4)
clust |> summary(only = k == 4) |> plot_summary()
```
This function plots the timing ATC group.
```r
plot_timing_atc_group(object, ...)

## S3 method for class 'medic'
plot_timing_atc_group(object, ...)

## S3 method for class 'summary.medic'
plot_timing_atc_group(object, ...)

## S3 method for class 'timing_atc_group'
plot_timing_atc_group(
  object,
  focus = "average",
  with_population = FALSE,
  max_lines = 50,
  ...
)
```

| Argument | Description |
|---|---|
| `object` | The object containing the timing ATC group data. |
| `...` | Additional arguments passed to the plotting functions. |
| `focus` | The focus of the plot. Must be either "average", "individual" or "both". |
| `with_population` | Logical value indicating whether to include the population cluster. |
| `max_lines` | The maximum number of lines to plot. |
A ggplot object.
```r
clust <- medic(
  complications,
  id = id,
  atc = atc,
  k = 3:5,
  timing = first_trimester:third_trimester
)
clust |> plot_timing_atc_group()
clust |> timing_atc_group() |> plot_timing_atc_group()
clust |> summary() |> plot_timing_atc_group()
```
This function plots the timing trajectory.
```r
plot_timing_trajectory(object, ...)

## S3 method for class 'medic'
plot_timing_trajectory(object, ...)

## S3 method for class 'summary.medic'
plot_timing_trajectory(object, ...)

## S3 method for class 'timing_trajectory'
plot_timing_trajectory(
  object,
  focus = "average",
  with_population = FALSE,
  max_lines = 50,
  ...
)
```

| Argument | Description |
|---|---|
| `object` | The object containing the timing trajectory data. |
| `...` | Additional arguments passed to the plotting functions. |
| `focus` | The focus of the plot. Must be either "average", "individual" or "both". |
| `with_population` | Logical value indicating whether to include the population cluster. |
| `max_lines` | The maximum number of lines to plot. |
A ggplot object.
```r
clust <- medic(
  complications,
  id = id,
  atc = atc,
  k = 3:5,
  timing = first_trimester:third_trimester
)
clust |> plot_timing_trajectory()
clust |> timing_trajectory() |> plot_timing_trajectory()
clust |> summary() |> plot_timing_trajectory()
```
This function prints a summary of medication information.
```r
## S3 method for class 'summary.medic'
print(x, ...)
```

| Argument | Description |
|---|---|
| `x` | An object of class `summary.medic`. |
| `...` | Currently only included for compatibility with the generic. Has no effect. |
This function prints various information about medication, including cluster frequency, medication frequency, the number of different medications taken in the study period, average exposure trajectories, and average exposure trajectories by ATC groups.
The function is called for its side effects and does not return any value.
Refactor the levels of the chosen clusters.
```r
refactor(object, ..., inheret_parameters = TRUE)
```

| Argument | Description |
|---|---|
| `object` | A `medic` object. |
| `...` | Name-value pairs. The name gives the name of the new clustering in the output. When a recoding uses the name of an existing clustering, this new clustering will overwrite the existing one. |
| `inheret_parameters` | A logical. |
A medic object with relevant clusterings refactored.
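The recoding in the examples below (`dplyr::recode(..., IV = "III")`) merges cluster IV into cluster III. On a plain factor of hypothetical assignments, the same effect can be sketched in base R by giving two levels the same label; this only illustrates the recoding, not how refactor() is implemented.

```r
# Hypothetical assignments from one clustering with four clusters
clusters <- factor(c("I", "II", "IV", "III", "IV"),
                   levels = c("I", "II", "III", "IV"))

# Merge cluster IV into cluster III by giving the two levels the same label
levels(clusters)[levels(clusters) == "IV"] <- "III"
print(clusters)
```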
```r
clust <- medic(complications, id = id, atc = atc, k = 3:4)

# Refactor one clustering
refactor(
  clust,
  `cluster_1_k=4` = dplyr::recode(`cluster_1_k=4`, IV = "III")
)

# Refactor all clusterings
refactor(
  clust,
  dplyr::across(
    dplyr::everything(),
    ~dplyr::recode(., IV = "III")
  )
)
```
fuzzyjoin was removed from CRAN, so I quickly made an extremely simple version of regex_inner_join that suits our needs here.
```r
regex_inner_join(x, y, by)
```

| Argument | Description |
|---|---|
| `x` | A data frame with valid ATC codes. |
| `y` | A data frame with regex codes and corresponding groups. |
| `by` | A named vector of length 1 where the name is the name of the ATC column in `x` and the value is the name of the regex column in `y`. |

This function assumes that x has the full ATC codes, that y has the regex, and that by is only of length 1. The join is simply a cross join followed by regex matching.
A data frame with added columns from y to x based on a regex match.
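The cross-join approach described above can be sketched in base R: `merge(by = NULL)` builds the Cartesian product, and `grepl()` keeps the rows where the ATC code matches the regex. The function name `regex_inner_join_sketch()` and the example groups are hypothetical; the `regex` and `atc_groups` column names follow the output of default_atc_groups().

```r
# Sketch of a regex inner join: cross join x and y, then keep the rows
# where the ATC code in x matches the regex in y.
regex_inner_join_sketch <- function(x, y, by) {
  x_col <- names(by)                  # name of the ATC column in x
  y_col <- unname(by)                 # name of the regex column in y
  crossed <- merge(x, y, by = NULL)   # Cartesian product of x and y
  keep <- mapply(grepl, crossed[[y_col]], crossed[[x_col]])
  crossed[keep, , drop = FALSE]
}

x <- data.frame(id = 1:3, atc = c("N06AB01", "N02BE01", "A10BA02"))
y <- data.frame(regex = c("^N06A", "^N02"),
                atc_groups = c("Antidepressants", "Analgesics"))
res <- regex_inner_join_sketch(x, y, by = c(atc = "regex"))
print(res)
```

The cross join grows as `nrow(x) * nrow(y)`, which is acceptable for a short list of ATC groups but would not scale to large lookup tables.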
Summary of a medic-object using str function
```r
## S3 method for class 'summary.medic'
str(object, ...)
```

| Argument | Description |
|---|---|
| `object` | A `summary.medic` object. |
| `...` | Additional arguments passed to `str()`. |

This function provides a summary of an object by using the `str()` function.
Functions for cropping summarized cluster data.
```r
summary_crop(object, ...)

## S3 method for class 'cluster_frequency'
summary_crop(object, top_n = 5L, min_count = 0, min_percent = 0, ...)

## S3 method for class 'medication_frequency'
summary_crop(
  object,
  top_n = 5L,
  min_count = 0,
  min_percent = 0,
  scope = "cluster",
  ...
)

## S3 method for class 'comedication_count'
summary_crop(object, ...)

## S3 method for class 'timing_trajectory'
summary_crop(object, sample_n_individual = 100L, weighted_sample = TRUE, ...)

## S3 method for class 'timing_atc_group'
summary_crop(
  object,
  sample_n_individual = 100L,
  weighted_sample = TRUE,
  min_count = 0L,
  ...
)

## S3 method for class 'summary.medic'
summary_crop(object, which = "all", ...)
```

| Argument | Description |
|---|---|
| `object` | The summary object to be cropped. |
| `...` | Additional arguments to be passed to the specific method. |
| `top_n` | integer. The number of top clusters (for `cluster_frequency`) or medications (for `medication_frequency`), by count, to keep. |
| `min_count` | integer. The minimum count of a cluster or medication to keep it in the summary. If 0, the default, there is no minimum count. |
| `min_percent` | numeric. The minimum percentage of a cluster or medication to keep it in the summary. If 0, the default, there is no minimum percentage. |
| `scope` | character. The scope of the summary crop, either "cluster" or "global"; see the medication_frequency crop description below. |
| `sample_n_individual` | a logical or integer. If FALSE, no individual timing trajectories are sampled. If an integer, that number of individual timing trajectories is sampled. |
| `weighted_sample` | a logical, only used if individual timing trajectories are sampled. If TRUE, the sampling is weighted by the number of medications in each trajectory. |
| `which` | A character vector specifying which summaries to crop. The options are "cluster_frequency", "medication_frequency", "comedication_count", "timing_trajectory", and "timing_atc_group". The default is "all". |
A summary object, which is a modified version of the input summary object.
cluster_frequency summary crop: Extracts the top top_n clusters by count. If top_n is Inf, all clusters
are kept. If min_count is greater than 0, clusters with a count less than
min_count are removed. If min_percent is greater than 0, clusters with a
percentage less than min_percent are removed. The remaining clusters are
grouped into a "Remaining" cluster.
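The crop rule can be sketched on a hypothetical frequency table. Pooling every row that fails any criterion (rank, min_count or min_percent) into "Remaining" is our reading of the description above, and `crop_sketch()` is a hypothetical helper, not the package's method.

```r
# Hypothetical cluster frequency table
freq <- data.frame(
  Cluster = c("I", "II", "III", "IV", "V"),
  Count   = c(40L, 25L, 20L, 10L, 5L)
)
freq$Percent <- 100 * freq$Count / sum(freq$Count)

crop_sketch <- function(freq, top_n = 5L, min_count = 0, min_percent = 0) {
  ordered <- freq[order(freq$Count, decreasing = TRUE), ]
  # Keep rows within the top_n ranks that also pass both thresholds
  keep <- seq_len(nrow(ordered)) <= top_n &
    ordered$Count >= min_count &
    ordered$Percent >= min_percent
  rest <- ordered[!keep, ]
  if (nrow(rest) == 0) return(ordered[keep, ])
  # Pool everything else into a single "Remaining" row
  rbind(
    ordered[keep, ],
    data.frame(Cluster = "Remaining",
               Count = sum(rest$Count),
               Percent = sum(rest$Percent))
  )
}

crop_sketch(freq, top_n = 3)
```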
medication_frequency summary crop: Extracts the top top_n medications by count. If top_n is Inf, all
medications are kept. If min_count is greater than 0, medications with a
count less than min_count are removed. If min_percent is greater than 0,
medications with a percentage less than min_percent are removed. The
remaining medications are grouped into a "Remaining" cluster.
The scope argument determines the scope of the crop. If scope is
"cluster", the crop is based on the percentage of medication in the cluster.
If scope is "global", the crop is based on the percentage of all
medication.
comedication_count summary crop: TO DO
timing_trajectory summary crop
Samples sample_n_individual individual timing trajectories. If
sample_n_individual is Inf, all individual timing trajectories are kept.
If weighted_sample is TRUE, the individual timing trajectories are
sampled weighted by the number of medications in the individual timing
trajectory.
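A minimal base-R sketch of the sampling step follows. It assumes that the weights are proportional to each individual's number of medications and that requesting at least as many individuals as exist keeps everyone; sample_individuals_sketch() is a hypothetical helper, not part of tame.

```r
# Hypothetical sketch, not the tame implementation: sample individual
# trajectories, optionally weighting by each individual's medication count.
sample_individuals_sketch <- function(ids, n_medications, sample_n_individual,
                                      weighted_sample = FALSE) {
  # FALSE: no individual trajectories are sampled
  if (isFALSE(sample_n_individual)) return(ids[0])
  # Inf (or more than available): all individual trajectories are kept
  if (is.infinite(sample_n_individual) ||
      sample_n_individual >= length(ids)) return(ids)
  # Sampling probabilities proportional to the medication counts, if weighted
  prob <- if (weighted_sample) n_medications else NULL
  # sample.int() avoids the sample(x) gotcha when x has length one
  ids[sample.int(length(ids), size = sample_n_individual, prob = prob)]
}

set.seed(1)
ids <- c("p01", "p02", "p03", "p04", "p05", "p06")
n_med <- c(1, 1, 1, 2, 5, 10)
sample_individuals_sketch(ids, n_med, sample_n_individual = 3,
                          weighted_sample = TRUE)
```

With weighting, individuals with many medications (here p05 and p06) are more likely to appear in the sampled trajectories.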
timing_atc_group summary crop
Samples sample_n_individual individual timing trajectories. If
sample_n_individual is Inf, all individual timing trajectories are kept.
If weighted_sample is TRUE, the individual timing trajectories are
sampled weighted by the number of medications in the individual timing
trajectory.
summary.medic summary crop
Crops multiple summaries. The which argument is a character vector
specifying which summaries to crop. The options are "cluster_frequency",
"medication_frequency", "comedication_count", "timing_trajectory", and
"timing_atc_group". If which is "all", all summaries are cropped.
The ... argument is passed to the specific methods, e.g. top_n and
min_count are passed to cluster_frequency and medication_frequency.
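The which and ... behaviour can be sketched in base R as a loop over the selected elements of a named list, forwarding ... to a cropping function. Here the summaries are a plain named list of count vectors, and crop_selected_sketch() and keep_top() are hypothetical helpers; the real summary.medic object and its crop methods may be structured differently.

```r
# Hypothetical sketch, not the tame implementation: crop only the summaries
# named in `which`, forwarding the remaining arguments through `...`.
crop_selected_sketch <- function(summaries, which = "all", crop_fun, ...) {
  if (identical(which, "all")) which <- names(summaries)
  unknown <- setdiff(which, names(summaries))
  if (length(unknown) > 0) {
    stop("Unknown summaries: ", paste(unknown, collapse = ", "))
  }
  for (nm in which) {
    summaries[[nm]] <- crop_fun(summaries[[nm]], ...)
  }
  summaries
}

# Toy summaries: each is just a vector of counts here
summaries <- list(
  cluster_frequency    = c(40, 25, 20, 10, 5),
  medication_frequency = c(12, 9, 7, 3)
)
# Toy crop function: keep the top_n largest counts
keep_top <- function(x, top_n = Inf) head(sort(x, decreasing = TRUE), top_n)

# Only cluster_frequency is cropped; medication_frequency is left untouched
cropped <- crop_selected_sketch(summaries, which = "cluster_frequency",
                                crop_fun = keep_top, top_n = 3)
```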
summary, cluster_frequency,
medication_frequency, comedication_count,
timing_trajectory, timing_atc_group
clust <- medic(
  complications,
  id = id,
  atc = atc,
  k = 3:5,
  timing = first_trimester:third_trimester
)

# Crop the cluster frequency summary
clust |> cluster_frequency() |> summary_crop(top_n = 3)
clust |> summary() |> summary_crop(which = "cluster_frequency", top_n = 3)

# Crop the medication frequency summary
clust |> medication_frequency() |> summary_crop(top_n = 3)
clust |> summary() |> summary_crop(which = "medication_frequency", top_n = 3)

# Crop the co-medication count summary
clust |> comedication_count() |> summary_crop(min_count = 10)
clust |> summary() |> summary_crop(which = "comedication_count", min_count = 10)

# Crop the timing trajectory summary
clust |> timing_trajectory() |> summary_crop()
clust |> summary() |> summary_crop(which = "timing_trajectory")

# Crop the timing ATC group summary
clust |> timing_atc_group() |> summary_crop()
clust |> summary() |> summary_crop(which = "timing_atc_group")

# Crop multiple summaries
clust |>
  summary() |>
  summary_crop(
    which = c("cluster_frequency", "medication_frequency"),
    top_n = 3
  )
Make cluster characterizing summaries.
## S3 method for class 'medic'
summary(
  object,
  only = NULL,
  clusters = NULL,
  outputs = "all",
  additional_data = NULL,
  ...
)
object |
An object for which a summary is desired. |
only |
< The default |
clusters |
< The default |
outputs |
A character vector naming the desired characteristics to output. The default names all possible output types. |
additional_data |
A data frame with additional data that may be
(left-)joined onto the |
... |
Additional arguments passed to the specific summary sub-function. |
A list of clustering characteristics of class summary.medic is returned. It
can contain any of the following characteristics:
cluster_frequency: The number of individuals assigned to each cluster and the associated frequency of assignment.
medication_frequency: The number of individuals with a specific ATC code within a cluster, together with the percentage of people with this medication who are assigned to this cluster and the percentage of people within the cluster who have this medication.
comedication_count: The number of ATC codes each individual has, summarized as the number of individuals within a cluster who have that many ATC codes. Various relevant percentages are also calculated; see comedication_count() for details on these percentages.
timing_trajectory: The number of unique timing trajectories in each cluster and the average timing trajectory of each cluster.
timing_atc_group: The number of people with each unique combination of timing trajectory and ATC group, as given by
atc_groups, in each cluster.
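The counting step behind the co-medication count can be sketched in base R: count the distinct ATC codes per individual, then tabulate those counts within each cluster. The toy tables meds and clusters (with columns id, atc and cluster) are hypothetical stand-ins for the medication data and the cluster assignments; this is not the tame implementation, and the percentages are omitted.

```r
# Hypothetical sketch, not the tame implementation.
meds <- data.frame(
  id  = c(1, 1, 1, 2, 2, 3, 4, 4),
  atc = c("A10BA02", "N02BE01", "N02BE01", "A10BA02", "C09AA05",
          "N06AB06", "A10BA02", "N02BE01"),
  stringsAsFactors = FALSE
)
clusters <- data.frame(id = 1:4, cluster = c("I", "I", "II", "II"),
                       stringsAsFactors = FALSE)

# Number of unique ATC codes per individual (a repeated code counts once)
n_atc <- tapply(meds$atc, meds$id, function(x) length(unique(x)))
per_person <- data.frame(id = as.integer(names(n_atc)),
                         n_atc = as.vector(n_atc))
per_person <- merge(clusters, per_person, by = "id")

# How many individuals in each cluster have 1, 2, ... distinct ATC codes
table(cluster = per_person$cluster, n_atc = per_person$n_atc)
```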
clust <- medic(
  complications,
  id = id,
  atc = atc,
  k = 3:5,
  timing = first_trimester:third_trimester
)
summary(clust)
The function timing_atc_group() calculates the frequencies of distinct
timing and ATC combinations within clusters.
timing_atc_group(
  object,
  only = NULL,
  clusters = NULL,
  atc_groups = default_atc_groups,
  additional_data = NULL,
  ...
)
object |
An object for which a summary is desired. |
only |
< The default |
clusters |
< The default |
atc_groups |
A data.frame specifying the ATC groups to summarize by, or a function that returns such a data.frame. The data.frame must have two columns:
By default, the anatomical level (first level) of the ATC codes is used. |
additional_data |
A data frame with additional data that may be
(left-)joined onto the |
... |
Additional arguments passed to the specific summary sub-function. |
timing_atc_group() calculates the number of people with each unique combination of timing
trajectory and ATC group, as given by atc_groups, in each cluster.
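The per-group quantities listed under Value can be sketched in base R. This assumes binary timing columns, here named t1 to t3 as hypothetical stand-ins for timing variables such as first_trimester:third_trimester, and groups ATC codes by their first character, which is the anatomical (first) level. It is an illustration, not the tame implementation.

```r
# Hypothetical sketch, not the tame implementation: one row per medication.
meds <- data.frame(
  cluster = c("I", "I", "I", "II", "II"),
  atc     = c("A10BA02", "A10BD07", "N02BE01", "N02BE01", "N06AB06"),
  t1 = c(1, 1, 0, 1, 1),
  t2 = c(1, 1, 1, 0, 1),
  t3 = c(0, 1, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Anatomical (first) ATC level is the first letter of the code
meds$atc_group <- substr(meds$atc, 1, 1)
meds$trajectory <- paste(meds$t1, meds$t2, meds$t3, sep = "-")

# Average timing value per cluster and ATC group
by_vars <- meds[c("cluster", "atc_group")]
avg <- aggregate(meds[c("t1", "t2", "t3")], by = by_vars, FUN = mean)

# Number of medications and distinct timing trajectories per group
# (same grouping, so the rows line up with avg)
avg$n_medications <- aggregate(meds["atc"], by = by_vars, FUN = length)$atc
avg$n_trajectories <- aggregate(meds["trajectory"], by = by_vars,
                                FUN = function(x) length(unique(x)))$trajectory

# Percentage of the cluster's medications that fall in each ATC group
avg$pct <- 100 * avg$n_medications / ave(avg$n_medications, avg$cluster, FUN = sum)
avg
```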
timing_atc_group() returns a list of class
timing_atc_group with two data frames:
Clustering the name of the clustering.
Cluster the name of the cluster.
ATC Groups the name of the ATC group. The groups are given by the
atc_groups input.
timing variables the average timing value in the ATC group and cluster.
Number of Medications the number of medications in the ATC group in
the cluster.
Percentage of Medications the percentage of the medications in the cluster
that belong to this ATC group.
Number of Distinct Timing Trajectories the number of unique timing
trajectories in the ATC group in the cluster.
Clustering the name of the clustering.
Cluster the name of the cluster.
timing variables a unique timing pattern in the ATC group and cluster.
Number of Medications with Timing Trajectory the number of medications
with this unique timing trajectory and ATC group.
clust <- medic(
  complications,
  id = id,
  atc = atc,
  k = 3:5,
  timing = first_trimester:third_trimester
)
timing_atc_group(clust, k == 5, clusters = I:III)
timing_trajectory() calculates the average timing paths within clusters.
timing_trajectory(
  object,
  only = NULL,
  clusters = NULL,
  additional_data = NULL,
  ...
)
object |
An object for which a summary is desired. |
only |
< The default |
clusters |
< The default |
additional_data |
A data frame with additional data that may be
(left-)joined onto the |
... |
Additional arguments passed to the specific summary sub-function. |
timing_trajectory() calculates both the number of unique timing
trajectories in each cluster and the average timing trajectory of each
cluster.
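Both summaries can be sketched in base R, assuming one row per individual with binary timing columns; t1 to t3 are hypothetical stand-ins for the timing variables, and this is not the tame implementation.

```r
# Hypothetical sketch, not the tame implementation: one row per individual.
people <- data.frame(
  cluster = c("I", "I", "I", "II", "II"),
  t1 = c(1, 1, 0, 0, 0),
  t2 = c(1, 1, 1, 1, 0),
  t3 = c(0, 0, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Average timing trajectory and number of people per cluster
avg <- aggregate(people[c("t1", "t2", "t3")], by = people["cluster"],
                 FUN = mean)
avg$Count <- as.vector(table(people$cluster)[avg$cluster])

# Unique timing patterns per cluster and how many people follow each
patterns <- aggregate(data.frame(Count = rep(1, nrow(people))),
                      by = people[c("cluster", "t1", "t2", "t3")], FUN = sum)
avg
patterns
```

Averaging binary timing indicators gives, for each timing period, the share of individuals in the cluster who are exposed in that period.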
timing_trajectory() returns a list of class timing_trajectory with two
data frames:
Clustering the name of the clustering.
Cluster the cluster name.
timing variables the average timing value in the cluster.
Count the number of people in the cluster.
Clustering the name of the clustering.
Cluster the cluster name.
timing variables a unique timing pattern in the cluster.
Count the number of people with this unique timing pattern.
clust <- medic(
  complications,
  id = id,
  atc = atc,
  k = 3:5,
  timing = first_trimester:third_trimester
)
timing_trajectory(clust, k == 5, clusters = I:III)