ADNI-Mixed-Effect-Models.Rmd
library(HLSGUtils)
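The wgs_adno_partitioner function creates data partitions from the ADNI datasets. To run mixed-effects models on ADNI data, we use three data sources, each passed as a path argument:
- wgs_path: the whole-genome sequencing (WGS) data
- pca_path: the PCA eigenvectors
- adni_path: the merged ADNI clinical data
We also set the variables taken from each data source in the function arguments:
- clinical_variables: variables from the ADNI dataset
- pca_components: columns holding the PCA components
After setting the input data paths, we set the number of partitions and the directory where the partitions are saved.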
wgs_adno_partitioner(
partition_number = 300,
clinical_variables = c("MMSE", "GENDER","AGE", "MMSE.bl", "PTEDUCAT", "PTID", "VISID"),
pca_components = c("PC1", "PC2", "PC3"),
wgs_path = "/data/WGScompletedQC_phenoed.raw",
pca_path = "/data/ADNIprunedpostQCPCA.eigenvec",
adni_path = "/data/adnimerge.RData",
partitions_save_path = "/data/ADNI/data_partitions/")
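As a rough illustration of what partitioning can involve, the sketch below splits a set of column names into a fixed number of roughly equal groups using base R. It is not the implementation of wgs_adno_partitioner: the helper name split_into_partitions and the variant column names are assumptions made purely for illustration.

```r
# Hypothetical helper (not part of HLSGUtils): split column names into
# `n_partitions` groups of roughly equal size, preserving order.
split_into_partitions <- function(columns, n_partitions) {
  stopifnot(n_partitions >= 1, length(columns) >= n_partitions)
  # Map position i to a group index between 1 and n_partitions.
  group <- ceiling(seq_along(columns) * n_partitions / length(columns))
  unname(split(columns, group))
}

# Toy example: 10 made-up variant columns split into 3 partitions.
variant_columns <- paste0("variant_", 1:10)
partitions <- split_into_partitions(variant_columns, 3)
lengths(partitions)
```

Each partition could then be joined with the clinical variables and PCA components before being written to the save directory, so that every file is self-contained for modelling.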
We need an R script that runs on each core to execute LMER modelling in parallel. function_to_Rscript helps generate an R script from a function in a package or from a local source file. The script is designed to run independently on each core. To create the script we need to set:
- script_name: the name of the created script.
- function_name: the name of a function in a package or the path of a function’s source file.
- packages: the list of packages that are loaded in the script.
- arguments: the names of the function's input arguments.
- arguments_class: a vector of argument types (character, integer, numeric).
function_to_Rscript(
script_name = "/scripts/Parallel_Modeling.R",
function_from_package = "lmer_modeling",
packages = c("HLSGUtils"),
arguments = c("data_path", "simulation_name", "formula", "save_model_path"),
arguments_class = c("character", "character", "character", "character")
)
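To make the role of arguments and arguments_class more concrete, here is a minimal sketch of how a generated script might turn command-line strings into typed, named arguments. The helper parse_script_args, its coercion rules, and the partition file name are assumptions for illustration; the script that function_to_Rscript actually writes may look different.

```r
# Hypothetical sketch of the argument handling inside a generated script.
# Not the code produced by function_to_Rscript.
parse_script_args <- function(raw_args, arguments, arguments_class) {
  stopifnot(length(raw_args) == length(arguments),
            length(arguments) == length(arguments_class))
  # Coerce each raw string to its declared class.
  parsed <- Map(function(value, class_name) {
    switch(class_name,
      character = as.character(value),
      integer   = as.integer(value),
      numeric   = as.numeric(value),
      stop("Unsupported argument class: ", class_name))
  }, raw_args, arguments_class)
  setNames(parsed, arguments)
}

# In a real script, raw_args would come from commandArgs(trailingOnly = TRUE).
# The partition file name below is made up.
raw_args <- c("/data/ADNI/data_partitions/partition_1.rds", "full_model",
              "MMSE~AGE+(1|PTID)", "/data/models/")
parsed_args <- parse_script_args(
  raw_args,
  arguments = c("data_path", "simulation_name", "formula", "save_model_path"),
  arguments_class = c("character", "character", "character", "character")
)
str(parsed_args)
# A script could then finish with something like do.call(lmer_modeling, parsed_args).
```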
The lmer_modeling function is written to run an LMER model on each data partition. It needs:
- data_path: the path of the partition data,
- formula: the lmer formula, which contains the random-effect terms,
- simulation_name: used in the file name of the saved model,
- save_model_path: the directory where the model output is saved.
Finally, we use parallel_rscripts to run the scripts on parallel cores. The main arguments of the function are:
- rscript_path: path to the script that is run concurrently.
- args: the script’s input arguments.
- used_memory_treshold: upper bound on the memory usage percentage.
- used_cpu_treshold: upper bound on the CPU usage percentage.
library(HLSGUtils)
# `lmer_modeling` input arguments
partitions_files = list.files("/data/ADNI/data_partitions/", full.names = TRUE)
save_model_path = "/data/models/"
formula = paste0("'","MMSE~GENDER+AGE+MMSE.bl+PC1+PC2+PC3+PTEDUCAT+copy_number+(1|PTID)+(1|VISID)","'")
simulation_name = "full_model"
parallel_rscripts(
rscript_path = "/scripts/Parallel_Modeling.R",
args = list(data_path = partitions_files,
simulation_name = simulation_name,
formula = formula,
save_model_path = save_model_path),
used_memory_treshold = 80,
used_cpu_treshold = 80,
sleep_time = 10
)
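The call above can be pictured as one Rscript command per partition file, launched only while resource usage stays under the thresholds. The sketch below shows that idea with base R. The helpers build_rscript_commands and can_launch and the partition file names are hypothetical, and this is not how parallel_rscripts is necessarily implemented.

```r
# Hypothetical sketch: build one Rscript command line per partition file.
build_rscript_commands <- function(rscript_path, args) {
  # Recycle scalar arguments so every job gets a full argument set.
  n_jobs <- max(lengths(args))
  expanded <- lapply(args, rep_len, length.out = n_jobs)
  vapply(seq_len(n_jobs), function(i) {
    job_args <- vapply(expanded, function(a) a[[i]], character(1))
    paste("Rscript", rscript_path, paste(job_args, collapse = " "))
  }, character(1))
}

# Hypothetical gate: launch a new job only while usage is below both limits.
can_launch <- function(used_memory, used_cpu,
                       memory_threshold = 80, cpu_threshold = 80) {
  used_memory < memory_threshold && used_cpu < cpu_threshold
}

commands <- build_rscript_commands(
  "/scripts/Parallel_Modeling.R",
  list(data_path = c("/data/ADNI/data_partitions/partition_1.rds",  # made-up names
                     "/data/ADNI/data_partitions/partition_2.rds"),
       simulation_name = "full_model",
       formula = "'MMSE~AGE+(1|PTID)'",
       save_model_path = "/data/models/")
)
print(commands)
```

The formula is wrapped in single quotes because parentheses and the | character are shell metacharacters; without the quotes a shell would not pass the formula to the script as a single argument.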
To aggregate the model coefficients, use the aggregate_coefficients function.
aggregate_coefficients(
save_model_directory = "/data/models/",
model_names_pattern = "full_model",
save_model_path = "/data/aggregated_models/full_model.rds"
)
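Conceptually, aggregation means collecting the per-partition results whose file names match a pattern and stacking them into one table. The following sketch shows that idea on toy data. It assumes each model output is an .rds file holding a coefficient data frame, which may not match what lmer_modeling actually saves; the helper aggregate_rds_tables and all values are made up for illustration.

```r
# Hypothetical sketch: stack per-partition coefficient tables into one data frame.
aggregate_rds_tables <- function(directory, pattern) {
  files <- list.files(directory, pattern = pattern, full.names = TRUE)
  tables <- lapply(files, readRDS)
  do.call(rbind, tables)
}

# Toy demonstration with made-up coefficient values in a temporary directory.
demo_dir <- file.path(tempdir(), "demo_models")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  toy <- data.frame(partition = i,
                    term = c("AGE", "copy_number"),
                    estimate = c(0.1 * i, -0.2 * i))
  saveRDS(toy, file.path(demo_dir, sprintf("full_model_%d.rds", i)))
}

combined <- aggregate_rds_tables(demo_dir, "full_model")
nrow(combined)
```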