Title: | A Novel Multi-Disease Automated Phenotyping Method for the Electronic Health Record |
---|---|
Description: | A Novel Multi-Disease Automated Phenotyping Method for the Electronic Health Record. |
Authors: | Yuri Ahuja [aut, cre], Tianxi Cai [aut], PARSE LTD [aut] |
Maintainer: | Yuri Ahuja |
License: | GPL-3 |
Version: | 0.1.1 |
Built: | 2025-01-22 02:40:54 UTC |
Source: | https://github.com/celehs/surelda |
Surrogate-guided ensemble Latent Dirichlet Allocation (sureLDA) is a label-free, multidimensional phenotyping method. It first uses the PheNorm algorithm to initialize phenotype probabilities from two surrogate features for each target disease (the main ICD and NLP counts), and then uses these probabilities to guide an LDA topic model toward phenotype-specific topics. Finally, it combines the phenotype-feature counts with the surrogates via a clustering ensemble to yield the final phenotype probabilities.
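To make the first step more concrete, here is a toy R sketch of turning two surrogate counts into per-patient prior probabilities. It is a deliberately simplified stand-in and not the PheNorm algorithm: the function name `toy_surrogate_prior`, the log transform, the adjustment for healthcare utilization by linear regression, and the logistic squashing are all illustrative assumptions. Only the input shapes follow the `sureLDA()` documentation.

```r
# Toy stand-in for a surrogate-based prior (NOT PheNorm itself; hypothetical).
# Input shapes follow the sureLDA() argument documentation:
#   ICD, NLP: nPatients x nPhenotypes count matrices
#   HU:       length-nPatients healthcare utilization vector
#   filter:   nPatients x nPhenotypes 0/1 matrix of filter-positives
toy_surrogate_prior <- function(ICD, NLP, HU, filter) {
  stopifnot(identical(dim(ICD), dim(NLP)), identical(dim(ICD), dim(filter)),
            length(HU) == nrow(ICD))
  logHU <- log1p(HU)
  prior <- matrix(0, nrow(ICD), ncol(ICD), dimnames = dimnames(ICD))
  for (k in seq_len(ncol(ICD))) {
    # Combine the two surrogates for phenotype k on the log scale
    s <- log1p(ICD[, k] + NLP[, k])
    # Remove the part of the signal explained by overall utilization
    r <- residuals(lm(s ~ logHU))
    # Standardize, guarding against zero variance, then squash to (0, 1)
    z <- if (sd(r) > 0) (r - mean(r)) / sd(r) else r * 0
    prior[, k] <- plogis(z)
  }
  prior * filter  # filter-negative patients get probability 0
}

# Usage on small simulated counts
set.seed(1)
n <- 50; K <- 2
HU <- rpois(n, 20)
ICD <- matrix(rpois(n * K, 2), n, K)
NLP <- matrix(rpois(n * K, 3), n, K)
filter <- (ICD > 0) * 1  # assumption: at least one ICD code => filter-positive
p <- toy_surrogate_prior(ICD, NLP, HU, filter)
```

Zeroing filter-negatives mirrors the role of the `filter` argument; the actual PheNorm step is more involved than this regression-plus-logistic shortcut.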
simdata
Format: An object of class list of length 6.
Example: str(simdata)
Surrogate-guided ensemble Latent Dirichlet Allocation
sureLDA( X, ICD, NLP, HU, filter, prior = "PheNorm", weight = "beta", nEmpty = 20, alpha = 100, beta = 100, burnin = 50, ITER = 150, phi = NULL, nCores = 1, labeled = NULL, verbose = FALSE )
Argument | Description |
---|---|
X | nPatients x nFeatures matrix of EHR feature counts |
ICD | nPatients x nPhenotypes matrix of main ICD surrogate counts |
NLP | nPatients x nPhenotypes matrix of main NLP surrogate counts |
HU | nPatients-dimensional vector containing the healthcare utilization feature |
filter | nPatients x nPhenotypes binary matrix indicating filter-positives |
prior | 'PheNorm', 'MAP', or an nPatients x nPhenotypes matrix of prior probabilities (defaults to 'PheNorm') |
weight | 'beta', 'uniform', or an nPhenotypes x nFeatures matrix of feature weights (defaults to 'beta') |
nEmpty | Number of 'empty' topics to include in the LDA step (defaults to 20) |
alpha | LDA Dirichlet hyperparameter for the patient-topic distribution (defaults to 100) |
beta | LDA Dirichlet hyperparameter for the topic-feature distribution (defaults to 100) |
burnin | Number of burn-in Gibbs iterations (defaults to 50) |
ITER | Number of subsequent Gibbs iterations used for inference (defaults to 150) |
phi | (optional) nPhenotypes x nFeatures pre-trained topic-feature distribution matrix (defaults to NULL) |
nCores | (optional) Number of parallel cores; used only if phi is provided (defaults to 1) |
labeled | (optional) nPatients x nPhenotypes matrix of a priori labels, with missing entries set to NA (defaults to NULL) |
verbose | (optional) Logical indicating whether to print verbose progress updates (defaults to FALSE) |
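Before fitting, it can help to check that the inputs have the shapes listed in the argument table. The R sketch below builds small simulated inputs and validates them with `check_surelda_inputs`, a hypothetical helper that is not part of the sureLDA package. Using total feature counts as `HU` and `ICD > 0` as the filter are illustrative assumptions; the `sureLDA()` call is left as a comment because it requires the package to be installed.

```r
# Hypothetical helper (not in the sureLDA package): checks that inputs
# match the shapes documented for sureLDA() before fitting.
check_surelda_inputs <- function(X, ICD, NLP, HU, filter) {
  n <- nrow(X)
  stopifnot(
    is.matrix(X), is.matrix(ICD), is.matrix(NLP), is.matrix(filter),
    nrow(ICD) == n, nrow(NLP) == n, nrow(filter) == n, length(HU) == n,
    ncol(NLP) == ncol(ICD), ncol(filter) == ncol(ICD),
    all(filter %in% c(0, 1))  # filter must be binary
  )
  invisible(list(nPatients = n, nFeatures = ncol(X), nPhenotypes = ncol(ICD)))
}

# Small simulated inputs with the documented shapes
set.seed(42)
nPatients <- 100; nFeatures <- 30; nPhenotypes <- 3
X   <- matrix(rpois(nPatients * nFeatures, 1), nPatients, nFeatures)
ICD <- matrix(rpois(nPatients * nPhenotypes, 1), nPatients, nPhenotypes)
NLP <- matrix(rpois(nPatients * nPhenotypes, 1), nPatients, nPhenotypes)
HU  <- rowSums(X)        # assumption: total feature count as a utilization proxy
filter <- (ICD > 0) * 1  # assumption: at least one ICD code => filter-positive

# labeled: a priori labels, NA wherever no label is known
labeled <- matrix(NA_real_, nPatients, nPhenotypes)
labeled[1:5, 1] <- c(1, 0, 1, 1, 0)

dims <- check_surelda_inputs(X, ICD, NLP, HU, filter)

# With the package installed, a fit might look like (not run here):
# fit <- sureLDA(X, ICD, NLP, HU, filter, labeled = labeled, nEmpty = 20)
```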
Value | Description |
---|---|
scores | nPatients x nPhenotypes matrix of weighted patient-phenotype assignment counts from the LDA step |
probs | nPatients x nPhenotypes matrix of patient-phenotype posterior probabilities |
ensemble | Mean of the sureLDA posterior and the PheNorm/MAP prior |
prior | nPatients x nPhenotypes matrix of PheNorm/MAP phenotype probability estimates |
phi | nPhenotypes x nFeatures topic distribution matrix from the LDA step |
weights | nPhenotypes x nFeatures matrix of topic-feature weights |
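The `ensemble` component is documented as the mean of the sureLDA posterior and the PheNorm/MAP prior. The short R sketch below reproduces that relationship and then makes binary calls with a 0.5 threshold, which is an illustrative assumption rather than a package default. Small hand-written matrices stand in for `fit$probs` and `fit$prior`, assuming the fitted object is a list exposing those components.

```r
# Hand-written stand-ins for fit$probs and fit$prior (illustration only)
probs <- matrix(c(0.9, 0.2, 0.6,
                  0.1, 0.7, 0.4), nrow = 3,
                dimnames = list(paste0("patient", 1:3), c("pheA", "pheB")))
prior <- matrix(c(0.7, 0.4, 0.2,
                  0.3, 0.9, 0.8), nrow = 3, dimnames = dimnames(probs))

# Documented relationship: ensemble = mean of posterior and prior
ensemble <- (probs + prior) / 2

# Assumption: threshold at 0.5 for binary phenotype calls
calls <- ensemble >= 0.5
colSums(calls)  # number of patients called positive per phenotype
```

In practice the threshold would be chosen per phenotype, for example against a small labeled validation set.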