Package 'PheNorm' reference manual

Title:	Unsupervised Gold-Standard Label Free Phenotyping Algorithm for EHR Data
Description:	The algorithm combines the most predictive variable, such as the count of the main International Classification of Diseases (ICD) codes, with other Electronic Health Record (EHR) features (e.g. healthcare utilization and processed clinical note data) to obtain a score for accurate risk prediction and disease classification. In particular, it normalizes the surrogate to resemble a Gaussian mixture and leverages the remaining features through random corruption denoising. Background and details about the method can be found in Yu et al. (2018) <doi:10.1093/jamia/ocx111>.
Authors:	Sheng Yu [aut], Victor Castro [aut], Clara-Lea Bonzel [aut, cre], Molei Liu [aut], Chuan Hong [aut], Tianxi Cai [aut], PARSE LTD [aut]
Maintainer:	Clara-Lea Bonzel <[email protected]>
License:	GPL-3
Version:	0.1.1
Built:	2025-02-28 04:16:14 UTC
Source:	https://github.com/celehs/phenorm

Fit the phenotyping algorithm PheNorm using EHR features

Description

The function requires as input:

* a surrogate, such as the ICD code
* the healthcare utilization

It can leverage other EHR features (optional) to assist risk prediction.
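The inputs listed above could be assembled along the following lines. This is a minimal sketch, not taken from the package: the counts are made up, `feat1` is a hypothetical optional feature, and the column names `ICD` and `utl` simply mirror those used in the Examples section. The `log1p()` step reflects the requirement stated under Arguments that all columns of `dat` be log-transformed.

```r
# Hypothetical raw counts for five patients (made-up values for illustration).
raw <- data.frame(
  ICD   = c(0, 3, 12, 1, 7),    # count of the main ICD code (the surrogate)
  utl   = c(4, 20, 35, 9, 15),  # healthcare utilization, e.g. note count
  feat1 = c(0, 1, 5, 0, 2)      # hypothetical optional EHR feature
)

# Apply log(x + 1) to every column; log1p() maps zero counts to zero.
dat <- as.data.frame(lapply(raw, log1p))

# Inspect the result: column names are preserved.
str(dat)
```

A data set prepared this way would presumably be passed as `dat`, with `"ICD"` as `nm.logS.ori`, `"utl"` as `nm.utl` and `"feat1"` as `nm.X`.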

Usage

PheNorm.Prob(
  nm.logS.ori,
  nm.utl,
  dat,
  nm.X = NULL,
  corrupt.rate = 0.3,
  train.size = 10 * nrow(dat)
)

Arguments

`nm.logS.ori`	names of the surrogate columns in `dat` (log(ICD+1), log(NLP+1) and log(ICD+NLP+1))
`nm.utl`	name of the healthcare utilization column (e.g. note count, encounter_num, etc.)
`dat`	the data; all columns must be log-transformed and must have column names
`nm.X`	names of additional features other than the main ICD and NLP (optional, default NULL)
`corrupt.rate`	rate for random corruption denoising, between 0 and 1; default 0.3
`train.size`	size of the training sample; default 10 * nrow(dat)
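To give an intuition for what `corrupt.rate` controls, the sketch below shows one common form of random corruption: each entry of a feature matrix is masked independently with probability `corrupt.rate`. This is an illustration only, not the package's internal code. The matrix `X`, the feature names `f1` to `f3`, and the choice to replace masked entries with the column mean are all assumptions made for this example.

```r
set.seed(1)

# Hypothetical log-transformed feature matrix: 6 patients, 3 features.
X <- matrix(rnorm(18, mean = 2), nrow = 6,
            dimnames = list(NULL, c("f1", "f2", "f3")))

corrupt.rate <- 0.3  # same value as the default shown under Usage

# Independent Bernoulli mask: TRUE marks an entry to be corrupted.
mask <- matrix(runif(length(X)) < corrupt.rate, nrow = nrow(X))

# Assumption: a corrupted entry is replaced by the mean of its column.
col.means <- colMeans(X)
X.corrupt <- X
X.corrupt[mask] <- col.means[col(X)[mask]]

# Fraction of entries actually corrupted in this draw.
mean(mask)
```

A higher `corrupt.rate` masks more entries, so a model trained on the corrupted copy has to rely on the remaining features to recover the signal.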

Value

A list containing the predicted probabilities and the beta coefficients.
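Since the package description mentions disease classification, the returned probabilities could be turned into case labels by thresholding. The sketch below is a hedged illustration: the `probs` vector is made up to stand in for the `$probs` element used in the Examples section, and the cutoff of 0.5 is an arbitrary assumption, not a package default.

```r
# Hypothetical probabilities standing in for fit.phenorm$probs.
probs <- c(0.02, 0.91, 0.47, 0.66, 0.10)

# Assumption: 0.5 is an arbitrary illustrative cutoff, not a package default.
cutoff <- 0.5
predicted.case <- probs >= cutoff

# Tabulate predicted cases versus non-cases.
table(predicted.case)
```

In practice the cutoff would be chosen for the study at hand, for example to reach a target specificity.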

Examples

## Not run: 
set.seed(1234)
fit.dat <- read.csv("https://raw.githubusercontent.com/celehs/PheNorm/master/data-raw/data.csv")
fit.phenorm <- PheNorm.Prob("ICD", "utl", fit.dat, nm.X = NULL,
                            corrupt.rate = 0.3, train.size = nrow(fit.dat))
head(fit.phenorm$probs)

## End(Not run)