tabularTools demonstrates a clean, testable software engineering approach to tabular data analysis in biomedical research, with a focus on:
- explicit configuration with YAML
- reproducible preprocessing using dplyr/tidyverse
- extensible modeling APIs
- parallel-safe execution
- unit-tested components
```r
# Read the YAML config file
read_config()
# Read the data file specified in the config
read_data()
# Validate that the data is well-formed tabular data
validate_data()
# Preprocess based on config definitions
preprocess_data()
# Fit the defined models
fit_models()
# Evaluate model results
evaluate_results()
# Create visualizations from the evaluations
visualize_results()
```
```r
library(tabularTools)
library(future)

# Enable parallel execution
plan(multisession, workers = 4)

cfg <- read_config("example/config.yaml")
data <- read_data(cfg)
validate_data(data, cfg)
pdata <- preprocess_data(cfg, data)
models <- fit_models(pdata, cfg)

# Inspect a fitted model
summary(models$logistic$`0_vs_1`$model)
```

Analysis is driven by a YAML configuration file:
```yaml
data:
  file: "heart_disease_uci.csv"
analysis:
  outcome: num
  predictors:
    - age
    - sex
    - chol
    - cp
    - trestbps
    - fbs
    - restecg
    - thalch
    - exang
    - oldpeak
    - slope
    - ca
    - thal
  models:
    - logistic
    - svm
  contrasts:
    - [0, 1]
    - [0, 2]
    - [0, 3]
    - [0, 4]
preprocessing:
  scale_numeric: true
  impute_missing: median  # options: "drop", "median", or "none"
visualization:
  roc_curve: true
  coefficient_plot: true
report:
  title: "Heart Disease Analysis"
```
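The `preprocessing:` options map onto standard transformations. As a rough illustration of what `impute_missing: median` and `scale_numeric: true` plausibly mean (a base-R sketch, not tabularTools' actual implementation; the helper name `impute_median` is assumed):

```r
# Illustrative only: median-impute, then center/scale a numeric column
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

chol <- c(233, NA, 250, 204)
chol <- impute_median(chol)      # NA replaced by 233, the median of observed values
chol <- as.numeric(scale(chol))  # then z-scale to mean 0, sd 1
```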
- R/ - package source code
- tests/ - testthat unit tests
- vignettes/ - sample Quarto documents
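The tests/ directory uses testthat. A unit test for the scaling behaviour requested by `scale_numeric: true` might look like this (illustrative sketch, not a test from the package):

```r
library(testthat)

test_that("z-scaled numeric vector has mean 0 and sd 1", {
  x <- as.numeric(scale(c(1, 2, 3, 4)))
  expect_equal(mean(x), 0)
  expect_equal(sd(x), 1)
})
```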
This package is under active development and is intended to showcase R's software development tooling for simplifying the management of complex tabular data analyses.
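For intuition about the `contrasts:` entries in the config, each pair contrasts outcome level 0 against one other level. A minimal base-R sketch of one such logistic fit on simulated data (illustrative only; this is not tabularTools' internal code, and the column names mirror the example config):

```r
set.seed(42)
# Simulated stand-in for the heart-disease table: outcome `num` in 0..4
df <- data.frame(
  num  = sample(0:4, 200, replace = TRUE),
  age  = rnorm(200, 55, 9),
  chol = rnorm(200, 240, 40)
)

# Contrast [0, 1]: keep outcome levels 0 and 1, binarize, fit a binomial GLM
sub <- subset(df, num %in% c(0, 1))
fit <- glm(I(num == 1) ~ age + chol, data = sub, family = binomial)
coef(fit)  # intercept plus one coefficient per predictor
```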