
tabularTools -- A biomedical ML-based data analysis pipeline


amcim/tabularTools



Tabular Tools

A config-driven R package for reproducible analysis of tabular biomedical data.

Motivation

tabularTools demonstrates a clean, testable software engineering approach to tabular data analysis in biomedical research, with a focus on:

  • explicit configuration with YAML
  • reproducible preprocessing using dplyr/tidyverse
  • extensible modeling APIs
  • parallel-safe execution
  • unit-tested components
The pipeline is built from the following functions, typically called in this order:

# Read the YAML config file
read_config()

# Read the data file specified in the config
read_data()

# Validate that the data is well-formed tabular data
validate_data()

# Preprocess according to the config definitions
preprocess_data()

# Fit the models defined in the config
fit_models()

# Evaluate model results
evaluate_results()

# Create visualizations from the evaluations
visualize_results()
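As an illustration of the kind of check a validation step like `validate_data()` might perform, here is a minimal base-R sketch. It assumes the parsed config is a nested list mirroring the YAML layout (`cfg$analysis$outcome`, `cfg$analysis$predictors`); the function name `sketch_validate` is hypothetical, and this is not the package's actual implementation.

```r
# Hypothetical sketch, NOT the tabularTools implementation.
# Assumes `cfg` mirrors the YAML layout, e.g.
# list(analysis = list(outcome = "num", predictors = list("age", "sex")))
sketch_validate <- function(data, cfg) {
  # The pipeline expects a rectangular data frame (tibbles also pass)
  if (!is.data.frame(data)) {
    stop("data must be a data.frame")
  }
  if (nrow(data) == 0L) {
    stop("data has no rows")
  }
  # Every column named in the config must exist in the data;
  # predictors may arrive as a list or a character vector, so unlist()
  required <- c(cfg$analysis$outcome, unlist(cfg$analysis$predictors))
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0L) {
    stop("columns missing from data: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}
```

Failing early with a message that names the missing columns is usually cheaper to debug than a downstream modeling error.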

Example Usage

library(tabularTools)
library(future)

# Enable parallel execution across 4 background R sessions
plan(multisession, workers = 4)

cfg   <- read_config("example/config.yaml")
data  <- read_data(cfg)

validate_data(data, cfg)

pdata <- preprocess_data(cfg, data)

models <- fit_models(pdata, cfg)

# Inspect the fitted logistic model for the 0 vs 1 contrast
summary(models$logistic$`0_vs_1`$model)
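Fitted models are indexed by contrast names such as `0_vs_1`, which correspond to the `contrasts` pairs in the configuration. One plausible way to implement a contrast, assumed here for illustration only, is to keep the rows whose outcome is one of the two levels, recode the outcome to 0/1, and fit a binomial `glm()`. The helper `fit_contrasts` is hypothetical; it mirrors only the `$`0_vs_1`$model` access pattern used above, not the package's internals.

```r
# Hypothetical sketch of per-contrast logistic fits; not tabularTools code.
fit_contrasts <- function(data, outcome, predictors, contrasts) {
  fits <- list()
  for (pair in contrasts) {
    ref <- pair[1]    # reference level, e.g. 0
    other <- pair[2]  # comparison level, e.g. 1
    # Keep only the rows belonging to the two levels being compared
    sub <- data[data[[outcome]] %in% c(ref, other), , drop = FALSE]
    # Binary response: 1 for the comparison level, 0 for the reference
    sub$contrast_y <- as.integer(sub[[outcome]] == other)
    f <- stats::reformulate(predictors, response = "contrast_y")
    # Name each fit like "0_vs_1", matching the access pattern above
    name <- paste0(ref, "_vs_", other)
    fits[[name]] <- list(model = stats::glm(f, data = sub, family = stats::binomial()))
  }
  fits
}
```

Fitting one binary model per pair keeps each comparison interpretable against the same reference level, at the cost of not using rows from the other outcome levels.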

User Configuration

Each analysis is driven by a YAML configuration file, for example:

data:
  file: "heart_disease_uci.csv"
  
analysis:
  outcome: num
  predictors:
    - age
    - sex
    - chol
    - cp
    - trestbps
    - fbs
    - restecg
    - thalch
    - exang
    - oldpeak
    - slope
    - ca
    - thal
  models:
    - logistic
    - svm
  contrasts:
    - [0, 1]
    - [0, 2]
    - [0, 3]
    - [0, 4]

preprocessing:
  scale_numeric: true
  impute_missing: median #options are "drop", "median", or "none"

visualization:
  roc_curve: true
  coefficient_plot: true

report:
  title: "Heart Disease Analysis"
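The `preprocessing` options can be illustrated with a small base-R sketch. `sketch_preprocess` is a hypothetical helper, not part of tabularTools: it applies one of the three `impute_missing` strategies (`drop` removes rows with a missing value in any column, `median` fills numeric NAs with the column median, `none` leaves them as-is) and, when `scale_numeric` is true, centers and scales numeric columns. The `exclude` argument is an assumption added so that columns such as the outcome are not imputed or scaled.

```r
# Hypothetical sketch of the `preprocessing` options; not tabularTools code.
sketch_preprocess <- function(data,
                              impute_missing = c("median", "drop", "none"),
                              scale_numeric = TRUE,
                              exclude = character(0)) {
  impute_missing <- match.arg(impute_missing)
  # Numeric columns that may be transformed (excluded columns are skipped)
  num_cols <- names(data)[vapply(data, is.numeric, logical(1))]
  num_cols <- setdiff(num_cols, exclude)

  if (impute_missing == "drop") {
    # Drop every row with a missing value in any column
    data <- data[stats::complete.cases(data), , drop = FALSE]
  } else if (impute_missing == "median") {
    # Replace NAs in each numeric column with that column's median
    for (col in num_cols) {
      x <- data[[col]]
      x[is.na(x)] <- stats::median(x, na.rm = TRUE)
      data[[col]] <- x
    }
  }
  # "none": missing values are left untouched

  if (scale_numeric) {
    # Center to mean 0 and scale to unit standard deviation
    for (col in num_cols) {
      data[[col]] <- as.numeric(scale(data[[col]]))
    }
  }
  data
}
```

Excluding the outcome matters for the example config because `num` is numeric there, and scaling it would break the `[0, k]` contrasts.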

Repository structure

  • R/ - source code
  • tests/ - testthat unit tests
  • vignettes/ - sample Quarto (.qmd) documents

Status

This package is under active development and is intended to show how R's software development tooling can simplify the management of complex tabular data.
