autocodebook

Automatic Codebook and Tracking for 'Spark' and 'dplyr' Pipelines

v0.1.0 · Jun 8, 2026 · MIT + file LICENSE

Description

Wraps 'dplyr' verbs (mutate, summarise, filter) to automatically capture variable metadata (type, source columns, categories, and source code), producing a codebook and eligibility tracking table with zero manual documentation. Works with both 'sparklyr' (tbl_spark) and local data frames. Adds big-data optimizations (caching, assume-unique counting, checkpointing) and a standardized report module with an eligibility flowchart, editable codebook export (HTML, DOCX, XLSX), and cross-sectional or longitudinal variable inspection. The eligibility flowchart follows the CONSORT statement (Schulz, Altman and Moher (2010) <doi:10.1136/bmj.c332>) and the reporting of observational cohort studies follows the STROBE recommendations (von Elm and others (2007) <doi:10.1371/journal.pmed.0040296>).

CRAN Check Status

10 OK
Flavor Status
r-devel-linux-x86_64-debian-gcc OK
r-devel-linux-x86_64-fedora-clang OK
r-devel-linux-x86_64-fedora-gcc OK
r-devel-windows-x86_64 OK
r-oldrel-macos-arm64 OK
r-oldrel-macos-x86_64 OK
r-oldrel-windows-x86_64 OK
r-release-macos-arm64 OK
r-release-macos-x86_64 OK
r-release-windows-x86_64 OK

Check History

OK (7 OK · 0 NOTE · 0 WARNING · 0 ERROR · 0 FAILURE) · Jun 9, 2026

Dependency Network

Packages shown in the graph: dplyr, rlang, tibble, gt, autocodebook (legend: dependencies, reverse dependencies)

Version History

new 0.1.0 Jun 8, 2026