-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathInspect_sqlite_Notebook.qmd
More file actions
128 lines (95 loc) · 3.53 KB
/
Inspect_sqlite_Notebook.qmd
File metadata and controls
128 lines (95 loc) · 3.53 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
---
title: "SQLite Databases (.sqlite, .db)"
author: "Daniel Manrique-Castano"
date: "2025-01-05"
format:
html:
toc: true
toc-location: left
code-fold: true
bibliography: references.bib
params:
target_dir: "data/Inspect_sqlite/"
---
## 1. Overview
**SQLite** is the most widely deployed database engine in the world (see [SQLite.org](https://sqlite.org/about.html)). Unlike client-server database management systems (DBMS) like PostgreSQL or Oracle, an SQLite database is a single, self-contained disk file.
::: {.callout-note title="Curation Goal"}
From a curation perspective, SQLite is a highly sustainable format. [The Library of Congress](https://www.loc.gov/preservation/digital/formats/fdd/fdd000461.shtml) lists it as a preferred format for datasets due to its open specification and long-term stability. Our goal is to "unbox" the database to ensure all tables are documented and accessible.
:::
::: {.callout-warning title="Preservation Risk"}
"Self-Contained" does not imply "Self-Documenting." A curator receiving a `.db` file often faces a **Black Box**. Without a schema map, relationships between tables (Foreign Keys) can be lost, making the data contextually unusable.
:::
------------------------------------------------------------------------
## Setup
We use `RSQLite` [@RSQLite] to interact with the database files directly from R.
### R Packages
```{r}
# install.packages(c("tidyverse", "RSQLite", "DBI", "rstudioapi", "DT"))
```
### Load Libraries
```{r}
#| label: setup
#| message: false
#| warning: false
library(tidyverse)
library(DBI)
library(RSQLite)
library(rstudioapi)
library(DT)
```
## Select Target Directory
```{r}
#| label: select-directory
if (interactive() && .Platform$OS.type == "windows") {
selected_dir <- rstudioapi::selectDirectory(caption = "Select SQLite Directory")
} else {
selected_dir <- NULL
}
target_dir <- if (!is.null(selected_dir)) selected_dir else params$target_dir
print(paste("Analyzing directory:", target_dir))
```
## Inventory and Schema Extraction
This process iterates through all SQLite files, connects to them, and extracts the structure of every table found.
```{r}
#| label: database-inspection
db_files <- list.files(target_dir, pattern = "\\.(sqlite|db|sqlite3)$", full.names = TRUE, recursive = TRUE)
inspect_sqlite <- function(file_path) {
con <- dbConnect(SQLite(), file_path)
tables <- dbListTables(con)
details <- map_dfr(tables, function(tbl) {
# Get column info
cols <- dbGetQuery(con, paste0("PRAGMA table_info(", tbl, ")"))
# Get row count
row_count <- dbGetQuery(con, paste0("SELECT COUNT(*) as count FROM ", tbl))$count
tibble(
file = basename(file_path),
table_name = tbl,
columns = paste(cols$name, collapse = ", "),
row_count = row_count
)
})
dbDisconnect(con)
return(details)
}
if (length(db_files) > 0) {
db_inventory <- map_dfr(db_files, inspect_sqlite)
datatable(db_inventory, caption = "Inventory of Tables and Row Counts")
} else {
message("No SQLite files found.")
}
```
## Archival Reporting
We save the database manifest to support long-term preservation documentation.
```{r}
#| label: save-results
if (exists("db_inventory")) {
output_dir <- "Results/Inspect_sqlite"
dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)
timestamp <- format(Sys.Date(), "%Y%m%d")
write.csv(db_inventory, file.path(output_dir, paste0("Database_Manifest_", timestamp, ".csv")), row.names = FALSE)
message("Report saved to: ", output_dir)
}
```
## References
::: {#refs}
:::