--- title: "Storing & grouping by arbitrary populations" output: html_document: toc: true vignette: > %\VignetteIndexEntry{Population metadata} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- ```{r, message=FALSE, echo=FALSE} logging::setLevel('WARN') library(unittest) # Redirect ok() output to stderr ok <- function(...) capture.output(unittest::ok(...), file = stderr()) library(mfdb) # Remove our attributes from a dataframe - only used for testing unattr <- function (obj) { attributes(obj) <- attributes(obj)[c('names', 'row.names', 'class')] obj } ``` The following examples use the following table_string helper to succintly define tables: ```{r} # Convert a string into a data.frame table_string <- function (text, ...) read.table( text = text, blank.lines.skip = TRUE, header = TRUE, stringsAsFactors = FALSE, ...) ``` Firstly, connect to a database and set up some areas/divisions: ```{r} mdb <- mfdb(tempfile(fileext = '.duckdb')) mfdb_import_area(mdb, table_string(' name division size 45G01 divA 10 45G02 divA 200 45G03 divB 400 ')) ``` ## Importing data Populations are arbitrary groupings for defining logical stocks that can't be derived from other MFDB data. As with other metadata, we have to import valid values before using: ```{r} mfdb_import_population_taxonomy(mdb, table_string(' name description t_group ns "Northern Shrimp" ns ns_s "Northern Shrimp in Skjalfandi" ns ns_a "Northern Shrimp in Arnarfjordur" ns ns_i "Northern Shrimp in Isafjardardjup" ns as "Aesop Shrimp" as as_s "Aesop Shrimp in Skjalfandi" as ')) ``` Notice that we have used the ``t_group`` column to define groupings of within our population groups. This means that the ``ns`` group will include all samples from ``ns``, ``ns_s`, ``ns_a``, ``ns_i``. Now we can import data that uses these groupings: ```{r} mfdb_import_survey(mdb, data_source = "x", table_string(" year month areacell species population length count 2019 1 45G01 PRA ns_s 10 285 2019 1 45G01 PRA ns_s 20 273 2019 1 45G01 PRA ns_a 10 299 2019 1 45G01 PRA ns_a 20 252 2019 1 45G01 PRA ns_i 10 193 2019 1 45G01 PRA ns_i 20 322 ")) ``` ## Querying data We can now use the ``mfdb_sample_*`` functions to select this data back out again. We can group and filter by any of the tow attributes. We can query for individual fjords as well as the whole group: ```{r} agg_data <- mfdb_sample_count(mdb, c('population', 'length'), list( population = mfdb_group(ns_s = 'ns_s', ns = 'ns'), length = mfdb_unaggregated())) agg_data ``` ```{r, message=FALSE, echo=FALSE} # Can get aggregate ns group as well as ns_s ok(ut_cmp_equal(unattr(agg_data[[1]]), table_string(' year step area population length number all all all ns 10 777 all all all ns 20 847 all all all ns_s 10 285 all all all ns_s 20 273 ')), "Group/filter by populations") ``` ```{r} mfdb_disconnect(mdb) ```