---
title: "Storing & grouping by arbitrary populations"
output:
  html_document:
    toc: true
vignette: >
  %\VignetteIndexEntry{Population metadata}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r, message=FALSE, echo=FALSE}
logging::setLevel('WARN')
library(unittest)
# Redirect ok() output to stderr
ok <- function(...) capture.output(unittest::ok(...), file = stderr())
library(mfdb)

# Remove our attributes from a dataframe - only used for testing
unattr <- function (obj) {
    attributes(obj) <- attributes(obj)[c('names', 'row.names', 'class')]
    obj
}
```

The examples below use the ``table_string`` helper to succinctly define
tables:

```{r}
# Convert a string into a data.frame
table_string <- function (text, ...) read.table(
    text = text,
    blank.lines.skip = TRUE,
    header = TRUE,
    stringsAsFactors = FALSE,
    ...)
```

Firstly, connect to a database and set up some areas/divisions:

```{r}
mdb <- mfdb(tempfile(fileext = '.duckdb'))
mfdb_import_area(mdb, table_string('
name  division size
45G01     divA   10
45G02     divA  200
45G03     divB  400
'))
```

## Importing data

Populations are arbitrary groupings for defining logical stocks that can't be
derived from other MFDB data.

As with other metadata, we have to import the valid values before using them:

```{r}
mfdb_import_population_taxonomy(mdb, table_string('
name 	description 				t_group
ns	"Northern Shrimp"			ns
ns_s	"Northern Shrimp in Skjalfandi"		ns
ns_a	"Northern Shrimp in Arnarfjordur"	ns
ns_i	"Northern Shrimp in Isafjardardjup"	ns
as	"Aesop Shrimp"				as
as_s	"Aesop Shrimp in Skjalfandi"		as
'))
```

Notice that we have used the ``t_group`` column to define groupings within
our populations. This means that the ``ns`` group will include all
samples from ``ns``, ``ns_s``, ``ns_a`` and ``ns_i``.
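
To see which populations roll up into each group, the taxonomy can be split by
its ``t_group`` column. The sketch below is an illustration only: it uses a
plain data.frame (the hypothetical ``taxonomy``) that mirrors the names and
groups imported above, and makes no claim about how MFDB resolves groups
internally.

```{r}
# Illustration only: a plain data.frame mirroring the taxonomy imported above
taxonomy <- data.frame(
    name = c('ns', 'ns_s', 'ns_a', 'ns_i', 'as', 'as_s'),
    t_group = c('ns', 'ns', 'ns', 'ns', 'as', 'as'),
    stringsAsFactors = FALSE)

# List the population names that fall under each t_group
members <- split(taxonomy$name, taxonomy$t_group)
members
```

Each top-level population lists itself as its own ``t_group``, so ``ns`` and
``as`` appear among their own members.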

Now we can import data that uses these groupings:

```{r}
mfdb_import_survey(mdb, data_source = "x",
table_string("
year    month   areacell   species population length  count
2019    1       45G01      PRA     ns_s    	10      285
2019    1       45G01      PRA     ns_s    	20      273

2019    1       45G01      PRA     ns_a    	10      299
2019    1       45G01      PRA     ns_a    	20      252

2019    1       45G01      PRA     ns_i    	10      193
2019    1       45G01      PRA     ns_i    	20      322
"))
```

## Querying data

We can now use the ``mfdb_sample_*`` functions to select this data back out
again. We can group and filter by population, and can query for individual
fjords as well as the whole group:

```{r}
agg_data <- mfdb_sample_count(mdb, c('population', 'length'), list(
        population = mfdb_group(ns_s = 'ns_s', ns = 'ns'),
        length = mfdb_unaggregated()))
agg_data
```
```{r, message=FALSE, echo=FALSE}
# Can get aggregate ns group as well as ns_s
ok(ut_cmp_equal(unattr(agg_data[[1]]), table_string('
year step area population length number
all  all  all         ns     10    777
all  all  all         ns     20    847
all  all  all       ns_s     10    285
all  all  all       ns_s     20    273
    ')), "Group/filter by populations")
```
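
As a sanity check on the ``ns`` rows in the output above, the same totals can
be worked out by hand from the imported survey rows. This sketch re-enters the
counts in a plain data.frame (the hypothetical ``survey_counts``) and sums them
per length with base R; it does not touch the database.

```{r}
# Illustration only: the survey counts imported earlier, re-entered by hand
survey_counts <- data.frame(
    population = rep(c('ns_s', 'ns_a', 'ns_i'), each = 2),
    length = rep(c(10, 20), times = 3),
    count = c(285, 273, 299, 252, 193, 322))

# All three fjords belong to the ns group, so sum the counts per length
ns_totals <- aggregate(count ~ length, data = survey_counts, FUN = sum)
ns_totals
```

The sums per length should match the ``number`` column reported for ``ns``,
since every imported sample belongs to a population whose ``t_group`` is ``ns``.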

```{r}
mfdb_disconnect(mdb)
```