1 change: 1 addition & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,7 @@ palette = "0.7"
# Utilities
regex = "1.10"
chrono = "0.4"
rand = "0.8"
const_format = "0.2"
uuid = { version = "1.0", features = ["v4"] }

7 changes: 6 additions & 1 deletion doc/_quarto.yml
Original file line number Diff line number Diff line change
Expand Up @@ -79,7 +79,12 @@ website:
href: syntax/clause/label.qmd
- section: Layers
contents:
- auto: syntax/layer/*
- section: Types
contents:
- auto: syntax/layer/type/*
- section: Position adjustment
contents:
- auto: syntax/layer/position/*
- section: Scales
contents:
- section: Types
5 changes: 5 additions & 0 deletions doc/styles.scss
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,11 @@ code {
font-variant-ligatures: none
}

// Add spacing below rendered plots so text doesn't crowd them
.cell-output-display {
margin-bottom: 1.5rem;
}

.hero-banner {
padding: 0;
margin: 0;
22 changes: 11 additions & 11 deletions doc/syntax/index.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -15,17 +15,17 @@ ggsql augments the standard SQL syntax with a number of new clauses to describe
## Layers
There are many different layers to choose from when visualising your data. Some are straightforward translations of your data into visual marks, such as the point layer, while others perform more or less complicated calculations, such as the histogram layer. A layer is selected by providing the layer name after the `DRAW` clause.

- [`point`](layer/point.qmd) is used to create a scatterplot layer
- [`line`](layer/line.qmd) is used to produce lineplots with the data sorted along the x axis
- [`path`](layer/path.qmd) is like `line` above but does not sort the data; it plots the data in the order it is given
- [`area`](layer/area.qmd) is used to display series as an area chart.
- [`ribbon`](layer/ribbon.qmd) is used to display series extrema.
- [`polygon`](layer/polygon.qmd) is used to display arbitrary shapes as polygons.
- [`bar`](layer/bar.qmd) creates a bar chart, optionally calculating y from the number of records in each bar
- [`density`](layer/density.qmd) creates univariate kernel density estimates, showing the distribution of a variable
- [`violin`](layer/violin.qmd) displays a rotated kernel density estimate
- [`histogram`](layer/histogram.qmd) bins the data along the x axis and produces a bar for each bin showing the number of records in it
- [`boxplot`](layer/boxplot.qmd) displays continuous variables as 5-number summaries
- [`point`](layer/type/point.qmd) is used to create a scatterplot layer
- [`line`](layer/type/line.qmd) is used to produce lineplots with the data sorted along the x axis
- [`path`](layer/type/path.qmd) is like `line` above but does not sort the data; it plots the data in the order it is given
- [`area`](layer/type/area.qmd) is used to display series as an area chart.
- [`ribbon`](layer/type/ribbon.qmd) is used to display series extrema.
- [`polygon`](layer/type/polygon.qmd) is used to display arbitrary shapes as polygons.
- [`bar`](layer/type/bar.qmd) creates a bar chart, optionally calculating y from the number of records in each bar
- [`density`](layer/type/density.qmd) creates univariate kernel density estimates, showing the distribution of a variable
- [`violin`](layer/type/violin.qmd) displays a rotated kernel density estimate
- [`histogram`](layer/type/histogram.qmd) bins the data along the x axis and produces a bar for each bin showing the number of records in it
- [`boxplot`](layer/type/boxplot.qmd) displays continuous variables as 5-number summaries

## Scales
A scale is responsible for translating a data value to an aesthetic literal, e.g. a specific color for the fill aesthetic, or a radius in points for the size aesthetic. A scale is a combination of a specific aesthetic and a scale type
46 changes: 46 additions & 0 deletions doc/syntax/layer/position/dodge.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
---
title: Dodge
---

> Positions are set within the [`DRAW` clause](../../clause/draw.qmd), using the `SETTING` subclause. Read the documentation for this clause for a thorough description of how to use it.

The dodge adjustment is intended to move entities that share the same position on a discrete scale side by side so they don't overlap. It is most often used for boxplots and violin plots but can also be used in e.g. bar plots as an alternative to [stacking](stack.qmd).

## Scale requirements
Dodge places no specific requirements on the position scale type of the plot but only affects discrete scales (including binned and ordinal). If only one scale is discrete, the dodging happens along that scale. If both scales are discrete, the dodging is laid out as a 2D grid.

## Settings
Apart from the settings of the layer type, setting `position => 'dodge'` will allow these additional settings:

* `width`: The total width the dodging will occupy as a proportion of the space available on the scale. Defaults to 0.9
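To make the `width` setting concrete, here is a minimal Python sketch of how dodge offsets could be computed for one discrete position. It is an illustration only, not ggsql's implementation: the function name `dodge_offsets` is hypothetical, and it assumes the available band is split into equally sized slots, one per group.

```python
def dodge_offsets(n_groups: int, width: float = 0.9) -> list:
    """Centre offsets for n_groups entities sharing one discrete position.

    Assumption: the band of size `width` (a proportion of the space
    available per category) is split into n_groups equal slots, and each
    entity is placed at the centre of its slot.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    slot = width / n_groups
    # Offsets are relative to the centre of the category.
    return [-width / 2 + slot * (i + 0.5) for i in range(n_groups)]


# Two groups (e.g. two values of `sex`) with the default width.
offsets = dodge_offsets(2, width=0.9)
```

Under these assumptions a smaller `width` pulls the groups closer to the category centre, while a single group is left where it is.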

## Examples

Dodging is the default for boxplots (and violin plots)

```{ggsql}
VISUALISE species AS x, bill_dep AS y, sex AS fill FROM ggsql:penguins
DRAW boxplot
```

Turning it off allows you to see the effect of it

```{ggsql}
VISUALISE species AS x, bill_dep AS y, sex AS fill FROM ggsql:penguins
DRAW boxplot SETTING position => 'identity'
```

Dodge can be used for bar plots as an alternative to the default stack

```{ggsql}
VISUALISE species AS x, island AS fill FROM ggsql:penguins
DRAW bar SETTING position => 'dodge'
```

Often `width` is part of the layer settings and is used directly by the dodge position. For layers with no inherent `width` setting, dodge provides that setting as well

```{ggsql}
VISUALISE species AS x, bill_dep AS y, sex AS shape FROM ggsql:penguins
DRAW point SETTING position => 'dodge', width => 0.5
```

7 changes: 7 additions & 0 deletions doc/syntax/layer/position/identity.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
---
title: Identity
---

> Positions are set within the [`DRAW` clause](../../clause/draw.qmd), using the `SETTING` subclause. Read the documentation for this clause for a thorough description of how to use it.

The identity position is a position adjustment that does nothing, i.e. it leaves the data where it is. It is used to turn off position adjustment for layers that default to something else. It takes no arguments and has no requirements.
65 changes: 65 additions & 0 deletions doc/syntax/layer/position/jitter.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
---
title: Jitter
---

> Positions are set within the [`DRAW` clause](../../clause/draw.qmd), using the `SETTING` subclause. Read the documentation for this clause for a thorough description of how to use it.

The jitter adjustment adds a random offset to each data point to avoid overplotting on discrete axes. It is mainly used in conjunction with point layers.

## Scale requirements
Jitter requires at least one axis to be discrete as it only jitters along discrete axes. For the `'density'` and `'intensity'` distributions (see [settings](#settings)) the other axis *must be* continuous

## Settings
Apart from the settings of the layer type, setting `position => 'jitter'` will allow these additional settings:

* `width`: The total width the jittering will occupy as a proportion of the space available on the scale. Defaults to 0.9
* `dodge`: Should dodging be applied before jittering? The dodging follows the behavior of the [dodge position](dodge.qmd). Defaults to `true`
* `distribution`: Which kind of distribution should the jittering follow? One of:
  - `'uniform'` (default): Jittering is sampled from a uniform distribution between `-width/2` and `width/2`
  - `'normal'`: Jittering is sampled from a normal distribution with σ as `width/4` resulting in 95% of the points falling inside the given width
  - `'density'`: Jittering follows the density distribution within the group so that the jitter occupies the same area as an equivalent [violin plot](../type/violin.qmd) with density remapped to offset
  - `'intensity'`: Jittering follows the intensity distribution within the group so that the jitter occupies the same area as an equivalent [violin plot](../type/violin.qmd) with intensity remapped to offset
* `bandwidth`: A numerical value setting the smoothing bandwidth to use for the `'density'` and `'intensity'` distributions. If absent (default), the bandwidth will be computed using Silverman's rule of thumb.
* `adjust`: A numerical value as multiplier for the `bandwidth` setting, with 1 as default.
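The `'uniform'` and `'normal'` distributions described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ggsql's implementation: the function name `jitter_offset` is hypothetical, and the `'density'` and `'intensity'` distributions are left out because they depend on a kernel density estimate of the group.

```python
import random
from typing import Optional


def jitter_offset(width: float = 0.9, distribution: str = "uniform",
                  rng: Optional[random.Random] = None) -> float:
    """Draw one jitter offset relative to the centre of a discrete position."""
    rng = rng or random.Random()
    if distribution == "uniform":
        # Uniform between -width/2 and width/2.
        return rng.uniform(-width / 2, width / 2)
    if distribution == "normal":
        # sigma = width/4, so about 95% of offsets fall within +/- width/2.
        return rng.gauss(0.0, width / 4)
    raise ValueError("unsupported distribution: " + distribution)


# A seeded generator makes the jitter reproducible between renders.
rng = random.Random(1)
samples = [jitter_offset(0.9, "normal", rng) for _ in range(5)]
```

With σ set to `width/4`, the interval of two standard deviations on either side of the centre spans exactly `width`, which is where the 95% figure comes from.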

## Examples
When plotting points on a discrete axis they are all placed in the middle

```{ggsql}
VISUALISE species AS x, bill_dep AS y, sex AS fill FROM ggsql:penguins
DRAW point
```

Use jittering to better see the individual points

```{ggsql}
VISUALISE species AS x, bill_dep AS y, sex AS fill FROM ggsql:penguins
DRAW point
SETTING position => 'jitter'
```

By default, dodging is applied to separate the groups. Turn this off if you want the jitter to occupy the same space regardless of grouping

```{ggsql}
VISUALISE species AS x, bill_dep AS y, sex AS fill FROM ggsql:penguins
DRAW point
SETTING position => 'jitter', dodge => false
```

Use a `'density'` distribution to also indicate the distribution shape with the jitter

```{ggsql}
VISUALISE species AS x, bill_dep AS y FROM ggsql:penguins
DRAW point
SETTING position => 'jitter', distribution => 'density'
```

When both axes are discrete the dodging follows a grid

```{ggsql}
VISUALISE species AS x, sex AS y, body_mass AS fill FROM ggsql:penguins
DRAW point
SETTING position => 'jitter'
SCALE BINNED fill
SETTING breaks => 4, pretty => false
```
61 changes: 61 additions & 0 deletions doc/syntax/layer/position/stack.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
---
title: Stack
---

> Positions are set within the [`DRAW` clause](../../clause/draw.qmd), using the `SETTING` subclause. Read the documentation for this clause for a thorough description of how to use it.

The stack position adjustment works by stacking objects on top of each other. It makes the most sense for layer types where height is the primary encoding (i.e. they naturally extend from 0). Stack is the default position for bar and area plots.

## Scale requirements
Stack requires a continuous position scale with a range mapping (e.g. either `y` + `yend` or `ymin` + `ymax`), and all ranges must be positive with a baseline of zero. The axis that satisfies this is used as the stacking direction.

## Settings
Apart from the settings of the layer type, setting `position => 'stack'` will allow these additional settings:

* `center`: Should the full stack be centered around 0? Can be used in conjunction with area layers to create streamgraphs. Defaults to `false`
* `total`: Sets a value each stack height should be normalised to. Defaults to `null` (no normalisation)
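The interaction between stacking, `center` and `total` can be sketched in Python as follows. This is a minimal illustration of the arithmetic for a single position on the other axis, not ggsql's implementation; the function name `stack` and its return format are assumptions made for the example.

```python
def stack(heights, center=False, total=None):
    """Return (lower, upper) bounds for values stacked at one position.

    Assumptions: all heights are non-negative, `total` rescales the stack
    so that its full height equals `total`, and `center` shifts the whole
    stack so that it is symmetric around 0.
    """
    if any(h < 0 for h in heights):
        raise ValueError("stacking requires non-negative heights")
    stack_sum = sum(heights)
    if total is not None and stack_sum > 0:
        heights = [h * total / stack_sum for h in heights]
        stack_sum = total
    # Start at zero, or at minus half the stack height when centering.
    base = -stack_sum / 2 if center else 0.0
    bounds = []
    for h in heights:
        bounds.append((base, base + h))
        base += h
    return bounds


plain = stack([1, 2, 3])
centered = stack([1, 2, 3], center=True)
percent = stack([1, 3], total=100)
```

In this sketch `total => 100` turns each stack into percentages, and `center => true` moves the baseline to minus half the stack height, which is what gives the stacked areas their symmetric shape.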

## Examples

Stack is the default for bar and area

```{ggsql}
VISUALISE Day AS x, Wind AS y FROM ggsql:airquality
DRAW area
MAPPING Month AS fill
FILTER Day <= 30
SCALE ORDINAL fill
```

Turn it off to see the effect (stacking is nonsensical for wind measurements)

```{ggsql}
VISUALISE Day AS x, Wind AS y FROM ggsql:airquality
DRAW area
MAPPING Month AS fill
SETTING position => 'identity'
FILTER Day <= 30
SCALE ORDINAL fill
```

Set `center => true` to create a streamgraph

```{ggsql}
VISUALISE Day AS x, Wind AS y FROM ggsql:airquality
DRAW area
MAPPING Month AS fill
SETTING center => true
FILTER Day <= 30
SCALE ORDINAL fill
```

Use `total` to see the percentage contribution from each group

```{ggsql}
VISUALISE Day AS x, Wind AS y FROM ggsql:airquality
DRAW area
MAPPING Month AS fill
SETTING total => 100
FILTER Day <= 30
SCALE ORDINAL fill
```
25 changes: 14 additions & 11 deletions doc/syntax/layer/area.qmd → doc/syntax/layer/type/area.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
title: "Area"
---

> Layers are declared with the [`DRAW` clause](../clause/draw.qmd). Read the documentation for this clause for a thorough description of how to use it.
> Layers are declared with the [`DRAW` clause](../../clause/draw.qmd). Read the documentation for this clause for a thorough description of how to use it.

The area layer is used to display absolute amounts over a sorted x-axis. It can be seen as a [ribbon layer](ribbon.qmd) where the `ymin` is anchored at zero.

Expand All @@ -21,10 +21,7 @@ The following aesthetics are recognised by the area layer.
* `linewidth`: The width of the contour lines.

## Settings
* `stacking`: Determines how multiple groups are displayed. One of the following:
* `'off'`: The groups `y`-values are displayed as-is (default).
* `'on'`: The `y`-values are stacked per `x` position, accumulating over groups.
* `'fill'`: Like `'on'` but displayed as a fraction of the total per `x` position.
* `position`: Determines the position adjustment to use for the layer (default is `'stack'`)

## Data transformation
The area layer does not transform its data but passes it through unchanged.
Expand Down Expand Up @@ -56,17 +53,23 @@ VISUALISE Date AS x, Value AS y FROM long_airquality
DRAW area MAPPING Series AS colour
```

We can stack the series by using `stacking => 'on'`. The line serves as a reference for 'unstacked' data.
By default the areas are stacked on top of each other. To draw every series from a zero baseline instead, set the position to `'identity'`

```{ggsql}
VISUALISE Date AS x, Value AS y, Series AS colour FROM long_airquality
DRAW area SETTING stacking => 'on', opacity => 0.5
DRAW line
DRAW area SETTING position => 'identity', opacity => 0.5
```

When `stacking => 'fill'` we're plotting stacked proportions. These only make sense if every series is measured in the same absolute unit. (Wind and temperature have different units and the temperature is not absolute.)
When `position => 'stack_fill'` we're plotting stacked proportions. These only make sense if every series is measured in the same absolute unit. (Wind and temperature have different units and the temperature is not absolute.)

```{ggsql}
VISUALISE Date AS x, Value AS y, Series AS colour FROM long_airquality
DRAW area SETTING stacking => 'fill'
```
DRAW area SETTING position => 'fill'
```

An alternative is to center the stacks to create a streamgraph

```{ggsql}
VISUALISE Date AS x, Value AS y, Series AS colour FROM long_airquality
DRAW area SETTING position => 'stack', center => true
```
17 changes: 15 additions & 2 deletions doc/syntax/layer/bar.qmd → doc/syntax/layer/type/bar.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
title: "Bar"
---

> Layers are declared with the [`DRAW` clause](../clause/draw.qmd). Read the documentation for this clause for a thorough description of how to use it.
> Layers are declared with the [`DRAW` clause](../../clause/draw.qmd). Read the documentation for this clause for a thorough description of how to use it.

The bar layer is used to create bar plots. You can either specify the height of the bars directly or let the layer calculate it either as the count of records within the same group or as a weighted sum of the records.

Expand All @@ -23,7 +23,7 @@ The bar layer has no required aesthetics
* `linetype`: The type of stroke, i.e. the dashing pattern

## Settings

* `position`: Determines the position adjustment to use for the layer (default is `'stack'`)
* `width`: The width of the bars as a proportion of the available width

## Data transformation
Expand Down Expand Up @@ -68,6 +68,15 @@ DRAW bar
MAPPING species AS x, island AS fill
```

Or change the position setting to e.g. get a dodged bar chart

```{ggsql}
VISUALISE FROM ggsql:penguins
DRAW bar
MAPPING species AS x, sex AS fill
SETTING position => 'dodge'
```

Map to y if the dataset already contains the value you want to show

```{ggsql}
Expand All @@ -87,3 +96,7 @@ DRAW bar
SCALE BINNED x
SETTING breaks => 10
```

And use with a polar coordinate system to create a pie chart

**TBD**
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
title: "Boxplot"
---
> Layers are declared with the [`DRAW` clause](../clause/draw.qmd). Read the documentation for this clause for a thorough description of how to use it.
> Layers are declared with the [`DRAW` clause](../../clause/draw.qmd). Read the documentation for this clause for a thorough description of how to use it.

Boxplots display a summary of a continuous distribution. In the style of Tukey, it displays the median, two hinges and two whiskers as well as outlying points.

Expand All @@ -23,6 +23,7 @@ The following aesthetics are recognised by the boxplot layer.
* `shape` The shape of outlier points.

## Settings
* `position`: Determines the position adjustment to use for the layer (default is `'dodge'`)
* `outliers`: Whether to display outliers as points. Defaults to `true`.
* `coef`: A number indicating the length of the whiskers as a multiple of the interquartile range (IQR). Defaults to `1.5`.
* `width`: Relative width of the boxes. Defaults to `0.9`.
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
title: "Density"
---

> Layers are declared with the [`DRAW` clause](../clause/draw.qmd). Read the documentation for this clause for a thorough description of how to use it.
> Layers are declared with the [`DRAW` clause](../../clause/draw.qmd). Read the documentation for this clause for a thorough description of how to use it.

Visualise the distribution of a single continuous variable by computing a kernel density estimate. It has a similar interpretation to a histogram but smooths the observations rather than binning them.

Expand All @@ -21,10 +21,7 @@ The following aesthetics are recognised by the density layer.
* `linetype` The dash pattern of the contour line.

## Settings
* `stacking`: Determines how multiple groups are displayed. One of the following:
* `'off'`: The groups `y`-values are displayed as-is (default).
* `'on'`: The `y`-values are stacked per `x` position, accumulating over groups.
* `'fill'`: Like `'on'` but displayed as a fraction of the total per `x` position.
* `position`: Determines the position adjustment to use for the layer (default is `'identity'`)
* `bandwidth`: A numerical value setting the smoothing bandwidth to use. If absent (default), the bandwidth will be computed using Silverman's rule of thumb.
* `adjust`: A numerical value as multiplier for the `bandwidth` setting, with 1 as default.
* `kernel`: Determines the smoothing kernel shape. Can be one of the following:
Expand Down Expand Up @@ -87,7 +84,7 @@ Stacking the different groups instead of overlaying them.

```{ggsql}
VISUALISE bill_dep AS x, species AS colour FROM ggsql:penguins
DRAW density SETTING stacking => 'on'
DRAW density SETTING position => 'stack'
```

Using weighted estimates by mapping a column to the optional weight aesthetic. Note that the difference in output is subtle.