Lookup Transform#
The Lookup transform extends a primary data source by looking up values from
another data source; it is similar to a one-sided join. A lookup can be added
at the top level of a chart using the Chart.transform_lookup()
method.
By way of example, imagine you have two sources of data that you would like
to combine and plot: one is a list of names of people along with their height
and weight, and the other is some information about which groups they belong
to. This example data is available in vega_datasets
:
from vega_datasets import data
people = data.lookup_people()
groups = data.lookup_groups()
We know how to visualize each of these datasets separately; for example:
import altair as alt
top = alt.Chart(people).mark_square(size=200).encode(
x=alt.X('age:Q', scale=alt.Scale(zero=False)),
y=alt.Y('height:Q', scale=alt.Scale(zero=False)),
color='name:N',
tooltip='name:N'
).properties(
width=400, height=200
)
bottom = alt.Chart(groups).mark_rect().encode(
x='person:N',
y='group:O'
).properties(
width=400, height=100
)
alt.vconcat(top, bottom)
If we would like to plot features that reference both datasets (for example, the
average age within each group), we need to combine the two datasets.
This can be done either as a data preprocessing step, using tools available
in Pandas, or as part of the visualization using a LookupTransform
in Altair.
Combining Datasets with pandas.merge#
Pandas provides a wide range of tools for merging and joining datasets; see Merge, Join, and Concatenate for some detailed examples. For the above data, we can merge the data and create a combined chart as follows:
import pandas as pd
merged = pd.merge(groups, people, how='left',
left_on='person', right_on='name')
alt.Chart(merged).mark_bar().encode(
x='mean(age):Q',
y='group:O'
)
We specify a left join, meaning that for each entry of the “person” column in the groups, we seek the “name” column in people and add the entry to the data. From this, we can easily create a bar chart representing the mean age in each group.
Combining Datasets with a Lookup Transform#
For some data sources (e.g. data available at a URL, or data that is streaming),
it is desirable to have a means of joining data without having to download
it for pre-processing in Pandas.
This is where Altair’s transform_lookup()
comes in.
To reproduce the above combined plot by combining datasets within the
chart specification itself, we can do the following:
alt.Chart(groups).mark_bar().encode(
x='mean(age):Q',
y='group:O'
).transform_lookup(
lookup='person',
from_=alt.LookupData(data=people, key='name',
fields=['age', 'height'])
)
Here lookup
names the field in the groups dataset on which we will match,
and the from_
argument specifies a LookupData
structure where
we supply the second dataset, the lookup key, and the fields we would like to
extract.
Example: Lookup Transforms for Geographical Visualization#
Lookup transforms are often particularly important for geographic visualization, where it is common to combine tabular datasets with datasets that specify geographic boundaries to be visualized; for example, here is a visualization of unemployment rates per county in the US:
import altair as alt
from vega_datasets import data
counties = alt.topo_feature(data.us_10m.url, 'counties')
unemp_data = data.unemployment.url
alt.Chart(counties).mark_geoshape().encode(
color='rate:Q'
).transform_lookup(
lookup='id',
from_=alt.LookupData(unemp_data, 'id', ['rate'])
).properties(
projection={'type': 'albersUsa'},
width=500, height=300
)
Transform Options#
The transform_lookup()
method is built on the LookupTransform
class, which has the following options:
Property |
Type |
Description |
---|---|---|
as |
The output fields on which to store the looked up data values. For data lookups, this property may be left blank if For selection lookups, this property is optional: if unspecified, looked up values will be stored under a property named for the selection; and if specified, it must correspond to |
|
default |
any |
The default value to use if lookup fails. Default value: |
from |
anyOf( |
Data source or selection for secondary data reference. |
lookup |
|
Key in primary data source. |