Sample Use-Cases

Biological niche-finding

Note

The first use-case concerns niche-finding, i.e. the problem of finding the bioclimatic envelope of a species, an important task in biology.

The video tutorial walks through this use-case step by step with the same data.

Data

The Bio data describes spatial areas of Europe: 2575 squares, each roughly 50 kilometers on a side. The left-hand side data records which mammals live in these areas, with variables named after the respective species. The right-hand side data consists of the minimum, maximum and average monthly temperatures in degrees Celsius and the average monthly precipitation in millimeters, denoted as t+x, t-x, t=x and p=x, respectively, where x is the number of the month. The data comes from two publicly available datasets: the European Mammals atlas and Worldclim climate data.

Exploration

1. The first tab contains a list of entities. Expanding columns of interest reveals the actual values of the variables.
1. Next comes the list of left-hand side variables and right-hand side variables.
1. Let’s consider the Moose: we plot this variable on a map to create a new redescription.
1. Then, we can let the algorithm find best expansions.
1. After a few seconds, we get a number of expansions. We can sort them by accuracy or let the algorithm process them to select the top non-redundant ones; here we obtain one.
1. Let’s have a look at this redescription about the Moose, simultaneously plotting it in a Multidimensional Scaling projection and on a map.
1. We can select a dot to highlight the entities. The views are linked together so the entity will be highlighted across all views for that specific redescription.
1. To select a set of points lying in some area, we can draw an enclosing polygon. When we then click Edit ‣ (De)select polygon (or press the corresponding key), the dots inside the polygon are selected and highlighted across the different views.
1. With these entities selected, we can recompute the left-hand side query while adding weight on this area, to try to remove it from E10 (the entities that support only the left-hand side query). To do this, we clear the left-hand side query, press ENTER to update the redescription, then choose Process ‣ Overfit LHS.
1. We automatically obtain a new left-hand side query involving the Mountain Hare, which removes this region from the support.
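
The effect of the last steps on accuracy can be illustrated with a small sketch. It assumes that accuracy is the Jaccard coefficient of the two queries' supports, as is common in redescription mining, and that E10 denotes the entities supporting only the left-hand side query. The function names and the toy entity identifiers are hypothetical and are not part of Siren.

```python
def support_parts(lhs_support, rhs_support):
    """Split entities by which query they support.

    E11: both queries hold; E10: only the left-hand side query holds;
    E01: only the right-hand side query holds (assumed naming).
    """
    lhs, rhs = set(lhs_support), set(rhs_support)
    return {"E11": lhs & rhs, "E10": lhs - rhs, "E01": rhs - lhs}


def jaccard_accuracy(lhs_support, rhs_support):
    """Jaccard coefficient |E11| / (|E11| + |E10| + |E01|)."""
    parts = support_parts(lhs_support, rhs_support)
    union_size = sum(len(p) for p in parts.values())
    return len(parts["E11"]) / union_size if union_size else 0.0


# Toy example: grid squares identified by made-up ids.
rhs = {"s1", "s2", "s3", "s4"}
lhs_before = {"s1", "s2", "s3", "s7", "s8"}  # s7 and s8 fall in E10
lhs_after = {"s1", "s2", "s3"}               # refined query drops s7 and s8

acc_before = jaccard_accuracy(lhs_before, rhs)  # 3 / 6
acc_after = jaccard_accuracy(lhs_after, rhs)    # 3 / 4
```

Removing the two squares from E10 shrinks the union of the supports without touching the intersection, which is why the refined left-hand side query scores higher in this toy case.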

US census and election funds

Note

We now present a short sample use-case with US census and election data.

Data

The US data describes the counties of the continental United States. The left-hand side data contains socio-economic statistical indicators about these areas; see details here.

The right-hand side data describes the funding of the electoral campaigns in 2006, 2008 and 2010: the total funds and the percentages allotted to the Republican and to the Democratic party.

The data has been gathered from two public websites: FedStats and Open Secrets. Logarithmic transformation was applied to variables having a log-normal distribution to obtain a better dispersion.
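
As a rough illustration of why a logarithmic transformation helps with heavily skewed variables, here is a minimal sketch. The source does not state which base was used, so the natural logarithm is an assumption, and the fund amounts are made-up values rather than figures from the dataset.

```python
import math
import statistics


def log_transform(values):
    """Return the natural logarithm of each value (all must be > 0)."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    return [math.log(v) for v in values]


# Made-up campaign totals (in dollars) for five counties.
funds = [1_000, 5_000, 20_000, 150_000, 4_000_000]
logged = log_transform(funds)

# On the raw scale a single county dominates; after the transform the
# largest value sits much closer to the mean.
raw_ratio = max(funds) / statistics.mean(funds)
log_ratio = max(logged) / statistics.mean(logged)
```

Variables containing zeros or negative values would need separate handling (for example an offset), which this sketch deliberately rejects instead of guessing.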

Exploration

1. Again, the first tab contains a list of entities...
1. ... and the following two the list of left-hand side variables and right-hand side variables, respectively.
1. Here, the Redescriptions tab contains redescriptions mined previously and loaded from a file.
1. Now, we will recompute these redescriptions over a subset of the entities, say, excluding the east side of the US. To do so, we plot a redescription with a variable set to its full range to see all entities on the map, and draw a polygon enclosing the area we wish to exclude.
1. Next we click Edit ‣ (De)select polygon (or press the corresponding key) to select the entities inside the polygon.
1. Then we click Edit ‣ (Dis)able selected (or press the corresponding key) to disable the entities. This automatically recomputes all redescriptions restricted to the remaining entities (observe the updated statistics in the list and the map).
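
The polygon selection and disabling steps above can be sketched as follows. This is only an illustration using the standard ray-casting test; it is not necessarily how Siren implements the selection, and the function names, entity ids and coordinates are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count polygon edges crossed by a ray going right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def select_in_polygon(coords, polygon):
    """Return the ids of the entities whose coordinates fall inside the polygon."""
    return {eid for eid, (x, y) in coords.items()
            if point_in_polygon(x, y, polygon)}


# Made-up entity coordinates and a square polygon drawn around two of them.
coords = {"a": (1, 1), "b": (5, 1), "c": (2, 3)}
polygon = [(0, 0), (4, 0), (4, 4), (0, 4)]

selected = select_in_polygon(coords, polygon)  # entities to disable
enabled = set(coords) - selected               # entities kept for recomputation
```

Any statistic of a redescription would then be recomputed over `enabled` only, which mirrors what the updated list and map show after disabling.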

DBLP Computer Science Bibliography

Note

In this third example, we consider non-geospatial data, namely from the DBLP Computer Science Bibliography.

Data

This dataset comes from the DBLP database of computer science bibliography. Here the entities are researchers; the left-hand side data indicates the major CS conferences in which they have published, while the right-hand side data contains co-authorship information.

Unlike in the previous examples, the entities in this data are not associated with geographic coordinates, hence this data is not geospatial and the redescriptions cannot be plotted on maps. However, the other visualizations can be used for exploration.

Exploration

1. Once more, we can consult the list of entities in the first tab.
1. Similarly, left-hand side variables and right-hand side variables are listed in the next two tabs.
1. Redescriptions are listed in the fourth tab.
1. We can visualize the second redescription as a parallel coordinates plot. Using a slider, we can adjust the level of detail, i.e. the number of entities drawn.
1. We can also visualize it under a couple of projections.
1. These various views are linked, so that when we select a dot or a subset of dots they will be highlighted across the views.
1. We can edit the redescription directly in the parallel coordinates plot, modifying the bounds of a literal by dragging the bottom and top sides of the gray interval boxes.
1. Let’s now build another redescription by selecting a pair of conferences and setting their intervals to generate a new left-hand side query.
1. Now we let the algorithm find matching queries for the right-hand side and obtain, for instance, the following redescription.
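
To make the notion of interval literals more concrete, the sketch below evaluates a query on a toy table. It assumes that the left-hand side variables are publication counts and that a query is a conjunction of literals of the form (variable, low, high). The conference names, author ids and helper functions are hypothetical examples, not taken from the dataset or from Siren.

```python
def satisfies(row, query):
    """True if every literal (variable, low, high) holds for the row."""
    return all(low <= row.get(var, 0) <= high for var, low, high in query)


def support(table, query):
    """Return the entities whose rows satisfy the whole query."""
    return {entity for entity, row in table.items() if satisfies(row, query)}


# Made-up publication counts per author and conference.
publications = {
    "author_a": {"KDD": 5, "ICDM": 2},
    "author_b": {"KDD": 1, "ICDM": 0},
    "author_c": {"KDD": 3, "ICDM": 4},
    "author_d": {"KDD": 0, "ICDM": 6},
}

# A left-hand side query built from a pair of conferences.
lhs_query = [("KDD", 2, 10), ("ICDM", 1, 10)]
lhs_support = support(publications, lhs_query)

# Dragging the lower bound of the ICDM interval up narrows the support.
narrowed_query = [("KDD", 2, 10), ("ICDM", 3, 10)]
narrowed_support = support(publications, narrowed_query)
```

Each change of bounds changes the support of the query, and with it the accuracy of the redescription, which is what the statistics in the interface reflect after an edit.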

Finnish 2011 parliamentary elections (Try it out!)

Note

In this use case, we explore open data about the candidates in a national election using redescription mining. The prepared dataset is available here.

Data

The data was collected from www.vaalikone.fi, the election engine of the Finnish newspaper Helsingin Sanomat, and made publicly available. One view contains the candidates’ personal profile attributes, such as party, age, and education, while the other view consists of their answers to 30 multiple-choice questions and the importance they assigned to each.

Exploration

1. Download the data as a .siren file (as well as the English translation of the Q&A).
2. Start Siren and open the data via File ‣ Open.
3. Explore the premined redescriptions, write and mine new ones, and learn about the Finnish political scene...
1. For instance, duplicate redescription R7 ([Left click on the redescription] ‣ Duplicate), open R7 and the copy side by side in Parallel Coordinates windows ([Left click on the redescription] ‣ Parallel Coordinates), then remove the term about the candidate’s age from the left-hand side query and see how it affects the accuracy of the redescription...