This visualization project aims at a spatio-temporal visualization of a dataset from the field of digital humanities. The data is both part and result of an ongoing research project in literary and publicistic history. The research project focuses on women authors with a migration background in the Austro-Hungarian Monarchy, with the main period of interest being the turn from the 19th to the 20th century. It concentrates on the collection, analysis and publication of biographical and bibliographical information on a substantial number of women authors and publicists, and provides access to the results through a website of the University of Vienna. The data to be visualized consists of the migrations these authors undertook during their lives.
The project is a real-world application example. It is the result of a development workflow with the following key aspects:
Considering all these factors, the following documentation provides insight into the structure of the application, detailing its key components as well as the reasoning behind the design choices made in accordance with these guidelines.
The first challenge in visualizing the data is that location information is only present in the form of names of cities, regions, states and sometimes only continents. Since the project is managed by a team of two people on a very tight schedule, no resources are available to research, retrieve and store the geolocations manually. Instead, a separate program has been created that looks up geolocations online based on the location names stored in the database. This scraper can be found in the geoLocationScraper.php file.
The process of scraping is as follows:
Tests showed that this scraping method has a high success rate (over 90%) in retrieving correct geolocation data for the places named in the database. However, there are still locations for which retrieval fails in one of two ways:
The first case is handled automatically by the visualization program and thus only leads to a minimal, temporary loss of information until the corresponding data can be updated by hand, if possible. The second case, however, requires action by a project administrator. To aid this process, the geolocation table of the database will contain a separate field for administrator approval. Since the success rate of the automatic lookup is high, all retrieved entries are approved by default. If visual inspection reveals a wrong location, administrators can quickly hide it from the visualization and thereby also mark it for a later manual update once resources are available.
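The effect of the approval field can be illustrated with a minimal sketch. This is not the project's actual code: the field names `lat`, `lng` and `approved` and the function `visibleLocations` are hypothetical, and the sketch assumes that a failed lookup leaves the coordinates empty.

```javascript
// Hypothetical geolocation rows; all field names are assumptions for illustration.
const geolocations = [
  { name: "Wien", lat: 48.2082, lng: 16.3738, approved: true },
  // Assumed shape of a failed lookup: no coordinates were retrieved.
  { name: "Unresolved place", lat: null, lng: null, approved: true },
  // A wrongly resolved location that an administrator has hidden.
  { name: "Misplaced place", lat: 40.0, lng: -75.0, approved: false },
];

// Keep only rows that are approved and carry usable coordinates.
function visibleLocations(rows) {
  return rows.filter(
    (row) =>
      row.approved === true &&
      Number.isFinite(row.lat) &&
      Number.isFinite(row.lng)
  );
}

// Only "Wien" remains in this example.
console.log(visibleLocations(geolocations).map((row) => row.name));
```

Because entries are approved by default, an administrator only has to act on the few entries that turn out to be wrong.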
The main visualization program works as follows:
1. Content management and backend administration functionality is implemented through a Joomla CMS deployment. The visualization is integrated into a separate Joomla frontend template created for this task. It is accessible from the main menu; only on this subpage is the default site template switched to the visualization template.
2. The database query is constructed through the query functions provided by the Joomla CMS [cite Joomla Database query]. In keeping with the very limited administrator resources explained above, the query incorporates a table join on a varchar field, the location name. This way, administrators never have to concern themselves with entering numerical geolocation data while defining migration routes, and they do not need to maintain a separate data table for location information; in the normal workflow, locations can be recorded as names in simple text fields. Each migration is stored in the database as a person connected to two places (start and destination of the migration) plus additional information (year, name variants of locations, etc.). This table is joined with the author information and twice with the geolocation data (once for the starting location, once for the destination).
3. The query results are processed to fit the algorithms detailed in Andrienko [2010]. The migration data is extended with fields that keep track of the grid cell each migration is assigned to in the aggregation phase. The grid is constructed and the centroid is calculated. After this setup, the aggregation takes place. The process is only partially adapted, with the following changes:
4. The JavaScript fragments containing the polylines that depict migration routes, along with additional cartographic elements, are generated from the processed data.
5. The main JavaScript that creates and configures the embedded map is hardcoded in the index page. The map is configured to resemble a historic map, and a large number of default map elements are removed both to further this impression and to give a cleaner look.
6. At the time of submission, the user interface is minimal and has only three buttons.
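The join on location names described in step 2 can be illustrated in memory. The following is only a sketch of the join semantics, not the actual Joomla query: all field names (`fromName`, `toName`, `authorId`) and the function `joinMigrations` are hypothetical, and it assumes inner-join behaviour, so migrations with an unknown place name are dropped.

```javascript
// Hypothetical records; all field names are assumptions for illustration.
const migrations = [
  { authorId: 1, fromName: "Budapest", toName: "Wien", year: 1895 },
  { authorId: 2, fromName: "Prag", toName: "Unresolved place", year: 1901 },
];
const authors = [
  { id: 1, name: "Author A" },
  { id: 2, name: "Author B" },
];
const geolocations = [
  { name: "Budapest", lat: 47.4979, lng: 19.0402 },
  { name: "Wien", lat: 48.2082, lng: 16.3738 },
  { name: "Prag", lat: 50.0755, lng: 14.4378 },
];

// Join each migration with its author and twice with the geolocation data,
// using the plain-text location name as the join key.
function joinMigrations(migrationRows, authorRows, geoRows) {
  const geoByName = new Map(geoRows.map((g) => [g.name, g]));
  const authorById = new Map(authorRows.map((a) => [a.id, a]));
  const joined = [];
  for (const m of migrationRows) {
    const from = geoByName.get(m.fromName); // first join: starting location
    const to = geoByName.get(m.toName); // second join: destination
    const author = authorById.get(m.authorId);
    if (!from || !to || !author) continue; // assumed inner-join semantics
    joined.push({
      author: author.name,
      year: m.year,
      from: { lat: from.lat, lng: from.lng },
      to: { lat: to.lat, lng: to.lng },
    });
  }
  return joined;
}

console.log(joinMigrations(migrations, authors, geolocations));
```

The design choice this illustrates is that the location name is the only key an administrator ever types; the coordinates are attached later without any manual bookkeeping.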
For testing purposes, one can manually change the grid resolution parameter to see how the aggregation changes.
Because the project combines all the programming languages and frameworks/libraries detailed above, the automatically generated documentation is unfortunately sparse. The code itself, however, is commented rigorously and should provide insight into the inner workings of the program. To compensate, this HTML documentation has been extended with background information about the conception, planning and implementation of the project.
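How the grid resolution influences the aggregation can be illustrated with a simplified sketch. It is neither the algorithm from Andrienko [2010] nor the project's adaptation of it: it assumes a regular grid whose cell size is given in degrees, uses the centroid of the points falling into a cell as the cell's representative, and skips movements that stay inside one cell. All names (`cellKey`, `aggregateMigrations`, `resolution`) are hypothetical.

```javascript
// Map a point to the key of the regular grid cell that contains it.
function cellKey(point, resolution) {
  const col = Math.floor(point.lng / resolution);
  const row = Math.floor(point.lat / resolution);
  return `${col}:${row}`;
}

// Aggregate individual migrations into flows between grid cells.
function aggregateMigrations(migrations, resolution) {
  const cells = new Map(); // cell key -> running sums for the centroid
  const flows = new Map(); // "fromKey>toKey" -> number of migrations

  const addPoint = (p) => {
    const key = cellKey(p, resolution);
    const cell = cells.get(key) || { sumLat: 0, sumLng: 0, count: 0 };
    cell.sumLat += p.lat;
    cell.sumLng += p.lng;
    cell.count += 1;
    cells.set(key, cell);
    return key;
  };

  for (const m of migrations) {
    const fromKey = addPoint(m.from);
    const toKey = addPoint(m.to);
    if (fromKey === toKey) continue; // assumption: ignore moves within one cell
    const flowKey = `${fromKey}>${toKey}`;
    flows.set(flowKey, (flows.get(flowKey) || 0) + 1);
  }

  const centroid = (key) => {
    const cell = cells.get(key);
    return { lat: cell.sumLat / cell.count, lng: cell.sumLng / cell.count };
  };

  return [...flows].map(([flowKey, count]) => {
    const [fromKey, toKey] = flowKey.split(">");
    return { from: centroid(fromKey), to: centroid(toKey), count };
  });
}

// Illustrative coordinates (roughly Budapest, Wien, Győr, Bratislava, Prag).
const sample = [
  { from: { lat: 47.5, lng: 19.04 }, to: { lat: 48.21, lng: 16.37 } },
  { from: { lat: 47.68, lng: 17.63 }, to: { lat: 48.15, lng: 17.11 } },
  { from: { lat: 50.08, lng: 14.44 }, to: { lat: 48.21, lng: 16.37 } },
];

console.log(aggregateMigrations(sample, 1).length); // fine grid
console.log(aggregateMigrations(sample, 4).length); // coarse grid
```

With these sample points, a cell size of 1 degree keeps all three routes separate, while a cell size of 4 degrees merges the first two into a single flow with a count of 2.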
The project utilizes the following tools and libraries: