Description
Timeseries data represents values that have been recorded over time. In most cases, timeseries data is generated by sensors (e.g., temperature, wind speed, pressure) that record measurements over time. Timeseries datasets can become large very quickly when many sensors record a lot of information over a long period of time. Visualizing big timeseries data (i.e., GBs and TBs of data) is still an unsolved problem when working with large sensor datasets. In this master's thesis, we want to explore the usability of different existing techniques and methods for working with large timeseries data. This includes methods for aggregation, but also existing libraries (e.g., PlotlyResampler, Vaex) and research prototypes (e.g., Mosaic). It is not yet known which methods, libraries, and techniques perform best for exploratory analysis.
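To illustrate what an aggregation method can look like, the following is a minimal sketch of min-max downsampling in plain Python. It is only one possible technique, not the method prescribed by this thesis, and the function name `min_max_downsample` and the parameter `n_buckets` are hypothetical names chosen for this example. The idea is to split the series into equally sized buckets and keep only the minimum and maximum of each bucket, so that spikes remain visible in a plot while the number of points drops sharply.

```python
import math
from typing import List, Tuple


def min_max_downsample(values: List[float], n_buckets: int) -> List[Tuple[int, float]]:
    """Keep the minimum and maximum of each bucket as (index, value) pairs.

    Hypothetical helper for illustration; returns at most 2 * n_buckets points.
    """
    if n_buckets <= 0:
        raise ValueError("n_buckets must be positive")
    n = len(values)
    # Short series are returned unchanged; there is nothing to gain.
    if n <= 2 * n_buckets:
        return list(enumerate(values))

    kept: List[Tuple[int, float]] = []
    for b in range(n_buckets):
        # Integer arithmetic gives contiguous, non-empty buckets.
        start = b * n // n_buckets
        end = (b + 1) * n // n_buckets
        bucket = range(start, end)
        i_min = min(bucket, key=values.__getitem__)
        i_max = max(bucket, key=values.__getitem__)
        # Emit the two extremes in time order, without duplicates.
        for i in sorted({i_min, i_max}):
            kept.append((i, values[i]))
    return kept


# Synthetic sensor signal (assumed data) with one artificial spike.
series = [math.sin(i / 50) for i in range(10_000)]
series[1234] = 5.0
reduced = min_max_downsample(series, n_buckets=100)
print(len(series), "->", len(reduced), "points")
```

A plain mean per bucket would smooth the spike at index 1234 away, which is why min-max aggregation is often preferred for visual exploration. Whether this or another approach works best on real sensor data is exactly what the evaluation is meant to find out.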
Tasks
The tasks of this master's thesis are the following:
- Research and creation of a long list of existing libraries, tools, and research prototypes.
- Creation of a short list of tools that should be tested.
- Survey of tools and evaluation of performance under different constraints and with different datasets.
- Summary of the results.
- Development of guidelines and best practices for working with big timeseries data.
Requirements
- Knowledge of scripting languages and web-based APIs.
- Knowledge of scientific writing.
- Basic knowledge of data visualization.
Environment
The project results will be a summary of the evaluation of different tools.