WordStream-Extension

Take your analysis to the next level

Description

The WordStream Sentiment Analysis Visualization is an extension of the original WordStream tool, developed as part of the Visualization 2 course project. This project introduces a new tab to WordStream dedicated to sentiment analysis, which combines word cloud visualization with temporal analysis. The extension supports multiple datasets, interactive features, and customizable parameters to analyze and explore the evolution of sentiment over time.

This extension enhances WordStream’s capabilities by incorporating sentiment analysis, allowing users to explore how sentiment evolves over time across multiple datasets.

Paper

"WordStream" by Dang et al. (2019) presents an innovative interactive visualization tool for analyzing and illustrating the evolution of topics over time. It synthesizes two popular techniques, word clouds and stacked graphs, to create a hybrid visualization method that provides both temporal and spatial insights into text data. The tool is evaluated on datasets like political blogs, news articles, and academic publications.

The key contributions of the "WordStream" paper include the development of a hybrid visualization method that combines word clouds and stacked graphs. Word clouds represent important terms with varying font sizes based on frequency or significance, while stacked graphs depict temporal trends of topics, with stream layers representing the evolution of topic significance over time. The integration of word clouds within stream layers optimizes space usage and visually links terms to their corresponding time periods.

The tool was implemented as an interactive prototype in D3.js, enabling users to explore topic trends dynamically. A space-sharing approach maximizes term placement efficiency while preserving temporal context, and users can customize visual settings such as font scaling, the number of displayed terms, and layout dimensions. Terms are positioned within stream layers by a spiral placement algorithm that keeps the layout compact and avoids collisions. They are arranged to reflect their temporal context and stream orientation, which gives the visualization an intuitive flow.
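The idea behind spiral placement with collision avoidance can be illustrated with a minimal sketch. This is not the WordStream implementation: it assumes a rectangular region instead of a curved stream layer, approximates each term by its bounding box, and uses hypothetical names (`Box`, `fits`, `place_terms`) and parameters (`step`, `growth`).

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box of a term, described by its centre and size."""
    x: float
    y: float
    w: float
    h: float

    def overlaps(self, other):
        # Two boxes overlap when their centres are closer than the
        # sum of their half-extents on both axes.
        return (abs(self.x - other.x) * 2 < self.w + other.w
                and abs(self.y - other.y) * 2 < self.h + other.h)


def fits(box, region):
    """Return True if the box lies entirely inside region = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = region
    return (box.x - box.w / 2 >= x0 and box.x + box.w / 2 <= x1
            and box.y - box.h / 2 >= y0 and box.y + box.h / 2 <= y1)


def place_terms(sizes, region, step=0.1, growth=0.5):
    """Place (width, height) terms along an Archimedean spiral from the centre.

    Terms are tried in the given order (most important first). A term that
    finds no free position inside the region is skipped.
    """
    x0, y0, x1, y1 = region
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    max_radius = math.hypot(x1 - x0, y1 - y0) / 2
    placed = []
    for w, h in sizes:
        t = 0.0
        while growth * t <= max_radius:
            r = growth * t
            cand = Box(cx + r * math.cos(t), cy + r * math.sin(t), w, h)
            if fits(cand, region) and not any(cand.overlaps(p) for p in placed):
                placed.append(cand)
                break
            t += step
    return placed
```

Calling `place_terms([(40, 20), (30, 15)], (0, 0, 200, 100))` puts the first term at the centre of the region and moves the second outward along the spiral until it no longer collides with the first.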

The tool was evaluated in two ways. Quantitative metrics, such as compactness (the coverage efficiency of terms within layers), assessed layout quality across datasets. Qualitative feedback from informal studies with domain experts highlighted the tool’s usability for longitudinal trend analysis, as well as its limitations in handling highly cluttered streams and in showing term relationships explicitly.

Implementation

The implementation of the WordStream-Extension project involved several key components, including data acquisition and preprocessing, extending the existing WordStream visualization, and enhancing interactivity using D3.js.

Data Acquisition and Preprocessing

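We integrated three new datasets into WordStream: Rotten Tomatoes movie reviews, CNN news articles, and Reddit posts from the /datasets subreddit. The preprocessing pipeline for these datasets covered data cleaning, keyword extraction, sentiment analysis, and aggregation by year (see Additional Datasets).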

Other datasets, containing social media posts and fact-check articles, were also added. However, for reasons we could not determine, these datasets proved difficult to visualize in WordStream itself. So that the effort invested in them is not wasted, they can be visualized only in the SentimentCloud and SentimentStream tabs.

Extending the WordStream Visualization

Building upon the original WordStream tool, we introduced two new visualization tabs: SentimentCloud and SentimentStream.

Enhancing Interactivity with D3.js

To improve user engagement and interactivity, we used D3.js to implement two features. Dynamic sliders: interactive sliders for sentiment thresholds and word ranking update the visualization in real time based on user input. Diverging color schemes: we moved from categorical to diverging color schemes to better represent the spectrum of sentiments.
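The filtering that such sliders drive can be sketched as follows. The project does this in the browser with D3.js. The Python below only illustrates the logic, and the function name, the `sentiment` and `frequency` field names, and the choice of ranking by frequency are assumptions, not the project's actual code.

```python
def filter_terms(terms, low, high, top_n):
    """Keep terms whose sentiment lies in [low, high], then the top_n by frequency.

    `terms` is a list of dicts with hypothetical keys "text", "sentiment"
    (a score in [-1, 1]) and "frequency" (a count).
    """
    kept = [t for t in terms if low <= t["sentiment"] <= high]
    kept.sort(key=lambda t: t["frequency"], reverse=True)
    return kept[:top_n]


terms = [
    {"text": "great", "sentiment": 0.8, "frequency": 12},
    {"text": "boring", "sentiment": -0.6, "frequency": 30},
    {"text": "plot", "sentiment": 0.1, "frequency": 45},
    {"text": "cast", "sentiment": 0.3, "frequency": 20},
]

# Slider state: sentiment range [0.0, 1.0], show at most two words.
visible = filter_terms(terms, 0.0, 1.0, 2)
```

Each time a slider moves, the visualization would recompute the visible terms in this way and redraw only those.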

For an explanation of the classes and functions we created and used, see Code Documentation.

Program

Running the Application

To run the WordStream-Extension application on the web, click here. Note that this version does not display some of the data because of GitHub's file size limit. To visualize all datasets correctly, run the application locally by following the steps described below.

Ensure you have Python installed on your machine. Then, open the command line, navigate to the folder where WordStream-Extension is located, and execute the following command:

python -m http.server 8000

This command starts a simple HTTP server on port 8000. Navigate to http://localhost:8000 in your web browser to access the application.

Additional Datasets

We expanded the original WordStream with several new datasets to enhance analysis capabilities:

Datasets GIF

Each dataset underwent extensive preprocessing, including data cleaning, keyword extraction with spaCy, sentiment analysis using VADER, and aggregation by year to facilitate temporal visualization.
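The final aggregation step can be sketched with the standard library alone. This sketch assumes that keywords and a sentiment score have already been computed for each document (for example, VADER's compound score, which lies in [-1, 1]). The record layout and field names are hypothetical and may differ from the project's actual data files.

```python
from collections import defaultdict


def aggregate_by_year(records):
    """Group keyword counts and mean sentiment by year.

    Each record is a dict with hypothetical keys:
      "date"      - ISO date string such as "2019-03-01"
      "keywords"  - list of extracted keywords
      "sentiment" - document-level score in [-1, 1]
    """
    # totals[year][keyword] = [occurrence count, summed sentiment]
    totals = defaultdict(lambda: defaultdict(lambda: [0, 0.0]))
    for rec in records:
        year = rec["date"][:4]
        for keyword in rec["keywords"]:
            slot = totals[year][keyword]
            slot[0] += 1
            slot[1] += rec["sentiment"]
    return {
        year: {
            kw: {"frequency": n, "sentiment": total / n}
            for kw, (n, total) in keywords.items()
        }
        for year, keywords in totals.items()
    }


records = [
    {"date": "2019-03-01", "keywords": ["film", "plot"], "sentiment": 0.5},
    {"date": "2019-07-10", "keywords": ["film"], "sentiment": -0.5},
    {"date": "2020-01-05", "keywords": ["plot"], "sentiment": 0.25},
]
by_year = aggregate_by_year(records)
```

Here each keyword inherits the sentiment of the documents it appears in, averaged per year. This is one simple choice, and scoring keywords individually would be an equally reasonable alternative.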

SentimentCloud Tab

The SentimentCloud tab offers an interactive word cloud that visualizes sentiment scores:

SentimentCloud GIF

SentimentStream Tab

The SentimentStream tab integrates temporal analysis with sentiment visualization:

Sentiment scores are visually represented using a diverging color scheme for clarity:

Stream GIF
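A diverging color scheme maps the two ends of the sentiment scale to contrasting hues that meet at a neutral midpoint. The sketch below shows the idea with plain linear interpolation. The specific colors, the function name, and the assumption that scores lie in [-1, 1] are illustrative and are not taken from the project's D3.js code.

```python
def diverging_color(score,
                    negative=(202, 0, 32),
                    neutral=(247, 247, 247),
                    positive=(5, 113, 176)):
    """Map a sentiment score in [-1, 1] to a hex color.

    Negative scores blend from the neutral color toward `negative`,
    positive scores toward `positive`. Out-of-range scores are clamped.
    """
    s = max(-1.0, min(1.0, score))
    end = positive if s >= 0 else negative
    t = abs(s)
    rgb = [round(n + (e - n) * t) for n, e in zip(neutral, end)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


print(diverging_color(0.0))   # neutral midpoint: #f7f7f7
print(diverging_color(1.0))   # fully positive:   #0571b0
print(diverging_color(-1.0))  # fully negative:   #ca0020
```

Compared with a categorical palette, this makes both the polarity and the strength of a word's sentiment readable from its color alone.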

References

Dang, T., Nguyen, H. N., & Pham, V. (2019). WordStream: Interactive Visualization for Topic Evolution. In J. Johansson, F. Sadlo, & G. E. Marai (Eds.), EuroVis 2019 - Short Papers. The Eurographics Association. https://doi.org/10.2312/evs.20191178

iDataVisualizationLab. (2019). WordStream [Source code]. GitHub. https://github.com/iDataVisualizationLab/WordStream

Cui, W., Liu, S., Tan, L., Shi, C., Song, Y., Gao, Z., Qu, H., & Tong, X. (2011). TextFlow: Towards Better Understanding of Evolving Topics in Text. IEEE Transactions on Visualization and Computer Graphics, 17(12), 2412-2421. https://doi.org/10.1109/TVCG.2011.239

Liu, S., Zhou, M. X., Pan, S., Qian, W., Cai, W., & Lian, X. (2009). Interactive, topic-based visual text summarization and analysis. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (pp. 543-552). Association for Computing Machinery. https://doi.org/10.1145/1645953.1646023

Wang, X., Liu, S., Chen, Y., Peng, T.-Q., Su, J., Yang, J., & Guo, B. (2016). How ideas flow across multiple social groups. In 2016 IEEE Conference on Visual Analytics Science and Technology (VAST) (pp. 51-60). IEEE. https://doi.org/10.1109/VAST.2016.7883511

Rotten Tomatoes Movies and Critic Reviews Dataset

CNN Articles After Basic Cleaning

The Reddit Dataset

Social Media Sentiments Analysis Dataset

Fake and Real News Dataset