Unlock the potential of sentiment analysis using Snowflake-Snowpark-Python and Amazon Beauty product reviews. By combining Snowpark's processing capabilities with ThoughtSpot’s AI-Powered Analytics, organizations can make complex data science insights readily available to their business users.
In this walkthrough, we use Amazon Beauty product review data to perform sentiment analysis: we process the data with Snowpark Python and visualize the results in ThoughtSpot.
Download VSCode and install it by following the installation guide for Mac, Linux, or Windows.
Open VSCode, click the Extensions icon, then search for and install the ‘Jupyter’ and ‘Python’ extensions.
To perform sentiment analysis, install the required libraries from the terminal using 'pip install'.
Create a new .json file with your Snowflake account details for connection.
{
    "account": "ADD YOUR SNOWFLAKE ACCOUNT NAME",
    "user": "***********",
    "password": "***********",
    "role": "USER_ROLE",
    "warehouse": "WAREHOUSE_NAME",
    "database": "DBNAME",
    "schema": "PUBLIC"
}
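One way to use this file is sketched below. It assumes the file is saved as `connection.json` (a hypothetical name) and that the `snowflake-snowpark-python` package is installed. The helper names `load_connection_params` and `create_session` are ours, not part of Snowpark.

```python
import json
from pathlib import Path

# Keys taken from the connection file shown above.
REQUIRED_KEYS = {"account", "user", "password", "role", "warehouse", "database", "schema"}


def load_connection_params(path):
    """Read the connection file and check that every expected key is present."""
    params = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = REQUIRED_KEYS - params.keys()
    if missing:
        raise KeyError(f"connection file is missing keys: {sorted(missing)}")
    return params


def create_session(path="connection.json"):
    """Open a Snowpark session from the connection file (file name is an assumption)."""
    # Imported lazily so the loader above can be used without Snowpark installed.
    from snowflake.snowpark import Session

    return Session.builder.configs(load_connection_params(path)).create()
```

Keeping the credentials in a separate file means they stay out of the notebook itself. Make sure the file is excluded from version control.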
In the script, replace the placeholder AWS keys with your own keys so that it can read the JSON data from the S3 bucket.
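import snowflake.snowpark.types as T  # assumed source of T.Variant

def read_json(bucket: str, filename: str) -> T.Variant:
    # Imports sit inside the function body, as in the original script.
    import boto3
    import json
    import pandas as pd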
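The body of the function is not shown above, so here is a minimal sketch of what reading the data could look like. It is not the original script. It assumes each file holds one JSON object per line (adjust the parser if yours is a single JSON array) and that the keys are passed in as arguments. The names `parse_json_lines` and `read_json_from_s3` are hypothetical.

```python
import json


def parse_json_lines(text):
    """Parse text holding one JSON object per line, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def read_json_from_s3(bucket, filename, access_key, secret_key):
    """Download one object from S3 and parse it as JSON Lines."""
    # boto3 is imported lazily; credentials are passed in rather than hard-coded.
    import boto3

    s3 = boto3.client(
        "s3",
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
    body = s3.get_object(Bucket=bucket, Key=filename)["Body"].read()
    return parse_json_lines(body.decode("utf-8"))
```

Separating the parsing from the download makes the parser easy to try on a local sample before touching AWS.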
The Amazon review dataset consists of two files: AllBeauty.json and metaAll_Beauty.json. Scripts are provided to read and process the data from AWS.
Use the scripts to clean and merge the two files, creating a single dataset ready for analysis.
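To make the cleaning and merging step concrete, here is a dependency-free sketch. It is not the provided script. The field names `asin`, `reviewerID`, and `reviewText` are assumptions based on the public Amazon review files, and the cleaning rules (drop empty and duplicate reviews) are only examples.

```python
def clean_reviews(reviews):
    """Drop reviews without text and remove exact duplicates."""
    seen, cleaned = set(), []
    for review in reviews:
        text = (review.get("reviewText") or "").strip()
        key = (review.get("asin"), review.get("reviewerID"), text)
        if not text or key in seen:
            continue
        seen.add(key)
        cleaned.append({**review, "reviewText": text})
    return cleaned


def merge_with_metadata(reviews, metadata):
    """Inner-join reviews to product metadata on the product id (asin)."""
    meta_by_asin = {m["asin"]: m for m in metadata}
    return [
        {**meta_by_asin[r["asin"]], **r}  # review fields win on name clashes
        for r in reviews
        if r.get("asin") in meta_by_asin
    ]
```

With pandas, the same join is usually written as `reviews_df.merge(meta_df, on="asin")`.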
Using helper functions, calculate VADER sentiment scores. The cleaned data, with sentiment scores included, is then written to a Snowflake table named ‘BEAUTY_PRODUCT_REVIEWS’ for further analysis.
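The helper functions are not reproduced here, so the following is only a sketch of the idea. It assumes the `vaderSentiment` package (NLTK ships an equivalent analyzer) and keeps the compound score, which ranges from -1 to 1. The names `add_sentiment`, `save_rows`, and the `SENTIMENT_SCORE` column are hypothetical.

```python
def add_sentiment(rows, score_fn=None, text_key="reviewText"):
    """Attach a VADER compound score (between -1 and 1) to each review row."""
    if score_fn is None:
        # Imported lazily; assumes the vaderSentiment package is installed.
        from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

        analyzer = SentimentIntensityAnalyzer()

        def score_fn(text):
            return analyzer.polarity_scores(text)["compound"]

    return [{**row, "SENTIMENT_SCORE": score_fn(row[text_key])} for row in rows]


def save_rows(session, rows, table_name="BEAUTY_PRODUCT_REVIEWS"):
    """Write the rows to a Snowflake table through an open Snowpark session."""
    import pandas as pd

    session.write_pandas(
        pd.DataFrame(rows), table_name, auto_create_table=True, overwrite=True
    )
```

Passing `score_fn` explicitly lets you swap in another scorer, or a stub, without changing the rest of the pipeline.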
Apply the text2emotion library to determine the emotions expressed in each review. The results are stored in a Snowflake table named ‘EMOTIONS_OVERALL’.
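As a rough illustration, text2emotion's `get_emotion` returns a score for each of five emotions (Happy, Angry, Surprise, Sad, Fear). The sketch below attaches those scores to each review and averages them. Averaging is our assumption about what an "overall" table might hold, and the helper names are hypothetical.

```python
EMOTIONS = ("Happy", "Angry", "Surprise", "Sad", "Fear")


def add_emotions(rows, emotion_fn=None, text_key="reviewText"):
    """Attach one upper-cased column per emotion to each review row."""
    if emotion_fn is None:
        # Imported lazily; assumes the text2emotion package is installed.
        import text2emotion as te

        emotion_fn = te.get_emotion
    out = []
    for row in rows:
        scores = emotion_fn(row[text_key])
        out.append({**row, **{e.upper(): scores.get(e, 0.0) for e in EMOTIONS}})
    return out


def overall_emotions(rows):
    """Average each emotion score across all reviews."""
    n = len(rows)
    return {
        e.upper(): (sum(r[e.upper()] for r in rows) / n if n else 0.0)
        for e in EMOTIONS
    }
```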
Categorize products and analyze them by category. The results can be stored and analyzed in another Snowflake table, ‘Category Analysis’. The provided code extracts only the necessary columns from the refined dataset. Notably, it filters the data to the top 10 product categories, focusing on those with substantial review counts.
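The filter can be sketched as follows. This is an illustration rather than the provided code: it assumes each row carries a single `category` value and ranks categories by number of reviews. The function names are hypothetical.

```python
from collections import Counter


def top_categories(rows, n=10, category_key="category"):
    """Return the n categories with the most reviews, most reviewed first."""
    counts = Counter(row[category_key] for row in rows if row.get(category_key))
    return [name for name, _ in counts.most_common(n)]


def filter_to_top_categories(rows, n=10, category_key="category", columns=None):
    """Keep rows in the top-n categories, optionally selecting only some columns."""
    keep = set(top_categories(rows, n, category_key))
    selected = [row for row in rows if row.get(category_key) in keep]
    if columns is None:
        return selected
    return [{c: row.get(c) for c in columns} for row in selected]
```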
Use the sentiment scores to classify each product as recommended or not recommended, based on the sentiment expressed in its reviews. We set the classification threshold at 0.75. Store the recommendations in a Snowflake table named ‘RECOMMENDATIONS’ for easy retrieval and analysis.
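A minimal sketch of the thresholding step follows. The text gives the 0.75 threshold but not what it is applied to. Here we assume it is compared against each product's average sentiment score and that a score equal to the threshold counts as recommended. The `asin` and `SENTIMENT_SCORE` field names are also assumptions.

```python
from collections import defaultdict

THRESHOLD = 0.75  # classification threshold stated in the text


def recommend(rows, threshold=THRESHOLD):
    """Flag each product as recommended when its mean sentiment meets the threshold."""
    scores = defaultdict(list)
    for row in rows:
        scores[row["asin"]].append(row["SENTIMENT_SCORE"])
    results = []
    for asin, values in scores.items():
        average = sum(values) / len(values)
        results.append(
            {"ASIN": asin, "AVG_SENTIMENT": average, "RECOMMENDED": average >= threshold}
        )
    return results
```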
Examining sentiment trends over time involves tracking how sentiments expressed in reviews change across different periods. This analysis helps uncover patterns and fluctuations in sentiments, offering valuable insights into customer opinions and reactions. Store this data in the ‘SENTIMENT_TRENDS’ Snowflake table.
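One way to build such a trend is to bucket reviews by month and average the scores in each bucket. The sketch below assumes each row has a Unix timestamp in `unixReviewTime` and a `SENTIMENT_SCORE` column. Both names and the monthly granularity are our choices for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def monthly_sentiment(rows, time_key="unixReviewTime", score_key="SENTIMENT_SCORE"):
    """Average sentiment per calendar month (UTC), in chronological order."""
    buckets = defaultdict(list)
    for row in rows:
        when = datetime.fromtimestamp(row[time_key], tz=timezone.utc)
        buckets[when.strftime("%Y-%m")].append(row[score_key])
    return [
        {"MONTH": month, "AVG_SENTIMENT": sum(v) / len(v), "REVIEW_COUNT": len(v)}
        for month, v in sorted(buckets.items())
    ]
```

Keeping the review count next to the average helps when reading the chart, since a month with very few reviews can swing sharply.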
Once ThoughtSpot is connected to Snowflake, its AI-Powered Analytics lets anyone ask and answer questions about our Snowpark machine-learning results. That’s Snowpark for ThoughtSpot, bridging the gap between data science and business outcomes.