Data Modeling with Cassandra
Sparkify Data Modeling
A startup called Sparkify wants to analyze the data they've been collecting on songs and user activity in their new music streaming application. The analytics team is particularly interested in understanding which songs users are listening to. Currently, they have no easy way to query their data, which resides in a directory of JSON logs of user activity in the application, along with a directory of JSON metadata on the songs in the application.
They'd like a data engineer to create an Apache Cassandra database that can run queries on song play data to answer their questions and yield meaningful insights. The goal of this project is to create a database schema and an ETL pipeline for this analysis.
We model the data with Apache Cassandra and build an ETL pipeline in Python. The pipeline reads a set of CSV files from a directory, consolidates them into a single streamlined CSV file, and uses that file to populate Apache Cassandra tables. Because Cassandra tables are designed around the queries they serve, we create a separate denormalized table for each query, choosing partition keys and clustering columns to match.
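The consolidation step described above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual pipeline: the function name `consolidate_events`, the `KEEP_COLUMNS` subset, and the file layout are all assumptions made for the example.

```python
import csv
import glob
import os

# Hypothetical subset of event columns to keep; adjust to the queries being modeled.
KEEP_COLUMNS = ["artist", "firstName", "lastName", "itemInSession",
                "length", "sessionId", "song", "userId"]


def consolidate_events(input_dir, output_path, columns=KEEP_COLUMNS):
    """Merge every *.csv file in input_dir into one CSV holding only `columns`.

    Returns the number of data rows written (the header is not counted).
    """
    written = 0
    with open(output_path, "w", newline="", encoding="utf-8") as out:
        # extrasaction="ignore" drops any source column not listed in `columns`.
        writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        for path in sorted(glob.glob(os.path.join(input_dir, "*.csv"))):
            # Skip the output file itself if it lives in the input directory.
            if os.path.abspath(path) == os.path.abspath(output_path):
                continue
            with open(path, newline="", encoding="utf-8") as src:
                for row in csv.DictReader(src):
                    writer.writerow(row)
                    written += 1
    return written
```

Calling `consolidate_events("event_data", "event_data_consolidated.csv")` (both paths are placeholders) would produce one file with a single header row, which is easier to iterate over when inserting into Cassandra than many small files.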
Event Dataset
The event dataset is a collection of CSV files containing information about user activity over a period of time. Each file contains information about the songs played, the users, and other attributes.
Available data columns:
artist, auth, firstName, gender, itemInSession, lastName, length, level, location, method, page, registration, sessionId, song, status, ts, userId
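To illustrate how these columns might feed a query-specific table, here is a hedged sketch for one hypothetical question: "which songs were played in a given session, and in what order?". The table name `song_plays_by_session`, the keyspace `sparkify`, the host address, and the column types are assumptions for the example. The loader uses the DataStax `cassandra-driver` package and needs a reachable Cassandra node, so the import is kept inside the function.

```python
# Partition key: session_id (all plays of one session live in one partition).
# Clustering column: item_in_session (rows are ordered within the partition).
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS song_plays_by_session (
    session_id int,
    item_in_session int,
    artist text,
    song text,
    length double,
    PRIMARY KEY ((session_id), item_in_session)
)
"""

INSERT_ROW = """
INSERT INTO song_plays_by_session
    (session_id, item_in_session, artist, song, length)
VALUES (%s, %s, %s, %s, %s)
"""

SELECT_BY_SESSION = """
SELECT item_in_session, artist, song, length
FROM song_plays_by_session
WHERE session_id = %s
"""


def row_to_params(row):
    """Convert one CSV row (a dict of strings) into typed INSERT parameters."""
    return (
        int(row["sessionId"]),
        int(row["itemInSession"]),
        row["artist"],
        row["song"],
        float(row["length"]),
    )


def load_rows(rows, hosts=("127.0.0.1",), keyspace="sparkify"):
    """Create the table if needed and insert rows (requires cassandra-driver)."""
    from cassandra.cluster import Cluster  # imported lazily; optional dependency

    cluster = Cluster(list(hosts))
    session = cluster.connect()
    try:
        session.execute(
            f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
            "{'class': 'SimpleStrategy', 'replication_factor': 1}"
        )
        session.set_keyspace(keyspace)
        session.execute(CREATE_TABLE)
        for row in rows:
            session.execute(INSERT_ROW, row_to_params(row))
    finally:
        cluster.shutdown()
```

The primary key is chosen to match the `WHERE` clause of the query: filtering on `session_id` alone touches a single partition, and `item_in_session` keeps each partition's rows unique and sorted. A different question would generally call for a different table with its own key.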
Follow this link for more information.
Create a Data Model for an Email System
Follow this link for more information.
Create a Data Model for a Digital Music Library
Follow this link for more information.
Create a Data Model for Temperature Monitoring Sensor Networks
Follow this link for more information.
Create a Data Model for Investment Accounts or Portfolios
Follow this link for more information.
Create a Data Model for Online Shopping Carts
Follow this link for more information.
Hotel Reservations Data Modeling
Read the Description here.
Follow this link for more information.