Featured Project: Netflix Exploratory Data Analysis (EDA)

This project investigates patterns in Netflix's catalog to understand which factors influence Netflix's decisions to produce or acquire particular movies and TV shows. It focuses on three factors: release periods, genres, and maturity ratings. Examining these together shows how Netflix's production strategy and audience targeting shape its media library.

Sample Code

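    import pandas as pd

    # Assumes `dataset` (a DataFrame) and `connection` (an open database
    # connection) already exist.

    # Change date_added to a consistent date format; unparseable values become NaT
    dataset['date_added'] = pd.to_datetime(dataset['date_added'], errors='coerce')
    dataset['date_added'] = dataset['date_added'].dt.strftime('%Y-%m-%d')

    # Update the table with the new dates
    dataset.to_sql('db_flix', connection, if_exists='replace', index=False)
    print("Table updated successfully")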
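
The reason for storing dates as YYYY-MM-DD strings can be shown with a small sketch. It assumes the database is SQLite (the later query uses SQLite's strftime) and that the raw dates look like "September 25, 2021"; the toy table name `toy_flix` and the helper `years_in_table` are made up for illustration and are not part of the project.

```python
import sqlite3

import pandas as pd

# Toy rows; the raw date format here is an assumption about the source data.
toy = pd.DataFrame({
    "title": ["A", "B"],
    "date_added": ["September 25, 2021", "January 12, 2020"],
})

connection = sqlite3.connect(":memory:")


def years_in_table(frame):
    """Write the frame to a scratch table and extract the year in SQL."""
    frame.to_sql("toy_flix", connection, if_exists="replace", index=False)
    query = "SELECT strftime('%Y', date_added) AS year FROM toy_flix"
    return pd.read_sql(query, connection)["year"].tolist()


# SQLite's date functions only understand ISO-8601 style strings,
# so the raw text dates yield NULL years.
before = years_in_table(toy)

# After normalizing to YYYY-MM-DD, the same query returns the year.
normalized = toy.copy()
normalized["date_added"] = (
    pd.to_datetime(normalized["date_added"], errors="coerce").dt.strftime("%Y-%m-%d")
)
after = years_in_table(normalized)

print(before, after)
```

With the normalized strings in place, year-based grouping can be done directly in SQL rather than in pandas.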

Sample Graph

    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns

    query = """
    SELECT strftime('%Y', date_added) AS year, type, listed_in
    FROM db_flix
    WHERE type LIKE 'tv show'
    """
    result = pd.read_sql(query, connection)

    # Split multi-genre entries into one row per genre
    # (separate_rows is a helper defined elsewhere in the project)
    result = separate_rows(result, 'listed_in')

    # Count titles per year and genre
    year_genre_count = result.value_counts(['year', 'listed_in']).reset_index(name='total')

    # Sort so the years run in order along the x-axis
    year_genre_count = year_genre_count.sort_values(by='year', ascending=True)

    # Use seaborn to create a grouped bar graph
    plt.figure(figsize=(12, 5))
    graph_bar = sns.barplot(data=year_genre_count, x='year', y='total',
                            hue='listed_in', palette='tab20')
    plt.title('Figure 3.2:\nNetflix Yearly Content Distribution by Genre (TV)')
    plt.xlabel('Year')
    plt.ylabel('Total')
    plt.legend(title='Genres', fontsize=8)
    plt.tight_layout()
    plt.show()
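
The `separate_rows` helper called above is not shown in this sample. A minimal sketch of what such a helper might look like follows, assuming `listed_in` holds comma-separated genre names; this is a hypothetical stand-in, not the project's actual implementation, and the `sep` parameter is an added assumption.

```python
import pandas as pd


def separate_rows(frame, column, sep=","):
    """Hypothetical stand-in: return a copy with one row per value in `column`."""
    out = frame.copy()
    # Turn "TV Dramas, TV Comedies" into ["TV Dramas", " TV Comedies"]
    out[column] = out[column].str.split(sep)
    # Give each list element its own row, repeating the other columns
    out = out.explode(column)
    # Remove the whitespace left over from splitting on the separator
    out[column] = out[column].str.strip()
    return out.reset_index(drop=True)


demo = pd.DataFrame({
    "year": ["2020", "2021"],
    "listed_in": ["TV Dramas, TV Comedies", "Kids' TV"],
})
long_form = separate_rows(demo, "listed_in")
print(long_form)
```

Because a title with several genres becomes several rows, the per-genre totals in the chart can add up to more than the number of titles added that year.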

This EDA uncovered trends that provide a starting point for deeper research into timing, genre patterns, and target-audience strategies. Two possible directions for expanding the analysis:

  • One direction is to examine how regional preferences influence Netflix's decisions about content acquisition and production.
  • Another is to study the gap between a title's original release date and the date it was added to Netflix, which could reveal patterns in Netflix's timing decisions.
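
As a rough illustration of the second direction, the gap could be computed along these lines. This is only a sketch on toy data: it assumes the dataset has a `release_year` column alongside `date_added`, and the `years_to_netflix` column name is invented here.

```python
import pandas as pd

# Toy rows; `release_year` is an assumed column, not confirmed by the samples above.
titles = pd.DataFrame({
    "title": ["A", "B", "C"],
    "release_year": [2015, 2020, 2019],
    "date_added": ["2019-06-01", "2020-03-15", None],
})

# Year the title was added; missing or unparseable dates become NaN
added_year = pd.to_datetime(titles["date_added"], errors="coerce").dt.year

# Years between original release and arrival on Netflix
titles["years_to_netflix"] = added_year - titles["release_year"]

print(titles[["title", "years_to_netflix"]])
```

Grouping this column by genre or maturity rating would be one way to connect it back to the three factors studied here.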

More information on my GitHub