The Business of Show Business: A Deep Dive into the Film Industry

Fatima W. | 10/18/2024

Theatre in London, Photo by Rawpixel under Creative Commons

What comes to mind when you’re picking a movie to watch? Do you look for a certain genre, actors/actresses, or directors/producers? There’s hundreds of thousands of movies that exist, and there’s thousands more to be made. One important fact about the entertainment industry is that its consumers have an insatiable demand for content – but not just any content. Surely, there are thousands of lesser known films/television shows that flopped or didn’t receive the recognition it may have deserved. At the same time, there are hundreds of movies deemed successful in their own ways, whether it’s due to earned box office sales, critic reviews, award nominations, or average rating. With that being said, what determines the success of a movie? The motivation of this project is to not only understand what makes a film successful, but to develop a non-financial success metric to be used in deciding what kind of movie should be remade, and how. So grab some popcorn and get comfy, you’re in for a treat!

The Data Explained

To begin the analysis, we’ll be using a collection of non-commercial datasets from IMDb, the Internet Movie Database that provides information on millions of films/television shows. These databases include information about movies/TV series, the cast and crew members, average ratings submitted by users, number of people who have rated these films, and more. More detailed information can be found here, however I’ve included a mapped graphic to help visualize the breakdown of each dataset I’ll be using in conjunction with one another. This mapping will prove to be useful to refer to when analyzing the coding portions in this analysis.

Visual Mapping of Data Sets
Click to view code
# load in all our data and packages

library(dplyr)
library(ggplot2)
library(stringr)
library(DT)
library(tidyr)

get_imdb_file <- function(fname){
  BASE_URL <- "https://datasets.imdbws.com/"
  fname_ext <- paste0(fname, ".tsv.gz")
  if(!file.exists(fname_ext)){
    FILE_URL <- paste0(BASE_URL, fname_ext)
    download.file(FILE_URL, 
                  destfile = fname_ext)
  }
  as.data.frame(readr::read_tsv(fname_ext, lazy=FALSE))
}
Click to view code
NAME_BASICS      <- get_imdb_file("name.basics")

TITLE_BASICS     <- get_imdb_file("title.basics")

TITLE_EPISODES   <- get_imdb_file("title.episode")

TITLE_RATINGS    <- get_imdb_file("title.ratings")

TITLE_CREW       <- get_imdb_file("title.crew")

TITLE_PRINCIPALS <- get_imdb_file("title.principals")

Cleaning the Data for Usability

Because the data sets are so large, we’ll restrict our attention to people with at least two “known for” credits within the NAME_BASICS table. Since there are also a long list of obscure records, we’ll filter out and remove titles with less than 100 ratings. We can see that these titles make up about 75% of the entire data set. The same filtering will be performed on our other data sets.

Click to view code
NAME_BASICS <- NAME_BASICS |> 
  filter(str_count(knownForTitles, ",") > 1)

TITLE_RATINGS |>
    pull(numVotes) |>
    quantile()
     0%     25%     50%     75%    100% 
      5      11      26     101 2950150 
Click to view code
TITLE_RATINGS <- TITLE_RATINGS |>
  filter(numVotes >= 100)

TITLE_BASICS <- TITLE_BASICS |>
  semi_join(TITLE_RATINGS, 
            join_by(tconst == tconst))

TITLE_CREW <- TITLE_CREW |>
  semi_join(TITLE_RATINGS, 
            join_by(tconst == tconst))

TITLE_EPISODES_1 <- TITLE_EPISODES |>
  semi_join(TITLE_RATINGS, 
            join_by(tconst == tconst))
TITLE_EPISODES_2 <- TITLE_EPISODES |>
  semi_join(TITLE_RATINGS, 
            join_by(parentTconst == tconst))

TITLE_EPISODES <- bind_rows(TITLE_EPISODES_1,
                            TITLE_EPISODES_2) |>
  distinct()

TITLE_PRINCIPALS <- TITLE_PRINCIPALS |>
  semi_join(TITLE_RATINGS, join_by(tconst == tconst))

rm(TITLE_EPISODES_1)
rm(TITLE_EPISODES_2)

After extracting our data, our first task is to ensure each variable is properly assigned to its data type or mode. Most columns in these datasets are read in as character vectors, however some should be classified as numeric or logical. So we’ll clean these columns in each table.

Click to view code
NAME_BASICS <- NAME_BASICS |>
  mutate(birthYear = as.numeric(birthYear),
         deathYear = as.numeric(deathYear))

TITLE_BASICS <- TITLE_BASICS |>
  mutate(isAdult = as.logical(isAdult),
         startYear = as.numeric(startYear),
         endYear = as.numeric(endYear),
         runtimeMinutes = as.numeric(runtimeMinutes))

TITLE_EPISODES <- TITLE_EPISODES |>
  mutate(seasonNumber = as.numeric(seasonNumber),
         episodeNumber = as.numeric(episodeNumber))

TITLE_RATINGS <- TITLE_RATINGS |>
  mutate(averageRating = as.numeric(averageRating),
         numVotes = as.numeric(numVotes))

TITLE_CREW <- TITLE_CREW |>
  mutate(tconst = as.character(tconst),
         directors = as.character(directors))

TITLE_PRINCIPALS <- TITLE_PRINCIPALS |>
  mutate(job = as.character(job),
         characters = as.character(characters))

An Exploratory Analysis of the TV Production & Film Industry

Now that we’ve gathered and cleaned our data, we’ll begin analyzing and uncovering insights. Firstly, let’s find out how many movies, TV series, and TV episodes we have present in our data set. Shown below, there are roughly just over 130K movies, 155K TV episodes, and nearly 30K TV series.

Click to view code
title_counts <- TITLE_BASICS |>
    group_by(titleType) |>
  summarize(count = n()) |>
  filter(titleType %in% c('movie', 'tvSeries', 'tvEpisode'))

colnames(title_counts) <- c('Title Type', 'Number of Movies')
title_counts |>
  DT::datatable()

With this many titles across various types of title types, I wanted to know who is the oldest living person in our data set and what their profession is. One important thing to consider, however, is within the NAME_BASICS data set, there is a double meaning to the NA values in the deathYear column. These NA values may indicate that the year of death is “unknown” or “still alive/not dead yet.” According to the Guinness World Records, the oldest person alive in the world is Tomiko Itooka, who was born in 1908. Therefore, to approach this finding, we’ll filter out people who were born in 1908 onward and NA values. As a result, the oldest living person in our data set is Angel Acciaresi, who was an assistant director, director, and writer. Seeing this person’s age, it’s a nice reminder and interesting to think about how long the entertainment industry has existed and how far it’s come.

Click to view code
oldest_person <- NAME_BASICS |>
  filter(is.na(deathYear), birthYear >= 1908) |>
  mutate(age = 2024 - birthYear) |>
  arrange(desc(age)) |>
  select(primaryName, birthYear, primaryProfession, age) |>
  slice(1)

colnames(oldest_person) <- c('Name', 'Year of Birth', 'Profession', 'Age')
oldest_person |>
  DT::datatable()

TV Series Observations

I’m interested in uncovering some insights about TV series productions and their ratings given a baseline of at least 200,000 IMDb ratings. There exists one TV Episode in this data set that fits this criteria, that is, the episode Ozymandias from the American crime drama television series, Breaking Bad. In fact, this episode is ranked the number one “Best TV Episodes” for its perfect 10/10 rating.

Click to view code
perfect_tv_ep <- TITLE_BASICS |>
  filter(titleType == 'tvEpisode') |>
  left_join(TITLE_RATINGS, by = 'tconst') |>
  filter(averageRating == 10, numVotes >= 200000) |>
  select(primaryTitle, genres, averageRating, numVotes)

colnames(perfect_tv_ep) <- c('TV Episode Title', 'Genres', 'Avg. Rating', 'Number of Votes')
perfect_tv_ep |>
  DT::datatable()

I also wanted to know which TV series has the highest average rating. To answer this, I’ll be setting the benchmark to series with more than 12 episodes, and I’ll use the average rating of the TV series as a whole, rather than averaging the sum of each TV series’ episodes’ ratings. As a result, the highest average rating of a TV series is 9.71, which belongs to “The Youth Memories.”

Of all the TV series that exist across cultures worldwide in multiple languages and genres, a Chinese romance drama ranked the highest average rating. What made this specific drama stand out? Could it be due to its cultural appeal – for example, are Chinese dramas more successful/popular than Turkish dramas? Or maybe it’s the series’ romance elements that attract viewers to its show. As someone who occasionally watches C-dramas, I think this finding is interesting and important to consider as we deepen our analysis going further about what makes a movie/series successful.

Specific Movie Observations

Next, I wanted to take a look at well known actors/actresses and the projects they’re known for. More specifically, I’ll take a look at American actor Mark Hamill and the average ratings of movies he took part in. Unsurprisingly, Hamill is known for his role in the Star Wars original and sequel trilogies. However, take a look at the average ratings and the ranking of each sequel. Normally, some may assume that movie sequels are particularly bad, which is, of course, subjective and opinion-based. Keeping this in mind, we see that’s not exactly the case where Star Wars: Episode V - The Empire Strikes Back receives a higher ranking than Star Wars: Episode IV - A New Hope, which was released three years prior.

Click to view code
mark_hamill_top4_projects <- NAME_BASICS |>
  filter(primaryName == 'Mark Hamill') |>
  separate_longer_delim(knownForTitles, ',') |>
  rename(tconst = knownForTitles) |>
  left_join(TITLE_BASICS, by = "tconst") |> 
  left_join(TITLE_RATINGS, by = "tconst") |> 
  arrange(desc(averageRating), desc(numVotes)) |>
  slice_head(n = 4) |>
  select(primaryTitle, startYear, averageRating, numVotes)

colnames(mark_hamill_top4_projects) <- c('Movie Title', 'Year of Release', 'Avg. Rating', 'Number of Votes')
mark_hamill_top4_projects |>
  DT::datatable()

The Rise & Fall of TV Series: Happy Days

Have you ever heard of the phrase “jump the shark”? This is a common idiom used to describe a moment where a once-great show becomes ridiculous and rapidly loses watchers due to its quality. This idiom actually originated from a 1974 American sitcom that ran for 11 seasons, called “Happy Days.” In season 5 episode 3, one of the show’s characters, Fonzie (Henry Winkler) takes on the challenge to prove his bravery by water-skiing over a confined shark in the water. As the series continued on with their seasons, watchers grew tired of the show and mentioned that it was this season’s episode where the entire series began to go downhill, hence the phrase, “jump the shark.” The reason why I bring up this point is to see how the show performed before and after this season’s episode. More specifically, is it true that episodes from later seasons of Happy Days have lower average ratings than the early seasons?

Because there are 11 seasons, we’ll determine seasons 1 through 5 to be “early seasons” and seasons 6 through 11 to be “later seasons.” As shown below, the series indeed had a higher average rating in earlier seasons than that of later seasons.

Click to view code
# Is it true that episodes from later seasons of Happy Days have lower average ratings than the early seasons?

happydays <- TITLE_BASICS |>
  filter(primaryTitle == 'Happy Days', titleType == 'tvSeries') |>
  select(startYear, endYear, tconst)
# Happy Days tconst = tt0070992

happydays_getavg <- TITLE_EPISODES |>
  inner_join(happydays, join_by(parentTconst == tconst)) |>
  left_join(TITLE_RATINGS, join_by(tconst == tconst)) |>
  select(seasonNumber, episodeNumber, averageRating)

happydays_earlyavg <- happydays_getavg |>
  filter(seasonNumber <= 5) |>
  summarize(avg_rating1 = round(mean(averageRating, na.rm = TRUE), 2))

happydays_lateravg <- happydays_getavg |>
  filter(seasonNumber > 5) |>
  summarize(avg_rating2 = round(mean(averageRating, na.rm = TRUE), 2))
  
happydays_avg <- cbind(happydays_earlyavg, happydays_lateravg)
colnames(happydays_avg) <- c('Avg Rating Seasons 1-5', 'Avg Rating Seasons 6-11')

happydays_avg |>
  DT::datatable()

Quantifying Success – Development of a Success Metric

Now that we’ve explored some of our data, our main goal is to propose new movies deemed to be successful. In order to do that, however, we need to come up with a way of measuring the success of a movie given our non-financial data sets, or in other words, IMDb ratings and votes. And while there is no right way to measure success, we’ll assume that successful projects will have both a high average IMDb rating and a large number of ratings, which would indicate quality and broad awareness, respectively.

I had a few approaches in my development of a success metric. Initially, I thought about adding the averageRating with the log of numVotes , where taking the log of vote count will help compress large values. However, this seemed too simple of a calculation to me because I felt that there needed to be some kind of weight added to each factor. I thought about weighing the average rating and number of votes equally, shown below:

Click to view code
title_ratings_test2 <- TITLE_RATINGS |>
  mutate(success_score = round((averageRating * 0.50)  + (log(numVotes) * 0.50), 2)) |>
  arrange(desc(success_score)) |>
  head(10)

title_ratings_test2 |>
  DT::datatable()

This method may have been satisfactory, however I still felt like there is a better way to develop a success metric because I assumed that average rating and number of votes should be weighed equally. Although our data sets contain all votes for each title, I felt that not all votes have the same impact/weight on the final rating. During this process, I wanted to remain mindful of the possibility that people will normally rate and vote on titles that they have strong feelings for, whether they are good or bad. For example, a movie may have been so horrible for someone that they went out of their way to submit a low rating, whereas someone who may have felt indifferent to a movie didn’t bother to submit a rating at all. Had they submitted one though, maybe it would’ve been an average rating like 5/10. I gave one more try where I used the weighted average method and decided to use a benchmark of 10,000 votes as the minimum number of votes that makes a movie successful. To put it into a simple mathematical formula:

Using weighted average to find the success score

Success Score = [(R * v) + (C * m)] / (v + m)

where

R = average rating of each title

v = number of votes of each title

C = the mean rating of all titles

m = minimum number of votes that considers a movie to be successful (10,000 votes)

This ended up being my chosen method, where I also used log in calculating the number of votes.

Click to view code
min_numvotes <- 10000
mean_rating_alltitles <- mean(TITLE_RATINGS$averageRating)

TITLE_RATINGS_FINAL <- TITLE_RATINGS |>
  left_join(TITLE_BASICS, by = 'tconst') |>
  filter(titleType == 'movie') |>
  mutate(success_score = round((averageRating * (log(numVotes))) + 
                            (min_numvotes * mean_rating_alltitles) / ((log(numVotes)) + min_numvotes), 2))
TITLE_RATINGS_FINAL |>
  DT::datatable()

Putting Our Success Metric to Test

Now that the success metric is chosen, we’ll perform data validation by answering a series of questions. First, we’ll choose the top 5-10 movies based on the success metric and confirm that they were indeed box office successes.

Click to view code
top_10movies <- TITLE_RATINGS_FINAL |>
  arrange(desc(success_score)) |>
  select(primaryTitle, averageRating, numVotes, success_score, startYear) |>
  head(10)

colnames(top_10movies) <- c('Movie Title', 'Avg Rating', 'Number of Votes', 'Success Score', 'Year Released')
top_10movies |>
  DT::datatable()

The majority of the movies listed above are indeed box office successes, however there is one exception, that is, The Shawshank Redemption. This movie was considered a box office flop when it was released in 1944, where its initial box office earned only $16 million. Despite this disappointment, the movie performed successfully by shipping VHS rental copies across the U.S., obtaining a much larger audience than it ever did in its box office opening. Although its success wasn’t immediate, word of mouth spread as the movie became a beloved and popular film.

Continuing with our data validation, we’ll now choose 3 to 5 movies with large numbers of IMDb votes yet low success scores and confirm that they are indeed of low quality. Shown below, our results are indeed low quality.

Click to view code
poor_performance_movies <- TITLE_RATINGS_FINAL |>
  arrange(success_score, desc(numVotes)) |>
  select(primaryTitle, averageRating, numVotes, success_score, startYear) |>
  head(5)

colnames(poor_performance_movies) <- c('Movie Title', 'Avg Rating', 'Number of Votes', 'Success Score', 'Year Released')
poor_performance_movies |>
  DT::datatable()

Additionally, we’ll choose a prestigious actor director and confirm that they have many projects with high scores based on our success metric. For this, I’ll test actress Meryl Streep. Shown below, we can see Streep’s various works, many of which performed well with high success scores.

Click to view code
streep_projects <- NAME_BASICS |>
  filter(primaryName == 'Meryl Streep') |>
  left_join(TITLE_PRINCIPALS, by = c('nconst' = 'nconst')) |>
  left_join(TITLE_RATINGS_FINAL, by = c('tconst' = 'tconst')) |>
  select(tconst, primaryTitle, success_score, averageRating) |>
  drop_na() |>
  arrange(desc(success_score)) |>
  select(primaryTitle, averageRating, success_score)

colnames(streep_projects) <- c('Movie Title', 'Avg Rating', 'Success Score')
streep_projects |>
  DT::datatable()

And in the spirit of Halloween and all things spooky, I’ll test 5 horror movies that have won Oscars in the past as my next “spot check” validation. The Oscars are widely considered to be one of the most prestigious awards in the film industry, so we should expect to see these award-winning horror films to have high success scores. Familiar titles shown below prove this to be true.

Click to view code
horror_films <- c('Alien', 'The Exorcist', 'Sleepy Hollow', 'The Silence of the Lambs', 'Beetlejuice')
  
horror_awards <- TITLE_RATINGS_FINAL |>
  filter(primaryTitle %in% horror_films) |>
  select(primaryTitle, averageRating, numVotes, success_score, startYear)

colnames(horror_awards) = c('Movie Title', 'Average Rating', "Number of Votes", 'Success Score', 'Year of Release')
horror_awards |>
  DT::datatable()

Lastly, we’ll come up with a numerical threshold for a project deemed to be successful, which will be the 70th percentile. In other words, movies with success scores higher than 70% of all success score values are determined to be “solid” or better.

Click to view code
threshold <- quantile(TITLE_RATINGS_FINAL$success_score, 0.7, na.rm = TRUE)

Solid_Movies <- TITLE_RATINGS_FINAL |>
  filter(success_score > quantile(TITLE_RATINGS_FINAL$success_score, 0.7, na.rm = TRUE)) |>
  select(primaryTitle, averageRating, numVotes, success_score, startYear, genres)

Solid_Movies |>
  DT::datatable()

Examining Success by Genre & Decade

Now that we have a working proxy for success, it’s time to look at trends in success over time. To begin, let’s take a look at the genre with the most successes in each decade. Shown below, we can see that drama earns the most successes. Because many titles are recorded to have multiple genres, it’s very likely that there is more to be said about these drama titles. For example, dramas that also contain romance or action.

Click to view code
successful_genre <- Solid_Movies |>
  separate_longer_delim(genres, ',') |>
  mutate(decade = (floor(startYear / 10)) * 10) |>
  group_by(decade, genres) |>
  summarize(movie_count = n()) |>
  ungroup() |>
  group_by(decade) |>
  slice_max(movie_count, n = 1) |>
  ungroup()

colnames(successful_genre) = c('Decade', 'Genre', 'Movie Count')
successful_genre |>
  DT::datatable()
Click to view code
successful_genre_plot <- ggplot(successful_genre, aes(x = Decade, y = `Movie Count`, fill = Genre)) +
  geom_col() +
  xlab('Decade') +
  ylab('Number of Movies') +
  theme_bw() +
  scale_fill_brewer(type = 'qual', palette=4) +
  ggtitle('Most Successful Movie Genre by Decade')
  
successful_genre_plot

To dig deeper into these genres, let’s take a look at genres that consistently have the most successes over time. Because IMDb lists 31 movie genres in total, we’ll just look at the top 10 since it’ll be difficult to grasp everything visually. Unsurprisingly, drama is a prevalent genre that continues to thrive throughout the decades, however it’s also worth mentioning that comedy and action genres have also been growing successful. We’ll keep this information in mind as we progress further.

Click to view code
genre_success_by_decade <- Solid_Movies |>
  separate_longer_delim(genres, ',') |>
  mutate(decade = (floor(startYear / 10)) * 10) |>
  group_by(decade, genres) |>
  summarize(movie_count = n()) |>
  ungroup() |>
  group_by(decade) |>
  slice_max(movie_count, n = 10, with_ties = FALSE) |>
  mutate(movies_cumulative = cumsum(movie_count)) |>
  ungroup() |>
  arrange(decade, movies_cumulative)

# It's easier to visualize this output below:

genre_success_by_decade_plot <- ggplot(genre_success_by_decade, aes(x = decade, 
                                y = movie_count, color = genres)) +
  geom_point() +
  xlab('Decade') +
  ylab('Number of Movies') +
  geom_line() +
  theme_bw() +
  scale_color_brewer(type = 'qual', palette = 2) +
  ggtitle('Successful Movie Genres by Decade Accumulated Over Time')

genre_success_by_decade_plot

In more recent years since 2010, we can see the genres shown below with the most successes. Again, drama proves itself to be the leading genre with most successes. Yet it’s crucial to consider that it’s possible these genres only have a large number of successes because there are many productions in that genre. In other words, drama may be an over saturated genre in the entertainment industry.

Click to view code
successes_2010 <- Solid_Movies |>
  separate_longer_delim(genres, ',') |>
  mutate(decade = (floor(startYear / 10)) * 10) |>
  filter(decade >= 2010) |>
  group_by(genres) |>
  summarize(movie_count = n()) |>
  slice_max(movie_count, n = 10) |>
  arrange(desc(movie_count))

successes_2010_plot <- ggplot(successes_2010, aes(x = genres, y = movie_count, fill = genres)) +
  geom_col() +
  xlab('Genre') +
  ylab('Number of Movies') +
  theme_bw() +
  scale_fill_brewer(type = 'qual', palette=8) + 
  ggtitle('Count of Successful Movies by Genre Since 2010')

successes_2010_plot

As we’ve mentioned earlier, many of these drama titles are intertwined with many other genres such as action, adventure, comedy, and sci-fi. We can see this in popular movies released in 2010 onward as most titles fall under more than just one category. Keeping this collection of observations in mind, I’ve decided to select action intertwined with sci-fi and adventure for my upcoming movie project remake.

Click to view code
genres_recent_years <- Solid_Movies |>
  filter(startYear >= 2010) |>
  arrange(desc(startYear), desc(success_score)) |>
  select(primaryTitle, startYear, genres, success_score)

colnames(genres_recent_years) <- c('Movie Title', 'Year of Release', 'Genres', 'Success Score')
genres_recent_years |>
  DT::datatable(caption = 'Genres of Successful Movies Made in 2010 & Onward')

Finding Successful Personnel in the Genres

In producing my project remake, I’ve decided to work with director David Leitch, who is an American filmmaker that frequently works in action, stunts, and other genres. Take a look at some of his projects and their success scores. His strong background in action films allowed him to direct successful action-packed movies such as Atomic Blonde and Deadpool 2, all of which include elaborate fight scenes, shootouts, and car chases. Surely, David Leitch proves to be a competent, skilled director in this genre.

Click to view code
leitch <- NAME_BASICS |>
  filter(primaryName == 'David Leitch')

leitch_movies <- TITLE_PRINCIPALS |>
  filter(nconst == 'nm0500610') |>
  left_join(TITLE_RATINGS, join_by(tconst == tconst)) |>
  left_join(TITLE_BASICS, join_by(tconst == tconst)) |>
  filter(category == 'director' & titleType == 'movie') |>
  arrange(desc(averageRating)) |>
  select(primaryTitle)

leitch_movies_final <- leitch_movies |>
  left_join(Solid_Movies, join_by(primaryTitle == primaryTitle)) |>
  arrange(desc(success_score))

leitch_movies_final_plot <- ggplot(leitch_movies_final, aes(x = primaryTitle, y = success_score)) +
  geom_col(fill = 'darkseagreen3') +
  xlab('Movie Title') +
  ylab('Success Score') +
  theme_bw() +
  ggtitle("David Leitch's Directed Movies With Success Scores")

leitch_movies_final_plot

In addition to David Leitch, I think Ryan Reynolds and Chris Pratt would be great actors to anchor this project of mine. Similar to Leitch, both Reynolds and Pratt have played major roles in successful movies such as Deadpool and Deadpool 2, and Guardians of the Galaxy, respectively; all of which are classified as action, adventure, and comedy. Not only this, but Reynolds has also worked with Leitch in the past in their making of Deadpool 2, making this a great opportunity to balance action with a little bit of humor.

Click to view code
actor_nconst <- NAME_BASICS |>
  filter(primaryName == 'Ryan Reynolds' | primaryName == 'Chris Pratt') |>
  select(nconst, primaryName, knownForTitles) |>
  slice_head(n = 2) |>
  separate_longer_delim(knownForTitles, ',') |>
  left_join(TITLE_BASICS, join_by(knownForTitles == tconst))

actor_movies <- actor_nconst |>
  left_join(Solid_Movies, join_by(primaryTitle == primaryTitle)) |>
  arrange(desc(success_score)) |>
  select(primaryName, primaryTitle, startYear.x, genres.x, success_score)

colnames(actor_movies) <- c('Actor', 'Movie Title', 'Year of Release', 'Genres', 'Success Score')

actor_movies |>
  DT::datatable()

Nostalgia & Remakes

For my project, I’m choosing to remake the 1981, action/sci-fi film, Escape from New York. With an average rating of 7.1 and roughly over 160K votes, I believe this movie would be a great opportunity to remake given that it has not been remade yet.

Click to view code
escape_ny <- TITLE_BASICS |>
  filter(primaryTitle == 'Escape from New York') |>
  left_join(TITLE_RATINGS, join_by(tconst == tconst)) |>
  select(primaryTitle, startYear, averageRating, numVotes) |>
  slice(1)

colnames(escape_ny) <- c('Movie Title', 'Year of Release', 'Average Rating', 'Number of Votes')
escape_ny |>
  DT::datatable()

Of course, however, there are quite a few legal matters at hand. A select few actors, directors, and writers are still alive since this movie was made just over 40 years ago. This would prompt me to contact our legal department to ensure our chances of securing the rights to our project. Now, it’s finally time to piece everything together.

Click to view code
# Derive tconst of movie Escape from New York
escape_ny_tconst <- TITLE_BASICS |>
  filter(primaryTitle == 'Escape from New York') |>
  select(tconst) |>
  slice(1)

# tconst = tt0082340

escape_ny_members <- TITLE_PRINCIPALS |>
  filter(tconst == 'tt0082340') |>
  left_join(NAME_BASICS, join_by(nconst == nconst)) |>
  select(primaryName, birthYear, deathYear, primaryProfession)

colnames(escape_ny_members) <- c('Name', 'Birth Year', 'Death Year (leave blank if not applicable)', 'Profession(s)')
escape_ny_members |>
  DT::datatable()

Escape from New York: Remaking of a Classic

From director David Leitch, the visionary mind behind Bullet Train and Atomic Blonde2…

And from Ryan Reynolds, highly praised star of Deadpool 2…

And from Chris Pratt, Hollywood icon of action-adventure hit Guardians of the Galaxy…

Comes the timeless tail, Escape from New York…

A journey of survival, betrayal, and redemption…

One man, one mission, one chance…to survive a world in chaos

Escape from New York. Coming soon to a theater near you.

We propose a remake of the classic action-packed film, Escape from New York, starring Ryan Reynolds as Snake Plissken and Chris Pratt as Romero. With David Leitch directing the film, we firmly believe this remake will attract and satisfy a large audience while producing great returns. The analysis above explains how drama titles have successfully won large audiences over the years, however, this genre has potential to become oversaturated, if not already. We want to attract and appeal to watchers who enjoy action, adventure, and slight comedy especially given that the market for action films has been increasing in recent decades. In fact, action movies make up roughly 35% of movies made since 2010 that are considered successful. David Leitch has an excellent background in directing films in these genres, with over 80% of movies he’s directed being successes and massive hits.

Similarly, Ryan Reynolds and Chris Pratt have proven their capabilities in main leading roles in movies they’ve worked in recently, which are also great successes. Reynolds, who has played a major role in the successful action/adventure film Deadpool 2, will bring his wittiness and charm in playing the role of Snake Plissken. He’ll work alongside Chris Pratt, known for his lead role in the highly rated movie/series, Guardians of the Galaxy, making their combined talents a dynamic power duo.

As we continue to live in a society where many of us feel trapped as political and social issues continue on with no end in sight, a remake of Escape from New York is an opportunity for us to portray resonating messages with those who feel similarly. We truly believe that with riveting action, iconic actors, and powerful messages, our modern take has much potential and is primed for success.