The World's Most Powerful Rocket

Image credit: SpaceX

For Now…

SpaceX launched Falcon Heavy this week, and I remembered Elon Musk noting that it would have twice the thrust of any other rocket currently in operation. Intrigued by that claim, I decided to dig further and compare its thrust with that of rockets from the past and rockets planned for the future.

First, I used the rvest package to scrape the data and the tidyverse to put it into a data frame. Next, I used stringr to clean up the strings and plotted the result with ggplot2 (part of the tidyverse), using cowplot and RColorBrewer to improve the look of the plot.

library(rvest)
library(stringr)
library(cowplot)
library(tidyverse)
library(RColorBrewer)
theme_set(theme_cowplot())

The first step is to find a website that has the information of interest. In my case I found an article on CNN.com, and then I used the SelectorGadget tool to determine the XPath for the elements I wanted. Since the information wasn’t in a table format, the XPath ended up rather long.

site <- read_html("http://money.cnn.com/2018/02/06/technology/future/biggest-rockets-falcon-heavy-comparison/index.html")

data <- '//p[(((count(preceding-sibling::*) + 1) = 67) and parent::*)] | 
        //p[(((count(preceding-sibling::*) + 1) = 66) and parent::*)] |
        //p[(((count(preceding-sibling::*) + 1) = 65) and parent::*)] |
        //p[(((count(preceding-sibling::*) + 1) = 64) and parent::*)] |
        //p[(((count(preceding-sibling::*) + 1) = 59) and parent::*)] |
        //p[(((count(preceding-sibling::*) + 1) = 60) and parent::*)] |
        //p[(((count(preceding-sibling::*) + 1) = 58) and parent::*)] |
        //p[(((count(preceding-sibling::*) + 1) = 57) and parent::*)] |
        //p[(((count(preceding-sibling::*) + 1) = 51) and parent::*)] |
        //p[(((count(preceding-sibling::*) + 1) = 50) and parent::*)] |
        //p[(((count(preceding-sibling::*) + 1) = 49) and parent::*)] |
        //p[(((count(preceding-sibling::*) + 1) = 48) and parent::*)] |
        //p[(((count(preceding-sibling::*) + 1) = 43) and parent::*)] |
        //p[(((count(preceding-sibling::*) + 1) = 42) and parent::*)] |
        //p[(((count(preceding-sibling::*) + 1) = 41) and parent::*)] |
        //p[(((count(preceding-sibling::*) + 1) = 40) and parent::*)] |
        //p[(((count(preceding-sibling::*) + 1) = 35) and parent::*)] |
        //p[(((count(preceding-sibling::*) + 1) = 34) and parent::*)] |
        //p[(((count(preceding-sibling::*) + 1) = 33) and parent::*)] |
        //p[(((count(preceding-sibling::*) + 1) = 32) and parent::*)] |
        //p[(((count(preceding-sibling::*) + 1) = 26) and parent::*)] |
        //p[(((count(preceding-sibling::*) + 1) = 25) and parent::*)] |
        //p[(((count(preceding-sibling::*) + 1) = 24) and parent::*)] |
        //p[(((count(preceding-sibling::*) + 1) = 23) and parent::*)] |
        //p[(((count(preceding-sibling::*) + 1) = 17) and parent::*)] |
        //p[(((count(preceding-sibling::*) + 1) = 16) and parent::*)] |
        //p[(((count(preceding-sibling::*) + 1) = 15) and parent::*)] |
        //p[(((count(preceding-sibling::*) + 1) = 14) and parent::*)] |
        //*[contains(concat( " ", @class, " " ), concat( " ", "inStoryHeading", " " ))]'
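The long XPath above is 28 copies of the same paragraph-position test plus one class test for the headings. As a sketch (not how I originally built it; the positions are simply read off the expression above, and the name data_xpath is hypothetical), the same union could be generated instead of typed out. As far as I can tell the order of the terms does not matter, because rvest returns the matched nodes in document order.

```r
# Paragraph positions read off the hand-written XPath: 7 rockets x 4 feature paragraphs
positions <- c(14:17, 23:26, 32:35, 40:43, 48:51, 57:60, 64:67)

# One positional test per paragraph, in the same form SelectorGadget produced
p_paths <- sprintf("//p[(((count(preceding-sibling::*) + 1) = %d) and parent::*)]", positions)

# The class test that picks up the rocket-name headings
heading_path <- '//*[contains(concat( " ", @class, " " ), concat( " ", "inStoryHeading", " " ))]'

# Join everything into a single XPath union expression
data_xpath <- paste(c(p_paths, heading_path), collapse = " | ")
```

The resulting string could then be passed as html_nodes(xpath = data_xpath) in place of data. Generating it makes the structure (four paragraphs per rocket) visible and easier to adjust if the page layout shifts.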

Next, I used rvest to scrape the matching nodes, extract their text and turn it into a tibble. The tibble wasn’t in a tidy format, so I had to do some data wrangling. I used mutate to add an ID column giving each row’s position within its five-row block (name, status, height, liftoff thrust, capability). That let me pull the rocket names out into a separate data frame and then bind them back onto the original data as a new column.

table <- site %>% 
  html_nodes(xpath = data) %>% 
  html_text() %>% 
  tibble() %>% 
  mutate(ID = rep(1:5, times = 7) %>% as.factor())  # 1 = rocket name, 2-5 = its four features
table
## # A tibble: 35 x 2
##    .                                                                       ID   
##    <chr>                                                                   <fct>
##  1 Falcon Heavy                                                            1    
##  2 " Status: First test flight took place February 6 "                     2    
##  3 " Height: 229.6 feet (70 meters) "                                      3    
##  4 " Liftoff thrust: 5 million pounds "                                    4    
##  5 " Capability: 140,660 pounds (63,800 kilograms) to LEO "                5    
##  6 Space Launch System                                                     1    
##  7 " Status: No earlier than late 2019 "                                   2    
##  8 " Height: 322 - 365 feet (98.1 - 111.3 meters) "                        3    
##  9 " Liftoff thrust: up to 11.9 million pounds (5 million kg) "            4    
## 10 " Capability: 150,000 - 290,000 pounds (70,000 - 130,000 kilograms) to… 5    
## # … with 25 more rows

After isolating the rocket names, I repeated each one four times (once per feature row) so that they line up row for row with the ‘table’ dataframe once the name rows are dropped.

names <- table %>%
  filter(ID == 1) %>% 
  slice(rep(1:7, each = 4)) # repeat each name row 4 times, once per feature
names
## # A tibble: 28 x 2
##    .                   ID   
##    <chr>               <fct>
##  1 Falcon Heavy        1    
##  2 Falcon Heavy        1    
##  3 Falcon Heavy        1    
##  4 Falcon Heavy        1    
##  5 Space Launch System 1    
##  6 Space Launch System 1    
##  7 Space Launch System 1    
##  8 Space Launch System 1    
##  9 "Saturn V "         1    
## 10 "Saturn V "         1    
## # … with 18 more rows
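For comparison, here is a different way to attach the names, which I did not use above: mark the name rows, fill each name down over the rows beneath it, then drop the name rows. This is a minimal sketch on a hypothetical toy input built from rows of the scraped output above; it assumes that the name rows are the only rows without a colon, and it needs dplyr, tidyr and stringr.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy input copied from the scraped output: a name row followed by its feature rows
scraped <- tibble(Data = c(
  "Falcon Heavy",
  " Height: 229.6 feet (70 meters) ",
  " Liftoff thrust: 5 million pounds ",
  "Space Launch System",
  " Height: 322 - 365 feet (98.1 - 111.3 meters) ",
  " Liftoff thrust: up to 11.9 million pounds (5 million kg) "
))

tidy <- scraped %>%
  # Assumption: name rows have no colon, so only they get a Rocket value
  mutate(Rocket = if_else(str_detect(Data, ":"), NA_character_, str_trim(Data))) %>%
  fill(Rocket) %>%                      # carry each name down to its feature rows
  filter(str_detect(Data, ":")) %>%     # keep only the "Feature: value" rows
  separate(Data, into = c("Feature", "Value"), sep = ":") %>%
  mutate(Feature = str_trim(Feature))

tidy
```

The advantage of filling down is that it does not depend on every rocket having exactly four feature rows, whereas rep(1:5, times = 7) and each = 4 hard-code the page layout.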

The next step is to combine the ‘names’ dataframe with the original ‘table’ and then clean up the column names using rename. One of the trickiest parts of data wrangling is string manipulation with regular expressions (regex), which can be hard to grasp at first. When you scrape data you will almost always get more information than you want or need, and regex lets you extract just the part you care about. Here I used separate to split each line at the colon into a feature and a value, then used str_replace_all to find a specific pattern and replace it with nothing, so that only the information I want (the numerical values) remains. The pattern has three alternatives: \\s*\\([^\\)]+\\) removes a parenthetical such as the metric conversion, [:alpha:] removes letters, and \\s* removes whitespace.

df <- table %>%
  filter(ID != 1) %>%                        # drop the rocket-name rows
  bind_cols(select(names, Rocket = 1)) %>%   # attach the repeated names under a unique column name
  rename(Data = ".") %>% 
  separate(Data, into = c("Feature", "Value"), sep = ":") %>%  # split "Feature: value"
  select(Rocket, Feature, Value) %>% 
  mutate(Value = str_replace_all(Value, "\\s*\\([^\\)]+\\)|[:alpha:]|\\s*", ""), # strip parentheticals, letters and spaces
         Feature = str_trim(Feature))
df
## # A tibble: 28 x 3
##    Rocket              Feature        Value          
##    <chr>               <chr>          <chr>          
##  1 Falcon Heavy        Status         6              
##  2 Falcon Heavy        Height         229.6          
##  3 Falcon Heavy        Liftoff thrust 5              
##  4 Falcon Heavy        Capability     140,660        
##  5 Space Launch System Status         2019           
##  6 Space Launch System Height         322-365        
##  7 Space Launch System Liftoff thrust 11.9           
##  8 Space Launch System Capability     150,000-290,000
##  9 "Saturn V "         Status         1973           
## 10 "Saturn V "         Height         363            
## # … with 18 more rows
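To see what the regex does on its own, here is a small sketch that applies the same pattern to three value strings copied from the scraped output above (the part after the colon). It only needs stringr.

```r
library(stringr)

# Value strings as they look after separate() splits each line at ":"
raw_values <- c(" 229.6 feet (70 meters) ",
                " up to 11.9 million pounds (5 million kg) ",
                " 322 - 365 feet (98.1 - 111.3 meters) ")

# Alternative 1: optional whitespace followed by a parenthetical, e.g. " (70 meters)"
# Alternative 2: any single letter
# Alternative 3: whitespace
cleaned <- str_replace_all(raw_values, "\\s*\\([^\\)]+\\)|[:alpha:]|\\s*", "")
cleaned
# → "229.6" "11.9" "322-365"
```

Note that ranges such as "322-365" survive as text, so converting them with as.numeric would give NA; the liftoff thrust values plotted below are single numbers.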

Now that we have a tidy dataframe, we can plot it: filter the liftoff thrust rows, convert the Value column to numeric, and reorder the rockets by thrust with fct_reorder so the bars are sorted. I manually specified the y-axis scale and used the YlOrRd palette to accentuate the differences in thrust between the rockets.

thrust_plot <- df %>% 
  filter(Feature == "Liftoff thrust") %>% 
  mutate(Value = Value %>% as.numeric(),
         Rocket = fct_reorder(Rocket,Value)) %>% 
  ggplot(aes(x = Rocket, y = Value, fill = Rocket )) +
  geom_col()+
  coord_flip() +
  scale_y_continuous(expand= c(0,0), limits = c(0,12),breaks = seq(0,12,2))+
  guides(fill = "none") + # hide the redundant legend (fill = FALSE is deprecated)
  scale_fill_brewer(palette="YlOrRd") + 
  labs(title = "Rocket Thrust",
       x = "",
       y = "Thrust (Millions of pounds)")

thrust_plot
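The bars come out sorted because of fct_reorder: by default it orders the factor levels by ascending value (the median per level), and with coord_flip the first level is drawn at the bottom of the axis, so the rocket with the most thrust ends up on top. A tiny sketch using two thrust values from the table above (millions of pounds):

```r
library(forcats)

# Levels are reordered by ascending thrust, regardless of input order
rocket <- fct_reorder(c("Space Launch System", "Falcon Heavy"), c(11.9, 5))
levels(rocket)
# → "Falcon Heavy" "Space Launch System"
```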

We can see that Falcon Heavy may very well be the world’s most powerful rocket currently in operation, but the retired Saturn V and Space Shuttle both had more liftoff thrust, and the upcoming Space Launch System will have the most thrust by far. Although Falcon Heavy isn’t the most powerful rocket of all time, it is definitely one of the coolest: just take a look at its booster rockets landing simultaneously after launch!

Falcon Heavy Booster Landing


Sean Nguyen
Incoming Data Science Fellow