library(ggplot2)
library(dplyr)
library(patchwork)
theme_set(theme_void())
In this Tidy Tuesday project, we delve into the fascinating world of solar eclipses that occurred in the US in 2023 and will occur in 2024. This analysis uses datasets provided for the 2023 annular eclipse, known for its “ring of fire” effect, and the upcoming 2024 total eclipse that will completely obscure the sun.
Environment Setup
The first step in our analysis involves setting up the R environment and loading necessary libraries like ggplot2 for plotting, dplyr for data manipulation, and patchwork for arranging plots. We also apply a minimalistic theme to our plots for a cleaner presentation.
To enhance the interpretability of our visualizations, we craft a descriptive plot caption and a comprehensive plot description that will serve as a subtitle. These elements are designed to provide context to our plots, explaining the significance of the data points and the story they tell about the solar eclipses.
plot_caption <- paste("Source: NASA's Scientific Visualization Studio |",
                      "Graphic: Hanzholah Shobri")
plot_desc <-
  paste("Celestial Spectacles Over the Americas: On October 14, 2023, an",
        "annular solar eclipse created a 'ring of fire' visible across the",
        "Americas. This phenomenon was followed by a total solar eclipse on",
        "April 8, the following year, during which the Sun was entirely",
        "obscured. Here, the graphs illustrate the eclipse durations",
        "and their starting times per state, with special emphasis on specific",
        "cities where each type of eclipse can be observed.") |>
  stringr::str_wrap(width = 80)
Data Preparation
Our analysis requires detailed geographic and eclipse data. We start by loading a dataset containing U.S. state mappings, which will help in plotting the state-wise distribution of eclipse data. This is followed by merging the geographic data with the eclipse data to ensure each plot is accurately annotated with state and eclipse information.
year_labels <- c(
  "2023" = "2023 Annular Eclipse",
  "2024" = "2024 Total Eclipse"
)
# Load US states map
us_states <-
  readr::read_csv("us-states.csv") |>
  mutate(ID = stringr::str_to_lower(State)) |>
  select(ID, Abbreviation)
us_maps <-
  sf::st_as_sf(maps::map("state", plot = FALSE, fill = TRUE)) |>
  left_join(us_states, by = join_by(ID))
# Load eclipse data focusing on the US mainland
tuesdata <- tidytuesdayR::tt_load('2024-04-09')
eclipse_annular_2023 <-
  tuesdata$eclipse_annular_2023 |>
  filter(state %in% us_maps$Abbreviation) |>
  mutate(year = 2023, type = "annular", label = year_labels["2023"])
eclipse_total_2024 <-
  tuesdata$eclipse_total_2024 |>
  filter(state %in% us_maps$Abbreviation) |>
  mutate(year = 2024, type = "total", label = year_labels["2024"])
eclipse_partial_2023 <-
  tuesdata$eclipse_partial_2023 |>
  filter(state %in% us_maps$Abbreviation) |>
  mutate(year = 2023, type = "partial", label = year_labels["2023"])
eclipse_partial_2024 <-
  tuesdata$eclipse_partial_2024 |>
  filter(state %in% us_maps$Abbreviation) |>
  mutate(year = 2024, type = "partial", label = year_labels["2024"])
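The join and filter steps above can be illustrated on a toy example. This is only a sketch: the tibbles below are hypothetical stand-ins for us-states.csv, the map polygon IDs, and the eclipse tables, and the city names are made up for illustration. It shows how lowercasing the state name produces the join key, how an unmatched map ID ends up with a missing abbreviation, and how the filter keeps only states present in the map.

```r
library(dplyr)

# Hypothetical stand-in for us-states.csv (column names follow the code above)
state_lookup <- tibble(
  State = c("Arizona", "New Mexico", "Texas"),
  Abbreviation = c("AZ", "NM", "TX")
) |>
  mutate(ID = tolower(State)) |>
  select(ID, Abbreviation)

# Hypothetical stand-in for the lowercase polygon IDs of the state map
map_ids <- tibble(ID = c("arizona", "new mexico", "texas", "utah"))

# "utah" has no match in the lookup, so its Abbreviation is NA after the join
joined <- left_join(map_ids, state_lookup, by = join_by(ID))

# Toy eclipse records; "HI" is not in the map and is dropped by the filter
toy_eclipse <- tibble(
  state = c("AZ", "HI", "TX"),
  name = c("Town A", "Town B", "Town C")
)
mainland <- filter(toy_eclipse, state %in% joined$Abbreviation)

joined
mainland
```

If a state were missing from the lookup file, its polygon would still be drawn but would carry no abbreviation, so checking for NA values after the join is a cheap sanity test.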
Before creating our visualizations, we need to prepare the data by calculating the average duration of the eclipses for each state. The provided R code segment does this by first determining the duration of each eclipse in 2023 and 2024, converting these times from seconds to minutes. It then combines these individual records and calculates an average duration for each state. This step ensures that our plots will represent a clear and averaged view of how long the eclipses lasted across different regions.
# Extract eclipse durations
duration_2023 <-
  bind_rows(
    eclipse_annular_2023 |>
      mutate(duration = as.numeric(eclipse_6 - eclipse_1) / 60),
    eclipse_partial_2023 |>
      mutate(duration = as.numeric(eclipse_5 - eclipse_1) / 60)
  ) |>
  summarise(duration = mean(duration), .by = c(state))
duration_2024 <-
  bind_rows(mutate(eclipse_total_2024,
                   duration = as.numeric(eclipse_6 - eclipse_1) / 60),
            mutate(eclipse_partial_2024,
                   duration = as.numeric(eclipse_5 - eclipse_1) / 60)) |>
  summarise(duration = mean(duration), .by = c(state))
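To make the arithmetic concrete, here is a minimal sketch of the same calculation on made-up numbers. It assumes the contact times are plain seconds after midnight (the real columns are time values, which is why the code above wraps the difference in as.numeric()), and it assumes eclipse_1 is the first contact and eclipse_6 or eclipse_5 the last one. All values are invented for illustration.

```r
library(dplyr)

# Toy annular records: contact times as seconds after midnight (assumed format)
toy_annular <- tibble(
  state = c("AZ", "AZ", "NM"),
  eclipse_1 = c(54000, 54060, 54300),  # assumed first contact
  eclipse_6 = c(64200, 64320, 64860)   # assumed last contact
)

# Toy partial record: the last contact is stored in eclipse_5 instead
toy_partial <- tibble(
  state = "NM",
  eclipse_1 = 54600,
  eclipse_5 = 64440
)

toy_duration <-
  bind_rows(
    mutate(toy_annular, duration = (eclipse_6 - eclipse_1) / 60),
    mutate(toy_partial, duration = (eclipse_5 - eclipse_1) / 60)
  ) |>
  summarise(duration = mean(duration), .by = state)

toy_duration
```

The per-row durations are 170, 171, and 176 minutes for the annular rows and 164 minutes for the partial row, so the state means come out to 170.5 for AZ and 170 for NM. Note that the mean weights every location equally, regardless of its eclipse type.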
Now, let’s take a look at our 2023 eclipse duration data. We now have a table with two columns: state and duration. The first column holds the US state abbreviation, and the second specifies how long, on average, an eclipse can be observed in that state, in minutes.
glimpse(duration_2023)
Rows: 49
Columns: 2
$ state <chr> "AZ", "CA", "CO", "NV", "NM", "OR", "TX", "UT", "AL", "AR", "…
$ duration <dbl> 170.9693, 159.3336, 172.1362, 162.2005, 175.9719, 154.4370, 1…
Following the calculation of the average durations of the eclipses per state, the next step involves extracting detailed information about the start times of these celestial events for both 2023 and 2024. For each year, it combines data from different types of eclipses (annular and partial for 2023, total and partial for 2024). It then converts the starting time of each eclipse into a standard date-time format.
# Extract eclipse starting time
eclipse_start_2023 <-
  bind_rows(eclipse_annular_2023, eclipse_partial_2023) |>
  mutate(year = 2023,
         label = year_labels["2023"],
         eclipse_1 = as.numeric(eclipse_1),
         start_time = lubridate::as_datetime(eclipse_1),
         type = factor(type, levels = c("annular", "partial"))) |>
  select(type, state, name, lat, lon, start_time)
eclipse_start_2024 <-
  bind_rows(eclipse_total_2024, eclipse_partial_2024) |>
  mutate(year = 2024,
         label = year_labels["2024"],
         eclipse_1 = as.numeric(eclipse_1),
         start_time = lubridate::as_datetime(eclipse_1),
         type = factor(type, levels = c("total", "partial"))) |>
  select(type, state, name, lat, lon, start_time)
Here is the eclipse starting time data. Each row is one observation location: the type of eclipse it experiences, its state and name, its latitude and longitude, and the time at which the eclipse begins there.
glimpse(eclipse_start_2023)
Rows: 31,364
Columns: 6
$ type <fct> annular, annular, annular, annular, annular, annular, annul…
$ state <chr> "AZ", "AZ", "AZ", "AZ", "AZ", "AZ", "AZ", "AZ", "AZ", "AZ",…
$ name <chr> "Chilchinbito", "Chinle", "Del Muerto", "Dennehotso", "Fort…
$ lat <dbl> 36.49200, 36.15115, 36.18739, 36.82900, 35.74750, 36.71717,…
$ lon <dbl> -110.0492, -109.5787, -109.4359, -109.8757, -109.0680, -110…
$ start_time <dttm> 1970-01-01 15:10:50, 1970-01-01 15:11:10, 1970-01-01 15:11…
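The start_time values above carry the date 1970-01-01 because a bare number of seconds is interpreted as an offset from the Unix epoch. Only the time of day matters for the color scale, so this does not affect the plots, but if a full timestamp were needed, one option is to add the seconds to the eclipse date. The sketch below uses base R and the first value shown above; treating the times as UTC is an assumption taken from the legend title used later.

```r
# Seconds after midnight for the first row shown above (15:10:50)
secs <- 15 * 3600 + 10 * 60 + 50

# Interpreting the value as seconds since the Unix epoch gives the 1970 date
as_epoch <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

# Adding the same offset to the eclipse date gives a full timestamp
on_eclipse_day <- as.POSIXct("2023-10-14", tz = "UTC") + secs

format(as_epoch, "%Y-%m-%d %H:%M:%S")
format(on_eclipse_day, "%Y-%m-%d %H:%M:%S")
```

The first call returns "1970-01-01 15:10:50" and the second "2023-10-14 15:10:50".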
Plotting
The core of our analysis is the creation of visualizations that illustrate the impact of the eclipses. We design a function to plot eclipse data for a given year. It shades each state by its average eclipse duration and overlays the observation points, using outline color and transparency to distinguish locations that see the annular or total eclipse from those that see only a partial one.
plot_us_eclipse <- function(us_maps, duration_data, eclipse_start_data, title) {
  pal <- MetBrewer::met.brewer("Renoir")
  # add duration info into maps data
  us_maps$duration <- pull(duration_data, duration, state)[us_maps$Abbreviation]
  # generate graphs
  ggplot(us_maps) +
    # plot US map
    geom_sf(aes(fill = duration), color = NA, alpha = 0.3) +
    geom_sf(color = "#aaaaaa", fill = NA, linewidth = 0.2) +
    scale_fill_viridis_c(option = "C", direction = -1, guide = "none") +
    labs(fill = "Duration (Min)") +
    # plot observation points
    ggnewscale::new_scale_fill() +
    geom_point(data = eclipse_start_data,
               mapping = aes(x = lon,
                             y = lat,
                             fill = start_time,
                             color = type,
                             alpha = type),
               shape = 21,
               size = 0.35) +
    scale_alpha_manual(values = c(1, 0.1), guide = "none") +
    scale_color_manual(values = c("white", NA), guide = "none") +
    scale_fill_stepsn(colors = pal, trans = "time", n.breaks = 7) +
    labs(fill = "Start Time (UTC)") +
    # setup theme
    labs(title = title) +
    theme(plot.title = element_text(size = 8, face = "bold", hjust = .5),
          legend.position = "bottom",
          legend.spacing.x = unit(5, "lines"),
          legend.key.width = unit(2, "lines"),
          legend.key.height = unit(0.3, "lines"),
          legend.text = element_text(size = 5),
          legend.title.position = "top",
          legend.title = element_text(hjust = 0.5, size = 5)
    )
}
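The line that attaches durations to the map relies on a named-vector lookup: pull() with a second column argument returns the durations named by state, and indexing that vector with the map's abbreviations reorders the values to match the map rows. A small sketch with made-up values (the abbreviations and durations here are illustrative only) shows the behavior, including what happens for a missing or unknown abbreviation.

```r
library(dplyr)

# Toy duration table (values are illustrative)
toy_duration <- tibble(
  state = c("AZ", "NM", "TX"),
  duration = c(171, 176, 160)
)

# pull(data, var, name) returns `var` as a vector named by `name`
lookup <- pull(toy_duration, duration, state)

# Abbreviations in map order; NA and "UT" have no entry in the lookup
map_abbr <- c("TX", "AZ", NA, "UT")

# Indexing by name aligns the durations with the map rows
aligned <- unname(lookup[map_abbr])
aligned
```

The result is 160, 171, NA, NA: matched states pick up their duration, while rows without a match get NA and are therefore drawn with the fill scale's missing-value color.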
Using this function, we generate separate plots for each eclipse year. These plots are then combined into a single visual output, adding a unified title, subtitles, and annotations that describe the dataset and the sources of our data.
p23 <-
  plot_us_eclipse(us_maps, duration_2023, eclipse_start_2023, year_labels["2023"])
p24 <-
  plot_us_eclipse(us_maps, duration_2024, eclipse_start_2024, year_labels["2024"])
p_out <-
  (p23 + p24) +
  plot_annotation(
    title = "2023 and 2024 US Eclipses",
    subtitle = plot_desc,
    caption = plot_caption,
    theme = theme(
      plot.title = element_text(size = 20, hjust = .5, face = "bold",
                                margin = margin(b = 15)),
      plot.subtitle = element_text(size = 11, hjust = .5, face = "italic",
                                   margin = margin(b = 30)),
      plot.caption = element_text(margin = margin(t = 30, r = 15))
    )
  )
p_out
The resulting plots show the eclipses’ durations and start times across the U.S., with emphasis on the cities that see the annular or total eclipse rather than only a partial one. Together they convey the temporal and geographical spread of the eclipses and make it easier to compare their visibility across regions.
In conclusion, this analysis not only sheds light on the fascinating phenomenon of solar eclipses but also demonstrates the power of data visualization in interpreting complex datasets. By carefully preparing data, setting a narrative with descriptive captions, and employing effective visualizations, we can provide insightful observations that are both informative and engaging.