Introduction

Motivation and Background

Working Women: A study on female participation in the labor force around the world
Historically, women around the world have faced barriers to entering and staying in the workforce. Using data from countries around the world, this study compares female labor force participation rates across countries and examines correlating factors.

Some of the research questions explored include:

  • How do female participation rates vary from country to country?

  • What variables in the data set correlate with female participation rates?

  • Is there a relationship between other variables, such as life expectancy and region?

The source of the data is The World Bank. The gender section was used to find variables related to gender differences in working levels. This dashboard will mainly focus on data from 2020, the most recent complete year of reporting.

Variable Explanations

The original data set contained more than 60 numerical variables, but a subset was selected to focus on key indicators.

  • Country: the country of the observation
    -Not all countries are represented, and some have more data than others

  • Year: the year of the observation
    -The numerical variables female & male life expectancy and fertility rate have data for many countries back to 1960.
    -The variables female & male participation rate and female percentage of the labor force have data starting at 1990.

  • Region: the region of the country
    -There are 7 regions

  • Income Level: the income level of the country
    -There are 4 income levels
    -According to the World Bank, “the classifications are updated each year on July 1 and are based on the GNI (Gross National Income) per capita of the previous year.” More about the income classification can be found here.

  • Male Life Expectancy: life expectancy at birth, male (years)

  • Female Life Expectancy: life expectancy at birth, female (years)

  • Fertility Rate: Number of children born per woman on average (births per woman)

  • Female Labor: Female labor force as a proportion of the total labor force (percentage)
    -Shows how active women are in relation to others in the labor force
    -The labor force is made up of people 15 or older that supply labor

  • Female Participation: Rate of women ages 15 or older that supply labor (percentage)

  • Male Participation: Rate of men ages 15 or older that supply labor (percentage)

Analysis

In the summary statistics and correlation tabs, only data from 2020 will be used.

Summary Statistics

The summary statistics tab shows information about each of the variables in the data set.

The number of countries in each region and income group is shown at the top.

The minimum, mean, maximum and missing value percentage are shown for each of the numerical variables.

  • Female life expectancy is higher on average than male life expectancy

  • Male participation rate tends to be higher than female participation rate

  • Both female percentage of labor force and female participation rate vary widely across countries


Correlation Plot

The correlation plot shows relationships between the numerical variables in the data set.

  • Male and female life expectancy are the most highly correlated values in the data set. This is likely because of similar living conditions in each country.

  • Female life expectancy and fertility rate are the most strongly negatively correlated values in the data set. This means that women tend to live longer in countries where the average fertility rate is lower.

  • Female participation and female labor are very strongly positively correlated as well. As the percentage of women working rises, the percentage of the workforce that is female tends to rise.

  • Female participation is not strongly correlated with any other numerical values in the data set.

In the next few tabs, the relationship between female participation and region and income will be explored.

Summary Statistics


Categorical Variables by Country

 Region                          Income Group            
 East Asia & Pacific       :37   Low income         :28  
 Europe & Central Asia     :58   Lower middle income:54  
 Latin America & Caribbean :42   Upper middle income:54  
 Middle East & North Africa:21   High income        :80  
 North America             : 3   NA's               : 1  
 South Asia                : 8                           
 Sub-Saharan Africa        :48                           

Numerical Variables

 Variable                          Min    Mean   Max    Missing Values (%)
 Male Life Expectancy              51.45  70.57  82.9   8.29
 Female Life Expectancy            55.88  75.47  88     8.29
 Fertility Rate                    0.84   2.57   6.74   7.83
 Female Percentage of Labor Force  8.27   41.17  54.91  13.82
 Male Participation Rate           44.24  69.2   95.44  13.82
 Female Participation Rate         6.08   49.69  83.05  13.82

Correlation

Exploration

Source Data


The table shows countries and corresponding values from 2020.

Worldwide Map

Female Participation

Histogram

The distribution of female participation is approximately normal, with a slight left skew.


The mean value for female participation is 49.69%.

The country with the smallest percentage is Yemen, Rep. at 6.08%. Yemen is in the region category of Middle East & North Africa and is classified as low income.

The country with the highest participation is Solomon Islands at a rate of 83.05%. The Solomon Islands are classified as lower middle income and located in the East Asia & Pacific region.

Plot Analysis

Region
Female participation is shown by country on a map of 2020 values (Exploration tab) and as a regional average over time.

  • Both plots show that the Middle East & North Africa has the lowest participation rates, while Sub-Saharan Africa and North America have the highest.

  • The rates in Latin America & Caribbean and the Middle East & North Africa have changed the most in the past 30 years, with both regions seeing an increase of 5 to 10 percentage points.

  • The gap between the Middle East & North Africa and Sub-Saharan Africa is around 30 percentage points as of 2020.


Income
The distribution by income shows some interesting results.

  • Differences in median participation across income levels are less drastic than the differences across regions.

  • Low income countries have the highest average rate of female participation, followed by high income, upper middle income, and lower middle income.

  • The category lower middle income has the largest spread of data.

  • The two ends of the spectrum have the highest median rates of female participation


Male Participation
Female and male participation rates are not strongly correlated.

  • In all regions, the average male participation is higher than the average female participation.

  • The average for male participation remains about the same for each region, but the average for female participation varies.

Region

Income

Male Participation

Regional Correlations

Fertility Rate

Income

Female Life Expectancy

Analysis

After examining several correlations, region emerged as the factor with the greatest variation across different variables. This section focuses on some of the regional differences.


Fertility Rate
The fertility rate in Sub-Saharan Africa is higher than in the rest of the world.

  • Sub-Saharan Africa has an average rate of 4.24 births/woman, almost 2 births higher than any other region in the world.

  • South Korea has the lowest fertility rate with an average of 0.84 births/woman. Niger has the highest at 6.74.


Income
Income varies widely by region.

  • North America has the highest proportion of high income countries (all), followed by Europe & Central Asia at about 66%.

  • Half of the countries in Sub-Saharan Africa classify as low income, and South Asia has the next highest proportion of low income (12%).

  • The regions East Asia & Pacific, Middle East & North Africa, and Sub-Saharan Africa have at least one country per income level.


Female Life Expectancy
Average female life expectancy has increased over time in every region.

  • As of 2020, North America has the highest female life expectancy at 83.4 years, and Sub-Saharan Africa has the lowest with a value of 65.3 years.

  • South Asia’s life expectancy has increased the most, starting at 41.8 years and moving to 73.5 years (+31.7 years).

  • North America has small dips in the average life expectancy because Bermuda only has data for several years from 1960-2000, bringing the average down during those years.

Conclusions

Summary

The rate at which women participate in the workforce varies around the world, and many variables affect female participation rates in different countries. Region, income, and female percentage of the labor force were the three variables that correlated most strongly with female participation rates. In addition, region was a strong predictor for many of the variables in the data set, including fertility rate and life expectancy.

The data available placed limitations on the study. More numerical variables could have been analyzed; however, most of them had missing values for many countries. This study focused on relevant variables with the most data.

Several assumptions were made throughout the analysis.

  • The missing values for each variable would not have a dramatic impact on my results. In 2020, there was no data for female participation from 14% of countries around the world. These countries were small, but they were excluded from the results.

  • When grouping by region over time, some countries had more history recorded than others. In addition, the averages were not weighted by population, something that could be improved upon in the future.
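The difference between an unweighted and a population-weighted regional average can be sketched as follows. This is only an illustration: the `population` column is hypothetical (the data set used here does not include it), and all values are invented.

```r
library(dplyr)

# Invented toy data; `population` is a hypothetical weight column
toy <- tibble(
  region               = c("R1", "R1", "R2", "R2"),
  female_participation = c(40, 60, 20, 30),
  population           = c(1, 3, 5, 5)
)

regional_avgs <- toy %>%
  group_by(region) %>%
  summarise(
    # Each country counts equally
    unweighted = mean(female_participation, na.rm = TRUE),
    # Larger countries count proportionally more
    weighted   = weighted.mean(female_participation, w = population, na.rm = TRUE),
    .groups    = "drop"
  )

regional_avgs
```

In this toy case the two averages differ for R1, where the larger country has the higher rate, and agree for R2, where both countries have the same population.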

I learned many things when putting this project together, both about R Markdown and labor participation rates. My biggest takeaway from my analysis is how strongly variables were grouped by region. Countries in the same region were much more likely to share characteristics than countries grouped by any other indicator.

Future Work

While this study was narrowed down to 8 variables, much work could be done on the remaining variables. It would be interesting to see the effect education had on female participation rates, as well as more analysis on the difference in male and female participation rates. In some regions, the gender gap is shrinking, and an analysis on the reasons behind the closing gap would add context to this presentation.

Another way to study this data could be by country or region. This study gave a broad overview of the world focusing on the year 2020, but it could be interesting to focus on certain regions.

References

All data used in this project came from The World Bank. In addition to data used here, the World Bank has other resources that were used to better understand the data after performing some initial analysis.

Another helpful resource was iMediaProf: the YouTube video shows how to embed a Tableau view into a markdown file.

---
title: "Working Women"
author: "Rachel Sebastian"
output: 
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
    source_code: embed
    theme:
      bootswatch: zephyr
---

```{r setup, include=FALSE}
library(flexdashboard)
```

```{r imports}
setwd("C:/Users/clari/Documents/Work/Projects/Working Women")

library(pacman)

p_load(tidyverse, ggplot2, RColorBrewer, DataExplorer, vtable, scales, plotly)

#reading in files
gender <- read_csv("data/gender.csv", skip = 4)
#Strip any "X" characters from the column names (gsub is already vectorised)
colnames(gender) <- gsub("X", "", colnames(gender))

gender <- gender %>% rename(country_code = "Country Code", country_name = "Country Name", ind_code = "Indicator Code", ind_name = "Indicator Name")

region_income <- read_csv("data/region_income_level.csv") 
region_income <- region_income %>% rename(country_code = "Country Code", region = "Region", income_group = "IncomeGroup") %>% 
  select(country_code, region, income_group)

region_income <- region_income %>% subset(!is.na(country_code)) %>% subset(nchar(country_code) == 3)

#Creating data frame with wanted variables

indicator_names = c("m_life_exp","f_life_exp", "fertility_rate", "female_labor", "male_participation", "female_participation")

df <- gender %>% mutate(indicator = case_when(
  ind_code == "SP.DYN.LE00.MA.IN" ~ indicator_names[1],
  ind_code == "SP.DYN.LE00.FE.IN" ~ indicator_names[2],
  ind_code == "SP.DYN.TFRT.IN" ~ indicator_names[3],
  ind_code == "SL.TLF.TOTL.FE.ZS" ~ indicator_names[4],
  ind_code == "SL.TLF.CACT.MA.ZS" ~ indicator_names[5],
  ind_code == "SL.TLF.CACT.FE.ZS" ~ indicator_names[6]
  
))

df <- subset(df, !is.na(indicator))
df <- df %>% select(-c(ind_name, ind_code)) %>% select("indicator", "country_name", "country_code", everything())

df <- data.frame(country = rep(unique(df$country_name), 62),
                  country_code = rep(unique(df$country_code), 62),
                  year = rep(1960:2021, each = length(unique(df$country_name))),
                  m_life_exp = unname(unlist(as.vector(df[df$indicator==indicator_names[1], 4:65]))),
                  f_life_exp = unname(unlist(as.vector(df[df$indicator==indicator_names[2], 4:65]))),
                  fertility_rate = unname(unlist(as.vector(df[df$indicator==indicator_names[3], 4:65]))),
                  female_labor = unname(unlist(as.vector(df[df$indicator==indicator_names[4], 4:65]))),
                  male_participation = unname(unlist(as.vector(df[df$indicator==indicator_names[5], 4:65]))),
                  female_participation = unname(unlist(as.vector(df[df$indicator==indicator_names[6], 4:65])))
)

df <- df %>% left_join(region_income, by = "country_code") %>% 
  select(country, year, country_code, region, income_group, everything())

df <- df %>% mutate_if(is.character, as.factor)

df$income_group <- factor(df$income_group, levels = c("Low income", "Lower middle income", "Upper middle income", "High income"))

#Taking out data that was not grouped by individual country
data_2020 <- df %>% filter(year == 2020, !is.na(region)) %>% select(-year)

#Averages by region and year
region_groups <- df %>% group_by(year, region) %>% summarise(avg_female_labor = mean(female_labor, na.rm = T), avg_female_le = mean(f_life_exp, na.rm = T), avg_female_participation = mean(female_participation, na.rm = T), avg_male_participation = mean(male_participation, na.rm = T))

```
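The positional reshaping in the chunk above relies on every indicator listing the countries in the same order. As a hedged alternative, the sketch below performs the same wide-to-tidy step with `tidyr::pivot_longer()` and `tidyr::pivot_wider()`, which match on names instead of positions. The `wide` tibble and its values are invented for illustration and are not taken from the World Bank file.

```r
library(dplyr)
library(tidyr)

# Invented stand-in for the wide layout: one row per country and indicator,
# one column per year
wide <- tibble(
  country_name = c("A", "A", "B", "B"),
  country_code = c("AAA", "AAA", "BBB", "BBB"),
  indicator    = c("f_life_exp", "fertility_rate", "f_life_exp", "fertility_rate"),
  `2019`       = c(80.1, 1.7, 65.2, 4.3),
  `2020`       = c(80.3, 1.6, 65.5, 4.2)
)

tidy <- wide %>%
  # Stack the year columns into (year, value) pairs
  pivot_longer(cols = matches("^[0-9]{4}$"), names_to = "year", values_to = "value") %>%
  mutate(year = as.integer(year)) %>%
  # Spread the indicators back out into one column each
  pivot_wider(names_from = indicator, values_from = value)

tidy
```

The result has one row per country and year and one column per indicator, which is the shape the rest of the dashboard expects from `df`.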

Introduction
=======================================================================

Column {.tabset data-width=600 .tabset-fade}
-----------------------------------------------------------------------

### Motivation and Background

<font size="5"> **Working Women: A study on female participation in the labor force around the world**</font>  
Historically, women around the world have faced barriers to entering and staying in the workforce. Using data from countries around the world, this study compares female labor force participation rates across countries and examines correlating factors. 

Some of the research questions explored include:

- How do female participation rates vary from country to country?

- What variables in the data set correlate with female participation rates?

- Is there a relationship between other variables, such as life expectancy and region?

The source of the data is <a href="https://genderdata.worldbank.org/" target="_blank">The World Bank</a>. The gender section was used to find variables related to gender differences in working levels. This dashboard will mainly focus on data from 2020, the most recent complete year of reporting.


### Variable Explanations

The original data set contained more than 60 numerical variables, but a subset was selected to focus on key indicators.

- **Country**: the country of the observation  
  -Not all countries are represented, and some have more data than others  

- **Year**: the year of the observation  
  -The numerical variables female & male life expectancy and fertility rate have data for many countries back to 1960.  
  -The variables female & male participation rate and female percentage of the labor force have data starting at 1990.  

- **Region**: the region of the country  
  -There are 7 regions

- **Income Level**: the income level of the country  
  -There are 4 income levels  
  -According to the World Bank, "the classifications are updated each year on July 1 and are based on the GNI (Gross National Income) per capita of the previous year." More about the income classification can be found <a href="https://blogs.worldbank.org/opendata/new-world-bank-country-classifications-income-level-2022-2023#" target="_blank">here</a>.  
   
- **Male Life Expectancy**: life expectancy at birth, male (years)

- **Female Life Expectancy**: life expectancy at birth, female (years)

- **Fertility Rate**: Number of children born per woman on average (births per woman)

- **Female Labor**: Female labor force as a proportion of the total labor force (percentage)  
  -Shows how active women are in relation to others in the labor force  
  -The labor force is made up of people 15 or older that supply labor
  
- **Female Participation**: Rate of women ages 15 or older that supply labor (percentage)

- **Male Participation**: Rate of men ages 15 or older that supply labor (percentage)
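
Female Labor and Female Participation are easy to confuse because both are percentages built from the same numerator. A small worked example, using invented population counts, shows how they differ:

```r
# Invented counts for a hypothetical country (people aged 15 or older)
women_15plus <- 1000
men_15plus   <- 1000
women_in_lf  <- 500   # women who supply labor
men_in_lf    <- 750   # men who supply labor

# Participation rate: share of each gender's 15+ population in the labor force
female_participation <- 100 * women_in_lf / women_15plus
male_participation   <- 100 * men_in_lf / men_15plus

# Female Labor: women's share of the total labor force
female_labor <- 100 * women_in_lf / (women_in_lf + men_in_lf)

c(female_participation = female_participation,
  male_participation   = male_participation,
  female_labor         = female_labor)
```

Here half of all women work (participation of 50%), yet women make up only 40% of the labor force because more men work.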

### Analysis

In the summary statistics and correlation tabs, only data from 2020 will be used. 

**Summary Statistics**

The summary statistics tab shows information about each of the variables in the data set.

The number of countries in each region and income group is shown at the top. 

The minimum, mean, maximum and missing value percentage are shown for each of the numerical variables.   

- Female life expectancy is higher on average than male life expectancy  

- Male participation rate tends to be higher than female participation rate  

- Both female percentage of labor force and female participation rate vary widely across countries 
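
The missing value percentage for a column is the share of its rows that are `NA`. A minimal sketch with an invented data frame (the `toy` columns are placeholders, not dashboard variables):

```r
# Invented data frame with some missing values
toy <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(NA, NA, 2, 5)
)

# is.na() gives a logical matrix; the column means are the proportion missing
missing_pct <- colMeans(is.na(toy)) * 100
missing_pct
```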

----------------------------------------------------------------

**Correlation Plot**

The correlation plot shows relationships between the numerical variables in the data set.  

- Male and female life expectancy are the most highly correlated values in the data set. This is likely because of similar living conditions in each country.

- Female life expectancy and fertility rate are the most strongly negatively correlated values in the data set. This means that women tend to live longer in countries where the average fertility rate is lower.

- Female participation and female labor are very strongly positively correlated as well. As the percentage of women working rises, the percentage of the workforce that is female tends to rise.

- Female participation is not strongly correlated with any other numerical values in the data set. 

In the next few tabs, the relationship between female participation and region and income will be explored.  
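
The correlations behind the plot can also be inspected as a matrix. The sketch below uses base R's `cor()` with `use = "complete.obs"`, the same missing-value setting passed to `plot_correlation()` in this dashboard. The `toy` data frame is invented for illustration.

```r
# Invented values; NA marks a country with no data for that variable
toy <- data.frame(
  f_life_exp           = c(80, 75, 65, 60, NA),
  fertility_rate       = c(1.5, 2.0, 4.0, 5.0, 3.0),
  female_participation = c(55, 40, 60, NA, 50)
)

# "complete.obs" drops every row that has an NA in any column
# before computing pairwise Pearson correlations
cor_matrix <- cor(toy, use = "complete.obs")
round(cor_matrix, 2)
```

Because rows with any missing value are dropped, only countries with data for every variable contribute to the correlations.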

Column {.tabset data-width=400 .tabset-fade}
-----------------------------------------------------------------------

### Summary Statistics
<br>
<span style="color: lightgrey;">Categorical Variables by Country</span>

``` {r summary_cat} 
region_income_table <- summary(data_2020 %>% select(region, income_group))
colnames(region_income_table) <- c("Region", "Income Group")
region_income_table
```

<span style="color: lightgrey;">Numerical Variables</span>

``` {r summary_num}
labs <- c('Male Life Expectancy',
          'Female Life Expectancy',
          'Fertility Rate',
          'Female Percentage of Labor Force',
          'Male Participation Rate',
          'Female Participation Rate')

st(data_2020 %>% select(-c("region", "income_group", "country", "country_code")),
         summ=c('min(x)',
                'mean(x)',
                'max(x)',
                'propNA(x)*100'),
         summ.names = c('Min',
                        'Mean',
                        'Max',
                        'Missing Values (%)'),
         title = "",
         digits = 2,
         labels = labs)
```

### Correlation
``` {r correlation}
corr <- data_2020 %>% select(-c("region", "income_group", "country", "country_code"))

plot_correlation(corr, cor_args = list("use" = "complete.obs"))
```

Exploration
=======================================================================

Column {.tabset .tabset-fade}
----------------------------------------------------------------------
### Source Data

<br>
The table shows countries and corresponding values from 2020.
<br>

``` {r view}
DT::datatable(df %>% filter(year == 2020, !is.na(region))) %>%
    DT::formatRound(columns=c("female_labor", "male_participation", "female_participation"), digits=3)
```

### Worldwide Map

<div class='tableauPlaceholder' id='viz1669924235929' style='position: relative'><noscript><a href='#'><img alt='Female Participation ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Fe&#47;FemaleParticipationWorldwide&#47;FemaleParticipation&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='FemaleParticipationWorldwide&#47;FemaleParticipation' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Fe&#47;FemaleParticipationWorldwide&#47;FemaleParticipation&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='language' value='en-US' /><param name='filter' value='publish=yes' /></object></div>                

``` {js, embedcode}
var divElement = document.getElementById('viz1669924235929');                    
var vizElement = divElement.getElementsByTagName('object')[0];                    vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.4)+'px';                    
var scriptElement = document.createElement('script');                    
scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';                    
vizElement.parentNode.insertBefore(scriptElement, vizElement);
```


Female Participation
=======================================================================

Column {.tabset data-width=550 .tabset-fade}
----------------------------------------------------------------------

### Histogram

The distribution of female participation is approximately normal, with a slight left skew.

``` {r f_hist}
ggplot(data_2020, aes(x= female_participation)) + geom_histogram(na.rm=T, binwidth = 5, col = "white", fill = "#1b2085") + labs(x = "Female Participation (%)", y = "Number of Countries", title = "Distribution of Female Participation in the Labor Force")
```
<br>

The mean value for female participation is `r round(mean(data_2020$female_participation, na.rm = T),2)`%. 

The country with the smallest percentage is `r data_2020[which.min(data_2020$female_participation), "country"]` at `r round(data_2020[which.min(data_2020$female_participation), "female_participation"],2)`%. Yemen is in the region category of `r data_2020[which.min(data_2020$female_participation), "region"]` and is classified as `r tolower(data_2020[which.min(data_2020$female_participation), "income_group"])`.

The country with the highest participation is `r data_2020[which.max(data_2020$female_participation), "country"]` at a rate of `r round(data_2020[which.max(data_2020$female_participation), "female_participation"],2)`%. The `r data_2020[which.max(data_2020$female_participation), "country"]` are classified as `r tolower(data_2020[which.max(data_2020$female_participation), "income_group"])` and located in the `r data_2020[which.max(data_2020$female_participation), "region"]` region.
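
One hedged way to check the direction of skew numerically is to compare the mean with the median and to compute a moment-based skewness coefficient. The vector `x` below is invented; on the dashboard data it would be replaced by the non-missing values of `data_2020$female_participation`.

```r
# Invented participation rates with a long lower tail
x <- c(10, 40, 50, 55, 60, 65)
x <- x[!is.na(x)]

m <- mean(x)

# Moment-based skewness: third central moment over the cubed standard deviation
# (population form); negative values indicate a left skew
skewness <- mean((x - m)^3) / (mean((x - m)^2)^1.5)

c(mean = m, median = median(x), skewness = skewness)
```

A mean below the median together with a negative coefficient is consistent with a left-skewed distribution.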

### Plot Analysis

**Region**  
Female participation is shown by country on a map of 2020 values (Exploration tab) and as a regional average over time.

- Both plots show that the Middle East & North Africa has the lowest participation rates, while Sub-Saharan Africa and North America have the highest.

- The rates in Latin America & Caribbean and the Middle East & North Africa have changed the most in the past 30 years, with both regions seeing an increase of 5 to 10 percentage points.

- The gap between the Middle East & North Africa and Sub-Saharan Africa is around 30 percentage points as of 2020.

---------------------------------

**Income**  
The distribution by income shows some interesting results.

- Differences in median participation across income levels are less drastic than the differences across regions.

- Low income countries have the highest average rate of female participation, followed by high income, upper middle income, and lower middle income.

- The category lower middle income has the largest spread of data.

- The two ends of the spectrum have the highest median rates of female participation 

---------------------------------------

**Male Participation**  
Female and male participation rates are not strongly correlated.  

- In all regions, the average male participation is higher than the average female participation.

- The average for male participation remains about the same for each region, but the average for female participation varies.


Column {.tabset data-width=450 .tabset-fade}
--------------------------------------------------------------------

### Region

``` {r region_1}
#`group` (not `groups`) is the ggplot2 aesthetic that draws one line per region
ggplot(region_groups, aes(x = year, y = avg_female_participation, group = region, col = region)) +
  geom_line(na.rm = TRUE, linewidth = 1.2) +
  xlim(1988, 2022) +
  scale_color_brewer(palette = "Set2", na.translate = FALSE) +
  theme(legend.position = "bottom") +
  guides(colour = guide_legend(title.position = "top")) +
  labs(title = "Average Female Participation by Region from 1990-2020", x = "Year", y = "Average Female Participation (%)", col = "Region") +
  theme(text = element_text(size = 10))
```

### Income

``` {r income}
p1 <- ggplot(data_2020, aes(x=income_group, y = female_participation)) + geom_boxplot(na.rm = TRUE, fill = "#4F8073") +
  scale_x_discrete(na.translate = FALSE) + labs(x = "Income Level", y = "Female Participation (%)", title = "Female Participation Distribution by Income") + theme(text = element_text(size=13))

ggplotly(p1)
```

### Male Participation
``` {r f_perc}
participation <- region_groups %>% 
  filter(year == 2020) %>% 
  select(year, region, avg_female_participation, avg_male_participation) %>%
  rename(Female = avg_female_participation, Male = avg_male_participation) %>%
  mutate(region = reorder(region, -Female))  # Reorder regions by descending Female participation

participation_long <- gather(participation, gender, participation, Female:Male)

participation_plot <- ggplot(participation_long, aes(x = region, y = participation, fill = gender, 
                                                   text = paste0("Region: ", region, "\nGender: ", gender, 
                                                                 "\nParticipation: ", round(participation, 2), "%"))) + 
  geom_bar(stat = "identity", position = "dodge", na.rm = T) + 
  scale_x_discrete(na.translate = FALSE, labels = label_wrap(12)) + 
  labs(x = "Region", y = "Average Participation Rate (%)", 
       title = "Participation Rates by Region and Gender", fill = "Gender") + 
  scale_fill_manual(values = c("#008395", "#95B0B6")) + 
  theme(text = element_text(size = 13))

ggplotly(participation_plot, tooltip = "text")

```

Regional Correlations
=======================================================================

Column {.tabset data-width=500 .tabset-fade}
-----------------------------------------------------------------------

### Fertility Rate

``` {r r_map}
library(maps)

map <- map_data("world")

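#Recoding names to match data set
#NOTE: the "French Guiana" = "Guyana" entry below shades French Guiana, a separate territory, with Guyana's values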
map$region <- map$region %>% recode("USA" = "United States",
                                    "Venezuela" = "Venezuela, RB",
                                    "Egypt" = "Egypt, Arab Rep.",
                                    "Iran" = "Iran, Islamic Rep.",
                                    "North Korea" = "Korea, Dem. People's Rep.",
                                    "South Korea" = "Korea, Rep.",
                                    "Turkey" = "Turkiye",
                                    "Yemen" = "Yemen, Rep.",
                                    "Laos" = "Lao PDR",
                                    "Russia" = "Russian Federation",
                                    "Syria" = "Syrian Arab Republic",
                                    "Democratic Republic of the Congo" = "Congo, Dem. Rep.",
                                    "Republic of Congo" = "Congo, Rep.",
                                    "French Guiana" = "Guyana",
                                    "Kyrgyzstan" = "Kyrgyz Republic",
                                    "Ivory Coast" = "Cote d'Ivoire",
                                    "Virgin Islands" = "Virgin Islands (U.S.)",
                                    "Saint Vincent" = "St. Vincent and the Grenadines",
                                    "Trinidad" = "Trinidad and Tobago",
                                    "Sint Maarten" = "Sint Maarten (Dutch part)",
                                    "Slovakia" = "Slovak Republic",
                                    "Gambia" = "Gambia, The",
                                    "UK" = "United Kingdom",
                                    "Saint Martin" = "St. Martin (French part)",
                                    "Saint Lucia" = "St. Lucia",
                                    "Antigua" = "Antigua and Barbuda",
                                    "Bahamas" = "Bahamas, The"
                                    )

gender_map <- data_2020 %>% left_join(map, by = c("country"="region"))

```
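
A recode list like the one above is easier to maintain if the unmatched names are listed first. The sketch below uses `dplyr::anti_join()` to find country names in the data that have no counterpart in the map. Both tibbles are invented stand-ins for `data_2020` and `map`.

```r
library(dplyr)

# Invented stand-ins: names as spelled in the data set and in the map data
data_names <- tibble(country = c("United States", "Korea, Rep.", "Niger"))
map_names  <- tibble(region  = c("USA", "South Korea", "Niger"))

# Rows of data_names with no matching map region are the names that need recoding
unmatched <- anti_join(data_names, map_names, by = c("country" = "region"))
unmatched
```

Running the same check again after recoding shows which names, if any, still fail to match.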

``` {r fertility_map, fig.height = 7, fig.width = 12}
f <- ggplot(gender_map, aes(long, lat)) + 
  geom_polygon(aes(group = group, fill = fertility_rate, text = paste0(country, ": ", round(fertility_rate, 2), " births/woman"))) +
  scale_fill_viridis_c(option = "D") + labs(fill = "Fertility Rate") + 
  coord_map() + theme_minimal() +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank(), axis.title.y = element_blank(), axis.text.y = element_blank(), axis.ticks.y = element_blank(), panel.grid.major = element_blank(), panel.background = element_blank(), legend.position = "none")

ggplotly(f, tooltip = "text")
```

###  Income

``` {r income_region}

#Calculate the percentage of each income group per region
region_income_percentages <- data_2020 %>%
  group_by(region, income_group) %>%
  summarise(count = n(), .groups = 'drop') %>%
  group_by(region) %>%
  mutate(total = sum(count),
         percentage = count / total) %>%
  filter(income_group == "High income") %>% 
  ungroup()

#Find regions that have no high-income countries
regions_with_no_high_income <- setdiff(unique(data_2020$region), region_income_percentages$region)

#Create a dataframe for regions with no high-income countries, and assign them a 0% high-income share
regions_with_no_high_income_df <- tibble(
  region = regions_with_no_high_income,
  income_group = "High income",
  count = 0,
  total = 0,
  percentage = 0
)

#Combine the original region_income_percentages with the new rows
region_income_percentages <- bind_rows(region_income_percentages, regions_with_no_high_income_df)

#Get the regions ordered by their share of high-income countries
ordered_regions <- region_income_percentages %>%
  arrange(desc(percentage)) %>%
  pull(region)

#Reorder regions in the original dataset
income_region <- data_2020 %>%
  mutate(region = factor(region, levels = ordered_regions)) #Reorder by high-income percentage

income_plot <- ggplot(income_region, aes(x = region, fill = income_group)) + 
  geom_bar(position = "fill", na.rm = TRUE) + 
  scale_x_discrete(na.translate = FALSE, labels = label_wrap(12)) + 
  scale_y_continuous(breaks = seq(0,1,by = .2), labels = percent) + 
  scale_fill_manual(values = c("#472d30", "#723d46", "#ad2a56", "#ba8466", "#03071e")) + 
  theme(legend.position = "top") + 
  labs(title = "Income Levels by Region", x = "Region", y = "Percentage (%)", fill = "") + 
  theme(text = element_text(size = 10))

income_plot
```

### Female Life Expectancy

``` {r f_le_region}
#Line chart of average female life expectancy per region over time
ggplot(region_groups, aes(x = year, y = avg_female_le, group = region, col = region)) +
  geom_line(na.rm = TRUE, linewidth = 1.2) +
  scale_color_brewer(palette = "Set2", na.translate = FALSE) +
  theme(legend.position = "bottom") +
  guides(colour = guide_legend(title.position = "top")) +
  labs(title = "Female Life Expectancy by Region from 1960-2020", x = "Year", y = "Average Female Life Expectancy (years)", col = "Region") +
  theme(text = element_text(size = 10))
```

Column {data-width=500}
-----------------------------------------------------------------------

### Analysis

After examining several correlations, region emerged as the factor with the greatest variation across different variables. This section focuses on some of the regional differences.

-----------

**Fertility Rate**  
The fertility rate in Sub-Saharan Africa is higher than in any other region of the world.

- Sub-Saharan Africa has an average rate of 4.24 births/woman, almost 2 births/woman higher than any other region in the world.

- South Korea has the lowest fertility rate at 0.84 births/woman. Niger has the highest at 6.74.

------------

**Income**  
Income varies widely by region.

- North America has the highest proportion of high-income countries (100%), followed by Europe & Central Asia at about 66%.

- Half of the countries in Sub-Saharan Africa are classified as low income, and South Asia has the next highest proportion of low-income countries (12%).

- East Asia & Pacific, Middle East & North Africa, and Sub-Saharan Africa each have at least one country at every income level.

-------------

**Female Life Expectancy**  
Average female life expectancy has increased over time in every region.

- As of 2020, North America has the highest female life expectancy at 83.4 years, and Sub-Saharan Africa has the lowest with a value of 65.3 years.

- South Asia's female life expectancy has increased the most, rising from 41.8 years to 73.5 years (+31.7 years).

- North America shows small dips in average life expectancy because Bermuda only has data for a few years between 1960 and 2000, which pulls the regional average down in those years.
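
The dips described in the last point arise when the set of countries behind a yearly average changes. The following is a minimal sketch on made-up numbers, not the dashboard's own code: the `toy` data frame, its countries, and its values are hypothetical. It compares a plain yearly mean with one restricted to countries that report in every year.

```r
# Hypothetical toy data: country "Z" only reports in 1970.
# These values are illustrative assumptions, not World Bank figures.
toy <- data.frame(
  year      = c(1960, 1960, 1970, 1970, 1970, 1980, 1980),
  country   = c("X", "Y", "X", "Y", "Z", "X", "Y"),
  female_le = c(73, 74, 75, 76, 68, 77, 78)
)

# Plain yearly mean over whichever countries report that year
yearly_avg <- aggregate(female_le ~ year, data = toy, FUN = mean)

# Keep only countries that have a value in every year
n_years <- length(unique(toy$year))
years_per_country <- tapply(toy$year, toy$country, function(y) length(unique(y)))
complete <- names(years_per_country)[years_per_country == n_years]

# Yearly mean over the fixed set of fully reporting countries
balanced_avg <- aggregate(female_le ~ year,
                          data = toy[toy$country %in% complete, ],
                          FUN = mean)
```

In this toy case the plain average drops in 1970 when the extra country appears, while the average over the fixed set of countries rises steadily.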

Conclusions
=======================================================================

Column {data-width=500}
-----------------------------------------------------------------------

### Summary

The rate at which women participate in the workforce varies around the world, and many variables affect female participation rates in different countries. Region, income, and female percentage of the labor force were the three variables that correlated most strongly with female participation rates. In addition, region was a strong predictor for many of the variables in the data set, including fertility rate and life expectancy.

The data available placed limitations on the study. More numerical variables could have been analyzed; however, most of them had missing values for many countries. This study focused on the relevant variables with the most data.

Several assumptions were made throughout the analysis.

- The missing values for each variable were assumed not to have a dramatic impact on the results. In 2020, female participation data were missing for 14% of countries around the world. These countries were small, and they were excluded from the results.

- When grouping by region over time, some countries had more history recorded than others. In addition, the averages were not weighted by population, something that could be improved upon in the future. 
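
The population weighting mentioned in the last assumption could look roughly like the sketch below. It uses made-up numbers and is not part of the analysis: the `toy` data frame, its values, and the `population` column are all hypothetical, since the variables used in this study do not include population. `weighted.mean()` is a base R function.

```r
# Hypothetical toy data: values and the `population` column are
# illustrative assumptions, not figures from the World Bank data set.
toy <- data.frame(
  region     = c("A", "A", "B", "B"),
  female_le  = c(80, 70, 60, 66),
  population = c(1, 9, 5, 5)
)

# Unweighted regional mean: every country counts equally
unweighted <- tapply(toy$female_le, toy$region, mean)

# Population-weighted regional mean: larger countries count more
weighted <- sapply(split(toy, toy$region), function(d) {
  weighted.mean(d$female_le, d$population)
})
```

When populations within a region are equal, the two averages agree; they diverge when one country is much larger than the others, which is where weighting would matter for the regional comparisons.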

I learned many things when putting this project together, both about R Markdown and labor participation rates. My biggest takeaway from my analysis is how strongly variables were grouped by region. Countries in the same region were much more likely to share similar characteristics than countries grouped by any other indicator.


Column {data-width=500}
-----------------------------------------------------------------------

### Future Work

While this study was narrowed down to 8 variables, much work could be done on the remaining variables. It would be interesting to see the effect education has on female participation rates, as well as more analysis of the difference between male and female participation rates. In some regions, the gender gap is shrinking, and an analysis of the reasons behind the closing gap would add context to this presentation.

Another way to study this data could be by country or region. This study gave a broad overview of the world focusing on the year 2020, but it could be interesting to focus on certain regions.

###  References

All data used in this project came from <a href="https://genderdata.worldbank.org/" target="_blank">The World Bank</a>. In addition to data used here, the World Bank has other <a href="https://genderdata.worldbank.org/data-stories/flfp-data-story/" target="_blank">resources</a> that were used to better understand the data after performing some initial analysis.

Another helpful resource was <a href="https://www.youtube.com/watch?v=yBIfRS56gjo" target="_blank">iMediaProf</a>; the YouTube video shows how to embed a Tableau view into a markdown file.