Mapping South America with R: A Deep Dive into Geo-Visualization | by Fernando Barbalho | Aug, 2023


Navigating datasets, geopolitical nuances, and coding challenges to paint a complete picture of the continent

Photo by Alexander Schimmeck on Unsplash

So you're that kind of data scientist and newbie Medium writer who has loved maps and geography since childhood. You are looking for a great theme for your next work with graphs and, most certainly, maps when you realize that the official statistics agency of your country, Brazil, has released the latest census data. Why not? Why not take a picture of Brazil in comparison with its neighbors in South America? It should be a simple task using R and all its good packages. Let's do it.

The minute after this decision comes the realization that the simple task is actually a hero's journey, with elements such as finding the most suitable dataset with shapefiles, gaps in knowledge, shapefile interoperability, latitude and longitude math, cultural differences in geography concepts, and even geopolitical issues, like understanding how to place the map and data of the French overseas territories correctly in South America.

The next paragraphs explain one of several possible paths to paint demographic information onto a delimited portion of a world map. The step-by-step described below may be useful for anyone interested in international comparison with a geo-visualization approach, even if the goal is to compare access to water among African countries or obesity rates in North America.

Let's start with the whole picture: an R version of the mapa-múndi (world map). See the image and code below.

Mapa Mundi: Image by author
library(readxl)
library(geobr)
library(tidyverse)
library(sf)
library(spData)
library(ggrepel)
library(colorspace) # provides scale_fill_continuous_sequential()

data("world") # world country polygons shipped with {spData}

# world map

world %>%
  ggplot() +
  geom_sf(aes(fill = pop / 10^6)) +
  scale_fill_continuous_sequential(palette = "Heat 2") +
  theme_void() +
  theme(
    panel.background = element_rect(fill = "#0077be")
  ) +
  labs(
    # legend title: "Population in millions of inhabitants"
    fill = str_wrap("População em milhões de habitantes", 10)
  )

I use the {spData} package as a reference for a dataframe with geometry information for the shapefiles of territories across the planet. The aes function uses the population information to fill the shapes. As we know, China and India are the most populous countries in the world, with over 1 billion people each. The heat colors show the contrast with all other countries. Most of the sequential colors are faint; we can barely perceive the color gradient in the picture. A logarithmic scale is the best alternative if you want a better color distribution. See below.

Mapa mundi with log scale. Image by author
world %>%
  ggplot() +
  geom_sf(aes(fill = pop)) +
  # log2 transform spreads the colors across countries of very different sizes
  scale_fill_continuous_sequential(palette = "Heat 2", trans = "log2") +
  theme_void() +
  theme(
    panel.background = element_rect(fill = "#0077be"),
    legend.position = "none"
  )

In the code, you can see the logarithm transformation in the scale_fill_continuous_sequential function.
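To see why the transformation helps, here is a minimal base R sketch. The population figures are made-up round numbers, not values from {spData}, and rescale01 is a hypothetical helper that only mimics how a continuous color scale positions values along a palette from 0 to 1.

```r
# Illustrative populations (round, made-up numbers; not taken from spData)
pop <- c(small = 1e6, medium = 5e7, large = 2e8, giant = 1.4e9)

# Hypothetical helper: map values linearly onto [0, 1],
# similar to how a continuous color scale places values along a palette
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

linear_pos <- rescale01(pop)        # palette positions on a linear scale
log_pos    <- rescale01(log2(pop))  # palette positions after a log2 transform

round(cbind(linear_pos, log_pos), 3)
```

On the linear scale, every territory except the giant one falls in the lowest 15% of the palette, so they all look alike. After the log2 transform, the medium-sized territory already sits past the middle of the palette.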

In the world dataframe structure, there is a continent column. So filtering the data by this column to get a South American map is the obvious move. See the code and, right after, the map.

South America map: first version. Image by author
world %>%
  filter(continent == "South America") %>%
  ggplot() +
  geom_sf(aes(fill = pop / 10^6)) +
  scale_fill_continuous_sequential(palette = "Heat 2") +
  theme_void() +
  theme(
    panel.background = element_rect(fill = "#0077be")
  ) +
  labs(
    # legend title: "Population in millions of inhabitants"
    fill = str_wrap("População em milhões de habitantes", 10)
  )

As you can see, the dplyr filter function worked fine; this is just the map we wanted to see. But is it really correct?

Climate change is a huge issue, but sea levels have not yet risen enough to drown a sizable area that used to appear in the north of South America. What happened here? Let's draw another map, now with the help of coordinates and names for the polygons.

South America map: second version. Image by author
southamerica <-
  world %>%
  filter(continent == "South America")

# centroid of each polygon, used as the label position
southamerica$lon <- sf::st_coordinates(sf::st_centroid(southamerica$geom))[, 1]
southamerica$lat <- sf::st_coordinates(sf::st_centroid(southamerica$geom))[, 2]

southamerica %>%
  ggplot() +
  geom_sf(aes(fill = pop / 10^6)) +
  scale_fill_continuous_sequential(palette = "Heat 2") +
  theme_light() + # unlike theme_void(), keeps the coordinate grid visible
  theme(
    panel.background = element_rect(fill = "#0077be")
  ) +
  labs(
    # legend title: "Population in millions of inhabitants"
    fill = str_wrap("População em milhões de habitantes", 10)
  ) +
  geom_text_repel(aes(x = lon, y = lat, label = str_wrap(name_long, 20)),
    color = "black",
    fontface = "bold",
    size = 2.8)

Using theme_light instead of theme_void was enough to display the coordinates. The polygon naming took more work. We had to calculate the centroid of each polygon and then use this information as the x and y coordinates in a geom_text_repel function.
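The centroid step can be tried in isolation. The sketch below assumes only the {sf} package and uses a toy 2 x 2 square as a stand-in for a country polygon; the object names and the data are hypothetical.

```r
library(sf)

# A toy 2 x 2 square standing in for a country polygon (hypothetical data)
square <- st_polygon(list(rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0))))
shapes <- st_sf(name_long = "Toy country", geom = st_sfc(square))

# st_centroid() returns one point per geometry;
# st_coordinates() turns those points into a matrix with columns X and Y
centroids <- st_coordinates(st_centroid(shapes$geom))
shapes$lon <- centroids[, 1]
shapes$lat <- centroids[, 2]
```

The centroid of this square is its center, so lon and lat both come out as 1. For real country shapes the centroid is only an approximate label position, which is one reason to let geom_text_repel nudge the labels.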

With this new map version and some prior knowledge, we discovered that the missing territory was French Guiana, between 0º and 10º north latitude and 53º and 55º west longitude. Our next quest is to understand how to get the pieces of information on French Guiana: polygon, population, and some coordinates to fill our map.

I had to isolate France from the rest of the world to understand how the {spData} package handles this country's map data. See the result below.

A map of France. Image by author
france <-
  world %>%
  filter(iso_a2 == "FR")

france %>%
  ggplot() +
  geom_sf(aes(fill = pop)) +
  scale_fill_continuous_sequential(palette = "Heat 2", trans = "log2") +
  theme_light() +
  theme(
    # panel.background = element_rect(fill = "#0077be")
  ) +
  labs(
    # legend title: "Population"
    fill = str_wrap("População", 30)
  )

France has many so-called overseas territories. The approach of the {spData} package was to represent only the mainland, plus Corsica, an island in the Mediterranean Sea, and French Guiana, located precisely in the coordinate range that characterizes the hole in our last map of South America.

My next try was to add the dataframe with France's geometry data to my South America filter, but I knew I would need more. See below.

South America + France. Image by author
southamerica %>%
  bind_rows(france) %>% # france has no lon/lat columns, so they become NA
  ggplot() +
  geom_sf(aes(fill = pop / 10^6)) +
  scale_fill_continuous_sequential(palette = "Heat 2") +
  theme_light() +
  theme(
    panel.background = element_rect(fill = "#0077be")
  ) +
  labs(
    # legend title: "Population in millions of inhabitants"
    fill = str_wrap("População em milhões de habitantes", 10)
  ) +
  geom_text_repel(aes(x = lon, y = lat, label = str_wrap(name_long, 20)),
    color = "black",
    fontface = "bold",
    size = 2.8)

As you can see in the code, I used bind_rows to combine the South American territories with the France shapefile. So now we have French Guiana well positioned. However, there is no population information for it on the map, and France has become part of South America, the reverse of colonial history.

In other words, I wanted this map.

French Guiana on the South America map. Image by author
data_guiana <-
  insee::get_idbank_list("TCRED-ESTIMATIONS-POPULATION") %>%
  filter(str_detect(REF_AREA_label_fr, "Guyane")) %>%
  filter(AGE == "00-") %>% # all ages
  filter(SEXE == 0) %>% # both sexes
  pull(idbank) %>%
  insee::get_insee_idbank() %>%
  filter(TIME_PERIOD == "2023") %>%
  select(TITLE_EN, OBS_VALUE) %>%
  mutate(iso_a2 = "FR")

data_guiana <- janitor::clean_names(data_guiana)

southamerica %>%
  bind_rows(france) %>%
  left_join(data_guiana, by = "iso_a2") %>%
  # replace France's population with the French Guiana estimate
  mutate(pop = ifelse(iso_a2 == "FR", obs_value, pop)) %>%
  # geometry is the 11th column of world; the first vertex of the first
  # polygon of France is used here as the label position for French Guiana
  mutate(lon = ifelse(iso_a2 == "FR", france[[11]][[1]][[1]][[1]][1, 1], lon),
         lat = ifelse(iso_a2 == "FR", france[[11]][[1]][[1]][[1]][1, 2], lat)) %>%
  ggplot() +
  geom_sf(aes(fill = pop / 10^6)) +
  scale_fill_continuous_sequential(palette = "Heat 2") +
  geom_text_repel(aes(x = lon, y = lat, label = str_wrap(name_long, 20)),
    color = "black",
    fontface = "bold",
    size = 2.8) +
  # crop the view to South America, leaving mainland France out of the frame
  coord_sf(xlim = c(-82, -35), ylim = c(-60, 15)) +
  theme_light() +
  theme(
    panel.background = element_rect(fill = "#0077be")
  ) +
  labs(
    # legend title: "Population in millions of inhabitants"
    fill = str_wrap("População em milhões de habitantes", 10)
  )

As you can read, I used {insee}, an R package produced by France's official statistics office, to obtain the population of French Guiana. In addition, I restricted the map to the appropriate coordinates to see only South America.
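The bind, join, and override steps are easier to follow on a tiny example. The sketch below assumes only {dplyr}; the tables, country codes "AA" and "BB", and every number in it are made up for illustration and do not come from {spData} or {insee}.

```r
library(dplyr)

# Toy stand-ins for the real tables; all values are made up
continent <- tibble(iso_a2 = c("AA", "BB"), pop = c(10, 20), lon = c(-60, -70))
extra     <- tibble(iso_a2 = "FR", pop = 99)         # note: no lon column
override  <- tibble(iso_a2 = "FR", obs_value = 0.3)  # territory-level value

patched <- continent %>%
  bind_rows(extra) %>%                    # the missing lon is filled with NA
  left_join(override, by = "iso_a2") %>%  # obs_value is NA except for "FR"
  mutate(
    pop = ifelse(iso_a2 == "FR", obs_value, pop),  # swap in the territory value
    lon = ifelse(iso_a2 == "FR", -53, lon)         # hand-set label position
  )
```

After the pipeline, the "FR" row carries the overriding population and a usable label coordinate, while the other rows are untouched. The same pattern explains why the label for France was missing in the previous map: its lon and lat were NA after bind_rows.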

Now that the map hero has finally resolved the South American issues and smoked the peace pipe with France, it's time to return to Brazilian data and maps. Remember, I want to compare some Brazilian census details with other countries and territories south of Panama.

The census data is available in an R package or at an API address. I chose the harder option, using the API. Using the other option another time would be a good idea. See the code and the map below, where I show the population of the Brazilian states in contrast to the other South American territories.

South America + Brazilian states. Image by author
central_america <-
  world %>%
  filter(subregion == "Central America")

brasil <- geobr::read_country()
estados <- geobr::read_state()

# population data

ibge2022 <-
  get_municipalies_data()

# aggregate municipal population by state and attach it to the state shapes
estados <-
  estados %>%
  inner_join(
    ibge2022 %>%
      rename(abbrev_state = uf) %>%
      summarise(.by = abbrev_state,
                pop = sum(populacao_residente)),
    by = "abbrev_state"
  )

southamerica %>%
  filter(iso_a2 != "BR") %>%
  bind_rows(france) %>%
  left_join(data_guiana, by = "iso_a2") %>%
  mutate(pop = ifelse(iso_a2 == "FR", obs_value, pop)) %>%
  mutate(lon = ifelse(iso_a2 == "FR", france[[11]][[1]][[1]][[1]][1, 1], lon),
         lat = ifelse(iso_a2 == "FR", france[[11]][[1]][[1]][[1]][1, 2], lat)) %>%
  ggplot() +
  geom_sf(aes(fill = pop / 10^6)) +
  geom_sf(data = estados, aes(fill = pop / 10^6)) +
  geom_sf(data = brasil, fill = NA, color = "#00A859", lwd = 1.2) +
  geom_sf(data = central_america, fill = "#808080") +
  scale_fill_continuous_sequential(palette = "Heat 2") +
  geom_text_repel(aes(x = lon, y = lat,
                      label = str_wrap(name_long, 20)),
    color = "black",
    fontface = "bold",
    size = 2.8) +
  coord_sf(xlim = c(-82, -35), ylim = c(-60, 15)) +
  theme_void() +
  theme(
    panel.background = element_rect(fill = "#0077be")
  ) +
  labs(
    # legend title: "Population in millions"
    fill = str_wrap("População em milhões", 10)
  )

I wrote the function get_municipalies_data, called in the code above, using the API cited earlier. The code is available in my gist. Note also the two functions that provide the shapefiles used to draw the boundaries of Brazil and its states: read_country and read_state. These functions come from the {geobr} package.
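The aggregation from municipalities to states is the part that matters for the map. Here is a minimal sketch of that step on made-up data; it assumes {dplyr} 1.1.0 or later (for the .by argument), and the figures are invented. Only the column names uf and populacao_residente follow the code above.

```r
library(dplyr)

# Made-up municipal figures; only the column names follow the article's code
municipalities <- tibble(
  uf = c("SP", "SP", "RJ"),
  populacao_residente = c(100, 50, 70)
)

# One row per state, with the municipal populations summed
state_pop <- municipalities %>%
  rename(abbrev_state = uf) %>%
  summarise(.by = abbrev_state, pop = sum(populacao_residente))
```

Renaming uf to abbrev_state is what lets the result join directly onto the table returned by read_state, which identifies each state by that column.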

I used another filter on the world dataframe. In this case, the purpose is to show the beginning of the Central American subcontinent and paint its map in a shade of gray. Here we faced a cultural divergence: in Brazil, we learn that the Americas have three subcontinents, North America, Central America, and South America. For the authors of the dataset, Central America is a subregion of North America.

Now it's time to finish my work. I want to show the names of the eight most populous territories on the map. Even in this final sprint, there were a few code tricks.

Most populated territories. Image by author
estados$lon <- sf::st_coordinates(sf::st_centroid(estados$geom))[, 1]
estados$lat <- sf::st_coordinates(sf::st_centroid(estados$geom))[, 2]

# ranking of countries (except Brazil) and Brazilian states by population
most_populated <-
  southamerica %>%
  filter(iso_a2 != "BR") %>%
  rename(name = name_long) %>%
  as_tibble() %>% # drop the sf class so the two tables can be row-bound
  select(name, pop, lat, lon) %>%
  bind_rows(
    estados %>%
      rename(name = name_state) %>%
      as_tibble() %>%
      select(name, pop, lat, lon)
  ) %>%
  slice_max(order_by = pop, n = 8)

southamerica %>%
  filter(iso_a2 != "BR") %>%
  bind_rows(france) %>%
  left_join(data_guiana, by = "iso_a2") %>%
  mutate(pop = ifelse(iso_a2 == "FR", obs_value, pop)) %>%
  mutate(lon = ifelse(iso_a2 == "FR", france[[11]][[1]][[1]][[1]][1, 1], lon),
         lat = ifelse(iso_a2 == "FR", france[[11]][[1]][[1]][[1]][1, 2], lat)) %>%
  ggplot() +
  geom_sf(aes(fill = pop / 10^6)) +
  geom_sf(data = estados, aes(fill = pop / 10^6)) +
  geom_sf(data = brasil, fill = NA, color = "#00A859", lwd = 1.2) +
  geom_sf(data = central_america, fill = "#808080") +
  scale_fill_continuous_sequential(palette = "Heat 2") +
  geom_text_repel(data = most_populated,
    aes(x = lon, y = lat,
        label = str_c(str_wrap(name, 10), ": ", round(pop / 10^6, 1))),
    color = "black",
    fontface = "bold",
    size = 2.9) +
  coord_sf(xlim = c(-82, -35), ylim = c(-60, 15)) +
  theme_void() +
  theme(
    panel.background = element_rect(fill = "#0077be")
  ) +
  labs(
    # legend title: "Population in millions"
    fill = str_wrap("População em milhões", 10)
  )

Three Brazilian states are among the eight most populous territories in South America. In fact, São Paulo is the second most populous region on the map, surpassing all the countries except Colombia.

Now, focusing on the code, you can see that I created a new dataframe to build this ranking by combining two different sf objects. I selected a subset of columns and changed the type from sf to tibble to enable the row binding.
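A minimal sketch of that ranking trick on toy data, assuming {sf} and {dplyr}. The two sf objects, their names, and their populations are hypothetical; they only mirror the fact that the country table and the state table use different column names.

```r
library(sf)
library(dplyr)

# Two toy sf objects with different name columns (hypothetical data)
countries <- st_sf(
  name_long = c("Country A", "Country B"),
  pop = c(5, 50),
  geom = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)))
)
states <- st_sf(
  name_state = c("State 1", "State 2"),
  pop = c(40, 3),
  geom = st_sfc(st_point(c(2, 2)), st_point(c(3, 3)))
)

# Convert each object to a plain tibble, keep only shared columns,
# stack the rows, and keep the two largest populations
ranking <- countries %>%
  as_tibble() %>%
  select(name = name_long, pop) %>%
  bind_rows(
    states %>%
      as_tibble() %>%
      select(name = name_state, pop)
  ) %>%
  slice_max(order_by = pop, n = 2)
```

Converting to a tibble first means select can drop the geometry column, which would otherwise stay attached to an sf object. The result is an ordinary table that can be passed to geom_text_repel through its data argument.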

So that's it. The hero has completed one possible path and left footprints for the next journey. Now it's your turn. Think of all your projects that could improve significantly with a map representation. Using the walk-through above and gathering all the data available about population, socioeconomic issues, and so on, it's just a matter of choosing the variable to fill the polygons.
