{censobr} is an R package to download data from Brazil’s Population Census. It provides a simple and efficient way to download and read the data sets and documentation of all population censuses conducted in the country since 1960. The package is built on top of the Arrow platform, which allows users to work with larger-than-memory census data using familiar {dplyr} functions.
# install from CRAN
install.packages("censobr")
# or use the development version with latest features
utils::remove.packages('censobr')
remotes::install_github("ipeaGIT/censobr", ref = "dev")
# load the package
library(censobr)
The package currently includes 6 main functions to download & read census data:
read_population()
read_households()
read_mortality()
read_families()
read_emigration()
read_tracts()
{censobr} also includes a few support functions to help users navigate the documentation of Brazilian censuses, providing convenient information on data variables and methodology:
data_dictionary()
questionnaire()
interview_manual()
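As a quick illustration, the calls below open the documentation for the 2010 census. This is a sketch that assumes the arguments shown (year, dataset, type) match the installed version of {censobr}; check ?data_dictionary, ?questionnaire and ?interview_manual if a call fails. Each call downloads the file on first use and opens it in a browser or viewer.

```r
library(censobr)

# data dictionary of the 2010 population microdata
# (assumed dataset name: "population")
data_dictionary(year = 2010, dataset = "population")

# 2010 questionnaire; type is assumed to be "long" or "short"
questionnaire(year = 2010, type = "long")

# 2010 enumerators' interview manual
interview_manual(year = 2010)
```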
Finally, the package includes two functions to help users manage the data cached locally:
censobr_cache()
set_censobr_cache_dir()
All {censobr} functions that read data share the same syntax, so it is intuitive to download any data set with a single line of code, like this:
read_households(
year, # year of reference
columns, # select columns to read
add_labels, # add labels to categorical variables
as_data_frame, # return an Arrow DataSet or a data.frame
showProgress, # show download progress bar
cache # cache data for faster access later
)
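A concrete version of the call above might look like the sketch below. It assumes that V0010 is the sample weight column of the 2010 household microdata; confirm column names with data_dictionary() before relying on them.

```r
library(censobr)

# read three columns of the 2010 household microdata directly into a
# data.frame; V0010 is ASSUMED to be the household sample weight
hs <- read_households(
  year = 2010,
  columns = c("code_muni", "abbrev_state", "V0010"),
  add_labels = "pt",      # Portuguese labels for categorical variables
  as_data_frame = TRUE,   # return a data.frame instead of an Arrow object
  showProgress = FALSE,   # hide the download progress bar
  cache = TRUE            # reuse a locally cached copy if one exists
)

head(hs)
```

Setting as_data_frame = TRUE loads the selected columns into memory, so it is best combined with a short columns vector.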
Note: all data sets in {censobr} are enriched with geography columns following the naming standards of the {geobr} package to facilitate data manipulation and integration with spatial data from {geobr}. The added columns are: c('code_muni', 'code_state', 'abbrev_state', 'name_state', 'code_region', 'name_region', 'code_weighting').
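Because every data set carries these {geobr}-style columns, aggregating by geography needs no extra lookup table. The sketch below computes the weighted 2010 population per state, again assuming V0010 is the person-level sample weight. The result could then be merged with state boundaries from {geobr} (for example geobr::read_state()) through the shared abbrev_state column.

```r
library(censobr)
library(dplyr)

# Arrow object over the 2010 population microdata (not loaded into memory);
# V0010 is ASSUMED to be the person-level sample weight
pop <- read_population(
  year = 2010,
  columns = c("code_state", "abbrev_state", "V0010"),
  showProgress = FALSE
)

# weighted population by state, grouped on the geobr-style columns;
# the query runs inside Arrow and only the 27 summary rows are collected
pop_state <- pop |>
  group_by(code_state, abbrev_state) |>
  summarise(population = sum(V0010)) |>
  collect()

head(pop_state)
```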
The first time the user runs a function, {censobr} will download the file and store it locally. This way, the data only needs to be downloaded once. When the cache parameter is set to TRUE (default), the function will read the cached data, which is much faster.
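The effect of the cache can be checked by timing two identical calls. The sketch below uses the 2010 mortality data because it is comparatively small; if the file was already cached before running it, both calls will be fast. The line that empties the cache is left commented out so the example does not delete anything.

```r
library(censobr)
library(dplyr)

# first call: downloads the 2010 mortality data and stores it locally
t_first <- system.time(
  mort <- read_mortality(year = 2010, showProgress = FALSE, cache = TRUE)
)

# second call: reads the local copy instead of downloading it again
t_second <- system.time(
  mort <- read_mortality(year = 2010, showProgress = FALSE, cache = TRUE)
)

# elapsed seconds; actual values depend on your connection and disk
t_first["elapsed"]
t_second["elapsed"]

# list the {censobr} files currently cached on this machine
censobr_cache(list_files = TRUE)

# remove every cached file (uncomment with care)
# censobr_cache(delete_file = "all")
```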
censobr_cache(): can be used to list and/or delete data files cached locally.
set_censobr_cache_dir(): can be used to set a custom cache directory for {censobr} files.
Microdata of Brazilian censuses are often too big to load into users’ RAM. To avoid this problem, {censobr} by default returns an Arrow table, which can be analyzed like a regular data.frame using the {dplyr} package without loading the full data into memory.
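To see what working with an Arrow table looks like without downloading anything, the sketch below builds a small toy Arrow table (the values are made up for illustration and are not census data) and runs a {dplyr} pipeline on it. The pipeline is only a query until collect() executes it and loads the result into R, the same pattern used with the objects returned by read_population() and the other read functions.

```r
library(arrow)
library(dplyr)

# toy Arrow table standing in for census microdata (made-up values)
toy <- arrow_table(
  abbrev_state = c("SP", "SP", "RJ", "RJ", "MG"),
  weight = c(10.5, 12.0, 8.0, 9.5, 11.0)
)

# dplyr verbs on an Arrow table build a lazy query, nothing is computed yet
query <- toy |>
  group_by(abbrev_state) |>
  summarise(total = sum(weight))

# collect() runs the query in Arrow and returns a regular tibble;
# arrange() afterwards because Arrow does not guarantee group order
result <- query |>
  collect() |>
  arrange(abbrev_state)

result
# MG 11.0, RJ 17.5, SP 22.5
```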
More info in the package vignette.
If you would like to contribute to {censobr}, you’re welcome to open an issue explaining the proposed contribution.
As far as we know, {censobr} is the only R package that provides fast and convenient access to the complete data sets and documentation of Brazilian censuses. The microdadosBrasil package used to provide access to microdata of several public data sets, but unfortunately, it has been discontinued.
Original Census data is collected by the Brazilian Institute of Geography and Statistics (IBGE). The {censobr} package is developed by a team at the Institute for Applied Economic Research (Ipea), Brazil. If you want to cite this package, you can cite it as:
bibentry(
bibtype = "Manual",
title = "censobr: Download Data from Brazil's Population Census",
author = "Rafael H. M. Pereira [aut, cre] and Rogério J. Barbosa [aut]",
year = 2023,
version = "v0.2.0",
url = "https://CRAN.R-project.org/package=censobr",
textVersion = "Pereira, R. H. M.; Barbosa, R. J. (2023) censobr: Download Data from Brazil's Population Census. R package version v0.2.0, <https://CRAN.R-project.org/package=censobr>."
)