ptwikiwords

Words used in Portuguese Wikipedia


This data package contains a dataset of the words found in a random sample of about 15,000 pages from the Portuguese Wikipedia.

Installing

The package can be installed from GitHub using:

devtools::install_github("dfalbel/ptwikiwords")

Using

After installing the package, you can load the dataset using:

library(ptwikiwords)
data(ptwikiwords)
head(ptwikiwords)
#> # A tibble: 6 × 3
#>    word  count check
#>   <chr>  <int> <lgl>
#> 1    de 210954  TRUE
#> 2     a 109652  TRUE
#> 3     e 100028  TRUE
#> 4     o  87839  TRUE
#> 5    em  67040  TRUE
#> 6    do  59489  TRUE

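The dataset contains 3 columns: word (character), count (integer, the number of times the word appears in the sample) and check (logical).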

Here is a word cloud of the first 300 words in the dataset for which check is TRUE:

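suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(wordcloud))
# Keep the words flagged TRUE in `check`, then take the first 300 rows.
# `filter(check)` avoids `T`, which can be reassigned, unlike `TRUE`.
words_filter <- ptwikiwords %>%
  filter(check) %>%
  slice(1:300)
# Draw the cloud, sizing each word by its count.
wordcloud(words_filter$word, words_filter$count)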
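
Beyond the word cloud, the count column can also be turned into relative frequencies. The following is only a minimal sketch: so that it runs without the package installed, it uses a stand-in data frame (the hypothetical name top_words) built from the six rows printed by head() above, and the share column is likewise an illustrative name. With the package loaded, ptwikiwords itself could presumably be used in place of top_words.

```r
# Stand-in for the first six rows of ptwikiwords
# (values copied from the head() output shown earlier; `top_words` is a hypothetical name).
top_words <- data.frame(
  word  = c("de", "a", "e", "o", "em", "do"),
  count = c(210954L, 109652L, 100028L, 87839L, 67040L, 59489L),
  check = rep(TRUE, 6),
  stringsAsFactors = FALSE
)

# Keep only the rows flagged TRUE in `check`.
kept <- top_words[top_words$check, ]

# Share of each word within this small subset (not within the full dataset).
kept$share <- kept$count / sum(kept$count)

# Print words ordered from largest to smallest share.
print(kept[order(-kept$share), c("word", "share")])
```

Note that the shares here are relative to the six stand-in rows only; computed on the full dataset, the denominators and therefore the values would differ.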