Words used in Portuguese Wikipedia
This data package contains a dataset of the words used in a random sample of ~15,000 pages from the Portuguese Wikipedia.
It can be installed using:
devtools::install_github("dfalbel/ptwikiwords")
After installing the package, you can load the dataset using:
library(ptwikiwords)
data(ptwikiwords)
head(ptwikiwords)
#> # A tibble: 6 × 3
#> word count check
#> <chr> <int> <lgl>
#> 1 de 210954 TRUE
#> 2 a 109652 TRUE
#> 3 e 100028 TRUE
#> 4 o 87839 TRUE
#> 5 em 67040 TRUE
#> 6 do 59489 TRUE
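The dataset contains 3 columns: word (character), the word itself; count (integer), how many times the word occurs in the sampled pages; and check (logical), a TRUE/FALSE flag attached to each word.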
Here is a word cloud of the 300 most frequent words whose check flag is TRUE:
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(wordcloud))
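# Keep the words whose check flag is TRUE and take the 300 most frequent.
words_filter <- ptwikiwords %>%
  filter(check == TRUE) %>%
  arrange(desc(count)) %>%
  slice(1:300)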
wordcloud(words_filter$word, words_filter$count)
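Because count holds raw occurrence counts, it can be useful to turn them into relative frequencies. The sketch below does this with dplyr. To stay self-contained it uses only the six rows printed by head(ptwikiwords) above, so each share is relative to those six words only; the names sample_words and shares are illustrative, not part of the package.

```r
suppressPackageStartupMessages(library(dplyr))

# The six rows printed by head(ptwikiwords) above, copied by hand so this
# sketch runs without the package. With ptwikiwords installed, use the full
# `ptwikiwords` data frame instead of `sample_words`.
sample_words <- data.frame(
  word  = c("de", "a", "e", "o", "em", "do"),
  count = c(210954L, 109652L, 100028L, 87839L, 67040L, 59489L),
  check = rep(TRUE, 6),
  stringsAsFactors = FALSE
)

# Keep only rows whose check flag is TRUE, then express each count as a
# share of the total number of occurrences among those rows.
shares <- sample_words %>%
  filter(check) %>%
  mutate(share = count / sum(count)) %>%
  arrange(desc(count))

print(shares)
```

Applied to the full dataset, the shares would instead be relative to all words that pass the check.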