Every campaign cycle I have to do the same things: go to a repository, download a bunch of data, merge the files, and store them in an existing RData file for later analysis. I’ve already written about this topic some time ago, but I think my script has become simpler this time.

Set the Directory

Let’s assume you’re not in the same directory as your files, so you’ll need to point R to the folder where the files reside.

	
setwd("~/Downloads/consulta_cand_2014")
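
If you would rather not change the working directory, one possible alternative is to pass the folder to list.files() and ask for full paths. This is only a sketch: the folder and file names are made up, and a temporary directory stands in for the real download folder.

```r
# Hypothetical stand-in for the download folder (a temporary directory)
data_dir <- file.path(tempdir(), "consulta_cand_demo")
dir.create(data_dir, showWarnings = FALSE)
writeLines("1;ANA;SP", file.path(data_dir, "cand_SP.txt"))
writeLines("2;BIA;RJ", file.path(data_dir, "cand_RJ.txt"))

# full.names = TRUE returns paths that can be read from any working directory
paths <- list.files(data_dir, pattern = "\\.txt$", full.names = TRUE)
basename(paths)
```

With full paths in hand, the reading step works the same way without a call to setwd().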

Getting a List of files

Next, it’s just a matter of getting to know your files. For this, the list.files() function is very handy, and you can see the file names right away on your screen. Here I’m looking for the “txt” files, so I want my list of files to exclude everything else, like pdf, jpg etc.

	
files <- list.files(pattern = "\\.txt$")
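
The pattern argument is a regular expression, not a shell wildcard, so it helps to escape the dot and anchor the extension at the end of the name. A small sketch of the difference, using made-up file names and grepl() to mimic the matching:

```r
# Made-up file names, used only to show how the pattern behaves
candidates <- c("cand_AC.txt", "cand_AC.pdf", "txt_notes.jpg", "layout.txt.bak")

# Unanchored: matches "txt" anywhere in the name
loose <- grepl("txt", candidates)

# Anchored: a literal dot followed by "txt" at the very end of the name
strict <- grepl("\\.txt$", candidates)

candidates[loose]
candidates[strict]
```

Only the anchored pattern leaves out names such as the hypothetical "txt_notes.jpg" and "layout.txt.bak".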

Sometimes you may find empty files that prevent the script from running successfully, so you may want to inspect the files beforehand.

	
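# File metadata (size, timestamps, ...) for each listed file
info <- file.info(files)
# Names of the files whose size is zero bytes
empty <- rownames(info[info$size == 0, ])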
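
Once the empty files are known, they can be dropped from the list before reading, since read.table() stops with an error on a file that has no lines. A minimal sketch, assuming a temporary folder with made-up file names:

```r
# Temporary demo folder with one non-empty and one empty file (made-up names)
demo_dir <- file.path(tempdir(), "empty_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("1;ANA;SP", file.path(demo_dir, "cand_SP.txt"))
file.create(file.path(demo_dir, "cand_XX.txt"))

demo_files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# file.size() returns the size in bytes; zero means there is nothing to read
is_empty <- file.size(demo_files) == 0
non_empty <- demo_files[!is_empty]
basename(non_empty)
```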

Moreover, if you have the same files in more than one format, you may want to filter them, for example by keeping only the CSV files that have no TXT counterpart:

	
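CSVs <- list.files(pattern = "\\.csv$")
TXTs <- list.files(pattern = "\\.txt$")
# Compare names without their extensions; keep CSVs with no TXT counterpart
mylist <- CSVs[!tools::file_path_sans_ext(CSVs) %in% tools::file_path_sans_ext(TXTs)]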

Stacking files into a dataframe

The last step is to apply “rbind” over the list of files in the working directory, putting them all together. Notice that in the script below I’ve included some extra arguments to avoid problems reading the files I have. Also, this assumes all the files have the same number of columns; otherwise “rbind” won’t work. In that case you may need to replace “rbind” with “smartbind” from the gtools package.

	
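cand_br <- do.call("rbind", lapply(files, function(f) {
  # Semicolon-separated, no header row, Windows-1252 encoded;
  # fill = TRUE pads short rows with blank fields instead of failing
  read.table(f, header = FALSE, sep = ";", stringsAsFactors = FALSE,
             fileEncoding = "cp1252", fill = TRUE, blank.lines.skip = TRUE)
}))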
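
The remaining piece of the routine is storing the stacked data in an existing RData file. What follows is a minimal sketch of one way to do it, not necessarily how my own file is organised: it assumes the file holds a data frame named cand_br with the same columns, and the file name cand.RData and the toy rows are hypothetical.

```r
# Hypothetical RData file with earlier data (created here so the sketch runs)
rdata_path <- file.path(tempdir(), "cand.RData")
cand_br <- data.frame(V1 = 1:2, V2 = c("ANA", "BIA"), stringsAsFactors = FALSE)
save(cand_br, file = rdata_path)
rm(cand_br)

# Toy stand-in for the freshly stacked files
new_rows <- data.frame(V1 = 3L, V2 = "CAIO", stringsAsFactors = FALSE)

# Load into a separate environment so nothing in the workspace is overwritten
stored <- new.env()
load(rdata_path, envir = stored)

# Append the new rows and write the file back under the same object name
cand_br <- rbind(stored$cand_br, new_rows)
save(cand_br, file = rdata_path)
nrow(cand_br)
```

Loading into its own environment is a design choice: load() otherwise restores objects straight into the workspace and silently replaces anything with the same name.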