I was trying to change a few values in my factor variable by simply assigning character strings to it, but it didn't seem to work.
data(iris)
iris$Species[c(50:120)] <- rep("Random", 71)
## Warning: invalid factor level, NAs generated
iris$Species
## [1] setosa setosa setosa setosa setosa setosa setosa
## [8] setosa setosa setosa setosa setosa setosa setosa
## [15] setosa setosa setosa setosa setosa setosa setosa
## [22] setosa setosa setosa setosa setosa setosa setosa
## [29] setosa setosa setosa setosa setosa setosa setosa
## [36] setosa setosa setosa setosa setosa setosa setosa
## [43] setosa setosa setosa setosa setosa setosa setosa
## [50] <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## [57] <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## [64] <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## [71] <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## [78] <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## [85] <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## [92] <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## [99] <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## [106] <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## [113] <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## [120] <NA> virginica virginica virginica virginica virginica virginica
## [127] virginica virginica virginica virginica virginica virginica virginica
## [134] virginica virginica virginica virginica virginica virginica virginica
## [141] virginica virginica virginica virginica virginica virginica virginica
## [148] virginica virginica virginica
## Levels: setosa versicolor virginica
Well, I did find a workaround for that:
iris$Species <- as.character(iris$Species)
iris$Species[c(50:120)] <- rep("Random", 71)
iris$Species <- as.factor(iris$Species)
iris$Species
## [1] setosa setosa setosa setosa setosa setosa setosa
## [8] setosa setosa setosa setosa setosa setosa setosa
## [15] setosa setosa setosa setosa setosa setosa setosa
## [22] setosa setosa setosa setosa setosa setosa setosa
## [29] setosa setosa setosa setosa setosa setosa setosa
## [36] setosa setosa setosa setosa setosa setosa setosa
## [43] setosa setosa setosa setosa setosa setosa setosa
## [50] Random Random Random Random Random Random Random
## [57] Random Random Random Random Random Random Random
## [64] Random Random Random Random Random Random Random
## [71] Random Random Random Random Random Random Random
## [78] Random Random Random Random Random Random Random
## [85] Random Random Random Random Random Random Random
## [92] Random Random Random Random Random Random Random
## [99] Random Random Random Random Random Random Random
## [106] Random Random Random Random Random Random Random
## [113] Random Random Random Random Random Random Random
## [120] Random virginica virginica virginica virginica virginica virginica
## [127] virginica virginica virginica virginica virginica virginica virginica
## [134] virginica virginica virginica virginica virginica virginica virginica
## [141] virginica virginica virginica virginica virginica virginica virginica
## [148] virginica virginica virginica
## Levels: Random setosa virginica
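The round trip through character is probably not the only option. A minimal sketch of an alternative, assuming base R's `levels<-` and `droplevels()` and using a copy named `species` (a name picked here purely for illustration), is to declare the new level first and assign afterwards:

```r
data(iris)

# Work on a copy so the original data frame stays untouched.
species <- iris$Species

# Declare the new level explicitly; existing values are not changed.
levels(species) <- c(levels(species), "Random")

# "Random" is now a known level, so the assignment is accepted.
species[50:120] <- "Random"

# "versicolor" no longer occurs in the data, so drop it from the level set.
species <- droplevels(species)

levels(species)
```

One difference from the workaround above: the levels keep their existing order with the new one appended, rather than being re-sorted the way `as.factor()` does on a character vector.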
This problem annoyed me at first: “Why would R not allow me to change/add factor levels!?!@#!@#?” But then Utkarsh and I had a conversation about it, which made me think otherwise.
Excerpts from the conversation:
Utkarsh: It is usually not good to create data on the fly. Besides, when you create a factor variable, you should give the finite set of values it can take. This prevents future mistakes. It is called type checking. Python does not do it. R does it to some extent. C does it to some extent. Haskell does it very very strictly and it prevents about 50% of bugs from appearing. Let's say you misspell one of the levels.
In retrospect, it actually makes sense that R does not let us add or edit factor levels just by assigning a new value. The reason is simple: we “might” make a mistake, and a misspelled factor level could cause serious trouble. Lesson learnt!
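To make the misspelling point concrete, here is a small sketch with made-up data (the `sizes` vector and its levels are hypothetical, not from the iris example). It assumes the levels are declared up front with `factor(..., levels = ...)` and captures the warning so its message can be inspected:

```r
# Declare the full set of allowed values up front.
sizes <- factor(c("small", "large", "small"),
                levels = c("small", "medium", "large"))

# Assign a misspelled value and record the warning R raises.
caught <- NULL
withCallingHandlers(
  sizes[2] <- "medum",                  # typo for "medium"
  warning = function(w) {
    caught <<- conditionMessage(w)      # keep the message for inspection
    invokeRestart("muffleWarning")      # carry on without printing it
  }
)

caught                                  # the warning text about the invalid level
typo_is_na <- is.na(sizes[2])           # the typo became NA, not a new category
levels(sizes)                           # the declared levels are unchanged

# The correctly spelled value is accepted without complaint.
sizes[2] <- "medium"
```

With a plain character vector, the typo would have gone in silently as one more distinct value; with a factor it shows up as a warning and an `NA`.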