String manipulation

See also the Factors section in the “Data structures” chapter.

  1. Concatenate
  2. Split
  3. Regular expressions (grepl, grep, sub)

Concatenate

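paste() and paste0() concatenate character vectors element by element; paste0() is equivalent to paste() with sep = "". Shorter arguments are recycled, so a single prefix can be combined with every element of a longer vector.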

paste("Chr", c(1:22, "X", "Y"), sep = "")
##  [1] "Chr1"  "Chr2"  "Chr3"  "Chr4"  "Chr5"  "Chr6"  "Chr7"  "Chr8" 
##  [9] "Chr9"  "Chr10" "Chr11" "Chr12" "Chr13" "Chr14" "Chr15" "Chr16"
## [17] "Chr17" "Chr18" "Chr19" "Chr20" "Chr21" "Chr22" "ChrX"  "ChrY"
paste0("Chr", c(1:22, "X", "Y"))
##  [1] "Chr1"  "Chr2"  "Chr3"  "Chr4"  "Chr5"  "Chr6"  "Chr7"  "Chr8" 
##  [9] "Chr9"  "Chr10" "Chr11" "Chr12" "Chr13" "Chr14" "Chr15" "Chr16"
## [17] "Chr17" "Chr18" "Chr19" "Chr20" "Chr21" "Chr22" "ChrX"  "ChrY"
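
As a small complement to the calls above, the sketch below shows a custom separator, the collapse argument (which joins all elements into one single string) and recycling of the shorter vector. The variable names are only illustrative.

```r
# Element-wise concatenation with a custom separator
chr_labels <- paste("Chr", c(1, 2, "X"), sep = "_")
chr_labels
## [1] "Chr_1" "Chr_2" "Chr_X"

# collapse joins all elements of the result into one string
joined <- paste(c("A", "T", "G"), collapse = "")
joined
## [1] "ATG"

# The shorter vector c("a", "b") is recycled along 1:4
recycled <- paste0(c("a", "b"), 1:4)
recycled
## [1] "a1" "b2" "a3" "b4"
```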

Split

Splitting character strings is done with the strsplit() function, which returns a list with one element per input string.
Using the empty string as the separator splits a string into its individual characters.

strsplit("ATTGCCTGGATT", "")
## [[1]]
##  [1] "A" "T" "T" "G" "C" "C" "T" "G" "G" "A" "T" "T"
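
Splitting on an actual separator works the same way. The example below is a minimal sketch with made-up input strings; it uses fixed = TRUE so that the separator is taken literally instead of being interpreted as a regular expression, which matters for characters such as ".".

```r
# The result is a list, so [[1]] extracts the character vector
fields <- strsplit("A,B,C", ",", fixed = TRUE)[[1]]
fields
## [1] "A" "B" "C"

# Several input strings give one list element each
pieces <- strsplit(c("a-b", "c-d-e"), "-", fixed = TRUE)
lengths(pieces)
## [1] 2 3

# fixed = TRUE treats "." as a plain dot, not as a regex wildcard
name_parts <- strsplit("sample.fastq.gz", ".", fixed = TRUE)[[1]]
name_parts
## [1] "sample" "fastq"  "gz"
```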

Regular expressions

Regular expressions can be applied to character strings.
grepl() tests whether each string contains a given pattern and returns a logical vector; setting perl = TRUE switches to Perl-compatible syntax (see also regexpr()).

multi_strings <- c("Giraffe", "Cow", "Frog", "Panda")
grepl(pattern = "^F", x = multi_strings, perl = TRUE)
## [1] FALSE FALSE  TRUE FALSE
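
To illustrate two related calls, the sketch below uses the ignore.case argument of grepl() and shows regexpr(), which reports the position of the first match in each string (-1 when there is none). The animals vector is an illustrative stand-in.

```r
animals <- c("Giraffe", "Cow", "Frog", "Panda")

# Case-insensitive test: does the string contain an "f" or "F"?
has_f <- grepl("f", animals, ignore.case = TRUE)
has_f
## [1]  TRUE FALSE  TRUE FALSE

# Position of the first "o" in each string, -1 if absent
first_o <- as.integer(regexpr("o", animals))
first_o
## [1] -1  2  3 -1
```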

grep() returns the indices of the matching strings, or an empty integer vector if there are none. sub() finds the first match of a pattern in each string and replaces it. If a string contains no match, it is returned unchanged.
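
A minimal sketch of both functions, using an illustrative animals vector; gsub() is included for comparison because it replaces every match rather than only the first one.

```r
animals <- c("Giraffe", "Cow", "Frog", "Panda")

# Indices of the strings containing an "o"
idx <- grep("o", animals)
idx
## [1] 2 3

# value = TRUE returns the matching strings themselves
hits <- grep("o", animals, value = TRUE)
hits
## [1] "Cow"  "Frog"

# sub() replaces only the first "a"; strings without a match are unchanged
first_replaced <- sub("a", "A", animals)
first_replaced
## [1] "GirAffe" "Cow"     "Frog"    "PAnda"

# gsub() replaces every "a"
all_replaced <- gsub("a", "A", animals)
all_replaced
## [1] "GirAffe" "Cow"     "Frog"    "PAndA"
```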


This work by Celine Hernandez is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.