Encrypting (pass)words in R

Objective

After reading the book “History of Codes” by Simon Singh, I was inspired to apply some of the things that I have learned in R. While the book describes the history of codes in terms of encrpyting and decrypting messages (e.g., for secret communication), I will focus on managing passwords. Please note that the following is written for fun and should not be interpreted as an attempt to develop a flawless password manager. It is rather my first try of writing an encryption function. I hope it is useful for generating passwords for conferences, journals, or shopping webpages.

Background

Encryption can be seen as substituting letters. As an example, each of the 26 letters of the alphabet could be shifted by one letter. This would result in “A -> B”, “B -> C”, “C -> D” and so forth. Using this substitution algorithm, the encryption of my name “Markus” results in “Nbslvt”. In order to decipher the word “Nbslvt” back to “Markus” one would need the key, i.e., a table indicating how the letters have been substituted. Obviously, the “shifted-by-one-letter” cipher can be figured out quite easily by most amateur code breakers and is not very safe. Throughout history scientists came up with much more sophisticated cipher codes and cipher machines in order to keep communication private. Especially, during the first and second World War secret communication became crucial for the military. With the rapid expansion of the internet, encrypting messages was no longer only a matter for governments and the military but also relevant for ordinary individuals. In my opinion, one of the most fascinating encryption technique is the Rivest–Shamir–Adleman (RSA) cryptosystem. This breakthrough idea uses one-way functions (functions that cannot easily be inverted) and the practical difficulty of factoring two prime numbers. The algorithm is without a doubt as genius and geeky as it gets. This video gives a very vivid explanation: Eddie Woo YouTube.

My implementation in R

My idea is much more simple and relies on combining the substitution of letters with randomly selected numbers. In other words, I do not use an algorithm such as shifting the alphabet by \(x\) characters, as I have described above, but let the sample() function in R determine the substitution pattern. More specifically, each letter in the encrypted word will be based on its own sample() call. The word “Hallo”, for example, contains the letter “l” twice (third and forth element of the string). The sample() function might lead to a substitution of “l -> k” for the first “l” and the second “l” might be substituted with b (l -> b). This makes is much harder to break the encryption code because of the extremely high number of substitution possibilities. The key for the encryption function is the given by the set.seed() argument. The set.seed() function suspends the randomness in the sample() call and therefore allows transforming the letters back to their original order.

The choice of the alphabet

My alphabet includes all 26 letters (in lower- and upper-case spelling), the numbers \(0\) to \(9\), and a selection of special characters (!§$%&/()=?@*#<>_-.;+{}[]). The code can be easily modified in order to incorporate more special characters, i.e., Germans might want a “ö”, while Danes require “ø”, and the French prefer “ô”.

CipherWord <- function(my.password, my.key) {

    set.seed(my.key)
    the.length <- length(unlist(strsplit(my.password, "")))
    the.vec <- unlist(strsplit(my.password, ""))
    specials <- c(unlist(strsplit("!§$%&/()=?@*#<>_-.;+{}[]", "")))
    all.characters <- c(letters, LETTERS, 0:9, specials)

    if (!all(the.vec %in% all.characters)) {
    stop("unkown character(s) in password")
    }

    cipher.list <- NULL
    for (i in 1:the.length) {
        cipher.frame <- data.frame(alphabet = all.characters)
        cipher.frame$alphabet <- levels(droplevels(cipher.frame$alphabet))
        cipher.frame$cipher.num <- sample(1:length(cipher.frame$alphabet), length(cipher.frame$alphabet))
        cipher.frame$cipherbet <- cipher.frame[order(cipher.frame$cipher.num), ]$alphabet
        cipher.list[[i]] <- cipher.frame
    }

    encrypted.password <- c()
    for (j in 1:the.length) {
        encrypted.letter <- unlist(strsplit(my.password, ""))[j]
        encrypted.password[j] <- cipher.list[[j]]$cipherbet[match(encrypted.letter, cipher.list[[j]]$alphabet)]
    }

    count <- any(encrypted.password %in% specials) +
        any(encrypted.password %in% letters) +
        any(encrypted.password %in% LETTERS)

    if(count!=3) {
        warning("Encryption does not include lower, upper, and special characters")
        }

    out <- rbind(cbind("Original","Encrypted"),cbind(my.password,
             paste(encrypted.password,collapse="")))

    return(out)
}

# basically copy and paste
DecipherWord <- function(my.encrypted.word, my.key) {

    set.seed(my.key)
    the.length <- length(unlist(strsplit(my.encrypted.word, "")))
    the.vec <- unlist(strsplit(my.encrypted.word, ""))
    specials <- c(unlist(strsplit("!§$%&/()=?@*#<>_-.;+{}[]", "")))
    all.characters <- c(letters, LETTERS, 0:9, specials)

    if (!all(the.vec %in% all.characters)) {
    stop("unkown character(s) in the encrypted word")
    }

    cipher.list <- NULL
    for (i in 1:the.length) {
        cipher.frame <- data.frame(alphabet = all.characters)
        cipher.frame$alphabet <- levels(droplevels(cipher.frame$alphabet))
        cipher.frame$cipher.num <- sample(1:length(cipher.frame$alphabet), length(cipher.frame$alphabet))
        cipher.frame$cipherbet <- cipher.frame[order(cipher.frame$cipher.num), ]$alphabet
        cipher.list[[i]] <- cipher.frame
    }

    decrypted.password <- c()
    for (j in 1:the.length) {
        decrypted.letter <- unlist(strsplit(my.encrypted.word, ""))[j]
        decrypted.password[j] <- cipher.list[[j]]$alphabet[match(decrypted.letter, cipher.list[[j]]$cipherbet)]
    }

    out <- rbind(cbind("Encrypted","Decrypted"),cbind(my.encrypted.word,
             paste(decrypted.password,collapse="")))

    return(out)
}

How does it work?

Imagine you are accepted at a conference and the upload of the paper requires setting up an account with your Email address and a password. You would love to choose an easy word such as “EasyPassword” but you are aware that this is neither secure nor accepted by most webpages. With the functions that I have provided above, it is possible to transform “EasyPassword” to “Q/VX@ezlQptv”. The CipherWord() function requires the word that you would like to encrypt and a key number (for set.seed). The key number becomes the user’s personal encryption key. One option to use CipherWord() is remembering one personal key and choosing easy passwords such as “PasswordPAA2021” for PAA in 2021 or “PasswordEPC2022” for EPC in 2022. Alternatively, the user can keep the password constant but change the encryption key. The encryption key might be based on a fancy function, refer to the date of a given conference, or the session number of the presentation. In the following example, I encrypt “EasyPassword” with the key “123” to the much safer password “Q/VX@ezlQptv”. The DecipherWord() function reverses the encryption as long as the user knows the right key (“123” in the example). Accordingly, the function can also be used to inform close colleagues (with access to the encryption key) about the secret lunch date location ^_^

CipherWord("EasyPassword", 123)
##      my.password                  
## [1,] "Original"     "Encrypted"   
## [2,] "EasyPassword" "Q/VX@ezlQptv"
DecipherWord("Q/VX@ezlQptv", 123)
##      my.encrypted.word               
## [1,] "Encrypted"       "Decrypted"   
## [2,] "Q/VX@ezlQptv"    "EasyPassword"

The challenge of password requirements

I encounter mostly webpages with the requirement of selecting a password that includes lower- and upper case letters as well as at least one special character. At the moment, the function CipherWord() gives you a warning in case this requirements are not met. It is also possible to modify the function in a way that it automatically forces this combination of letters whenever necessary. However, the empirical analysis shown below indicates that this issue is relatively rare. Especially, if the input word is not too short (choosing a relative long password is actually always a good idea). For this reason, I do not take any specific password requirements into account for the moment.

specials <- c(unlist(strsplit("!§$%&/()=?@*#<>_-.;+{}[]", "")))
yes <- 0
no <- 0

for (i in 1:100) {
    the.word <- CipherWord("EasyPassword", i)[2,2]
    the.word <- unlist(strsplit(the.word, ""))

    count <- any(the.word %in% specials) +
        any(the.word %in% letters) +
        any(the.word %in% LETTERS)

    if (count==3) {
        yes <- yes+1
    }else{
        no <- no+1
    }
}
## Warning in CipherWord("EasyPassword", i): Encryption does not include lower,
## upper, and special characters

## Warning in CipherWord("EasyPassword", i): Encryption does not include lower,
## upper, and special characters

## Warning in CipherWord("EasyPassword", i): Encryption does not include lower,
## upper, and special characters

## Warning in CipherWord("EasyPassword", i): Encryption does not include lower,
## upper, and special characters
bars <- barplot(cbind(yes,no), col="lightgray",
                main="Include lower-, upper-, and specical characters?",
                cex.main=1.5
                )
text(bars, 40, paste(cbind(yes,no), "times", " "), cex=1.5)

Now, the same procedure but with a longer input word (LongerPasswordsAreBetter).

yes <- 0
no <- 0

for (i in 1:100) {
    the.word <- CipherWord("LongerPasswordsAreBetter", i)[2,2]
    the.word <- unlist(strsplit(the.word, ""))

    count <- any(the.word %in% specials) +
        any(the.word %in% letters) +
        any(the.word %in% LETTERS)

    if (count==3) {
        yes <- yes+1
    }else{
        no <- no+1
    }
}

bars <- barplot(cbind(yes,no), col="lightgray",
                main="Include lower-, upper-, and specical characters?",
                cex.main=1.5
                )
text(bars, 40, paste(cbind(yes,no), "times", " "), cex=1.5)

References

  • Singh, Simon (2002). The Code Book: The Secret History of Codes and Code-Breaking. Harper Collins Publishers UK: (Reissue) Edition.
Markus Sauerberg
Markus Sauerberg
Research Scientist

Related