Securing Credentials

As with every programming language, it is important to avoid publishing code with your credentials in plain text. There are several options to protect your credentials in R. In order of preference, here are the methods that we will cover:

  • Integrated security with DSN

  • Integrated security without DSN

  • Encrypt credentials with the keyring package

  • Use a configuration file with the config package

  • Environment variables using the .Renviron file

  • Using the options base R command

  • Prompt for credentials using the RStudio IDE.

Integrated security with DSN

The very best solution consists of two pieces that are usually out of the hands of the analyst:

  1. There is integrated security between the terminal and the database, usually via Kerberos.

  2. An ODBC connection has been pre-configured in the Desktop or Server by someone with sufficient access rights. The ODBC connection will have a unique Data Source Name, or DSN.

con <- DBI::dbConnect(odbc::odbc(), "My DSN Name")

This is considered the best option, because no information about the database, other than an internal alias, is available in plan text code.

Integrated security without DSN

In cases where the analyst is not able to setup a DSN, or a quick prototype is needed, it is possible to pass the necessary connection settings as arguments to the DBI::dbConnect command. With integrated security, there are three arguments that are typically required: the driver name, the network path to the server (or cluster), and the database name. Other arguments may be required depending on the type of database; for instance, in the example below, the port number is required for a connection with an Apache Impala database.

con <- DBI::dbConnect(odbc::odbc(),
                   Driver = "impala", 
                   Host = "hadoop-cluster",
                   Schema = "default",
                   Port   = 21050)

Encrypt credentials with keyring

The keyring package uses the operating system’s credential storage. It currently supports:

  • Keychain on MacOS
  • Credential Store on Windows
  • the Secret Service API on Linux

You can now install from CRAN

install.packages("keyring")

or the dev version from github

devtools::install_github("r-lib/keyring")

Keyrings and keys

Inside the credential store, there are keyrings that contain multiple keys. A keyring is secured with a single password, which then grants access to to all of its keys. In our case, we will store a single database connection credential per key. We will use the default keyring, which is automatically unlocked when a user signs in.

Our first key

A key has four main attributes:

  • service - The key’s unique identifier; we will use the database server’s path for our keys.
  • keyring - The key’s ‘parent’ keyring. If not specified, the default keyring is used.
  • username
  • password

The way keyring retrieves data is by passing the keyring name and a service name. If no keyring is passed, the default keyring is used. The keyring::key_set() function is used to create the key; a prompt will appear asking for the password that should be used for the key:

keyring::key_set(service = "my-database", 
                 username = "myusername")

Retrieve credentials

The key_list() command is used to retrieve the username:

keyring::key_list("my-database")

To extract the only the username, so as to pass it inside the connection string, use:

keyring::key_list("my-database")[1,2]

To extract the password, use key_get():

keyring::key_get("my-database")

These functions can be used to supply the database credentials without storing them in plain text or an environment variable:

con <- dbConnect(odbc::odbc(), 
  Driver   = "SQLServer",
  Server   = "my-database",
  Port     = 1433,
  Database = "default",
  UID      = keyring::key_list("my-database")[1,2],
  PWD      = keyring::key_get("my-database"))

The default keyring is unlocked anytime the user is signed in. If a new keyring is created and used, the Operating System will prompt the user for the keyring password when the code executes.

Stored in a file with config

The config package is meant to make it easier to deploy content, but we also use it to keep the credentials outside of the R script by saving them in the config.yml file. In fact, all of the connection arguments can be saved in the yml file.

  install.packages("config")

Here is an example yml file:

default:
  datawarehouse:
    driver: 'Postgres' 
    server: 'mydb-test.company.com'
    uid: 'local-account'
    pwd: 'my-password'  
    port: 5432
    database: 'regional-sales'
 

This is how the connection arguments would be called inside the R script:


dw <- config::get("datawarehouse")

con <- DBI::dbConnect(odbc::odbc(),
   Driver = dw$driver,
   Server = dw$server,
   UID    = dw$uid,
   PWD    = dw$pwd,
   Port   = dw$port,
   Database = dw$database
)

Use Environment variables

The .Renviron file can be used to store the credentials, which can then be retrieved with Sys.getenv(). Here are the steps:

  1. Create a new file defining the credentials:
    uid = "username"
    pwd = "password"
  1. Save it in your home directory with the file name .Renviron. If you are asked whether you want to save a file whose name begins with a dot, say YES.

  2. Note that by default, dot files are usually hidden. However, within RStudio, the file browser will make .Renviron visible and therefore easy to edit in the future.

  3. Restart R. .Renviron is processed only at the start of an R session.

  4. Retrieve the credentials using Sys.getenv() while opening the connection:

  con <- DBI::dbConnect(odbc::odbc(),
    Driver = "impala", 
    Host   = "database.rstudio.com",
    UID    = Sys.getenv("uid"),
    PWD    = Sys.getenv("pwd")
  )

Using options()

You can record the user name and password as a global option in R. Use the options() command to set a custom option, and then use getOption() to retrieve it.

The following example code sets credentials. When trying this out, be sure to remove these lines from your published work:

  options(database_userid = "myuserid")
  options(database_password = "mypassword")

This is how the credentials can be called within the published work:

  con <- DBI::dbConnect(odbc::odbc(),
    Driver = "impala", 
    Host   = "database.rstudio.com",
    UID    = getOption("database_userid"),
    PWD    = getOption("database_password")
  )

Prompt for Credentials

The RStudio IDE’s API can be used to prompt the user to enter the credentials in a popup box that masks what is typed:



con <- DBI::dbConnect(odbc::odbc(),
  Driver = "impala", 
  Host   = "database.rstudio.com",
  UID    = rstudioapi::askForPassword("Database user"),
  PWD    = rstudioapi::askForPassword("Database password")
)