As with every programming language, it is important to avoid publishing code with your credentials in plain text. There are several options to protect your credentials in R. In order of preference, here are the methods that we will cover:
Integrated security with DSN
Integrated security without DSN
Encrypt credentials with the
Use a configuration file with the
Environment variables using the
optionsbase R command
Prompt for credentials using the RStudio IDE.
Integrated security with DSN
The very best solution consists of two pieces that are usually out of the hands of the analyst:
There is integrated security between the terminal and the database, usually via Kerberos.
An ODBC connection has been pre-configured in the Desktop or Server by someone with sufficient access rights. The ODBC connection will have a unique Data Source Name, or DSN.
con <- DBI::dbConnect(odbc::odbc(), "My DSN Name")
This is considered the best option, because no information about the database, other than an internal alias, is available in plan text code.
Integrated security without DSN
In cases where the analyst is not able to setup a DSN, or a quick prototype is needed, it is possible to pass the necessary connection settings as arguments to the
DBI::dbConnect command. With integrated security, there are three arguments that are typically required: the driver name, the network path to the server (or cluster), and the database name. Other arguments may be required depending on the type of database; for instance, in the example below, the port number is required for a connection with an Apache Impala database.
con <- DBI::dbConnect(odbc::odbc(), Driver = "impala", Host = "hadoop-cluster", Schema = "default", Port = 21050)
Encrypt credentials with
The keyring package uses the operating system’s credential storage. It currently supports:
- Keychain on MacOS
- Credential Store on Windows
- the Secret Service API on Linux
Keyrings and keys
Inside the credential store, there are
keyrings that contain multiple
keyring is secured with a single password, which then grants access to to all of its
keys. In our case, we will store a single database connection credential per
key. We will use the default keyring, which is automatically unlocked when a user signs in.
Our first key
key has four main attributes:
service- The key’s unique identifier; we will use the database server’s path for our keys.
keyring- The key’s ‘parent’ keyring. If not specified, the default keyring is used.
keyring retrieves data is by passing the
keyring name and a
service name. If no
keyring is passed, the default keyring is used. The
keyring::key_set() function is used to create the key; a prompt will appear asking for the password that should be used for the key:
keyring::key_set(service = "my-database", username = "myusername")
## WARNING: your platform is not supported. Input is not masked!
key_list() command is used to retrieve the
## service username ## 1 my-database myusername
To extract the only the
username, so as to pass it inside the connection string, use:
##  "myusername"
To extract the
##  ""
These functions can be used to supply the database credentials without storing them in plain text or an environment variable:
con <- dbConnect(odbc::odbc(), Driver = "SQLServer", Server = "my-database", Port = 1433, Database = "default", UID = keyring::key_list("my-database")[1,2], PWD = keyring::key_get("my-database"))
The default keyring is unlocked anytime the user is signed in. If a new keyring is created and used, the Operating System will prompt the user for the
keyring password when the code executes.
Stored in a file with
config package is meant to make it easier to deploy content, but we also use it to keep the credentials outside of the R script by saving them in the
config.yml file. In fact, all of the connection arguments can be saved in the
Here is an example
default: datawarehouse: driver: 'Postgres' server: 'mydb-test.company.com' uid: 'local-account' pwd: 'my-password' port: 5432 database: 'regional-sales'
This is how the connection arguments would be called inside the R script:
dw <- config::get("datawarehouse") con <- DBI::dbConnect(odbc::odbc(), Driver = dw$driver, Server = dw$server, UID = dw$uid, PWD = dw$pwd, Port = dw$port, Database = dw$database )
Use Environment variables
.Renviron file can be used to store the credentials, which can then be retrieved with
Sys.getenv(). Here are the steps:
- Create a new file defining the credentials:
uid = "username" pwd = "password"
Save it in your home directory with the file name
.Renviron. If you are asked whether you want to save a file whose name begins with a dot, say YES.
Note that by default, dot files are usually hidden. However, within RStudio, the file browser will make .Renviron visible and therefore easy to edit in the future.
Restart R. .Renviron is processed only at the start of an R session.
Retrieve the credentials using
Sys.getenv()while opening the connection:
con <- DBI::dbConnect(odbc::odbc(), Driver = "impala", Host = "database.rstudio.com", UID = Sys.getenv("uid"), PWD = Sys.getenv("pwd") )
You can record the user name and password as a global option in R. Use the
options() command to set a custom option, and then use
getOption() to retrieve it.
The following example code sets credentials. When trying this out, be sure to remove these lines from your published work:
options(database_userid = "myuserid") options(database_password = "mypassword")
This is how the credentials can be called within the published work:
con <- DBI::dbConnect(odbc::odbc(), Driver = "impala", Host = "database.rstudio.com", UID = getOption("database_userid"), PWD = getOption("database_password") )
Prompt for Credentials
The RStudio IDE’s API can be used to prompt the user to enter the credentials in a popup box that masks what is typed:
con <- DBI::dbConnect(odbc::odbc(), Driver = "impala", Host = "database.rstudio.com", UID = rstudioapi::askForPassword("Database user"), PWD = rstudioapi::askForPassword("Database password") )