Databases using R

At RStudio, we are working to make it as easy as possible to work with databases in R. This work focuses on three key areas:

  • Developing best-in-class packages These packages include: dplyr, odbc, keyring, pool and others.

  • Integrating with RStudio Starting with version 1.1, RStudio helps you write the R code you need to connect to a database, remembers connections you’ve made, and allows you to explore the objects and data in your database.

  • Promoting best practices This website is the main way we provide support in this area. RStudio is also working through other delivery channels, like upcoming webinars and in-person training at rstudio::conf 2018

Explore databases inside RStudio v1.1

The new RStudio Connections Pane makes it possible to easily connect to a variety of data sources, and explore the objects and data inside the connection.

The Connections Pane is currently available only in the preview release of RStudio 1.1. Be sure to use the latest version of the odbc package.

Read more about this feature here: RStudio Connections Pane

dplyr as a database interface

The dplyr package simplifies data transformation. It provides a consistent set of functions, called verbs, that can be used in succession and interchangeably to gain understanding of the data iteratively.

dplyr is able to interact with databases directly by translating the dplyr verbs into SQL queries. This convenient feature allows you to ‘speak’ directly with the database from R. Other advantages of this approach are:

  1. Run data exploration routines over all of the data, instead of importing part of the data into R.

  2. Use the SQL Engine to run the data transformations. In effect, computation is being pushed to the database.

  3. Collect into R only a targeted dataset.

  4. All of your code is in R. Becausedplyr is used to communicate with the database, there is no need to alternate between languages or tools to perform the data exploration.

Connect to a database

At the center of this approach is the DBI package. This package acts as ‘middle-ware’ between packages to allow connectivity with the database from the user or other packages. It provides a consistent set of functions regardless of the database type being accessed. The dplyr package depends on the DBI package for communication with databases.

There are packages that enables a direct connection between the an open-source database and R. Currently, such packages exist for the following databases: MySQL, SQLite, PostgreSQL, and bigquery.

Most commercial databases, like Oracle and Microsoft SQL Server, offer ODBC drivers that allow you to connect your tool to the database. Even though there are R packages that allow you to use ODBC drivers, the connection will most likely not be compatible with DBI. The new odbc package solves that problem by providing a DBI backend to any ODBC driver connection.

If you are interested in creating your own package that connects DBI to a database, please review the article DBI Backend.

SQL Translations for dplyr

A complementary package called dbplyr contains the translations of the vendor-specific SQL for dplyr to use. As of today, we have translations for the following databases:

In the development version of dbplyr, we also have translations for:

Is the database you are interested in not listed here? You can still use DBI and odbc to connect and send SQL queries. If you would like to contribute a translation, please see the SQL Translation page in this website.