Best practices

Creating Visualizations Typically, a function that produces a plot in R performs the data crunching and the graphical rendering. For example, geom_histogram() calculates the bin sizes and the count per bin, and then it renders the plot. Plotting functions usually require that 100% of the data be passed to them. This is a problem when working with a database. The best approach is to move the data transformation to the database, and then use a graphing function to render the results.
Enterprise-ready dashboards Original published in RViews on 9/20/2017 Inside the enterprise, a dashboard is expected to have up-to-the-minute information, to have a fast response time despite the large amount of data that supports it, and to be available on any device. An end user may expect that clicking on a bar or column inside a plot will result in either a more detailed report, or a list of the actual records that make up that number.
Making scripts portable When we share R scripts, RNotebooks or other kind of content with others, and want the code to also execute in their environment, then we need to take steps to make the code portable. There are two primary considerations when sharing content that contain database connections: Ensuring the connection details for the development and deployment environments are in sync Managing database credentials There are many different ways to meet these two requirements.
Run Queries Safely We will review four options to run SQL commands safely using the DBI package: Parameterised queries Using glue_sql Interpolation by “hand” Manual escaping SQL Injection Attack The dbGetQuery() command allows us to write queries and retrieve the results. The query has to be written using the SQL syntax that matches to the database type. For example, here is a database that contains the airports data from NYC Flights data:
Schema selection Overview It is common for enterprise databases to use multiple schemata to partition the data, it is either separated by business domain or some other context. This is especially true for Data warehouses. It is rare when the default schema is going to have all of the data needed for an analysis. For analyses using dplyr, the in_schema() function should cover most of the cases when the non-default schema needs to be accessed.
Securing Credentials As with every programming language, it is important to avoid publishing code with your credentials in plain text. There are several options to protect your credentials in R. In order of preference, here are the methods that we will cover: Integrated security with DSN Integrated security without DSN Encrypt credentials with the keyring package Use a configuration file with the config package Environment variables using the .
Securing Deployed Content There are multiple ways to secure access to a database in deployed content. The types of content that we can deploy are: Shiny Apps flexdashboards RMarkdown documents The most common methods are: Service Account It is typical for shiny applications and R Markdown reports to provide insight from data that is not directly accessible by the content audience. In these 1-to-many cases, it is common to define service accounts that access the database on behalf of the content audience.
Selecting a database interface DBI support is key Connecting to a database with a method that supports the DBI package provides the following advantages: A consistent set of functions that work across all connections Makes using dplyr as a front-end possible Mostly based on how difficult is to setup, the order of preference of connection methods is: Native database driver implemented in database package (e.g. RPostgresSQL). To see the list of known database packages check out the Databases page.
Setting up ODBC Drivers For Unix and MacOS, ODBC drivers should be compiled against unixODBC. Drivers compiled against iODBC may also work, but are not fully supported. After installation of the driver manager and driver, you will have to register the driver in a odbcinst.ini file for it to appear in odbc::odbcListDrivers(). Microsoft Windows Database Drivers Windows is bundled with ODBC libraries; however, drivers for each database need to be installed separately. Windows ODBC drivers typically include an installer that must be run to install the drivers in the proper locations.