Building an Interactive Data Exploration App with R Shiny

Introduction

In this tutorial, we will walk through the creation of an interactive data exploration application using R Shiny. The app lets users filter data, view various charts, and download the charts for further analysis.

Prerequisites

  • Basic understanding of R programming
  • R and RStudio installed
  • Shiny, ggplot2, and DT packages installed

App Overview

Our R Shiny app includes:

  • A filterable table
  • Interactive charts including bar plots, scatter plots, and line plots
  • Data download functionality

Getting Started

First, ensure you have the required libraries:

library(shiny)
library(DT)
library(ggplot2)

Data Preparation

Load and preprocess your data. In our case, we are reading from a CSV file and creating bins for age and income:

dataset = read.csv("dataset.csv")
# Create bins for age and income
dataset$AGE_Bin = cut(dataset$AGE,5,include.lowest = TRUE)
dataset$INCOME_Bin = cut(dataset$INCOME,5,include.lowest = TRUE,dig.lab = 6)
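
To see what cut() produces, here is a small sketch on made-up vectors (the values are illustrative and not taken from dataset.csv). When the second argument is a single number, cut() divides the range of the data into that many equal-width intervals and extends the outer edges slightly so the extremes are included; include.lowest = TRUE matters mainly when you pass explicit break points.

```r
# Hypothetical ages, for illustration only
ages <- c(18, 25, 33, 41, 52, 67, 80)

# Split the observed range into 5 equal-width intervals
age_bin <- cut(ages, 5, include.lowest = TRUE)

# Every value lands in one of the 5 factor levels
print(nlevels(age_bin))
print(table(age_bin))

# dig.lab sets how many digits are used when formatting the break labels,
# so labels for large values such as incomes are less likely to fall back
# to scientific notation
incomes <- c(20000, 45000, 150000, 320000)
income_bin <- cut(incomes, 5, include.lowest = TRUE, dig.lab = 6)
print(levels(income_bin))
```

Printing the levels is a quick way to check that the bin labels are readable before they end up on a chart axis.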

The code has two parts, the UI and the server. I will lay out the complete code for each part here, and later in the article I will delve into Shiny's very intuitive UI design.

Building the UI

The user interface (UI) is designed with fluidPage for a responsive layout.

ui <-   fluidPage(
    
    h1("Rshiny Homework"),
    h2("Demographic Exploration"),
    h3("Filterable Table"),
    DT::dataTableOutput("table"),
    br(),
    h3("Charts"),
    selectInput(
        "option",
        "Demography",
        c("AGE_Bin","INCOME_Bin","GENDER"),
        selected = NULL,
        multiple = FALSE,
        selectize = TRUE,
        width = NULL,
        size = NULL
    ),
    
    actionButton("gobutton", "View Chart", class = "btn-success"),
    plotOutput("disPlot"),
    downloadButton(outputId = "disPlot_download", label = "Download Chart",class = "btn-success"),
    
    br(),
    hr(),
    br(),
    h3("Relationship Between Variables"),
    
    tabsetPanel(
        tabPanel("Scatter", 
                 plotOutput("Scatter", brush="selected_range"),
                 br(),
                 downloadButton(outputId = "scatter_download", label = "Download Chart",class = "btn-success"),
                 br(),
                 br(),
                 DT::dataTableOutput("brushed_table")
        ),
        tabPanel("Distribution", 
                 plotOutput("displot2"),
                 downloadButton(outputId = "displot2_download", label = "Download Chart",class = "btn-success"),
                 br(),
                 plotOutput("displot3"),
                 downloadButton(outputId = "displot3_download", label = "Download Chart",class = "btn-success")
                 
        )
    ),
    
    br(),
    hr(),
    br(),
    h3("Line Plot"),
    plotOutput("lineplot"),
    downloadButton(outputId = "lineplot_download", label = "Download Chart",class = "btn-success"),
    br(),
    plotOutput("lineplot2"),
    downloadButton(outputId = "lineplot2_download", label = "Download Chart",class = "btn-success")
)

Server Logic

The server function contains the logic for rendering plots and tables based on user input. All backend data handling and plot construction goes in here.

server <- function(input,output, session) {
    
    library(ggplot2)
    library(shiny)
    library(DT)
    # library(stringr)
    
    #setwd("C:/Users/kli4/Downloads/Shiny_HW")
    
    dataset = read.csv("dataset.csv")
    dataset$AGE_Bin = cut(dataset$AGE,5,include.lowest = TRUE)
    dataset$INCOME_Bin = cut(dataset$INCOME,5,include.lowest = TRUE,dig.lab = 6)
    # dataset$INCOME_Bin <- lapply(strsplit(gsub("]|[[(]", "", levels(dataset$INCOME_Bin)), ","),
    #           prettyNum, big.mark=".", decimal.mark=",", input.d.mark=".", preserve.width="individual")
    
    
    plot_var <- eventReactive(input$gobutton,{
        
        selection <- input$option
        
        data_agg <-aggregate(x=dataset$Customer, by=list(SELECTION=dataset[,c(selection)],TREATMENT = dataset[,"TREATMENT"]),length)
        names(data_agg) = c("SELECTION","TREATMENT", "Customer")
        
        return(data_agg)
        
    })
    
    
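    # Build the bar chart in a reactive so the plot output and its download share it
    dis_plot <- reactive({
        ggplot(plot_var(), aes(x=SELECTION,y=Customer,fill=TREATMENT)) + geom_bar(position="stack",stat="identity")
    })
    
    output$disPlot <- renderPlot({
        dis_plot()
    })
    
    # Defined at the top level of the server, not inside renderPlot
    output$disPlot_download <- downloadHandler(
        filename = function() { paste(input$option, '.jpg', sep='') },
        content = function(file){
            ggsave(file,plot=dis_plot())
        })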
    

    output$table <- DT::renderDataTable(datatable(dataset))
 
    scatter_plot <- ggplot(dataset, aes(x=AGE,y=INCOME)) + geom_point()
    
    scatter_plot = scatter_plot + facet_grid(GENDER ~ TREATMENT)
    
    output$Scatter <- renderPlot({
        scatter_plot
    })
    
    scatter_brushed <- reactive({
        
        my_brush <- input$selected_range
        sel_range <- brushedPoints(dataset, my_brush)
        return(sel_range)
        
    })
    output$brushed_table <- DT::renderDataTable(DT::datatable(scatter_brushed()))
    
    
    
    displot2 <- ggplot(dataset, aes(online.Activity.A)) + geom_histogram(aes(fill=AGE_Bin), bins = 5)
    
    displot2 = displot2 + facet_grid(GENDER ~ TREATMENT)
    
    displot3 <- ggplot(dataset, aes(online.ACTIVITY.B)) + geom_histogram(aes(fill=AGE_Bin), bins = 5)
    
    displot3 = displot3 + facet_grid(GENDER ~ TREATMENT)
    
    output$displot2 <- renderPlot({
        displot2
    })
    
    output$displot3 <- renderPlot({
        displot3
    })
    # 
    # scatter_brushed2 <- reactive({
    #   
    #   my_brush <- input$selected_range2
    #   sel_range <- brushedPoints(dataset, my_brush)
    #   return(sel_range)
    #   
    # })
    # output$brushed_table2 <- DT::renderDataTable(DT::datatable(scatter_brushed2()))
    
    data_agg2 <-aggregate(list(Activity_A=dataset$online.Activity.A), by=list(DAY=dataset$DAY,TREATMENT=dataset$TREATMENT,GENDER=dataset$GENDER),mean)
    
    lineplot <- ggplot(data_agg2, aes(x=DAY, y=Activity_A, group=c(TREATMENT))) + geom_line(aes(color=TREATMENT)) + geom_point()
    lineplot = lineplot + facet_grid(GENDER ~ TREATMENT)
    
    output$lineplot <- renderPlot({
        lineplot
    })
    
    data_agg2 <-aggregate(list(Activity_B=dataset$online.ACTIVITY.B), by=list(DAY=dataset$DAY,TREATMENT=dataset$TREATMENT, GENDER=dataset$GENDER),mean)
    
    lineplot2 <- ggplot(data_agg2, aes(x=DAY, y=Activity_B, group=c(TREATMENT))) + geom_line(aes(color=TREATMENT)) + geom_point()
    lineplot2 = lineplot2 + facet_grid(GENDER ~ TREATMENT)
    
    output$lineplot2 <- renderPlot({
        lineplot2
    })
    
    #Downloads
    
    output$lineplot2_download <- downloadHandler(
        filename = "Activity_B Line.jpg",
        content = function(file){
            ggsave(file,plot=lineplot2)
        })
    
    output$lineplot_download <- downloadHandler(
        filename = "Activity_A Line.jpg",
        content = function(file){
            ggsave(file,plot=lineplot)
        })
    
    output$displot2_download <- downloadHandler(
        filename = "ActivityA_Dist.jpg",
        content = function(file){
            ggsave(file,plot=displot2)
        })
    output$displot3_download <- downloadHandler(
        filename = "ActivityB_Dist.jpg",
        content = function(file){
            ggsave(file,plot=displot3)
        })
    
    output$scatter_download <- downloadHandler(
        filename = "Age_Income.jpg",
        content = function(file){
            ggsave(file,plot=scatter_plot)
        })
    

}
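
The counting step inside plot_var can be tried outside Shiny. The sketch below uses a small made-up data frame with the column names the server code assumes (Customer, GENDER, TREATMENT); the values are hypothetical, and the string selection stands in for input$option.

```r
# Hypothetical data with the column names used in the server code
toy <- data.frame(
    Customer  = 1:6,
    GENDER    = c("F", "F", "M", "M", "M", "F"),
    TREATMENT = c("A", "B", "A", "A", "B", "A")
)

# The grouping column arrives as a string, as it does from input$option
selection <- "GENDER"

# Count customers for every (selection, treatment) combination
data_agg <- aggregate(
    x   = toy$Customer,
    by  = list(SELECTION = toy[[selection]], TREATMENT = toy$TREATMENT),
    FUN = length
)
names(data_agg) <- c("SELECTION", "TREATMENT", "Customer")

print(data_agg)
```

The result has one row per combination that occurs in the data, which is the long format geom_bar(stat = "identity") expects for a stacked bar chart.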

UI Design in R Shiny

UI design in R Shiny is easy and intuitive. The core concept is that each HTML element is created by an R function. Let’s dive into how the UI is designed in our R Shiny app, using the provided code as an example.

Basic Structure

R Shiny UI is structured using functions defining the layout and its elements. The fluidPage() function is often used for its responsive layout capabilities, meaning the app’s interface adjusts nicely to different screen sizes.

ui <- fluidPage(
    # UI components are nested here
)

Organizing Content with Headers and Separators

Headers (h1, h2, h3, etc.) and separators (hr()) are used to organize content and improve readability. In our app, headers indicate different sections:

h1("Rshiny Homework"),
h2("Demographic Exploration"),
h3("Filterable Table"),

Data Display

The DT::dataTableOutput() function is used to render data tables in the UI. This function takes an output ID as an argument, linking it to the server logic that provides the data:

DT::dataTableOutput("table"),

Interactive Inputs

Interactive inputs, such as selectInput, allow users to interact with the app and control what data or plot is displayed. In our app, selectInput is used for choosing which demographic variable to display in a chart:

selectInput(
    "option",
    "Demography",
    c("AGE_Bin", "INCOME_Bin", "GENDER"),
    selected = NULL,
    multiple = FALSE,
    selectize = TRUE,
    width = NULL,
    size = NULL
),

Action Buttons

Action buttons, created with actionButton(), trigger reactive events in the server. Our app uses an action button to generate plots based on user selection:

actionButton("gobutton", "View Chart", class = "btn-success"),
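
To illustrate how an action button gates a reactive computation, here is a minimal sketch that is separate from the app above. It assumes only the shiny package; the output ID "chosen" and the text output are hypothetical and exist only for this example. eventReactive() re-runs its expression only when input$gobutton changes, so changing the dropdown alone does not update the text.

```r
library(shiny)

# Minimal UI: one dropdown, one button, one text output
ui <- fluidPage(
    selectInput("option", "Demography", c("AGE_Bin", "INCOME_Bin", "GENDER")),
    actionButton("gobutton", "View Chart"),
    textOutput("chosen")
)

server <- function(input, output, session) {
    # Re-evaluated only when the button is clicked,
    # not when the dropdown value changes
    chosen <- eventReactive(input$gobutton, {
        input$option
    })

    output$chosen <- renderText({
        paste("Showing:", chosen())
    })
}

# Build the app object; call runApp(app) to launch it
app <- shinyApp(ui = ui, server = server)
```

The full app follows the same pattern: plot_var is an eventReactive keyed on the button, so the aggregation and the bar chart refresh on click, not on every dropdown change.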

Displaying Plots

To display plots, plotOutput() is used. This function references an output ID from the server side where the plot is rendered:

plotOutput("disPlot"),

Interactive Plots

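I build the plots with ggplot2, and Shiny adds the interactivity. For example, the scatter plot of AGE against INCOME is displayed with brush="selected_range" in plotOutput(), so the rows a user drags over are listed in the table below the plot: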

scatter_plot <- ggplot(dataset, aes(x=AGE,y=INCOME)) + geom_point()

Tabbed Panels

Tabbed panels, created with tabsetPanel(), help in organizing content into separate views within the same space. Each tabPanel holds different content:

tabsetPanel(
    tabPanel("Scatter", ...),
    tabPanel("Distribution", ...)
),

Download Handlers

We provide functionality for users to download plots as JPEG files:

output$scatter_download <- downloadHandler(
    filename = "Age_Income.jpg",
    content = function(file){
        ggsave(file,plot=scatter_plot)
    })

downloadButton(outputId = "scatter_download", label = "Download Chart", class = "btn-success"),
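
The content function can be exercised outside Shiny to check what ggsave() writes. The sketch below is an illustration under assumptions: the data frame is made up, tempfile() stands in for the temporary path that downloadHandler passes as file, and the device, size, and dpi values are arbitrary choices, not settings from the app.

```r
library(ggplot2)

# Hypothetical data standing in for dataset.csv
toy <- data.frame(
    AGE    = c(22, 35, 47, 58),
    INCOME = c(30000, 52000, 61000, 48000)
)
scatter_plot <- ggplot(toy, aes(x = AGE, y = INCOME)) + geom_point()

# Stand-in for the temporary file path supplied by downloadHandler
file <- tempfile(fileext = ".jpg")

# Naming the device and the size explicitly keeps the output independent of
# the file extension and of the size of the current plotting window
ggsave(file, plot = scatter_plot, device = "jpeg",
       width = 8, height = 5, units = "in", dpi = 150)

print(file.exists(file))
```

Fixing width and height this way gives every downloaded chart the same dimensions regardless of the browser window size.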

Running the App

Finally, to run the app, use:

shinyApp(ui = ui, server = server)

Airflow Systemd Config File

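Pay attention to the environment path and the user: PATH must include the directory that holds the airflow executable, and User and Group must name the account that owns the Airflow installation.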

[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
#EnvironmentFile=/etc/sysconfig/airflow
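# systemd does not expand $PATH in Environment=, so the full search path is spelled out
Environment="PATH=/home/ken/miniconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"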
User=ken
Group=ken
Type=simple
ExecStart=/home/ken/miniconda3/bin/airflow webserver
Restart=on-failure
RestartSec=5s
PrivateTmp=true

[Install]
WantedBy=multi-user.target

How to Correctly Install CuDF for Python

Introduction of CuDF

CuDF is a powerful tool in the era of big data. It uses Cuda, Nvidia's GPU computing framework, to speed up data ETL and offers a Pandas-like interface. The tool is developed by the RAPIDS team (rapidsai on GitHub).

You can check out their Git repo here.

I love the tool. It lets me make full use of my expensive graphics card, which most of the time is only used for gaming. More importantly, for a company like Owler, which has to handle 14 million+ company profiles, even a basic data transformation task might take days. This tool can potentially speed up the process by about 80+ times. GPU computing has become the norm for ML/AI. CuDF makes it useful for the upstream part of the flow as well, the ETL. And ETL is in high demand at almost every company with digital capacity in the world.

The Challenges

It’s nice to have this tool for our day-to-day data work. However, the convenience comes at a cost: installing CuDF is quite confusing and hard to follow. It also has some limitations on OS and Python versions. Currently, it only works with Linux and Python 3.7+. And it only provides a conda-forge way to install; otherwise, you need to build from source.

The errors range from a failed "solving environment" step and dependency conflicts to the GPU not being found. I have installed CuDF on several machines, including a personal desktop and AWS servers. Each time, I had to spend hours dealing with multiple kinds of errors and retrying again and again. When it finally worked, I didn’t know which step was the critical one because there were so many variables. Worst of all, when you hit a dependency conflict, you have to wait a very long time, through 4 "solving environment" attempts, before the conflicting package is displayed.

But the good news is that, after the most recent installation, I finally understand the cause of the complication and can summarize an easy-to-follow guide for anyone who wants to enjoy this tool.

In short, the key is to install into Miniconda or into a new environment in Anaconda.

Let me walk through the steps.

Installing the Nvidia Cuda framework (Ubuntu)

Installing Cuda is simple when you have the right machine. You can follow the guide here from the official Nvidia webpage. If you encounter an installation error, please check that you selected the right architecture and that you meet the hardware/driver requirements.

However, if you have an older version of Cuda installed and wish to upgrade it, the Nvidia guide won’t help you. The correct way is to uninstall the older version of Cuda first, before doing anything from the guide. The reason is that, at least on Ubuntu, the installation step changes your apt-get package sources; once that happens, you can no longer uninstall the older version, and it may cause conflicts.

To uninstall Cuda, you can try the following steps (for Ubuntu).

  • Remove nvidia-cuda-toolkit:

sudo apt-get remove nvidia-cuda-toolkit

    You may want to use this command instead, to remove all of its dependencies as well:

sudo apt-get remove --auto-remove nvidia-cuda-toolkit

  • Remove Cuda:

sudo apt-get remove cuda

    If you forgot to remove the older version of Cuda before installing the new one, you need to remove all of the dependencies for Cuda and start the new installation over:

sudo apt-get remove --auto-remove cuda

  • Install Cuda by following the official Nvidia guide linked above.

Install CuDF

I highly recommend installing CuDF in Miniconda. This avoids most of the package dependency conflicts. If you have dependency conflicts, you will probably get the error below, or you will wait forever during the last "solving environment" step.

Dependencies error

Miniconda is a much leaner version of Anaconda. It has very few packages installed out of the box, so the CuDF installation is far less likely to run into conflicts.

If you already have Anaconda installed and wish to keep it, you can try creating a new conda environment with no default packages for CuDF.

conda create --no-default-packages -n myenv python=3.7

Building Sales Prediction Models

In this work sample, I demonstrate how to build a predictive model for sales data using Python with Sklearn, Statsmodels, and other libraries. The sample includes a linear regression model and a time series model.