" W E H A V E . . . A T H O M E ! "

A n R S h i n y A p p l i c a t i o n l e v e r a g i n g

P o s t g r e S Q L D a t a b a s e s h o s t e d o n A W S R D S

Robert Kozub

Information Management

University of Texas at Dallas

Spring 2023

Overview

R Shiny Application

Data Formatting

User Interface (UI)

Data Collection

Deploy Shiny App

Database Creation

Spring 2023

// Kozub

T A B L E O F C O N T E N T S

Conclusion

The initial goal of this project was to create a tool that would enable users to compare grocery

items across multiple grocery store chains in the area and determine the cheapest cart price.

However, due to countermeasures put in place by the grocery store websites, it was

challenging to extract the data needed to create the price comparison tool. As a result, the

project scope was changed to focus on a different aspect of grocery shopping.

“We Have…At Home!” is a R Shiny application that allows users to find the ingredients needed

to make their favorite dishes utilizing allrecipes.com and fast-food restaurants such as

Whataburger and McDonalds. This app utilizes several technologies to provide a streamlined

and efficient solution to managing grocery shopping and meal planning.

The app uses a PostgreSQL database, which is a powerful open-source relational database

management system (RDBMS). The PostgreSQL database was created in pgAdmin, a database

administration platform which allows the data to exist within a fixed schema. The data for the

database was extracted using ParseHub, a web scraping tool that simplifies data collection

allowing users to scrape data without writing code. Custom web scraper templates were

created in ParseHub to extract data from allrecipes.com, Whataburger.com, McDonalds.com,

and Heb.com to create a comprehensive database of ingredients and grocery store items.

To ensure that the application is accessible online, the PostgreSQL database is connected to an

AWS RDS instance which is a managed database service by Amazon Web Services. This allows

the app to be hosted and accessed without having a live connection active on the local

pgAdmin database.

The user interface (UI) of the app was created using R Shiny, a framework for building

interactive web applications using the R programming language. The app provides users with

an interface to select a recipe or fast-food item, while the app searches for items in the HEB

item table that match the ingredient and then calculates the price. In addition, the app

includes another tab that provides users with additional information about the chosen recipe

or fast-food item, such as their description, price, and macronutrients.

Overall, “We Have…At Home” is a functional tool that leverages several technologies to provide

a user-friendly interface to assist users in recreating their favorite items at home.

O V E R V I E W

Spring 2023

// Kozub

To collect the data needed for this project, a web scraping tool called ParseHub was used to

create workflows for scraping the necessary data to create a database in pgAdmin. Information

collected included recipe and item names, ingredients, prices, macronutrients, and

descriptions.

The websites that were scraped included allrecipes.com, heb.com, Whataburger.com, and

mcdonalds.com. Alternatively, the data from these websites could be scraped with Python

utilizing libraries such as BeautifulSoup and Selenium. Scraping using Python scripts can be

helpful with automating the process and building workflows to update data when it is updated

on the website. However, there are several complications to this including countermeasures to

discourage scraping, changes to the website’s HTML/CSS or JavaScript code and overloading

the site with too many requests.

Building workflows with scraping tools such as ParseHub does not require any coding and can

be done quickly. Figure 1 shows an example template that was created to obtain the necessary

data for the Whataburger table.

D A T A C O L L E C T I O N

Spring 2023

// Kozub

Figure 1: ParseHub Graphical User Interface (GUI)

This template contains 3 tabs with different actions that parse through the designated website

and extract the information you need. The first tab iterates through the homepage links. Once

the program clicks the link, the actions on the second tab take over and either clicks links or

gathers information. This process of iterating through each page is called pagination and is

what makes it possible for the tool to gather information from multiple pages. Finally, the

third tab extracts most of the necessary information such as the item name, ingredients, price,

and macronutrients.

Data can then be exported and downloaded in various formats including CSV, Excel, and JSON.

There is also a paid option that allows ParseHub to automatically create an Application

Programming Interface (API). For the purposes of this project, the data was exported as CSV.

Templates for each of the websites varies depending on the site’s structure and the

information available, but typically follows the same process of setting up each web scraping

template.

D A T A C O L L E C T I O N C O N T .

Spring 2023

// Kozub

After the data is collected, R scripts were created to clean and combine or separate the data so

that it is ready to use in the PostgreSQL database.

C L E A N A N D F O R M A T

D A T A

Spring 2023

// Kozub

The code snippet above in Figure 2 provides an example of how the R script combines three

different tables scraped from allrecipes.com into a single CSV file categorized by breakfast,

lunch, and dinner. This code uses the “dplyr” and “Stringr” packages. The dplyr package allows

for the script to perform data manipulation and the stringr package provides functions for

working with strings (RPubs, 2021).

A for loop (starting on line 11) is used to loop through each CSV file and perform several

actions such as creating new columns like “scrape_date”, “category”, and “unique_id”. The

unique id is a unique identifier which will make it possible for the database to distinguish

between other information in the database and relate to other tables that share the same

identifier (IBM, 2021).

Figure 2: AllRecipes Data Formatting with R

Each formatting script varies slightly due to different table formats, but overall follows the

same method of adding a unique identifier and other additional columns. Additional scripts

were also created with R to separate the ingredients into their own table and separate each

word line by line, matching its unique identifier. This makes it easier for the Shiny application

to find those key words or ingredients in the HEB table and find items that match.

C L E A N A N D F O R M A T

D A T A C O N T .

Spring 2023

// Kozub

For example, the code snippet in Figure 3 separates the ingredients from the original

McDonalds table into its own CSV and places each ingredient name in a new row matching its

unique identifier. After manually parsing the data, undesirable words are collected and

inputted into the script to remove rows that contain those unneeded characters or words.

Figure 3: Data Formatting with R

D A T A B A S E C R E A T I O N

Spring 2023

// Kozub

Figure 4 provides a visual representation of the database schema created in pgAdmin. The

schema consists of seven tables, three of which are related by their unique identifier

(unique_id, mcd_id, and wb_id). The HEB table did not require a related table since it is being

used to search for ingredients.

Figure 4: Database Schema

D A T A B A S E C R E A T I O N :

P G A D M I N

Spring 2023

// Kozub

Applying this schema and creating a new

database in pgAdmin is straightforward and

can be quickly done either manually or by

using SQL commands. In this instance, the

following SQL command (right) was used to

create the schema and tables.

This SQL code is a direct translation to the

diagram shown previously. The CREATE

function in SQL allows the user to create

the specified tables with the specified

columns and data types.

This is important because the data type

determines how the data is encoded and

stored, so selecting the appropriate data

type is crucial for maintaining data

integrity.

CREATE TABLE public.allrecipes_combined (

unique_id character varying(255),

recipe_name character varying(255),

recipe_url character varying(2048),

recipe_description text,

recipe_ingredients text,

recipe_calories integer,

recipe_fat character varying(255),

recipe_carbs character varying(255),

recipe_protein character varying(255),

scrape_date date,

category character varying(255)

);

CREATE TABLE public.allrecipes_ingredients (

unique_id character varying(10),

ingredient character varying(100)

);

CREATE TABLE public.heb_combined (

item_id character varying(255),

item_name character varying(255),

item_url character varying(2048),

item_price character varying(255),

scrape_date date,

category character varying(255)

);

CREATE TABLE public.mcdonalds_cleaned (

mcd_id character varying(255),

mcd_category character varying(255),

mcd_cat_url character varying(2048),

mcd_name character varying(255),

mcd_url character varying(2048),

mcd_desc text,

mcd_ingredients text,

scrape_date date

);

CREATE TABLE public.mcdonalds_ingredients (

mcd_id character varying(10),

ingredient character varying(100)

);

CREATE TABLE public.whataburger_cleaned (

wb_id character varying(255),

wb_name character varying(255),

wb_ingredients text,

wb_price_cal character varying(255),

wb_calories integer,

scrape_date date

);

CREATE TABLE public.whataburger_ingredients (

wb_id character varying(10),

ingredient character varying(100)

);

D A T A B A S E C R E A T I O N :

P G A D M I N C O N T .

Spring 2023

// Kozub

After the tables are set up properly, the CSV data can be loaded into each table by right

clicking each table and selecting “import/export data”. If the tables are in the correct format,

the data should load seamlessly. It can be checked by viewing the data inside the database to

ensure it was copied over as shown in Figure 5 above.

Figure 5: Copied Data from CSV in pgAdmin database

D A T A B A S E C R E A T I O N :

A W S R D S I N S T A N C E

Spring 2023

// Kozub

The database that was created initially was a local database, which in most cases is fine if the

data does not need to live online. However, the app being created needs to have consistent

access to the data whether the database is online. Shiny apps that are published to shinyapp.io

also cannot connect to a “local host” due to it technically not existing once the application is

published. Therefore, after initial testing with the Shiny application, the database was

recreated to be connected to an Amazon Web Services (AWS) Relational Database Service

(RDS). Figure 6 below details how the AWS RDS instance is structured.

Figure 6: Amazon Relational Database Service (aws.amazon.com)

The process of creating an AWS RDS instance is straightforward and can be done for free

depending on the amount of data that needs to be hosted in the cloud. Since the data that is

being uses less than 10 mb of storage, the free tier is sufficient for this project. To create the

instance, first an account was created on aws.amazon.com/rds. Then within the dashboard,

there is an option for databases and from there “create database” is selected.

D A T A B A S E C R E A T I O N :

A W S R D S I N S T A N C E

Spring 2023

// Kozub

After the database is created with the specified options and connection credentials, the

database instance should be posted to the database dashboard as shown in Figure 7 below.

Figure 7: AWS RDS Instance

From there, the connectivity information can be accessed, and the endpoint, port, user, and

password can be copied into dbAdmin to create a new database that utilizes the AWS instance.

Once the cloud database is created, the Shiny application will be able to use the connection and

database activated.

To develop this shiny app, the following packages were used: Shiny, DBI, RPostgreSQL, and

DT.

Shiny is a web application framework for R that allows users to create interactive web

applications.

DBI is a database interface for R and provides consistent communication between R and a

relational database system (Wickham et al, 2022).

The RPostgreSQL package allows groups of rows to be retrieved instead of an entire set of

results (Conway et al, 2023).

The DT (DataTables) package allows the Shiny app to manipulate tables and render HTML

tables (Xie et al, 2023).

The Shiny application code consists of the following sections: Database connection, UI

components, and the Server Function.

C R E A T E S H I N Y

A P P L I C A T I O N

Spring 2023

// Kozub

Figure 8: Database Connection

Using the “dbConnect” function, a connection is established to the PostgreSQL AWS instance

where the database is hosted. Once the credentials are inputted, the connection object “con” is

used to perform queries on the database (Figure 8).

The UI components shown in Figure 10 creates a simplistic, yet functional user interface that

allows users to select either the AllRecipes, McDonalds, or Whataburger tables from a drop-

down and either select a recipe or fast-food item from the sidebar panel (function

“sidebarLayout”). The UI comprises of three different tabs (function “tabPanel”) that display

the ingredients, HEB items with check boxes and total price, and either the recipe or item

details.

Figures 10-15 comprise the server function which contains logic for handling user input and

rendering the output.

C R E A T E S H I N Y

A P P L I C A T I O N C O N T .

Spring 2023

// Kozub

Figure 9: Shiny User Interface (UI)

The “output$id_selection_ui” block uses “renderUI” to create a dropdown after either

selecting one of the AllRecipes, Whataburger, or McDonalds tables. “req” makes sure the input

value (tables previously mentioned) exists and the “switch” function determines the correct

table and columns to use based upon the selection. Then, the “items” stores the data from the

database that consist of the unique IDs and names for the items in the selected table using an

embedded SQL function using “dbGetQuery”.

C R E A T E S H I N Y

A P P L I C A T I O N C O N T .

Spring 2023

// Kozub

Figure 10: Render Item Selection Drop-down Based On Selected Table

Figure 11: Render the Table of Ingredients for the Selected Item

The ”output$ingredientDetails” block uses “renderDataTable” to create a table of either the

selected ingredient or fast food item. This also uses “dbGetQuery” to construct a SQL query

and gather the ingredients. A data table is then shown in the first tab of the UI containing each

ingredient in its own row that matches the unique ID of the selection.

The “output$itemDetails” block creates a table of attributes for the selected item. Again, using

an SQL query to get the details for the selected items and then generating a table (accessed in

the 3rd tab of the UI) with each recipe or fast food item details.

C R E A T E S H I N Y

A P P L I C A T I O N C O N T .

Spring 2023

// Kozub

Figure 12: Render the Item Detail Table

Figure 13: Render the HEB Items UI

The “output$hebItems_ui” code block, arguably the most important part of the entire script,

creates the HEB items UI which consists of checkboxes for selecting each item needed to

create the recipe. This also utilizes the “switch” function to determine the correct column

name and table. The for loop iterates through the ingredients of the selected recipe or fast-

food item to create the SQL query for the HEB items. Lastly, the “checkboxGroupInput”

generates checkboxes for the subset of data in the HEB items table that were filtered out using

the ingredient names.

The “output$filteredHebItems” block also uses the ‘renderTable’ function to generate a table of

HEB items that match the ingredients for the selected recipe or fast-food item. The table

shows the item ID, name, and price of each related item. “selected_items” retrieves selected

HEB items from the checkboxes while “item_ids” gathers the item IDs from the

“selected_items” by splitting the string on the colon and taking the first part. The

“filtered_heb_items” variable stores the fetched data from the database and consists of the

item ID, name, and price for the filtered HEB items generated by the “query” function.

C R E A T E S H I N Y

A P P L I C A T I O N C O N T .

Spring 2023

// Kozub

Figure 14: Render the Filtered HEB Items Table

Figure 15: Render the Total Price Text Output

Lastly, the “output$totalPrice” block uses “renderText” to add the price of each selected item

and display the total price at the bottom of the second tab. The “item_prices” function uses a

regular expression to extract prices from the “selected_items” and matches the price pattern

to convert to a numeric value. The regular expression “regexpr” is an R function that finds the

first match of a given pattern within a string. This expression is used in the variable

“price_match” which ultimately stores the matched price substring as a character vector.

Once the Shiny App code is tested and functioning properly by being able to access the cloud

database and generate the UI correctly, the application can be published to shinyapp.io. There

are other host sites where the app can live such as Heroku, but for easy publishing and editing,

shinyapp.io is more than sufficient. This action is performed by simply clicking the drop down

on the top right corner of the code block in R Studio and publishing the app online. R Studio

may prompt the user to create an account if not already created.

The completed app can be accessed here: https://kozub.shinyapp s.io/WeHav eAtHome

D E P L O Y T O S H I N Y A P P . I O

Spring 2023

// Kozub

Figure 16: Select a Recipe or Fast Food Chain

This drop-down allows the user to select from either of the All Recipes, McDonalds, or

Whataburger tables.

U S E R I N T E R F A C E ( U I )

Spring 2023

// Kozub

Figure 17: Select an Item

This drop-down allows users to select either a recipe from the All Recipes table, or an item

from the McDonalds or Whataburger table.

U S E R I N T E R F A C E ( U I )

Figure 18: List of Ingredients

After selecting a recipe or fast-food item, the ingredients, separated into

individual rows are listed. These words are then queried within the HEB table

to find the relevant items.

Spring 2023

// Kozub

Figure 19: HEB Items Tab

The HEB items tab allows users to select the relevant items that can be purchased at HEB to

create the recipe for either one of the All Recipes dishes or one of the fast-food items.

U S E R I N T E R F A C E ( U I )

Figure 20: Total Price

Once the items are selected via the check boxes, a total price is displayed.

Spring 2023

// Kozub

Figure 21: Item Details Tab

The third tab consists of the remaining details for the selected recipe or item and can include

the item’s name, link, description, ingredients (non-separated), macronutrients, category, and

scrape date. Each table differs in the available information as the details were not consistent

across each website.

U S E R I N T E R F A C E ( U I )

C O N C L U S I O N

Taking everything into account, “We Have…At Home” is a useful tool for determining

how much a favorite dish or fast-food item will cost to make at home. The project

presented several challenges, such as difficulties web scraping multiple grocery store

websites (for the original project idea) and building the user interface, but in the end,

this was overcome, and the application was launched successfully.

The project was an educational experience that helped deepen my understanding of

application development using R Shiny, PostgreSQL databases, and cloud environments.

The biggest takeaway being the importance of data management and accessibility in

creating user-friendly applications.

Overall, this application serves as an inspiration for future projects and desire to further

explore the potential of cloud-based solutions, application development, and database

management. With this project, I gained valuable experience and knowledge that will

benefit me in my future endeavors.

R E F E R E N C E S

A Complete Beginners Guide to Regular Expressions in R – Regenerative. 1 Nov. 2021,

https://regenerativetoday.com/a-complete-beginners-guide-to-regular-expressions-in-r/.

Basic Matches. R-Packages, https://cran.r-project.org/web/packages/stringr/vignettes/regular-

expressions.html. Accessed 3 May 2023.

Conway, Joe, et al. RPostgreSQL: R Interface to the “PostgreSQL” Database System. 0.7-5, 10 Feb.

2023. R-Packages, https://cran.r-project.org/web/packages/RPostgreSQL/.

Databases (R-SIG-DB), R. Special Interest Group on, et al. DBI: R Database Interface. 1.1.3, 18 June

2022. R-Packages, https://cran.r-project.org/web/packages/DBI/.

“Fully Managed Relational Database - Amazon RDS - Amazon Web Services.” Amazon Web

Services, Inc., https://aws.amazon.com/rds/. Accessed 3 May 2023.

IBM Documentation. 4 Mar. 2021, https://www.ibm.com/docs/en/strategicsm/10.0.4?

topic=introduction-unique-id.

R Packages (2e). https://r-pkgs.org/. Accessed 3 May 2023.

Regular Expressions in R. https://rstudio-pubs-

static.s3.amazonaws.com/74603_76cd14d5983f47408fdf0b323550b846.html. Accessed 3 May 2023.

RPubs - Tidyverse: Using Stringr, Dplyr, and Tibble to Clean up Catch Phrases.

https://rpubs.com/mlucich/stringr-dplyr-tibble. Accessed 3 May 2023.

Xie, Yihui, et al. DT: A Wrapper of the JavaScript Library “DataTables.” 0.27, 17 Jan. 2023. R-

Packages, https://cran.r-project.org/web/packages/DT/index.html.

DT: A Wrapper of the JavaScript Library “DataTables.” 0.27, 17 Jan. 2023. R-Packages,

https://cran.r-project.org/web/packages/DT/.