" W E H A V E . . . A T H O M E ! "
A n R S h i n y A p p l i c a t i o n l e v e r a g i n g
P o s t g r e S Q L D a t a b a s e s h o s t e d o n A W S R D S
Robert Kozub
Information Management
University of Texas at Dallas
Spring 2023
Overview
R Shiny Application
Data Formatting
User Interface (UI)
Data Collection
Deploy Shiny App
Database Creation
Spring 2023
// Kozub
T A B L E O F C O N T E N T S
03
13
06
18
04
18
08
Conclusion
21
The initial goal of this project was to create a tool that would enable users to compare grocery
items across multiple grocery store chains in the area and determine the cheapest cart price.
However, due to countermeasures put in place by the grocery store websites, it was
challenging to extract the data needed to create the price comparison tool. As a result, the
project scope was changed to focus on a different aspect of grocery shopping.
“We Have…At Home! is a R Shiny application that allows users to find the ingredients needed
to make their favorite dishes utilizing allrecipes.com and fast-food restaurants such as
Whataburger and McDonalds. This app utilizes several technologies to provide a streamlined
and efficient solution to managing grocery shopping and meal planning.
The app uses a PostgreSQL database, which is a powerful open-source relational database
management system (RDBMS). The PostgreSQL database was created in pgAdmin, a database
administration platform which allows the data to exist within a fixed schema. The data for the
database was extracted using ParseHub, a web scraping tool that simplifies data collection
allowing users to scrape data without writing code. Custom web scraper templates were
created in ParseHub to extract data from allrecipes.com, Whataburger.com, McDonalds.com,
and Heb.com to create a comprehensive database of ingredients and grocery store items.
To ensure that the application is accessible online, the PostgreSQL database is connected to an
AWS RDS instance which is a managed database service by Amazon Web Services. This allows
the app to be hosted and accessed without having a live connection active on the local
pgAdmin database.
The user interface (UI) of the app was created using R Shiny, a framework for building
interactive web applications using the R programming language. The app provides users with
an interface to select a recipe or fast-food item, while the app searches for items in the HEB
item table that match the ingredient and then calculates the price. In addition, the app
includes another tab that provides users with additional information about the chosen recipe
or fast-food item, such as their description, price, and macronutrients.
Overall, We HaveAt Home” is a functional tool that leverages several technologies to provide
a user-friendly interface to assist users in recreating their favorite items at home.
O V E R V I E W
Spring 2023
// Kozub
3
To collect the data needed for this project, a web scraping tool called ParseHub was used to
create workflows for scraping the necessary data to create a database in pgAdmin. Information
collected included recipe and item names, ingredients, prices, macronutrients, and
descriptions.
The websites that were scraped included allrecipes.com, heb.com, Whataburger.com, and
mcdonalds.com. Alternatively, the data from these websites could be scraped with Python
utilizing libraries such as BeautifulSoup and Selenium. Scraping using Python scripts can be
helpful with automating the process and building workflows to update data when it is updated
on the website. However, there are several complications to this including countermeasures to
discourage scraping, changes to the website’s HTML/CSS or JavaScript code and overloading
the site with too many requests.
Building workflows with scraping tools such as ParseHub does not require any coding and can
be done quickly. Figure 1 shows an example template that was created to obtain the necessary
data for the Whataburger table.
D A T A C O L L E C T I O N
Spring 2023
// Kozub
Figure 1: ParseHub Graphical User Interface (GUI)
4
This template contains 3 tabs with different actions that parse through the designated website
and extract the information you need. The first tab iterates through the homepage links. Once
the program clicks the link, the actions on the second tab take over and either clicks links or
gathers information. This process of iterating through each page is called pagination and is
what makes it possible for the tool to gather information from multiple pages. Finally, the
third tab extracts most of the necessary information such as the item name, ingredients, price,
and macronutrients.
Data can then be exported and downloaded in various formats including CSV, Excel, and JSON.
There is also a paid option that allows ParseHub to automatically create an Application
Programming Interface (API). For the purposes of this project, the data was exported as CSV.
Templates for each of the websites varies depending on the site’s structure and the
information available, but typically follows the same process of setting up each web scraping
template.
D A T A C O L L E C T I O N C O N T .
Spring 2023
// Kozub
5
After the data is collected, R scripts were created to clean and combine or separate the data so
that it is ready to use in the PostgreSQL database.
C L E A N A N D F O R M A T
D A T A
Spring 2023
// Kozub
The code snippet above in Figure 2 provides an example of how the R script combines three
different tables scraped from allrecipes.com into a single CSV file categorized by breakfast,
lunch, and dinner. This code uses the “dplyr andStringr packages. The dplyr package allows
for the script to perform data manipulation and the stringr package provides functions for
working with strings (RPubs, 2021).
A for loop (starting on line 11) is used to loop through each CSV file and perform several
actions such as creating new columns like “scrape_date,category”, and “unique_id. The
unique id is a unique identifier which will make it possible for the database to distinguish
between other information in the database and relate to other tables that share the same
identifier (IBM, 2021).
Figure 2: AllRecipes Data Formatting with R
6
Each formatting script varies slightly due to different table formats, but overall follows the
same method of adding a unique identifier and other additional columns. Additional scripts
were also created with R to separate the ingredients into their own table and separate each
word line by line, matching its unique identifier. This makes it easier for the Shiny application
to find those key words or ingredients in the HEB table and find items that match.
C L E A N A N D F O R M A T
D A T A C O N T .
Spring 2023
// Kozub
For example, the code snippet in Figure 3 separates the ingredients from the original
McDonalds table into its own CSV and places each ingredient name in a new row matching its
unique identifier. After manually parsing the data, undesirable words are collected and
inputted into the script to remove rows that contain those unneeded characters or words.
Figure 3: Data Formatting with R
7
D A T A B A S E C R E A T I O N
Spring 2023
// Kozub
Figure 4 provides a visual representation of the database schema created in pgAdmin. The
schema consists of seven tables, three of which are related by their unique identifier
(unique_id, mcd_id, and wb_id). The HEB table did not require a related table since it is being
used to search for ingredients.
Figure 4: Database Schema
8
D A T A B A S E C R E A T I O N :
P G A D M I N
Spring 2023
// Kozub
Applying this schema and creating a new
database in pgAdmin is straightforward and
can be quickly done either manually or by
using SQL commands. In this instance, the
following SQL command (right) was used to
create the schema and tables.
This SQL code is a direct translation to the
diagram shown previously. The CREATE
function in SQL allows the user to create
the specified tables with the specified
columns and data types.
This is important because the data type
determines how the data is encoded and
stored, so selecting the appropriate data
type is crucial for maintaining data
integrity.
CREATE TABLE public.allrecipes_combined (
unique_id character varying(255),
recipe_name character varying(255),
recipe_url character varying(2048),
recipe_description text,
recipe_ingredients text,
recipe_calories integer,
recipe_fat character varying(255),
recipe_carbs character varying(255),
recipe_protein character varying(255),
scrape_date date,
category character varying(255)
);
CREATE TABLE public.allrecipes_ingredients (
unique_id character varying(10),
ingredient character varying(100)
);
CREATE TABLE public.heb_combined (
item_id character varying(255),
item_name character varying(255),
item_url character varying(2048),
item_price character varying(255),
scrape_date date,
category character varying(255)
);
CREATE TABLE public.mcdonalds_cleaned (
mcd_id character varying(255),
mcd_category character varying(255),
mcd_cat_url character varying(2048),
mcd_name character varying(255),
mcd_url character varying(2048),
mcd_desc text,
mcd_ingredients text,
scrape_date date
);
CREATE TABLE public.mcdonalds_ingredients (
mcd_id character varying(10),
ingredient character varying(100)
);
CREATE TABLE public.whataburger_cleaned (
wb_id character varying(255),
wb_name character varying(255),
wb_ingredients text,
wb_price_cal character varying(255),
wb_calories integer,
scrape_date date
);
CREATE TABLE public.whataburger_ingredients (
wb_id character varying(10),
ingredient character varying(100)
);
9
D A T A B A S E C R E A T I O N :
P G A D M I N C O N T .
Spring 2023
// Kozub
After the tables are set up properly, the CSV data can be loaded into each table by right
clicking each table and selecting import/export data”. If the tables are in the correct format,
the data should load seamlessly. It can be checked by viewing the data inside the database to
ensure it was copied over as shown in Figure 5 above.
Figure 5: Copied Data from CSV in pgAdmin database
10
D A T A B A S E C R E A T I O N :
A W S R D S I N S T A N C E
Spring 2023
// Kozub
The database that was created initially was a local database, which in most cases is fine if the
data does not need to live online. However, the app being created needs to have consistent
access to the data whether the database is online. Shiny apps that are published to shinyapp.io
also cannot connect to a “local host” due to it technically not existing once the application is
published. Therefore, after initial testing with the Shiny application, the database was
recreated to be connected to an Amazon Web Services (AWS) Relational Database Service
(RDS). Figure 6 below details how the AWS RDS instance is structured.
Figure 6: Amazon Relational Database Service (aws.amazon.com)
The process of creating an AWS RDS instance is straightforward and can be done for free
depending on the amount of data that needs to be hosted in the cloud. Since the data that is
being uses less than 10 mb of storage, the free tier is sufficient for this project. To create the
instance, first an account was created on aws.amazon.com/rds. Then within the dashboard,
there is an option for databases and from there create database” is selected.
11
D A T A B A S E C R E A T I O N :
A W S R D S I N S T A N C E
Spring 2023
// Kozub
After the database is created with the specified options and connection credentials, the
database instance should be posted to the database dashboard as shown in Figure 7 below.
Figure 7: AWS RDS Instance
From there, the connectivity information can be accessed, and the endpoint, port, user, and
password can be copied into dbAdmin to create a new database that utilizes the AWS instance.
Once the cloud database is created, the Shiny application will be able to use the connection and
login credentials to access the data at any time and not rely on a local computer to have the
database activated.
12
To develop this shiny app, the following packages were used: Shiny, DBI, RPostgreSQL, and
DT.
Shiny is a web application framework for R that allows users to create interactive web
applications.
DBI is a database interface for R and provides consistent communication between R and a
relational database system (Wickham et al, 2022).
The RPostgreSQL package allows groups of rows to be retrieved instead of an entire set of
results (Conway et al, 2023).
The DT (DataTables) package allows the Shiny app to manipulate tables and render HTML
tables (Xie et al, 2023).
The Shiny application code consists of the following sections: Database connection, UI
components, and the Server Function.
C R E A T E S H I N Y
A P P L I C A T I O N
Spring 2023
// Kozub
Figure 8: Database Connection
Using the dbConnect function, a connection is established to the PostgreSQL AWS instance
where the database is hosted. Once the credentials are inputted, the connection object “conis
used to perform queries on the database (Figure 8).
13
The UI components shown in Figure 10 creates a simplistic, yet functional user interface that
allows users to select either the AllRecipes, McDonalds, or Whataburger tables from a drop-
down and either select a recipe or fast-food item from the sidebar panel (function
“sidebarLayout”). The UI comprises of three different tabs (function “tabPanel) that display
the ingredients, HEB items with check boxes and total price, and either the recipe or item
details.
Figures 10-15 comprise the server function which contains logic for handling user input and
rendering the output.
C R E A T E S H I N Y
A P P L I C A T I O N C O N T .
Spring 2023
// Kozub
Figure 9: Shiny User Interface (UI)
14
The “output$id_selection_ui block uses renderUI” to create a dropdown after either
selecting one of the AllRecipes, Whataburger, or McDonalds tables.req makes sure the input
value (tables previously mentioned) exists and the switch function determines the correct
table and columns to use based upon the selection. Then, the items” stores the data from the
database that consist of the unique IDs and names for the items in the selected table using an
embedded SQL function using “dbGetQuery”.
C R E A T E S H I N Y
A P P L I C A T I O N C O N T .
Spring 2023
// Kozub
Figure 10: Render Item Selection Drop-down Based On Selected Table
Figure 11: Render the Table of Ingredients for the Selected Item
The ”output$ingredientDetails” block usesrenderDataTable to create a table of either the
selected ingredient or fast food item. This also uses dbGetQuery to construct a SQL query
and gather the ingredients. A data table is then shown in the first tab of the UI containing each
ingredient in its own row that matches the unique ID of the selection.
15
The “output$itemDetails block creates a table of attributes for the selected item. Again, using
an SQL query to get the details for the selected items and then generating a table (accessed in
the 3rd tab of the UI) with each recipe or fast food item details.
C R E A T E S H I N Y
A P P L I C A T I O N C O N T .
Spring 2023
// Kozub
Figure 12: Render the Item Detail Table
Figure 13: Render the HEB Items UI
The “output$hebItems_ui code block, arguably the most important part of the entire script,
creates the HEB items UI which consists of checkboxes for selecting each item needed to
create the recipe. This also utilizes the “switch” function to determine the correct column
name and table. The for loop iterates through the ingredients of the selected recipe or fast-
food item to create the SQL query for the HEB items. Lastly, the checkboxGroupInput”
generates checkboxes for the subset of data in the HEB items table that were filtered out using
the ingredient names.
16
The “output$filteredHebItemsblock also uses the renderTablefunction to generate a table of
HEB items that match the ingredients for the selected recipe or fast-food item. The table
shows the item ID, name, and price of each related item. selected_items retrieves selected
HEB items from the checkboxes whileitem_idsgathers the item IDs from the
“selected_items” by splitting the string on the colon and taking the first part. The
“filtered_heb_items variable stores the fetched data from the database and consists of the
item ID, name, and price for the filtered HEB items generated by the “query function.
C R E A T E S H I N Y
A P P L I C A T I O N C O N T .
Spring 2023
// Kozub
Figure 14: Render the Filtered HEB Items Table
Figure 15: Render the Total Price Text Output
Lastly, the output$totalPrice block usesrenderText to add the price of each selected item
and display the total price at the bottom of the second tab. The item_prices” function uses a
regular expression to extract prices from theselected_items and matches the price pattern
to convert to a numeric value. The regular expression regexpris an R function that finds the
first match of a given pattern within a string. This expression is used in the variable
“price_match which ultimately stores the matched price substring as a character vector.
17
Once the Shiny App code is tested and functioning properly by being able to access the cloud
database and generate the UI correctly, the application can be published to shinyapp.io. There
are other host sites where the app can live such as Heroku, but for easy publishing and editing,
shinyapp.io is more than sufficient. This action is performed by simply clicking the drop down
on the top right corner of the code block in R Studio and publishing the app online. R Studio
may prompt the user to create an account if not already created.
The completed app can be accessed here: https://kozub.shinyapp s.io/WeHav eAtHome
D E P L O Y T O S H I N Y A P P . I O
Spring 2023
// Kozub
Figure 16: Select a Recipe or Fast Food Chain
This drop-down allows the user to select from either of the All Recipes, McDonalds, or
Whataburger tables.
U S E R I N T E R F A C E ( U I )
18
Spring 2023
// Kozub
Figure 17: Select an Item
This drop-down allows users to select either a recipe from the All Recipes table, or an item
from the McDonalds or Whataburger table.
U S E R I N T E R F A C E ( U I )
Figure 18: List of Ingredients
After selecting a recipe or fast-food item, the ingredients, separated into
individual rows are listed. These words are then queried within the HEB table
to find the relevant items.
19
Spring 2023
// Kozub
Figure 19: HEB Items Tab
The HEB items tab allows users to select the relevant items that can be purchased at HEB to
create the recipe for either one of the All Recipes dishes or one of the fast-food items.
U S E R I N T E R F A C E ( U I )
Figure 20: Total Price
Once the items are selected via the check boxes, a total price is displayed.
20
Spring 2023
// Kozub
Figure 21: Item Details Tab
The third tab consists of the remaining details for the selected recipe or item and can include
the item’s name, link, description, ingredients (non-separated), macronutrients, category, and
scrape date. Each table differs in the available information as the details were not consistent
across each website.
U S E R I N T E R F A C E ( U I )
C O N C L U S I O N
Taking everything into account, We HaveAt Home is a useful tool for determining
how much a favorite dish or fast-food item will cost to make at home. The project
presented several challenges, such as difficulties web scraping multiple grocery store
websites (for the original project idea) and building the user interface, but in the end,
this was overcome, and the application was launched successfully.
The project was an educational experience that helped deepen my understanding of
application development using R Shiny, PostgreSQL databases, and cloud environments.
The biggest takeaway being the importance of data management and accessibility in
creating user-friendly applications.
Overall, this application serves as an inspiration for future projects and desire to further
explore the potential of cloud-based solutions, application development, and database
management. With this project, I gained valuable experience and knowledge that will
benefit me in my future endeavors.
21
R E F E R E N C E S
A Complete Beginners Guide to Regular Expressions in R Regenerative. 1 Nov. 2021,
https://regenerativetoday.com/a-complete-beginners-guide-to-regular-expressions-in-r/.
Basic Matches. R-Packages, https://cran.r-project.org/web/packages/stringr/vignettes/regular-
expressions.html. Accessed 3 May 2023.
Conway, Joe, et al. RPostgreSQL: R Interface to the PostgreSQL Database System. 0.7-5, 10 Feb.
2023. R-Packages, https://cran.r-project.org/web/packages/RPostgreSQL/.
Databases (R-SIG-DB), R. Special Interest Group on, et al. DBI: R Database Interface. 1.1.3, 18 June
2022. R-Packages, https://cran.r-project.org/web/packages/DBI/.
Fully Managed Relational Database - Amazon RDS - Amazon Web Services. Amazon Web
Services, Inc., https://aws.amazon.com/rds/. Accessed 3 May 2023.
IBM Documentation. 4 Mar. 2021, https://www.ibm.com/docs/en/strategicsm/10.0.4?
topic=introduction-unique-id.
R Packages (2e). https://r-pkgs.org/. Accessed 3 May 2023.
Regular Expressions in R. https://rstudio-pubs-
static.s3.amazonaws.com/74603_76cd14d5983f47408fdf0b323550b846.html. Accessed 3 May 2023.
RPubs - Tidyverse: Using Stringr, Dplyr, and Tibble to Clean up Catch Phrases.
https://rpubs.com/mlucich/stringr-dplyr-tibble. Accessed 3 May 2023.
Xie, Yihui, et al. DT: A Wrapper of the JavaScript Library DataTables. 0.27, 17 Jan. 2023. R-
Packages, https://cran.r-project.org/web/packages/DT/index.html.
DT: A Wrapper of the JavaScript Library DataTables. 0.27, 17 Jan. 2023. R-Packages,
https://cran.r-project.org/web/packages/DT/.
22