Databases
A more scalable solution is to store data in a database: a formal system for organizing and managing data.
Databases range in size and complexity from simple file-based databases that run on your local computer, to large scalable databases that run on a server cluster. They are optimized for fetching text and numeric data, and performing set operations on this data.
They also have the advantage of scaling really well. They’re optimized not just for efficient storage and retrieval of large amounts of data, but also for concurrent access by hundreds or thousands of users simulaneously [ed. This is a huge topic and we’re not even scratching the surface. I’d strongly encourage you to take a more comprehensive database course e.g. CS 348.]
A relational database is a particular database design that structures data as tables:
- A table represents some logical entity e.g. Customer, Transactions. It consists of columns and rows i.e. like a grid.
- A column is a field or data element in that table. e.g. a “Customer“ table might have “name“, “city“, “birthdate“ fields.
- A row is a record, containing values for each field. e.g. ”Jeff Avery”, ”Waterloo”, ”June 23, 1985”. Each row in each table needs to be unique identified by a key. This can be a part of the data (e.g. timestamp) but is often a unique, generated numeric identifier (e.g. “customerid“, “transactionid“).
There are many other types of databases! We’ll use relational databases in this course b/c (a) they’re very common, and (b) they are suitable for any type of data/relations that you will encounter in this course.
Here’s our earlier example shown as a class, a set of records in a CSV file and as a set of relational database tables.
Why is this approach valuable? Relational databases can quickly oprerate on these tables, and support operations on sets of records. For example:
- Return a list of all purchases greater than $100.
- Return a list of customers from ”Paris”.
- Delete customers from ”Ottawa”.
Our example is pretty trivial, but imagine useful queries like:
- ”Find all transactions between 2:00 and 2:30”, or
- ”Find our which salesperson sold the greatest amount during last Saturday’s sale”.
SQL (pronounced ”Ess-que-ell”) is a Domain-Specific Language (DSL) for describing your queries. Using SQL, you write statements describing the operation to perform and which tables to use, and the database performs the operations for you.
SQL is a standard, so SQL commands work across different databases [ed. SQL was adopted as a standard by ANSI in 1986 as SQL-86, and by ISO in 1987].
Using SQL, you can:
- C reate new records
- R etrieve sets of existing records
- U pdate the fields in one or more records
- D elete one or more records1
SQL has a particular syntax for managing sets of records:
<operation> FROM [table] [WHERE [condition]]
operations: SELECT, UPDATE, INSERT, DELETE, ...
conditions: WHERE [col] <operator> <value>
For example:
/* SELECT returns data from a table, or a set of tables.
* an asterix (*) means "all"
*/
SELECT * FROM customers
SELECT * FROM Customers WHERE city = "Ottawa"
SELECT name FROM Customers WHERE custid = 1001
/* UPDATE modifies one or more column values based on some criteria.*/
UPDATE Customer SET city = "Kitchener" WHERE cust_id = 1001
UPDATE Customer SET city = "Kitchener" // uh oh, what is wrong with this?
/* INSERT adds new records to your database. */
INSERT INTO Customer(cust_id, name, city)
VALUES ("1005", "Sandra Avery", "Kitchener")
INSERT INTO Customer(cust_id, name, city)
VALUES ("1005", "Austin Avery", "Kitchener") // uh oh, what have I done?
It’s common to have a record spread across multiple tables. A join describes how to relate data across tables.
SELECT c.customer_id, c.first_name + “ “ + c.last_name, t.date, p.name, p.cost
FROM Customer c, Transactions t, Products p
WHERE c.customer_id = t.customer_id
AND t.product_id = p.product_id
Returns data that spans multiple tables:
$ 1001, Jeff Avery, 12-Aug-2020, T-shirt, 29.95
https://imgur.com/t/logic/v23nUwQ
Relational databases are often large and complex systems. However, the relational approach can also be scaled-down.
SQLite (pronounced ESS-QUE-ELL-ITE) is a small-scale relational DBMS that supports SQL. It is small enough for local, standalone use.
SQLite is a C-language library that implements a small, fast, self-contained, high-reliability, full-featured, SQL database engine. SQLite is the most used database engine in the world. SQLite is built into all mobile phones and most computers…"
You can install the SQLite database under Mac, Windows or Linux.
- Visit the SQLite Download Page.
- Dowload the binary for your platform.
- To test it, launch it from a shell.
$ sqlite3
SQLite version 3.28.0 2019-04-15 14:49:49
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .exit
We will create a database from the command line. Optionally, you can install SQLite Studio, a GUI for managing databases.
You can download the SQLite Sample Database and confirm that SQLite is working. We can open and read from this database using the command-line tools:
$ sqlite3
SQLite version 3.32.3 2020-06-18 14:16:19
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .help
.auth ON|OFF Show authorizer callbacks
.backup ?DB? FILE Backup DB (default "main") to FILE
.bail on|off Stop after hitting an error. Default OFF
.binary on|off Turn binary output on or off. Default OFF
.cd DIRECTORY Change the working directory to DIRECTORY
.changes on|off Show number of rows changed by SQL
.check GLOB Fail if output since .testcase does not match
.clone NEWDB Clone data into NEWDB from the existing database
.databases List names and files of attached databases
.dbconfig ?op? ?val? List or change sqlite3_db_config() options
.dbinfo ?DB? Show status information about the database
These commands are meta-commands
that act on the database itself, and not the data that it contains (i.e. these are different from SQL and specific to SQLite). Some particularly useful commands:
Command | Purpose |
---|---|
.open filename |
Open database filename . |
.database | Show all connected databases. |
.log filename |
Write console to log filename . |
.read filename |
Read input from filename . |
.tables | Show a list of tables in the open database. |
.schema tablename |
SQL to create a particular tablename . |
.fullschema | SQL to create the entire database structure. |
.quit | Quit and close connections. |
We often use these meta-commands to change settings, and find information that we’ll need to structure queries e.g. table names).
$ sqlite3
SQLite version 3.28.0 2019-04-15 14:49:49
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .open chinook.db // name of the file
sqlite> .mode column // lines up data in columns
sqlite> .headers on // shows column names at the top
// determine which tables to query
sqlite> .tables
albums employees invoices playlists
artists genres media_types tracks
customers invoice_items playlist_track
Once we’ve identified what we want to do, we can just execute our queries. Examples of selecting from a single table at a time:
sqlite> SELECT * FROM albums WHERE albumid < 4;
AlbumId Title ArtistId
---------- ------------------------- ----------
1 For Those About To Rock 1
2 Restless and Wild 2
3 Let There Be Rock 1
sqlite> SELECT * FROM artists WHERE ArtistId = 1;
ArtistId Name
---------- ----------
1 AC/DC
We can also JOIN across two tables (based on a primary key, “ArtistId“’). You often will have multiple WHERE clauses to join between multiple tables.
sqlite> SELECT albums.AlbumId, artists.Name, albums.Title
FROM albums, artists
WHERE albums.ArtistId = artists.ArtistId
AND albums.AlbumId < 4;
AlbumId Name Title
---------- ---------- ------------------------
1 AC/DC For Those About To Rock
2 Accept Restless and Wild
3 AC/DC Let There Be Rock
How do we use a database from our applications? Kotlin doesn’t have built-in database support, but the framework includes a library for accessing external databases.
Specifically, Kotlin leverages the Java JDBC API (”Java DataBase Connectivity”). This provides a standard mechanism for connecting to databases, issuing queries, and managing results.
Creating a database project in IntelliJ:
- Create a Gradle/Kotlin project.
- Modify the
build.gradle
to include a dependency on JDBC.
implementation 'org.xerial:sqlite-jdbc:3.30.1'
- Use the Java SQL package classes to connect and fetch data.
import java.sql.Connection
import java.sql.DriverManager
import java.sql.SQLException
….
This example uses a sample database from the official SQLite tutorial.
var conn: Connection? = null
fun connect() {
try {
val url = "jdbc:sqlite:chinook.db"
conn = DriverManager.getConnection(url)
println("Connection established.")
} catch (e: SQLException) {
println(e.message)
}
}
fun query() {
try {
if (conn != null) {
// do something
JDBC is a useful mechansism for connecting to remote databases, but making raw SQL calls is error-prone: there’s no type checking, or other safety mechanisms in-place. It also requires us to explicitly convert between string data and class objects that are holding our data.
There are a number of libraries that can leverage JDBC and provide a cleaner set of abstractions. We’ll discuss one of them: Exposed, which is a JetBrains Kotlin library for working with databases.
repositories {
mavenCentral()
}
dependencies {
implementation("org.jetbrains.exposed", "exposed-core", "0.40.1")
implementation("org.jetbrains.exposed", "exposed-dao", "0.40.1")
implementation("org.jetbrains.exposed", "exposed-jdbc", "0.40.1")
}
Every database access using Exposed is started by obtaining a connection and creating a transaction. We’ll use SQLite as an example. We can connect using the same database connection string we used in the previous section:
To get a connection:
Database.connect("jdbc:sqlite:chinook.db")
After obtaining a connection all SQL statements should be placed inside a transaction2:
transaction {
// statements go here
}
Exposed comes in two flavors: DSL (Domain Specific Language) and DAO (Data Access Objects). DSL means a type-safe syntax that is similar to SQL; DAO means doing CRUD operations on entities like tables. They are both viable, but represent different types of abstraction for our database.
Here’s an example of a DSL query from the Exposed Wiki:
fun main(args: Array<String>) {
Database.connect("jdbc:sqlite:cities.db")
transaction {
// print sql to std-out
addLogger(StdOutSqlLogger)
// create a table that reflects the Cities class structure
SchemaUtils.create (Cities)
// insert new city. SQL: INSERT INTO Cities (name) VALUES ('St. Petersburg')
val stPeteId = Cities.insert {
it[name] = "St. Petersburg"
} get Cities.id
// 'select *' SQL: SELECT Cities.id, Cities.name FROM Cities
println("Cities: ${Cities.selectAll()}")
}
}
object Cities: IntIdTable() {
val name = varchar("name", 50)
}
Here’s the same example using Exposed DAO. This is similar to DSL, but the objects are more explicitly tied to the database structure.
fun main(args: Array<String>) {
Database.connect("sqlite:cities.db")
transaction {
// print sql to std-out
addLogger(StdOutSqlLogger)
SchemaUtils.create (Cities)
// insert new city. SQL: INSERT INTO Cities (name) VALUES ('St. Petersburg')
val stPete = City.new {
name = "St. Petersburg"
}
// 'select *' SQL: SELECT Cities.id, Cities.name FROM Cities
println("Cities: ${City.all()}")
}
}
object Cities: IntIdTable() {
val name = varchar("name", 50)
}
class City(id: EntityID<Int>) : IntEntity(id) {
companion object : IntEntityClass<City>(Cities)
var name by Cities.name
}
For details on how to use the Exposed DAO API, see the DAO Example page.
-
These are considered the set of basic functions for working with persistant storage. Yes, we use the acronym CRUD. You’ll occasionally trip over the term “CRUD applications” which just means an application whose primary purpose is to interact with an underlying database. An example would be something like a simple app for tracking a list of customers or sales data. ↩︎
-
A transaction represent a scope of operations that should be done together, in the order presented. If any one of the operations in the transaction fails, none of the operations will be applied. ↩︎