Data & Databases

Overview

Most software operates on data. This might be a primary function of your application (e.g. image editor) or a secondary function (e.g. user preferences). As a software developer, you need to give considerable thought to how you will handle user data.

Some examples of data that you might care about:

  • The data that the user has captured and wishes to save e.g. an image that they have created in an image-editor, or the text from a note.
  • The position and size of the application window, window size, or other settings that you might want to save and restore when the application is relaunched.
  • User preferences that they might set in your application e.g. preferred font or font size.
  • The public key to connect to a remote machine.
  • The software license key for your application.

Operations you may need to do:

  • Store data in a file so that you can load and restore it later.
  • Transfer the data to another process that can use it, or to a completely different system e.g. over the internet.
  • Filter it to a subset of your original data. Sort it. Modify it.

Let’s start by reviewing something we already know - data types.

Data Types

A type is a way of categorizing our data so that the computer knows how we intend to use it. There are many different kinds of types that you already know how to work with:

  • Primitive Types: these are intrinsically understood by a compiler, and will include boolean, and numeric data types. e.g. boolean, integer, float, double.

  • Strings: Any text representation which can include characters (char) or longer collections of characters (string). Strings are complicated to store and manipulate because there are a large number of potential values for each character, and the strings themselves are variable length e.g. they can span a few characters to many thousands of words in a single string!

  • Compound data types: A combination of primitive types. e.g. an Array of Integers.

  • Pointers: A data type whose value points to a location in memory. The underlying data may resemble a long Integer, but they are treated as a separate type to allow for specific behaviours to help protect our programs e.g. to prevent invalid memory access or invalid operations.

  • Abstract data type: We treat ADTs as different because they describe a structure, and do not actually hold any concrete data. We have a template for an ADT (e.g. class) and concrete realizations (e.g. objects). These can be singular, or stored as Compound types as well e.g. an Array of Objects.

You’re accustomed to working with most of these types – we declare variables in a programming language using the types for that language. For example, in Kotlin, we can assign the value 4 to different variables, each representing a different type. Note that we’ve had to make some adjustments to how we represent the value so that they match the expected type, and avoid compiler errors:

val a:Int = 4
val b:Double = 4.0
val c:String = "4"

By using different types, we’ve made it clear to the compiler that a and b do not represent the same value.

>>> a==b
error: operator '==' cannot be applied to 'Int' and 'Double'

Types have properties and behaviours that are specific to that type. For example, we can determine the length of c, a String, but not the length of a, an Int. Similarly, we can perform mathematical operations on some types but not others.

>>> c.length
res13: kotlin.Int = 1

>>> a.length
error: unresolved reference: length
a.length
  ^

>>> b/4
res15: kotlin.Double = 1.0

>>> c/4
error: unresolved reference. None of the following candidates is applicable because of receiver type mismatch: 
public inline operator fun BigDecimal.div(other: BigDecimal): BigDecimal defined in kotlin
public inline operator fun BigInteger.div(other: BigInteger): BigInteger defined in kotlin
c/4
 ^

In order for our programs to work with data, it needs to be stored in one of these types: either a series of primitives or strings, or a collection of these types, or an instance of an ADT (i.e. an Object).

Info

Keep in mind that this is not the actual data representation, but the abstraction that we are using to describe our data. In memory, a string might be stored as a continuous number of bytes, or scattered through memory in chunks - for this discussion, that doesn’t matter. We’re relying on the programming language to hide away the underlying details.

Data can be simple, consisting of a single field (e.g. the user’s name), or complex, consisting of a number of related fields (e.g. a customer with a name, address, job title). Typically we group related data into classes, and class instances are used to represent specific instances of that class.

data class Customer(id:Int, name:String, city:String)
val new_client = Customer(1001, "Jane Bond", "Waterloo")

Data items can be singular (e.g. one particular customer), or part of a collection (e.g. all of my customers). A singular example would be a custom record.

val new_client1 = Customer(1001, "Jane Bond", "Waterloo")
val new_client2 = Customer(1002, "Bruce Willis", "Kitchener")

A bank might also keep track of transactions that a customer makes, where each transaction represents the deposit or withdrawal, the date when it occurred, the amount and so on. [ed. This date format is ISO 8601, a standard date/time representation.]

To represent this in memory, we might have a transaction data class, with individual transactions being stored in a collection.

data class Tx(id:Int, date:String, amount:Double, currency: String)
val transactions = mutableList()
transactions.add(Tx(1001, "2020-06-06T14:35:44", "78.22", "CDN"))
transactions.add(Tx(1001, "2020-06-06T14:38:18", "12.10", "USD"))
transactions.add(Tx(1002, "2020-06-06T14:42:51", "44.50", "CDN")) 

These structures represent how we are storing this data in this particular case. Note that there may be multiple ways of representing the same data, we’ve just chosen one that makes sense to use.

Data Models

A data model is a abstraction of how our data is represented and how data elements relate to one another. We will often have a canonical data model, which we might then use as the basis for different representations of that data.

There are different forms of data models, including:

  • Database model: describes how to structure data in a database (flat, hierarchical, relational).

  • Data structure diagrams (DSD): describes data as entities (boxes) and their relationships (arrows connecting them).

  • Entity-Relationship Model: Similar to DSDs, but with some notational differences. Don’t scale out very well.

Data structure diagrams aren’t commonly used to diagram a complete system, but can be used to show how entities relate to one another. They can range in complexity from very large and formals diagrams, to quick illustrative sketches that just show the relationships between different entities.

Here’s a data structure diagram, showing different entities. These would likely be converted to multiple classes, each one responsible for their own data.

Data Structure Diagram Data Structure Diagram

Info

We also use these terms when referring to complex data structures:

  • field: a particular piece of data, corresponding to variables in a program.
  • record: a collection of fields that together comprise a single instance of a class or object.

Data Representation

One of the challenges is determining how to store our data for different purposes.

Let’s consider our customer data. We can represent this abstractly as a Customer class. In code, we can have instances of Customers. How do we store that data, or transfer it to a different system?

The “easy” answer would be to share the objects directly, but that’s often not realistic. The target system would need to be able to work with that object format directly, and it’s likely not a very robust (or safe) way of transmitting your data.

Often we need to convert our data into different representations of the same data model.

  • We need to ensure that we don’t lose any data
  • We need to maintain the relationships between data elements.
  • We need to be able to “convert back” as needed.

In the diagram below, you can see this in action. We might want to save our customer data in a database, or into a data file for backup (or transmission to a different system).

Different data representations can be used by a single application Different data representations can be used by a single application

Part of the challenge in working with data is determining what you need to do with it, and what data representation will be appropriate for different situations. As we’ll see, it’s common to need to convert and manage data across these different representations.

Data Files

A file format is a standard way of encoding data in a file. There are a large number of predefined file formats that have been designed and maintained by different standards bodies. If you are working with data that matches a predefined format, then you should use the correct file format for that data! Examples of standard file formats include HTML, Scalable vector graphics (SVG), MPEG video files, JPEG images, PDF documents and so on.

If you are working with your own data (e.g. our Customer data), then you are free to define your own encoding.

A fundamental distinction is whether you want text or binary encoding:

  • Text encoding is storing data as a stream of characters in a standard encoding scheme (e.g. UTF8). Text files have the advantage of being human-readable, and can be easy to process and debug. However, for large amounts of data, they can also be slow to process.
  • Binary encoding allows you to store data in a completely open and arbitrary way. It won’t be human readable, but you can more easily store non-textual data in an efficient manner e.g. images from a drawing program.

Kotlin has libraries to support both. You can easily write a stream of characters to a text file, or push a stream of bytes to a binary file. We will explore how to do both in the next section.

A “rule of thumb” is that non-private data, especially if realtively small, should often be stored in a text file. The ability to read the file for debugging purposes is invaluable, as is the ability to edit the file in a standard text editor. This is why Preferences files, Log files and other similar data files are stored in human-readable formats.

Private data, or data that is difficult to process is probably better served in a binary format. It’s not human-readable, but that’s probably what you want if the data is sensitive. As we will discover, it can also be easier to process. We’ll discuss this below.

Character Encoding

Writing text into a file isn’t as simple as it sounds. Like other data, characters in memory are stored as binary i.e. numeric values in a range. To display them, we need some agreement on what number represents each character. This sound like a simple problem but it’s actually very tricky, due to the number of different characters that are in use around the world.

Our desire to be able to encode all characters in all language is balanced by our need to be efficient. i.e. we want to use the least number of bytes per character that we’re storing.

In the early days of computing, we used US-ASCII as a standard, which stored each character in 7 bits (range of 0-127). Although it was sufficient for “standard” English language typewriter symbols, this is rarely used anymore since it cannot handle languages other than English, nor can it handle very many extra symbols.

The Unicode standard is used for modern encoding. Every character known is represented in Unicode, and requires one or more bytes to store (i.e. it’s a multi-byte format). UTF-8 refers to unicode encoding for those characters which only require a single byte (the 8 refers to 8-bit encoding). UTF-16 and UTF-32 are used for characters that require 2 and 4 bytes respectively.

UTF-8 is considered standard encoding unless the data format required additional complexity.

Characters in text files are really just numerical values Characters in text files are really just numerical values

Structuring Text Files

We know that we will use UTF-8, but that only describes how the characters will be stored. We also need to determine how to structure our data in a way that reflects our data model. We’ll talk about three different data structure formats for managing text data, all of which will work with UTF-8.

Comma-separated values (CSV)

The simplest way to store records might be to use a CSV (comma-separated values) file.

We use this structure:

  • Each row corresponds to one record (i.e. one object)
  • The values in the row are the fields separated by commas.

For example, our transaction data file stored in a comma-delimited file would look like this:

1001, 2020-06-06T14:35:44, 78.22, CDN
1001, 2020-06-06T14:38:18, 12.10, USD
1002, 2020-06-06T14:42:51, 44.50, CDN

CSV is literally the simplest possible thing that we can do, and sometimes it’s good enough.

It has some advantages:

  • its extremely easy to work with, since you can write a class to read/write it in a few lines of code.
  • it’s human readable which makes testing/debugging much easier.
  • its fairly space efficient.

However, this comes with some pretty big disadvantages too:

  • It doesn’t work very well if your data contains a delimiter (e.g. a comma).
  • It assumes a fixed structure and doesn’t handle variable length records.
  • It doesn’t work very well with complex or multi-dimensional data. e.g. a Customer class.
// how do you store this as XML? the list is variable length
data class Customer(id:Int, name:String, transactions:List<Transactions>)

Data streams are used to provide support for reading and writing “primitives“ from streams. They are very commonly used.

  • File: FileInputStream, FileOutputStream
  • Data: DataInputStream, DataOutputStream
var file = FileOutputStream("hello.txt") 
var stream = DataOutputStream(file) 

stream.writeInt(100) 
stream.writeFloat(2.3f) 
stream.writeChar('c') 
stream.writeBoolean(true) 

We can use streams to write our transaction data to a file quite easily.

val filename = "transactions.txt"
val delimiter = "," // comma-delimited values 

// add a new record to the data file
fun append(txID:Int, amount:Float, curr:String = "CDN") { 
  val datetime = LocalDateTime.now() 
  File(filename).appendText("$txID $delimiter $datetime $delimiter $amount\n", Charsets.UTF_8)
}

Extensible Markup Language (XML)

XML is a human-readable markup language that designed for data storage and transmission.

Defined by the World Wide Web Consortium’s XML specification, it was the first major standard for markup languages. It’s structurally similar to HTML, with a focus on data transmission (vs. presentation).

  • Structure consists of pairs of tags that enclose data elements: <name>Jeff</name>
  • Attributes can extend an element: <img src="madonna.jpg"></img>

Example of a music collection structured in XML [ed. If you don’t know these albums, you should look them up. Steve Wonder is a musical genius. Dylan is, well, Dylan.]

An album is a record, and each album contains fields for title, artist etc.

<catalog>
	<album>
		<title>Empire Burlesque</title> 
		<artist>Bob Dylan</artist> 
		<country>USA</country> 
		<company>Columbia</company> 
		<price>10.90</price> 
		<year>1985</year> 
	</album>
	
  <album>
		<title>Innervisions</title> 
		<artist>Stevie Wonder</artist> 
		<country>US</country> 
		<company>The Record Plant</company> 
		<price>9.90</price> 
		<year>1973</year> 
  </album>
</catalog>

XML is useful, but doesn’t handle repeating structured particularly well. It’s also verbose when working with large amount of data. Finally, although it’s human-readable, it’s not particularly easy to read.

We’ll talk about processing XML in a moment. First, let’s talk about JSON.

JavaScript Object Notation (JSON)

JSON is an open standard file and data interchange format that’s commonly used on the web.

JSON consists of attribute:value pairs and array data types. It’s based on JavaScript object notation, but is language independent. It was standardized in 2013 as ECMA-404.

JSON has become extremely popular due to its simpler syntax compared to XML.

  • Data elements consist of name/value pairs
  • Fields are separated by commas
  • Curly braces hold objects
  • Square brackets hold arrays
Here’s the music collection in JSON. 

{ "catalog": { 
  "albums": [ 
     {
      "title":"Empire Burlesque", 
      "artist":"Bob Dylan", 
      "country":"USA", 
      "company":"Columbia", 
      "price":"10.90", 
      "year":"1988"
    	}, 
    { 
      "title":"Innervision",
      "artist":"Stevie Wonder",
      "country":"US",
      "company":"The Record Plant",
      "price":"9.90",
      "year":"1973"
    }
  ]
}} 

Advantages of JSON:

  1. Simplifying closing tags makes JSON easier to read.
{ "employees":[
	{ "first":"John", "last":"Zhang", "dept":"Sales"},
	{ "first":"Anna", "last":"Smith", "dept":"Engineering"} 
]} 

Compare this to the corresponding XML:

<employees>
	<employee><first>John</first> <last>Zhang</last> <dept>Sales</dept></employee>
	<employee><first>Anna</first> <last>Smith</last> <dept>Engineering</dept></employee>
</employees>
  1. JSON also handles arrays better.

Array in XML:

<name>Celia</name> 
<age>30</age> 
<cars> 
	<model>Ford</model> 
	<model>BMW</model> 
	<model>Fiat</model> 
</cars> 

Array in JSON:

{
 "name":"Celia",
 "age":30,
 "cars":[ "Ford", "BMW", "Fiat" ] 
} 

JSON is the data format for user preferences in VS Code JSON is the data format for user preferences in VS Code

So, our application data resides in data structures, in memory. How do we make use of JSON or XML?

  • To save: we convert objects into XML or JSON format, then save the raw XML or JSON in a data file.
  • To restore: we load XML or JSON from our data files, and instantiate objects (records) based on the file contents.

Data formats for file-based data Data formats for file-based data

Although you could write your own parser, there are a number of libraries out there than handle conversion to and from JSON quite easily.

XML parsing can be addressed by a number of parsers:

We’ll focus on using Kotlin’s serialization libraries to convert our objects to and from JSON directly!

Serialization

Info

Serialization is the process of converting your program to a binary stream, so that you can transmit it, or persist it to a file or database. Deserialization is the process of converting the stream back into an object.

This is really cool technology that we can use to save our objects directly without trying to save the indivual property values (like we would for a CSV file).

To include the serialization libraries in your project, add these dependencies to the build.gradle file.

plugins {
    id 'org.jetbrains.kotlin.multiplatform' version '1.6.10'
    id 'org.jetbrains.kotlin.plugin.serialization' version '1.6.10'
}

Also make sure to add the dependency:

dependencies {
    implementation "org.jetbrains.kotlinx:kotlinx-serialization-json:1.3.2"
}

You can then save your objects directly as a stream which you can save to a file, or a database. You can later use this stream to recreate your objects. See https://github.com/Kotlin/kotlinx.serialization for details.

Example from: https://blog.jetbrains.com/kotlin/2020/10/kotlinx-serialization-1-0-released/

@Serializable
data class Project(
   val name: String,
   val owner: Account,
   val group: String = "R&D"
)

@Serializable
data class Account(val userName: String)

val moonshot = Project("Moonshot", Account("Jane"))
val cleanup = Project("Cleanup", Account("Mike"), "Maintenance")

fun main() {
   val string = Json.encodeToString(listOf(moonshot, cleanup))
   println(string)
   // [{"name":"Moonshot","owner":{"userName":"Jane"}},{"name":"Cleanup","owner":
   //  {"userName":"Mike"},"group":"Maintenance"}]

   val projectCollection = Json.decodeFromString<List<Project>>(string)
   println(projectCollection)
   // [Project(name=Moonshot, owner=Account(userName=Jane), group=R&D), 
   // Project(name=Cleanup, owner=Account(userName=Mike), group=Maintenance)]
}

Note that to make this work, you have to annotate your class as @Serializable. It also needs to contain data that the compiler can convert to JSON.

Databases

A more scalable solution is to store data in a database: a formal system for organizing and managing data.

Databases range in size and complexity from simple file-based databases that run on your local computer, to large scalable databases that run on a server cluster. They are optimized for fetching text and numeric data, and performing set operations on this data.

They also have the advantage of scaling really well. They’re optimized not just for efficient storage and retrieval of large amounts of data, but also for concurrent access by hundreds or thousands of users simulaneously [ed. This is a huge topic and we’re not even scratching the surface. I’d strongly encourage you to take a more comprehensive database course e.g. CS 348.]

The same data can be stored in a relational database The same data can be stored in a relational database

A relational database is a particular database design that structures data as tables:

  • A table represents some logical entity e.g. Customer, Transactions. It consists of columns and rows i.e. like a grid.
  • A column is a field or data element in that table. e.g. a “Customer“ table might have “name“, “city“, “birthdate“ fields.
  • A row is a record, containing values for each field. e.g. ”Jeff Avery”, ”Waterloo”, ”June 23, 1985”. Each row in each table needs to be unique identified by a key. This can be a part of the data (e.g. timestamp) but is often a unique, generated numeric identifier (e.g. “customerid“, “transactionid“).

There are many other types of databases! We’ll use relational databases in this course b/c (a) they’re very common, and (b) they are suitable for any type of data/relations that you will encounter in this course.

Here’s our earlier example shown as a class, a set of records in a CSV file and as a set of relational database tables.

The same data in multiple representations The same data in multiple representations

Why is this approach valuable? Relational databases can quickly oprerate on these tables, and support operations on sets of records. For example:

  • Return a list of all purchases greater than $100.
  • Return a list of customers from ”Paris”.
  • Delete customers from ”Ottawa”.

Our example is pretty trivial, but imagine useful queries like:

  • ”Find all transactions between 2:00 and 2:30”, or
  • ”Find our which salesperson sold the greatest amount during last Saturday’s sale”.

What is SQL?

SQL (pronounced ”Ess-que-ell”) is a Domain-Specific Language (DSL) for describing your queries. Using SQL, you write statements describing the operation to perform and which tables to use, and the database performs the operations for you.

SQL is a standard, so SQL commands work across different databases [ed. SQL was adopted as a standard by ANSI in 1986 as SQL-86, and by ISO in 1987].

Using SQL, you can:

  • C reate new records
  • R etrieve sets of existing records
  • U pdate the fields in one or more records
  • D elete one or more records1

SQL has a particular syntax for managing sets of records:

<operation> FROM [table] [WHERE [condition]]
operations: SELECT, UPDATE, INSERT, DELETE, ...
conditions: WHERE [col] <operator> <value>

For example:

/* SELECT returns data from a table, or a set of tables. 
 * an asterix (*) means "all"
 */

SELECT * FROM customers 
SELECT * FROM Customers WHERE city = "Ottawa" 
SELECT name FROM Customers WHERE custid = 1001
/* UPDATE modifies one or more column values based on some criteria.*/

UPDATE Customer SET city = "Kitchener" WHERE cust_id = 1001 
UPDATE Customer SET city = "Kitchener" // uh oh, what is wrong with this?
/* INSERT adds new records to your database.  */

INSERT INTO Customer(cust_id, name, city) 
VALUES ("1005", "Sandra Avery", "Kitchener") 

INSERT INTO Customer(cust_id, name, city)
VALUES ("1005", "Austin Avery", "Kitchener") // uh oh, what have I done?

It’s common to have a record spread across multiple tables. A join describes how to relate data across tables.

Database Joins Database Joins

SELECT c.customer_id, c.first_name +   + c.last_name, t.date, p.name, p.cost 
	FROM Customer c, Transactions t, Products p
 	WHERE c.customer_id = t.customer_id
 	AND t.product_id = p.product_id

Returns data that spans multiple tables:

$ 1001, Jeff Avery, 12-Aug-2020, T-shirt, 29.95

Different types of joins that describe table and key relations Different types of joins that describe table and key relations

https://imgur.com/t/logic/v23nUwQ

SQLite

Relational databases are often large and complex systems. However, the relational approach can also be scaled-down.

SQLite (pronounced ESS-QUE-ELL-ITE) is a small-scale relational DBMS that supports SQL. It is small enough for local, standalone use.

SQLite is a C-language library that implements a small, fast, self-contained, high-reliability, full-featured, SQL database engine. SQLite is the most used database engine in the world. SQLite is built into all mobile phones and most computers…"

https://www.sqlite.org/index.html

You can install the SQLite database under Mac, Windows or Linux.

  1. Visit the SQLite Download Page.
  2. Dowload the binary for your platform.
  3. To test it, launch it from a shell.
$ sqlite3

SQLite version 3.28.0 2019-04-15 14:49:49
 Enter ".help" for usage hints.
 Connected to a transient in-memory database.
 Use ".open FILENAME" to reopen on a persistent database. 

sqlite> .exit

We will create a database from the command line. Optionally, you can install SQLite Studio, a GUI for managing databases.

You can download the SQLite Sample Database and confirm that SQLite is working. We can open and read from this database using the command-line tools:

$ sqlite3 
SQLite version 3.32.3 2020-06-18 14:16:19
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.

sqlite> .help
.auth ON|OFF             Show authorizer callbacks
.backup ?DB? FILE        Backup DB (default "main") to FILE
.bail on|off             Stop after hitting an error.  Default OFF
.binary on|off           Turn binary output on or off.  Default OFF
.cd DIRECTORY            Change the working directory to DIRECTORY
.changes on|off          Show number of rows changed by SQL
.check GLOB              Fail if output since .testcase does not match
.clone NEWDB             Clone data into NEWDB from the existing database
.databases               List names and files of attached databases
.dbconfig ?op? ?val?     List or change sqlite3_db_config() options
.dbinfo ?DB?             Show status information about the database

Commands

These commands are meta-commands that act on the database itself, and not the data that it contains (i.e. these are different from SQL and specific to SQLite). Some particularly useful commands:

Command Purpose
.open filename Open database filename.
.database Show all connected databases.
.log filename Write console to log filename.
.read filename Read input from filename.
.tables Show a list of tables in the open database.
.schema tablename SQL to create a particular tablename.
.fullschema SQL to create the entire database structure.
.quit Quit and close connections.

We often use these meta-commands to change settings, and find information that we’ll need to structure queries e.g. table names).

$ sqlite3
 SQLite version 3.28.0 2019-04-15 14:49:49
 Enter ".help" for usage hints.
 Connected to a transient in-memory database.
 Use ".open FILENAME" to reopen on a persistent database. 

sqlite> .open chinook.db // name of the file
sqlite> .mode column // lines up data in columns 
sqlite> .headers on // shows column names at the top 

// determine which tables to query
sqlite> .tables
albums   employees    invoices     playlists
artists   genres     media_types   tracks
customers  invoice_items  playlist_track

Queries

Once we’ve identified what we want to do, we can just execute our queries. Examples of selecting from a single table at a time:

  sqlite> SELECT * FROM albums WHERE albumid < 4;

  AlbumId     Title                     ArtistId
  ----------  ------------------------- ----------
  1           For Those About To Rock   1
  2           Restless and Wild         2
  3           Let There Be Rock         1     

  sqlite> SELECT * FROM artists WHERE ArtistId = 1;

  ArtistId    Name
  ----------  ----------
  1           AC/DC

We can also JOIN across two tables (based on a primary key, “ArtistId“’). You often will have multiple WHERE clauses to join between multiple tables.

  sqlite> SELECT albums.AlbumId, artists.Name, albums.Title
   FROM albums, artists
   WHERE albums.ArtistId = artists.ArtistId
   AND albums.AlbumId < 4;

  AlbumId     Name        Title
  ----------  ----------  ------------------------
  1           AC/DC       For Those About To Rock
  2           Accept      Restless and Wild
  3           AC/DC       Let There Be Rock

Accessing databases

How do we use a database from our applications? Kotlin doesn’t have built-in database support, but the framework includes a library for accessing external databases.

Specifically, Kotlin leverages the Java JDBC API (”Java DataBase Connectivity”). This provides a standard mechanism for connecting to databases, issuing queries, and managing results.

Creating a database project in IntelliJ:

  1. Create a Gradle/Kotlin project.
  2. Modify the build.gradle to include a dependency on JDBC.

implementation 'org.xerial:sqlite-jdbc:3.30.1'

  1. Use the Java SQL package classes to connect and fetch data.
import java.sql.Connection
import java.sql.DriverManager
import java.sql.SQLException

….

This example uses a sample database from the official SQLite tutorial.

var conn: Connection? = null 
fun connect() { 
  try {
    val url = "jdbc:sqlite:chinook.db" 
    conn = DriverManager.getConnection(url) 
    println("Connection established.") 
  } catch (e: SQLException) { 
    println(e.message) 
  } 
} 

fun query() { 
  try { 
    if (conn != null) { 
    // do something

JetBrains/Exposed

JDBC is a useful mechansism for connecting to remote databases, but making raw SQL calls is error-prone: there’s no type checking, or other safety mechanisms in-place. It also requires us to explicitly convert between string data and class objects that are holding our data.

There are a number of libraries that can leverage JDBC and provide a cleaner set of abstractions. We’ll discuss one of them: Exposed, which is a JetBrains Kotlin library for working with databases.

Dependencies

repositories {
  mavenCentral()
}
dependencies {
  implementation("org.jetbrains.exposed", "exposed-core", "0.40.1")
  implementation("org.jetbrains.exposed", "exposed-dao", "0.40.1")
  implementation("org.jetbrains.exposed", "exposed-jdbc", "0.40.1")
}

Getting Started

Every database access using Exposed is started by obtaining a connection and creating a transaction. We’ll use SQLite as an example. We can connect using the same database connection string we used in the previous section:

To get a connection:

Database.connect("jdbc:sqlite:chinook.db")

After obtaining a connection all SQL statements should be placed inside a transaction2:

transaction {
  // statements go here
}

Exposed comes in two flavors: DSL (Domain Specific Language) and DAO (Data Access Objects). DSL means a type-safe syntax that is similar to SQL; DAO means doing CRUD operations on entities like tables. They are both viable, but represent different types of abstraction for our database.

DSL Example

Here’s an example of a DSL query from the Exposed Wiki:

fun main(args: Array<String>) {
  Database.connect("jdbc:sqlite:cities.db")

  transaction {
    // print sql to std-out
    addLogger(StdOutSqlLogger)

    // create a table that reflects the Cities class structure
    SchemaUtils.create (Cities)

    // insert new city. SQL: INSERT INTO Cities (name) VALUES ('St. Petersburg')
    val stPeteId = Cities.insert {
      it[name] = "St. Petersburg"
    } get Cities.id

    // 'select *' SQL: SELECT Cities.id, Cities.name FROM Cities
    println("Cities: ${Cities.selectAll()}")
  }
}

object Cities: IntIdTable() {
    val name = varchar("name", 50)
}

DAO Example

Here’s the same example using Exposed DAO. This is similar to DSL, but the objects are more explicitly tied to the database structure.

fun main(args: Array<String>) {
  Database.connect("sqlite:cities.db")

  transaction {
    // print sql to std-out
    addLogger(StdOutSqlLogger)

    SchemaUtils.create (Cities)

    // insert new city. SQL: INSERT INTO Cities (name) VALUES ('St. Petersburg')
    val stPete = City.new {
            name = "St. Petersburg"
    }

    // 'select *' SQL: SELECT Cities.id, Cities.name FROM Cities
    println("Cities: ${City.all()}")
  }
}

object Cities: IntIdTable() {
    val name = varchar("name", 50)
}

class City(id: EntityID<Int>) : IntEntity(id) {
    companion object : IntEntityClass<City>(Cities)
    var name by Cities.name
}

For details on how to use the Exposed DAO API, see the DAO Example page.


  1. These are considered the set of basic functions for working with persistant storage. Yes, we use the acronym CRUD. You’ll occasionally trip over the term “CRUD applications” which just means an application whose primary purpose is to interact with an underlying database. An example would be something like a simple app for tracking a list of customers or sales data. ↩︎

  2. A transaction represent a scope of operations that should be done together, in the order presented. If any one of the operations in the transaction fails, none of the operations will be applied. ↩︎