CS 448/648 (Winter 2012)
Database Systems Implementation
Contents
Course Information
Note: There will be no CS448/648 lecture on Thursday Feb 16th, the
date of the midterm exam. On that day (only), instructor
office hours will run from 1:00-4:00 in DC 3512.
Meeting Time and Place: |
TTh 1:00-2:20 in PHY 313 |
Instructor: |
Ken
Salem, DC3512, userid: kmsalem |
Office Hours: |
Mon 1:00-2:00, Tue 4:00-5:00, Thu 3:00-4:00 |
|
April office hours: Apr 5th 2:30-4:00, Apr 9th
1:00-2:00, Apr 10th 3:00-5:00 |
TAs: |
Changjiu Jin, userid: c4jin |
|
Shahab Kamali, userid: skamali |
|
Xin Liu, userid: x39liu |
TA Office Hours: |
Shahab will hold office hours Wed 11:30-1:30 in DC3306 |
Discussion Board: |
We will be using Piazza for on-line
discussion and announcements. We will be
using Waterloo's LEARN
system only for posting exam and assignment marks.
Course personnel will not be monitoring the uw.cs.cs448
newsgroup this term. |
E-mail to course personnel can be directed to userid at cs dot
uwaterloo dot ca. The userids are listed above.
Course Description and Objectives
This is a second course on databases that focuses on DBMS internals. It is a project-oriented course that will provide the students, upon successful completion, with an appreciation of the intricacies and complexities of a DBMS and enable them to be able to design and implement the major components of it. The course objective will be achieved by focusing on three fundamental sub-objectives:
- To understand the fundamentals of storage systems and disk-based data structures;
- To understand the process of query processing and optimization; and
- To learn the implementation of transactions.
Complementary to the above objectives, the course has a training
component where the students will gain experience, within the context
of a number of assignments, in building components of a DBMS and
incorporating them into an open source system such as MySQL or
PostgreSQL. The lectures may be complemented by guest lectures on
real-life DBMS implementation issues given by colleagues from
industry. Further details can be found at
http://www.cs.uwaterloo.ca/current/courses/course_descriptions/cDescr/CS448.
Workload and Evaluation
The course marking scheme for CS448 is:
- assignments: 40% (5% A1, 20% A2, 15% A3)
- midterm exam: 20%
- final exam: 40%
In addition, you must pass the exams in order to pass the course.
That is, having a weighted exam average above 50% is a necessary
(but not sufficient) condition for passing.
Graduate students enrolled in CS648 have different requirements,
and will use a different marking scheme.
See the CS648 section for more information.
Assignment and exam marks will be posted on
UW's LEARN on-line course
system. Log in to LEARN using your WatIAM userId
and password and select CS448/648 to view your marks once they
have been posted.
Mark posting is the only thing we will be using LEARN
for this term. All course materials can be found on this web
page, and Piazza will be used for discussion and announcements.
Optional Alternative Workload for CS448 Students
CS448 students who wish to participate in
the SIGMOD
2012 Programming Contest have the option of doing so as a CS448
course project. CS448 students who choose this alternative will
be evaluated using the CS648 workload and marking scheme, rather than
the standard CS448 scheme. That is, their programming contest project
will replace CS448 Assignment 3, and they will
use the same project, exam and assignment marking
weights as CS648 students.
Details about the requirements for the SIGMOD programming contest as
a course project can be found on the CS648 project page.
CS448 students who wish to use this alternative must elect to do so
no later than January 24th. This election is not revocable after
January 24th.
CS448 students who are interested in this alternative should elect
to use it by sending e-mail to the instructor by the January 24th
deadline. The e-mail should include name, ID number, and UW e-mail
address. If you expect to work on the project with a group, please
also include the names of the other group members.
Schedule and Lecture Notes
The table below shows planned lecture topics, along with
the corresponding textbook chapters and links to the lecture notes (as
they become available). The course textbook is Database Management
Systems. R. Ramakrishnan and J. Gehrke. McGraw-Hill. 3rd Edition.
The textbook is optional. It will be available through the
Davis Centre Library course reserves.
Lecture notes are available in PDF format, one slide per page.
Using Adobe Acrobat or a similar program, you should be able to
print these with multiple slides per page to save paper and cost.
Printing two slides per page, double-sided is a good option - the
slides are still easy to read, even with substantial margins
(for note taking) left around the slides.
Start |
End |
Topic |
Chapters |
Lecture Notes |
Jan 03 |
Jan 03 |
Intro |
n/a |
PDF |
Jan 03 |
Jan 05 |
DBMS Architecture |
n/a |
PDF |
Jan 05 |
Jan 19 |
Data Storage |
Chapter 9 |
PDF |
Jan 05 |
Jan 19 |
Access Methods |
Chapters 8,10,11 |
PDF |
Jan 24 |
Feb 2 |
Query Processing |
Chapters 12,13,14 |
PDF |
|
Feb 2 |
Mar 6 |
Query Optimization |
Chapter 15 |
PDF |
|
Feb 9 |
Feb 9 |
Spatial Indexing (guest lecture) |
Chapter 28 |
PDF |
|
Mar 6 |
|
Concurrency and Failures |
Chapters 16,17,18 |
PDF |
|
Mar 27 |
|
Distributed Data Management |
Chapter 22 |
PDF |
|
Assignments
- Resources
-
- Assignment 0: PostgreSQL setup
-
- Before you can start on Assignment 1, you will need to
configure, build and install copy of PostgreSQL. You will also need to
familiarize yourself with how to use PostgreSQL, and with the
structure of the source code. Assignment 0 will guide you through
these steps
- This assignment is not marked and there is nothing to
submit.
- There is no due date for this assignment, but you will need
to complete it before you can start on Assignment 1.
- Assignment 1: Prefix Key Compression
-
- Due: 12:00 (noon), Tuesday 24 January. Late
assignments will not be accepted.
- assignment specification
- The sample data file words.txt and the
corresponding SQL command file (words.sql) can be
found in the directory /u/cs448/public in
the student.cs computing environment.
- a second A1 test case is now available. The test case
consists of two files, which you will find
in /u/cs448/public/.
- httplog.csv: data file, in CSV
format
- httplog.sql: command file that
will create a table and an index, and then loads the table
using the httplog.csv file.
This table contains data from a web server log.
The sql file defines an index on the request
field, which contains the actual text of an http request handled
by the web server. The request field contains
duplicates, and the values are generally longer than those in
the words.txt file.
- a third A1 test case is now available. The test case
consists of two files, which you will find
in /u/cs448/public/.
- nycbaby.csv: data file, in CSV
format
- nycbaby.sql: command file that
will create a table and an index, and then loads the table
using the nycbaby.csv file.
The data in this table give the frequency of different baby names
in New York City from 1920 to the present. For each year, it
identifies the names given to babies and number of times each
name was given in that year. There are duplicates in the name
field (because most names are repeated once for each year), but
the values are shorter than those in the httplog data.
(Note that you will find the names FEMALE
and MALE in the name column of this table.
These values were used to represent those babies that were not
assigned names at birth.)
- This is an individual assignment, not a group
assignment.
- This assignment is to be submitted electronically using
the submit command. Details can be found in the
assignment specification. Marked assignments will be returned
electronically.
- Assignment 2: Symmetric Hash Join
-
- Due: 12:00 (noon), Tuesday 6 March. Late
assignments will not be accepted.
- assignment specification
- A2 may be done individually or in groups of two. If you
plan to work in a group, you should register your group no
later than Wednesday, February 1st. Shahab has put up
a simple web site that you should use to register
your group. (Please contact Shahab if you have problems with
group registation.) We will create a Linux group on
the student.cs Linux servers for each registered group.
This can be useful if group members want to share access to a
single copy of the source code.
- General discussion of the assignment (e.g.,
on Piazza)
is encouraged. However, the work that you submit (or your
group submits) must be yours. Copying or sharing code
among groups is a violation of academic
integrity rules, and is not permitted. We will use copy detection
tools to cross-check submissions.
- This assignment is to be submitted electronically using
the submit command. Details can be found in the
assignment specification. Marked assignments will be returned
in class.
- A simple test case, consisting of three files, can be found
in /u/cs448/public in the student.cs
environment. The files are:
Copy or use these files directly from /u/cs448/public
or download them using the links above.
- A2 requires specific debugging output from the PostgreSQL
server, including summary information and trace information.
The required formatting for this debugging output is shown
in this example. Please note
that:
- the debugging output is produced only when the server is
started with the -d 1 command line option, and
- each line of required debugging output starts with the
string CS448, and
- the debugging output includes all of the information
required by A2 and no extraneous informtion, and
- indenting is used to make the output easier to read.
- some test data and queries used in marking A2:
- If there is a problem with the marking of your Assignment 2, you may
request that it be reappraised. To do this, please
print and complete this reappraisal
form, attach it to your A2 marking summary sheet, and return both to the
instructor. Alternatively, you may submit your A2 for
re-appraisal by bringing the
marking summary sheet to an office hour.
The last day for submitting re-appraisal requests for A2
is Friday March 30th.
- Assignment 3: Query Optimization
-
- Due: 12:00 (noon), Monday 2 April. Late
assignments will not be accepted.
- assignment specification
- A3 may be done individually or in groups of two. We will
assume that you will be working in the same groups as you did for
Assignment 2 unless we hear otherwise from you. Please contact
Shahab by March 13th if this is not the case.
- This assignment is to be submitted electronically using
the submit command. Details can be found in the
assignment specification. Marked assignments will be
available for pickup from the instructor's office.
Exams
- Midterm Exam
-
- Exam Date: Thursday February 16, 19:00-21:00
- Exam Location: MC4061
- pens and brains only: no books, no notes, no devices
- if you have a conflict with the scheduled midterm time, please
notify the instructor (kmsalem) of your conflict by e-mail
by Tuesday, February 1st. In your message, please
include the following information: reason for the conflict (e.g.,
another course), time of the conflicting course, your name and
student ID number.
- Marked midterms will be returned in class.
- Midterm Exam Sample Solution
- If there is a problem with the marking of your exam, you may
request that your exam be reappraised. To do this, please
print and complete this reappraisal
form, attach it to your exam, and return both to the
instructor. The deadline for submitting midterm reappraisal
requests is Thursday March 8th.
- Final Exam
-
- Exam Date: Wednesday 11 April, 09:00-11:30
- Exam Location: RCH 112
- pens and brains only: no books, no notes, no devices
- Final
exam information from the registrar.
- Study Materials
-
- Textbook exercises can be a useful way to ensure that you
understand the course material. Here are some suggested exercises
from the course textbook:
- Chapter 8: 8.3,8.4,8.10,8.11
- Chapter 9: 9.7,9.8,9.12,9.22
- Chapter 10: 10.1,10.2,10.4,10.10
- Chapter 11: 11.3,11.7
- Chapter 12: 12.2,12.4
- Chapter 13: 13.1
- Chapter 14: 14.2,14.3,14.4,14.5
- Chapter 15: 15.2,15.4,15,7,15.8,15.9
- Chapter 16: 16.1,16.2,16.3,16.5
- Chapter 17: 17.2 (conflict serializability only),17.8
- Chapter 18: 18.3,18.4,18.5
- Chapter 22: 22.14
- The textbook's authors have published a
solution
manual that includes solutions for odd-numbered exercises up
through and including Chapter 21.
- First S07 midterm exam
and model solution.
- Second S07 midterm exam
and model solution.
- W11 midterm exam
and model solution.
- W12 midterm exam
and model solution.
CS648 Information
Students enrolled in CS648 are responsible only for the first two
assignments (A1 and A2). Instead of A3, CS648 students must do a
final project.
Information about the CS648 final project is available on the
CS648 project information page.
CS648 will use the following marking scheme, which differs from the
CS448 marking scheme because of the project:
- assignments: 20% (4% A1, 16% A2 )
- midterm exam: 10%
- final exam: 40%
- course project: 30%
Like CS448 students, CS648 students must pass the exams to pass the
course. That is, having a weighted exam average above 50% is a necessary
(but not sufficient) condition for passing.
Academic Integrity
In order to maintain a culture of academic
integrity, members of the University of Waterloo community are
expected to promote honesty, trust, fairness, respect and
responsibility. All members of the UW community are expected to hold
to the highest standard of academic integrity in their studies,
teaching, and research. The Office of Academic Integrity's website
(www.uwaterloo.ca/academicintegrity)
contains detailed information on
UW policy for students and faculty. This site explains why academic
integrity is important and how students can avoid academic
misconduct. It also identifies resources available on campus for
students and faculty to help achieve academic integrity in - and out -
of the classroom.
Grievance: A student who believes that a decision affecting some
aspect of his/her university life has been unfair or unreasonable may
have grounds for initiating a grievance. Read Policy 70 - Student
Petitions and Grievances, Section 4,
http://www.adm.uwaterloo.ca/infosec/Policies/policy70.htm
Discipline: A student is expected to know what constitutes academic
integrity, to avoid committing academic offenses, and to take
responsibility for his/her actions. A student who is unsure whether an
action constitutes an offense, or who needs help in learning how to
avoid offenses (e.g., plagiarism, cheating) or about "rules" for
group work/collaboration should seek guidance from the course
professor, academic advisor, or the Undergraduate Associate Dean. When
misconduct has been found to have occurred, disciplinary penalties
will be imposed under Policy 71 - Student Discipline. For information
on categories of offenses and types of penalties, students should
refer to Policy 71 - Student Discipline,
http://www.adm.uwaterloo.ca/infosec/Policies/policy71.htm
Avoiding Academic Offenses: Most students are unaware of the line
between acceptable and unacceptable academic behaviour, especially
when discussing assignments with classmates and using the work of
other students. For information on commonly misunderstood academic
offenses and how to avoid them, students should refer to the Faculty
of Mathematics Cheating and Student Academic Discipline Policy,
http://www.math.uwaterloo.ca/navigation/Current/cheating_policy.shtml
Appeals: A student may appeal the finding and/or penalty in a decision
made under Policy 70 - Student Petitions and Grievances (other than
regarding a petition) or Policy 71 - Student Discipline if a ground
for an appeal can be established. Read Policy 72 - Student Appeals,
http://www.adm.uwaterloo.ca/infosec/Policies/policy72.htm