CS 100, Module 01


Digitization (part two)

TRANSCRIPT

Now that we understand discrete data, and how digital information is really just discretized data that can be stored on a computer, let's consider another example.

We are going to revisit the discretization of your height, which is a continuous quantity. What if we want to plot your height from the moment of your birth to now, and we want to plot it on a computer?

The real, "true" plot of your height would be a perfectly smooth, continuous curve. Sure, you may have had a growth spurt when you were around thirteen, but even that growth spurt was smooth and continuous: you didn't instantaneously grow, despite what your grandmother may have said.

We are going to call this height curve a signal, which is a fancy term for a quantity that changes over time.

We want to discretize your height signal, but this discretization is going to be a bit more interesting because we need to discretize in two different ways (or dimensions).

First, we are going to have to discretize in time, which is also known as sampling. How often are we going to measure (or sample) your height? Once a year? Once a month? Once a second? In practice, you want to sample at regular intervals. How often you sample is known as the sampling frequency, and it depends on how often the signal changes. For your height, once every five years is definitely too infrequent and would not capture the data very well. Every year is pretty common, and your parents may have measured your height every year when you were little. To properly discretize your growth, every sixty days is probably pretty good.

Sampling frequencies are measured in hertz [Hz], which measures how many times something happens per second. If we were to sample your height once every second, that would be one hertz [1 Hz]. If we measured it ten times a second, it would be ten hertz [10 Hz]. We are sampling your height every sixty days, which is about one hundred and ninety-three nanohertz [193 nHz]. Later, we'll learn how computers can do billions of operations per second, and how computer speeds are measured in gigahertz [GHz].
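To see where the 193 nHz figure comes from, here is a minimal sketch of the arithmetic in Python. The function name `sampling_frequency_hz` is made up for this illustration; the only facts used are that a day has 86,400 seconds and that frequency is one divided by the time between samples.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in one day


def sampling_frequency_hz(interval_days: float) -> float:
    """Return the frequency in hertz for one sample every `interval_days` days.

    Hypothetical helper for illustration: frequency = 1 / (seconds between samples).
    """
    return 1.0 / (interval_days * SECONDS_PER_DAY)


freq_hz = sampling_frequency_hz(60)   # one sample every sixty days
freq_nhz = freq_hz * 1e9              # 1 Hz = 1,000,000,000 nHz
print(f"{freq_nhz:.0f} nHz")          # prints "193 nHz"
```

Sixty days is 5,184,000 seconds, so the frequency is 1/5,184,000 Hz, which rounds to 193 nHz.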

So by sampling your height every sixty days we have discretized the time (or the horizontal axis). We now need to discretize your height (or the vertical axis). Again, we have the question of what an appropriate amount of discretization will be. Measuring to the closest millimeter is probably good enough.
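The two discretization steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `true_height_mm` is a made-up smooth curve standing in for the real height signal (it is not real growth data), and `digitize` is a hypothetical helper that samples every sixty days and rounds to the closest millimeter.

```python
import math


def true_height_mm(age_days: float) -> float:
    """Made-up smooth height curve in millimeters (assumption, not real data)."""
    age_years = age_days / 365.25
    return 500.0 + 1250.0 / (1.0 + math.exp(-(age_years - 8.0) / 3.0))


def digitize(signal, duration_days: int, interval_days: int, step_mm: float):
    """Sample `signal` every `interval_days`, rounding each value to the nearest `step_mm`."""
    samples = []
    for day in range(0, duration_days + 1, interval_days):  # discretize time
        quantized = round(signal(day) / step_mm) * step_mm   # discretize height
        samples.append((day, quantized))
    return samples


# Eighteen years of height, sampled every 60 days, to the closest millimeter.
samples = digitize(true_height_mm, duration_days=18 * 365, interval_days=60, step_mm=1.0)
for day, height in samples[:5]:
    print(day, height)
```

Each entry in `samples` is one point on the discretized plot: a day number on the horizontal axis and a whole number of millimeters on the vertical axis.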

Note that if we zoom in, we can see that your plot has the "jerks" or "jumps" associated with discrete data. Remember that, because of discretization, those jumps will always exist. We used a good sampling rate and discretized your height to a pretty good precision, so it looks good. If we had used a lower precision, the jumps would be even more apparent. The magic of modern computers is that they can use really high precision to make things appear smooth, but you should never forget that those discrete jumps are there.
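One way to see how precision affects the jumps is to digitize the same curve at different precisions and compare. The sketch below assumes a made-up smooth curve (`smooth_height_mm`, not real growth data) and a hypothetical helper `jump_stats` that reports the biggest jump between neighboring samples and the worst rounding error for a given precision.

```python
import math


def smooth_height_mm(age_days: float) -> float:
    """Made-up smooth height curve in millimeters (assumption, not real data)."""
    age_years = age_days / 365.25
    return 500.0 + 1250.0 / (1.0 + math.exp(-(age_years - 8.0) / 3.0))


def jump_stats(step_mm: int, interval_days: int = 60, duration_days: int = 18 * 365):
    """Return (biggest jump, worst rounding error) in mm for a given precision."""
    days = list(range(0, duration_days + 1, interval_days))
    # Round every sample to the nearest multiple of step_mm.
    levels = [round(smooth_height_mm(d) / step_mm) * step_mm for d in days]
    biggest_jump = max(abs(b - a) for a, b in zip(levels, levels[1:]))
    worst_error = max(abs(q - smooth_height_mm(d)) for q, d in zip(levels, days))
    return biggest_jump, worst_error


for step_mm in (1, 10, 100):  # 1 mm, 1 cm, and 10 cm precision
    jump, error = jump_stats(step_mm)
    print(f"precision {step_mm:>3} mm: biggest jump {jump} mm, worst error {error:.2f} mm")
```

The rounding error can never be more than half of the precision step, so a coarser precision throws away more information, and with this curve the 10 cm precision also produces larger jumps than the 1 mm precision.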

If, throughout this entire video, you've been wondering how we're going to measure your height signal in the past, I forgot to mention that to do this we are going to need a time machine. Forgive me, I wanted to focus on the digitization, not on the practicalities. However, it does raise an issue that some people struggle with, and that is the loss of information that occurs every day. We generate so much digital content that we sometimes forget that digitization is not perfect. The "high precision" we use today may not be enough for historians of the future. It's unlikely that they would be concerned that your selfies are throwing away information, but it's hard to know for sure. All we can do is be mindful that we are digitizing, and use high levels of precision.