Note that there are separate sets of assignments for CS 451/651 and CS
431/631. Make sure you work on the correct assignments!
CS 431/631 Assignments
CS 431 Final Project
This final project is only for undergraduate students taking CS 431.
The topic of the final project is the analysis of datasets related to the cryptocurrency market. Within that space, the project can be on anything you wish in the area of big data. Anything reasonably related to topics that are covered in the course is within scope. For reference, there are two types of projects you might consider:
- Learn additional capabilities (e.g., visualization) of Python and Jupyter, and use them to build an interactive notebook for visualizing or exploring a cryptocurrency market dataset of your choice. Your interactive notebook should interact with Spark, so that it can support exploration of datasets that are too large to fit in the memory of a single machine.
- Perform some interesting data science. Is there a particular cryptocurrency market dataset you'd like to explore or analyze? Your project could involve performing interesting analytics on a dataset; here, the focus would be the analytical product and the insights gleaned, as opposed to the raw algorithms themselves.
The use of Apache Spark should be justified in your project. For example, if you analyze only 1 MB of data, plain Python is likely the better tool. It is okay to analyze a smaller dataset if (1) the dataset can potentially be considered big data (for example, using 20 MB of Twitter data makes sense because the full dataset can be much bigger), and (2) your Spark solution is scalable, so that even if you test it on smaller datasets, it can handle much bigger ones. If you do not follow this rule, you cannot get more than 50% of the project mark.
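As a rough illustration of what "scalable" can mean here, the sketch below computes a mean price per symbol from per-partition partial results that are then merged. It is plain Python, not Spark, and the data, the `partitions` layout, and all function names are hypothetical. The point is only that each step keeps a small `(sum, count)` pair per key instead of the full data, which is the same property that lets an aggregation such as PySpark's `reduceByKey` or `groupBy().agg()` run over data that does not fit on one machine.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical records: (symbol, price) pairs, already split into
# "partitions" the way Spark would split a large input across workers.
partitions = [
    [("BTC", 100.0), ("ETH", 10.0)],
    [("BTC", 300.0)],
    [("ETH", 30.0), ("BTC", 200.0)],
]


def partial_aggregate(records):
    """Per-partition step: keep only a running (sum, count) per key."""
    acc = defaultdict(lambda: (0.0, 0))
    for key, price in records:
        total, count = acc[key]
        acc[key] = (total + price, count + 1)
    return dict(acc)


def merge(left, right):
    """Combine two partial results; the operation is associative."""
    merged = dict(left)
    for key, (total, count) in right.items():
        prev_total, prev_count = merged.get(key, (0.0, 0))
        merged[key] = (prev_total + total, prev_count + count)
    return merged


def mean_price_by_symbol(parts):
    """Aggregate each partition, merge the partials, then finalize."""
    combined = reduce(merge, map(partial_aggregate, parts), {})
    return {key: total / count for key, (total, count) in combined.items()}


result = mean_price_by_symbol(partitions)
print(result)
```

A solution structured this way gives the same answer on a 20 MB sample as on the full dataset, because no step ever needs all records in one place.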
Group size
You may work in groups of up to three, or you can also work by yourself if you wish. It is strongly recommended to work in a group. The amount of effort devoted to the project should be proportional to the number of people in the team. As a guideline, the level of effort should be comparable to two to three assignments per person.
Deliverables
The deliverable for the final project is a short video of up to 5 minutes (Note: longer videos will not be accepted, even at 5:01!). Your final project will be evaluated according to the following criteria, with roughly equal weight placed on each one.
- Scope/Relevance: Is the objective clear? Is
the project cryptocurrency-related, course-related, and substantial enough?
- Methodology: Is the methodology appropriate
and clearly described?
- Evaluation: Did you evaluate your work? Did
you achieve your objective? If not, did you explain why not?
- Presentation: Is your video report well organized
and clear?
Your report should clearly indicate where you obtained any data that
you used in your project. Include a link to the data if possible.
Submission
Please submit your project here. Each group must submit only once.
The (hard) deadline for submission of your project report is 4 pm on Tuesday December 14, 2021.
CS 631 Final Project
This final project is only for graduate students taking CS 631.
The topic of the final project can be on anything you wish in the
space of big data. Anything reasonably related to topics that are
covered in the course is within scope. For reference, there are four
types of projects you might consider:
- Learn additional capabilities (e.g., visualization) of Python
and Jupyter, and use them to build an interactive notebook for visualizing
or exploring a dataset of your choosing. Your interactive
notebook should interact with Spark, so that it will be capable
of supporting exploration of data sets that are too large to fit
in the memory of a single machine.
- Implement a big data algorithm in Spark: choose a
particular big data algorithm (for processing text, graphs,
relational data, etc.) and implement it. Ideally, the implementation
does not already exist in a library or open-source package. We
want you to implement the algorithm from scratch, so resist the
temptation to simply copy existing code; see the notes on academic
integrity.
- Learn and explore a (new) big data processing framework:
although we discussed a variety of processing frameworks in class,
the assignments focused on Spark. Here's your chance
to learn a new processing framework, e.g., Spark Streaming, GraphX,
Giraph, Flink, etc. The project would involve learning to use the
processing framework and doing something interesting with it. The
"something interesting" might be a data mining algorithm, although
the expectations would be lower than building something in
Spark, since learning the new framework would form an
essential component of the project.
- Perform some interesting data science. Is there a particular
dataset you'd like to explore or analyze? Your project could involve
performing interesting analytics on a dataset—here, the focus
would be the analytical product and the insights gleaned, as opposed
to the raw algorithms themselves. However, a superficial analysis
with existing machine-learning libraries is not enough.
You may work in groups of up to three, or you can also work by
yourself if you wish. The amount of effort devoted to the project
should be proportional to the number of people in the team. As a
guideline, the level of effort should be comparable to
two assignments per person.
Deliverables
The deliverables for the final project are a report and a video.
- Report:
Use the ACM
Templates. The contents of the report will vary depending
on the type of project you are doing. However, it should certainly
describe the goal of your project (what is your learning objective,
or what problem are you trying to solve), your methodology, and some
kind of evaluation of your results or progress.
There are no hard limits on the length of your final report, but you
should target something in the range of 4-6 pages.
- Video:
In addition to your final report, you are required to deliver a short video
of up to 5 minutes (Note: longer videos will not be accepted, even at 5:01!).
This video must explain (1) the idea, (2) the methodology used, (3) the implementation, and (4) the results.
This is a great opportunity to show your results, especially if you have interactive components in your project.
Evaluation
Your final project will be evaluated according to the following
criteria, with roughly equal weight placed on each one.
- Scope/Relevance: Is the objective clear? Is
the project course-related and substantial enough?
- Methodology: Is the methodology appropriate
and clearly described?
- Evaluation: Did you evaluate your work? Did
you achieve your objective? If not, did you explain why not?
- Presentation: Is your report well organized
and clearly written?
Your report should clearly indicate where you obtained any data that
you used in your project. Include a link to the data if possible.
Submission
Please submit your project here. Each group must submit only once.
The (hard) deadline for submission of your project report is 4 pm on Tuesday December 14, 2021.