OA3802 Computational Methods for Data Analytics

(Same as CS3802.) This course introduces tools that analysts use to acquire, store, access, clean, and merge data in order to produce a dataset ready for analysis. Topics include binary data, popular text formats, the Bash command interpreter, relational and NoSQL databases, web scraping methods, parallel processing, and geographic data. Students will be introduced to high-performance computing facilities, such as the NPS Hamming and Grace clusters.

Prerequisite

OA2801 or consent of instructor

Lecture Hours

4

Lab Hours

0

Course Learning Outcomes

Upon successful completion of this course, students will achieve the following learning outcomes:

·      Students will know how to identify and correct bottlenecks in their Python code. They will be able to use a Linux environment over an SSH connection for remote computation.

·      Students will be able to wrap fast Fortran and/or C code in Python.

·      Students will be able to identify the difference between multithreading and multiprocessing parallelism and write Python programs that use each.

·      Students will be able to identify the benefits and weaknesses of using cloud computing for various computational needs, and will have practice using a cloud machine for a computational task.

·      Students will be able to clone a Git repository, push commits to it, and pull commits from it. They will be able to develop Python packages for easy distribution of research code.

·      Students will be able to interact with relational databases using the Structured Query Language (SQL).

·      Students will be able to write Bash scripts for automating tasks on remote Linux computers.