Introduction to Data Science

An introduction to data science methods and tools, covering the full pipeline from data acquisition to analysis and visualization.

Term: Spring

CS 1656 / CS 2056: Introduction to Data Science provides an overview of data science technologies and techniques, offering a holistic view of the field, from data management and manipulation to data analysis and presentation.

The course covers the main data management and querying paradigms (Relational/SQL, Graph/Cypher, RDF/SPARQL) along with information retrieval, recommender systems, data warehousing, data mining, classification, data visualization, and other data analysis topics.

The course uses Python as its default programming language and leverages existing libraries as appropriate. No prior Python experience is assumed, but a strong programming background (e.g., in Java, from the prerequisite chain leading to CS 1501) is expected.

Topics include:

  • Data Mining
  • Clustering
  • Information Retrieval / PageRank
  • Recommender Systems
  • Classification / Decision Trees
  • Relational Databases / SQL
  • Graph Databases / Cypher
  • Data Warehousing
  • Data Visualization
  • Python Data Science Libraries

For current course materials, schedules, and assignments, please refer to Canvas.