About ISA5810, Fall 2021

Data mining is a subject involving algorithms for seeking unexpected pearls of wisdom. These algorithms are drawn from various domains, including machine learning, artificial intelligence, pattern recognition, statistics, and database systems. This course covers the concepts, background information, techniques, and applications associated with data mining.

The covered topics include:

  • Association Rules
  • Clustering
  • Classification
  • Social Network Mining
  • Text Mining
  • Data Mining Applications

Textbook

    Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Introduction to Data Mining, Addison Wesley

Time in 2021

  • Tuesday 9:00AM-10:20AM
  • Tuesday 10:30AM-11:40AM

Location

  • Delta 106
  • Teams

People

Supporting this course

Instructor:

Yi-Shin Chen

She has taught the fundamental and advanced database courses for more than a decade. Her current research interests are social networks, data mining, emotion analysis, and web intelligence.

  • email: yishin@gmail.com
  • phone: +886-3-573-1211
  • office: Delta 607
  • office hours: Mondays 15:00-16:00

Teaching Assistants:

Fernando Calderon

  • email: fhcalderon87@gmail.com
  • office: Delta 701
  • office hours: By email appointment

Khemmanat (Moo) Seehawallop

  • email: k.seehawallop@gmail.com
  • office: Delta 701
  • office hours: By email appointment

Syllabus

Orientation

9/14 for 3 hours

You will get familiar with the course, the instructor, your classmates, and the learning environment. The overview of the course will also be covered during the same session.

Activities

Interesting Videos

Overview and Data

10/5, 10/12 for 6 hours

Understanding and preprocessing the data is the most important part of the whole data mining process. In this session, the types of attributes and the characteristics of data sets will be introduced first. Subsequently, several data preprocessing techniques are covered, followed by various similarity and distance measures. Finally, data visualization concepts will be addressed.
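
As a small illustration of the similarity and distance measures mentioned above, here is a minimal sketch of Euclidean distance and cosine similarity in plain Python. The vectors `p` and `q` are hypothetical toy records, not course data, and the lectures may cover other measures as well.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy records: q is p scaled by 2.
p = [1.0, 2.0, 3.0]
q = [2.0, 4.0, 6.0]

dist = euclidean_distance(p, q)  # the points are far apart in space
sim = cosine_similarity(p, q)    # yet they point in the same direction
```

The two measures can disagree: `p` and `q` are a nonzero distance apart, while their cosine similarity is at its maximum because one is a scaled copy of the other.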

Related Videos

Activities

  • Classes are offered in Teams
  • Team up with at most 5 students
  • Proposal for the final project

Lab for Data Exploration and Management using NumPy, Pandas, TensorFlow, and Plotly

10/19 for 3 hours

In this lab session we will focus on the use of scientific computing libraries to efficiently process, transform, and manage data. Furthermore, we will provide best practices and introduce visualization tools for effectively conducting big data analysis.

Activities

  • Classes are offered in Teams
  • Assignment One should be submitted before Nov 4

Classification

10/26, 11/2 for 6 hours

Classification, a family of supervised learning techniques, is widely used in data mining and machine learning. The objective of classification is to predict a class label from input data. In this session, various fundamental algorithms, as well as core evaluation measures, will be covered.
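
To make the idea of predicting a class from input data concrete, here is a minimal sketch of a k-nearest-neighbor classifier together with accuracy as one evaluation measure. The function `knn_predict` and the toy points are assumptions made up for illustration; the session itself may focus on other algorithms.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (features, label) pairs.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled training data with two well-separated classes.
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]
# Held-out examples used only for evaluation.
test_set = [((1.1, 1.0), "A"), ((5.1, 5.0), "B")]

predictions = [knn_predict(train, features) for features, _ in test_set]
correct = sum(pred == label for pred, (_, label) in zip(predictions, test_set))
accuracy = correct / len(test_set)
```

Accuracy is only one possible measure; on imbalanced data, measures such as precision and recall are usually more informative.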

Related Videos

Activities

  • Classes are offered in Teams

Text Mining

11/9, 11/16 for 6 hours

Text mining is the process of extracting valuable information from unstructured text. It usually involves natural language processing (NLP) steps, including lexical analysis, syntactic analysis, and inference. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, sentiment analysis, and entity relation modeling. We will cover some of these concepts in this session.
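
As a rough illustration of the lexical analysis step, here is a minimal sketch that tokenizes text and counts term frequencies per document (a bag-of-words view). The regular expression and the two sample sentences are assumptions for illustration only; real pipelines typically add stop-word removal, stemming, and more careful tokenization.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and extract runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def term_frequencies(docs):
    """Return one Counter of token counts per document."""
    return [Counter(tokenize(doc)) for doc in docs]

# Hypothetical sample documents.
docs = [
    "Data mining finds patterns in data.",
    "Text mining extracts information from text.",
]

tf = term_frequencies(docs)
# Tokens that occur in every document, e.g. candidates for down-weighting.
shared = set(tf[0]) & set(tf[1])
```

These per-document counts are the usual starting point for tasks such as text categorization and text clustering.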

Related Videos

Activities

  • Classes are offered in Teams

Student Presentation & Discussion

11/23 for 3 hours

Time for student presentations and discussion based on one specified paper.

Activities

  • Classes are offered in Delta 102
  • Team up with classmates.
  • Each group prepares no more than 3+1 slides.
  • Each group also prepares a good quiz question for classmates.

Lab for Deep Information Retrieval and Neural Word Embeddings

11/30 for 3 hours

In this lab session we will focus on the use of information retrieval techniques for modeling, training, and classifying textual data. Specifically, neural embedding models, such as word2vec, doc2vec, and FastText, will be introduced. Additionally, traditional approaches used for text classification, such as KNN, SVM, and Naive Bayes, will also be covered.

Activities

  • Classes are offered in Teams
  • Assignment Two should be submitted before Nov 26

Cluster Analysis

12/07, 12/14 for 6 hours

Cluster analysis is the task of grouping objects so that objects in the same cluster are more similar to each other than to those in other clusters. Many clustering techniques were initially developed in the pattern recognition and signal processing domains, but are now applied in many other domains, such as data mining, machine learning, information retrieval, data compression, and computer graphics. In this session, we will introduce and discuss various clustering techniques.
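
As one example of a clustering technique, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. The points and the fixed initial centroids are hypothetical; practical implementations choose initial centroids more carefully and stop when the assignments no longer change.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical 2-D points forming two visible groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
initial = [(1.0, 1.0), (8.0, 8.0)]  # fixed start keeps the run deterministic

centroids, clusters = kmeans(points, initial)
```

Each returned centroid is the mean of its cluster, so points within a cluster are closer to their own centroid than to the other one.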

Related Videos

Activities

  • Classes are offered in Teams

Association Rules

12/21, 12/28 for 6 hours

Association rule learning discovers interesting relations between variables in large data sets. By using interestingness measures such as support and confidence, it identifies the strong rules in a data set. It is widely used for product recommendations. In this session, several basic algorithms will be introduced and discussed.
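
Two common measures for association rules are support (the fraction of transactions that contain an itemset) and confidence (how often the consequent appears in transactions that contain the antecedent). Here is a minimal sketch of both; the market-basket transactions are toy data made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent): support of both over support of antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical toy transactions, each a set of purchased items.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

# Rule under test: {milk, diapers} -> {beer}
rule_support = support(transactions, {"milk", "diapers", "beer"})
rule_confidence = confidence(transactions, {"milk", "diapers"}, {"beer"})
```

Here the itemset {milk, diapers, beer} appears in 2 of 5 transactions, and beer appears in 2 of the 3 transactions that contain both milk and diapers. A rule is usually called strong when both values exceed user-chosen thresholds.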

Activities

  • Reading: J. Park, M. Chen, and P. Yu. An effective hash-based algorithm for mining association rules. SIGMOD'95
  • Reading: J. Han, J. Pei, Y. Yin, and R. Mao. Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Mining and Knowledge Discovery, Springer, 2004
  • Reading: N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. ICDT'99, 398-416, Jerusalem, Israel, Jan. 1999
  • Reading: R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. SIGMOD'96

Related Videos

Activities

  • Classes are offered in Teams

Examination

01/04 for 3 hours

Time to evaluate. Unlike other examinations in life, this one is not meant to assess how much you remember; it is more important to know how much you understand. Hence, each student can bring one A4 page with all kinds of notes into the classroom. Enjoy.

Notes

  • Students can take one A4 page with them
  • The location will be announced by email

Final Project Demo

1/11 for 3 hours

Realize by learning, thinking, and doing. The final project is the last step of this course: developing a fine application with data mining skills. Through this practice, we not only polish our data mining skills but also experience teamwork. With teammates, we dare to dream.

Requirements

  • Each group should produce a 4-minute YouTube clip to show in class
  • The final project requirements will be sent by email
  • Presentations will be given in Delta 102