About ISA5810, Fall 2018

Data mining is a subject involving algorithms for seeking unexpected pearls of wisdom. These algorithms are adapted from various domains including machine learning, artificial intelligence, pattern recognition, statistics, and database systems. This course covers the concepts, background information, techniques, and applications associated with Data Mining.

The covered topics include:

  • Association Rules
  • Clustering
  • Classification
  • Social Network Mining
  • Text Mining
  • Data Mining Applications

Textbook

    Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Introduction to Data Mining, Addison Wesley

Time in 2018

  • Monday 12:30-13:45
  • Monday 13:55-15:10

Location

  • Delta 106

People

Supporting this course

Instructor:

Yi-Shin Chen


She has taught the fundamental database course and advanced database courses for more than a decade. Her current research interests are social networks, data mining, emotion analysis, and web intelligence.

  • email: yishin@gmail.com
  • phone: +886-3-573-1211
  • office: Delta 607
  • office hours: Mondays 9:00-10:00

Teaching Assistants:

Elvis Omar Saravia

  • email: ellfae@gmail.com
  • office: Delta 701
  • office hours: By email appointment

Evan Yu

  • email: evan800112@gmail.com
  • office: Delta 701
  • office hours: By email appointment

Ray Lee

  • email: gn01697933@gmail.com
  • office: Delta 701
  • office hours: By email appointment

Syllabus

Orientation

2018/9/10 for 2 hours

You will get familiar with the course, the instructor, your classmates, and the learning environment. An overview of the course will also be given in this session.

Activities

Interesting Videos

Overview and Data

2018/9/17, 10/1 for 6 hours

Understanding and preprocessing the data is the most important part of the whole data mining process. In this session, the types of attributes and the characteristics of data sets will be introduced first. Next, several data preprocessing techniques will be covered, followed by various similarity and distance measures. Finally, data visualization concepts will be addressed.
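As a quick illustration of the similarity and distance measures mentioned above, here is a minimal sketch (not course material) of three common ones: Euclidean distance and cosine similarity for numeric vectors, and Jaccard similarity for sets. The function names and the conventions for zero vectors and empty sets are assumptions chosen for this example.

```python
import math

def euclidean_distance(x, y):
    """Straight-line (L2) distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_similarity(x, y):
    """Cosine of the angle between two vectors; 1.0 means the same direction."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumed convention for zero vectors
    return dot / (norm_x * norm_y)

def jaccard_similarity(s, t):
    """Size of the intersection divided by the size of the union of two sets."""
    s, t = set(s), set(t)
    if not s and not t:
        return 1.0  # assumed convention: two empty sets are identical
    return len(s & t) / len(s | t)

print(euclidean_distance([0, 0], [3, 4]))                    # 5.0
print(cosine_similarity([1, 0], [0, 1]))                     # 0.0
print(jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
```

Euclidean distance is sensitive to the scale of each attribute, which is one reason normalization is usually applied during preprocessing before distances are computed.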

Related Videos

Activities

  • Project proposal due on Sep 30
  • Team up with at least one student who holds Taiwanese citizenship

Lab for Data Exploration and Management using NumPy, Pandas, TensorFlow, and Plotly

10/8 for 3 hours

In this lab session we will focus on the use of scientific computing libraries to efficiently process, transform, and manage data. Furthermore, we will provide best practices and introduce visualization tools for effectively conducting big data analysis.

Activities

  • Attend the class with your personal laptop
  • Assignment One should be submitted before Oct 22

Text Mining

10/15 for 3 hours

Text mining is the process of extracting valuable information from unstructured text. It usually involves natural language processing (NLP) steps, including lexical analysis, syntactic analysis, and inference. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, sentiment analysis, and entity relation modeling. This session covers some of the underlying concepts.
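To make the lexical analysis step concrete, here is a minimal sketch (not course material) that tokenizes raw text and turns it into a bag-of-words term-frequency vector, a common starting point for text categorization and clustering. The regular expression used for tokenization and the stopword list in the usage line are simplifying assumptions; real pipelines typically use an NLP library tokenizer.

```python
import re
from collections import Counter

def tokenize(text):
    """Lexical analysis: lowercase the text and split it into word tokens.
    The token pattern (letters, digits, apostrophes) is an assumption."""
    return re.findall(r"[a-z0-9']+", text.lower())

def bag_of_words(text, stopwords=frozenset()):
    """Count term frequencies, skipping tokens found in the stopword set."""
    return Counter(tok for tok in tokenize(text) if tok not in stopwords)

doc = "Text mining extracts information from text. Mining text is fun!"
counts = bag_of_words(doc, stopwords={"is", "from"})
print(counts.most_common(2))  # [('text', 3), ('mining', 2)]
```

Word order is discarded in this representation, which is why tasks such as entity relation modeling need the syntactic analysis step as well.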

Related Videos

Student Presentation & Discussion

10/22 for 3 hours

Student presentations and discussion based on one assigned paper.

Activities

  • Team up with classmates.
  • Each group prepares no more than 3+1 slides.
  • Each group also prepares a good quiz question for classmates.

Classification

10/29, 11/12 for 6 hours

Classification, also referred to as supervised learning, is widely used in data mining and machine learning. The objective of classification is to predict the class of an input record from its attributes, using a model learned from labeled examples. In this session, various fundamental algorithms, as well as core measures, will be covered.
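As one small example of a fundamental classifier, here is a minimal sketch (not course material) of k-nearest neighbors (KNN): a query record receives the majority label among its k closest training records. Accuracy is shown as one simple evaluation measure. The toy data, `k=3`, and Euclidean distance are assumptions made for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest
    training records. `train` is a list of (feature_vector, label) pairs."""
    by_distance = sorted(train, key=lambda rec: math.dist(rec[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]
test_points = [((1.1, 0.9), "A"), ((5.1, 5.0), "B")]
preds = [knn_predict(train, x) for x, _ in test_points]
print(preds)                                           # ['A', 'B']
print(accuracy([y for _, y in test_points], preds))    # 1.0
```

KNN has no explicit training phase. All work happens at prediction time, which is simple but becomes slow on large training sets without an index structure. `math.dist` requires Python 3.8 or later.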

Related Videos

Lab for Deep Information Retrieval and Neural Word Embeddings

11/05 for 3 hours

In this lab session we will focus on the use of information retrieval techniques for modeling, training, and classifying textual data. Specifically, neural word and document embedding models, such as word2vec, doc2vec, and FastText, will be introduced. Additionally, traditional approaches to text classification, such as KNN, SVM, and Naive Bayes, will be covered.
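Of the traditional text classifiers listed above, Naive Bayes is the easiest to write from scratch. Below is a minimal sketch (not the lab's code) of multinomial Naive Bayes with add-one (Laplace) smoothing. It scores each class by its log prior plus the log likelihood of each token. The tiny sentiment data set, the whitespace tokenization, and the choice to ignore unseen words are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label). Returns class document counts,
    per-class word counts, and the vocabulary."""
    class_docs = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab

def predict_nb(model, tokens):
    """Return the class with the highest log posterior score."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # assumption: words unseen in training are ignored
            # Add-one smoothed log likelihood of the token given the class
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

train = [
    ("great fun movie".split(), "pos"),
    ("loved the great acting".split(), "pos"),
    ("boring and bad".split(), "neg"),
    ("bad plot boring acting".split(), "neg"),
]
model = train_nb(train)
print(predict_nb(model, "great movie".split()))      # pos
print(predict_nb(model, "boring bad plot".split()))  # neg
```

Working in log space avoids floating-point underflow when many small probabilities are multiplied together.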

Activities

  • Attend the class with your personal laptop
  • Assignment Two should be submitted before Nov 26

Final Project Progress Discussion Part I -- Prepare for Semi-Final

11/19 for 3 hours

Student presentations and discussion based on specified papers.

Activities

  • Each group prepares a report and slides.
  • Animated slides should be recorded as a video of at most 5 minutes and uploaded to YouTube

Cluster Analysis

11/26, 12/3 for 6 hours

Cluster analysis is the task of grouping objects so that objects in the same cluster are more similar to each other than to those in other clusters. Many clustering techniques were initially developed in the pattern recognition and signal processing domains but have since been adapted to many other domains, such as data mining, machine learning, information retrieval, data compression, and computer graphics. In this session, we will introduce and discuss various clustering techniques.
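As a concrete instance of one clustering technique, here is a minimal sketch (not course material) of k-means in its standard Lloyd's-algorithm form. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points, and it stops when no assignment changes. Initializing the centroids with the first k points keeps the example deterministic. This is an assumption, and in practice random or k-means++ initialization is more common.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means. Initial centroids are the first k points
    (a simplifying assumption). Returns (centroids, assignment)."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment

points = [(1, 1), (8, 8), (1.5, 2), (8.5, 9), (0.5, 1.5), (9, 8)]
centroids, labels = kmeans(points, k=2)
print(labels)        # [0, 1, 0, 1, 0, 1]
print(centroids[0])  # (1.0, 1.5)
```

Because k-means minimizes squared distances to the cluster means, it tends to find compact, roughly spherical clusters. Its result depends on the initial centroids.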

Related Videos

Final Project Progress Discussion Part II -- Prepare for Final

12/10 for 3 hours

Student presentations and discussion based on specified papers.

Activities

  • Each group prepares a report and slides.
  • Animated slides should be recorded as a video of at most 5 minutes and uploaded to YouTube

Association Rules

12/17, 12/24 for 3 hours

Association rule learning discovers interesting relations between variables in large data sets. Using interestingness measures such as support and confidence, it identifies strong rules in the data. It is widely used for product recommendations. In this session, several basic algorithms will be introduced and discussed.
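The two most basic interestingness measures can be computed directly from a list of transactions. The minimal sketch below (not course material) defines support and confidence, where the support of an itemset is the fraction of transactions that contain it, and the confidence of a rule X → Y is support(X ∪ Y) / support(X). The market-basket transactions are made-up toy data. Confidence is computed from integer counts so that the result is exact.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions)

def support(itemset, transactions):
    """Fraction of transactions that contain `itemset`."""
    return support_count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support_count(both, transactions) / support_count(antecedent, transactions)

transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
print(support({"diapers", "beer"}, transactions))       # 0.6
print(confidence({"diapers"}, {"beer"}, transactions))  # 0.75
```

Algorithms such as Apriori and FP-growth exist because counting support for every possible itemset this way grows exponentially with the number of items.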

Activities

  • Reading: J. Park, M. Chen, and P. Yu. An effective hash-based algorithm for mining association rules. In SIGMOD'95.
  • Reading: J. Han, J. Pei, Y. Yin, and R. Mao. Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Mining and Knowledge Discovery, Springer, 2004.
  • Reading: N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In ICDT'99, pp. 398-416, Jerusalem, Israel, Jan. 1999.
  • Reading: R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. In SIGMOD'96.

Related Videos

Final Examination

1/07 for 3 hours

Time to evaluate. Unlike other examinations in our lives, this one does not aim to assess how much we remember; it is more important to know how much we understand. Hence, each student may bring one A4 sheet of notes of any kind into the classroom. Enjoy.

Notes

  • Students may bring one A4 page of notes

Final Project Demo

1/14 for 3 hours

Realizing by learning, thinking, and doing. This is the final realization step of this course: developing a fine application with data mining skills. Through this practice, we not only polish our data mining skills but also experience teamwork. With teammates, we dare to dream.

Requirements

  • Students should work with several teammates on their project.
  • Each group should produce a 4-minute YouTube clip to show in class
  • Each group should have a poster and a working system
  • Final Project Requirement Description