2018 Fall ISA 5810: Data Mining: Concepts, Techniques, and Applications

Syllabus

Orientation

2018/9/10 for 2 hours

You will become familiar with the course, the instructor, your classmates, and the learning environment. An overview of the course will also be given in the same session.

Activities

Reading: Syllabus
Join the NTHU ilms system
Take the Orientation Quiz before 2018/9/25

Interesting Videos

How We Used Data to Win the Presidential Election by Dan Siroker

Overview and Data

2018/9/17, 10/1 for 6 hours

Understanding and preprocessing the data is the most important part of the whole data mining process. In this session, the types of attributes and the characteristics of data sets will be introduced first. Next, several data preprocessing techniques will be covered, followed by various similarity and distance measures. Finally, data visualization concepts will be addressed.
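As a rough illustration of one preprocessing technique and a few common similarity and distance measures, here is a minimal pure-Python sketch. It is not the course's lab code; the function names and sample values are hypothetical, chosen only to show the formulas (min-max normalization, Euclidean and Manhattan distance, cosine and Jaccard similarity).

```python
import math

def min_max_normalize(values):
    """Rescale numbers linearly into [0, 1] (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute has no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def euclidean_distance(x, y):
    """L2 distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan_distance(x, y):
    """L1 (city-block) distance between two equal-length numeric vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))

def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero numeric vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)

def jaccard_similarity(a, b):
    """Size of the intersection over size of the union of two sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention assumed here: two empty sets are identical
    return len(a & b) / len(union)

print(min_max_normalize([10, 20, 30]))           # [0.0, 0.5, 1.0]
print(euclidean_distance((0, 0), (3, 4)))        # 5.0
print(manhattan_distance((0, 0), (3, 4)))        # 7
print(cosine_similarity((1, 0), (0, 1)))         # 0.0
print(jaccard_similarity({1, 2, 3}, {2, 3, 4}))  # 0.5
```

Note that Euclidean and Manhattan distance are sensitive to attribute scale, which is one reason normalization is usually applied before computing them.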

Activities

Project proposal due on Sep 30
Team up with at least one student who holds Taiwanese citizenship

Lab for Data Exploration and Management using Numpy, Pandas, Tensorflow, and Plotly

10/8 for 3 hours

In this lab session, we will focus on using scientific computing libraries to process, transform, and manage data efficiently. We will also share best practices and introduce visualization tools for conducting big data analysis effectively.
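To give a flavor of the "process, transform, manage" workflow, here is a small sketch using pandas. It is only an assumption of what a lab exercise might look like; the toy data frame and its column names (`city`, `temp_c`, `temp_f`) are hypothetical. It fills a missing value, derives a new attribute, and aggregates per group.

```python
import pandas as pd

# Hypothetical toy data set; the column names are illustrative only.
df = pd.DataFrame({
    "city": ["Hsinchu", "Taipei", "Hsinchu", "Taipei", "Tainan"],
    "temp_c": [28.0, None, 30.0, 31.0, 33.0],
})

# Process: replace the missing temperature with the column mean.
df["temp_c"] = df["temp_c"].fillna(df["temp_c"].mean())

# Transform: derive a new attribute (Fahrenheit) from an existing one.
df["temp_f"] = df["temp_c"] * 9 / 5 + 32

# Manage/summarize: count and average the readings per city.
summary = df.groupby("city")["temp_c"].agg(["count", "mean"])
print(summary)
```

Mean imputation is just one simple choice for missing values; dropping rows or using a per-group mean are common alternatives, depending on the data.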

Activities

Bring your personal laptop to class
Assignment One should be submitted before Oct 22

Text Mining

10/15 for 3 hours

Text mining is the process of extracting valuable information from unstructured text. It usually involves natural language processing (NLP) steps, including lexical analysis, syntactic analysis, and inference. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, sentiment analysis, and entity relation modeling. We will cover several of these concepts in this session.
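As a minimal sketch of how raw text can be turned into features for tasks like categorization or clustering, the following pure-Python code does a very crude lexical analysis (lowercasing and splitting on non-letters) and then computes TF-IDF weights. This particular TF-IDF variant (relative term frequency times `log(N / df)`) is one common choice among several, not necessarily the one used in the course; the sample documents are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Very simple lexical analysis: lowercase and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = ["The cat sat.", "The dog sat.", "The cat ran!"]
for w in tf_idf(docs):
    print(w)
```

A term that appears in every document (here "the") gets weight 0, while rarer terms such as "dog" receive higher weights, which is why TF-IDF vectors tend to emphasize distinctive words.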

Student Presentation & Discussion

10/22 for 3 hours

Time for student presentations and discussion based on one specified paper

Activities

Team up with classmates.
Each group prepares no more than 3+1 slides.
Each group also prepares a good quiz question for classmates.

Classification

10/29, 11/12 for 6 hours

Classification, also referred to as supervised learning, is widely used in data mining and machine learning. Its objective is to predict the class of an input instance from its attributes, using a model learned from labeled training data. In this session, various fundamental algorithms, as well as core evaluation measures, will be covered.
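The core evaluation measures for a binary classifier can all be derived from the counts of true positives, false positives, and false negatives. The sketch below shows the standard formulas in plain Python; the function name `binary_metrics` and the sample labels are hypothetical, and returning 0.0 when a denominator is zero is one convention among several.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against empty denominators (convention: report 0.0).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In this example there are 2 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 are all 2/3 while accuracy is 3/5.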

Lab for Deep Information Retrieval and Neural Word Embeddings

11/05 for 3 hours

In this lab session, we will focus on the use of information retrieval techniques for modeling, training, and classifying textual data. Specifically, neural embedding models such as word2vec, doc2vec, and FastText will be introduced. Traditional text classification approaches such as KNN, SVM, and Naive Bayes will also be covered.
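Of the traditional approaches mentioned above, Naive Bayes is compact enough to sketch in a few lines. The following is a minimal multinomial Naive Bayes text classifier with add-one (Laplace) smoothing, written in plain Python as an illustration only; it is not the lab's implementation, and the class name and tiny training set are hypothetical. Words never seen in training are simply ignored at prediction time.

```python
import math
from collections import Counter, defaultdict

class MultinomialNB:
    """Bag-of-words multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = len(labels)
        return self

    def predict(self, doc):
        words = doc.lower().split()
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / self.total_docs)  # log prior P(class)
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            for w in words:
                if w in self.vocab:  # skip words unseen during training
                    score += math.log((counts[w] + 1) / denom)  # log P(word | class)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

train_docs = ["great movie loved it", "terrible plot boring",
              "loved the acting", "boring and terrible"]
train_labels = ["pos", "neg", "pos", "neg"]
model = MultinomialNB().fit(train_docs, train_labels)
print(model.predict("loved it"))
print(model.predict("boring plot"))
```

Summing log-probabilities instead of multiplying raw probabilities avoids numerical underflow on long documents.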

Activities

Bring your personal laptop to class
Assignment Two should be submitted before Nov 26

Final Project Progress Discussion Part I -- Prepare for Semi-Final

11/19 for 3 hours

Time for student presentations and discussion based on the specified papers

Activities

Each group prepares a report and slides.
Animated slides should be recorded as a video of at most 5 minutes and uploaded to YouTube.

Cluster Analysis

11/26, 12/3 for 6 hours

Cluster analysis is the task of grouping objects so that objects in the same cluster are more similar to each other than to those in other clusters. Many clustering techniques were originally developed in the pattern recognition and signal processing domains but have since been adapted to many others, such as data mining, machine learning, information retrieval, data compression, and computer graphics. In this session, we will introduce and discuss various clustering techniques.
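As one concrete example of a clustering technique, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It assumes numeric points given as tuples, Euclidean distance, and initialization from k randomly chosen input points; production implementations usually add smarter seeding (such as k-means++) and several restarts. The sample points are hypothetical.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means on numeric tuples; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k randomly chosen input points
    assignments = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid (Euclidean).
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Each iteration can only decrease (or keep) the within-cluster sum of squared distances, which is why the loop terminates once no assignment changes; the result can still depend on the initial centroids.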

Final Project Progress Discussion Part II -- Prepare for Final

12/10 for 3 hours

Time for student presentations and discussion based on the specified papers

Activities

Each group prepares a report and slides.
Animated slides should be recorded as a video of at most 5 minutes and uploaded to YouTube.

Association Rules

12/17, 12/24 for 3 hours

Association rule learning discovers interesting relations between variables in large data sets. Using interestingness measures such as support and confidence, it identifies strong rules in the data. It is widely used for product recommendation. In this session, several basic algorithms will be introduced and discussed.
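To make support and confidence concrete, here is a minimal sketch of the classic Apriori algorithm in plain Python. It is not one of the algorithms from the readings below (hash-based, FP-tree, closed itemsets), just a simple baseline for comparison; the function names and the toy transactions are hypothetical, and support is recomputed by scanning all transactions, which is fine for illustration but slow on real data.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

def confidence(frequent, antecedent, consequent):
    """conf(X -> Y) = support(X | Y) / support(X); both itemsets must be frequent."""
    x = frozenset(antecedent)
    return frequent[x | frozenset(consequent)] / frequent[x]

baskets = [{"bread", "milk"}, {"bread", "diapers", "beer"},
           {"milk", "diapers", "beer"}, {"bread", "milk", "diapers", "beer"},
           {"bread", "milk", "diapers"}]
freq = apriori(baskets, min_support=0.6)
print(confidence(freq, {"beer"}, {"diapers"}))
```

With a minimum support of 0.6, the rule {beer} -> {diapers} has confidence 1.0 (every basket with beer also has diapers), while {diapers} -> {beer} has confidence of about 0.75.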

Activities

Reading: J. Park, M. Chen, and P. Yu. An effective hash-based algorithm for mining association rules. SIGMOD'95.
Reading: J. Han, J. Pei, Y. Yin, and R. Mao. Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Mining and Knowledge Discovery, Springer, 2004.
Reading: N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. ICDT'99, pp. 398-416, Jerusalem, Israel, Jan. 1999.
Reading: R. Srikant and R. Agrawal. Mining Quantitative Association Rules in Large Relational Tables. ACM SIGMOD'96.

Final Examination

1/07 for 3 hours

Time to evaluate. Unlike most examinations in our lives, this one is not meant to assess how much we remember; what matters more is how much we understand. Hence, each student may bring one A4 page of notes of any kind into the classroom. Enjoy.

Notes

Students may bring one A4 page of notes with them

Final Project Demo

1/14 for 3 hours

Realizing by learning, thinking, and doing. The final project is the last realizing step of this course: developing a working application with data mining skills. Through this practice, we not only polish our data mining (DM) skills but also experience teamwork. With teammates, we dare to dream.

Requirements

Students should work in groups of several people on their project.
Each group should produce a 4-minute YouTube clip to show in class
Each group should have a poster and a working system
Final Project Requirement Description

ISA 5810

About ISA5810, Fall 2018

Text Book

Time in 2018

Location

Instructor:

Yi-Shin Chen

Teaching Assistants:

Elvis Omar Saravia

Evan Yu

Ray Lee
