Data mining is a subject involving algorithms for seeking unexpected pearls of wisdom. These algorithms are adapted from various domains, including machine learning, artificial intelligence, pattern recognition, statistics, and database systems. This course covers the concepts, background, techniques, and applications of data mining.
The covered topics include:
Supporting this course
She has taught the fundamental and advanced database courses for more than a decade. Her current research interests are social networks, data mining, emotion analysis, and web intelligence.
Orientation: 2017/9/11 for 2 hours
You will become familiar with the course, the instructor, your classmates, and the learning environment. The session also includes an overview of the course.
Overview and Data: 2017/9/12, 9/18, 9/19, 9/25 for 8 hours
Understanding and preprocessing the data is the most important part of the whole data mining process. In this session, the types of attributes and the characteristics of data sets will be introduced first. Several data preprocessing techniques are then covered, followed by various similarity and distance measures. Finally, data visualization concepts will be addressed.
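As a minimal sketch of the similarity and distance measures mentioned in this session, the following Python snippet implements three common ones. The choice of Euclidean distance, cosine similarity, and Jaccard similarity is an assumption for illustration; the session may cover a different set, and the function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two numeric vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def jaccard(s, t):
    """Size of the intersection over size of the union of two sets."""
    s, t = set(s), set(t)
    union = s | t
    return len(s & t) / len(union) if union else 1.0

p, q = [0.0, 3.0], [4.0, 0.0]
print(euclidean(p, q))                            # 5.0
print(cosine_similarity(p, q))                    # 0.0 (orthogonal vectors)
print(jaccard({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
```

Euclidean distance suits dense numeric attributes, cosine similarity ignores vector length (useful for documents), and Jaccard similarity applies to set-valued or binary attributes.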
Lab for Data Exploration and Management using NumPy, Pandas, TensorFlow, and Plotly: 9/26 for 2 hours
In this lab session, we will focus on using scientific computing libraries to efficiently process, transform, and manage data. We will also share best practices and introduce visualization tools for conducting big data analysis effectively.
Classification: 10/2, 10/3, 10/16 for 6 hours
Classification, also referred to as a supervised learning technique, is widely used in data mining and machine learning. The objective of classification is to predict a class label from input data. In this session, various fundamental algorithms, as well as core evaluation measures, will be covered.
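To make the idea of evaluation measures concrete, here is a small Python sketch that scores a binary classifier's predictions. The session does not name its measures here, so accuracy, precision, recall, and F1 are assumed as typical examples; the function names and toy labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn

def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 as a dictionary."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Toy ground truth and predictions (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(evaluate(y_true, y_pred))  # all four measures are 0.75 for this data
```

Precision and recall matter most when classes are imbalanced, where accuracy alone can look high even for a classifier that never predicts the rare class.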
Association Rules: 10/17, 10/23 for 4 hours
Association rule learning discovers interesting relations between variables in large data sets. Using interestingness measures such as support and confidence, it identifies the strong rules in a data set. It is widely used for product recommendations. In this session, several basic algorithms will be introduced and discussed.
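The following Python sketch illustrates how support and confidence can be used to pick out strong rules. It is a brute-force search over single-item rules on made-up transactions, not Apriori or any specific algorithm from the session; all names and thresholds are assumptions for illustration.

```python
from itertools import permutations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) over the transactions."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)

def strong_rules(transactions, min_support, min_confidence):
    """Brute-force all rules {a} -> {b} that meet both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        s = support({a, b}, transactions)
        if s < min_support:
            continue
        c = confidence({a}, {b}, transactions)
        if c >= min_confidence:
            rules.append((a, b, s, c))
    return rules

for a, b, s, c in strong_rules(transactions, 0.6, 0.8):
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

With these thresholds, only the rule beer -> diapers qualifies: the pair appears in 3 of 5 transactions, and every transaction containing beer also contains diapers. Practical algorithms avoid the exhaustive enumeration by pruning itemsets whose subsets are already infrequent.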
Cluster Analysis: 10/24, 10/30, 10/31 for 6 hours
Cluster analysis is the task of grouping objects so that objects in the same cluster are more similar to each other than to those in other clusters. Many clustering techniques were first adopted in the pattern recognition and signal processing domains, and have since been adapted to many other domains, such as data mining, machine learning, information retrieval, data compression, and computer graphics. In this session, we will introduce and discuss various clustering techniques.
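As one illustration of a clustering technique, here is a minimal k-means sketch in plain Python. k-means is chosen only as a familiar example; the session does not state which techniques it covers. The initial centroids are passed in explicitly so the run is deterministic, and all names and data points are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update steps until centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        updated = [mean(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
print(clusters)
```

On this toy data the first three points form one cluster and the last three form the other. The result of k-means depends on the initial centroids, which is why they are made explicit here.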
Text Mining: 11/6, 11/7 for 4 hours
Text mining is the process of extracting valuable information from unstructured text. It usually involves natural language processing (NLP) steps, including lexical analysis, syntactic analysis, and inference. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, sentiment analysis, and entity relation modeling. We will cover some of these concepts in this session.
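A rough sketch of the first steps in a text mining pipeline is shown below: a simple lexical analysis (tokenization) followed by TF-IDF term weighting. TF-IDF is not named in this session description and is used here only as an assumed example of turning unstructured text into numeric features; this particular variant (length-normalized term frequency, unsmoothed logarithmic IDF) is one of several in use.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tf_idf(docs):
    """Return one {term: weight} dictionary per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (c / len(tokens)) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return weights

# Hypothetical mini-corpus.
docs = [
    "Data mining finds patterns in data",
    "Text mining extracts information from text",
    "Social networks produce text data",
]
for doc_weights in tf_idf(docs):
    top = sorted(doc_weights.items(), key=lambda kv: kv[1], reverse=True)[:3]
    print([term for term, _ in top])
```

Terms that occur in every document receive a weight of zero, while terms concentrated in a few documents are weighted up, which is what makes the resulting vectors useful for categorization and clustering.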
Data Mining in Social Networks: 11/13 for 2 hours
With the dramatic growth of social media activity, mining social media data has become a popular trend. In this session, we will introduce some interesting applications and the techniques typically applied to such data sets.
Midterm Examination: 11/14 for 2 hours
Time to evaluate. Unlike other examinations in our lives, this one does not assess how much we remember. It is more important to know how much we understand. Hence, each student may bring one A4 sheet with any kind of notes into the classroom. Enjoy.
Lab for Deep Information Retrieval and Neural Word Embeddings: 11/20 for 2 hours
In this lab session, we will focus on using information retrieval techniques to model, train on, and classify textual data. Specifically, neural embedding models, such as word2vec, doc2vec, and FastText, will be introduced. Traditional approaches to text classification, such as KNN, SVM, and Naive Bayes, will also be covered.
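As a small illustration of the traditional side of this lab, the sketch below classifies short texts with k-nearest neighbors (KNN) over bag-of-words vectors and cosine similarity. It deliberately uses raw word counts rather than learned embeddings, and the training sentences, labels, and function names are invented for the example.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words vector: word -> count, using whitespace tokens."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(train, text, k=3):
    """Majority label among the k training texts most similar to text."""
    query = vectorize(text)
    ranked = sorted(train,
                    key=lambda item: cosine(vectorize(item[0]), query),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled training data.
train = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches now", "spam"),
    ("limited offer buy now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project meeting notes attached", "ham"),
    ("lunch on monday", "ham"),
]
print(knn_predict(train, "buy cheap pills"))       # spam
print(knn_predict(train, "monday meeting notes"))  # ham
```

Replacing `vectorize` with averaged word embeddings would keep the same KNN logic while letting semantically related words, not only identical ones, contribute to the similarity.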
Student Presentation & Discussion: 12/11, 12/25 for 4 hours
Time for student presentations and discussion based on assigned papers.
Final Project Demo: 1/9 for 2 hours
Realizing by learning, thinking, and doing. This is the final step of realization for this course: developing a fine application with data mining skills. Through this practice, we not only polish our data mining skills but also experience teamwork. With teammates, we dare to dream.