InterPARES Trust - Terminology - Citations (English)

Citations

Dilly 1995 (†480)
Dilly, Ruth. "Data Mining: An Introduction: Student Notes" – ver. 2.0 (The Queen's University, Belfast, 1995).
URL: https://web.archive.org/web/19990424113923/http://www-pcc.qub.ac.uk/tec/courses/datamining/stu_notes/dm_book_1.html

Existing Citations

data mining (1.1): The term data mining has been stretched beyond its limits to apply to any form of data analysis. Some of the numerous definitions of Data Mining, or Knowledge Discovery in Databases are: ¶ Data Mining, or Knowledge Discovery in Databases (KDD) as it is also known, is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. This encompasses a number of different technical approaches, such as clustering, data summarization, learning classification rules, finding dependency networks, analysing changes, and detecting anomalies. · William J Frawley, Gregory Piatetsky-Shapiro and Christopher J Matheus ¶ Data mining is the search for relationships and global patterns that exist in large databases but are 'hidden' among the vast amount of data, such as a relationship between patient data and their medical diagnosis. These relationships represent valuable knowledge about the database and the objects in the database and, if the database is a faithful mirror, of the real world registered by the database. · Marcel Holsheimer & Arno Siebes (1994) (†685)
data mining (1.1): Data mining is concerned with the analysis of data and the use of software techniques for finding patterns and regularities in sets of data. It is the computer which is responsible for finding the patterns by identifying the underlying rules and features in the data. The idea is that it is possible to strike gold in unexpected places as the data mining software extracts patterns not previously discernable or so obvious that no-one has noticed them before. (†686)
data mining (1.2.4): Knowledge Discovery in Databases (KDD) or Data Mining, and the part of Machine Learning (ML) dealing with learning from examples overlap in the algorithms used and the problems addressed. The main differences are: · KDD is concerned with finding understandable knowledge, while ML is concerned with improving performance of an agent. So training a neural network to balance a pole is part of ML, but not of KDD. However, there are efforts to extract knowledge from neural networks which are very relevant for KDD. · KDD is concerned with very large, real-world databases, while ML typically (but not always) looks at smaller data sets. So efficiency questions are much more important for KDD. · ML is a broader field which includes not only learning from examples, but also reinforcement learning, learning with teacher, etc. ¶ KDD is that part of ML which is concerned with finding understandable knowledge in large sets of real-world examples. When integrating machine learning techniques into database systems to implement KDD some of the databases require: · more efficient learning algorithms because realistic databases are normally very large and noisy. It is usual that the database is often designed for purposes different from data mining and so properties or attributes that would simplify the learning task are not present nor can they be requested from the real world. Databases are usually contaminated by errors so the data mining algorithm has to cope with noise whereas ML has laboratory type examples i.e. as near perfect as possible. · more expressive representations for both data, e.g. tuples in relational databases, which represent instances of a problem domain, and knowledge, e.g. rules in a rule-based system, which can be used to solve users' problems in the domain, and the semantic information contained in the relational schemata. 
¶ Practical KDD systems are expected to include three interconnected phases · Translation of standard database information into a form suitable for use by learning facilities; · Using machine learning techniques to produce knowledge bases from databases; and · Interpreting the knowledge produced to solve users' problems and/or reduce data spaces. Data spaces being the number of examples. (†687)
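
The three phases named in the excerpt above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not part of Dilly's notes: the toy patient table, the function names translate, learn_rules and interpret, and the very simple "majority label per attribute value" learner standing in for a real machine learning technique.

```python
from collections import Counter, defaultdict

# Hypothetical toy "database": rows as they might come from a relational table.
ROWS = [
    {"id": 1, "fever": "yes", "cough": "yes", "diagnosis": "flu"},
    {"id": 2, "fever": "yes", "cough": "no", "diagnosis": "flu"},
    {"id": 3, "fever": "no", "cough": "yes", "diagnosis": "cold"},
    {"id": 4, "fever": "no", "cough": "no", "diagnosis": "healthy"},
    {"id": 5, "fever": "no", "cough": "yes", "diagnosis": "cold"},
]


def translate(rows, target, drop=("id",)):
    """Phase 1: turn database rows into (features, label) examples."""
    examples = []
    for row in rows:
        features = {k: v for k, v in row.items() if k != target and k not in drop}
        examples.append((features, row[target]))
    return examples


def learn_rules(examples, attribute):
    """Phase 2: build a tiny knowledge base, one rule per value of `attribute`.

    Each rule maps an attribute value to the label seen most often with it.
    """
    counts = defaultdict(Counter)
    for features, label in examples:
        counts[features[attribute]][label] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def interpret(rules, attribute, case, default=None):
    """Phase 3: apply the learned rules to a user's new case."""
    return rules.get(case.get(attribute), default)


examples = translate(ROWS, target="diagnosis")
rules = learn_rules(examples, "fever")
print(rules)
print(interpret(rules, "fever", {"fever": "yes", "cough": "no"}))
```

The learned rules are deliberately human-readable (a plain mapping from attribute value to label), which reflects the excerpt's point that KDD aims at understandable knowledge; a real system would also have to cope with the noise and scale the excerpt describes.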
