Data science terminology

Application Programming Interface (API)
A collection of software routines, protocols, and tools that provides a programmer with the building blocks for developing an application program for a specific platform (environment). An API also provides an interface that allows a program to communicate with other programs running in the same environment.
Artificial Intelligence (AI)

Artificial intelligence is a field of computer science dedicated to solving cognitive problems commonly associated with human intelligence, such as learning, problem solving, visual perception, and speech and pattern recognition.

Data Science
Data Science is an interdisciplinary field that uses scientific methods and algorithms to extract information and insights from diverse data types. It combines domain expertise, programming skills and knowledge of mathematics and statistics to solve analytically complex problems.
Deep Learning
A subset of machine learning that loosely imitates the way the human brain processes data in order to improve performance. It typically uses a multi-level (layered) algorithm that identifies features at progressively higher levels of abstraction. For example, the first level may identify certain lines, the next level identifies combinations of lines as shapes, and the level after that identifies combinations of shapes as specific objects. Deep learning is popular for image classification.
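The "multi-level" idea can be illustrated with a minimal sketch in plain Python: each layer transforms the outputs of the layer below it, and the top layer combines those intermediate features into a single output. The weights here are hypothetical and hand-picked for illustration; a real deep learning framework would learn many thousands or millions of them from data.

```python
def relu(x: float) -> float:
    """A common activation function: pass positive values, zero out negatives."""
    return max(0.0, x)

def dense(inputs, weights, biases, activation=relu):
    """One layer: each output is activation(sum(w * x) + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Hypothetical hand-picked weights (a trained network would learn these).
layer1 = ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0])  # 2 inputs -> 2 low-level features
layer2 = ([[1.0, 1.0]], [0.0])                     # 2 features -> 1 higher-level output

def forward(x):
    """Pass the input through both levels in turn."""
    features = dense(x, *layer1)   # lower level
    return dense(features, *layer2)  # higher level built from the lower one

print(forward([2.0, 1.0]))  # [2.5]
```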
Machine Learning (ML)

"Machine learning is the science of getting computers to automatically learn from experience instead of relying on explicitly programmed rules, and generalize the acquired knowledge to new settings."

Source: United Nations Economic Commission for Europe's Machine Learning Team, The use of machine learning in official statistics (2018 report).

In essence, machine learning automates analytical model building by means of optimization algorithms and parameters that can be adjusted and fine-tuned.
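As a minimal sketch of "model building through optimization", the code below fits a straight line to a few data points with gradient descent. Gradient descent is one common optimization algorithm, chosen here for illustration; `learning_rate` and `epochs` are examples of the kind of parameters that can be fine-tuned.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y ~ w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the error
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # made-up data generated from y = 2x + 1
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

No rule "multiply by 2 and add 1" is programmed in; the parameters `w` and `b` are learned from the examples. A learning rate that is too large can make the updates diverge, which is one reason such parameters need tuning.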

Machine Learning Algorithms
Machine learning algorithms use computational methods to "learn" information directly from data without relying on a predetermined equation as a model. The algorithms adaptively improve their performance as the number of samples available for learning increases.
Machine Learning Model
The process of training an ML model involves providing an ML algorithm (that is, the learning algorithm) with training data to learn from. The term ML model refers to the model artifact that is created by the training process.
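The distinction between the learning algorithm and the model artifact can be made concrete with a minimal sketch. Here `train` plays the role of the learning algorithm (ordinary least squares on one feature, chosen for simplicity), and the JSON string it produces stands in for the model artifact. The function names and the JSON format are illustrative assumptions; real ML libraries use their own serialization formats.

```python
import json

def train(xs, ys):
    """Learning algorithm: ordinary least squares for y ~ slope*x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return {"slope": slope, "intercept": y_mean - slope * x_mean}

def predict(model, x):
    """Use a trained model to make a prediction."""
    return model["slope"] * x + model["intercept"]

# Training data goes in; the model artifact comes out.
model = train([1, 2, 3, 4], [3, 5, 7, 9])
artifact = json.dumps(model)  # serialized model artifact, e.g. for saving to disk

# Later, possibly in another program: load the artifact and reuse it.
loaded = json.loads(artifact)
prediction = predict(loaded, 10)
```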
Natural Language Processing (NLP)
Natural language processing (NLP) is a set of methods for translating between computer and human languages. Its aim is to get a computer to read and interpret a line of text meaningfully, without the computer being fed explicit clues or hand-written rules for that text. In other words, NLP automates the translation process between human language and a form that computers can process.
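Before a computer can "read" text, the text usually has to be converted into a structured form. The minimal sketch below shows two common first steps in NLP: tokenization (splitting text into words) and a bag-of-words count. The regular expression is a simplifying assumption that only keeps English letters and apostrophes.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(text: str) -> Counter:
    """Represent the text as word counts, a form a computer can compute with."""
    return Counter(tokenize(text))

sentence = "The cat sat on the mat."
tokens = tokenize(sentence)      # ['the', 'cat', 'sat', 'on', 'the', 'mat']
counts = bag_of_words(sentence)  # 'the' appears twice, every other word once
```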
Parsing
Breaking a data block into smaller chunks by following a set of rules, so that it can be more easily interpreted, managed, or transmitted by a computer. Spreadsheet programs, for example, parse data to fit it into cells of a certain size. ML algorithms can also be used to parse data.
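As a minimal sketch of rule-based parsing, Python's standard `csv` module breaks a block of comma-separated text into rows and fields, the "cells" a spreadsheet would display. The sample data is made up for illustration.

```python
import csv
import io

raw = 'id,name,city\n1,"Smith, Jane",Ottawa\n2,Lee,Toronto\n'

# csv.reader applies the CSV rules: split on commas, except inside quotes.
rows = list(csv.reader(io.StringIO(raw)))
header, records = rows[0], rows[1:]

# Pair each field with its column name for easier handling.
table = [dict(zip(header, record)) for record in records]
```

Note that the comma inside `"Smith, Jane"` does not split the field; following that quoting rule is what distinguishes parsing from simply cutting the text at every comma.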
Python
A programming language, first released in 1991 (version 1.0 followed in 1994), that is popular with people doing data science. Python is noted for its ease of use for beginners and its power in the hands of advanced users, especially when they take advantage of specialized libraries such as those designed for machine learning and graph generation.
R
An open-source programming language and environment for statistical computing and graph generation, available for Linux, Windows, and Mac.
Robotic Process Automation (RPA)
Robotic process automation (RPA) is the term used for software tools that partially or fully automate human activities that are manual, rule-based, and repetitive. They work by replicating the actions of an actual human interacting with one or more software applications to perform tasks such as entering data, processing standard transactions, or responding to simple customer service queries.
Supervised Learning
A type of machine learning algorithm in which a system is taught via labelled examples, that is, inputs paired with the correct output. For instance, a supervised learning algorithm can be taught to classify input into specific, known classes. The classic example is sorting email into spam and non-spam.
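The spam example can be sketched with a tiny multinomial naive Bayes classifier, one of many supervised learning algorithms (chosen here for illustration). It is "taught" with four hypothetical labelled emails and then assigns one of the known classes, `spam` or `ham` (non-spam), to new text.

```python
import math
from collections import Counter

# Hypothetical labelled training examples: (text, label).
training = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]

def train(examples):
    """Count words per class and examples per class."""
    word_counts = {}           # label -> Counter of words seen with that label
    class_counts = Counter()   # label -> number of training examples
    for text, label in examples:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, class_counts, vocab

def classify(text, word_counts, class_counts, vocab):
    """Return the label with the highest log-probability for the text."""
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        n_words = sum(counts.values())
        # Log prior plus log likelihoods, with add-one (Laplace) smoothing
        score = math.log(class_counts[label] / total)
        for word in text.split():
            score += math.log((counts[word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(training)
print(classify("free prize", *model))           # spam
print(classify("team meeting monday", *model))  # ham
```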
Unsupervised Learning
A class of machine learning algorithms designed to identify groupings in unlabelled data without knowing in advance what the groups will be.
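A minimal sketch of one well-known unsupervised algorithm, k-means clustering (chosen here for illustration), on one-dimensional data. The algorithm receives no labels, only the number of groups `k`, and discovers the groupings itself. The initialization rule (evenly spaced sorted points) is a simplifying assumption.

```python
def kmeans_1d(points, k, iterations=10):
    """Group 1-D points into k clusters without any labels (k-means)."""
    # Simple initialization: pick k evenly spaced values from the sorted data.
    centroids = sorted(points)[:: max(1, len(points) // k)][:k]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

centroids, clusters = kmeans_1d([1.0, 1.5, 2.0, 10.0, 10.5, 11.0], k=2)
print(centroids)  # [1.5, 10.5]
```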
Web scraping
Web scraping is a term for various methods used to collect information from across the Internet. Generally, this is done with software that simulates human web browsing to collect specified bits of information from different websites. (Techopedia)
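The step of collecting "specified bits of information" can be sketched with Python's standard-library `html.parser`. The page, the `price` class name, and the `PriceScraper` class are hypothetical; downloading a real page (for example with `urllib.request`) is omitted so the example runs offline.

```python
from html.parser import HTMLParser

class PriceScraper(HTMLParser):
    """Collect the text of every <span class="price"> element."""

    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())

# A static page stands in for one downloaded from a website.
page = """
<ul>
  <li>Widget <span class="price">$4.99</span></li>
  <li>Gadget <span class="price">$12.50</span></li>
</ul>
"""
scraper = PriceScraper()
scraper.feed(page)
print(scraper.prices)  # ['$4.99', '$12.50']
```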