Listed below you will find some of the most common terms we use in Data Governance and Master Data Management Projects. See some missing? Let us know!

Algorithm – a set of instructions in computer programming that is designed to perform a specific task or produce a specific result. A simple algorithm would be the COUNT function in Excel. Algorithms are designed to behave like small programs that can be referenced by a larger program. Machine learning and data mining algorithms are used to create analytics platforms.
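
For illustration, a minimal Python sketch of such an algorithm, mirroring the spirit of Excel's COUNT (the function name and sample values are made up):

```python
def count_numbers(values):
    """Return how many items in the list are numeric, similar in spirit to Excel's COUNT."""
    total = 0
    for value in values:
        if isinstance(value, (int, float)):
            total += 1
    return total

print(count_numbers([3, "n/a", 7.5, None, 12]))  # prints 3
```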

Apache Hive – a widely used data warehouse system that serves as a central repository for data. It is built on top of Hadoop, the framework that distributes the storage and processing of large data sets for businesses. Users often aggregate and refine data with Apache Spark, land it in a distributed file system such as HDFS, and then load it into Apache Hive for centralized storage and SQL-like querying.
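
As a rough sketch of that flow, the PySpark snippet below aggregates raw records and saves the result as a Hive table; the file path and table name are hypothetical, and a running Spark cluster with Hive support is assumed:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-example")
         .enableHiveSupport()   # lets Spark read from and write to Hive
         .getOrCreate())

# Read raw order records from the distributed file system (hypothetical path).
orders = spark.read.csv("hdfs:///data/raw/orders.csv", header=True, inferSchema=True)

# Aggregate and refine: total order amount per day.
daily_totals = orders.groupBy("order_date").agg(F.sum("amount").alias("total_amount"))

# Send the refined data on to Hive for centralized storage (hypothetical table name).
daily_totals.write.mode("overwrite").saveAsTable("analytics.daily_order_totals")
```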

Big Data – the diverse, huge set of information and data points that businesses collect during operations. It is typically so large and complex in nature that it cannot be processed by traditional database management tools. In order to make this data useful and valuable, businesses must apply new and innovative information processing, computational and analytical techniques and tools to derive meaningful patterns and insights from which decisions can be made.

Business Analytics – a set of solutions such as data mining, statistics and predictive modeling that are used to create business scenarios, understand the current state of the business and forecast future business outcomes. Business analytics are packaged into various applications and solutions that are targeted at different types of business users and needs.

Business Intelligence –  the collection of applications, infrastructure, tools, and procedures that enable access to and analysis of information to improve and optimize decisions and performance.

Chief Data Officer (CDO) – a senior corporate executive who is responsible for an organization’s data governance, use of information and data assets and data processing. The CDO usually reports to the CEO.

Cloud Computing – the use of various services, such as software development platforms and especially servers, to store, process and access computer data via the internet rather than through a direct connection to a server. Cloud computing makes it possible to store data, files, and software in a remote location and to access them from anywhere there is a web browser and a connection.

Dark Data – the information and data that organizations collect, process and store during business operations but do not use, retaining it instead for purposes such as regulatory compliance. It is usually unstructured and untagged and is mostly ignored by the organization.

Data Architecture – a formal set of rules, policies, standards and models designed and managed by a data architect that govern and define the type of data collected and how it is used, stored, managed and integrated within an organization and its IT systems.

Data Cleansing – the process of carefully reviewing data sets and data storage protocols and subsequently altering data in a storage repository to ensure that it is accurate, correct and therefore useful and potentially valuable.
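
A minimal cleansing sketch using pandas (the records and rules below are made up for illustration):

```python
import pandas as pd

raw = pd.DataFrame({
    "customer_id": [101, 101, 102, 103],
    "email": ["a@example.com", "a@example.com", "  B@EXAMPLE.COM ", None],
    "country": ["US", "US", "usa", "U.S."],
})

cleaned = (
    raw.drop_duplicates()                     # remove exact duplicate rows
       .assign(
           email=lambda df: df["email"].str.strip().str.lower(),                   # normalize emails
           country=lambda df: df["country"].replace({"usa": "US", "U.S.": "US"}),  # standardize codes
       )
       .dropna(subset=["email"])              # drop rows missing a required field
)
print(cleaned)
```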

Data Context – the network of connections among data points that give the data meaning, value and relevance. For example, numbers without a correlated context, such as “time” or “place”, are not meaningful and thus cannot inform a business or provide insights. The addition of context to data is particularly crucial for realizing value from Big Data.

Data Discovery – a user-driven process of searching for and extracting patterns from multiple sources of data. Data discovery applications typically present data in a visual format such as a dashboard, geographic map or pivot table. Data discovery relies on technologies that enable the aggregation of Big Data.

Data Element  – the smallest unit of information defined by size and type for processing, which therefore makes it meaningful and usable. It is the basic building block of a data model and is called a “data field” in a database. An example is ACCOUNT NUMBER.
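
As an illustration, a data element could be described in code like this (the name, type and length shown are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class DataElement:
    name: str
    data_type: str
    max_length: int
    description: str

account_number = DataElement(
    name="ACCOUNT_NUMBER",
    data_type="string",
    max_length=10,
    description="Unique identifier assigned to a customer account",
)
print(account_number)
```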

Data Governance – a framework that lays out decision rights, accountabilities and formal management procedures for information-related processes. Its purpose is to ensure that high-quality data with usability, security and integrity exists throughout an organization. Governance describes the who, what, when, why and how of data management and data-related decisions.

Data Lake – a method of storing data in its natural format alongside a collection of data assets that are near-exact or exact copies of the source data. The purpose of a data lake is to give analysts a view of raw data for exploring refinement and analysis techniques, while also storing the transformed data used for tasks such as reporting or analytics in a single repository.

Data Mining – the process of sifting through very large amounts of data and information stored in repositories in order to uncover useful and insightful relationships, patterns and trends. Data mining relies on techniques from multiple disciplines, such as advanced statistics and mathematics.
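
A toy sketch of the idea: count which pairs of products most often appear together across transactions, a much-simplified version of frequent-pattern mining (the baskets below are made up):

```python
from collections import Counter
from itertools import combinations

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):   # every pair of items in the basket
        pair_counts[pair] += 1

print(pair_counts.most_common(3))   # ('bread', 'butter') turns out to be the strongest pattern
```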

Data Modeling – a process used to define and analyze the data requirements needed to support business processes. In the data modeling process, professional data modelers work closely with business stakeholders and potential users of an information system. Data modeling techniques and methodologies are used to model data in a standard, consistent, predictable manner across systems so that users can manage it as a resource.

Data Preparation – the necessary but often tedious process of collecting, cleaning, structuring and organizing data into one file so that it can be used in analysis. Data preparation is most often used when handling messy or un-standardized data, combining data from multiple sources, reporting on data that was entered manually, or dealing with data scraped from an unstructured source such as PDF documents. In big data applications it is largely supported by machine learning algorithms.
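
A minimal preparation sketch with pandas, combining two hypothetical sources into one analysis-ready table:

```python
import pandas as pd

crm = pd.DataFrame({"customer_id": [1, 2, 3],
                    "name": ["Ada", "Grace", "Alan"]})
billing = pd.DataFrame({"customer_id": [1, 2, 2],
                        "invoice_amount": [120.0, 80.0, 45.5]})

# Merge the sources on the shared key and fill in customers with no invoices.
prepared = (crm.merge(billing, on="customer_id", how="left")
               .fillna({"invoice_amount": 0.0}))
print(prepared)
```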

Data Protection Officer (DPO) – a cybersecurity position within a corporation whose role is to develop privacy and data protection policies and to ensure the proper management and use of customers’ or clients’ information. The DPO role is required by the European Union as part of its General Data Protection Regulation (GDPR). The DPO ensures adherence to data protection laws and data-related compliance by conducting internal assessments.

Data Repository –  a general term used to refer to a destination designated for data storage. A repository can be a particular kind of setup within an overall IT structure, such as a group of databases, that keeps a population of data isolated or partitioned so that it can be mined. It is also commonly called data warehousing.

Data Steward – a person with data-related responsibilities, including management, protection and oversight of an organization’s information assets, that are typically outlined in a Data Governance or Data Stewardship program.  Types of Data Stewards include Data Quality Stewards, Data Definition Stewards, Data Usage Stewards, etc.

Data Warehouse – a storage architecture designed to hold structured data extracted from operational systems and external sources. The warehouse combines that data in an aggregate, summary form so that it can be used for business intelligence and reporting purposes. Also see Data Repository.
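
For illustration, the roll-up from operational records to a summary table can be sketched like this (data and column names are made up):

```python
import pandas as pd

operational = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "sale_amount": [100, 250, 75, 125, 300],
})

# Aggregate operational detail into the summary form a warehouse typically holds.
summary = (operational.groupby("region", as_index=False)
                      .agg(total_sales=("sale_amount", "sum"),
                           order_count=("sale_amount", "count")))
print(summary)
```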

Enterprise Architecture – a comprehensive framework or conceptual blueprint used to manage and align an organization’s structure, business processes, information technology, local and wide area networks, people, operations and projects to achieve the organization’s overall strategy.

Enterprise Metadata Management (EMM) – the business processes that provide control and visibility for managing the metadata about the information assets of the organization.

General Data Protection Regulation (GDPR) – a regulation enacted by the European Parliament, the Council of the European Union and the European Commission to strengthen privacy and data protection for individuals within the European Union (EU). GDPR aims to allow EU residents to maintain total control over their personal data on a global basis, and to simplify the regulatory environment for international businesses by unifying data privacy regulations within the EU. In addition it addresses the export of personal data outside the EU. The law becomes enforceable from 25 May 2018 after a two-year transition period. Violations carry steep penalties.

Hadoop – an Apache Software Foundation project that provides a free, open-source programming framework supporting the distributed processing and storage of extremely large data sets, both structured and unstructured. Because it allows for the scalable handling of large data sets, Hadoop is often associated with the concept of Big Data.
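
The classic illustration of Hadoop's programming model is the MapReduce "word count"; the sketch below mimics the map, shuffle and reduce phases in plain Python on a single machine, whereas Hadoop distributes the same steps across a cluster:

```python
from collections import defaultdict

documents = ["big data needs big storage", "hadoop stores big data"]

# Map phase: emit (word, 1) pairs.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle phase: group the emitted values by key.
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# Reduce phase: sum the counts for each word.
reduced = {word: sum(counts) for word, counts in grouped.items()}
print(reduced)   # {'big': 3, 'data': 2, ...}
```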

Internet of Things –  the networking capabilities that allow physical devices such as equipment, machines, appliances and other items that contain embedded electronics, software, sensors, and network connectivity to exchange data using the internet. Each item is uniquely identifiable through its embedded computing system.

Machine Learning – an artificial intelligence (AI) discipline geared toward allowing software applications to become more accurate in predicting outcomes without being explicitly programmed to do so.  Machine learning allows computers to search for patterns and trends in data and self-adjust accordingly.  Computers learn to handle new situations via analysis, self-training, observation and experience through exposure to new scenarios, testing and adaptation.
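
A minimal scikit-learn sketch of the idea: fit a model on example data, then predict an unseen case (the numbers are made up):

```python
from sklearn.linear_model import LinearRegression

# Features: advertising spend; target: observed sales (hypothetical figures).
X = [[10], [20], [30], [40]]
y = [25, 45, 65, 85]

model = LinearRegression()
model.fit(X, y)                  # the model learns the pattern from the data
print(model.predict([[50]]))     # predicts sales for a spend it has never seen
```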

Master Data – non-transactional, non-quantitative data units that can be used in a business across a variety of IT platforms and software programs. Master Data includes core data elements such as customer ID, addresses, supplier accounts, products, and other “nouns”.

Master Data Governance Model – A Master Data Governance Model is the collection of people, policies and procedures required to govern an organization’s Master Data. The implementation of a Master Data Governance Model helps ensure the effective application of Master Data Management across an organization’s critical data.

Master Data Management (MDM) – a structured, technology-based approach to defining and managing an organization’s Master Data and other critical data. Through MDM, the business along with IT defines policies, standards and tools that ensure the uniformity, accuracy, stewardship, consistency and accountability of the organization’s shared master data assets.
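
A toy sketch of one MDM task, consolidating duplicate customer records from different systems into a single "golden record" per customer ID; the matching rule used here (keep the most recently updated version) is a simplifying assumption:

```python
records = [
    {"customer_id": "C001", "name": "ACME Corp.", "city": "Boston", "updated": "2017-03-01"},
    {"customer_id": "C001", "name": "Acme Corporation", "city": "Boston", "updated": "2018-01-15"},
    {"customer_id": "C002", "name": "Globex", "city": "Springfield", "updated": "2017-11-30"},
]

golden = {}
for record in records:
    key = record["customer_id"]
    # Survivorship rule: keep the record with the latest update date.
    if key not in golden or record["updated"] > golden[key]["updated"]:
        golden[key] = record

for master_record in golden.values():
    print(master_record)
```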

Master Data Management Model – A Master Data Management Model is the methodology or collection of best practices used to implement Master Data Management across an organization’s shared critical data. A Master Data Management model typically includes guidelines for implementation, as well as prototypes and templates for various Master Data Management components.

Metadata – data that answers the questions who, what, where, when, why and how about other data. Examples of metadata provide information about a data element, such as what database stores it, what type of data it is, and how long its field is.
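
An illustrative metadata record for a single data element (the field names and values are hypothetical):

```python
metadata = {
    "element_name": "ACCOUNT_NUMBER",
    "stored_in": "crm_database.customers",   # what database stores it
    "data_type": "string",                   # what type of data it is
    "field_length": 10,                      # how long its field is
    "owner": "Finance data steward",
    "last_updated": "2018-04-02",
}
print(metadata["data_type"], metadata["field_length"])
```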

MetaDirectory –  a centrally managed data integration service that permits the flow of data and data transactions between multiple directories, or between directories and databases. Metadirectory products and services include features to filter, evaluate and monitor data in transit.

Predictive Analytics – any approach to data mining that extracts and manipulates data variables with the goal of identifying patterns and trends that can be used to predict future outcomes. This approach is more advanced than descriptive or diagnostic analytics and places an emphasis on prediction (rather than description, classification or clustering), speed of analysis, the business relevance of the resulting insights, and ease of use and accessibility for business users.
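
A toy predictive sketch: fit a straight-line trend to past monthly revenue with NumPy and extrapolate the next month (the figures are made up, and real predictive analytics uses far richer models):

```python
import numpy as np

months = np.array([1, 2, 3, 4, 5, 6])
revenue = np.array([100, 108, 115, 124, 130, 139])

slope, intercept = np.polyfit(months, revenue, 1)   # fit a linear trend to the history
forecast = slope * 7 + intercept                    # extrapolate to month 7
print(round(forecast, 1))
```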

Prescriptive Analytics –  the most advanced form of business analytics which examines data or content to answer the question “What should be done?” or “What can we do to make ‘X’ happen?”.  Its purpose is to define the best course of action or solution among various options. It includes techniques such as graph analysis, simulation, complex event processing, neural networks, recommendation engines, heuristics, and machine learning.

Self Service Analytics – an approach to data analytics and business intelligence in which professionals are able to extract data, generate reports and run queries on their own using simple-to-use tools. These tools enable professionals to manipulate data and extract business insights from it without analytics experience or IT support.

Structured Query Language (SQL) – a standard programming language used for relational database management and data manipulation. SQL is used to query, insert, update and modify data. It is regularly used by database administrators, developers and data analysts.
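
A minimal example run against an in-memory SQLite database using Python's standard library (the table and columns are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
cur.execute("INSERT INTO customers (name, city) VALUES (?, ?)", ("Ada", "London"))
cur.execute("UPDATE customers SET city = ? WHERE name = ?", ("Cambridge", "Ada"))

cur.execute("SELECT id, name, city FROM customers")   # query the modified data
print(cur.fetchall())
conn.close()
```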

Structured Data –  data that conform to fixed fields and as such are organized in a format easily used by a database, spreadsheet or other technology.  Users of structured data can anticipate having fixed-length pieces of information and consistent models in order to process that information.

Unstructured Data – any data that does not have a recognizable structure or pre-defined data model. Unstructured data is unorganized, raw data that does not reside in a traditional row/column database. It can be non-textual or textual. The body of an email is an example of unstructured textual data.
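
A quick contrast of the two forms, using made-up content: the same order expressed as a structured record and as unstructured free text:

```python
# Structured: fixed fields that a database or spreadsheet can rely on.
structured_order = {
    "order_id": 1042,
    "customer": "Ada Lovelace",
    "amount": 250.00,
    "currency": "USD",
}

# Unstructured: a raw email body with no pre-defined data model.
unstructured_order = (
    "Hi team,\n"
    "Ada called and wants to order another unit, same as last time, "
    "around $250 I think. Can someone set that up?"
)

print(structured_order["amount"])   # a fixed field is easy to query directly
print(len(unstructured_order))      # free text must be parsed or mined first
```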