What does Data Deduplication do?
Scans files, divides those files into chunks, and retains only one copy of each chunk.
After deduplication, files are not stored as independent data; instead, each file is replaced with what, which points to the data in the common chunks?
A stub
Where are files kept after deduplication?
In common chunks
Data Deduplication may do what to overall disk performance?
Improve it
Data Deduplication runs as a
scheduled task that can have minimum file age requirements set
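A minimal PowerShell sketch of enabling and tuning deduplication on a volume (the drive letter E: and the three-day age threshold are illustrative assumptions):

# Enable deduplication on the volume; Default is the general-purpose usage type.
Enable-DedupVolume -Volume "E:" -UsageType Default

# Only optimize files that have not changed for at least 3 days.
Set-DedupVolume -Volume "E:" -MinimumFileAgeDays 3

# Inspect the schedules the role created for this server.
Get-DedupSchedule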
The components of the Data Deduplication role
Filter Driver
Deduplication service
Garbage Collection
Filter Driver
Monitors local or remote input/output (I/O) and manages the chunks of data on the file system by interacting with the various jobs. There is one for every volume.
Deduplication service
Consists of multiple jobs that perform both deduplication and compression of files according to the data deduplication policy for the volume. After the initial optimization of a file, if the file is then modified and meets the data deduplication policy threshold for optimization, the file is optimized again.
Garbage collection
Consists of jobs that process deleted or modified data on the volume so that any data chunks no longer being referenced are cleaned up. This job processes previously deleted or logically overwritten optimized content to create usable volume free space. When an optimized file is deleted or overwritten by new data, the old data in the chunk store isn't deleted immediately. Garbage collection can be scheduled or run manually.
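A short PowerShell sketch of triggering garbage collection by hand (the volume letter E: is an assumption):

# Reclaim space from chunks that are no longer referenced.
Start-DedupJob -Volume "E:" -Type GarbageCollection

# Watch the progress of running deduplication jobs.
Get-DedupJob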
Data Deduplication has built-in data integrity features such as
checksum validation and metadata consistency checking
Data Deduplication will try to rebuild corrupted data by using
Backup copies
Mirror Image
New Chunk
Backup copies
Deduplication keeps backup copies of popular chunks (chunks referenced over 100 times) in an area called the hotspot. If the working copy suffers soft damage such as bit flips or torn writes, deduplication uses its redundant copy.
Mirror image
If using mirrored Storage Spaces, deduplication can use the mirror image of the redundant chunk to serve the I/O and fix the corruption.
New chunk
If a file is processed with a chunk that is corrupted, the corrupted chunk is eliminated, and the new incoming chunk is used to fix the corruption.
Because of the additional validations in deduplication, it may be one of the first systems to report any early signs of
data corruption in the hardware or file system
What does unoptimization do?
Undoes deduplication on all the optimized files on the volume.
Some scenarios that call for unoptimization
decommissioning a server with volumes enabled for Data Deduplication, troubleshooting issues with deduplicated data, or migration of data to another system that doesn't support Data Deduplication.
What should you do before running unoptimization?
Run Disable-DedupVolume in Windows PowerShell
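A PowerShell sketch of the sequence (volume E: is assumed; make sure the volume has enough free space to hold the fully rehydrated files before starting):

# Stop new data from being deduplicated on the volume.
Disable-DedupVolume -Volume "E:"

# Rehydrate all optimized files on the volume.
Start-DedupJob -Volume "E:" -Type Unoptimization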
Three main types of data deduplication
source, target (or post-process) deduplication, and in-line (or transit) deduplication
What are optimized files?
Files that are stored as reparse points and that contain pointers to a map of the respective chunks in the chunk store that are needed to restore the file when it's requested.
Chunk store
The location for the optimized file data
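A one-line PowerShell sketch for inspecting the chunk store on a deduplicated volume (E: is an assumption):

# Reports chunk counts and chunk store sizes for the volume.
Get-DedupMetadata -Volume "E:"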
Data deduplication is designed to be applied on
primary data volumes
Data deduplication can be scheduled based on
the type of data that is involved, and the frequency and volume of changes that occur to the volume or particular file types.
Data Deduplication should be considered for the following data types
General file shares, software deployment shares, VHD Libraries, VDI deployments, and Virtualized Backup
General file shares are
These include group content publication and sharing, user home folders, and Folder Redirection/Offline Files.
Software deployment shares are
These are software binaries, images, and updates.
VHD Libraries are
These are Virtual Hard Disk (VHD) file storage for provisioning to hypervisors.
VDI deployments are
These are Virtual Desktop Infrastructure (VDI) deployments using Microsoft Hyper-V.
Virtualized backup is
These include backup applications running as Hyper-V guests and saving backup data to mounted VHDs.
When applied to the correct data, deduplication can save
50 to 90 percent of a system's storage
What is an example of a bad candidate file for Data Deduplication?
Files that are often changed and accessed by users or applications
How can you see what savings Data Deduplication will give you?
By using the Deduplication Evaluation Tool (DDPEval.exe)
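A sketch of running the evaluation tool against a share (the path is illustrative; DDPEval.exe is installed under System32 when the Data Deduplication role is added):

# Estimate potential savings without changing any data.
& "$env:SystemRoot\System32\DDPEval.exe" "E:\Shares"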
What savings can you expect to see on user documents?
30 to 50 percent
What savings can you expect to see on software deployment shares?
70 to 80 percent
What savings can you expect to see on virtualization libraries?
80 to 95 percent
What savings can you expect to see on general file shares?
50 to 60 percent
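Once deduplication has run, the actual savings can be checked with PowerShell (a sketch; the property names follow the Deduplication module):

# Per-volume savings already achieved.
Get-DedupStatus | Format-Table Volume, SavedSpace, OptimizedFilesCount
Get-DedupVolume | Format-Table Volume, SavingsRate, SavedSpace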
Ideal candidates for deduplication
Folder redirection servers
Virtualization depot or provisioning library
Software deployment shares
Microsoft SQL Server and Microsoft Exchange Server backup volumes
Scale-out File Servers (SOFS) Cluster Shared Volumes (CSVs)
Virtualized backup VHDs (for example, Microsoft System Center Data Protection Manager)
Virtual Desktop Infrastructure (VDI) VHDs (only personal VDIs)
Non-ideal candidates for deduplication
Microsoft Hyper-V hosts
Windows Server Update Service (WSUS)
SQL Server and Exchange Server database volumes
Data Deduplication interoperability
Windows BranchCache
You can optimize access to data over the network by enabling BranchCache on Windows Server and Windows client operating systems. When a BranchCache-enabled system communicates over a wide area network (WAN) with a remote file server that's enabled for Data Deduplication, all the deduplicated files are already indexed and hashed, so requests for data from a branch office are computed quickly. This is similar to preindexing or prehashing.
You shouldn't create a hard quota on a volume root folder enabled for Data Deduplication; instead you should use
A soft quota
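A PowerShell sketch of creating a soft quota with File Server Resource Manager (the path and size are assumptions; requires the FSRM role to be installed):

# A soft quota monitors usage without blocking writes.
New-FsrmQuota -Path "E:\" -Size 1TB -SoftLimit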
Data Deduplication is compatible with
Distributed File System (DFS) Replication
Distributed File System (DFS) replication works by
remote differential compression
You can back up and restore individual files and full volumes using
Data Deduplication
You can create optimized file-level backups/restores using
The Volume Shadow Copy Service (VSS) writer