MDF, NDF, LDF, Temp DB files and Filegroups.

MDF: Primary database files: This is where everything in the database is normally stored. Where there are multiple database files (see .NDF files below), the .MDF file will always store the database’s internal configuration information and other internal system data. A database can have only one primary database file, and no two databases can share the same primary database file. By default, SQL Server will create one primary data file and one log file. The primary database file has a .mdf extension.

NDF: Secondary database files: Secondary database files are used to store all data that does not fit in the primary database file. Secondary files can be used to spread data across multiple disks by putting each file on a different disk drive. By default, SQL Server will not create an .ndf file. NDF files can be created when the database is created, or added afterwards using the ALTER DATABASE statement. A secondary database file has a .ndf extension.
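
As a hedged sketch (the database name, logical file name, path, and sizes are purely illustrative), adding a secondary data file to an existing database looks like this:

    ALTER DATABASE SalesDB
    ADD FILE
    (
        NAME = SalesDB_Data2,                        -- logical file name
        FILENAME = 'E:\SQLData\SalesDB_Data2.ndf',   -- physical path, ideally on a separate disk
        SIZE = 10GB,
        FILEGROWTH = 1GB
    );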

LDF: Transaction log files: Every database contains one primary database file and at least one transaction log file (log files are written to sequentially, so there is no benefit in having more than one unless you exceed the maximum log file size (2TB) between backups). The transaction log files hold the log information used to recover the database. The transaction log is a serial record of all modifications that have occurred in the database, as well as the transaction that performed each modification. It records the start of each transaction, the changes made to the data, and enough information to undo those changes later if necessary. Data and transaction log information are never mixed in the same file, and every database must include at least one transaction log file. Transaction log files have a .ldf extension.

Temp DB is a special system database used in certain key operations, and it has a big impact on your overall system’s performance. Tempdb handles a lot of system tasks as well as user-created temp tables. Query operations like joins and aggregations do their work there, and online index rebuilds and INSTEAD OF triggers also use tempdb. By default, tempdb installs with just one data file and one log file.

If your tempdb is on a shared disk, it is probably best to move it to a local disk. There are two reasons why you might want to do this, and both are related to performance.

The first reason is that the relatively recent increase in cost-effective, ultra-fast solid-state storage presents an opportunity to achieve significant performance gains on servers experiencing heavy tempdb usage. The challenge was that solid-state storage cards plug straight into a server’s motherboard to avoid the overhead of traditional storage buses*, which made them very difficult to use in failover cluster instances prior to SQL Server 2012; now they can be used for the discrete task of running tempdb.

The second reason you might want to use a local tempdb is to take its I/O requests off your shared storage, improving the performance of that shared storage for everything else.

*Until SQL Server 2012, a failover cluster instance of SQL Server required all its database files to be on shared disk resources within the cluster. This was to ensure that when the instance failed over to another node in the cluster, all its dependent disks could be moved with it. However, nothing in tempdb persists after a restart; it is effectively recreated every time. The failover process for a clustered instance involves a restart of SQL Server, so nothing in tempdb needs to be moved across to the other node, and there is no technical reason why tempdb should be on a shared disk.

A filegroup in SQL Server is a named collection of one or more files that forms a single unit for data allocation or administrative purposes. For example, two files can be created on two separate disks (say, f1 and f2) and assigned to the filegroup filegroup1. Filegroups let you target specific operations – primarily backups and restores – at a single filegroup, and so affect all of its files at once.
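
As an illustrative sketch (the database name, file paths, and sizes are made up; filegroup1, f1, and f2 follow the example above), creating the filegroup and assigning two files on separate disks to it could look like this:

    ALTER DATABASE SalesDB
    ADD FILEGROUP filegroup1;

    ALTER DATABASE SalesDB
    ADD FILE
    (
        NAME = f1, FILENAME = 'F:\SQLData\f1.ndf', SIZE = 10GB
    ),
    (
        NAME = f2, FILENAME = 'G:\SQLData\f2.ndf', SIZE = 10GB
    )
    TO FILEGROUP filegroup1;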

Let’s say that, for performance reasons, you’ve split your database across three .NDF files in addition to the main .MDF file. Each of those four files is located on a separate disk volume, and each volume is accessed by a different disk controller. That’s great for parallel operations, and it should help performance in the system. But now you have to back up and recover four files, making a lot more work for yourself. Instead, you can group those into a single filegroup, and then run backups against that filegroup, grabbing all four files at once.
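
A minimal sketch of such a filegroup backup (the database name and backup path are illustrative; FILEGROUP is standard BACKUP DATABASE syntax):

    BACKUP DATABASE SalesDB
    FILEGROUP = 'PRIMARY'
    TO DISK = 'H:\Backups\SalesDB_primary_fg.bak';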

Another option for using filegroups is to assign tables or indexes to different FILEGROUPs and then place the files for those FILEGROUPs on different physical drives, RAID arrays, or LUNs. This allows SQL Server to engage in parallel I/O operations that can help speed up the execution of more complex queries. Two common techniques for the use of multiple FILEGROUPs and files (a brief sketch of the first follows the list) are:

  • Placing tables on the PRIMARY FILEGROUP, then creating key non-clustered indexes on an ‘INDEX’ FILEGROUP whose files are spread across other physical locations. This allows SQL Server to engage one set of drives (or spindles) for bookmark lookups against the table itself, while using a completely different set of drives/spindles to seek or scan against an index – rather than waiting on the same I/O subsystem from a single set of drives/spindles to queue those requests and process them as needed.
  • Putting heavily used (frequently JOINed) tables in a ‘SECONDARY’ FILEGROUP whose files are placed on different physical drives/spindles. This allows commonly used JOINs to complete more quickly because the underlying I/O operations are handled by different drives or sets of drives.
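
Here is a hedged sketch of the first technique; the table, columns, and the [INDEXES] filegroup name are illustrative, and that filegroup is assumed to have been created and given files as shown earlier:

    -- Table (and its clustered primary key) on the default PRIMARY filegroup
    CREATE TABLE dbo.Orders
    (
        OrderID     INT IDENTITY PRIMARY KEY,
        CustomerID  INT NOT NULL,
        OrderDate   DATETIME NOT NULL
    ) ON [PRIMARY];

    -- Key non-clustered index placed on a separate filegroup
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
        ON dbo.Orders (CustomerID)
        ON [INDEXES];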

File placement.

It’s best practice to separate data, transaction logs, and tempdb. This recommendation rests on separating the different types of workload onto different physical storage, i.e. separate physical disks.

This is still a valid recommendation for environments where you can guarantee that separation, but more commonly we see SQL Server deployed in a shared storage environment, where physical separation is much harder to achieve and usually isn’t even necessary for performance reasons.

It is still a good idea, however, to maintain separation to help with manageability, so that potential problems are easier to isolate. For example, separating tempdb onto its own logical disk means that you can pre-size it to fill the disk without worrying about space requirements for other files, and the more separation you implement, the easier it is to associate logical disk performance with specific database files.

The general objectives are as follows:

  1. Optimize parallelism of IO.
  2. Isolate different types of IO that may otherwise cause a bottleneck or additional latency for each other, such as OS and page file IO from database IO, or sequential log file IO from random data file IO.
  3. Minimize management overheads by using the minimum number of drive letters or mount points required to achieve acceptable performance.

In order to achieve objectives 1 and 2, we recommend splitting out data files and Temp DB files from log files onto separate drive letters or mount points. This has the effect of killing two birds with one stone. By separating log files into their own drive or mount point, you maintain the sequential nature of their IO access pattern and can optimize this further at the hypervisor and physical storage layer later if necessary. If the log files share a drive or mount point with data files, the access pattern of that device will instantly become random, and random IO is generally harder for storage devices to service. At the same time, you are able to increase the parallelism needed for the IO patterns of the data files and Temp DB files.

To achieve greater IO parallelism at the database and operating system layer, you need to allocate more drive letters or mount points. The reason for this is that each storage device (mount point or drive) in Windows has a certain queue depth, depending on the underlying IO controller type being used. Optimizing the total number of queues available to the database by using multiple drives or mount points allows more commands to be issued to the underlying storage devices in parallel. We will discuss the different IO controllers and queue depths in detail later.

As a starting point for standalone database instances, we recommend that you configure a drive letter or mount point per two data files and one Temp DB file. This recommendation assumes each file will not require the maximum performance capability of the storage device at the same time. The actual number of drive letters or mount points you need will be driven by your actual database workload, but having fewer drives and mount points will simplify your design and make it easier to manage.

The more users, connections, and queries you have, the higher the IO, queue depth, and parallelism requirements will be, and the more drive letters and mount points you will need.

Number of files.

The usual recommendation from Microsoft is one data file per CPU core, but that is a generalisation; a more accurate recommendation is as follows:

[Table: recommended number of data files by number of CPU cores]

Some people recommend having the number of tempdb data files equal to 1/4 to 1/2 the number of logical processor cores, and this may be perfectly fine; it all depends on your setup and workload, but the table above is a good place to start.
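
If you want to see where a server currently stands, a small sketch like the following compares logical cores with the number of tempdb data files (sys.dm_os_sys_info and tempdb.sys.database_files are existing system views; the comparison itself is only a starting point, per the guidance above):

    SELECT
        (SELECT cpu_count FROM sys.dm_os_sys_info)        AS logical_cores,
        (SELECT COUNT(*)
         FROM tempdb.sys.database_files
         WHERE type_desc = 'ROWS')                        AS tempdb_data_files;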

Here is an example of data files, Temp DB files, and transaction log files allocated to a SQL Server 2012 Database on a sample system with four CPU cores and 32GB RAM.

[Image: example data file, Temp DB file, and transaction log file locations for the sample system]
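
To make a layout like that concrete, here is a hedged sketch of creating such a database; the drive letters, paths, logical names, and sizes are all illustrative, and tempdb would be configured separately with ALTER DATABASE tempdb:

    CREATE DATABASE SalesDB
    ON PRIMARY
        (NAME = SalesDB_Data1, FILENAME = 'E:\SQLData\SalesDB_Data1.mdf', SIZE = 50GB),
        (NAME = SalesDB_Data2, FILENAME = 'F:\SQLData\SalesDB_Data2.ndf', SIZE = 50GB),
        (NAME = SalesDB_Data3, FILENAME = 'G:\SQLData\SalesDB_Data3.ndf', SIZE = 50GB),
        (NAME = SalesDB_Data4, FILENAME = 'H:\SQLData\SalesDB_Data4.ndf', SIZE = 50GB)
    LOG ON
        (NAME = SalesDB_Log, FILENAME = 'L:\SQLLogs\SalesDB_Log.ldf', SIZE = 20GB);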

When formatting a partition you are given the option to choose the allocation unit size, which defaults to 4KB. Allocation units determine both the smallest size the file will take on disk and the size of the Master File Table ($MFT). If a file is 7KB in size, it will occupy two 4KB allocation clusters.

SQL Server files are usually large. Microsoft recommends the use of a 64KB allocation unit size for data, logs, and tempdb. If you use allocation units greater than 4KB, Windows will disable NTFS data compression (this does not affect SQL Server compression).

Database file size.

Data files, unlike transaction log files, are accessed in parallel and the I/O pattern is more random. Temp DB files are accessed in parallel in a round-robin fashion. This is why having more database files improves the parallelism of I/O access to storage. In effect, the I/O is striped across the data files.

It is important to configure your database files to be equal size to start with. SQL Server will write data to the available files evenly if they are the same size, as it uses a proportional fill algorithm that favours allocations in files with more free space. If the files are the same size, then each file will have the same free space. Having equally sized files ensures even growth and more predictable performance.

Also, you should pre-allocate all your data files and transaction log files; it is a Microsoft best practice to manually and proactively manage file sizes, and if you are pre-sizing your files you shouldn’t need to rely on autogrowth at all. The reasons to avoid autogrowth are:

  • Autogrowth puts a significant burden on a server, and when it happens during production hours your users will be impacted.
  • Autogrowth results in physical fragmentation of the database files, which slows down database access. This is because disk space is requested from the OS in chunks (set by the autogrowth size you’ve configured), and those chunks won’t all be lined up in contiguous space.
  • Autogrowth isn’t managed. Either you’ll allow unlimited growth, which will eventually consume available disk space, or you’ll cap the growth, forcing you to deal with the capacity problem anyway.

When a database starts to need more room, you should plan to manually increase its size during a scheduled maintenance window. That maintenance window can involve taking the database offline so that Windows can defragment the physical layout of the database file. Autogrowth is a good last-ditch emergency measure (so leaving it on by default isn’t a bad thing), but your goal should be for it to never happen. If you do allow autogrow, using fixed growth amounts rather than percentages is generally the better approach because it makes autogrow events more predictable. If you’ve configured multiple data files and you want to allow autogrow, enable trace flag 1117, which will force all data files to grow uniformly so you don’t break the load balancing between files.
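
As a hedged sketch, trace flag 1117 can be switched on for the running instance with DBCC TRACEON; making it permanent normally means adding -T1117 as a startup parameter (and from SQL Server 2016 the same behaviour is controlled per filegroup with the AUTOGROW_ALL_FILES option instead):

    -- Enable trace flag 1117 globally until the next restart
    DBCC TRACEON (1117, -1);

    -- Check whether it is active
    DBCC TRACESTATUS (1117);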

For data files, the preset size you should use is based on the estimated or actual size of your database. You should allow for realistic estimated growth (three to six months). Once you have the total estimated size of your database, including growth, divide it by the number of files to get the size of each file (read ‘MDF, NDF, LDF, Temp DB files and Filegroups’ to work out how many files your database should have). For example, if you had a database 200GB in size with four cores configured, you would have four data files, assuming one file per core, with a preset size of 50GB each. Each data file should always be of equal size and be extended at the same rate.
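
When the time comes to extend them, a hedged sketch of growing all four files at the same rate (the database, logical file names, and new size are illustrative) is simply:

    ALTER DATABASE SalesDB MODIFY FILE (NAME = SalesDB_Data1, SIZE = 75GB);
    ALTER DATABASE SalesDB MODIFY FILE (NAME = SalesDB_Data2, SIZE = 75GB);
    ALTER DATABASE SalesDB MODIFY FILE (NAME = SalesDB_Data3, SIZE = 75GB);
    ALTER DATABASE SalesDB MODIFY FILE (NAME = SalesDB_Data4, SIZE = 75GB);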

You need to watch how much of each database file is being used and monitor that figure. That way you’ll be able to predict when it will need more room, and you’ll also know roughly how much room to give it so that it lasts for a specified amount of time. To perform that monitoring, you can look at the databases’ properties in SQL Server Management Studio every few days. You’ll see the file sizes and, if you like, you can record them in a spreadsheet; that way, you can have the spreadsheet generate charts that show you database growth over time.
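
If you’d rather script the check than click through Management Studio, a minimal sketch such as this (run in the database you want to measure; sys.database_files and FILEPROPERTY are standard, and the rounding is just for readability) returns the size and used space of each file:

    SELECT
        name                                         AS logical_file,
        type_desc                                    AS file_type,
        size * 8 / 1024                              AS size_mb,
        FILEPROPERTY(name, 'SpaceUsed') * 8 / 1024   AS used_mb
    FROM sys.database_files;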

TempDB initial sizing.

A default installation of any SQL Server edition will create a tempdb database with an 8MB data file and a 1MB transaction log file, both configured to autogrow by 10% as needed. In a lot of installations these file sizes won’t be enough. With normal data files autogrowth is not desired, but with tempdb it is even more troublesome. When you restart your SQL Server instance, tempdb is re-created and sized to the value specified in the database properties, which by default is only 8MB for the data file and 1MB for the log file. Even if you resize the tempdb files, you must still monitor them and avoid autogrowth; here’s why:

[Image: tempdb file sizes at the defaults, after a manual resize, after autogrowth, and after a restart]

Here we can see that the tempdb files were initially at the default sizes. The DBA then resized them to 300MB and 55MB, but autogrowth increased their sizes to 2500MB and 600MB. SQL Server was then restarted (possibly as a result of failing over), and the tempdb files were re-created at the sizes the DBA assigned; the autogrowth that had increased the file sizes is not taken into account, and we are now left with tempdb files that are far too small. So what size should tempdb be set to?

If you can give tempdb its own disk, then configure it to almost fill the drive. If nothing else will ever be on the drive, then you’re better off setting it to be larger than you’ll ever need. There’s no performance penalty, and you’ll never have to worry about autogrow again.

If you can’t put tempdb on its own disk, then you’ll need to manually manage size and autogrow. You could let it autogrow for a while and then manually set it to be a bit larger than the size it grows to, or you could simply make it a reasonable size in relation to your other databases and set large autogrow amounts.
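
As a hedged sketch of that manual resize (tempdev and templog are the default logical file names, but check yours in tempdb.sys.database_files; the sizes and fixed growth increments below are purely illustrative):

    USE master;

    ALTER DATABASE tempdb
    MODIFY FILE (NAME = tempdev, SIZE = 4GB, FILEGROWTH = 512MB);

    ALTER DATABASE tempdb
    MODIFY FILE (NAME = templog, SIZE = 1GB, FILEGROWTH = 256MB);

The new sizes are re-applied every time tempdb is re-created at instance startup.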