
01 Data Storage

Jordy Homing Lam edited this page Oct 23, 2023 · 12 revisions

Semi-Permanent Storage Principle

My laptop, like many other recent laptops on the market, is equipped only with a Solid-State Drive (SSD). There is no Hard-Disk Drive (HDD) for permanent storage of data. Unlike an HDD, an SSD can endure only a limited number of write/erase cycles, also known as program/erase (P/E) cycles. This is because the oxide layer that traps electrons in a NAND flash memory cell deteriorates a little with every P/E cycle. In other words, the SSD will eventually become unreliable: it will wear out and ultimately lose its ability to write data, even though there are mechanisms meant to keep existing data readable after the drive has completely worn out. This has a few implications for ML tasks.

  • SSDs in general offer faster access to large volumes of data than HDDs. This is a big plus.
  • An SSD is also more shock-resistant than a spinning disk drive. Good for a laptop that will be used on a bumpy road.
  • However, we should not frequently update (erase and write) data or models on an SSD. We should reduce the number of P/E cycles. If we observe this rule, we have a nice almost-permanent storage.
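To get a feel for how far a P/E budget goes, here is a back-of-the-envelope sketch. The drive capacity, rated cycle count, write-amplification factor and daily write volume are illustrative assumptions, not measurements of any particular drive; the real endurance rating (often quoted as TBW) should come from the manufacturer's datasheet.

```python
def estimated_lifetime_years(capacity_gb, pe_cycles, daily_writes_gb,
                             write_amplification=2.0):
    """Rough SSD endurance estimate.

    Total host data writable is approximated as
    capacity * P/E cycles / write amplification.
    All inputs are assumptions supplied by the caller.
    """
    if daily_writes_gb <= 0:
        raise ValueError("daily_writes_gb must be positive")
    total_host_writes_gb = capacity_gb * pe_cycles / write_amplification
    return total_host_writes_gb / daily_writes_gb / 365.0


if __name__ == "__main__":
    # Hypothetical numbers: a 512 GB drive rated for 1000 P/E cycles per cell.
    for daily in (20, 200):
        years = estimated_lifetime_years(512, 1000, daily)
        print(f"{daily} GB/day -> about {years:.1f} years")
```

With these made-up figures, writing 20 GB a day gives roughly 35 years, while 200 GB a day (think repeated checkpointing of large models) cuts that to roughly 3.5 years. The point is the ratio, not the absolute numbers.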

Creating Temporary Storage

Wait. I have a habit of downloading programs and arXiv papers with nice graphics, making slides for conferences, and keeping hundreds of revisions of manuscripts. My storage demand grows linearly. How to deal with this? One solution is to buy a 500 GB thumb drive and keep all temporary storage there. This includes many default directories:

  • Web browser: Edge, including Downloads, Cookies, etc.
  • Email: Outlook, including Downloads, Cache, Search results, etc.
  • Office: Microsoft Office, including the Word/PowerPoint autosave location
  • Telegram: Download directory
  • Academia
    • Pymol, including the very huge structure you downloaded by "fetch" last night!
    • Zotero, including the data directory storing pdfs.

A nice tiny thumb drive (pinky drive!) is the SanDisk 512GB Ultra Fit USB 3.1 Flash Drive - SDCZ430-512G-G46. It can stay plugged into the USB slot 24/7. You should expect this drive to die at any time, so never store anything important there.

How about documents, slides, etc.? These have to be stored somewhere safer, right? I would suggest Microsoft's cloud drive.

Creating Symbolic Link

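I also realised that in Windows 10 we can create symbolic links. This makes life easier: a program's default directory can stay where it is while the data actually live on another drive, so we no longer need to edit the path in each program's settings.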
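As a minimal sketch of the idea, the Python function below moves a directory to another drive and leaves a symbolic link at the old path. The function name and the demo paths are hypothetical. On Windows, creating a symbolic link normally requires an elevated prompt or Developer Mode, and the program that owns the directory should be closed first.

```python
import os
import shutil
import tempfile
from pathlib import Path


def relocate_with_symlink(original, new_home):
    """Move `original` under `new_home` and leave a symbolic link behind.

    Returns the new location of the directory.
    """
    original = Path(original)
    if original.is_symlink():
        raise ValueError(f"{original} is already a symbolic link")
    if not original.is_dir():
        raise ValueError(f"{original} is not an existing directory")

    # Use an absolute target so the link does not depend on the current directory.
    new_home = Path(new_home).resolve()
    new_home.mkdir(parents=True, exist_ok=True)
    target = new_home / original.name
    if target.exists():
        raise FileExistsError(f"{target} already exists")

    shutil.move(str(original), str(target))
    # The old path now points at the new location.
    os.symlink(target, original, target_is_directory=True)
    return target


if __name__ == "__main__":
    # Demo inside a throwaway directory; "ssd" and "usb" stand in for real drives.
    with tempfile.TemporaryDirectory() as tmp:
        downloads = Path(tmp) / "ssd" / "Downloads"
        downloads.mkdir(parents=True)
        (downloads / "paper.pdf").write_text("dummy")
        moved_to = relocate_with_symlink(downloads, Path(tmp) / "usb")
        print("data now lives in:", moved_to)
        print("old path still works:", (downloads / "paper.pdf").exists())
```

In real use one would pass something like a browser's download directory as `original` and a folder on the thumb drive as `new_home`; those paths depend on the machine and are not given here.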

How to make SSD last longer?

  • Disable hibernation: Hibernation writes the contents of your RAM to the storage drive, which can lead to frequent write cycles. Disabling hibernation can help reduce write operations on your SSD. To disable it, run this in an elevated (administrator) PowerShell: powercfg -h off

  • AHCI and TRIM commands: You can check and change the AHCI setting in the computer's BIOS. The procedure varies by machine, but on most systems you enter the BIOS by tapping the Delete or F2 key as the computer boots. Look for the storage section and change the value of "Configure SATA as" to "AHCI" (if it is not already AHCI). It is better to do this before you install the operating system; otherwise you will need to install the storage drivers before changing the value. If you use two SSDs in a RAID configuration, select RAID rather than AHCI. If your computer offers only IDE, with no RAID or AHCI option, it is too old; time to shop for a new computer. To check whether TRIM is working, open an elevated Windows PowerShell (as for the hibernation command) and execute: fsutil behavior query DisableDeleteNotify If the command returns "DisableDeleteNotify = 0", TRIM is enabled. If not, turn it on by executing: fsutil behavior set DisableDeleteNotify 0

  • Disable Superfetch: Superfetch is a technology, first introduced in Windows Vista, that lets Windows manage system memory more efficiently by preloading frequently accessed data and applications into memory. However, its cache has to be written to the drive and updated regularly, which increases the amount of writing to the drive. On a hard drive Superfetch is useful; on an SSD it is not necessary and only wastes the drive's P/E cycles. To disable it, open an elevated Windows PowerShell and execute services.msc to open the Services utility. Double-click the SysMain entry, click the Stop button, then choose Disabled from the Startup type dropdown menu. Alternatively, in an elevated Command Prompt: sc stop "SysMain" & sc config "SysMain" start= disabled (in Windows PowerShell, use sc.exe instead of sc and separate the two commands with a semicolon, because sc is an alias for Set-Content there and & does not chain commands).

  • The Page File's size tends to change dynamically, resulting in frequent writing to the drive, which, again, is not good for an SSD. So if you use a computer with 8GB of RAM or more, and you generally don't run lots of concurrent programs, it might be a good idea to turn off Page File completely. However, the best practice is to set it at a fixed size recommended by the system. Or if you're on a desktop with an SSD as its primary drive and a secondary hard drive, it's best to move the Page File to the hard drive and disable it on the SSD.

To set a fixed size:

•	Open Advanced System Settings and choose the Advanced tab
•	Click on the top Settings... button (under the Performance section)
•	Choose the Advanced tab
•	Click on Change
•	Uncheck the box that reads "Automatically manage paging file size for all drives"
•	Check the Custom size radio button
•	Enter, as both the Initial size and the Maximum size, the number following Recommended: at the bottom of the window
•	Click on the OK buttons to close the windows, and choose to restart the computer.

  • Avoid defragmentation: SSDs don't need defragmentation like traditional hard drives. In fact, defragmenting an SSD can cause unnecessary wear and tear. Most operating systems automatically disable defragmentation for SSDs.

  • Limit unnecessary writes. Move temporary files and browser caches to a traditional hard drive or RAM disk. Adjust your browser settings to limit temporary internet files. Save downloads, especially large ones, to a traditional hard drive.

  • Use a RAM disk: Store temporary files in a RAM disk, which is faster and doesn't involve write cycles on your SSD. Just be aware that data in a RAM disk is volatile and may be lost during a power outage.

  • Regular backups: Maintain a reliable backup system so that your data survives when the SSD eventually wears out or fails.

  • Monitor your SSD's health: Keep an eye on your SSD's health using manufacturer-provided tools or third-party software. This can help you catch potential issues early.

  • Avoid excessive write-intensive tasks: If you have tasks that involve a lot of writing to the SSD (e.g., video editing, virtual machines), consider using a secondary drive for those activities.

  • Manage your page file (virtual memory) wisely: If you have plenty of RAM, you can reduce the size of your page file or move it to a different drive to reduce write operations on the SSD.

  • Optimize your operating system for SSD: Most modern operating systems have SSD optimization settings that can improve performance and reduce wear.

  • Maintain adequate free space: Leave some free space on your SSD (at least 10-20%) to allow for efficient wear leveling and garbage collection.

  • Keep your system cool: High temperatures can reduce an SSD's lifespan. Ensure proper ventilation and cooling in your computer case.

  • Use a high-quality power supply: A stable power supply can help prevent data corruption and potential damage to your SSD.

  • Consider wear leveling and over-provisioning: Wear leveling distributes write/erase cycles evenly across the flash cells. Some SSDs come with built-in over-provisioning or offer the option to enable it; this reserves spare capacity that gives the controller more room for wear leveling and garbage collection, which can prolong the drive's lifespan.
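Two of the checks in this list, the TRIM query and the free-space margin, are easy to script. The sketch below is one possible way to do it in Python; the function names are made up for illustration, the 10% threshold simply takes the lower end of the 10-20% suggestion, and the parser assumes the fsutil output contains lines of the form "DisableDeleteNotify = 0" (newer Windows versions prefix them with the file system name). The fsutil call itself only works on Windows from an elevated prompt.

```python
import os
import re
import shutil
import subprocess


def trim_enabled(fsutil_output):
    """Return True if every DisableDeleteNotify value in the text is 0."""
    values = re.findall(r"DisableDeleteNotify\s*=\s*(\d+)", fsutil_output)
    if not values:
        raise ValueError("no 'DisableDeleteNotify = N' line found")
    return all(value == "0" for value in values)


def query_trim():
    """Run the fsutil query (Windows only, elevated) and interpret its output."""
    result = subprocess.run(
        ["fsutil", "behavior", "query", "DisableDeleteNotify"],
        capture_output=True, text=True, check=True,
    )
    return trim_enabled(result.stdout)


def free_fraction(path):
    """Fraction of the volume containing `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


if __name__ == "__main__":
    fraction = free_fraction(".")
    print(f"free space: {fraction:.0%}")
    if fraction < 0.10:  # assumed threshold, lower end of the 10-20% advice
        print("warning: less than 10% free, consider cleaning up")
    if os.name == "nt":
        print("TRIM enabled:", query_trim())
```

Keeping the parsing in its own function means it can be checked against sample text without touching a real drive.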