Monday, January 5, 2015

Data Entropy, Part 3 -- Data isn't formatted.

If you use MS Outlook, you are probably familiar with the way Outlook stores phone numbers: as separate Country Code, Area Code, Number, and Extension fields. If you store these values separately, Outlook will display the numbers in a standard format. However, nothing in the system prevents you from entering a number in a non-standard format; Outlook simply puts the entire thing into the "Number" field and displays it exactly as you entered it. This illustrates that formats can, and should, be kept separate from the data they display.
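To make the idea concrete, here is a minimal sketch. It uses a hypothetical `PhoneNumber` record, not Outlook's actual data model, and assumes a North American style seven-digit local number. The parts are stored separately, and each display format is just a function applied to them. The two formatters shown are illustrative only and are not meant to match Outlook's own output.

```python
from dataclasses import dataclass


@dataclass
class PhoneNumber:
    """Hypothetical record: each part of the number is stored separately."""
    country_code: str
    area_code: str
    number: str          # assumed to be a 7-digit local number
    extension: str = ""


def format_for_display(p: PhoneNumber) -> str:
    """One human-readable format; the stored fields are never modified."""
    local = f"{p.number[:3]}-{p.number[3:]}" if len(p.number) == 7 else p.number
    text = f"+{p.country_code} ({p.area_code}) {local}"
    if p.extension:
        text += f" x{p.extension}"
    return text


def dial_string(p: PhoneNumber) -> str:
    """A second format over the same data: bare digits for an auto-dialer."""
    return f"{p.country_code}{p.area_code}{p.number}"


if __name__ == "__main__":
    office = PhoneNumber("1", "555", "0123456", "42")
    print(format_for_display(office))
    print(dial_string(office))
```

Both formats are derived from the same stored values. Adding a third format later means adding one function, and nothing already stored has to change.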

When data gets shared, the format in which it is stored can cause confusion or even errors. Take, for example, a field that stores the amount of storage used by files on your hard drive. Given the wide variance in file sizes, we have grown accustomed to seeing the units appended to the end, as in 45GB or 3.5TB. But if you stored the values "45GB" and "3.5TB" in your database, how would you add them up? How would you total the storage used by .gif files versus the storage used by .mp3 files? The logic required becomes mind-numbing, and maintaining that code means considering every way in which a user might enter the value. The right solution is to decide on a unit of measure for the storage metric (personally, I like bytes in this case) and stick to it. Then you can modify your display algorithms to add the units. Additionally, you have a standard value with which you can transfer the data to any system without worrying about how the information is formatted.
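The approach can be sketched in Python with a few hypothetical helpers: `parse_size`, `total_by_extension`, and `format_size`. The sketch assumes binary units (1 KB = 1,024 bytes) and only a handful of accepted input spellings. User input is converted to bytes once, when it is entered. Totals are then plain integer addition, and units are chosen only when a value is displayed.

```python
import os
import re
from collections import defaultdict

# Assumption: binary units (1 KB = 1,024 bytes).
UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3, "TB": 1024 ** 4}


def parse_size(text: str) -> int:
    """Convert input such as '45GB' or '3.5 tb' to bytes, once, at entry time."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?B)\s*", text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    value, unit = match.groups()
    return round(float(value) * UNITS[unit.upper()])


def total_by_extension(files):
    """Sum stored byte counts per file extension; no unit handling needed."""
    totals = defaultdict(int)
    for name, size_bytes in files:
        ext = os.path.splitext(name)[1].lower().lstrip(".")
        totals[ext] += size_bytes
    return dict(totals)


def format_size(size_bytes: int) -> str:
    """Pick a display unit only when the value is shown to a person."""
    for unit in ("TB", "GB", "MB", "KB"):
        if size_bytes >= UNITS[unit]:
            return f"{size_bytes / UNITS[unit]:.1f}{unit}"
    return f"{size_bytes}B"


if __name__ == "__main__":
    files = [("cat.gif", parse_size("3.5MB")),
             ("song.mp3", parse_size("45 MB")),
             ("dog.GIF", parse_size("512KB"))]
    for ext, total in total_by_extension(files).items():
        print(ext, format_size(total))
```

The database column holds only integers. Any other system can consume those integers without knowing which display convention produced them.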

On a slightly different note:
A group of colleagues and I started reminiscing about backup media and what it took to store all those floppies/CDs/DVDs "back in the day". So I decided to take a look back and see what it would take to store 1 terabyte of information using some common media from our past. Here are the results (using binary units, so 1 TB = 1,024 GB, and rounding up to whole disks).

Media Type              Capacity   Disks needed for 1 TB
5-1/4 in floppy disk    360 KB     2,982,617
3.5 in HD floppy disk   1.44 MB    728,178
8 in floppy disk        6.2 MB     169,126
SuperDisk LS-120        120 MB     8,739
SuperDisk LS-240        240 MB     4,370
CD-ROM                  650 MB     1,614
Zip disk                750 MB     1,399
Jaz disk                1 GB       1,024
DVD (double-sided)      9.4 GB     109
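Each count in the table is a ceiling division of 1 TB by the medium's capacity. Below is a short Python sketch that reproduces the figures. It assumes binary units, since those are the units that match the table's numbers. `MEDIA` and `disks_needed` are hypothetical names. `Fraction` keeps fractional capacities such as 1.44 MB exact, so floating-point error cannot affect the rounding.

```python
import math
from fractions import Fraction

# Binary units, which reproduce the counts in the table.
KB = 1024
MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4

# (media type, capacity in bytes), using the capacities listed in the table.
MEDIA = [
    ("5-1/4 in floppy disk", 360 * KB),
    ("3.5 in HD floppy disk", Fraction("1.44") * MB),
    ("8 in floppy disk", Fraction("6.2") * MB),
    ("SuperDisk LS-120", 120 * MB),
    ("SuperDisk LS-240", 240 * MB),
    ("CD-ROM", 650 * MB),
    ("Zip disk", 750 * MB),
    ("Jaz disk", 1 * GB),
    ("DVD (double-sided)", Fraction("9.4") * GB),
]


def disks_needed(capacity_bytes, total_bytes=TB):
    """Whole disks needed to hold total_bytes; a partial disk still counts as one."""
    return math.ceil(Fraction(total_bytes) / capacity_bytes)


if __name__ == "__main__":
    for name, capacity in MEDIA:
        print(f"{name:<24}{disks_needed(capacity):>12,}")
```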