Info, Information ya'll!

04/07/10 | by Comp615 [mail] | Categories: Current Events, The World

People have long called the post-millennium years the Information Age, but only recently has that name truly become a way of life for consumers and businesses across the globe. Driven partly by globalization and partly by the rapid advancement and shrinking of technology, this trend treats information as currency, and information seems to be the new unit of trade in the digital millennium. Read on for my full analysis after the jump...

Gathering Data
Every time you load a page, the server logs your visit by storing information such as the page you visited, your IP address (which identifies your internet connection), and a couple of goodies like the result of the page load (success/error) and other boring things. This is just the way web servers work. Now if someone is running a website on said web server, they might want to store all this in a more useful form. Perhaps they assign and store your session ID, your logins, how long you look at a page, and some form data. They now have a huge amount of data about your browsing habits and about which content is popular on their site.

Now take this small example and expand it to the internet as a whole. The reason this stockpiling of information has taken off recently, actually the entire reason, is the rapid growth of technology. Currently, a consumer 1TB hard drive can be had for about $80. If you consider storing someone's name (50 bytes), phone number (10 bytes), and address (255 bytes), that's about 315 bytes per entry, meaning you could store approximately 3.5 billion records on said drive (minus some technical restrictions, so we'll say 3 billion)...which means...well, putting this at the customer level is meaningless, so we'll say it costs about 1 cent to store the data of roughly 375,000 people....Wow.
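For anyone who wants to check the napkin math, here is a minimal sketch in Python using only the figures from the paragraph above. The one assumption that is mine is counting "1TB" as 2^40 bytes, which is what gets you to roughly 3.5 billion records; the variable names are just for illustration.

```python
# Back-of-the-envelope storage cost, using the figures from the paragraph above.
NAME_BYTES = 50
PHONE_BYTES = 10
ADDRESS_BYTES = 255
RECORD_BYTES = NAME_BYTES + PHONE_BYTES + ADDRESS_BYTES  # 315 bytes per person

DRIVE_BYTES = 2 ** 40        # assumption: "1TB" counted as 2^40 bytes
DRIVE_COST_CENTS = 80 * 100  # an $80 drive, expressed in cents

# Theoretical capacity if every byte held record data (about 3.5 billion).
raw_records = DRIVE_BYTES // RECORD_BYTES

# The rounded-down figure from the text, allowing for "technical restrictions".
usable_records = 3_000_000_000

# How many people's records one cent of disk buys.
records_per_cent = usable_records // DRIVE_COST_CENTS

print(f"{RECORD_BYTES} bytes per record")
print(f"{raw_records:,} records fit on the drive in theory")
print(f"{records_per_cent:,} records per cent of disk")
```

Whichever rounding you prefer, one cent of disk holds a few hundred thousand people's records.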

The point of that exercise is to show that it is very, very, very cheap to store data. Given that, it's clearly in a company's best interest to store as much data as possible. How long data is kept, and at what level of detail, varies as well. For instance, while customer and sales data will be kept permanently, web page logs might only be kept for a year. Once that detailed log data is up for archiving, we can create a process to store a summary or less specific data, i.e. store only page hits and discard IPs, arguments, debugging, etc.
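That archiving step might look something like the sketch below. This is not taken from any real system: the field names (`day`, `page`, `ip`, `args`) and the sample entries are hypothetical, and a real log would need parsing first. The idea is simply to keep a hit count per day and page and let everything else go.

```python
from collections import Counter


def summarize_hits(log_entries):
    """Collapse detailed log entries into hit counts per (day, page)."""
    hits = Counter()
    for entry in log_entries:
        # Only the day and the page survive; IPs, arguments, and any
        # debugging details are deliberately not copied into the summary.
        hits[(entry["day"], entry["page"])] += 1
    return hits


# Hypothetical detailed log entries (IPs are from the documentation ranges).
detailed_log = [
    {"day": "2010-04-07", "page": "/blog", "ip": "203.0.113.5", "args": "?p=1"},
    {"day": "2010-04-07", "page": "/blog", "ip": "198.51.100.7", "args": ""},
    {"day": "2010-04-07", "page": "/about", "ip": "203.0.113.5", "args": ""},
    {"day": "2010-04-08", "page": "/blog", "ip": "203.0.113.9", "args": "?p=2"},
]

archive = summarize_hits(detailed_log)
for (day, page), count in sorted(archive.items()):
    print(day, page, count)
```

Four detailed rows shrink to three summary rows here; on a year of real traffic the reduction would be far larger, which is the whole point.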

Using Data
It's subconscious, I can't help it: I am crazy for data. Data to turn, flip, convert, analyze, chop and display. Last summer I worked extensively on a data warehouse, which is basically a way of converting individual entries in something like a daily server log into summarized and quickly accessible numbers. Then, through presentation tools such as Excel or other software, users can create a pivot table that lets them view any piece of data summarized and filtered by any metric. When done right, believe me, this is a goldmine for any company.

Instead of writing arduous SQL queries like we peons have to do, we can now send that Excel spreadsheet right to the top, allowing bigwigs to simply use Excel or look at pretty pictures which tell us things like:
Number of sales per month
Number of sales per state
Value of sales per month per state
Average value of sales per month per state per gender
Quantity of sales by gender by day of week per state when individuals make > 100k annually and...

You get the idea. Answers to critical business questions are literally seconds away. One might think that processing almost a trillion rows of data to get these answers would take minutes or hours, but these tables are pre-summarized and indexed on the various metrics we can apply, which means that answers to even elaborate questions such as the last one above will appear in a matter of seconds.
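To make the "pre-summarized" part concrete, here is a toy sketch of the idea in Python. A real warehouse would use indexed summary tables or a cube rather than a dictionary, and the field names and sample sales below are made up for illustration. The point is that the expensive pass over the raw rows happens once, and every later question only touches the much smaller summary.

```python
from collections import defaultdict


def build_summary(sales):
    """Roll raw sales rows up into one cell per (month, state, gender)."""
    summary = defaultdict(lambda: {"count": 0, "value": 0.0})
    for sale in sales:
        key = (sale["month"], sale["state"], sale["gender"])
        summary[key]["count"] += 1
        summary[key]["value"] += sale["value"]
    return summary


def sales_per_state(summary, month):
    """Answer 'number of sales per state' for one month from the summary alone."""
    totals = defaultdict(int)
    for (cell_month, state, _gender), cell in summary.items():
        if cell_month == month:
            totals[state] += cell["count"]
    return dict(totals)


# Hypothetical raw rows; in practice there would be millions of these.
raw_sales = [
    {"month": "2010-03", "state": "CT", "gender": "F", "value": 100.0},
    {"month": "2010-03", "state": "CT", "gender": "M", "value": 50.0},
    {"month": "2010-03", "state": "CA", "gender": "F", "value": 75.0},
    {"month": "2010-04", "state": "CT", "gender": "F", "value": 20.0},
]

summary = build_summary(raw_sales)          # done once, e.g. in a nightly job
march = sales_per_state(summary, "2010-03")  # cheap: never rereads raw_sales
print(march)
```

The summary has at most one cell per combination of month, state, and gender no matter how many raw rows feed it, which is why the answers come back in seconds.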

Data as Currency
People really like data. Just today, I compiled some stats from my Yale Drama database for the Dean of the Arts. I shared them with some other YDC members as well, who all commented, "Wow, this is really cool". I don't ask people on the site for much data, and all of it is needed to support the rich content and utilities we have on the site, but think about this from a company's perspective for a moment. I am sitting on a list of every person who's auditioned for a show at Yale, every person who's gone to see a show at Yale, and every person who's worked on a show at Yale, and I can sort all this data however I want. A list of every Yale Theater person? That's valuable! Especially if you're a Broadway casting agent.

Companies have begun to realize just how much the data they have is worth and are taking extreme measures to increase its value and protect it. Take for instance the story of a guy who tried to crawl data from Facebook for a project (Full story here). Although he was accessing public data on the web and following Facebook's guidelines, he was sued by them for infringing on their information.

Facebook is the utopia of data. They have designed their own database system with the sole purpose of being very good at correlating data very fast. Their system is designed to know what you want to see and when. They have your address, phone, name, gender, sexual preference, interests, hobbies, colleges, birthdate, happy aquarium pet names...literally the only thing they don't have is your SSN. That data is worth trillions of dollars, seriously. They are able to provide perfectly targeted ads. Try it! Go on Facebook and change your relationship status or gender preference and see what happens to the ads.

Why has this demand for information arisen? It's a vicious cycle. The more ads you see, the more each ad needs to stand out to get noticed, and the only way to make it stand out is by targeting it to you. Ohhh, what's that? Hot Christian Singles? Jeez, that's exactly what I was looking for, however did you know!? It's like we're all entering Minority Report. Don't know what I'm talking about? Check out the mall from the movie: YouTube Link Here.

Recently, Facebook and Google have both altered their privacy agreements to allow themselves and affiliates to collect and use more information about your browsing habits. Google is literally tracking you around the internet: since almost every page has a Google AdSense ad on it, when you go to a page you necessarily load a Google ad, which means sending news of your visit to Google.

Using this data, Google knows which pages you are most likely to visit, which pages you are most likely to visit after having visited other pages, or on which days, and at what times you like to view content. Based on this, they can alter which ads they serve you on which pages. The ads you see on the side of my website right now might not be the same ads I see. They don't utilize all this data currently to its full extent, but rest assured they have it.
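To be clear, I have no idea how Google actually does this internally. But as a rough sketch of how "which page do people visit after this one" could fall out of a plain list of ad requests, consider the Python below. The visitor IDs, the page names, and the assumption that the requests arrive in time order are all invented for the example.

```python
from collections import Counter, defaultdict


def next_page_counts(ad_requests):
    """Count, for each page, which page the same visitor loaded next."""
    last_page = {}                      # visitor id -> most recent page seen
    transitions = defaultdict(Counter)  # page -> Counter of following pages
    for visitor, page in ad_requests:   # assumed to be in time order
        if visitor in last_page:
            transitions[last_page[visitor]][page] += 1
        last_page[visitor] = page
    return transitions


# Hypothetical ad requests: (visitor id, page that loaded the ad).
ad_requests = [
    ("visitor-a", "news.example/home"),
    ("visitor-b", "news.example/home"),
    ("visitor-a", "cars.example/used"),
    ("visitor-b", "cars.example/used"),
    ("visitor-a", "blog.example/post"),
]

transitions = next_page_counts(ad_requests)
likely_next = transitions["news.example/home"].most_common(1)[0][0]
print(likely_next)
```

Swap the page for a day of the week or an hour of the day in the key and the same counting trick covers the "which days" and "what times" questions too.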

So given that they basically stalk my daily routine, I tend to feel a little exposed on the internet. This has turned me into an internet privacy advocate of sorts. I don't feel that I need to go out of my way to protect myself, but it makes me wary. Browsers have recently added a "Private Browsing" feature, which is supposed to be rather anonymous and doesn't store any data on your computer. The problem is that the sites still have your IP address and your session data. All that info is server side, even if you don't accept those dreaded cookies.

Really, there's no viable way to baffle Google ads (and Facebook is impossible, since you identify yourself by logging in to your account). One solution exists, but it is somewhat inconvenient. It's called the Tor network. You can use Tor (and a few other programs) to effectively bounce traffic around the internet, meaning that instead of your computer going to the server directly, it will bounce off Switzerland and a few other people's computers first. It works, but it's terribly inconvenient for the average user.

Unfortunately, companies know that they can sell this information (although not directly) to advertisers in the form of super-targeted ads, as I mentioned above, and with the insane amounts of cash this nets them, they aren't very willing to ignore your browsing habits. Oh well, at least ads will be more interesting in the future.

The Future
Look at that transition, smooth huh? While Minority Report might be a few years out yet, information will continue to become an increasingly valuable asset, and one to be sliced up and traded away for millions of dollars per year. From your perspective, think how much this data could do for you. Want to sell that old car? Wouldn't having a list of people in your area who have been browsing used car sites recently be helpful? How about the admissions offices around the country? They'd love to be able to pick out and serve ads to 17-year-olds who have been visiting a lot of .edu sites.

But these examples aren't the future, they are reality. The technology exists to do this right now. So how will we utilize this data in the future? Look for more personalized ads, or ads that change based on your interests. When I sell my car, I might be able to tell women that my car is safer than competitors, men that it's faster than competitors, and teenagers that it's a better value than competitors.

In addition, while I'm not sure of the technological aspects of it, digital cable and satellite TV should currently be capable of logging the times the TV is on and which channels are being watched. Not only can this technology be used for Nielsen ratings, but think about it from the advertisers' perspective. If you already watch the Amazing Race every week, they can show an ad for a show that's similar instead of an ad for the Amazing Race. When they see you watch kids' shows Saturday mornings and football Friday afternoons, they could show toy ads in the morning, but beer and car ads in the afternoon.

There are obvious obstacles to this, and it would probably have to be done on a network-by-network basis, since networks would be unwilling to relinquish control of their ad slots. This implementation would mean that cable providers would have to feed viewing information back to networks, which, as I just showed, they are probably also unwilling to do. So perhaps the impasse of companies' highly bureaucratic processes will save us all from the era of ultra-targeted marketing. Nevertheless, the data is being gathered now for the future, waiting to be used.

In the meantime, I hope this blog has shown you the growing trend and underlying value of storing information. I always try to store information on all my sites that allows me to analyze how people interact with the site and to create richer content in popular areas. Compared to Google, Facebook, and Twitter, I am nothing. This post will end up as just another data point on graphs throughout the world. Hundreds of search engines will crawl this page and reduce its content to a story about "Information", only to be aggregated into summary tables where executives (or machines) will decide to show you ads relating to information systems courses or something similar. Because you've been visiting so many pages about information...haven't you? Yes, I see you have...



A collection of musings from my time at Yale along with some thoughts about my "Freshman year of life" in San Francisco.

