There are two types of Big Data data. This guide to them is surprisingly simple.

I recently started a new job in Big Data. One of the things I learned is that several kinds of data are valuable to every business: database data, content data, customer data, and employee data. So you had better dig around in each kind to find the specific elements that give you the right information. If you find them, you should be pretty happy and can start interpreting; if you don't, chances are pretty good you will be very unhappy.

Everyone (and this is not just my advice but data scientists' advice, too) has been saying that Big Data is going to get more and more sophisticated, so what we need are rules and expectations about what we can and can't do. It is our new guide, my data science apprentice in a government lab, for using, interpreting, and treating all kinds of Big Data. You will probably never see it, but in the future it can be your guide. The results of this work, or the lack of them, are going to be super interesting for all of us.

Here are some good examples of principles.

First, there are the principles of data stewardship. I am not going to create an algorithm that predicts how hot or cold it is going to be tomorrow in your location; I am going to create one that automatically guesses whether it is going to be hot or cold. I am also not going to take old data and use it to create a model or prediction algorithm (see the International Data Privacy Principles, and see also Stuxnet and many more similar examples of such actions). My personal goal is to help data scientists find useful models, not to throw out all the old data because someone thinks it is too old or a bad fit. Maybe part of that is feeling bad that the old data does not fit the model, but I am suggesting there is enough to be learned from it (junk is junk) that there should be a place for it in your model, at the very least.

The second set of principles is for applying Big Data to business intelligence (BI). That's kind of weird, but basically, we are not going to use a trove of raw data to feed another raw data product to the user. Yes, some raw data can be useful to some people, but that is not how we are going to use it in your BI products. I am not going to take HR data to create a predictive model for a new employee and then set up a model to come back to later, once they have been hired. I am going to combine HR data from a variety of sources into a model, extrapolate to see what your new employee looks like and where they will show up in your data collection, and then throw away the HR data.
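To make that last step a little more concrete, here is a minimal Python sketch of the combine, model, discard flow. Everything in it is an assumption for illustration: the field names (`employee_id`, `salary`, `rating`), the helper functions, and the sample records are all hypothetical, and a plain per-field average stands in for whatever real model you would actually build.

```python
from statistics import mean


def combine_sources(*sources):
    """Merge per-employee records from several sources, keyed by employee_id."""
    combined = {}
    for source in sources:
        for record in source:
            combined.setdefault(record["employee_id"], {}).update(record)
    return list(combined.values())


def fit_summary_model(records, fields):
    """Reduce the combined records to per-field averages (a stand-in 'model')."""
    model = {}
    for field in fields:
        values = [r[field] for r in records if field in r]
        if values:
            model[field] = mean(values)
    return model


# Hypothetical sample data from two made-up HR sources.
payroll = [
    {"employee_id": 1, "salary": 50000},
    {"employee_id": 2, "salary": 70000},
]
reviews = [
    {"employee_id": 1, "rating": 3.0},
    {"employee_id": 2, "rating": 4.0},
]

records = combine_sources(payroll, reviews)
model = fit_summary_model(records, ["salary", "rating"])

# Drop the references to the raw HR data; only the aggregate model is kept.
del payroll, reviews, records

print(model)
```

The averages act as a rough picture of what a new employee might look like. In a real system, throwing away the HR data would also mean deleting it from wherever it is stored, not just dropping it from memory.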

My third set of principles covers the things you can't do. They may seem like basics that anyone working in Big Data would understand, but a few actual examples show something else, and more than that, they reveal a few Big Data principles of their own. These are some not-so-basic things not to try with Big Data.

It should NOT be you. It should not be a corporate big brother running your world. It is for you to say, as you are creating models for your business in these other forms of data (segmentation, continuous operations, telemetry, etc.). It is not for me to tell you when to do it, and it is not for me to change the Hadoop file systems to show you one type of business intelligence results, instead of another.

And I am just scratching the surface. So I highly recommend you take a look at this data science trail, which you could set out on from here in your own small way.

If you want to learn about other examples of what I mean, here are some that come to mind:

Some people actually agreed to do the tasks the Big Data team is asking you to do. That is awesome.

The authors of the LEGO argument at IT World think I am too negative in my expectations for BI. I don't think that BI should be painful. But I do think it will suck a lot, and it is up to each individual how much they let it suck.

Really, if you read up on the basics of Big Data, you should get a sense of the principles and be ready to apply them in your own small way.

I am 100 percent serious.
