A Big Data Sqoop of Hadoop

Posted in Big Data, Development

This week I’m working with one of our partners on migrating some data from Microsoft SQL Server to Hadoop. Sqoop is the tool of choice because it can migrate the data and maintain data types inside of Hive. Sqoop is a very straightforward tool, but there were some serious gotchas, including driver issues and unsupported data types. For those who think I am writing gibberish at this point, I have dedicated an entire post to defining some jargon. Once you know the jargon, the rest of this post will make more sense.

Driver Issues

By default Sqoop does not include […]

Big Data and Hadoop Jargon

Posted in Big Data, Development

I was starting to write a post on using Sqoop with Hadoop and realized that, to a normal person, it would sound as if I were speaking gibberish, so I decided to define as much of the jargon as possible in one place.

Defining Big Data Jargon

Hadoop is an open source, Java-based programming framework that supports the processing and storage of extremely large data sets (Big Data) in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation.

Big Data: extremely large data sets that may be analyzed computationally to reveal patterns, […]