Hadoop, the elephant in the enterprise

PALO ALTO, Calif.--This is a big-data week in Silicon Valley, kicking off last night with a Churchill Club event here called "The Elephant in the Enterprise: What Role will Hadoop Play?" and featuring a high-powered group of big-data executives.

Hadoop, the open-source software that has emerged as the de facto standard for big data processing, may be what tips the enterprise in favor of open source. The desire to get more data and find value in it has become a business priority, and Hadoop is playing a major role in making sense of that data.

And while the … Read more

VMware works to make Hadoop 'virtualization-aware'

VMware today announced a new open-source project called Serengeti, which enables enterprises to quickly deploy, manage, and scale Apache Hadoop in virtual and cloud environments.

VMware says it is working with the Apache Hadoop community to contribute extensions that will make Hadoop Distributed File System (HDFS) and Hadoop MapReduce projects "virtualization-aware" to support elastic scaling and further improve Hadoop performance in virtual environments.

In case you've been living outside the big data vacuum, open source Hadoop has emerged as the de facto standard for big data processing and is packaged up in a few different distributions by … Read more

The joys of real-time data analysis for online retailers

Re-reading a piece I wrote a few weeks back about the uptick in online sales during Black Friday, I started to wonder if real-time customer intelligence is what is driving online retail growth.

There are undoubtedly a number of factors behind the growth in online sales. But after spending some time with a few of the major online retailers last week--including one that might not be considered a "retailer" in the traditional sense--I realized that the online world has a huge competitive advantage in its predilection toward data analysis with actionable, near-real-time results.

Amazon's suggested … Read more

Hortonworks looks to grow Hadoop ecosystem

As big data becomes more and more top of mind, a number of new companies have popped up to support Hadoop, the leading open-source platform for data-intensive distributed applications. One of the newer entrants is Hortonworks, a company spun out of Yahoo, with a $15 million-plus cash infusion from both Yahoo and Benchmark Capital.

Last week I sat down with Hortonworks CEO Eric Baldeschwieler to understand how the company intends to differentiate from other vendors such as Cloudera, MapR, and the many as yet unlaunched companies that venture capitalists are still funding.

Hadoop itself was initially developed at Yahoo by … Read more

Open-source Scala gains commercial backing

The open-source Scala programming language is getting a big boost today in the form of venture funding for a new start-up.

Typesafe is launching the first commercial entity behind Scala, founded by Scala creator Martin Odersky and flush with $3 million from Greylock Partners.

Scala is a general purpose programming language designed to express common programming patterns in a concise, elegant, and type-safe way. It integrates features of object-oriented and functional languages and reduces code size in comparison to Java.

Greylock also funded Red Hat and Cloudera, so it's no surprise that Typesafe will be taking a page from those companies … Read more

Cloudera ups the ante on open-source Hadoop

The Hadoop open-source project for distributed compute processing continues to be one of the most interesting projects for managing the vast amount of data being analyzed and collected in a wide variety of scenarios.

Today, Cloudera, a provider of Hadoop data management software and services, is set to ship a major release of its open-source software distribution--Cloudera Distribution for Hadoop (CDH), including Apache Hadoop v3.

Cloudera's CDH3 distribution is an integrated set of components and functions that interoperate through standard APIs and manage required component versions and dependencies.

CDH3 is an integrated stack that includes not just software … Read more

Big data in context

A few weeks back I attended venture firm Accel Partners' New Data Workshop event and learned quite a bit about the state of what we are now commonly referring to as "big data" and the challenges that await the vendors trying to target this new way of slicing and dicing vast amounts of information.

One of the big takeaways for me was the realization that even with all of the processing power available nowadays, the amount of data is growing at such a rapid pace that people are simply looking to cope with the problem, rather than facing it head on.

The issue of processing large amounts of data is not necessarily new--most developers and IT staff can tell you about having too much information to deal with--but the big difference is that there are new approaches, tools, and technologies that can help alleviate the difficulty of processing it.

Over the course of the last 30 years or so, the way that machines process transactions has changed, and so has the amount of data being collected and processed, now with an eye toward real-time analysis of information.

This has led to the advent of a number of technologies that allow for data processing to be offloaded and managed in both structured and unstructured ways--examples include open-source projects like Memcached and Hadoop as well as NoSQL data storage mechanisms like Cassandra.… Read more

Cloudera goes enterprise with new Hadoop offering

Cloudera, a provider of support and services around the open-source cloud platform Apache Hadoop, on Tuesday announced Cloudera Enterprise, a suite of subscription-only add-ons to its free distribution.

The core platform, called Cloudera's Distribution for Hadoop (or CDH for short), was first unveiled in March 2009 and is 100 percent open-source software. Now, the company is offering Cloudera Enterprise, a suite of additional tools for monitoring, managing, and administering a cluster in production to complement the core CDH platform--for a fee.

This business model fits into the open-core category, where companies charge for exclusive tools or functions on top … Read more

Cloudera teams up to connect Oracle and Hadoop

This week Cloudera, a provider of software and services for the Apache Hadoop project, is set to announce a partnership with Quest Software to develop, support, and distribute an Oracle connector for Hadoop.

Hadoop is the popular open-source implementation of MapReduce, a powerful tool designed for deep analysis and transformation of very large data sets. It enables its users to explore complex data, using custom analyses tailored to users' information and questions.

Code-named "Ora-Oop," the connector will provide connectivity between Cloudera's Hadoop distribution and Oracle through an interface that allows for bidirectional, scalable, and functional data transfer … Read more

IBM chooses Hadoop to analyze big data

IBM on Wednesday is set to announce a new portfolio of solutions and services to help enterprises analyze large volumes of data. IBM InfoSphere BigInsights is based on Apache Hadoop, an open-source technology designed for analysis of big volumes of data.

IBM InfoSphere BigInsights is made up of three parts: a package of Hadoop software and services; BigSheets, a beta product designed to help business professionals extract, annotate, and visually uncover insights from vast amounts of information quickly and easily through a Web browser; and industry-specific frameworks to help clients get started.

IBM has been aggressive in consuming and repackaging open-source projects … Read more