NoSQL

Where IT is going: Cloud, mobile, and data

Cloud computing often seems to get used as a catch-all term for the big trends happening in IT.

This has the unfortunate effect of adding further ambiguity to a topic that's already laden with definitional overload. (For example, on a topic like security or compliance, it makes a lot of difference whether you're talking about public clouds like Amazon's, a private cloud within an enterprise, a social network, or some mashup of two or more of the above.)

However, I'm starting to see a certain consensus emerge about how best to think about the broad sense … Read more

Can PostgreSQL pick up where MySQL left off?

EnterpriseDB, a provider of enterprise-class products and services based on PostgreSQL, today announced Postgres Plus Cloud Server, which the company has billed as "a full-featured, Oracle-compatible, enterprise-class PostgreSQL database-as-a-service for public and private clouds with support for Amazon EC2, Eucalyptus, Rackspace, and GoGrid."

We've seen other database-as-a-service offerings come on the scene from the likes of Salesforce.com's Database.com and Amazon RDS, as well as from startup Xeround. But they're not based on PostgreSQL, which has had years of hardening and development by a committed community. The other databases are not "Oracle compatible," … Read more

Are databases in the cloud really all that different?

Last week a discussion emerged regarding the necessity of the NoSQL moniker associated with a new wave of open-source distributed database projects like CouchDB, MongoDB, and Cassandra.

CouchOne, the commercial entity behind CouchDB, even announced that it's moving away from associating the company with NoSQL as it focuses on enabling offline data and applications.

The current orthodoxy would have you believe that if you are trying to get your head around "big data" or "Web scale" (see video), NoSQL is the answer. If you are dealing with preset data definitions being accessed by all … Read more

Why relational databases make sense for big data

In 2010, the talk about a "big data" trend has reached a fever pitch. "Big data" centers around the notion that organizations are now (or soon will be) dealing with managing and extracting information from databases that are growing into the multi-petabyte range.

This dramatic amount of data has caused developers to seek new approaches that tend to avoid SQL queries and instead process data in a distributed manner. These so-called "NoSQL" databases, such as Cassandra and MongoDB, are built to scale easily and handle massive amounts of data in a highly fluid manner.

And while I am a staunch supporter of the NoSQL approach, there is often a point where all of this data needs to be aggregated and parsed for different reasons, in a more traditional SQL data model.

It occurred to me recently that I've heard very little from the relational database (RDBMS) side of the house when it comes to dealing with big data. To that end, I recently caught up via e-mail with EnterpriseDB CEO Ed Boyajian, whose company provides services, support, and training around the open-source relational database PostgreSQL.

Boyajian stressed four points:

1. Relational databases can process ad-hoc queries

Production applications sometimes require only primary key lookups, but reporting queries often need to filter or aggregate based on other columns. Document databases and distributed key-value stores sometimes don't support this at all, or they may support it only if an index on the relevant column has been defined in advance.

2. SQL reduces development time and improves interoperability

SQL is, and will likely remain, one of the most popular and successful computer languages of all time. SQL-aware development tools, reporting tools, monitoring tools, and connectors are available for just about every combination of operating system, platform, and database under the sun, and nearly every programmer or IT professional has at least a passing familiarity with SQL syntax.

Even for the types of relatively simple queries that are likely to be practical on huge data stores, writing an SQL query is typically simpler and faster than writing an algorithm to compute the desired answer, as is often necessary for data stores that do not include a query language. … Read more
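To make the first two points concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, column names, and sample rows are all hypothetical and do not come from EnterpriseDB or PostgreSQL. It runs an ad-hoc aggregate on a column that has no index defined in advance, then computes the same answer by hand against a plain key-value mapping, which is roughly what you are left with when the data store has no query language.

```python
import sqlite3
from collections import defaultdict

# Hypothetical order rows: (order_id, region, amount). Illustrative data only.
ORDERS = [
    (1, "east", 120.0),
    (2, "west", 80.0),
    (3, "east", 30.0),
    (4, "north", 55.0),
    (5, "west", 20.0),
]


def totals_with_sql(rows, minimum=25.0):
    """Ad-hoc filter and aggregate on non-key columns; no index on region."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    cursor = conn.execute(
        "SELECT region, SUM(amount) FROM orders "
        "WHERE amount >= ? GROUP BY region ORDER BY region",
        (minimum,),
    )
    result = cursor.fetchall()
    conn.close()
    return result


def totals_by_hand(rows, minimum=25.0):
    """Same answer against a key-value mapping: scan, filter, and sum."""
    # The store only knows how to look things up by primary key.
    store = {order_id: (region, amount) for order_id, region, amount in rows}
    sums = defaultdict(float)
    for region, amount in store.values():
        if amount >= minimum:
            sums[region] += amount
    return sorted(sums.items())


if __name__ == "__main__":
    print(totals_with_sql(ORDERS))
    print(totals_by_hand(ORDERS))
```

Both functions return the same per-region totals. The difference is that the SQL version states what is wanted in one query, while the hand-written version has to spell out how to scan, filter, and group.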

Xeround scales MySQL for the cloud

Today, Xeround officially announced the release of the private beta of its "MySQL for the Cloud" service--an elastic, linearly scalable, relational database designed to run applications in cloud environments.

Xeround is based on an in-memory database and has been tested in a number of telco production environments, according to CEO Razi Sharir. The software utilizes virtual partitions where data partitions are decoupled--or abstracted--from physical resources. These virtual partitions hold copies of both the data and the indexes, in order to ensure high availability and performance.
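The general idea of decoupling data partitions from physical resources can be sketched in a few lines of Python. This is not Xeround's implementation, and every name, the partition count, and the round-robin placement are assumptions made for illustration. Keys hash to a fixed set of virtual partitions, and a separate placement table says which nodes hold copies of each one.

```python
import hashlib

# Hypothetical setting: fixed for the life of the data set.
NUM_VIRTUAL_PARTITIONS = 8


def virtual_partition(key: str) -> int:
    """Map a key to a virtual partition using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_VIRTUAL_PARTITIONS


def assign(nodes, copies=2):
    """Place each virtual partition on `copies` distinct nodes, round-robin."""
    if copies > len(nodes):
        raise ValueError("need at least as many nodes as copies")
    return {
        vp: [nodes[(vp + i) % len(nodes)] for i in range(copies)]
        for vp in range(NUM_VIRTUAL_PARTITIONS)
    }


def locate(key, placement):
    """Return the nodes that hold copies of the partition owning `key`."""
    return placement[virtual_partition(key)]


if __name__ == "__main__":
    placement = assign(["node-a", "node-b", "node-c"])
    print(virtual_partition("user:42"), locate("user:42", placement))
    # Adding a node changes only the placement table, not the key mapping.
    placement = assign(["node-a", "node-b", "node-c", "node-d"])
    print(virtual_partition("user:42"), locate("user:42", placement))
```

Because a key's virtual partition never depends on the node list, capacity can be added or removed by rewriting the placement table alone, and keeping more than one copy per partition is what lets a node fail without losing access to the data.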

Despite the ubiquity of open-source MySQL, the database has in the past suffered … Read more

Free NoSQL and data scalability cheat sheet

NoSQL databases and associated operational-data technologies based on nonrelational approaches to data management and manipulation continue to be top of mind for big Web shops and are slowly starting to make their way into enterprise IT infrastructure.

This means that developers need to get a handle on the latest information about NoSQL and big data in order to stay on top of the trend.

Accordingly, developer site DZone just released a new Getting Started with NoSQL and Data Scalability reference card as part of their cheat-sheet library.

The refcard is a good primer to get you asking all the right … Read more

NorthScale, Zynga team up on NoSQL

The massive amounts of data being created on the Web and the rise of cloud computing together make an ideal environment for alternative database technologies to thrive. And the Web is often proving to be just an entry point for bleeding-edge technology to be tested out before it starts heading into the enterprise.

NoSQL databases and associated operational-data technologies based on nonrelational approaches to data management and manipulation continue to be top of mind for big Web shops and are slowly starting to make their way into enterprise IT infrastructure.

I've spoken with a number of vendors roaming the NoSQL space over the last few months and there seems to be one common thread that they push: traditional relational databases are expensive, bulky, and simply not ideal for this new era of Web technology.

On Wednesday, a new NoSQL database joins the fray: Membase. Launched as an open-source project under the Apache 2.0 license and co-sponsored by NorthScale, Zynga, and NHN (Korea's top online gaming portal), Membase is optimized for storing the data behind interactive Web applications.

Membase says it is 100 percent compatible with Memcached, the de facto standard for distributed object caching behind Web applications. Basically, Membase is as easy to use as Memcached but also stores data.

According to James Phillips, NorthScale co-founder and senior vice president of products, the thousands of organizations that use Memcached (18 of the top 20 most visited Web sites including Twitter, Facebook, and Google) have a demand for a solution that looks like Memcached but acts like a distributed, highly available, high-performance, elastic database technology. … Read more

Cloudera teams up to connect Oracle and Hadoop

This week Cloudera, a provider of software and services for the Apache Hadoop project, is set to announce a partnership with Quest Software to develop, support, and distribute an Oracle connector for Hadoop.

Hadoop is the popular open-source implementation of MapReduce, a powerful tool designed for deep analysis and transformation of very large data sets. It enables its users to explore complex data, using custom analyses tailored to users' information and questions.
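For readers new to the model, the following is a toy, single-process sketch of the MapReduce pattern in plain Python. It does not use Hadoop or its API, and the function names are made up for illustration. It counts words by mapping each document to (word, 1) pairs, grouping the pairs by word, and reducing each group to a total.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over the documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big web data"]))
```

In a real cluster the map and reduce calls run in parallel on many machines and the grouping step moves data across the network, but the shape of the computation is the same.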

Code-named "Ora-Oop," the connector will provide connectivity between Cloudera's Hadoop distribution and Oracle through an interface that allows for bidirectional, scalable, and functional data transfer … Read more

NoSQL goes mobile with the help of CouchDB

If there is one aspect of mobility that has yet to live up to user expectations, it's the ability for data to be accessible in near real-time across multiple devices.

Despite all the advances in technology, including a wealth of Wi-Fi and 3G networks, many devices become impotent without an Internet connection.

This issue becomes even more apparent when you are dealing with browser-based applications and smartphones that don't have multithreading functionality to maintain state across applications and data stores.

I recently had the chance to chat with Damien Katz, the creator of CouchDB and CEO of Couchio, … Read more

Apache Cassandra gets boost from Riptano (Q&A)

A new company called Riptano recently launched to provide support and services for the Apache Cassandra project, a nonrelational open-source database designed for high performance that has a strong presence in Web shops like Twitter, Digg, and Reddit. I recently had the chance to chat with Matt Pfeil, founder of Riptano, and he provided some insight into the project and the new world of NoSQL database approaches.

What exactly is Cassandra and who uses it?

Cassandra is a highly scalable, distributed, open-source database. It's a top-level Apache project with committers from Riptano, Rackspace, Digg, Facebook, and others.

Cassandra … Read more