Getting started with CouchDB: a beginner’s guide

by Matt Apperson

Have you ever dreamt of a powerful database that you can access easily, without writing SQL? That’s what Apache CouchDB is all about. In this tutorial, I’m going to show you how to get started with this document-oriented database and how you can use it with PHP.

Getting started with CouchDB

Apache CouchDB is one of a new breed of database management systems known as NoSQL. NoSQL is a buzzword, popularized in early 2009, for a loosely defined class of non-relational data stores that break with the long history of relational databases and ACID guarantees. Data stores that fall under this term may not require fixed table schemas.

The first reason I am quickly growing to love CouchDB, and hence decided to write this post, is that it is a document-oriented database: rather than storing content in fixed tables, it lets us store information in a manner that is as flexible as an array.

For example, here’s a sample document:

FirstName="Bob", Address="5 Oak St.", Hobby="sailing".

However, another document could have this data:

FirstName="Jonathan", Address="15 Wanamassa Point Road", Children=("Michael,10", "Jennifer,8", "Samantha,5", "Elena,2").
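
To make the two sample documents a bit more concrete, here is a minimal sketch of them as plain PHP arrays encoded to JSON, which is the format CouchDB stores documents in. The variable names are my own, and the children are kept as the same "name,age" strings used above.

```php
<?php
// Two documents that could live side by side in the same database.
// Neither has to follow a shared schema.
$bob = [
    'FirstName' => 'Bob',
    'Address'   => '5 Oak St.',
    'Hobby'     => 'sailing',
];

$jonathan = [
    'FirstName' => 'Jonathan',
    'Address'   => '15 Wanamassa Point Road',
    'Children'  => ['Michael,10', 'Jennifer,8', 'Samantha,5', 'Elena,2'],
];

// Each document carries only the fields it actually uses.
echo json_encode($bob), "\n";
echo json_encode($jonathan), "\n";
```

Note that Jonathan’s document simply has no Hobby field, and Bob’s has no Children field; nothing is stored for them.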

This is great because, first of all, we are not wasting storage on empty or null fields.

The second reason this is nice is that we no longer have to worry about tables and columns! When we need to store information, we set just what we need. This CAN cause issues if you do not plan correctly, but we will get into that a little later on.

Another big reason I like CouchDB is that access is through a REST API. For those who know what that means, this is big! For those who don’t, it means that reading and writing data is done with plain HTTP requests, so access can be granted directly to the browser via JavaScript, without the need to write extra PHP code on the server side!
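
As a rough illustration of what “access through a REST API” means, here is a sketch that talks to CouchDB with nothing but PHP’s built-in HTTP stream wrapper. In CouchDB a document lives at /database/document-id, and a PUT request with a JSON body stores it. The helper names, the "blog" database and the "bob" document ID are assumptions of mine for this example, not part of CouchDB or PHPillow.

```php
<?php
// Hypothetical helper: builds the URL of a document in a CouchDB database.
function couchDocumentUrl(string $host, int $port, string $db, string $id): string
{
    return sprintf('http://%s:%d/%s/%s', $host, $port, rawurlencode($db), rawurlencode($id));
}

// Hypothetical helper: stores a document with an HTTP PUT request.
// Returns the raw response body, or false if the server is unreachable.
function couchPutDocument(string $url, array $doc)
{
    $context = stream_context_create([
        'http' => [
            'method'        => 'PUT',
            'header'        => "Content-Type: application/json\r\n",
            'content'       => json_encode($doc),
            'ignore_errors' => true, // return the body even on 4xx/5xx responses
        ],
    ]);

    return file_get_contents($url, false, $context);
}

$url = couchDocumentUrl('localhost', 5984, 'blog', 'bob');
echo $url, "\n";

// With a CouchDB server running and a "blog" database created, you could run:
// echo couchPutDocument($url, ['FirstName' => 'Bob', 'Hobby' => 'sailing']);
```

The same URL answers a plain GET request with the stored JSON, which is why a browser can read documents without any server-side code in between.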

Using CouchDB

Now that I have you all hyped about it, let’s get to using it. The first thing you need to know is that PHP does not have any built-in functions to access a CouchDB database.
For this I recommend PHPillow, a class library written by Kore Nordmann. It is, in my opinion, the best way to access CouchDB that I have seen so far, so that is what I will be using in this example. The second thing you need to know is that storing and querying data in CouchDB is not the same as running a MySQL query.

Database connection
To connect to your CouchDB instance, simply use the phpillowConnection class as shown here:

phpillowConnection::createInstance('localhost', 5984, 'user', 'password');

Once created, this connection will be used in your document and view classes automatically.

Define a custom document
All documents extend the abstract base class phpillowDocument. A complete model defining a blog entry could look like:

class myBlogDocument extends phpillowDocument
{
    // Type name of the documents created by this class.
    protected static $type = 'blog_entry';

    // Properties that are mandatory.
    protected $requiredProperties = array(
        'title',
        'text',
    );

    public function __construct()
    {
        // Each property is associated with a validator for its input.
        $this->properties = array(
            'title'     => new phpillowStringValidator(),
            'text'      => new phpillowTextValidator(),
            'comments'  => new phpillowDocumentArrayValidator(
                'myBlogComments'
            ),
        );

        parent::__construct();
    }

    // Build the document ID from the title.
    protected function generateId()
    {
        return $this->stringToId( $this->storage->title );
    }

    // Needed in every document class on PHP versions prior to 5.3.
    protected function getType()
    {
        return self::$type;
    }
}

The static property $type defines the type of the stored document and should be unique for each document class in your application. If you are implementing a module, prefix this type with the name of the module, like “blog” in this example. If you happen to use a PHP version prior to 5.3, you have to return the document type in each of your document classes as shown above. Users of 5.3 and above can take a more generic approach and return static::$type from a base document class.
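
The difference between self:: and static:: is easier to see in isolation. The following sketch does not use PHPillow at all; the class names are made up, and getType() is public here only so that it can be called from outside the class.

```php
<?php
// Hypothetical base class standing in for a shared parent of your documents.
abstract class myBaseDocument
{
    protected static $type = 'base';

    // static:: resolves to the class the method was called on (PHP 5.3+).
    public function getType()
    {
        return static::$type;
    }

    // self:: always refers to the class where the method is written.
    public function getTypeViaSelf()
    {
        return self::$type;
    }
}

class myBlogEntry extends myBaseDocument
{
    protected static $type = 'blog_entry';
}

class myBlogComment extends myBaseDocument
{
    protected static $type = 'blog_comment';
}

$entry   = new myBlogEntry();
$comment = new myBlogComment();

echo $entry->getType(), "\n";        // blog_entry
echo $comment->getType(), "\n";      // blog_comment
echo $entry->getTypeViaSelf(), "\n"; // base
```

With static::, one getType() in the base class is enough; with self::, every document class has to repeat the method.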

The $requiredProperties array defines the properties that must be set. The properties themselves are defined in the $properties property, which is initialized in the constructor of the document. We associate each property with a validator, which validates the input set on the document. Some validators are quite complex, like the phpillowDocumentArrayValidator shown here, which will be described later; all of them are documented in the generated API documentation.

The last thing you need to define is the generation of the document ID. An ID in CouchDB needs to fulfill some requirements, which are ensured by using the protected method stringToId(). Normally you use one reasonably unique property of the document. If it is not entirely unique, the document handler will append something to make it unique. Just return null if you want CouchDB to give you a unique ID for the document.
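
To give a feel for what such an ID might look like, here is a sketch of a title-to-ID conversion. It is NOT PHPillow’s actual stringToId() implementation; the function names and the exact normalization rules are my own assumptions, chosen so that the title used later in this tutorial maps to the ID blog_entry-new_blog_post.

```php
<?php
// Hypothetical stand-in for stringToId(): lowercases the string and
// replaces every run of characters other than a-z and 0-9 with "_".
function sketchStringToId(string $string): string
{
    $id = strtolower(trim($string));
    $id = preg_replace('/[^a-z0-9]+/', '_', $id);

    // Drop underscores left over at either end.
    return trim($id, '_');
}

// Hypothetical: prefixes the normalized string with the document type.
function sketchDocumentId(string $type, string $title): string
{
    return $type . '-' . sketchStringToId($title);
}

echo sketchDocumentId('blog_entry', 'New blog post'), "\n"; // blog_entry-new_blog_post
```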

Using a document
Now, to save data using the document layout defined above, we can simply call:

$doc = new myBlogDocument();
$doc->title = 'New blog post';
$doc->text  = 'Hello world.';
$doc->save();

With the call to the save() method, the document is generated and stored in the database. After this, a new magic property is available on the document:

$doc->_id;

When working with documents directly, this ID is the way to fetch the document back from the database:

$doc = new myBlogDocument();
$doc->fetchById('blog_entry-new_blog_post');

This call retrieves the above document from the database. The magic CouchDB properties _id and _rev (for revision) are set on the document. Besides the defined properties, the wrapper has created another property called revisions, which contains all old (and the current) revisions of the document:

echo $doc->revisions[0]['title'];

If you now change a property on the object and store it again, the old revision is also stored in the database, so that no information is lost on change. This behavior can be deactivated by setting the $versioned property to false.
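
To show the idea behind the revisions property, here is a small sketch of a class that snapshots its state on every save. It is not PHPillow’s real code and does not talk to a database; the class and method names are made up for illustration.

```php
<?php
// Hypothetical sketch of wrapper-level versioning.
class sketchVersionedDocument
{
    private $properties = [];
    private $revisions  = [];
    private $versioned;

    public function __construct(bool $versioned = true)
    {
        $this->versioned = $versioned;
    }

    public function set(string $name, $value): void
    {
        $this->properties[$name] = $value;
    }

    // Stands in for save(): keeps a snapshot of the state being stored.
    public function save(): void
    {
        if ($this->versioned) {
            $this->revisions[] = $this->properties;
        }
    }

    public function getRevisions(): array
    {
        return $this->revisions;
    }
}

$doc = new sketchVersionedDocument();
$doc->set('title', 'New blog post');
$doc->save();
$doc->set('title', 'Updated blog post');
$doc->save();

$revisions = $doc->getRevisions();
echo $revisions[0]['title'], "\n"; // New blog post
echo count($revisions), "\n";      // 2
```

Passing false to the constructor plays the role of setting $versioned to false: nothing is snapshotted on save.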

Did you say revisions?
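Why yes I did! Thanks for noticing! My Steve Jobs “One more thing!” moment is that when you alter a document through PHPillow, the previous version is saved as a revision automagically, in the revisions property of the document. No need for multiple database entries to make sure your application can roll back! Keep in mind that this is separate from CouchDB’s own _rev revisions, which exist for conflict detection and are discarded when the database is compacted, so do not rely on those for history.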

So that’s about it for this tutorial. Next time we will get into how to run more advanced queries using PHPillow.


Comments (34)

  1. Jan Hertsens said:

    One thing you did not explain is the “why” ?

    Why do we need a new format to store data? What here cannot be implemented just as easily in XML, YAML, JSON,… ?

      • Phoenixheart said:

        CouchDB and NoSQL are both great, but that’s no way near MySQL DB being outdated… I would say each serves different purposes. For large-scale applications with extremely high number of DB read-writes, we definitely use Erlang/Python and NoSQL DB’s like Couch and Mongo, but for simpler systems, SQL DB’s like MySQL, MSSQL and Oracle will still stand their place for a long time.

        • Matt Apperson said:

          I agree as long as MySQL stays free :) I guess my point was that for so long our primary free option was MySQL… now we have a better option for larger projects that is also free.

      • Cliff Wells said:

        Sorry, but this is absolutely not the reason to use a NoSQL database. If you think that NoSQL is a drop-in replacement for a relational database then you completely miss the point. What you are saying is that the airplane has “outdated” the car.

        NoSQL databases such as CouchDB fit a particular niche. They are nice for particular applications but they in no way displace relational databases. Relational databases have been a point of pain for many people because they were trying to shoehorn non-relational data into them. However, the opposite is even worse: trying to do relational queries on data from a NoSQL database would make you cry.

        Each has their place and if you are making choices about which to use, you really ought to fully understand the trade-offs you are making. I use both CouchDB and PostgreSQL and I make the choice based on the type of data I’m storing and how I will need to access it.

        And as an aside, MySQL is perhaps the worst relational database ever created. If you want an open-source relational database you should consider PostgreSQL instead.

        • Tash Pemhiwa said:

          @Cliff Wells, to make a statement like MySQL is the worst relational database ever created and not give any basis for your arguments is plain irresponsible. I use both PostgreSQL and MySQL and I like both. But I would like to understand why you would call MySQL the worst.

          • metaljunk said:

            I’d say, depending on the version, several bugs and strange behavior would make a good reason.
            I still know how I was cursing when I found out version 5.xx names foreign keys different than you tell it(table_fbpk_nr or sth like that), or INSERT … ON DUPLICATE KEY UPDATE … can have strange effects in combination with AUTO_INCREMENT.

            However, I’m quite glad getting to work with CouchDB :D

  2. Alex said:

    Suppose I’m a beginner who has never seen CouchDB in action…

    Statements like “To connect to your CouchDB instance” make the assumption that I have a CouchDB instance to connect to. Where is this instance? How did it come to exist? Is CouchDB something that runs in a separate process? On a separate server? How do I install it? Do I even need to install it?

    Those are the kinds of questions I was hoping to have answered when I clicked on the link.

    • Matt Apperson said:

      This was written to be a beginners guide to using CouchDB, not to get you started from nothing… My reasoning there was that if you need to be told that basic of info, then you should prob just stick to MySQL for now. (no offense intended to anyone)

  3. Bruno Alexandre said:

    Hi,

    Just to let you know that I had to stop working with CloudDB as their service sometimes are down for more than 12 hours (I’m in Europe).

    The idea is fantastic, but the service is just rubbish if you are building apps that need the DB all the time.

    I started to use Amazon SimpleDB instead and I’m very happy, the prices are very very low than before, I know, it doesn’t beat free, but for having an application that can’t connect to the DB several times in a month, I do prefer to pay.

    Just to let you guys know.

  4. Vatar said:

    CouchDB does NOT support document versioning like SVN or GIT. Versions are only used for conflict resolution and are deleted whenever the DB is compacted.

  5. Jan Lehnardt said:

    > Just to let you know that I had to stop working with CloudDB as their service sometimes are down for more than 12 hours (I’m in Europe).

    @bruno: I’m sorry to hear that the CloudDB service is unreliable, but what does that have to do with the open source project CouchDB that you run on your own servers? :)

  6. Jay said:

    Hi Jean!

    Firstly, thanks for all of your previous work and many thumbs up for this awesome tutorial. Have been a follower to your blog for a long time – which definitely is one of my favorites!

    I wonder if you would consider writing an additional part or followup to this tutorial that covers how to manage queries (views) while using couchDB.

    I got really stuck trying to learn this and I guess I am not alone :)

    I would really appreciate it. Many thanks once again!

  7. Chris Henry said:

    Great tutorial! Do you, or does anyone, have experience with how this type of database affects application lifecycle? One of the things you mentioned was that having no set schemas can cause problems later on. I can imagine that application code has to be more aware of the fact that the results being returned are probably not going to be consistent, unlike what you would get back from a DBMS. Aside from that, what other things should be taken into account when building an app on top of a NoSQL DB?

  8. Havrest said:

    A little error in your example :
    phpillowConnection->createInstance(‘localhost’, 5984, ‘user’, ‘password’);

    should be
    phpillowConnection::createInstance(‘localhost’, 5984, ‘user’, ‘password’);

    I didn’t know those NoSql databases. You tell that it is a non-relational database.
    But what is comments then ? A static array of comment stored directly in the blog_entry object or an array of links (relations) to some comment_entry stored in the database.

    As you talk about the non-relational design of this database it would have been good to talk about that and provide a *complete* example with the comments explication.

    If I understand correctly from the CouchDB documentation it’s more likely to be a ‘static’ array.

    If I want to add author information to my blog_entry, I can’t link it to a global author entry. And if my author changes their username, would I then have to modify each blog_entry created by this author?

    I have a hard time understanding the non-relational logic.
    Do you have a good introduction article ?

    Thank you.

  9. Binod Suman said:

    As I am a Java person and don’t have any idea about PHPillow, I used couchdb-0.9-windows.zip to work on CouchDB. Just unzip this file, go to the bin directory and call couch_start.bat. It will start the CouchDB server and you can start practicing.

    Many thanks for this nice tutorial.

    Binod Suman

  10. David Advisor said:

    For a high number of large-scale database applications to read write, we must to Erlang / Python and NoSQL DB like sofas and Mungo, but for simple systems, SQL databases, such as MySQL and MsSQL and Oracle are still will stand in their place a long time.

  11. Erm said:

    This is a useless page. It doesn’t show you jack. For instance how do you retrieve data? What’s the use of putting stuff into a database if you can’t get it out. Or search for it. I’m starting to think couchdb is a stupid idea if 90% of the tutorials don’t show you jack on how to actually use it.

  12. Oliver said:

    What I do not really understand is: you seem to throw away data normalisation for document-oriented DBs like CouchDB.
    But this seems to be a sure-fire way to get data inconsistencies over time.

    For example, in your document with the children in the parent… what happens if one of the kids marries?
    I would have to update the last name of the kid for the person_jennifer id, but I would also have to find out somehow (how?) that the same person is mentioned as a kid for Jonathan and update the data there.

    Also how would I see that while Jonathan is the father, Mary is the mother?

    In relational databases this would be solved because instead of a list of kids, we would have only the kid’s ids stored with their parents. I could then add records for father and mother holding the IDs of the parents so that I could easily find out the father of Samantha.

    How is it solved in NoSQL?

    I am really missing hints on data modelling for NoSQL DBs – do you have some pointers?

  13. Sreenath Gotur said:

    We have a situation where we have a table with few columns that are static, meaning we know what data will be loaded with datatype and we need 10+ columns that are dynamic, which depends on some business definitions. Do you think Couch DB will help? if so, can you explain?
