Mathias Meyer talking about “NoSQL – The definitive guide”. About Mathias Meyer: http://2010.berlinbuzzwords.de/content/mathias-meyer Website: https://berlinbuzzwords.de/ Twitter: https://twitter.com/berlinbuzzwords LinkedIn: https://www.linkedin.com/showcase/berlin-buzzwords/ Reddit: https://www.reddit.com/r/berlinbuzzwords/
I like to hear about NoSQL, right? So, I gotta love you. I don't have much to add: the next speaker gives us a very nice overview of all the things we're here for, and it will be a very good introduction. Great pleasure to introduce Mathias Meyer.
Thank you, man. OK, so my talk's title is "NoSQL: The Definitive Guide". What I want to do today is give you a whirlwind tour of what the technology is all about and what the tools are all about, and it's going to be a really fast one. If you have questions, you can always come up to me afterwards.

Who am I? My name is Mathias Meyer. I carry the fancy title of Chief Visionary at a company called Peritor here in Berlin. I play a lot with clouds, Amazon EC2, Ruby and NoSQL databases. I tweet and I write a lot of code, and what I basically do for a living is build this tool called Scalarium, an awesome cloud management and deployment platform.

A little disclaimer: I'm just a user. I'm not here to talk about any of the tools specifically; I just want to give you an idea. I like to play with them and I work with them, and that is what I have to do with NoSQL.

I blame a couple of people for me being here. One is this guy sitting right in front here: he's one of the committers of CouchDB, Jan Lehnardt. The other guy is Salvatore Sanfilippo, the author of Redis, another awesome tool I've grown very fond of. I blame these people for me being here because they kind of infected me with really playing and working with that stuff.
What is NoSQL? It's a weird word, that's pretty much a given, and it leaves a lot to the imagination. First off, is it about having no SQL? Not really. There have been a couple of other terms, like "less SQL", "not only SQL" or even "post-relational", and for me they're all pretty meh, because we're not talking about new technologies. It's more of an evolution: the technologies just come in a prettier package now, and I'm going to go into good detail about that.

What NoSQL is really about, and this is the part where it gets opinionated: this is my view of the NoSQL world. If you talk to a lot of other guys, they will probably tell you a different idea, but I know that a lot of the ideas I'm going to talk about today are at least shared by other people, so I think you're good either way.

The first and, for me, most important thing is that it's about simplicity. Relational databases as we work with them today love imposing constraints on us; they love imposing constraints on our data. And constraints have the nice side effect of slowing things down, and not just performance. If you ever had to build an index on a MySQL table with a couple of million entries, you pretty much know what I'm talking about. They also slow down development. I'm not very flexible with my schema; I always have to take care of my schema. I'm kind of a lazy guy: I like to take care of my schema, but I don't like to have that constraint imposed on me by my database. Simplicity is about removing these constraints, about giving me the tools to do what I want with my data.

And it doesn't reinvent the wheel, which is very important to me. The tools we're talking about are not exactly new. They are just combinations of existing technology, forged into new kinds of awesome products, but they're not trying to build everything from scratch.

Data: NoSQL for me is about data. I was asked: why is it about data? What is your point with data? Relational databases are also about data, obviously, because I'm storing data in them. It's about having different use cases, and different use cases require different data structures. In relational databases I only have one data structure: I only have tables, and I have to rework my data somehow to fit into tables. And in the end, depending on my scale, I always ended up denormalizing my data, which is kind of weird for a relational database. If you look at what Reddit did, and what a lot of other companies did, in the end they built something like a NoSQL database on top of a relational database. To me that just doesn't really make sense, although you could obviously do it.

For me, the biggest tagline of NoSQL is that it's about using what's right for your data, even if that's a relational database. I'm not excluding relational databases from the world of databases just because I'm talking about NoSQL. It's about using what your particular use case requires: you pick the right data structure for that, and that is somewhat the essence of it for me.

Oh, and then the best part, which is basically why we're all here: scalability. But it's kind of not that big a point for me, I gotta say; I'm more about the other things than about scalability. Scalability is a nice thing I get on the side through simplicity and having simple data structures. It's obviously about handling ginormous amounts of data, and there are going to be a lot more talks on that today. I'm not really going to talk about stuff on the largest scale; I'm just going to talk about NoSQL and data, and data is awesome. It's about simpler ways to scale up. Relational databases are kind of weird when it comes to scaling up, probably a lot of you know that, and I'm going to go into a lot more detail on that later.

It's about diversity. I already said you're supposed to pick what's right for your data, and that doesn't necessarily mean you're supposed to pick just one tool. You pick whichever tool is right for your particular use case, and that is what diversity is all about for me.
Relational databases really try to be one size fits all, and I don't know about you, but I always found they did a rather bad job at that. A disclaimer: I haven't just worked with MySQL. In my time I've worked with a lot of relational databases, and it just never was any fun to work with those databases. And even working with tools should be somewhat fun.

My favorite quote is this one, and it is 20 years old: "Database research has produced a number of good results, but the relational database is not one of them." That is from 20 years ago, think about that. Some guy wrote that to the ACM Forum back then, because it was a time when people started writing a lot about relational databases. There's a link here and you can read the whole letter, and I would highly recommend it, because it basically describes to me everything that NoSQL is all about.

The fun started when relational databases met the web. This is a WordPress schema; it's really good fun to look at, I really like it. It's not even complex, but it's already heavily denormalized, which is kind of ironic to me. The web didn't need more structure; quite the opposite, it really needed less structure. I want my data to be whatever I want. I think in the earlier talk today someone said that eighty percent of the data we produce is unstructured, was that right? Awesome, and that is pretty much my point. I like to think of the data I'm producing more in terms of documents. They're pretty loosely structured, I don't have a particular schema, and I want to have my documents in there loosely structured. That's what schema-less was supposed to be in the end, but schema-less is kind of a weird term to me as well; I like to describe it in documents. Schema-less is about having no constraints and about being very flexible about your data model, so it still fits perfectly: less structure, and simple data access. I'm not a fan of SQL, I gotta say. It was always mind-bending to me; I have tried, but in the end we never became friends.

Less structure: this is what less structure is to me. Again, I'll say it up front, it's a CouchDB document, but it's just a stand-in: a stand-in for a document, for schema-less, and for JSON. Because if anything, NoSQL loves JSON, or anything that is somewhat JSON-related. Why is that? JSON is a very simple data format, and it brings a couple of data structures that you will find in most databases and that are usually good enough to describe all of your data. It's pretty universal: you can parse it in any programming language, and basically it's just a giant hash. It's a fun data structure to work with. It's not the optimum, and it's not the only data structure that is suitable for that kind of stuff; it has just emerged as something that people love to use, even though Werner Vogels from Amazon doesn't really like it.
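To make the "giant hash" point concrete, here is a minimal Python sketch. The document and its field names are made up for illustration; the standard `json` module turns a JSON object into a dict and a JSON array into a list, and nothing stops you from adding attributes on the fly.

```python
import json

# A hypothetical, loosely structured document; the field names are made up.
raw = """
{
  "_id": "talk-nosql-guide",
  "type": "talk",
  "title": "NoSQL - The Definitive Guide",
  "speaker": {"name": "Mathias Meyer", "city": "Berlin"},
  "tags": ["nosql", "couchdb", "redis"]
}
"""

doc = json.loads(raw)           # JSON object -> dict, JSON array -> list
name = doc["speaker"]["name"]   # nested access, like any hash
doc["tags"].append("riak")      # no schema to migrate first...
doc["rating"] = 5               # ...new attributes are simply added

serialized = json.dumps(doc, sort_keys=True)  # back to text for storage or transport
print(name)
print(serialized)
```

The same round trip works in practically any language with a JSON library, which is a large part of why it became the lingua franca of document stores.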
Simple data access: again, it's a CouchDB example, I gotta say, but it's a good example of simplicity. Basically I just have a giant ID, which looks kind of weird; it could be a number if you wanted, but in the end I've gotten pretty used to using these IDs. Basically it's just key-value access: you use a key and get back either JSON or whatever structure you have in your database. Simple data access is kind of weird to explain, because you could always say that with relational databases I can also get my data with just a primary key. But to me it's just not the same, and I have kind of a hard time coming up with a good reason for that. You'll get a much clearer picture when I talk about scaling: it makes a lot more sense to have simpler data access, because it makes exactly that simpler, scaling up. And in the end that's what everyone likes to do and why we're all here: we like to scale up.

Scaling up wasn't really a big issue before today's web. Sure, we had big installations and we had giant databases on mainframes, but there are a lot of new boundaries we have to work with, like databases across different data centers and stuff like that. Or what Jan loves talking about, the offline web: you're anywhere and you want to work with your database, and a bit later, when you get back online, you want the data to get back onto the web.

The classification. The first part was merely a small introduction. A couple of tools have emerged, along with a couple of different categories these tools were put in, because people love putting their tools into categories. I'm going to try to do a simple classification, because no NoSQL database is like the other: they're all somewhat different, but they're all strangely similar, which is kind of fun. We have four contestants in the categories: key-value stores, document databases, column stores and graph databases. All these databases, except for key-value stores I think, are represented at this conference, and I would highly suggest going to all these talks to get more detail on these kinds of tools, because I won't go into a lot of detail on them. I will merely give you an idea, but there are going to be a lot of talks on them, and I would really recommend checking them out.

Key-value stores are basically what I just described. They're the simplest thing you could probably imagine: you have a key and you have a value, you use the key to look up the data, and that's pretty much all a key-value store can do. It's kind of mind-boggling, because the question is always: what can you do with such simple access? You can do a lot of things with it. What you can do is look up any data by its key.
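A minimal sketch of that contract, assuming an in-memory Python dict as the storage engine. Values are serialized to opaque bytes, `get`/`put`/`delete` work by key, and the store never looks inside a value. The class and method names are hypothetical, not any product's API; `find_by_value` is only there to show that anything other than a key lookup turns into a full scan.

```python
import json
from typing import Any, Callable, Dict, List, Optional


class KeyValueStore:
    """Toy in-memory key-value store: opaque values, lookups only by key."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: Any) -> None:
        # The value is serialized to bytes; the store never interprets it.
        self._data[key] = json.dumps(value).encode("utf-8")

    def get(self, key: str) -> Optional[Any]:
        raw = self._data.get(key)  # a single dict lookup
        return None if raw is None else json.loads(raw)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

    def find_by_value(self, predicate: Callable[[Any], bool]) -> List[str]:
        # There is no index on values, so every entry has to be deserialized.
        return [k for k, raw in self._data.items() if predicate(json.loads(raw))]


store = KeyValueStore()
store.put("user:1", {"name": "Alice", "city": "Berlin"})
store.put("user:2", {"name": "Bob", "city": "Hamburg"})
print(store.get("user:1"))                                   # {'name': 'Alice', 'city': 'Berlin'}
print(store.find_by_value(lambda v: v["city"] == "Berlin"))  # ['user:1']
```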
For key-value stores, I'm going to try to do something more historical, to really give you an idea that we're not talking about new technology here. There is a tool called Berkeley DB, built in 1991, that basically did just that: you had a key, it was an embedded database engine, and you could store your data as serialized arrays. It's still in use today, and apparently it can handle up to 256 terabytes of data in one database. That was pretty amazing to me to realize, and it's still in active use, even by tools that are built today, modern NoSQL tools.

What you would find today is Project Voldemort, for example; Tokyo or Kyoto Cabinet (the author of these tools is quite weird: he likes to build one tool, leave it on the side and then build a new one); and you have Redis. Amazon's S3 is probably the best example of a key-value store that a lot of people have probably already used. It's a pretty awesome tool and I really love using it. And one of the final contestants is Scalaris. There are a lot more of these tools, I'm just giving you five examples; it has become pretty hip to write your own key-value store. But these are pretty much in widespread use. Well, I'm not exactly sure about Scalaris, to be honest, but the other four are in active use, because I'm using at least two of them and I know of a lot of other companies using them.

The next contestant: document databases. Well, I basically already gave you the example: you have JSON, you have pretty rich documents. They're self-contained and they don't have a strict structure; they could hold any kind of data, any attributes, any values, it's up to you. They are probably the most versatile of all these tools, and you can query your data with JavaScript and MapReduce kind of stuff, which has kind of emerged as the way to do that in document databases. But if you think back, in the mid-90s there were XML databases. I have no idea who actually used them, but they are somewhat similar; they're also document databases, but you would use, for example, XPath to access your data, and if anyone has used XPath, it's really not fun to use. And yeah, this is pretty much the same example as I already gave you earlier.
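The "JavaScript and MapReduce kind of stuff" can be imitated in plain Python to show its shape: a map function emits key/value pairs per document, and a reduce function folds the values for each key. This is a sketch, not CouchDB's actual view API (CouchDB views are JavaScript functions stored in design documents), and the documents are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical documents: no shared schema, attributes vary per document.
docs = [
    {"_id": "a", "type": "post", "author": "mathias", "tags": ["nosql", "redis"]},
    {"_id": "b", "type": "post", "author": "jan", "tags": ["couchdb"]},
    {"_id": "c", "type": "comment", "author": "jan", "text": "Nice talk"},
    {"_id": "d", "type": "post", "author": "mathias", "tags": ["nosql"]},
]


def map_fn(doc: dict) -> Iterator[Tuple[str, int]]:
    # Emit one (tag, 1) pair per tag of a post; other documents emit nothing.
    if doc.get("type") == "post":
        for tag in doc.get("tags", []):
            yield tag, 1


def reduce_fn(values: Iterable[int]) -> int:
    return sum(values)


def run_view(documents: List[dict]) -> Dict[str, int]:
    grouped: Dict[str, List[int]] = defaultdict(list)
    for doc in documents:                      # map phase
        for key, value in map_fn(doc):
            grouped[key].append(value)
    # Reduce phase: one call per distinct key, keys in sorted order.
    return {key: reduce_fn(vals) for key, vals in sorted(grouped.items())}


print(run_view(docs))  # {'couchdb': 1, 'nosql': 2, 'redis': 1}
```

Because the map step looks at one document at a time, it does not care that the comment document has no `tags` field at all.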
This is an example document, what a document would look like in CouchDB, but it could be anything you can fit into a loose data structure, XML for example. But I didn't want to show you XML, because that is very 90s.

Ah, Lotus Notes: the first real document database humankind has ever used, built in 1989. These years are probably estimates, from what I could find on the internet, but as you can see, it's very old, and it's still on the bags you get at this conference. It was document-oriented anyway, and what everyone will tell you about Lotus Notes is what was amazing: you could stuff any data into your Lotus Notes databases, which is pretty awesome, and it had offline replication. You could take your laptop anywhere, work in your Lotus Notes, and when you got back online it would sync up your data. It's unfortunately still in use; if any one of you has really used Lotus Notes, you understand why, right? Unfortunately.

What do we have today? Today we have CouchDB, we have Riak and we have MongoDB, but unfortunately we also still have XML databases; apparently there's still a use for them. Yeah, what can you say? CouchDB is Lotus Notes on steroids, and I really like telling people that, because it is the best example of an old kind of technology that has been equipped with new technology and turned into something awesome. You have a couple of Lotus Notes technologies: it is offline by default, you can sync up to any other CouchDB database, and you end up with peer-to-peer replication. I'm not going to go into a lot of detail, because the next talk will be on CouchDB. There's also Riak, which is pretty similar to CouchDB at the core, but it follows a very different model of scaling, which I'll be talking about later.

Column databases are the most mind-boggling databases to me, because they're very much about storing similar data in one particular place. There's going to be a talk on Cassandra tomorrow and I have to recommend checking it out; I'll surely be there. Similar data is stored together, and there's a weird difference to relational databases: you access your data directly by key and attribute, and that is incredibly fast. If you look at a traditional column in a relational database, this is usually what you have: you end up looking for the ID, then go to the row and fetch the data. In a column database, this is basically what a column would look like. It's basically a hash, but you can access that hash by the key and the name, and that access won't first fetch the row and then the attribute; it does that in one step. That is incredibly fast and has the nice side effect of being nicely scalable.
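A rough Python sketch of the difference being described, with made-up data. The column-family side is modelled loosely on the BigTable idea of a sparse map addressed by (row key, column name); real stores also keep timestamps, store this sorted on disk and spread it across machines, none of which is shown here.

```python
from typing import Dict, List, Optional, Tuple

# Row-oriented table: find the row by ID first, then read the field from it.
rows: List[Dict[str, str]] = [
    {"id": "user:1", "name": "Alice", "email": "alice@example.com"},
    {"id": "user:2", "name": "Bob", "email": "bob@example.com"},
]


def row_lookup(row_id: str, column: str) -> Optional[str]:
    for row in rows:                 # step 1: locate the row (scan or index)
        if row["id"] == row_id:
            return row.get(column)   # step 2: pick the attribute out of it
    return None


# Column-family style: one sparse map addressed by (row key, column name).
# Rows don't have to share columns; absent cells simply aren't stored.
cells: Dict[Tuple[str, str], str] = {
    ("user:1", "name"): "Alice",
    ("user:1", "email"): "alice@example.com",
    ("user:2", "name"): "Bob",
    ("user:2", "last_login"): "2010-06-07",
}


def cell_lookup(row_key: str, column: str) -> Optional[str]:
    # A single lookup on the combined key; no row is materialized first.
    return cells.get((row_key, column))


print(row_lookup("user:1", "email"))        # alice@example.com
print(cell_lookup("user:2", "last_login"))  # 2010-06-07
```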
The best example, well, the earliest example I could find, was Sybase IQ. It was built in 1996, so once again we're not talking about a new technology here; it's a technology that has had a place in business analytics for about 15 years already. In 2007 they did a kind of test with Sybase IQ, and it handled the largest database of business analytics data, about one petabyte. By now Facebook has probably surpassed that, but back then it was a pretty big deal, as you can imagine; then again, back then there was almost no Facebook.

What do we have today? The best contestants today, and probably the most well-known, are Google's BigTable (it is not exactly the same as, for example, Cassandra, but it was kind of the inspiration for all of that), HBase and Hypertable. I just want to scratch the surface of Cassandra, because Cassandra is a mix of BigTable and Dynamo. I'm going to go into a bit more detail on Dynamo in a moment, but basically it takes the scaling model of Dynamo and the data model of BigTable and puts them in a warm and loving embrace, if you will. It supports a lot of other fancy stuff too; I really recommend going to the talk tomorrow to get a lot more details, because some of it is really mind-blowing, I gotta say.
The last contestant: graph databases. Graph databases allow you to store very large networks or trees of data, and traversing that graph or tree is very cheap. There's going to be a Neo4j talk later today, but the gist is that traversing the tree is so cheap because the data is stored in a way that allows for that. Basically it's very easy to dump whole trees or graphs of objects into the graph database and let it take care of storing all your associations. You don't have to do any nested queries to get to your stuff in a graph database: you can walk from association to association, or you can do fancy queries on them.
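Walking "from association to association" is essentially a graph traversal over stored adjacency. A minimal breadth-first sketch in Python, assuming the graph is held as in-memory adjacency lists; the social graph is made up, and this is not Neo4j's traversal API, just the underlying idea.

```python
from collections import deque
from typing import Dict, List

# Hypothetical social graph as adjacency lists: node -> directly connected nodes.
graph: Dict[str, List[str]] = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
    "erin": [],
    "frank": [],
}


def within_hops(start: str, max_depth: int) -> Dict[str, int]:
    """Breadth-first walk; returns each reachable node with its hop distance."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # don't expand beyond the requested depth
        for neighbour in graph.get(node, []):
            if neighbour not in depth:  # visit each node only once
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return depth


print(within_hops("alice", 2))  # {'alice': 0, 'bob': 1, 'carol': 1, 'dave': 2, 'erin': 2}
```

Each step only follows the neighbour list of the current node, so the cost grows with the part of the graph actually visited rather than with the total number of stored nodes; in a relational schema the same "friends of friends" question would typically be a self-join per hop.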
This is what it could look like: this is the Internet around 2000, I think. It's probably a lot bigger than that now, but this is basically the kind of graph you could store in a graph database.

The earliest one I could think of was this one. It's not exactly a graph database, but it is kind of a predecessor to graph databases: an object database. You had something like transparent object persistence: you could just take a Java object or a C++ object (yeah, yeah, I know, Java and C++), dump it in the database, and it would take care of serializing attributes and object associations for you. I've had the pleasure of working with that database, and it was pretty good fun. It's pretty easy to store and traverse millions of objects, and that's basically what these tools are about as well, maybe not Core Data. Core Data ships with Mac OS X. It is a persistence framework, and, as amazing as it sounds, it is a graph database. Obviously the most popular player today is Neo4j, but there's also HyperGraphDB. There are a lot more tools, but these are just a couple of examples I could find in active use. Neo4j used to be an embedded engine; by now they also have a RESTful service, I think. The data is semi-structured, so you're basically moving away from the model of closed objects like in object databases. Your data is very flexible: you can think of it as a document database that has a very nice way of traversing your data, and apparently it's very easy to store hundreds of millions of objects in Neo4j and walk them back.
The categories I was talking about overlap in some cases. A document database can be a key-value store, and a lot of people use it like that; this is why I mentioned the key-value access, because it's still the same in a document database. A column store is just a fancy key-value store, but a really fancy one. And a graph database basically is a document database on steroids, on a lot of steroids. Some document databases can handle graphs: for example, Riak has built-in support for links between documents, and you can fetch them in one go. It's pretty awesome and makes Riak kind of unique in the world of document databases.

The funnest part: scaling out. There are two pretty common models for relational databases you would find in real life. One is a master-slave setup, with an obvious bottleneck at the top: basically all your writes go to the master and all the reads go to the slaves, and if your master goes down, all you can do is read data. That's okay in some cases, but it's probably not what you want.
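A minimal Python sketch of the master-slave routing rule, with in-memory stand-ins for servers. The class names are hypothetical. Replication is done synchronously here for brevity, whereas MySQL-style replication is typically asynchronous, which is exactly where stale reads from slaves come from.

```python
import itertools
from typing import Dict, List, Optional


class Server:
    """Stand-in for one database server: a dict plus an up/down flag."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self.up = True


class MasterSlaveRouter:
    def __init__(self, master: Server, slaves: List[Server]) -> None:
        self.master = master
        self.slaves = slaves
        self._next_slave = itertools.cycle(slaves)  # spread reads round-robin

    def write(self, key: str, value: str) -> None:
        # Every write has to go through the single master: the bottleneck.
        if not self.master.up:
            raise RuntimeError("master is down; the cluster is read-only")
        self.master.data[key] = value
        for slave in self.slaves:  # copied synchronously here for simplicity
            slave.data[key] = value

    def read(self, key: str) -> Optional[str]:
        return next(self._next_slave).data.get(key)


master = Server("master")
router = MasterSlaveRouter(master, [Server("slave-1"), Server("slave-2")])
router.write("page:home", "Welcome")
master.up = False
print(router.read("page:home"))  # Welcome (reads still work, writes now fail)
```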
The other one is sharding: your client runs some consistent hashing algorithm, or just knows how your data is partitioned, and goes to the correct shard. In a way, neither is a great way of scaling out, if you ask me. But sharding has gotten pretty popular; Flickr, for example, did that pretty early on. They started spreading out their data, partitioning it by some key across shards, and accessing the data as they wanted. The problem is you couldn't do any joins on sharded data, which somewhat started defeating the purpose of having a relational database.
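The client-side routing just described, in its simplest form: hash the key and take it modulo the number of shards. This is a minimal Python sketch with hypothetical shard names; it deliberately uses plain modulo hashing rather than consistent hashing, which makes one weakness easy to see. Any join across shards would likewise have to be done by the application.

```python
import hashlib
from typing import Dict, List

SHARDS = ["db0", "db1", "db2", "db3"]  # hypothetical shard names


def shard_for(key: str, shards: List[str]) -> str:
    # Use a stable hash: Python's built-in hash() of str is randomized per process.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


keys = [f"user:{i}" for i in range(1000)]
before: Dict[str, str] = {k: shard_for(k, SHARDS) for k in keys}
after: Dict[str, str] = {k: shard_for(k, SHARDS + ["db4"]) for k in keys}
moved = sum(before[k] != after[k] for k in keys)
print(f"{moved} of {len(keys)} keys change shards when a fifth shard is added")
```

A key only stays put when its hash gives the same remainder modulo 4 and modulo 5, which holds for 4 out of every 20 hash values, so on average about four in five keys have to be moved when the fifth shard arrives.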
Some NoSQL approaches. I'm not going to talk about all of them, because there are a lot; I'm just going to talk about what has, one way or the other, become very popular, stuff you will find in tools that are going to be talked about today and tomorrow.

This is pretty much my favorite: the peer-to-peer web, if you will. Jan is a very big fan of the peer-to-peer web, and I'm a very big fan of the P2P web myself. Basically any database can talk to any database. This is the CouchDB replication model: you can take any CouchDB database and another CouchDB database and have them replicate each other, no matter how different their data is. You would probably end up with some conflicts in the end, but you can replicate any database to any database, and I think that is pretty unique to CouchDB. It's pretty amazing, I gotta say.
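A deliberately simplified Python sketch of replication with conflict detection, loosely modelled on the CouchDB idea: each document carries its revision history, replication fast-forwards when one side is an ancestor of the other, and concurrent edits become a conflict. CouchDB's real implementation tracks revision trees and uses its own deterministic winner rule while keeping conflicting revisions accessible; the rule here (compare revision ids) and all names are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    body: dict
    history: List[str]  # revision ids, oldest first; the last one is current
    conflicts: List[dict] = field(default_factory=list)

    @property
    def rev(self) -> str:
        return self.history[-1]


def replicate(source: Dict[str, Doc], target: Dict[str, Doc]) -> None:
    """One-way replication: push every document from source into target."""
    for doc_id, src in source.items():
        dst = target.get(doc_id)
        if dst is None or dst.rev in src.history:
            # Target lacks the doc or holds an ancestor revision: fast-forward.
            target[doc_id] = Doc(dict(src.body), list(src.history))
        elif src.rev in dst.history:
            continue  # target already has a newer revision
        else:
            # Edited independently on both sides: a conflict. Pick a winner by a
            # deterministic rule (here: the larger revision id) so every replica
            # agrees, and keep the losing body around instead of dropping it.
            winner, loser = (src, dst) if src.rev > dst.rev else (dst, src)
            target[doc_id] = Doc(dict(winner.body), list(winner.history),
                                 conflicts=[dict(loser.body)])


laptop = {"talk": Doc({"title": "NoSQL"}, ["1-a"])}
server: Dict[str, Doc] = {}
replicate(laptop, server)  # the server receives revision 1-a

# Both sides now edit the same document while disconnected.
laptop["talk"] = Doc({"title": "NoSQL guide"}, ["1-a", "2-b"])
server["talk"] = Doc({"title": "NoSQL, definitive"}, ["1-a", "2-c"])
replicate(laptop, server)
print(server["talk"].body, server["talk"].conflicts)
```

Because the winner rule depends only on the revisions, replicating in the opposite direction picks the same winner, so both sides converge.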
And the funnest part: Amazon's Dynamo, which you will find in at least Cassandra and Riak. You have a ring, which is basically a fancy version of sharding, if you will. Your data is partitioned into slices that look like pizza slices. You partition it by some key, basically any key that is suitable; in this case it's just an ID. The range could be millions, or it could be just one to a thousand; it's pretty much up to you. The gist is that you slice up your data into equally sized partitions, and that data can be replicated: for example, one partition of your data is replicated on at least three nodes. The thing to look for is that N up there: you have N nodes, I'll get to that in a minute. And you can go to any node in the cluster to get any key, and that is where it differs from classic sharding. You can ask any node for any data, even if you know that node doesn't have the data, so you don't have to care about which nodes you have in the cluster. That is what's pretty awesome about it: you can go to any node and ask for any data. It's a pretty nice way to scale up, if you ask me.
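A minimal Python sketch of a Dynamo-style ring under stated assumptions: SHA-256 as the hash, a fixed number of virtual positions per node (the count of 16 is arbitrary), and N = 3 replicas found by walking clockwise from the key's position. Unlike the equally sized slices on the slide, positions here are simply hashed. Because every node can compute the same preference list, any node can accept a request for any key and forward it; the class and node names are hypothetical.

```python
import bisect
import hashlib
from typing import List


def ring_hash(value: str) -> int:
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Consistent-hashing ring with N-way replication (Dynamo-style sketch)."""

    def __init__(self, nodes: List[str], n_replicas: int = 3, vnodes: int = 16) -> None:
        self.n = min(n_replicas, len(set(nodes)))
        # Each node gets several positions ("virtual nodes") for a more even spread.
        self._points = sorted(
            (ring_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def preference_list(self, key: str) -> List[str]:
        """The N distinct nodes responsible for `key`, walking clockwise."""
        i = bisect.bisect(self._hashes, ring_hash(key)) % len(self._points)
        owners: List[str] = []
        while len(owners) < self.n:
            node = self._points[i][1]
            if node not in owners:
                owners.append(node)
            i = (i + 1) % len(self._points)
        return owners


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
ring = Ring(nodes)
print(ring.preference_list("user:42"))  # three distinct nodes; which ones depends on the hashes

# Adding a node only takes over the keys that now fall into its slices.
bigger = Ring(nodes + ["node-f"])
keys = [f"user:{i}" for i in range(1000)]
moved = sum(ring.preference_list(k)[0] != bigger.preference_list(k)[0] for k in keys)
print(f"{moved} of {len(keys)} keys changed their primary node")
```

Compared with modulo sharding, only the keys that land in the new node's slices change owner; all other keys stay where they were.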
Dynamo is also about ensuring data consistency. You can tell the ring: I want my writes to go to at least W replicas, and I want my reads to come from at least R replicas, and that determines whether the operation was successful or not. If you can read the same data from three different replicas, it is a successful read. R and W are called the quorum, and it's kind of a weird mathematical thing, so let me try to come up with a definition. If you have three replica nodes in the system, or even four, and you're saying I only want three nodes to acknowledge a write, that is the quorum: when these three replicas come back and say that the read or write was successful, it is good.
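The usual rule of thumb behind R and W, which the talk doesn't spell out, is R + W > N: then every read set overlaps every write set, so a read quorum always includes at least one replica that saw the latest successful write. A brute-force Python check of that claim for small N:

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """True if every possible set of r readers shares a node with every set of w writers."""
    nodes = range(n)
    return all(
        set(readers) & set(writers)
        for readers in combinations(nodes, r)
        for writers in combinations(nodes, w)
    )


for n, r, w in [(3, 2, 2), (3, 1, 1), (4, 2, 3), (4, 2, 2)]:
    print(f"N={n} R={r} W={w}: R+W>N is {r + w > n}, "
          f"always overlap is {quorums_always_overlap(n, r, w)}")
```

The two columns agree in every case. With the four-replica example above and W = 3, a read quorum of R = 2 is already enough to overlap with every possible write quorum.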
Never mind whether the fourth node gets the data later, because that's where the next stuff brought to you by Amazon comes in: eventual consistency. If you've worked with MySQL replication: they were not very open about the fact that it actually is eventual consistency. What it is about: "the storage system guarantees that if no new updates are made to the object, eventually all accesses will return the last updated value." So imagine, back to the ring: you have four replicas and you need just three of them for a successful write. At some point even the fourth node will return the value you wrote, but it doesn't have to be right now; it can be at any point. Usually you would obviously expect that to happen within milliseconds, and usually that will be the case, but you're accepting that the write may only become visible to reads a little bit later. That is basically what Amazon built all of their tools on: EC2, S3, SimpleDB, all their stuff is more or less based on the idea of eventual consistency. CouchDB is based on that idea, Riak is based on that idea; pretty much any storage you'll find that doesn't do distributed transactions, which it really shouldn't, is somewhat based on the idea of eventual consistency.
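A toy Python model of the four-replica example: a write counts as done once W = 3 replicas have it, the fourth lags behind, and a background sync later brings it up to date. For simplicity each value carries a single increasing version number and the newest version wins; Dynamo itself tracks versions with vector clocks, so treat this as an illustration of convergence, not of any product's conflict handling. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    version: int = 0
    value: Optional[str] = None


def quorum_write(replicas: List[Replica], value: str, version: int, w: int) -> None:
    # Only the first w replicas apply the write right away; the rest lag behind.
    for replica in replicas[:w]:
        replica.version, replica.value = version, value


def anti_entropy(replicas: List[Replica]) -> None:
    # Background sync: every replica adopts the newest version any replica has.
    newest = max(replicas, key=lambda r: r.version)
    for replica in replicas:
        if replica.version < newest.version:
            replica.version, replica.value = newest.version, newest.value


replicas = [Replica(f"node-{i}") for i in range(4)]
quorum_write(replicas, "hello", version=1, w=3)
print([r.value for r in replicas])  # ['hello', 'hello', 'hello', None]  (fourth is stale)
anti_entropy(replicas)
print([r.value for r in replicas])  # ['hello', 'hello', 'hello', 'hello']
```

Between the write and the sync, a read that happens to hit only the fourth replica sees the old value; that window is exactly what "eventually" means.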
Why would you pick Dynamo for your database? Because most of these, except graph databases (so column stores, key-value stores and document databases), access their data by key. And if you just need a key to access your data, suddenly everything becomes very partitioning-friendly: you can take that key, put a hash function on it, and find out which partition that key is in. That's why Dynamo has become rather popular. Well, I don't want to say it's the choice of the NoSQL generation, but you will find it in a lot of tools; Cassandra and Riak are just the more popular examples.

These are just two approaches to scaling out. I'm not implying these are the perfect ones or the only two; they're just the ones you will find a lot: the P2P one when you're working with CouchDB, and Amazon's Dynamo scaling model, which you will find in a lot of other stores.

The fun question, and I was talking about this earlier: the technologies aren't really new, so how is today different from back then? Why do we need different tools today? The simplicity in the tools isn't really new: Lotus Notes, in one way or another, was already simple too, but maybe not great. For me, data is what's become important. It's all about my data, and that's what's awesome: I'm free to use any tool that is just the right fit for my data. And the simpler ways to scale out, obviously. It's not that important to me to have awesome ways to scale out, but I really enjoy playing with the technologies that have emerged for scaling out, and I really love playing and working with them. But it's more about data and simplicity for me, because these are my personal development models.

The evolution is around data and access patterns. Before you pick a database, you start thinking more about what kind of data you have and how you access that data, and when I say access, I obviously mean how you query your data. There will probably be cases where you find out you need a good amount of dynamic queries, and then a NoSQL database may not be a good fit. But it forces you to think about that, and you should do that anyway. With relational databases it was somewhat easier: you just had a relational model, you knew your data would end up in a table or in multiple tables, and in the end you would end up denormalizing your data anyway, because otherwise it doesn't scale up very well.

Open web technologies are not just awesome, they're a pretty good fit for that. As you saw, there was a lot of JSON, JavaScript and HTTP in the talks today. They have simply emerged; they're proven technologies, and they make a perfect fit to be embedded in a database, or as protocols the database can use to talk. I love text protocols; I kind of hate binary protocols, I gotta say. And I love playing with databases where I can just use curl to access or play with my data. It's very geeky, I know, but what are you going to do?
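As one example of the kind of text protocol meant here (the talk doesn't name a specific one), Redis's request format is plain text: a command is sent as an array header `*<count>` followed by each argument as a bulk string, `$<byte length>` and then the data, with every line ending in CRLF. A minimal Python encoder, assuming UTF-8 arguments:

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]  # array header: number of arguments
    for arg in args:
        data = arg.encode("utf-8")
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))  # length-prefixed argument
    return b"".join(out)


wire = encode_command("SET", "talk:1", "NoSQL")
print(wire)  # b'*3\r\n$3\r\nSET\r\n$6\r\ntalk:1\r\n$5\r\nNoSQL\r\n'
```

The bytes on the wire stay human-readable, which is the same appeal as poking at an HTTP/JSON database with curl.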
these days were trying to sell you a
closed product I can only recommend one
thing is just to run because it’s not
just about getting your data in it’s
about getting a date it’s about for
example getting a data out I’m not
saying you’re going to flee through
another database later on but it’s it’s
about the ease
that you can you can handle any kind of
your data in any way you want and closed
products somewhat have have the tendency
to to make that to make that stuff a lot
harder if you have used amazon simply be
at any point you’re pretty much know
what I’m talking about even though it
you could count it as a as a contestant
in the no secret world it’s just not a
great tool NoSQL isn't the Holy
Grail and that is
kind of very important to me
I'm not saying to anyone that he should
definitely go use a NoSQL
database for his next project if you
love playing if you love living on the
edge sure why not there are a couple of
things you just can't do
really nicely range queries can be kind
of hard especially when you're talking
about a larger scale when you have
a ring it becomes harder
and harder to do more complex
range queries and one example
is CouchDB where that can get
kind of weird that's just how
it feels to me right doing range
queries in CouchDB is kind of awkward but
it is possible but yeah it boils
down to the question are you willing to
deal with that awkwardness or do you
just want to do it the classic way
of your relational database
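To make the range-query point a little more concrete, here is a toy Python sketch, not CouchDB's actual API: a CouchDB-style view behaves roughly like an index whose keys are kept in sorted order, and a range query is a slice between a start key and an end key (CouchDB exposes this through the startkey and endkey view parameters). The SortedView class and the sample rows are hypothetical. Composite keys such as (author, date) make ranges that follow the key order cheap, while a question that cuts across the key order is where the awkwardness shows up.

```python
from bisect import bisect_left, bisect_right


class SortedView:
    """Toy stand-in for a view index: (key, doc_id) rows kept sorted by key."""

    def __init__(self, rows):
        # Sort once up front, the way a view index is built before queries run.
        self.rows = sorted(rows)
        self.keys = [key for key, _ in self.rows]

    def range(self, startkey, endkey):
        """Return doc ids whose key k satisfies startkey <= k <= endkey."""
        lo = bisect_left(self.keys, startkey)
        hi = bisect_right(self.keys, endkey)
        return [doc_id for _, doc_id in self.rows[lo:hi]]


# Hypothetical blog posts keyed by (author, date).
view = SortedView([
    (("alice", "2010-05-01"), "post-1"),
    (("alice", "2010-06-07"), "post-2"),
    (("bob", "2010-05-20"), "post-3"),
    (("bob", "2010-06-01"), "post-4"),
])

# Cheap: alice's posts in May, because the range follows the key order.
print(view.range(("alice", "2010-05-01"), ("alice", "2010-05-31")))  # -> ['post-1']

# Awkward: "all May posts" cannot be one range here; this also returns
# alice's June post, so a date question needs a second view keyed by date.
print(view.range(("alice", "2010-05-01"), ("bob", "2010-05-31")))
```

The second call is the catch: the index only answers questions shaped like its key, so a different kind of range means defining a different view first.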
ad hoc queries and that is kind of the
most important thing there's
almost no database where you can do
fancy ad hoc queries if you know
your boss comes to you and you need some
fancy business analytics report
then it gets very hard with NoSQL
the only database I could
think of you could do that with is
probably SimpleDB but only in a very
restrictive manner hmm
okay if you count Lucene amongst
those then you could probably do that
that is true you don't have any
transactions in the end I don't know a
lot of cases where you would end up
needing transactions but it is a very
important thing to know if you
have multiple operations you need to
have in one go that need to be
atomic it is very unlikely that you
would end up using a NoSQL
database because
transactions don't scale well in a
distributed system if you ever used a
distributed transaction for example in
the J2EE world it's just not fun the
question in the end is if I
went through all of that if I went through
the process of analyzing my data
and looking at my access patterns how
in the end would I pick the
right tool for my job of course I can't
tell you the universal answer for that I
won't I wouldn't in any way imply
that I knew the answer to that because
it depends it always does it depends on
data structure do you just have simple
data do you need it to be easily
scaled out then you're probably fine with
a key value store or a column store
depending on whether you have a slightly more
complex data model a column store is
probably a good idea if you just
have simple data for fast access my
favorite tool to do that is Redis you
can also use MongoDB for that because that is
a part that MongoDB does well
too or basically any kind of simple key
value store if you need richer data
models you would probably end up with
a document database and for me
document databases are pretty much
the most universal of all the
four contestants they just
have pretty good uses you can
put a lot of data in a document but
obviously that still comes with
some caveats but I can mostly
think of a lot of uses where a document
database is a good fit and you can use a
graph database read and write
patterns I'm not going into a lot
of detail because in the end you need to
play with the tools you need to work with
them you need to stuff the data you have
in them and you need to find out if
they scale up to the demand you're
probably expecting which is
probably always not the greatest idea
but if you just access documents by a
key you can use a key value store or you
can use a document database if you
access objects by more complex queries
you can again use a document database
but the same caveat is true
you can't do any ad
hoc queries you have to know
how you query your data and you have to
define the queries in for example
JavaScript or XPath you can
probably do ad hoc queries in XPath but
who would want to do that
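The idea of defining queries up front can be sketched in Python. This only mimics the shape of a CouchDB-style map function; it is not CouchDB's JavaScript view API, and build_view, query and the sample documents are hypothetical. The map function is written before any query runs, the index is built by running it over every document, and a query can only look up keys that the map function emitted.

```python
from collections import defaultdict


def build_view(docs, map_fn):
    """Run map_fn over every document; map_fn yields (key, value) pairs."""
    index = defaultdict(list)
    for doc_id, doc in docs.items():
        for key, value in map_fn(doc):
            index[key].append((doc_id, value))
    return index


def query(index, key):
    """A query is a plain key lookup against the prebuilt index."""
    return index.get(key, [])


# Hypothetical documents in a document store.
docs = {
    "u1": {"type": "user", "name": "Ann", "city": "Berlin"},
    "u2": {"type": "user", "name": "Ben", "city": "Hamburg"},
    "u3": {"type": "user", "name": "Cem", "city": "Berlin"},
}


# Decided in advance: "users by city" is a question we know we will ask.
def users_by_city(doc):
    if doc.get("type") == "user":
        yield doc["city"], doc["name"]


by_city = build_view(docs, users_by_city)
print(query(by_city, "Berlin"))  # -> [('u1', 'Ann'), ('u3', 'Cem')]
```

Asking something the map function did not anticipate, say users by name, means writing another map function and building another index, which is the opposite of typing a fresh ad hoc SQL query.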
or you use a
graph database for example Neo4j
has I think it uses RDF to
do fancy query stuff and that is
pretty nice it looks a bit awkward I
gotta say but at least you can
query your network of objects at any
time it is really cheap and just to give
you a particular example because I love
talking about it is Redis but Redis in
the end requires if you have
just simple data in Redis you
need to yeah you start duplicating
data you just start maintaining
lists or sets of keys that you use for
reverse lookups to other data
structures to get back your data it is
quite fun in Redis and I'll be happy
to talk about that later
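The reverse-lookup pattern described here can be sketched without a running Redis server. The ToyRedis class is a hypothetical in-memory stand-in that mimics only the semantics of the Redis SET, GET, SADD and SMEMBERS commands; it is not the redis-py client, although redis-py's redis.Redis does expose methods with these names. The key naming scheme (user:<id>:city and city:<name>:users) is also an assumption for illustration. The point is that the application writes each fact twice: once under the primary key and once into a set that acts as a hand-maintained index.

```python
class ToyRedis:
    """In-memory stand-in for a few Redis commands (not a real client)."""

    def __init__(self):
        self.strings = {}
        self.sets = {}

    def set(self, key, value):
        self.strings[key] = value

    def get(self, key):
        return self.strings.get(key)

    def sadd(self, key, member):
        self.sets.setdefault(key, set()).add(member)

    def smembers(self, key):
        return set(self.sets.get(key, set()))


def save_user(db, user_id, city):
    # Primary data: look up a user's city by user id.
    db.set(f"user:{user_id}:city", city)
    # Duplicated data: one set per city, maintained by hand for reverse lookups.
    db.sadd(f"city:{city}:users", user_id)


db = ToyRedis()
save_user(db, "1", "Berlin")
save_user(db, "2", "Hamburg")
save_user(db, "3", "Berlin")

print(db.get("user:2:city"))                     # -> Hamburg
print(sorted(db.smembers("city:Berlin:users")))  # -> ['1', '3']
```

A real application would also have to remove a user id from the old city's set when that user moves, which is exactly the bookkeeping cost that comes with this kind of duplication.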
but it's just
one example if you only access similar data
like for example Digg or Reddit and
Twitter do use a column store
because that's what they're really
really good at and they're kind of amazing tools I
can only repeat that they're kind of
mind-boggling to me and well ad hoc
queries I'm sorry Lucene is not in
here but you could
use Lucene for that and you could use
MongoDB because they're basically the only
ones apart from SimpleDB
which I excluded on purpose they're
the only ones where you could do some
sort of ad hoc querying on your data
and it can be a problem for some use
cases but so far not for mine I gotta
say and you could use Neo4j
again as I already said you can
always query your network
of graphs of objects or documents if you
have tightly connected object graphs
obviously your only choice would be
graph or object databases but there is
another choice for you and it's called
Riak Riak has as I already said
Riak is the only document database
where you have a built-in way to store
links between your documents and
you can fetch them in one go you don't
have to go back to
your database to fetch linked
documents you can get them all
in one request
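As a rough illustration of links between documents, here is a hypothetical in-memory model in Python; it is not Riak's HTTP interface or any Riak client library. Each stored object carries a list of (bucket, key, tag) links, loosely following the shape of Riak links, and fetch_with_links returns an object together with the objects its matching links point to in a single call, which is the round trip a server-side link walk saves the client. Bucket names, keys and tags are made up.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    value: dict
    # Each link is (bucket, key, tag), loosely modelled on Riak object links.
    links: list = field(default_factory=list)


class ToyLinkStore:
    """Hypothetical in-memory document store with links between documents."""

    def __init__(self):
        self.data = {}

    def put(self, bucket, key, value, links=None):
        self.data[(bucket, key)] = StoredObject(value, links or [])

    def fetch_with_links(self, bucket, key, tag=None):
        """Return the object plus every object it links to (optionally by tag)."""
        obj = self.data[(bucket, key)]
        linked = [
            self.data[(b, k)].value
            for b, k, t in obj.links
            if (tag is None or t == tag) and (b, k) in self.data
        ]
        return obj.value, linked


store = ToyLinkStore()
store.put("authors", "ann", {"name": "Ann"})
store.put("posts", "post-1", {"title": "Hello"},
          links=[("authors", "ann", "author")])

post, authors = store.fetch_with_links("posts", "post-1", tag="author")
print(post["title"], [a["name"] for a in authors])  # -> Hello ['Ann']
```

Without stored links the client would first read the post, extract the author key itself and then issue a second request, one extra round trip per linked document.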
easy to scale out
yeah that's a good category because in
some way or the other they all are
Riak Cassandra Voldemort etc etc
because scaling out is well it's
still somewhat of a problem but these tools
somewhat made it easier the final
question how do I know I'm not wasting
my time with these tools they're pretty
new I gotta say and we're still lacking
somewhat in real world production
experience with them in a wider
range but they're still solutions to
real life problems Cassandra Riak
MongoDB they all were
built out of a
particular need that existed mostly
in one case or the
other it was a particular scaling need
that forced them to simplify the
data model but it's not that these tools
haven't proven themselves in production
that is really not the case and they're
more of a natural evolution and that is
how I like to think of them
they're not aiming to
replace they're aiming to accompany and
that's what I like to tell people
you know it's not about
replacing the database they
already have in use it's not about
throwing away your knowledge it is about
polyglot data storage it is about
choosing the right tool for a particular
use case even if that means you
have two or
three different databases in
your setup it's not a bad idea in
general and well personally I already do
that I use whichever database I
find to be
right for a particular use case if that
is a different database that is just
fine with me if it's good for the
purpose I'm after it's
just a better choice the
name is still kind of unfortunate and
personally I'm not standing here
trying to come up with a
new one it will probably stick around
for a while until someone will
be able to come up with a new one and
with that I'm done any questions there
was a lot of stuff I know and yeah it
was just a basic intro definitely go to
all the other talks the Riak
CouchDB Cassandra and
MongoDB talks to get more details on all
of these tools yeah I personally have
though the question is if I can say
anything about the usage of for example
Cassandra in companies like Twitter or
Facebook and I can't I haven't worked
with Cassandra Twitter is kind
of weird about what they're using
because they love saying they're using new
tools in the end you never know if
they're already using Cassandra Facebook
has had Cassandra in use for
a long time and they built it so that
was one of the particular needs I
was talking about one of the particular
real life needs not sure if and how
they're still using it but
when I look at Cassandra and look at
the code and at the data model it's
not hard to find a use case for that so
it's kind of
different and it's very special but I
can't tell you a lot of details
probably when Eric Evans is going to be
talking about Cassandra tomorrow he will
be able to tell you a lot more
as for what we're using I'm using
CouchDB and Redis in production myself
and so far we've had
pretty good experience with that
obviously it's new territory and
some edges are weird for
example range queries in CouchDB but
in the end I'm very happy with it and
I'm using CouchDB in new projects
for example and I just love playing
around with these tools
independent of who is using them in
production because yeah that's what the
guys who wrote them probably know best
anything else yes okay
in the other well how do you discover
them I could probably not answer you
that question because it's very specific
to your use case but in the end and
that is what I was saying
schema-less is kind of a weird term
because in the end you will always
migrate data you will just stop
migrating schemas you will
migrate data in ways that make
more sense for you or for your access
patterns you know you just
don't stop moving around your data
model just because you're using a very
flexible NoSQL database you always
have areas where you push
around data to be a lot better
suited for the access patterns but
obviously you'd have to do that in
somewhat of a migration step
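A schema-less store still needs the data migration described here; it just moves into application code. Below is a hedged Python sketch of one common approach, migrating documents lazily when they are read, using a hypothetical schema_version field and a made-up change that splits a name field into first_name and last_name. It is one possible technique rather than something the talk prescribes; a batch job that rewrites every document up front is the usual alternative.

```python
CURRENT_VERSION = 2


def migrate_v1_to_v2(doc):
    """Hypothetical change: split 'name' into 'first_name' and 'last_name'."""
    first, _, last = doc.pop("name").partition(" ")
    doc["first_name"], doc["last_name"] = first, last
    doc["schema_version"] = 2
    return doc


# Maps a document's current version to the function that upgrades it by one step.
MIGRATIONS = {1: migrate_v1_to_v2}


def load(store, key):
    """Read a document, upgrading it step by step and writing it back."""
    doc = dict(store[key])
    while doc.get("schema_version", 1) < CURRENT_VERSION:
        doc = MIGRATIONS[doc.get("schema_version", 1)](doc)
        store[key] = doc  # persist the upgraded shape
    return doc


store = {
    "u1": {"name": "Ada Lovelace"},  # old document without a version field
    "u2": {"first_name": "Alan", "last_name": "Turing", "schema_version": 2},
}
print(load(store, "u1"))
# -> {'first_name': 'Ada', 'last_name': 'Lovelace', 'schema_version': 2}
```

Old and new document shapes coexist in the store until every old document has been read once, so queries written in the meantime still have to cope with both.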
you can't
just go ahead and write a new query for
that well you can in
JavaScript for example but it is definitely less ad
hoc than an SQL query in
this case so that's definitely something
to keep in mind and there's
no universal answer around that but
maybe if you know the demand for that
increases who knows what's going to
happen I don't know I can't tell you the
future time's up okay thanks again