This week I’ve been busy with a proof-of-concept that uses ElasticSearch to store a whole bunch of data. It was my first time in the NoSQL/document-based world, having only worked with relational databases (RDBMS) until now. As a result, I’ve spent quite some time in the ElasticSearch documentation and on Google, looking for ways to achieve my goals with ElasticSearch’s document types.
One of the most confusing things to me was how ElasticSearch is built around document types (which often get compared to data tables in the RDBMS world). My setup has FileBeat watching a set of directories and reading a bunch of files, so I wanted to get these document types right.
The first thing I noticed was that FileBeat sends its documents with ‘doc’ as the document type, while ElasticSearch uses ‘_doc’. This was somewhat confusing, and I really wanted to get to the bottom of it because my Symfony bundles rely heavily on these document types. After some searching I came to the following conclusions. The TL;DR:
- ‘doc’ (sent by FileBeat) is an older type name, from before ElasticSearch settled on its current convention
- ‘_doc’ (in ElasticSearch) is the current, recommended name for a document type
- Document types are on their way out: they are deprecated from version 7.x on and removed entirely in 8.x (source: https://www.elastic.co/guide/en/elasticsearch/reference/6.x/removal-of-types.html#_schedule_for_removal_of_mapping_types)
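As a rough illustration of where the type shows up in practice: it is the middle segment of a document’s REST path ({index}/{type}/{id}). The Python sketch below only builds such URLs and talks to nothing. The host, the index names and the `document_url` helper are made up for this example and are not part of FileBeat or ElasticSearch.

```python
from urllib.parse import quote


def document_url(base_url: str, index: str, doc_id: str, doc_type: str = "_doc") -> str:
    """Build the REST path for a single document: {base}/{index}/{type}/{id}."""
    # Percent-encode each path segment so unusual ids cannot break the URL.
    parts = [quote(part, safe="") for part in (index, doc_type, doc_id)]
    return base_url.rstrip("/") + "/" + "/".join(parts)


# Hypothetical host and index names, for illustration only.
BASE = "http://localhost:9200"

# A document addressed through the 'doc' type that FileBeat sends.
filebeat_url = document_url(BASE, "filebeat-example", "1", doc_type="doc")

# The same kind of document under the '_doc' type.
recommended_url = document_url(BASE, "my-index", "1")

print(filebeat_url)
print(recommended_url)
```

Apart from that one path segment the two URLs have the same shape, which is part of why the distinction ends up mattering so little.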
Given this, document types matter a lot less. Yes, a type is still required to index your documents and set up mappings, and a lot of external software still relies heavily on them. But we need to start preparing for a world where documents are just… documents.
This made me realise that my document type issue wasn’t as big a deal as I thought, so I’m now ignoring the field. The challenge that lies ahead is, of course, making external software (like the ONGR ElasticSearch Bundle) work with this new way of thinking. But that’ll be something for a new blog post…