Paginating Solr results as entities in Symfony

In my work I use a lot of code that is available for free. Why would you reinvent the wheel, right? But all these different components can’t possibly know about each other, so when you combine them you might run into integration issues. This is what happened when I was working on paginating Solr results on a website.

Solr setup

For my Solr setup I’m very charmed by Florian Semm’s Solr Bundle, which makes Solr documents behave more like Doctrine entities. You can use annotations to set up your entity’s properties and the rest is almost magic. This bundle made life so much easier that I also invested some time in a few pull requests to improve it.

Pagination setup

On the other end I’m using the KNP University Paginator bundle. Up until now the bundle has helped greatly in paginating large data sets from an array or a Doctrine database result. As its documentation states, the bundle also supports other data sets, such as Solr documents.

Paginating Solr results with a new bundle

So this all sounds good, but what was the actual problem then? Even though both bundles are awesome at what they do, the Paginator bundle wasn’t able to turn a Solr data set back into its original entities, because it didn’t know a data mapper was being used. After some research I found out that the KNP University bundle lets you create your own pagination subscribers. These subscribers can listen for an incoming data set and handle it appropriately.
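To make the subscriber idea concrete, here is a minimal, language-agnostic sketch in Python. It is not the paginator bundle's actual API and not the code from my bundle: `ItemsEvent`, `SolrResult`, `Post`, `paginate` and both subscribers are hypothetical names made up for illustration. It only shows the pattern: each subscriber looks at the incoming data set, and the one that recognises it slices out the requested page and, in the Solr case, maps the raw documents back to entities.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class SolrResult:
    """Hypothetical stand-in for a raw Solr response: plain dicts, not entities."""
    documents: List[dict]


@dataclass
class Post:
    """Hypothetical entity that a Solr document should be mapped back to."""
    id: int
    title: str


@dataclass
class ItemsEvent:
    """Carries the data set to paginate and the page a subscriber produces."""
    target: Any
    offset: int
    limit: int
    items: Optional[list] = None
    count: int = 0
    handled: bool = False


def array_subscriber(event: ItemsEvent) -> None:
    """Handles plain lists; ignores every other kind of target."""
    if isinstance(event.target, list):
        event.count = len(event.target)
        event.items = event.target[event.offset:event.offset + event.limit]
        event.handled = True


def make_solr_subscriber(mapper: Callable[[dict], Any]) -> Callable[[ItemsEvent], None]:
    """Builds a subscriber that knows about the data mapper (an assumption of this sketch)."""
    def solr_subscriber(event: ItemsEvent) -> None:
        if isinstance(event.target, SolrResult):
            docs = event.target.documents
            event.count = len(docs)
            page = docs[event.offset:event.offset + event.limit]
            # The step the stock paginator could not do: documents -> entities.
            event.items = [mapper(doc) for doc in page]
            event.handled = True
    return solr_subscriber


def paginate(target: Any, page: int, limit: int, subscribers: list) -> ItemsEvent:
    """Offers the target to each subscriber until one of them handles it."""
    event = ItemsEvent(target=target, offset=(page - 1) * limit, limit=limit)
    for subscriber in subscribers:
        subscriber(event)
        if event.handled:
            return event
    raise TypeError(f"No subscriber can paginate {type(target).__name__}")


subscribers = [array_subscriber, make_solr_subscriber(lambda doc: Post(**doc))]
result = SolrResult([{"id": i, "title": f"Post {i}"} for i in range(1, 6)])
page = paginate(result, page=2, limit=2, subscribers=subscribers)
print(page.count, page.items)
```

Without the Solr subscriber in the list, `paginate` would raise for a `SolrResult`, which is roughly the situation I was in: the data set arrived, but nothing knew how to handle it.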

This was my entry point to hook up these two bundles. After a few hours of coding (and coffee, and mingling with colleagues; it wasn’t all that hard) I created a new bundle called KNP Paginator Extra Bundle. For now it only contains the subscriber that solves this specific problem, but feel free to use it, expand it, improve it or learn from it. If you make useful changes, please return the love and send a pull request on GitHub.

Document types for ElasticSearch can be forgotten

This week I’ve been busy with a proof of concept that uses ElasticSearch to store a whole bunch of data. For me this was the first time in the NoSQL/document-based world, after having worked with relational databases all the time. With that said, I’ve spent quite some time in the ElasticSearch documentation and on Google to find ways to achieve my goals with ElasticSearch’s document types.

One of the most confusing things to me was how ElasticSearch is built on document types (which often get compared to tables in the RDBMS world). With a setup where FileBeat watches a set of directories and reads a bunch of files, I wanted to get these document types right.
The first thing I noticed was that FileBeat sends these document types in a field called ‘doc’, while ElasticSearch uses ‘_doc’ to identify a document type. This was somewhat confusing, but it was something I really wanted to figure out because my Symfony bundles are highly reliant on these document types. After some searching I came to the following conclusions; here is the tl;dr:

Given this, document types matter a lot less. Yes, a type is still required to index your documents and set up mappings, and a lot of external software still relies heavily on them. But we need to start preparing for a world where documents are just… documents.

This made me realise that my document type issue wasn’t as big as I thought, and I’m now ignoring the field. The challenge that lies ahead is of course making external software (like the ONGR ElasticSearch Bundle) work with this new way of thinking. But that’ll be something for a new blog post…