Hello there! Have you ever wondered how Elasticsearch knows where to store your documents? And how does it know where to look when you ask it to retrieve, update, or delete them? It all comes down to the concept of routing.
Introduction to Routing
Routing is the process of determining which shard a document belongs to, whether we are storing it or retrieving it. When Elasticsearch indexes a document, it runs a simple calculation to determine which shard to put it on, using the following formula:
shard_num = hash(_routing) % num_primary_shards
By default, “_routing” equals the ID of the document. This means that Elasticsearch hashes the doc’s ID to determine which shard it belongs to.
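To make the formula concrete, here is a minimal sketch in Python. Elasticsearch uses its own internal hash function, so zlib.crc32 below is only a stand-in, and shard_for is a made-up helper name. The shard numbers it prints will not match what a real cluster computes, but the mechanics are the same: hash the routing value, then take the remainder.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Apply shard_num = hash(_routing) % num_primary_shards.

    zlib.crc32 is a stand-in hash, not the one Elasticsearch uses.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# By default the routing value is simply the document ID.
for doc_id in ["1", "2", "order-1001"]:
    print(doc_id, "->", shard_for(doc_id, num_primary_shards=5))
```

The same ID always maps to the same shard as long as the number of primary shards stays the same, which is what lets Elasticsearch find the doc again later.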
So, when we ask Elasticsearch to retrieve a document by its ID, it runs the ID through the same formula to locate the shard where the document is stored. If the doc exists and was indexed with the default routing, it is on the shard that the formula points to.
The same is true when we update or delete a document by its ID. When it comes to searching for documents based on characteristics other than their IDs, though, things work a little differently. But we’ll get into that later.
The Default Routing Strategy
The wonderful thing about routing is that, by default, it is transparent to Elasticsearch users. Elasticsearch makes our lives much easier by offering a default routing strategy, which saves us from having to deal with all of this routing stuff ourselves.
You might be wondering if you can change the default routing strategy. The answer is yes; you can modify it if you like. However, this is a complicated topic that we will discuss later.
Besides making sure that documents are assigned to a shard and that we can get them by ID, the default routing strategy also spreads documents roughly evenly over all shards in an index, because hashing the IDs scatters them across the shard numbers. This helps to ensure that no single shard has significantly more docs than another.
If we ever decide to modify how documents are routed, we’d have to either ensure that they’re still distributed evenly or accept that one shard may wind up with many more docs than another.
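The trade-off between the default strategy and a custom one can be illustrated with a small simulation. This is only a sketch under assumptions: zlib.crc32 stands in for Elasticsearch’s real hash, and the customer names and the 70% share are invented for the example.

```python
import zlib
from collections import Counter


def shard_for(routing: str, num_primary_shards: int) -> int:
    # Stand-in hash; Elasticsearch uses its own hash function internally.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


NUM_SHARDS = 5
NUM_DOCS = 10_000

# Default strategy: route every document by its own ID.
by_id = Counter(shard_for(f"doc-{i}", NUM_SHARDS) for i in range(NUM_DOCS))


def customer_of(i: int) -> str:
    # Hypothetical data set: one big customer owns 70% of the documents.
    return "big-customer" if i % 10 < 7 else f"customer-{i % 50}"


# Custom strategy: route every document by its customer instead.
by_customer = Counter(
    shard_for(customer_of(i), NUM_SHARDS) for i in range(NUM_DOCS)
)

print("routed by ID:      ", sorted(by_id.items()))
print("routed by customer:", sorted(by_customer.items()))
```

Routing by ID should give each shard a similar share, while routing by customer puts all of the big customer’s docs on a single shard. Whether that skew is acceptable depends on what you gain from keeping related docs together.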
Elasticsearch Meta Fields and Custom Routing
Elasticsearch keeps some additional information with the documents it indexes. In addition to the data we give it (like the JSON we use to add a doc), it stores meta fields such as “_id” and “_source”. The “_id” field holds the doc’s unique identifier, while the “_source” field contains the JSON payload that was used to index it.
There is also a meta field named “_routing”. It is used to customize how our documents are routed.
By default, Elasticsearch uses hash-based routing on the document ID to determine which shard a document should be placed in. However, if we provide a custom routing value while indexing our documents, Elasticsearch hashes that value instead to identify the appropriate shard number.
It’s critical to understand that the number of primary shards in an index is fixed and cannot be changed in place after the index is created. This is because Elasticsearch’s routing formula depends on the number of shards in the index. Specifically, the formula it uses to determine the shard number is shard_num = hash(_routing) % num_primary_shards.
The routing formula will give different results if the number of shards in an index is changed. This isn’t a huge deal for newly indexed documents, but it is for old ones. Assume we have an index with two shards (numbered 0 and 1) and we index a document. Say the routing formula places it on shard 1. If the number of shards in the index were later increased to four, the same formula might point to a different shard, and the doc would need to be moved there. That can take some time, my friend. So, when creating an index in Elasticsearch, we must carefully consider the number of shards we want.
Managing Elasticsearch Shards: Considerations and Best Practices
Okay, so let’s pretend for a sec that we could simply add more shards to the index, say going up to five. We could keep adding new documents without any obvious problems. However, when we try to find certain existing documents by their IDs, Elasticsearch would sometimes fail to find them. Basically, the ID is run through the routing formula again, and because one of its inputs has changed, the outcome may differ.
This means that Elasticsearch would look for the doc on the wrong shard and return nothing, despite the fact that the doc is in the index. That can cause major issues, especially when dealing with time-sensitive information. One way around it would be a routing mechanism that copes with shard number changes more gracefully.
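The failure mode described above can be reproduced with a toy model. This is a sketch, not how Elasticsearch stores data: the shards are plain Python dicts, zlib.crc32 stands in for the real hash, and the doc IDs are invented. Documents are indexed while the formula uses two shards, then looked up as if the index had five.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    # Stand-in hash; Elasticsearch uses its own hash function internally.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# Five shard "buckets", but the docs are indexed while only two are in use.
shards = [dict() for _ in range(5)]
doc_ids = [f"doc-{i}" for i in range(1_000)]
for doc_id in doc_ids:
    shards[shard_for(doc_id, 2)][doc_id] = {"id": doc_id}


def get(doc_id: str, num_primary_shards: int):
    """Look only at the shard the formula points to, like a get by ID."""
    return shards[shard_for(doc_id, num_primary_shards)].get(doc_id)


found_with_2 = sum(get(d, 2) is not None for d in doc_ids)
found_with_5 = sum(get(d, 5) is not None for d in doc_ids)
print("found with 2 shards:", found_with_2)
print("found with 5 shards:", found_with_5)
```

With the original shard count every doc is found. With five, only the docs whose hash happens to give the same remainder for both counts are still found, and the rest look as if they were missing even though they are still sitting in the index.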
It’s also worth noting that other factors, such as how documents are spread across the shards, can play a role here. As a result, it’s important to evaluate the indexing strategy carefully and make sure it suits your data and your queries.
When considering a different number of shards for an index, keep this crucial principle in mind: the documents in the index should be distributed evenly, or you may experience performance issues. So, if you can balance things out over all the shards, it will be much better for search times and other things.
However, if you want to change the number of shards, you must create a new index and get all of the documents into it. That sounds awful, but there are certain APIs that can help. If you’re interested, check out the Shrink and Split APIs. They create a new index with a different number of shards from an existing one, which can often spare you the hassle of re-indexing every doc yourself.
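The Split and Shrink APIs do not accept just any target shard count. As far as I know, a split target has to be a multiple of the source count (and is further limited by the index.number_of_routing_shards setting, which this sketch ignores), while a shrink target has to be a factor of the source count. The source index also has to be write-blocked first. The helper names and the index names “orders” and “orders-v2” are made up, and the code only builds the request rather than sending it.

```python
def can_split(source_shards: int, target_shards: int) -> bool:
    """Split: the target must be a larger multiple of the source count."""
    return target_shards > source_shards and target_shards % source_shards == 0


def can_shrink(source_shards: int, target_shards: int) -> bool:
    """Shrink: the target must be a smaller factor of the source count."""
    return 0 < target_shards < source_shards and source_shards % target_shards == 0


def resize_request(kind: str, source_index: str, target_index: str, target_shards: int):
    """Build a Split or Shrink request; kind is "_split" or "_shrink"."""
    path = f"/{source_index}/{kind}/{target_index}"
    body = {"settings": {"index.number_of_shards": target_shards}}
    return "POST", path, body


print(can_split(2, 4), can_split(2, 5))    # True False
print(can_shrink(8, 2), can_shrink(8, 3))  # True False
print(resize_request("_split", "orders", "orders-v2", 4))
```

So going from two shards to four is possible with a split, but going from two to five is not; for that you would fall back to creating a new index and re-indexing into it.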
Overall, you should take care when creating an index and consider the number of shards required, as well as any potential future changes. With proper preparation and the right tools, managing shards can be a breeze and can significantly improve your search speed.
FROM:
Mahmoud Yasser
https://braineanear.medium.com/elasticsearch-architecture-vi-routing-f88e7fb390d7