To answer this question, we first need to look at why capsule networks outperform convolutional neural networks by as much as 45% in recognizing images that have been rotated, translated, or are under a different pose. Geoff Hinton's paper on capsule networks can be found here: https://arxiv.org/pdf/1710.09829v1.pdf

In a CNN architecture, a convolution layer is usually followed by a max pooling layer, so that the lower layers can detect low-level features like edges while the higher layers detect abstractions like eyes. However, max pooling in CNNs loses a lot of important information about the location of features and the spatial relationships between them. This is where capsule networks excel: their representations preserve pose information (they are equivariant rather than merely invariant), which is why capsule networks can recognize images under different lighting conditions and deformations.

The current approach in NLP maps words and phrases to vectors. From there we exploit the geometry of those vectors and the distances between them (cosine, Euclidean, etc.) to perform operations such as finding the similarity between words and even documents, machine translation, and natural language understanding (NLU). Capsule networks are unlikely to succeed in NLP, because algorithms that aim to find the hierarchical structure of natural languages, and approaches that focus on grammar, have met with little success. Research by Stanford University on inducing the hierarchical structure of natural languages can be found here: https://nlp.stanford.edu/projects/project-induction.shtml

Although conclusive research on other applications of capsule networks has not yet been conducted, they are likely to excel at applications such as video intelligence and object tracking, but not necessarily NLP.
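The information loss from max pooling described above can be sketched in a few lines of NumPy. This is a toy illustration (hand-made 4x4 feature maps, not outputs of a real CNN): two maps whose activations sit in different positions pool to the exact same result, so the downstream layers cannot tell the two spatial arrangements apart.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 over a 2D feature map."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Two feature maps: the strong activation sits at different
# positions inside each 2x2 pooling window.
a = np.array([[9, 0, 0, 0],
              [0, 0, 0, 0],
              [0, 0, 0, 9],
              [0, 0, 0, 0]], dtype=float)
b = np.array([[0, 0, 0, 0],
              [0, 9, 0, 0],
              [0, 0, 0, 0],
              [0, 0, 9, 0]], dtype=float)

# Both pool to the same 2x2 map: the exact location within each
# window is discarded, losing the spatial relationship.
print(max_pool_2x2(a))                                   # [[9. 0.] [0. 9.]]
print(np.array_equal(max_pool_2x2(a), max_pool_2x2(b)))  # True
```

A capsule, by contrast, would output a vector whose orientation changes with the feature's pose, so the two inputs above would produce distinguishable outputs.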
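The vector-based NLP approach mentioned above can be sketched with cosine similarity. The three embeddings below are made-up illustrative values, not vectors from a trained model such as word2vec or GloVe:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 3-d "word embeddings" (hypothetical values for illustration).
king  = np.array([0.80, 0.65, 0.10])
queen = np.array([0.75, 0.70, 0.15])
apple = np.array([0.10, 0.20, 0.90])

print(cosine_similarity(king, queen))  # close to 1: related words
print(cosine_similarity(king, apple))  # much lower: unrelated words
```

Real systems use the same measure on vectors of hundreds of dimensions, averaged or composed over phrases and documents.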