An example of a 425-bit universal Turing machine in pixels [1]

Information, according to Shannon, is an ensemble notion. The entropy of a random variable X with outcomes in an ensemble S is the quantity H(X) = log d(S), where d(S) is the number of messages in S. This is a measure of the uncertainty of choice before we have selected a particular value for X, and of the information produced from the set when we assign a specific value to X. By choosing a particular message a from S, we remove the entropy from X by the assignment X = a and produce or transmit the information I = log d(S) by our selection of a. …
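The uniform-choice case above can be checked in a couple of lines (a minimal sketch; the function name is my own, not standard terminology):

```python
import math

# Hartley/Shannon measure: entropy of a uniform choice among d(S) messages,
# i.e. H(X) = log2 d(S) when every message is equally likely.
def uniform_entropy_bits(num_messages: int) -> float:
    return math.log2(num_messages)

# Selecting one of 8 equally likely messages removes (and transmits) 3 bits.
print(uniform_entropy_bits(8))  # 3.0
```

Doubling the size of the message set always adds exactly one bit of uncertainty, which is why the logarithm is the natural choice here.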

Pi is the perfect example of a simple object that Shannon’s information theory fails to model.

In this article, we have used the concepts of “complexity”, “entropy”, and “information” interchangeably; in the next sections we make the distinctions clearer. This article mostly comes from this book

The notion of Kolmogorov complexity has its roots in probability theory, information theory, and philosophical notions of randomness, and came to fruition using the recent development of the theory of algorithms. The idea is intimately related to problems in both probability theory and information theory. …


  • RoBERTa is a retraining of BERT with an improved training methodology, roughly 10× more data, and more compute.
  • It removes the NSP (next sentence prediction) loss and introduces dynamic masking, so that the masked tokens change across training epochs. Larger training batch sizes were also found to be useful in the training procedure. It outperforms BERT and XLNet.
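Dynamic masking can be sketched as re-sampling which positions are masked on every epoch, instead of fixing them once at preprocessing time as original BERT did. This is a toy illustration, not RoBERTa's actual masking code:

```python
import random

# Re-sample the masked positions on every call (i.e. every epoch),
# so the model sees different masks for the same sentence over training.
def dynamic_mask(tokens, mask_prob=0.15, mask_token="[MASK]", rng=None):
    rng = rng or random.Random()
    return [mask_token if rng.random() < mask_prob else tok for tok in tokens]

tokens = "the quick brown fox jumps over the lazy dog".split()
epoch1 = dynamic_mask(tokens, rng=random.Random(1))
epoch2 = dynamic_mask(tokens, rng=random.Random(2))
# The same sentence gets different masked positions in different epochs.
```

With static masking, each sentence is masked once during preprocessing, so every epoch trains on the identical mask; dynamic masking effectively multiplies the training signal per sentence.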


  • DistilBERT learns a distilled (approximate) version of BERT, retaining 97% of its performance with roughly 40% fewer parameters. Specifically, it drops the token-type embeddings and the pooler, and retains only half of the layers from Google’s BERT.
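The core of the distillation objective can be sketched as a cross-entropy between the student's and the teacher's temperature-softened output distributions. This is a toy sketch; DistilBERT's actual training also combines masked-LM and cosine-embedding losses:

```python
import math

def softmax(logits, temperature=1.0):
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Distillation loss: cross-entropy of the student's softened predictions
# against the teacher's temperature-softened ("soft target") distribution.
def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))

teacher_logits = [1.0, 2.0, 3.0]
matched = distillation_loss(teacher_logits, teacher_logits)
mismatched = distillation_loss([3.0, 2.0, 1.0], teacher_logits)
# The loss is minimized exactly when the student reproduces the teacher.
```

The temperature softens the teacher's distribution so the student learns from the relative probabilities of wrong answers ("dark knowledge"), not just the argmax.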


Three main differences from BERT:

  • It has a different embedding size…

Controlling the generation capability of large language models is an important task for real-world usage.




Large-scale transformer-based language models (LMs) demonstrate impressive capabilities in open text generation. However, controlling the generated text’s properties such as the topic, style, and sentiment is challenging and often requires significant changes to the model architecture or retraining and fine-tuning the model on new supervised data.

We introduce Topical Language Generation (TLG) by combining a pre-trained LM with topic modeling information. We cast the problem using Bayesian probability formulation with topic probabilities as a prior, LM probabilities as the likelihood, and topical language generation probability as the posterior. In learning the model, we derive the topic probability…
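The Bayesian combination described above can be sketched in a few lines. The vocabulary and the probability values here are toy assumptions for illustration, not the paper's actual LM or topic model:

```python
# Toy vocabulary and toy distributions (illustration only).
vocab = ["goal", "match", "vote", "election", "the"]

lm_probs    = [0.10, 0.15, 0.10, 0.05, 0.60]  # likelihood: P(w | context) from the LM
topic_prior = [0.40, 0.35, 0.10, 0.10, 0.05]  # prior: P(w | "sports" topic) from a topic model

# Posterior ∝ prior × likelihood, renormalized over the vocabulary.
scores = [p * q for p, q in zip(lm_probs, topic_prior)]
total = sum(scores)
posterior = [s / total for s in scores]

# The plain LM would pick "the"; the topical posterior prefers "match".
print(vocab[max(range(len(vocab)), key=lambda i: posterior[i])])  # match
```

The point of the posterior view is that no retraining is needed: the pre-trained LM supplies the likelihood unchanged, and topic control enters purely through the prior at decoding time.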


Defeasible arguments are ones that can be acceptable at the moment even though in the future they may be open to defeat. New evidence may come in later that defeats the argument.

The canonical example of a defeasible argument, used so often in AI, is the Tweety argument:


Birds fly.

Tweety is a bird.

Therefore Tweety flies.

The Tweety argument may be rationally acceptable assuming that we have no information about Tweety except that he is a bird. But suppose new information comes in telling us that Tweety is a penguin…
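This defeat-by-new-evidence pattern can be sketched in a few lines. The encoding below is my own toy illustration, not a standard defeasible-logic implementation:

```python
# Default rule: birds fly, unless a known exception (e.g. being a penguin)
# defeats it. Adding facts can retract a previously acceptable conclusion.
def flies(facts: set) -> bool:
    return "bird" in facts and "penguin" not in facts

print(flies({"bird"}))             # True: acceptable given what we know so far
print(flies({"bird", "penguin"}))  # False: new evidence defeats the argument
```

Note the contrast with deduction: in classical logic, adding premises can never invalidate a conclusion, whereas here the conclusion is non-monotonic in the facts.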

“Reasoning” is one of the most important elements in achieving real AI. The main differences between the three kinds of reasoning are as follows:

1- In deduction, it is impossible for the premises to be true and the conclusion false. The relationship between the premises and the conclusion is one of “necessity”. For example: all humans are mortal, and all men are human; therefore all men are mortal.

2- In induction, it is improbable that the premises are true and the conclusion false. The relationship between the premises and the conclusion is one of “probability”. …

The problem of common sense and background knowledge is probably one of the most challenging problems in AI. Most of our reasoning about the world is based on unspoken knowledge in partially observable environments. We draw conclusions based on our shared understanding of the world and reason to the best explanation. Here we introduce some basic datasets and show how other datasets are created from them.

At the core of most of these problems lies the “story comprehension” problem: how do we understand stories, and how can we answer questions about them by reading between the lines and filling in the gaps…

You may have been confused by some functions in PyTorch that seem to do the same thing but have different names. For example:

reshape(), view(), permute(), transpose()

Do they really do different things? No! But to understand why, we first need to know a little about how tensors are implemented in PyTorch.

Tensors are abstract, logical constructs, just like arrays, that can’t be implemented exactly the way they are conceived. The obvious reason is that memory cells are sequential, so we need to find a way to lay tensors out in memory. …
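The metadata view of tensors can be seen directly in a few lines (a minimal sketch; the exact stride values assume a freshly created, contiguous tensor):

```python
import torch

# A 2x3 tensor is stored as a single flat, row-major buffer in memory.
t = torch.arange(6).reshape(2, 3)

# view() and transpose() do not copy the buffer: they only change the
# (shape, stride) metadata that maps logical indices onto the flat storage.
v = t.view(3, 2)
tr = t.transpose(0, 1)

print(t.stride())   # (3, 1): step 3 cells per row, 1 per column
print(tr.stride())  # (1, 3): the strides are simply swapped

# All three share the same storage, so a write through one is visible in the others.
v[0, 0] = 100
print(t[0, 0].item())  # 100
```

The main caveat is contiguity: after a `transpose()` or `permute()`, the logical order no longer matches the memory order, so `view()` will fail and you need `reshape()` (or `.contiguous()`), which may copy.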

Sometimes things happen in our surroundings that we easily ignore, but for a real scientist they are seeds for further investigation. A famous example, as we all know, is Newton’s apple. Here I want to trigger your curiosity with some simple examples: have you ever noticed that a big group of people can clap in sync without any prior training? How is that even possible?

Sometimes these observations take extreme forms that make not only scientists but also ordinary people question them. We all know fireflies flash at night, but in some places on earth, they gather…

The attention mechanism is probably the most important idea in modern NLP. Even though it is almost omnipresent in new architectures, it is still poorly understood. This is not because attention is a complex concept, but because we, as NLP researchers, really don’t know where it comes from and why it is so effective. In this article, I try to find a meaningful historical background for attention and trace how it evolved into the modern form we all know today. I believe the best way to understand a concept is through the main motivations behind how it was made, and that is only possible…

rohola zandie
