Large-scale transformer-based language models (LMs) demonstrate impressive capabilities in open text generation. However, controlling the generated text’s properties such as the topic, style, and sentiment is challenging and often requires significant changes to the model architecture or retraining and fine-tuning the model on new supervised data.
We introduce Topical Language Generation (TLG) by combining a pre-trained LM with topic modeling information. We cast the problem using Bayesian probability formulation with topic probabilities as a prior, LM probabilities as the likelihood, and topical language generation probability as the posterior. In learning the model, we derive the topic probability…
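The Bayesian formulation can be sketched numerically. This is a toy example with a made-up vocabulary and probabilities, not the actual TLG model (which uses a pre-trained LM for the likelihood and a topic model for the prior); every word and number below is an illustrative assumption:

```python
import numpy as np

# Toy vocabulary -- illustrative only
vocab = ["the", "goal", "ball", "economy", "market"]

# Likelihood: next-token distribution from the language model, p(w | context)
lm_probs = np.array([0.40, 0.15, 0.12, 0.18, 0.15])

# Prior: association of each word with the chosen topic
# (a made-up "sports" topic here)
topic_probs = np.array([0.05, 0.30, 0.45, 0.10, 0.10])

# Posterior ∝ likelihood × prior, renormalized over the vocabulary
posterior = lm_probs * topic_probs
posterior /= posterior.sum()

for w, p in zip(vocab, posterior):
    print(f"{w}: {p:.3f}")
```

Multiplying the two distributions and renormalizing shifts probability mass toward topic-relevant words ("ball", "goal") and away from topic-neutral ones ("the"), which is the core effect topical generation relies on.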
Keywords: DEFEASIBLE, PLAUSIBLE, AND PRESUMPTIVE REASONING
Defeasible arguments are arguments that are acceptable for the moment even though they may later be open to defeat: new evidence may come in that defeats the argument.
The canonical example of a defeasible argument, used so often in AI, is the Tweety argument:
THE TWEETY ARGUMENT
Tweety is a bird.
Therefore Tweety flies.
The Tweety argument may be rationally acceptable assuming that we have no information about Tweety except that he is a bird. But suppose new information comes in telling us that Tweety is a penguin…
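A toy sketch of how a default conclusion gets retracted when new evidence arrives (the function and fact names are illustrative; this is not a real defeasible-logic system):

```python
def can_fly(facts):
    """Default rule: birds fly -- unless more specific knowledge defeats it."""
    if "penguin" in facts:   # defeater: the more specific fact wins
        return False
    if "bird" in facts:      # default: birds normally fly
        return True
    return None              # no relevant facts, no conclusion

# With only "bird" known, the default conclusion is acceptable.
print(can_fly({"bird"}))             # True
# New information defeats the earlier conclusion.
print(can_fly({"bird", "penguin"}))  # False
```

The key property: the same rule base yields a different conclusion once the fact set grows, which is exactly what distinguishes defeasible from deductive reasoning.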
“Reasoning” is one of the most important elements in achieving real AI. The three kinds of reasoning differ mainly as follows:
1- In deduction, it is impossible for the premises to be true and the conclusion false. The relationship between the premises and the conclusion is one of “necessity”. For example: all humans are mortal; Socrates is a human; therefore Socrates is mortal.
2- In induction, it is improbable that the premises are true and the conclusion false. The relationship between the premises and the conclusion is one of “probability”. …
The problem of common sense and background knowledge is probably one of the most challenging problems in AI. Most of our reasoning about the world is based on unspoken knowledge in partially observable environments. We draw conclusions based on our shared understanding of the world and reason to the best explanations. Here we introduce some basic datasets and show how other datasets are created from them.
At the core of most of these problems lies the “story comprehension” problem: how we understand stories, and how we can answer questions about them by reading between the lines and filling in the gaps…
You may have been confused by some functions in PyTorch that seem to do the same thing but have different names. For example:
reshape(), view(), permute(), transpose()
Do they really do different things? Not quite! But to understand why, we first need to know a little about how tensors are implemented in PyTorch.
Tensors are abstract, logical constructs, just like arrays, that cannot be implemented exactly the way they are conceived. The obvious reason is that memory cells are sequential, so we need a way to lay them out in memory. …
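A minimal sketch of the stride machinery behind these functions (assuming PyTorch is installed):

```python
import torch

x = torch.arange(6)              # one contiguous block of 6 memory cells

# view() and reshape() reinterpret the SAME memory with a new shape
a = x.view(2, 3)
print(a.stride())                # (3, 1): step 3 cells per row, 1 per column

# transpose() and permute() only swap strides -- no data is moved
t = a.transpose(0, 1)            # shape (3, 2)
print(t.stride())                # (1, 3)
print(t.is_contiguous())         # False: memory order no longer matches shape

# That is why view() fails on t (it never copies), while reshape()
# silently falls back to making a contiguous copy:
try:
    t.view(6)
except RuntimeError:
    print("view() refuses non-contiguous memory")
print(t.reshape(6))              # tensor([0, 3, 1, 4, 2, 5])
```

So the four functions differ not in what the resulting tensor looks like logically, but in whether they touch the underlying memory: `view`/`transpose`/`permute` never copy, while `reshape` copies only when it has to.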
Sometimes things happen in our surroundings that we easily ignore, but for a real scientist they are seeds for further investigation. A famous example, as we all know, is Newton’s apple. Here I want to trigger your curiosity with some simple examples: have you ever noticed that a big group of people can clap in sync without any prior training? How is that even possible?
Sometimes these observations take such extreme forms that they make not only scientists but also ordinary people wonder. We all know fireflies flash at night, but in some places on Earth they gather…
The attention mechanism is probably the most important idea in modern NLP. Even though it is almost omnipresent in all new architectures, it is still poorly understood. That is not because attention is a complex concept, but because we, as NLP researchers, really don’t know where it comes from and why it is so effective. In this article, I try to trace a meaningful historical background for attention and how it evolved into the modern form we all know today. I believe the best way to understand a concept is through the main motivations behind how it was made, and that is only possible…
Understanding emotions and using them in a chatbot setting is a daunting task.
We know that human conversations involve understanding emotions and responding to them accordingly. Because of this, many conventional chatbots fail to sustain a coherent and meaningful conversation with users. Just imagine you are happy and want to share with a friend that you won a math contest, but instead of congratulating you, they talk about math contests in general! …
As we saw in the previous part, the dice or coins that represent a probability distribution can themselves come from another distribution, because they, just like anything else in this world, are imperfect and subject to random variance.
The Beta distribution can beautifully model the uncertainty in the process of making dice, and extending this idea from two faces (heads and tails) to more seems straightforward. The Dirichlet distribution is exactly that extension. Following the same line of reasoning as in the previous part, we can write down the Dirichlet distribution as follows:
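To make the Dirichlet density concrete, here is a small sketch that evaluates it directly from its standard formula (the function name is my own; with two faces it reduces to the Beta distribution from the previous part):

```python
import numpy as np
from math import gamma

def dirichlet_pdf(x, alpha):
    """Dirichlet density at a point x on the probability simplex:
    f(x; alpha) = (1 / B(alpha)) * prod_i x_i**(alpha_i - 1),
    with normalizer B(alpha) = prod_i Gamma(alpha_i) / Gamma(sum_i alpha_i).
    """
    x, alpha = np.asarray(x, float), np.asarray(alpha, float)
    b = np.prod([gamma(a) for a in alpha]) / gamma(alpha.sum())
    return float(np.prod(x ** (alpha - 1)) / b)

# With two faces, Dirichlet([a, b]) at (p, 1-p) is just Beta(a, b) at p:
# Beta(2, 5) at p = 0.3 is 30 * 0.3 * 0.7**4 = 2.1609
print(dirichlet_pdf([0.3, 0.7], [2, 5]))          # 2.1609

# A flat Dirichlet([1, 1, 1]) is uniform over the 3-simplex: density 2
print(dirichlet_pdf([0.2, 0.3, 0.5], [1, 1, 1]))  # 2.0
```

Note how the normalizer B(alpha) plays the same role as the Beta function in the two-face case; the only change is that the product now runs over all faces of the die.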