Friday, March 3, 2023

ChatGPT and Intelligent Document Processing!!


Question: How can ChatGPT be helpful in IDP?
Answer: As an AI language model, ChatGPT can be helpful in intelligent document processing (IDP) in several ways:
  • Text summarization: ChatGPT can summarize long documents into shorter, more manageable summaries (a minimal code sketch follows this list). This can save analysts time and effort in identifying the key information in a document.
  • Entity recognition: ChatGPT can identify and extract entities such as people, organizations, and locations mentioned in a document. This can help analysts quickly identify important actors or entities in a document.
  • Sentiment analysis: ChatGPT can analyze the sentiment of a document to identify whether the tone is positive, negative, or neutral. This can help analysts quickly identify the overall tone and sentiment of a document.
  • Topic modeling: ChatGPT can identify the main topics discussed in a document or set of documents. This can help analysts quickly identify key themes or topics in a large corpus of documents.
  • Question answering: ChatGPT can answer specific questions related to a document or set of documents. This can help analysts quickly find specific information or answers to their queries without having to read through the entire document.
Overall, ChatGPT's natural language processing capabilities can help analysts process and analyze large amounts of information more efficiently and effectively.
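As a concrete illustration of the summarization use case above, here is a minimal sketch using the openai Python package and its early-2023 ChatCompletion interface. The document text, prompt wording, and model name are placeholders chosen for this example, not part of the original post.

import openai

openai.api_key = 'YOUR_API_KEY'  # placeholder credential

# Hypothetical document text, used only for illustration.
document = ("Acme Corp announced a new data centre in Frankfurt. "
            "The project was confirmed by CEO Jane Doe during a press briefing.")

response = openai.ChatCompletion.create(
    model='gpt-3.5-turbo',   # assumed model name
    temperature=0,           # deterministic output suits document-processing tasks
    messages=[
        {'role': 'system', 'content': 'You are a document-processing assistant.'},
        {'role': 'user', 'content': 'Summarize the following document in two sentences:\n\n' + document},
    ],
)
print(response['choices'][0]['message']['content'])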

Question: How does ChatGPT help with entity recognition in documents?
Answer:
ChatGPT can help with entity recognition in documents by using natural language processing techniques such as Named Entity Recognition (NER). NER is a subtask of information extraction that involves identifying named entities in text and classifying them into predefined categories such as people, organizations, locations, and others.

ChatGPT can use pre-trained models that have been trained on large corpora of text data to recognize entities in documents. The model processes the input text and identifies words or phrases that match a given entity type.
For example, if a document discusses a company such as Google, ChatGPT's NER model can identify the word "Google" as an organization entity. Similarly, if a document mentions a city such as New York, ChatGPT's NER model can identify "New York" as a location entity.
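To make the Google/New York example concrete, the sketch below asks a ChatGPT model to return the entities as structured JSON via the same ChatCompletion interface used earlier; the prompt, model name, and example sentence are assumptions for illustration, and the call assumes the model replies with JSON only.

import json
import openai

openai.api_key = 'YOUR_API_KEY'  # placeholder credential
text = 'Google opened a new office in New York last year.'

response = openai.ChatCompletion.create(
    model='gpt-3.5-turbo',  # assumed model name
    temperature=0,
    messages=[{
        'role': 'user',
        'content': ('Extract all organizations and locations from the text below. '
                    'Reply only with JSON of the form '
                    '{"organizations": [...], "locations": [...]}.\n\n' + text),
    }],
)
entities = json.loads(response['choices'][0]['message']['content'])
print(entities)  # e.g. {"organizations": ["Google"], "locations": ["New York"]}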

ChatGPT can also be fine-tuned on specific domains or topics to improve its entity recognition performance. For instance, if you are processing documents related to a particular topic such as cybersecurity or terrorism, you can fine-tune the model on that topic to improve its entity recognition accuracy.

Overall, ChatGPT's entity recognition capabilities can help analysts quickly identify and extract important entities from large volumes of documents, enabling them to gain insights and make better-informed decisions.

Question: How can ChatGPT be fine-tuned on specific domains or topics to improve its entity recognition performance?

Answer:
ChatGPT can be fine-tuned on specific domains or topics to improve its entity recognition performance. Fine-tuning involves training the model on a specific dataset related to the target domain or topic to improve its accuracy in recognizing entities related to that domain.
Here are the steps involved in fine-tuning ChatGPT for entity recognition:

  1. Collect a dataset: The first step in fine-tuning ChatGPT for entity recognition is to collect a dataset related to the target domain or topic. The dataset should include annotated examples of entities in text along with their corresponding entity types.
  2. Preprocess the dataset: The dataset should be preprocessed to ensure that it is in the correct format for training the model. This may involve converting the data to a format that can be ingested by the model, tokenizing the text, and splitting the data into training and validation sets.
  3. Fine-tune the model: The next step is to fine-tune the ChatGPT model on the target domain or topic dataset. During training, the model adjusts its weights and learns to recognize entities related to the target domain more accurately.
  4. Evaluate the model: After fine-tuning, the model should be evaluated on a separate test dataset to measure its performance. The evaluation metrics may include precision, recall, and F1-score, which measure the accuracy of the model in recognizing entities.
  5. Deploy the model: Once the model has been fine-tuned and evaluated, it can be deployed to recognize entities in new documents related to the target domain.

Overall, fine-tuning on specific domains or topics can improve entity recognition performance and enable the model to recognize entities related to the target domain more accurately, making it a more effective tool for intelligent document processing. A small illustrative sketch of this workflow is shown below.
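ChatGPT itself can only be fine-tuned through OpenAI's own services, so as an illustrative stand-in the sketch below walks through the same five steps (collect, preprocess, fine-tune, evaluate, deploy) with an open BERT-style encoder from the Hugging Face transformers library. The label set and the single annotated example are toy assumptions.

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Step 1: a (toy) annotated dataset: word-level entity tags.
labels = ['O', 'B-ORG', 'B-LOC']
texts  = [['Google', 'opened', 'an', 'office', 'in', 'Berlin']]
tags   = [[1, 0, 0, 0, 0, 2]]   # Google -> B-ORG, Berlin -> B-LOC

tok = AutoTokenizer.from_pretrained('bert-base-cased')
model = AutoModelForTokenClassification.from_pretrained('bert-base-cased',
                                                        num_labels=len(labels))

# Step 2: preprocess: tokenize and align word-level tags with sub-word tokens.
enc = tok(texts, is_split_into_words=True, padding=True, return_tensors='pt')
aligned = []
for i, word_tags in enumerate(tags):
    word_ids = enc.word_ids(batch_index=i)
    aligned.append([word_tags[w] if w is not None else -100 for w in word_ids])
enc['labels'] = torch.tensor(aligned)

# Step 3: fine-tune: a few gradient steps on the toy batch.
optim = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for _ in range(3):
    out = model(**enc)       # loss is computed internally from the labels
    out.loss.backward()
    optim.step()
    optim.zero_grad()

# Step 4: evaluation would use precision/recall/F1 on a held-out test set;
# Step 5: the fine-tuned model can then be saved and deployed.
print(out.loss.item())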

Courtesy: ChatGPT



Tuesday, February 28, 2023

BERT and the Word Embedding Approach

Machines understand language through language representations. These representations take the form of vectors of real numbers, and a proper representation is necessary for the machine to understand language well. Language representations come in two types: (i) context-free representations such as GloVe and Word2vec, where the embedding for each token in the vocabulary is constant and does not depend on the context in which the word appears, and (ii) contextual representations such as ELMo and OpenAI GPT, where a token's representation depends on the context in which it is used.
The problem with context-free representations is that the same word with completely different meanings gets the same representation; for example, 'bat' is represented identically whether it refers to a cricket bat or the animal. With a contextual model, we get a representation of each token based on the sentence in which that token is used, resulting in a better understanding of the language by the machine. The BERT model generates such a contextual representation for each token. It can even capture the context of whole sentences, sentence pairs, or paragraphs.
BERT is built on the idea of pre-training the model on a very large dataset in an unsupervised manner for language modeling. A model pre-trained on a very large dataset can better understand the context of an input sentence. After pre-training, the model can be fine-tuned on a task-specific supervised dataset to achieve good results. Two strategies can be applied, namely feature-based and fine-tuning. ELMo uses the feature-based approach, where model architectures are task-specific: it uses different models for different tasks and a pre-trained model only for the language representations. BERT uses the fine-tuning approach, so the final model for any task is almost the same as BERT itself. It uses deep bidirectional layers of transformer encoders for language understanding, hence the name Bidirectional Encoder Representations from Transformers.



Model Overview

BERT's model architecture is based on the Transformer. It uses multi-layer bidirectional transformer encoders for language representations. Based on the depth of the architecture, two BERT models were introduced, namely BERT-Base and BERT-Large. BERT-Base uses 12 layers of transformer blocks with a hidden size of 768 and 12 self-attention heads, and has around 110M trainable parameters. BERT-Large uses 24 layers of transformer blocks with a hidden size of 1024 and 16 self-attention heads, and has around 340M trainable parameters. BERT uses the same model architecture for all tasks, be it NLI, classification, or question answering, with minimal changes such as adding an output layer for classification.
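As a quick sanity check, the numbers above can be read straight from the pre-trained configurations in the Hugging Face transformers library. This is a small illustrative snippet, not part of the original article.

from transformers import BertConfig

base = BertConfig.from_pretrained('bert-base-uncased')
large = BertConfig.from_pretrained('bert-large-uncased')

# layers, hidden size, self-attention heads
print(base.num_hidden_layers, base.hidden_size, base.num_attention_heads)     # 12 768 12
print(large.num_hidden_layers, large.hidden_size, large.num_attention_heads)  # 24 1024 16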



The whole input to BERT has to be given as a single sequence. BERT uses the special tokens [CLS] and [SEP] to understand the input properly. A [SEP] token is inserted at the end of a single input; when a task requires more than one input, such as NLI or question-answering tasks, the [SEP] token helps the model find the end of one input and the start of the next within the same sequence. [CLS] is a special classification token, and the last hidden state of BERT corresponding to this token (h[CLS]) is used for classification tasks. BERT uses WordPiece embeddings for the input tokens. Along with token embeddings, BERT uses positional embeddings and segment embeddings for each token. Positional embeddings carry information about the position of a token in the sequence. Segment embeddings help when the model input contains sentence pairs: tokens of the first sentence get a segment embedding of 0, whereas tokens of the second sentence get a segment embedding of 1.
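The special tokens and segment ids are easiest to see by tokenizing a sentence pair. The snippet below is a small illustrative sketch using the Hugging Face tokenizer introduced later in this post; the example sentences are made up.

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
encoded = tokenizer('The movie was great.', 'I would watch it again.')

print(tokenizer.convert_ids_to_tokens(encoded['input_ids']))
# ['[CLS]', 'the', 'movie', 'was', 'great', '.', '[SEP]', 'i', 'would', 'watch', 'it', 'again', '.', '[SEP]']
print(encoded['token_type_ids'])  # 0 for sentence A tokens, 1 for sentence B tokens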




The final embedding fed into the model is the sum of the token embedding, the positional embedding, and the segment embedding. These final embeddings are then passed through the deep bidirectional layers to produce the output. The output of BERT is a hidden-state vector of the pre-defined hidden size for each token in the input sequence, and these hidden states from the last layer are then used for various NLP tasks.
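The three embedding tables being summed, and the per-token hidden states that come out, can be inspected directly on the pre-trained model. This is an illustrative sketch for bert-base-uncased; the example sentence is arbitrary.

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

# The three lookup tables whose outputs are summed into the final embedding.
print(model.embeddings.word_embeddings)        # Embedding(30522, 768)  token embeddings
print(model.embeddings.position_embeddings)    # Embedding(512, 768)    positional embeddings
print(model.embeddings.token_type_embeddings)  # Embedding(2, 768)      segment embeddings

inputs = tokenizer('BERT produces contextual vectors.', return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)         # [1, seq_len, 768]: one 768-d vector per token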

Pre-training and Fine-tuning

BERT was pre-trained on the unlabeled English Wikipedia and BooksCorpus datasets using language modeling. Two tasks were performed, namely Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). For MLM, 15% of the tokens in the sequence are masked and the correct tokens are predicted at the final hidden states. NSP is used to capture the relationship between the sentence pairs given as input: 50% of the data is labeled isNext, where sentence B of the input sequence is the actual next sentence after sentence A in the corpus, and the other 50% is labeled notNext, where sentence B is a random sentence from the corpus. The output hidden state corresponding to the [CLS] token is used to predict the correct label and compute the loss. After pre-training, BERT can be fine-tuned on a specific task-based dataset.
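The MLM objective can be seen directly in the pre-trained weights through the fill-mask pipeline. This is a small illustrative example; the sentence is made up.

from transformers import pipeline

fill_mask = pipeline('fill-mask', model='bert-base-uncased')
for prediction in fill_mask('The capital of France is [MASK].'):
    print(prediction['token_str'], round(prediction['score'], 3))
# the top prediction should be 'paris' with a high score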





How to use BERT

Pre-trained weights are available for BERT, so for any task we can fine-tune the model on our own dataset rather than training from scratch. The pre-trained weights are available in the Hugging Face transformers library and can be loaded with the following code.

from transformers import BertModel
bert = BertModel.from_pretrained('bert-base-uncased')

Here, bert contains the pre-trained model weights for BERT-Base. We also need to use the same tokenizer and token-to-index mapping with which the model was pre-trained. We can get the tokenizer using the code given below.

from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
tokens = tokenizer.tokenize("What's going on?")  

Output: ['what', "'", 's', 'going', 'on', '?']

Let's try to fine-tune the pre-trained BERT model for a sentiment classification task. The model can be designed by simply adding a linear layer on top of the output hidden state of the [CLS] token.

import torch
import torch.nn as nn

class BERTSentiment(nn.Module):
    def __init__(self, bert, output_dim):
        super().__init__()
        self.bert = bert
        embedding_dim = bert.config.to_dict()['hidden_size']
        self.out = nn.Linear(embedding_dim, output_dim)

    def forward(self, text):
        # text = [batch size, sent len]
        embedded = self.bert(text)[1]   # pooled output for the [CLS] token
        # embedded = [batch size, emb dim]
        output = self.out(embedded)
        # output = [batch size, out dim]
        return output

OUTPUT_DIM = 2
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = BERTSentiment(bert, OUTPUT_DIM).to(device)

We can then train the model by defining the loss function and optimizer.

from transformers import AdamW  # transformers' AdamW supports the correct_bias flag

optimizer = AdamW(model.parameters(), lr=2e-5, eps=1e-6, correct_bias=False)
criterion = nn.CrossEntropyLoss().to(device)
max_grad_norm = 1

def categorical_accuracy(preds, y):
    # helper (not shown in the original post): fraction of predictions
    # whose arg-max class matches the gold label
    top_pred = preds.argmax(dim=1, keepdim=True)
    correct = top_pred.eq(y.view_as(top_pred)).sum()
    return correct.float() / y.shape[0]

def train(model, iterator, optimizer, criterion, scheduler):
    epoch_loss = 0
    epoch_acc = 0

    model.train()
    for batch in iterator:
        optimizer.zero_grad()        # clear gradients first
        torch.cuda.empty_cache()     # release unoccupied cached GPU memory
        text = batch.text
        label = batch.label
        predictions = model(text)
        loss = criterion(predictions, label)
        acc = categorical_accuracy(predictions, label)
        #torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        loss.backward()
        optimizer.step()
        scheduler.step()
        epoch_loss += loss.item()
        epoch_acc += acc.item()
    return epoch_loss / len(iterator), epoch_acc / len(iterator)
def evaluate(model, iterator, criterion):
    epoch_loss = 0
    epoch_acc = 0

    model.eval()

    with torch.no_grad():
        for batch in iterator:
            text = batch.text
            label = batch.label
            predictions = model(text)
            loss = criterion(predictions, label)
            acc = categorical_accuracy(predictions, label)
            epoch_loss += loss.item()
            epoch_acc += acc.item()

    return epoch_loss / len(iterator), epoch_acc / len(iterator)

We can then use the train() and evaluate() functions to train the model and to validate it.

import math
import time

from transformers import get_linear_schedule_with_warmup

N_EPOCHS = 3
BATCH_SIZE = 32          # assumed value; use the batch size of your data iterators
train_data_len = 25000
warmup_percent = 0.2
total_steps = math.ceil(N_EPOCHS * train_data_len * 1. / BATCH_SIZE)
warmup_steps = int(total_steps * warmup_percent)
# Linear warm-up followed by linear decay. The original post used an
# unspecified get_scheduler helper; transformers' built-in scheduler is used here.
scheduler = get_linear_schedule_with_warmup(optimizer,
                                            num_warmup_steps=warmup_steps,
                                            num_training_steps=total_steps)

def epoch_time(start_time, end_time):
    # helper (not shown in the original post): split elapsed seconds into minutes/seconds
    elapsed = end_time - start_time
    return int(elapsed / 60), int(elapsed % 60)

# train_iterator and valid_iterator are the batched dataset iterators
# prepared earlier in the data pipeline (not shown in this post).
for epoch in range(N_EPOCHS):
    start_time = time.time()
    train_loss, train_acc = train(model, train_iterator, optimizer, criterion, scheduler)
    valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)
    end_time = time.time()
    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')

Courtesy: www.analyticsvidhya.com

Sunday, August 7, 2022

AdaBoost - Savior

When nothing works, boosting does. Nowadays, many people use XGBoost, LightGBM, or CatBoost to win competitions on Kaggle or in hackathons. AdaBoost is the first stepping stone into the world of boosting.

AdaBoost is one of the first boosting algorithms to be adopted in practical applications. AdaBoost helps you combine multiple "weak classifiers" into a single "strong classifier". Here are some (fun) facts about AdaBoost!

→ The weak learners in AdaBoost are decision trees with a single split, called decision stumps.

→ AdaBoost works by putting more weight on difficult-to-classify instances and less on those that are already handled well.

→ AdaBoost algorithms can be used for both classification and regression problems (a minimal scikit-learn sketch follows below).
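Here is a minimal scikit-learn sketch of those ideas: an AdaBoostClassifier built on decision stumps (trees with max_depth=1), trained on a synthetic dataset. The hyperparameter values are illustrative, not tuned.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data, only for illustration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

stump = DecisionTreeClassifier(max_depth=1)   # a decision stump: a tree with a single split
clf = AdaBoostClassifier(
    estimator=stump,          # named base_estimator in older scikit-learn releases
    n_estimators=100,         # number of stumps combined into the ensemble
    learning_rate=0.5,
    random_state=42,
)
clf.fit(X_train, y_train)
print('Test accuracy:', clf.score(X_test, y_test))

For regression problems, scikit-learn offers the analogous AdaBoostRegressor.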






Sunday, May 24, 2020

EIA - AI/ML DRIVEN CHATBOT - INCREASE YOUR PPC CONVERSION RATE

People may take different paths to reach the final step of purchasing. Most of them may follow the typical conversion funnel by starting at brand awareness and ending with a purchase.
However, there are many different paths they can take through the sales funnel. If you can figure out those paths, you may be able to optimize them and improve your PPC conversion rate.

 TRY !!  EIA 



ARTIFICIAL INTELLIGENCE DRIVEN CHATBOT:
Google Analytics can show you the conversion flow, behavior, and acquisition. By going through this report, you can figure out the path which most of your shoppers are taking.
There may be different pages they are visiting before making a final purchase. Each of these pages should be optimized well so that the path is smooth for your customers.
It is also essential to take mobile devices and desktops into account to see which device is the most effective at generating sales. You can find this information through cross-device reports on Google AdWords.
The reports will also show you which channels are influencing their decisions to buy from you. Based on the results, you can accordingly improve the ones that are lagging behind and increase your PPC conversion rate.


Sunday, March 29, 2020

Top AI-powered chatbots

Whether it’s on Facebook Messenger, their website, or even text messaging, more and more brands are leveraging chatbots to service their customers, market their brand, and even sell their products.
But even though most chatbots can handle moderately sophisticated conversations, like welcome conversations and product discovery interactions, the if/then logic that powers their conversational capabilities can be limiting. For instance, if a customer asks a unique yet pressing question that you didn’t account for when designing your chatbot’s logic, there’s no way it can answer their question, which hangs your customer out to dry and ultimately leaves them dissatisfied with your customer service.



Fortunately, the next advancement in chatbot technology that can solve this problem is gaining steam: AI-powered chatbots. By leveraging machine learning and natural language processing, AI-powered chatbots can understand the intent behind your customers' requests, account for each customer's entire conversation history when interacting with them, and respond to their questions in a natural, human way.
If you’re currently using a standard chatbot, but want to upgrade to an AI-powered one, we’ve put together a list of the best AI chatbots for 2019. Read on to find the right one for you.

1. Watson Assistant

Image Credit: IBM
Developed by one of the leaders in the AI space, IBM, Watson Assistant is one of the most advanced AI-powered chatbots on the market. Pre-trained with content from your specific industry, Watson Assistant can understand your historical chat or call logs, search for an answer in your knowledge base, ask customers for more clarity, direct them to human representatives, and even give you training recommendations to hone its conversational abilities.
Watson Assistant can run on your website, messaging channels, customer service tools, and mobile app. The chatbot also comes with a visual dialog editor, so you don’t need any coding experience to develop it.

2. Bold360

Image Credit: Bold360
Trusted by customers like Intuit, Edible Arrangements, and Vodafone, Bold360 patented its own natural language processing technology to help brands build chatbots that can understand customers' intent without the need for keyword matching and learn how to deliver the most accurate answers to them.
Bold360’s conversational AI can interpret complex language, remember the context of an entire conversation, and reply to customers with natural responses. Customers can even buy your products through the chatbot. You can also give your chatbot its own personality and run it on most messaging channels.

3. Rulai

Image Credit: Rulai
Armed with deep-learning-based natural language understanding and adaptive multi-tasking capabilities, Rulai, an AI-powered chatbot for enterprise brands, can understand the context of a conversation, predict user behavior, grasp customer preferences, take actions, switch to different tasks, and ask customers for more clarification.
Rulai also integrates with most messaging channels, customer service software, enterprise business software, and cloud storage platforms. You can either build a Rulai chatbot from scratch with its drag-and-drop design console and let its AI adapt to your customers, or you can implement a pre-trained chatbot that has been fed data from your specific industry.

4. LivePerson

Image Credit: LivePerson
By collecting over 20 years of messaging transcript data and feeding it to their AI-powered chatbot, LivePerson can automate almost every industry’s messaging and integrate with most messaging channels like your website, mobile app, Apple Business Chat, text messaging, Google Rich Business messaging, Line, Facebook Messenger, WhatsApp, and Google AdLingo.
LivePerson’s BotStudio also lets you build chatbots from scratch, without any coding knowledge, and its analytics dashboard can track metrics like real-time sentiment, bot containment rate, bot conversation time, total conversation time, average order value, and bot contained sales, allowing you to grasp the impact your chatbot has had on your business’ bottom line.

5. Inbenta

Image Credit: Inbenta
Designed specifically for enterprise brands, Inbenta’s chatbot leverages machine learning and its own natural language processing engine to detect the context of each customer conversation and accurately answer their questions. Inbenta also offers a dialog manager, which allows you to craft custom conversation flows and paths.
Additionally, when Inbenta's chatbot realizes that one of your customers needs to talk to a human, it escalates the conversation to the appropriate support agent. To make your chatbot seem more human, you can create a custom avatar for it, too.

6. Ada

Image Credit: Ada
Trusted by customers like Medium, Shopify, and MailChimp, Ada is an AI-powered chatbot that features a drag-and-drop builder that you can use to train it, add GIFs to certain messages, and store customer data.
Ada can also integrate with most messaging channels and customer service software, send personalized content to your customers, ask for customer feedback, and report on your bots' time, effort, and cost savings. According to its website, Ada has saved its customers over $100 million and 1 billion minutes of customer service effort.

7. Vergic

Image Credit: Vergic
Vergic offers an AI-powered chatbot that can serve as your business's first line of customer support, handle transactional chats, and transfer more complicated problems to your actual customer service agents. It's like a hybrid chatbot that can boost your employees' productivity.
By leveraging natural language processing and natural language understanding, Vergic can also perform sentiment analysis, share documents, highlight pages, manage conversational workflows, and report on chatbot analytics.
