Guide to Natural Language Processing - NLP


In the mid-1960s, Joseph Weizenbaum at MIT developed an application called ELIZA, which was designed to converse with humans using simple pattern-matching rules and scripts. In its best-known script, ELIZA simulated a psychotherapist. Although the system had no actual understanding of the thoughts or feelings being conveyed, many users believed that ELIZA was guided by true machine intelligence.

ELIZA was one of the first examples of Natural Language Processing (NLP) systems. Regarded as an academic curiosity for many years, NLP is finally making its way into practical applications.

What is NLP? It’s a technology that combines linguistics, computer science, artificial intelligence, and statistical inference to enable computers to process unstructured, human-language text (written or spoken) to derive meanings, execute commands, and perform other tasks.

Today, NLP technology can be found in a variety of applications, including virtual assistants such as Siri and Cortana, search engines such as Google, and “smart home” devices such as Amazon’s Echo line. NLP is also making its way into interactive robots and applications that mine unstructured legal, medical, and social media text.

Let’s describe in more detail what NLP is, how it works, and why it’s becoming important to the enterprise, along with some tips for executing an NLP project of your own.

In this guide, we'll break down a normally complex topic into a few segments and identify the key components of implementing Natural Language Processing in the real world.

How NLP works
Creating an NLP system
The limitations of NLP
NLP's successes in the enterprise
Keys to a successful advanced technology project
Execution of your NLP project
The future of natural language processing and AI in general

Background: Artificial Intelligence

NLP is a branch of computer science that’s been around for over half a century, but it has made significant progress only recently, thanks to advances in artificial intelligence (AI), specifically artificial neural networks and machine learning techniques. Broadly, AI attempts to reproduce aspects of human intelligence in a computer. It is commonly used in pattern-recognition tasks such as identifying objects in images.

Artificial neural networks attempt to model the way human brain cells (neurons) operate, albeit in a greatly simplified form. There are many types of artificial neural networks that differ according to how the “neurons” are organized and how signals flow among them.

Each neuron takes one or more inputs and manipulates them mathematically to generate an output that is passed on to one or more other neurons. The neural network designer defines the mathematical function associated with each neuron. The collective processing of the neurons converts a set of inputs (such as the pixels in an image) to the desired output (identifying an object in the image).

Machine-learning techniques take artificial neural networks a step further by tweaking the parameters of the neurons’ mathematical functions automatically over time to improve the accuracy of the network’s results. Machine-learning algorithms must be “trained” using large amounts of annotated data so that they can go through many cycles of processing, evaluating, and tweaking.

In the case of NLP, the input is a block of text (either written text or text converted from speech), and the output is some desired characteristic of the text, such as its purpose (search query, command, review, etc.) and meaning or tone (positive, negative, angry, biased).
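To make the neuron and training description concrete, here is a minimal Python sketch of a single artificial neuron (a weighted sum passed through a sigmoid function) and of the automatic “tweaking” step, implemented as plain gradient descent on log loss. The two input features, the learning rate, and the single training example are illustrative assumptions; real networks connect many such neurons and learn from large annotated datasets.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of the inputs, squashed by a
    sigmoid function into the range (0, 1)."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train_step(inputs, target, weights, bias, lr=0.5):
    """Nudge the weights and bias to reduce the error on one labeled example.

    For a sigmoid neuron scored with log loss, the gradient with respect to
    each weight is (output - target) * input, so we step the other way.
    """
    error = neuron(inputs, weights, bias) - target
    new_weights = [w - lr * error * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * error
    return new_weights, new_bias

# Hypothetical features for one review: [count of "great", count of "awful"],
# annotated as positive (target 1.0).
x, y = [2.0, 0.0], 1.0
w, b = [0.0, 0.0], 0.0

before = neuron(x, w, b)      # 0.5: the untrained neuron is undecided
for _ in range(20):           # repeated cycles of processing, evaluating, tweaking
    w, b = train_step(x, y, w, b)
after = neuron(x, w, b)       # now much closer to the target of 1.0

print(f"before={before:.2f}, after={after:.2f}")
```

The weight for “awful” never moves here because that input is zero in the only example; the network can only learn from what its training data actually contains.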

It’s somewhat surprising and counterintuitive, but few NLP systems are designed to parse actual sentences word-for-word to divine their meanings the way humans do. Most don’t have the rules of grammar or dictionary definitions of words programmed into them. Instead, they rely on statistical attributes of a block of text, such as word counts and which words tend to occur together, to derive the high-level meaning of the text.
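As an illustration of the kind of attributes described above, the following Python sketch turns text into a bag of words (word counts, with word order and grammar thrown away) and counts which word pairs co-occur in the same sentence. The tokenizer and the sample reviews are simplified assumptions for illustration; real systems typically feed features like these, or richer learned representations, into a statistical model.

```python
import re
from collections import Counter
from itertools import combinations

def tokenize(text):
    """Lowercase and keep runs of letters and apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(text):
    """Word counts only: word order and grammar are discarded."""
    return Counter(tokenize(text))

def cooccurrence(sentences):
    """Count how many sentences contain each unordered pair of distinct words."""
    pairs = Counter()
    for sentence in sentences:
        words = sorted(set(tokenize(sentence)))  # sorted so pairs are (a, b) with a < b
        pairs.update(combinations(words, 2))
    return pairs

reviews = [
    "The battery life is great.",
    "Great screen, but the battery died fast.",
    "Battery life is great, great value!",
]
print(bag_of_words(reviews[2]))
# Counter({'great': 2, 'battery': 1, 'life': 1, 'is': 1, 'value': 1})
print(cooccurrence(reviews)[("battery", "great")])  # 3
```

A model never “reads” these sentences; it only sees that “battery” and “great” keep appearing together, which is often enough to infer that reviewers like the battery.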

In a machine learning system, the neurons’ parameters are tweaked automatically rather than by the designer. These systems still require large quantities of training data, but they can tune themselves much faster than a designer could by hand.

Once a neural network has been trained, it must be tested using input data it has not seen in training, to determine whether it has learned its lesson well. If it hasn’t, it’s time for more training, or perhaps a redesign from the ground up.

AI Systems in Action

Many industries and academic pursuits have started to deploy exciting applications powered by AI:

Science: Finding planets orbiting distant stars does not involve astronomers gazing through telescopes on starry nights. Even the most powerful Earth-bound telescopes are not up to the task. The orbiting Kepler space telescope, however, was designed for just that purpose, and AI systems have combed through the data Kepler collected to locate and characterize exoplanets.
Medicine: Medical imaging is undergoing an AI-fueled renaissance. AI systems are helping doctors diagnose patients by “seeing” subtle clues in X-rays, MRIs, retinal photographs, and CT scans that indicate the possible presence of disease. This approach can help avoid unnecessary invasive surgeries and biopsies.
Retail and Entertainment: Amazon, Netflix, and other providers can recommend products, movies, and services according to your buying and viewing history. There is always something new and interesting to buy, read, view, or do, and these providers and their AI-powered recommendation systems just might know your tastes better than you do.
Finance: From ATMs and mobile apps that read handwritten checks to systems that instantly detect and stop fraudulent transactions, AI has become an important tool in financial technology (“fintech”, for short).
Art: AI systems are being used to compose music and poetry, and to create many types of original visual art pieces.
Personal assistants: Led by Apple’s Siri, Microsoft’s Cortana, and Amazon’s Alexa, mobile devices and smart home systems are able to understand and execute spoken commands—something that would be almost impossible without AI.

Building an NLP system, like any software, starts with requirements. What problem are you trying to solve? Where is your input data (text) coming from, and what information do you want from it?

For example:

Are you taking written or spoken words to build a search-engine query?
Are you building a system to respond to spoken commands?
Are you analyzing statements made by medical patients to help a physician narrow possible diagnoses?
Are you scouring social media for opinions about your (or your competitors’) products or services?
Are you analyzing political blogs to predict the outcome of an election?

From the goals and requirements, the developers can choose and configure one or more of the many NLP algorithms available. In the case of spoken words as input, the developer has the additional task of converting the spoken words to text—which may itself involve an AI algorithm.

The next step is the training cycle. It takes large volumes of data to train an AI algorithm. In this case, thousands of text passages must be tagged with the desired output; that is, the conclusion you want the algorithm to reach from each one. This alone can be a time-consuming endeavor if pre-tagged example text does not exist in the amounts needed for good algorithm training. Humans might be needed to tag an existing database of text samples.

Regardless of the source of the training data, the algorithm’s parameters are adjusted over many training iterations until it can reliably produce the desired output.

After that, the algorithm is tested using input data it has not seen in the training cycle. This tells us if it can process new input with the same reliability with which it processed the training data. If all is well, the system can be deployed to production.
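The training-and-testing cycle can be sketched in a few lines of Python. This is a minimal illustration, assuming a perceptron (one of the simplest learning algorithms, chosen here for brevity rather than recommended) over bag-of-words features and a tiny hypothetical set of tagged reviews; a real project would need thousands of tagged passages. Training repeats full passes over the data, adjusting the weights after each mistake, until every training example is classified correctly or an epoch limit is reached. The held-out test set contains reviews the model never saw.

```python
from collections import Counter, defaultdict

def features(text):
    """Bag-of-words features: lowercase word counts."""
    return Counter(text.lower().split())

def predict(weights, text):
    """Return +1 (positive) or -1 (negative) from the sign of the weighted sum."""
    score = sum(weights[word] * count for word, count in features(text).items())
    return 1 if score >= 0 else -1

def train(examples, max_epochs=50):
    """Perceptron training: nudge the weights after every mistake and repeat
    full passes (epochs) until the training data is classified perfectly."""
    weights = defaultdict(float)
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for text, label in examples:
            if predict(weights, text) != label:
                mistakes += 1
                for word, count in features(text).items():
                    weights[word] += label * count
        if mistakes == 0:
            return weights, epoch
    return weights, max_epochs

def accuracy(weights, examples):
    """Fraction of examples whose predicted label matches the tag."""
    return sum(predict(weights, text) == label for text, label in examples) / len(examples)

# Hypothetical tagged data: +1 = positive review, -1 = negative review.
train_set = [
    ("great phone love it", 1),
    ("terrible battery hate it", -1),
    ("love the great screen", 1),
    ("awful screen terrible support", -1),
    ("great value", 1),
    ("hate the awful case", -1),
]
# Held-out reviews that never appear in training.
test_set = [
    ("love this great case", 1),
    ("terrible awful phone", -1),
]

weights, epochs = train(train_set)
print(epochs)                        # 3
print(accuracy(weights, train_set))  # 1.0
print(accuracy(weights, test_set))   # 0.5
```

Training accuracy reaches 100%, but the held-out score is only 50%: “awful” never received a negative weight because other words already separated the training examples, and “phone” picked up a positive weight from a single review. That gap is exactly what the test step is meant to expose, and it signals the need for more (and more varied) training data or a different design before deployment.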

As you might expect, getting a computer to process human language is tricky. There are so many languages and dialects. Even within the same language, there are many ways to convey the same idea. On top of that, there are regional variations (“soda” in one region is “pop” in the Midwest), slang terms, misspellings, and commonly confused words (“loose” vs. “lose,” “principle” vs. “principal”).
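A common first defense against this kind of variation is normalization: lowercasing, stripping punctuation, and mapping known variants to one canonical form before any further analysis. The minimal Python sketch below assumes a small hand-written lookup table whose entries are hypothetical examples; real systems need far larger, domain-specific tables or learned models.

```python
import re

# Hypothetical lookup table mapping regional variants and slang to one
# canonical form. Entries are illustrative only.
CANONICAL = {
    "pop": "soda",
    "u": "you",
    "thx": "thanks",
}

def normalize(text):
    """Lowercase, keep only word tokens, and replace known variants."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [CANONICAL.get(token, token) for token in tokens]

print(normalize("Thx! Do u sell POP?"))  # ['thanks', 'do', 'you', 'sell', 'soda']
print(normalize("Do you sell soda?"))    # ['do', 'you', 'sell', 'soda']
```

A table like this cannot resolve “loose” vs. “lose”: both are correctly spelled words, so picking the intended one depends on context.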

And then you have language ambiguities that are extremely difficult for computers to parse. Consider the simple sentence, “We can fish.” Does this mean we have the ability to catch fish, or that we are in the process of canning fish that have already been caught? Humans can figure it out from the context. It’s harder for computers, even well-trained AI systems.
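The “We can fish” problem can be made concrete with a toy part-of-speech lexicon and a two-pattern grammar, both hypothetical and far smaller than anything a real parser would use. This Python sketch enumerates every way to tag the words and keeps only the tag sequences the grammar accepts.

```python
from itertools import product

# Toy lexicon (hypothetical): the possible parts of speech for each word.
LEXICON = {
    "we": ["PRON"],
    "can": ["MODAL", "VERB"],  # "be able to" or "put into cans"
    "fish": ["NOUN", "VERB"],
}

# Toy grammar (hypothetical): the only sentence patterns we accept,
# each paired with a paraphrase of what it means.
PATTERNS = {
    ("PRON", "MODAL", "VERB"): "we are able to catch fish",
    ("PRON", "VERB", "NOUN"): "we put fish into cans",
}

def readings(sentence):
    """Return a paraphrase for every grammatical tagging of the sentence."""
    words = sentence.lower().rstrip(".").split()
    tag_options = [LEXICON[word] for word in words]
    return [PATTERNS[tags] for tags in product(*tag_options) if tags in PATTERNS]

print(readings("We can fish."))
# ['we are able to catch fish', 'we put fish into cans']
```

Both readings survive, so the words alone do not settle the meaning; a system needs surrounding context, or statistics on which reading is more common in similar text, to choose.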

Taken together, these issues make NLP extremely challenging. As a result, most NLP applications are quite narrow in scope. As much as you can do with Amazon’s Alexa, it understands only a limited set of commands. It will be a long time before a practical HAL or C-3PO comes along, able to comprehend any statement in any human language.

Despite its complexity and challenges, NLP has an important and growing role in modern business. Today’s enterprises deal with vast quantities of data. Although we have robust tools with which to process data rows and columns, much enterprise data is unstructured text. There are immense value and insight locked in that unstructured text data, so it’s a worthwhile exercise to develop AI-based tools that can extract actionable meaning from it.

NLP is a key component of digital transformation for many businesses. For example, having humans pore through all that unstructured text is labor-intensive (read: expensive) for the company, not to mention tedious and dull for the humans. Having an algorithm do it, even one that can identify “positive” or “negative” reviews with only 80% accuracy, can be more cost-effective.

With the significant advances in NLP capabilities and its expanding availability, businesses of all sizes have increased their adoption of NLP to gain business insight, expand customer value, and find new opportunities for profitability:

Voice Systems: When text is not enough, or not a user interface option, NLP systems can be incorporated to recognize voice commands, elevating customer interactions with your service and brand to a higher level.
Sentiment Analysis: NLP can be leveraged to do more than simply provide informed responses. It can also discern whether tweets or customer feedback about your company are positive or negative, so that you can address customer concerns. Sentiment analysis uses NLP to help businesses understand what’s being said about them on the web and social media.
Chatbots: With natural language processing, chatbots can be more “intelligent” and versatile while delivering a user experience that simply feels more human (without the attitude). These NLP-enabled chatbots can help customers get the information they need more intuitively and much faster. The result is a better customer experience and service differentiation compared to your competition.
Fighting Spam and Organizing Inboxes: Spam detection uses natural language processing to keep unwanted email and other messages out of employee inboxes. NLP can also be used to sort messages from certain contacts into separate folders or to prioritize message handling.
Language Translation: If you want to build a translation feature into your application, you’ll need natural language processing. However, the challenge in translation goes beyond converting words from one language to another: the harder problem is preserving the meaning of the communication. This complex technical issue is one reason some NLP solutions specialize in, and perform better at, translation.

Advanced “Conversational” Search: Conversational speech is squarely within NLP’s capabilities, but users are human, and like all humans, they sometimes omit words, make spelling and punctuation mistakes, use colloquial terms, or use slang. Today’s NLP solutions can take these challenges into account and deliver search results that are accurate, relevant, and valuable for the customer.
Information Extraction: Natural language processing can automatically summarize long documents or extract relevant keywords for searching. The legal industry makes use of these types of NLP applications, for example, to help lawyers sort through thousands of pages of documents in legal cases to find and compare relevant information.

Voice of Customer analysis: Using NLP, companies can spot trends in customer feedback, address customer concerns, increase customer loyalty, and identify business opportunities that drive revenue.

Optimize Audience Segmentation: NLP can identify customer groups across a broad range of unstructured data, such as social media posts, improving the effectiveness of market research and audience communications. Improved segmentation also supports targeted advertising to each of these segments.
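For the Information Extraction use case above, one long-established way to pick out keywords is TF-IDF (term frequency times inverse document frequency): a word scores high when it is frequent in one document but rare across the collection. The from-scratch Python sketch below assumes a tiny set of hypothetical contract snippets and a bare-bones tokenizer; production systems typically add stop-word lists, stemming or lemmatization, and phrase detection, or use a library implementation.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())

def top_keywords(docs, k=2):
    """Return the k highest-scoring TF-IDF terms for each document.

    TF (term frequency) rewards words that are common within a document;
    IDF (inverse document frequency) penalizes words that appear in many
    documents, so a word found in every document scores zero.
    """
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    # Document frequency: how many documents contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda word: (-scores[word], word))
        results.append(ranked[:k])
    return results

# Hypothetical contract snippets.
docs = [
    "The tenant must pay rent monthly. Unpaid rent triggers a penalty.",
    "The supplier must ship goods weekly. Damaged goods trigger a penalty.",
    "The tenant disputes the penalty.",
]
print(top_keywords(docs))
# [['rent', 'monthly'], ['goods', 'damaged'], ['disputes', 'tenant']]
```

“The” and “penalty” score zero in every snippet because they appear in all three documents, which is how TF-IDF filters out words that do not distinguish one document from another.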

These applications represent only a fraction of NLP’s uses in business today, and an even smaller fraction of those likely to be in use a few years from now.

Keys to a Successful NLP Project
If you’re new to NLP and contemplating an NLP project for your business, here are some things to consider that will help your project succeed:

Start small – It’s better to apply the technology to a small, low-risk project than to try to “boil the ocean” with some sprawling, long-term initiative. (This is true when adopting any new technology, not just NLP.)
Understand the business problem and requirements – Prevent scope creep by adopting a clear, unambiguous definition of the problem to be solved and the role of NLP in that solution. Then enforce those boundaries.

Understand the limitations of NLP – NLP does not always yield 100% accurate results. This is acceptable for some applications but can be considered a failure for others. Manage the expectations for the system and design “fallback” functionality (such as, “Get a human in the loop, pronto!”) for those times when NLP falls short.

Test, test, test – Increasing the accuracy of an NLP system means extensive testing. For systems involving speech processing, test with speakers using different accents and different ways of saying the same thing. For text processing, don’t limit testing to perfectly written prose. Unless you’re analyzing passages from professional writers, you will deal with poor grammar and spelling more often than not.
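The “fallback” advice above can be sketched as a confidence threshold: act automatically only when the model is sure enough, and otherwise route the item to a person. This minimal Python sketch assumes the model reports a confidence score between 0 and 1 (many classifiers can, but not all); the `Prediction` class, the `route` function, the label names, and the 0.8 threshold are hypothetical choices for illustration.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    """A hypothetical model output: a label plus the model's confidence (0 to 1)."""
    label: str
    confidence: float

def route(prediction, threshold=0.8):
    """Act on confident predictions; hand low-confidence ones to a person."""
    if prediction.confidence >= threshold:
        return f"auto:{prediction.label}"
    return "human_review"

print(route(Prediction("refund_request", 0.93)))  # auto:refund_request
print(route(Prediction("refund_request", 0.55)))  # human_review
```

In practice, the threshold might be tuned during testing, on realistic inputs with misspellings and varied phrasing, to balance how much work reaches humans against how many errors slip through.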

A final tip: NLP (and AI in general) is still advanced technology that requires some domain expertise to implement properly. Don’t be afraid to bring in outside experts to help. The extra cost will pay for itself many times over in reduced aggravation and avoided missteps.

At AndPlus, we have extensive knowledge of AI and NLP and will be happy to help you get the most value from it.

The technical challenges of developing an NLP solution can be many. Fortunately, many NLP service providers offer APIs that enable rapid integration and access to a wide range of NLP functionality, including sentiment determination, intent,

Despite the many API options available today, creating digital products with embedded NLP functionality can be challenging. It requires development discipline and NLP expertise to select and effectively use the right technology for the specified business application, current technology environment, anticipated deployment model, and desired business outcome (to name a few considerations).

In short, comparing capabilities and selecting appropriate technologies can be a daunting task for development teams encountering NLP technologies for the first time. To illustrate, we’ve assembled descriptions of several leading NLP API sources:

SYSTRAN - SYSTRAN Platform is a collection of APIs for translation, multilingual dictionary lookups, Natural Language Processing (entity recognition, morphological analysis, part-of-speech tagging, language identification…), and text extraction (from documents, audio files, or images). SYSTRAN Platform enables utilization and analysis of both structured and unstructured multilingual content, such as user-generated content, social media, Web content, and more.

AYLIEN - Text API is a package of Natural Language Processing, information retrieval and machine learning tools that allow developers to extract meaning and insights from documents.

RxNLP - RxNLP’s Text Mining and NLP APIs provide cloud access to advanced text analytics functionality, including sentence clustering, text similarity, topic extraction, and automatic summary evaluation.

IBM - Connect to the IBM Watson Alchemy API to analyze text for sentiment, keywords and broader concepts. With Watson's suite of NLP offerings, including Watson Natural Language Understanding (NLU), you can surface concepts, categories, sentiment, and emotion, and apply knowledge of unique entities in your industry to your data.

Linguakit - Linguakit API helps you analyze and extract information from texts. The API offers technology based on years of research in Natural Language Processing in a very easy and scalable SaaS model through a RESTful API.

Text Summarization - Text Summarization API provides a professional text summarizer service based on advanced Natural Language Processing and Machine Learning technologies. It can be used to produce short summaries of the important content in URLs or documents that users provide.

Twinword - Twinword Text Analysis provides a single API for many text analysis needs: sentiment analysis, topic tagging, lemmatization (reducing words to their dictionary forms using a vocabulary and morphological analysis), and more. Twinword provides multiple NLP tools in a single plan for analyzing and understanding human sentences.

Outside Experts Are a Great Option

If you don’t already have in-house AI development expertise, in most cases you are better off bringing in consultants rather than trying to hire a full-time AI developer. Once you have an AI project with outside consultants under your belt, you can evaluate whether your future AI projects warrant the cost of a staff AI developer, and you will have a better idea of what to look for in an AI resource.

At AndPlus, we have a wide range of AI expertise on our development staff, and we know what it takes to make your AI projects succeed. From choosing the right AI approach to designing the algorithm and training and testing the system, we are your full-service AI resource.

In many ways, NLP is still in its infancy and will continue to improve and mature as researchers and developers find innovative ways to make it more accurate and applicable to a wider range of tasks. Customizable NLP products and services are expected to go mainstream in the short term. Indeed, Amazon Web Services is already providing a service called Amazon Comprehend for text mining—call it “NLP as a service.” Look for more entrants to emerge in this important and lucrative market.
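As a taste of “NLP as a service,” the following Python sketch calls Amazon Comprehend’s sentiment detection through the boto3 SDK (the `detect_sentiment` operation, with `Text` and `LanguageCode` parameters). It assumes boto3 is installed and that AWS credentials and a default region are configured. The client can be passed in, so the `StubComprehend` class, which returns canned, made-up scores, lets the function be exercised offline; the stub and the wrapper name are illustrative, not part of AWS.

```python
def detect_sentiment(text, client=None):
    """Return Amazon Comprehend's overall sentiment label and per-class scores.

    The label is one of "POSITIVE", "NEGATIVE", "NEUTRAL", or "MIXED".
    If no client is supplied, a real boto3 Comprehend client is created,
    which needs AWS credentials and a configured region.
    """
    if client is None:
        import boto3  # imported lazily so the function can run with a stub client
        client = boto3.client("comprehend")
    response = client.detect_sentiment(Text=text, LanguageCode="en")
    return response["Sentiment"], response["SentimentScore"]


class StubComprehend:
    """Offline stand-in for the boto3 client; the scores below are made up."""

    def detect_sentiment(self, Text, LanguageCode):
        return {
            "Sentiment": "POSITIVE",
            "SentimentScore": {"Positive": 0.97, "Negative": 0.01,
                               "Neutral": 0.02, "Mixed": 0.0},
        }


label, scores = detect_sentiment("The new app is fantastic!", client=StubComprehend())
print(label)  # POSITIVE
```

Keeping the client injectable is a small design choice that makes it easier to compare providers or switch later: only the wrapper needs to change, not the code that consumes its results.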

One thing is certain: our work and personal lives will increasingly be affected by machines that can read and understand the spoken and written word. As this capability is refined and becomes widespread, new opportunities will emerge that deliver new value to customers and businesses alike.

As touchless interactions between suppliers and customers create new user expectations, NLP capabilities will certainly play an increasing role in business. Given this, now is the time to explore how NLP can create a competitive advantage for your business and deliver new value to your customers.

WORKING WITH A "DIGITAL SHERPA" FOR YOUR JOURNEY INTO ADVANCED TECHNOLOGIES

With proper, thorough planning and the right guide to build an AI roadmap and take you through it, a project can go flawlessly, or nearly so. Be aware: You might have only one shot to get it right. A failed adoption of an advanced technology can lead to countless unforeseen consequences as you scale.

AndPlus can be your digital Sherpa. We've done it, we’ve seen it all, and we can do it for you.

Need a quick crash-course in the 'MVP' methodology?

A Minimum Viable Product (MVP) has only those features needed to validate its continued development. Its primary goal is to obtain this insight at a lower cost than that needed to develop a product with more features.

Feature Prioritization

Our process begins by identifying the primary goal that will address both our client's business goals and the end user's goals. We select the methods that the MVP will use to accomplish these goals. Our design team then defines the minimum scope of work and uses this list of features to map the ideal user journey.

Early product prototypes are often developed at this stage in order to illustrate concepts and ensure that business objectives and user experiences are aligned and optimized.

Development

Once the user journey is mapped, the code starts flowing. Early prototypes/wireframes are brought to life by our engineering team. We use an Agile Scrum process that is custom-tailored to our industry. And that's the kicker. We don't just use the Agile framework straight from the textbook; we optimize the development process based on more than a decade of development experience gained from hundreds of digital development projects.

Output

The fun part! Our sprints run in 2-week increments. You get actual working builds of your product every two weeks. These builds are tested and iterated upon as the project moves forward. We pride ourselves on iterating these builds to perfection by launch day.

Our deep expertise and custom Agile development process enable AndPlus to iterate quickly, provide transparency, and deliver on time and on budget — helping our clients get to market faster.