concept feature in category AI

This is an excerpt from Manning's book Interpretable AI MEAP V03.
Supervised learning is a type of machine learning in which the objective is to learn a mapping from an input to an output based on example input-output pairs. It requires labeled training data, where each input (also known as the features) has a corresponding label (also known as the target). Now how is this data represented? The input features are typically represented using a multi-dimensional array data structure, or mathematically as a matrix X. The output or target is represented as a single-dimensional array data structure, or mathematically as a vector y. The dimension of matrix X is typically m x n, where m is the number of examples or labeled data points and n is the number of features. The dimension of vector y is typically m x 1, where m again is the number of examples or labels. The objective is to learn a function f that maps from the input features X to the target y. This is shown in Figure 1.5.
Figure 1.5: Illustration of Supervised Learning
In Figure 1.5 you can see how, with supervised learning, you are learning a function f that takes in multiple input features represented as X and produces an output that matches the known labels or values represented as the target variable y. The bottom half of the figure shows an example where a labeled dataset is given and, through supervised learning, you learn how to map the input features to the output. The function f is a multivariate function, since it maps from multiple input variables or features to a target. There are two broad classes of supervised learning problems: classification and regression.
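To make the shapes concrete, here is a minimal sketch using NumPy and scikit-learn (an illustrative setup, not code from the book): X is an m x n array of features, y is a length-m array of targets, and a fitted model plays the role of the learned function f.

import numpy as np
from sklearn.linear_model import LinearRegression

# m = 4 labeled examples, n = 3 input features
X = np.array([[0.5, 1.2, 3.0],
              [1.1, 0.7, 2.2],
              [0.9, 1.5, 1.8],
              [1.4, 0.3, 2.9]])        # matrix X, shape (m, n)
y = np.array([10.0, 8.5, 9.2, 11.1])   # vector y, shape (m,)

# Learn a function f that maps X to y (here a simple linear regression)
f = LinearRegression().fit(X, y)

# Use the learned mapping to predict the target for a new, unseen example
print(f.predict([[1.0, 1.0, 2.5]]))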
Figure 2.4: Correlation Plot of the Features and the Target Variable for the Diabetes Dataset
In the previous chapter, we looked at tree ensembles, especially Random Forest models, and learned how to interpret them using model-agnostic methods that are global in scope, such as Partial Dependence Plots (PDPs) and feature interaction plots. We saw that PDPs are a great way of understanding how individual feature values impact the final model prediction at a global scale. We also saw how features interact with each other using feature interaction plots, and how those plots can be used to expose potential issues such as bias. PDPs are easy and intuitive to understand, but a major drawback is that they assume features are independent of each other. Higher-order feature interactions also cannot be visualized using feature interaction plots.
In this chapter, we will look at more advanced model-agnostic techniques that overcome these difficulties. We will specifically focus on Local Interpretable Model-agnostic Explanations (LIME), Shapley Additive exPlanations (SHAP), and Anchors. Unlike PDPs and feature interaction plots, these techniques are local in scope, which means they are used to interpret a single instance or prediction at a time. We will also switch to interpreting neural networks, which are inherently black box, focusing specifically on Deep Neural Networks (DNNs).
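As a rough sketch of what a local explanation looks like in practice (assuming a scikit-learn random forest and the shap package; this exact setup is illustrative, not the book's own code), SHAP can attribute a single prediction to individual feature contributions:

import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# Train a random forest on the diabetes dataset referenced earlier
X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=42).fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree ensembles
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Local scope: feature contributions for one instance (the first row)
print(dict(zip(X.columns, shap_values[0])))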

This is an excerpt from Manning's book AI-Powered Search MEAP V06.
For these reasons, it is now common practice to use something called word embeddings to model the semantic meaning of term sequences in your index and queries. A word embedding for a term is a vector of features that represents the term's conceptual meaning in a semantic space. Figure 2.11 demonstrates the terms now mapped to a dimensionally-reduced vector that can serve as a word embedding.
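As a toy sketch (the vectors below are made-up values, not real embeddings), two terms with similar meanings end up close together in the semantic space, which can be measured with cosine similarity:

import numpy as np

# Made-up embedding vectors (real embeddings typically have hundreds of dimensions)
embeddings = {
    "apple":  np.array([0.9, 0.1, 0.7, 0.2]),
    "juice":  np.array([0.8, 0.2, 0.6, 0.1]),
    "laptop": np.array([0.1, 0.9, 0.2, 0.8]),
}

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: values near 1.0 indicate similar meaning
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["apple"], embeddings["juice"]))   # high similarity
print(cosine_similarity(embeddings["apple"], embeddings["laptop"]))  # low similarity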
If you recall, in section 2.3 we proposed thinking of a query for the phrase apple juice as a vector containing a feature for every word in any of our documents, with a value of 1 for the terms apple and juice, and a value of 0 for all other terms.
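As a small sketch of that sparse representation (the vocabulary here is illustrative, not taken from the book's collection), the query vector has a 1 in the positions for apple and juice and a 0 everywhere else:

# A tiny illustrative vocabulary drawn from the documents in the index
vocabulary = ["apple", "juice", "laptop", "network", "social", "star", "wars"]

query_terms = {"apple", "juice"}

# Sparse query vector: 1 for terms present in the query, 0 for every other term
query_vector = [1 if term in query_terms else 0 for term in vocabulary]

print(query_vector)  # [1, 1, 0, 0, 0, 0, 0]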
Listing 10.7. The judgment list, with features appended for just the query "social network"
[Judgment(grade=1, qid=1, keywords=social network, doc_id=37799, features=[8.243603, 3.8143613, 2010.0], weight=1),
 Judgment(grade=0, qid=1, keywords=social network, doc_id=267752, features=[0.0, 6.0172443, 2013.0], weight=1),
 Judgment(grade=0, qid=1, keywords=social network, doc_id=38408, features=[0.0, 4.353118, 2010.0], weight=1),
 Judgment(grade=0, qid=1, keywords=social network, doc_id=28303, features=[3.4286604, 3.1086721, 1970.0], weight=1),
 Judgment(grade=1, qid=2, keywords=star wars, doc_id=11, features=[], weight=1),
 Judgment(grade=1, qid=2, keywords=star wars, doc_id=1892, features=[], weight=1),
 Judgment(grade=0, qid=2, keywords=star wars, doc_id=54138, features=[], weight=1),
 Judgment(grade=0, qid=2, keywords=star wars, doc_id=85783, features=[], weight=1),
 Judgment(grade=0, qid=2, keywords=star wars, doc_id=325553, features=[], weight=1)]
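Each entry above pairs a relevance grade and query id with the feature values logged for that query/document pair. As a rough sketch of the shape of such a record (a hypothetical stand-in, not the book's actual Judgment class), it might look like this:

from dataclasses import dataclass, field
from typing import List

@dataclass
class Judgment:
    grade: int            # relevance label: 1 = relevant, 0 = not relevant
    qid: int              # groups all judged documents for the same query
    keywords: str         # the query text, e.g. "social network"
    doc_id: int           # the judged document
    features: List[float] = field(default_factory=list)  # feature values for this query/doc pair
    weight: int = 1       # importance of this judgment during training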
Figure 10.6. Separating hyperplane impacted by range of one of the features.

This is an excerpt from Manning's book Zero to AI: A non-technical, hype-free guide to prospering in the AI era.
Eng: Sure, the list of features seems like a good start. And what target do you need from the model? Just the expected property price?
You: Correct, that’s all we need. The price will help us double-check the agents’ work, and we can also offer real-time quotes on our website for free.
Marketer: Sure, we know that age plays a big role. These young millennials change companies all the time, while older people are more loyal. Also, if someone has been our client for a long time, that's a good indicator of loyalty.
Eng: Nice, we can look in the CRM and include a feature for “days since sign-up”, and one for age. Is age the only interesting demographic attribute?