Introduction to large language models : (Record no. 240856)

MARC details
000 -LEADER
fixed length control field 13783nam a22002177a 4500
005 - DATE AND TIME OF LATEST TRANSACTION
control field 20260210155124.0
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field 260209b |||||||| |||| 00| 0 eng d
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number 9789363864740
040 ## - CATALOGING SOURCE
Transcribing agency AIMIT LIBRARY
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Edition number 1
Classification number 006.3
Item number CHAT
100 ## - MAIN ENTRY--PERSONAL NAME
Personal name Chakraborty, Tanmoy.
9 (RLIN) 254137
245 ## - TITLE STATEMENT
Title Introduction to large language models :
Remainder of title generative AI for text /
Statement of responsibility, etc. By Tanmoy Chakraborty.
260 ## - PUBLICATION, DISTRIBUTION, ETC.
Place of publication, distribution, etc. New Delhi :
Name of publisher, distributor, etc. Wiley India Pvt. Ltd.,
Date of publication, distribution, etc. 2025.
300 ## - PHYSICAL DESCRIPTION
Extent xxi, 461 p. ;
Other physical details PB
Dimensions 24 cm.
500 ## - GENERAL NOTE
General note Introduction to Large Language Models (LLMs) is a comprehensive guide to the foundations and advancements of Generative AI for Text. Designed for educators and enthusiasts, the book starts with key linguistic concepts and progresses through NLP fundamentals, from word embeddings to pretrained foundational models.
Readers will learn how LLMs process and generate language, how their limitations can be overcome, and how their performance can be enhanced using techniques such as prompt engineering, retrieval-augmented generation, and human alignment. The book presents cutting-edge research in a concise format, enriched with visual aids, exercises, and practical resources.
Ideal for computer science faculty, this resource offers both theoretical insights and real-world applications, showing how LLMs such as ChatGPT are transforming technology and advancing AI innovation.
505 ## - FORMATTED CONTENTS NOTE
Formatted contents note Endorsement -- Preface -- Acknowledgement -- Foreword
1 Introduction -- 1.1 What is a Language Model? -- 1.2 Evolution of Language Modelling Technologies -- 1.3 Scaling Laws in Language Models -- 1.4 Evolution of LLMs -- 1.4.1 The Emergence and Development of LLMs -- 1.4.2 Implications of Encoder-Decoder in LLM Development -- 1.4.3 Optimising Scale and Resource Efficiency in LLMs -- 1.5 Organisation of the Book -- Additional Resources -- Bibliography
2 An Overview of Natural Language Processing and Neural Networks -- Part I: Natural Language Processing -- 2.1 Computational Linguistics and Natural Language Processing -- 2.2 Overview of the Natural Language Processing Pipeline -- 2.3 Morphology -- 2.3.1 Morphemes -- 2.3.2 Stemming -- 2.3.3 Lemmatisation -- 2.3.4 Lexicon -- 2.4 Tokenisation -- 2.4.1 Advanced Techniques: Subword Tokenisation -- 2.5 Syntactics -- 2.6 Semantics -- 2.7 Introduction to Language Modelling -- Part II: Neural Networks -- 2.8 The Perceptron -- 2.8.1 Definition -- 2.8.2 Implementing AND, OR, and XOR Logic -- 2.9 Multilayer Perceptron -- 2.9.1 Neural Networks -- 2.9.2 Types of Activation Functions -- 2.10 Training Neural Networks -- 2.10.1 Backpropagation -- 2.10.2 Batching -- 2.10.3 Hyperparameters -- 2.10.4 Regularisation -- 2.11 Vanishing and Exploding Gradients -- 2.12 Evaluation Metrics -- 2.13 Summary -- Additional Resources -- Exercises -- Bibliography
3 Word Embedding -- 3.1 Distributional Hypothesis -- 3.2 Vector Semantics -- 3.2.1 Defining and Measuring Semantic Similarity -- 3.3 Types of Word Embedding -- 3.3.1 Frequency-Based Embeddings -- 3.3.2 Word2Vec -- 3.3.3 Global Vectors for Word Representation -- 3.3.4 FastText -- 3.4 Bias in Word Embedding -- 3.5 Limitations of Word Embedding Methods -- 3.6 Applications of Word Embeddings -- 3.7 Summary -- Additional Resources -- Exercises -- Bibliography
4 Statistical Language Model -- 4.1 Statistical Language Model -- 4.1.1 The Conditional Probability -- 4.1.2 The Chain Rule of Probability -- 4.1.3 The Markov Assumption -- 4.1.4 Unigram Language Model -- 4.1.5 Bigram Language Model -- 4.2 Smoothing -- 4.2.1 The Unknown Tokens -- 4.2.2 Smoothing -- 4.2.3 Back-Off -- 4.2.4 Interpolation -- 4.2.5 Good-Turing -- 4.3 Evaluation of Language Model -- 4.3.1 Extrinsic Evaluation -- 4.3.2 Intrinsic Evaluation -- 4.3.3 Human Evaluation -- 4.3.4 Evaluation Metrics -- 4.3.5 Benchmark Suites -- 4.4 Limitations of Statistical Language Models -- 4.5 Summary -- Additional Resources -- Exercises -- Bibliography
5 Neural Language Models -- 5.1 Convolutional Neural Networks -- 5.1.1 Components of CNNs: Kernel, Stride, Pooling, and Padding -- 5.1.2 Hierarchical and Dilated Convolutions -- 5.1.3 Applications of CNNs in NLP -- 5.2 Recurrent Neural Networks -- 5.2.1 Training RNNs -- 5.2.2 Applications of RNNs -- 5.2.3 Challenges in Sequence Modelling -- 5.2.4 RNN Variants: LSTM, GRU, and Bidirectional RNNs -- 5.3 Sequence-to-Sequence Models -- 5.3.1 Training Sequence-to-Sequence Models -- 5.3.2 Inference Decoding -- 5.3.3 Applications of Sequence-to-Sequence Models -- 5.4 Attention Mechanisms -- 5.4.1 Introduction to Attention -- 5.4.2 Advantages of Attention -- 5.4.3 Variants of Attention -- 5.5 Limitations of Neural Language Models -- 5.6 Summary -- Additional Resources -- Exercises -- Bibliography
6 Transformers -- 6.1 Self-Attention -- 6.1.1 Multi-Head Self-Attention -- 6.2 Transformer Encoder Block -- 6.2.1 Components of the Transformer Encoder Block -- 6.2.2 Feed-Forward Neural Network -- 6.2.3 Layer Normalisation -- 6.2.4 Residual Connections -- 6.3 Transformer Decoder Block -- 6.3.1 Masked Multi-Head Self-Attention -- 6.3.2 Cross-Attention (Encoder-Decoder Attention) -- 6.4 Positional Embeddings -- 6.4.1 Types of Positional Embeddings -- 6.4.2 Rotary Position Embedding -- 6.5 Efficient Attention Mechanisms -- 6.5.1 KV Caching in Multi-Head Self-Attention -- 6.5.2 Multi-Query Attention -- 6.5.3 Grouped-Query Attention -- 6.5.4 Sliding Window Attention -- 6.6 An Alternate Formulation of Transformers -- 6.6.1 Residual Stream Perspective of Transformers -- 6.6.2 Attention Heads: Reading and Writing -- 6.6.3 Feed-Forward Networks: Transformation of Residual Streams -- 6.6.4 Prediction Head: Generating the Next Token -- 6.6.5 Decomposing the Transformer: Attention and Feed-Forward Contributions -- 6.6.6 Residual Networks as Shallow Ensembles -- 6.6.7 Interpreting the Mechanism of LLMs -- 6.7 Summary -- Additional Resources -- Exercises -- Bibliography
7 Language Model Pretraining -- 7.1 Embeddings from Language Model -- 7.1.1 Architecture and Training of ELMo -- 7.1.2 Applications of ELMo -- 7.1.3 Limitations of ELMo -- 7.2 Evaluation Datasets -- 7.3 Encoder-Based Pretraining -- 7.3.1 Fundamentals of Encoder-Based Models -- 7.3.2 Training Paradigm -- 7.3.3 BERT Pretraining -- 7.3.4 Applications and Limitations -- 7.4 Decoder-Based Pretraining -- 7.4.1 Decoder-Based Architecture -- 7.4.2 Training Paradigm -- 7.4.3 GPT Pretraining -- 7.4.4 Applications and Limitations -- 7.5 Encoder-Decoder Based Pretraining -- 7.5.1 Architecture -- 7.5.2 Joint Pretraining Strategy -- 7.5.3 T5 Pretraining -- 7.5.4 Applications and Limitations -- 7.6 Emergence of Large Language Models -- 7.7 Limitations of Pretraining -- 7.8 Summary -- Additional Resources -- Exercises -- Bibliography
8 Fine-Tuning and Alignment of LLMs -- 8.1 Moving from Pretraining to Fine-Tuning -- 8.2 Fine-Tuning on Various Task-Specific Applications -- 8.2.1 Sequence Classification -- 8.2.2 Pairwise Sequence Classification -- 8.2.3 Sequence Labelling -- 8.2.4 Learning Spans -- 8.2.5 Challenges in Classical Fine-Tuning Methods -- 8.3 Instruction Tuning -- 8.4 Alignment Methods -- 8.4.1 Reinforcement Learning from Human Feedback -- 8.4.2 Direct Preference Optimisation -- 8.5 Summary -- Additional Resources -- Exercises -- Bibliography
9 Prompting Strategies in LLMs -- 9.1 Prompt Engineering -- 9.1.1 Prompt Shape -- 9.1.2 Manual Template Engineering -- 9.1.3 Automated Template Learning -- 9.1.4 Continuous Prompts -- 9.2 Prompt Application -- 9.2.1 In-Context Learning -- 9.2.2 Knowledge Probing -- 9.2.3 Classification-Based Tasks -- 9.2.4 Information Extraction -- 9.2.5 Reasoning in Natural Language Processing -- 9.2.6 Question Answering -- 9.2.7 Text Generation -- 9.2.8 Automatic Evaluation of Text Generation -- 9.3 Chain-of-Thoughts -- 9.4 Tree-of-Thoughts -- 9.5 Graph-of-Thoughts -- 9.6 Summary -- Additional Resources -- Exercises -- Bibliography
10 Efficient Methods for Fine-Tuning LLMs -- 10.1 Model Compression with Knowledge Distillation -- 10.1.1 White-Box Knowledge Distillation -- 10.1.2 Meta Knowledge Distillation -- 10.1.3 Black-Box Knowledge Distillation -- 10.2 Model Compression Techniques -- 10.2.1 Model Pruning -- 10.2.2 Model Quantisation -- 10.3 Parameter-Efficient Fine-Tuning -- 10.3.1 Adapters -- 10.3.2 Prefix Tuning -- 10.3.3 Prompt Tuning -- 10.3.4 Selective PEFT Techniques -- 10.3.5 Reparameterisation-Based PEFT Techniques -- 10.3.6 Hybrid Approaches for Efficient Fine-Tuning -- 10.4 Efficient Strategies for Fine-Tuning LLMs -- 10.4.1 Mixed-Precision Tuning -- 10.4.2 Data Selection for Efficient Fine-Tuning -- 10.4.3 Prompt Compression -- 10.5 Summary -- Additional Resources -- Exercises -- Bibliography
11 Augmented Large Language Models -- 11.1 Retrieval-Augmented Generation -- 11.1.1 Indexing in RAGs -- 11.1.2 Context Searching in RAGs -- 11.1.3 Prompting in RAGs -- 11.1.4 Inferencing in RAGs -- 11.1.5 Comparison of RAGs with LLMs -- 11.2 Evaluation of RAGs -- 11.2.1 Assessing of Retrieval Quality -- 11.2.2 Generation Quality -- 11.2.3 Knowledge Integration and Factuality Evaluation -- 11.2.4 Response Time and Efficiency -- 11.2.5 User Satisfaction -- 11.2.6 RAGAs Framework for RAG Evaluation -- 11.3 Tool Calling with LLMs -- 11.3.1 Autonomously Determining Which Tools to Use and Where -- 11.3.2 Examples of Different Tools -- 11.3.3 Evaluation of Code Generation Capabilities of Agents -- 11.3.4 Error Handling and Optimisation -- 11.4 LLM Augmentation with Agents -- 11.4.1 Reasoning in LLM Agents -- 11.4.2 Planning in LLM Agents -- 11.4.3 Handling Memory in LLM Agents -- 11.5 Summary -- Additional Resources -- Exercises -- Bibliography
12 Multilingual and Multimodal LLMs -- 12.1 Multilingual Language Models -- 12.1.1 The Evolution of Multilingual NLP -- 12.1.2 The Need for Multilingual LLMs -- 12.1.3 Cross-Lingual Representation Learning -- 12.1.4 Applications -- 12.2 Multimodal Language Models -- 12.2.1 Integration of Diverse Modalities -- 12.2.2 Applications -- 12.3 Training Multilingual and Multimodal LLMs -- 12.3.1 Efficient Data Collection and Preprocessing -- 12.3.2 Model Training Strategies -- 12.4 Addressing Challenges in Multilingual and Multimodal LLMs -- 12.4.1 Challenges in Multilingual LLMs -- 12.4.2 Challenges in Multimodal LLMs -- 12.5 Future Directions and Emerging Trends -- 12.6 Limitations of Multilingual and Multimodal LLMs -- 12.7 Summary -- Additional Resources -- Exercises -- Bibliography
13 Responsible LLMs -- 13.1 Inaccurate, Inappropriate, and Unethical Behaviour of LLMs -- 13.2 Responsible AI -- 13.3 Bias -- 13.3.1 Visibility of Bias -- 13.3.2 Source of Bias -- 13.4 Bias Mitigation -- 13.5 Summary -- Additional Resources -- Exercises -- Bibliography
14 Advanced Topics in Large Language Models -- 14.1 Reasoning with LLMs -- 14.1.1 Advancements in Reasoning Capabilities -- 14.1.2 Challenges in Reasoning with LLMs -- 14.1.3 Types of Reasoning Tasks -- 14.1.4 How Do LLMs Approach Reasoning? -- 14.1.5 Evaluating Reasoning Abilities in LLMs -- 14.2 Handling Long Context in LLMs -- 14.2.1 Challenges in Processing Long Context -- 14.2.2 Training and Fine-Tuning Approaches to Extend Context Length -- 14.2.3 Evaluation of Long-Context LLMs -- 14.3 Model Editing -- 14.3.1 Conditions for Successful Editing -- 14.3.2 Methods for Model Editing -- 14.3.3 Metrics for Evaluation of Model Editing -- 14.4 Hallucination in LLMs -- 14.4.1 Definition -- 14.4.2 Sources of Hallucination -- 14.4.3 Metrics Measuring Hallucination -- 14.4.4 Hallucination Mitigation -- 14.5 Self-Evolving LLMs -- 14.5.1 Conceptual Framework -- 14.5.2 Evolution Objectives and Techniques -- 14.5.3 Challenges -- 14.6 Summary -- Additional Resources -- Exercises -- Bibliography
15 LLMs in Action -- 15.1 An Overview of the Landscape -- 15.1.1 Tracing the Evolution and Importance of LLMs in Contemporary AI -- 15.1.2 Open-Source vs Closed-Source Paradigms: Benefits and Trade-offs -- 15.2 A Panoramic View of LLMs -- 15.2.1 General-Purpose Large Language Models -- 15.2.2 Language-Specific LLMs -- 15.2.3 Domain-Specific LLMs -- 15.2.4 Task-Specific LLMs -- 15.3 Diverse Applications of LLMs -- 15.3.1 Healthcare: Enhancing Diagnostics and Patient Care -- 15.3.2 Finance: Transforming Data Analysis and Risk Management -- 15.3.3 Legal: Streamlining Research and Case Management -- 15.3.4 Education: Personalised Learning and Academic Support -- 15.4 Emerging Trends and Future Directions in LLMs -- 15.4.1 Beyond Text: The Advent of Multimodal LLMs -- 15.4.2 Autonomous Agents: The LLM Leap in AI Evolution (AutoGPT) -- 15.5 Summary -- Additional Resources -- Exercises -- Bibliography -- Index
Statement of responsibility Dr. Tanmoy Chakraborty is an Associate Professor in the Department of Electrical Engineering at IIT Delhi and an Associate Faculty Member at the Yardi School of Artificial Intelligence. An ACM Distinguished Speaker (2023–2025) and former Ramanujan Fellow (2018–2023), he has held key academic roles, including heading the Infosys Centre for Artificial Intelligence at IIIT Delhi.
Dr. Chakraborty earned his Ph.D. as a Google India scholar at IIT Kharagpur and completed a postdoctoral fellowship at the University of Maryland, College Park. His research spans Natural Language Processing (NLP), Graph Neural Networks, and Social Computing, with a focus on creating frugal, explainable LLMs for applications in mental health and cyber-informatics.
He leads the Laboratory for Computational Social Systems (LCS2) and has received multiple faculty awards from Google, Adobe, and Accenture.
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element Word embedding
9 (RLIN) 254138
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element Language model pretraining
9 (RLIN) 254139
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element Prompting strategies in LLMs
9 (RLIN) 254140
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme Dewey Decimal Classification
Koha item type Book
Edition 1
Call number prefix 006.3 CHAT
Holdings
Source of classification or shelving scheme Dewey Decimal Classification
Collection code MCA
Home library St Aloysius Institute of Management & Information Technology
Current library St Aloysius Institute of Management & Information Technology
Shelving location Artificial intelligence
Date acquired 02/03/2026
Cost, normal purchase price 845.00
Inventory number Bill.no:1288 Bill.dt:2026/01/23
Full call number 006.3 CHAT
Barcode MCA17367
Date last seen 05/23/2026
Cost, replacement price 633.75
Price effective from 02/09/2026
Koha item type Book