DATA SCIENCE COURSE – DURATION: 6 MONTHS (360 HOURS)
Module 1: Python Programming for Data Science
- Python syntax & semantics
- Variables, type conversion
- Operators, control flow (if, else, loops)
- Functions, lambda, map, filter, reduce
- Data types: lists, tuples, sets, dictionaries
- List comprehension
- Modules: os, random, re, datetime
- File handling & Exception handling
- OOP: Classes, objects, inheritance, polymorphism, encapsulation
Module 2: NumPy for Numerical Computing
- NumPy array creation & indexing
- Array attributes and reshaping
- Mathematical operations
- Broadcasting and vectorization
- Statistical & linear algebra operations
Module 3: Data Analysis with Pandas
- Series and DataFrames
- Data selection, filtering, and indexing
- Import/export (CSV, Excel, etc.)
- Handling missing data
- GroupBy, merge, join
- Pivot tables and reshaping
Module 4: Data Visualization
- Matplotlib: line, bar, pie, scatter, histogram, box plots
- Customizations: labels, legends, subplots, grids
- Seaborn:
  - Categorical, distribution, regression plots
  - Heatmaps, pairplots, violin plots
- Plotting with missing data
Module 5: Statistics for Data Science
- Descriptive & Inferential Statistics
- Central Tendency: Mean, Median, Mode
- Dispersion: Variance, Standard Deviation
- Five-Number Summary, Boxplots
- Correlation, Covariance
- Probability Basics: Outcomes, Events
- Normal Distribution, Z-score, Skewness
- Hypothesis Testing: Null vs Alternative Hypothesis
- P-Value, Confidence Interval
- Z-test, T-test, Type I and Type II Errors
- Central Limit Theorem
Module 6: Supervised Machine Learning
- What is Machine Learning?
- ML Pipeline Overview
- Types of Learning
- Linear Regression, Polynomial Regression
- Logistic Regression
- Evaluation Metrics (MAE, RMSE, R², Confusion Matrix, Precision, Recall, F1)
- Regularization (L1, L2)
- SVM (Support Vector Machines)
- KNN (K-Nearest Neighbors)
- Decision Trees, Random Forest
- Ensemble Learning: Bagging (Random Forest), Boosting (XGBoost, AdaBoost)
- Naïve Bayes Classifier
- Hyperparameter Tuning (GridSearchCV, RandomizedSearchCV)
Module 7: Unsupervised Machine Learning
- K-Means & K-Means++
- Hierarchical clustering (agglomerative, divisive)
- DBSCAN
- Dimensionality reduction (PCA; t-SNE optional)
Module 8: Deep Learning
- Introduction to Deep Learning
- Understanding Neural Network Architecture
- Artificial Neural Networks (ANN)
- Forward Propagation, Backpropagation
- Cost Functions & Optimizers
- Recurrent Neural Networks (RNN) – Theory + Practical
- Long Short-Term Memory (LSTM)
- Gated Recurrent Unit (GRU)
- Bidirectional RNNs
- CNN Overview (for completeness)
Module 9: Natural Language Processing (NLP)
- NLP introduction and pipeline
- NLTK basics
- Tokenization, stop words, stemming, lemmatization
- Named Entity Recognition (NER)
- Bag of Words (BoW), TF-IDF, N-grams
- Word Embeddings, Word2Vec
- Text preprocessing techniques
Module 10: Generative AI (Gen AI) & LLMs
- Introduction to Gen AI
- Encoder-decoder architecture
- Self-attention mechanism
- Evolution of LLMs (GPT, BERT, etc.)
- Tools: Ollama, LangChain, Hugging Face
- RAG (Retrieval-Augmented Generation)
- Vector DB: FAISS
- Chatbot creation & context summarization
Module 11: Projects & Case Studies
- 25+ practical end-to-end projects across:
  - Data analysis, cleaning, transformation
  - Machine learning (classification, regression)
  - NLP and text mining
  - Deep learning (image and sequence data)
  - LLM & chatbot integrations