Attention mechanism:
Given an input sequence x₁, ..., xₙ, an attention layer computes f(x₁, ..., xₙ) = Attention(Q, K, V), where Q, K, and V are learned linear projections of the inputs:
1. Query (Q): What we're looking for
2. Key (K): What we match against
3. Value (V): What we retrieve
Attention(Q, K, V) = softmax(QK^T / √d_k) V
where d_k is the key dimension; dividing by √d_k keeps the dot products from growing with dimension and saturating the softmax.
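To make the computation concrete, here is a minimal NumPy sketch of scaled dot-product attention. The function name and the max-shift inside the softmax are illustrative choices, not from the text above.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Return softmax(QK^T / sqrt(d_k)) V for one sequence.

    Q: (n, d_k) queries, K: (n, d_k) keys, V: (n, d_v) values.
    """
    d_k = K.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k)
    # so the softmax inputs do not grow with the key dimension.
    scores = Q @ K.T / np.sqrt(d_k)                # (n, n)
    # Row-wise softmax with a max-shift for numerical stability;
    # each query's weights over all keys sum to 1.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors.
    return weights @ V                             # (n, d_v)
```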
This is how transformers "pay attention" to the relevant parts of the input: every position can attend directly to every other position in a single step, which is what lets them capture long-range dependencies and context.
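As a quick usage sketch (assuming the function above, with random stand-in matrices for the learned projections): deriving Q, K, and V from the same sequence gives self-attention, and one attention step lets the last position draw on the first, however far apart they are.

```python
# Toy self-attention call; Wq, Wk, Wv stand in for the learned
# projection matrices that produce queries, keys, and values.
rng = np.random.default_rng(0)
n, d = 6, 4                                  # sequence length, width
X = rng.normal(size=(n, d))                  # toy token embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)
print(out.shape)                             # (6, 4): one vector per position
```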