Differentiation and Integration — The Deep Dive

Q11: The Two Moves of Calculus — Zooming and Gathering

Manish: In the last chapter we saw the shape of calculus from above — two questions, two tools, one theorem binding them. Now I want to go inside the two tools. Not the rules. Not "here's the power rule, memorize it." I want to actually watch differentiation happen and watch integration happen, slowly enough that I could have invented them myself. Because that's the test, right? If I couldn't have invented it, I don't really understand it.

Response: That is exactly the right test. So here is the promise for this chapter: by the end, differentiation and integration will not feel like two procedures you apply. They will feel like two moves — two things a curious mind naturally does when it stares at a changing thing long enough.

And here is the secret that makes everything below simple:

Note

Differentiation is the act of zooming in — until change has nowhere left to hide. Integration is the act of slicing and gathering — until the pieces become a whole.

One is a microscope. The other is a basket.

Everything else is just doing these two moves carefully.


1. One Picture, Two Questions

Take any quantity that changes — your position on a road trip, water in a tank, money in an account — and draw it as a curve.

 value
   │                          ●  ← here, the curve is climbing fast
   │                    ╱
   │               ╱
   │           ╱
   │       ╱
   │   ╱  ← here, climbing slowly
   │ ╱
   └──────────────────────────── time

Standing in front of this curve, there are exactly two natural questions a person can ask:

  1. At this exact point — how steep is it? (How fast is the thing changing right here?)
  2. Across this whole stretch — how much is underneath? (How much has piled up in total?)

The first question, asked carefully, is differentiation. The second, asked carefully, is integration. Both questions are easy to ask and seem impossible to answer exactly — and calculus is nothing more than the honest way out of that "impossible."

Let's take them one at a time.


2. Differentiation, Part 1: The Honest Failure

Start with the simplest changing thing we can fully see: a curve where the output is the square of the input.

f(x)=x2f(x) = x^2

At x=3x = 3, the value is 99. The question: how fast is this curve rising at exactly x=3x = 3?

The schoolbook way to measure "how fast" is rise over run — pick two points, see how much you climbed, divide by how far you walked:

steepness=riserun=change in valuechange in input\text{steepness} = \frac{\text{rise}}{\text{run}} = \frac{\text{change in value}}{\text{change in input}}

But here is the trap. Rise-over-run needs two points. We are asking about one point. Use only the point x=3x=3 and you get:

riserun=00— meaningless.\frac{\text{rise}}{\text{run}} = \frac{0}{0} \quad \text{— meaningless.}

At a single frozen point, nothing rises and nothing runs. The question "how steep, right here?" looks unanswerable. Ordinary math has genuinely given up.

Note

This failure is not a technicality. It is the whole reason differentiation exists. If two points worked, we'd never have needed calculus. The entire invention is one clever way to ask a two-point question about a one-point place.


3. Differentiation, Part 2: The Sneak-Up

Here's the clever way. We can't use one point — fine. So we use two points, but we make the second one sneak closer and closer to the first, and we watch what the answer does.

Second point at x=4x = 4? The curve goes from 99 to 1616:

16943=71=7\frac{16 - 9}{4 - 3} = \frac{7}{1} = 7

Too crude — that's the average steepness over a whole unit. Move closer. Then closer. Then closer:

from x=3x=3 to... rise run rise ÷ run
44 169=716 - 9 = 7 11 77
3.13.1 9.619=0.619.61 - 9 = 0.61 0.10.1 6.16.1
3.013.01 9.06019=0.06019.0601 - 9 = 0.0601 0.010.01 6.016.01
3.0013.001 0.0060010.006001 0.0010.001 6.0016.001
3.00013.0001 0.000600010.00060001 0.00010.0001 6.00016.0001

Now read that last column out loud: 7, 6.1, 6.01, 6.001, 6.00017,\ 6.1,\ 6.01,\ 6.001,\ 6.0001\ldots

The answer is walking straight toward 6 and slowing down right on its doorstep. It never gets to use the doorstep — at distance exactly zero we'd be back to the forbidden 0/00/0. But there is no doubt whatsoever about where it is heading.

Note

The steepness of x2x^2 at the point 33 is 6 — defined as the value the rise-over-run approaches as the second point slides home. That "value approached" is exactly the limit from the last chapter, doing the one job it was built for.

You just differentiated. No rule, no formula — a table, a pattern, and the honesty to say "I can't stand on the point, but I can see where the path leads."

        the sneak-up, drawn:

        │            ╱ ← line through 3 and 4: slope 7 (too steep)
        │          ╱╱ ← through 3 and 3.1: slope 6.1
        │        ╱╱╱ ← through 3 and 3.01: slope 6.01
        │      ●━━━  ← the lines settle onto ONE line: slope 6
        │    ╱       the TANGENT — the curve's direction at that point
        └────┴──────
             3

Each two-point line is called a secant (it cuts the curve). The line they settle into is the tangent (it touches the curve at exactly one point). The derivative is the slope of that tangent line.


4. Differentiation, Part 3: Why the Answer Is 2x2x — Watch a Square Grow

Doing the sneak-up at x=3x=3 gave 66. Try it at x=5x=5 and you'd get 1010. At x=1x=1, you'd get 22. There is a clear pattern: the steepness of x2x^2 at any point xx is always 2x2x.

School hands you this as "the power rule" and moves on. But why 2x2x? Here is the proof you can see with your eyes — no algebra, just a square growing.

x2x^2 is literally the area of a square with side xx. So asking "how fast does x2x^2 grow?" is asking: if I nudge the side a tiny bit, how much new area appears?

Nudge the side from xx to x+hx + h, where hh is tiny:

        x        h
   ┌─────────┬────┐
   │         │////│
   │  old    │////│  ← strip:  x · h
 x │  square │////│
   │  x²     │////│
   ├─────────┼────┤
 h │/////////│ ▒▒ │  ← strip:  x · h     ▒ corner: h · h
   └─────────┴────┘

The new area is exactly three pieces:

new area=xh+xhtwo strips+h2corner=2xh+h2\text{new area} = \underbrace{xh + xh}_{\text{two strips}} + \underbrace{h^2}_{\text{corner}} = 2xh + h^2

Now the rate — new area divided by the nudge:

2xh+h2h=2x+h\frac{2xh + h^2}{h} = 2x + h

And as the nudge hh sneaks toward zero... the answer walks straight toward 2x\mathbf{2x}.

Note

The two strips shrink in proportion to hh — they are the real, first-class change. The little corner h2h^2 shrinks much faster (a tiny number times a tiny number) — it dies out before it can matter. Differentiation is exactly this: keep what shrinks like hh, let go of what shrinks faster. Every derivative rule you will ever meet is some version of "the strips survive, the corner vanishes."

This is also why, at x=3x=3, the answer was 66: a bigger square has longer sides, so the same nudge paints longer strips. Of course the rate grows with xx. The formula 2x2x isn't a spell — it's two strips.


5. What You Actually Get: A Slope-Reading Machine

One more shift in seeing, and differentiation is fully yours.

We didn't just find one number (66 at the point 33). We found a rule that answers the question everywhere: at any input xx, the steepness is 2x2x. In other words:

Note

Differentiation takes a function (the values) and hands back a new function (the steepness at every point). Feed in x2x^2, get back 2x2x. Feed in position-over-time, get back speed-over-time. It is a machine that turns "where things are" into "how things are changing."

And this new function talks to you, point by point:

derivative here is... the curve here is... meaning
positive climbing the quantity is growing
negative falling the quantity is shrinking
zero flat a peak, a valley, a turning point
large steep changing violently
small gentle barely changing

That third row is quietly one of the most important facts in all of applied math: tops of hills and bottoms of valleys are exactly where the derivative reads zero. Want the cheapest, fastest, best, least error? Find where the derivative is zero. This single observation is the seed of all optimization — and, a few chapters from now, of how a neural network learns.


6. Integration, Part 1: The Opposite Predicament

Now turn the question around. Forget steepness. New problem:

Note

Water flows into a tank all day, but the flow rate keeps changing — a trickle at dawn, a gush at noon, easing off by night. You have the full record of the rate at every moment. Question: how much water is in the tank at the end?

If the rate were constant, this is multiplication: rate × time, done. But the rate is never the same twice. Multiplication needs one number, and we have infinitely many.

Notice the beautiful symmetry of our two predicaments:

  • Differentiation failed because division (riserun\frac{\text{rise}}{\text{run}}) broke at a single point.
  • Integration fails because multiplication (rate × time) breaks across a changing stretch.

Calculus answers both with the same kind of move: stop demanding the exact answer in one step — build a sequence of honest approximations and watch where they're heading.


7. Integration, Part 2: Slice, Pretend, Add, Refine

Here is the move, in four words: slice, pretend, add, refine.

Slice the day into chunks — say, one hour each. Pretend the rate is constant within each chunk (it isn't, but a lie this small is manageable). Inside each chunk, multiplication works again: rate × 1 hour. Add the chunks. You get an estimate of the total.

 flow
 rate │        ┌───┐
      │     ┌──┤   ├──┐               each bar = rate × width
      │     │  │   │  ├──┐            = water added in that slice
      │  ┌──┤  │   │  │  ├──┐
      │──┤  │  │   │  │  │  ├──
      └──┴──┴──┴──┴──┴──┴──┴──── time
         the bars hug the true curve... roughly

The estimate is wrong, of course — within each hour the rate drifted while we pretended it stood still. So refine: use half-hour slices. The pretending gets less wrong. Minute slices: less wrong still. Second slices, millisecond slices...

slice width number of slices the estimate is...
1 hour 24 rough
1 minute 1,440 good
1 second 86,400 excellent
shrinking toward 0 growing without bound walking toward one exact value

There it is again — the same sneak-up, the same trusty limit. The estimates approach a single number, and that number is the integral: the exact total, built from infinitely many vanishingly thin honest pieces.

Note

The integral sign \int is literally a stretched-out letter S, for sum — Leibniz chose it himself. And dxdx is the width of one vanishingly thin slice. So when you read f(x)dx\int f(x)\, dx you are reading a sentence: "Sum up — height f(x)f(x) times sliver-width dxdx — across all the slivers." It was never cryptic notation. It is a picture of the bars, written in ink.

And since each slice contributes (height × width) — a thin rectangle — the total is exactly the area under the rate curve, the picture from last chapter, now earned from the ground up.


8. Integration, Part 3: One You Can Check by Hand

Trust, then verify. Take a rate we fully understand: a car accelerating so its speed at time tt is v=2tv = 2t (after 1 second, 2 m/s; after 3 seconds, 6 m/s). How far does it travel in the first 3 seconds?

Speed is changing constantly, so no single multiplication works. But draw the picture:

 speed
   6 │            ╱│
     │          ╱//│
   4 │        ╱////│      the area under v = 2t
     │      ╱//////│      from 0 to 3 is a TRIANGLE:
   2 │    ╱////////│
     │  ╱//////////│      area = ½ · base · height
   0 └──────────────      = ½ · 3 · 6 = 9
     0    1    2   3  time

The slicing process — thin bars hugging that ramp ever tighter — converges to the triangle's area: 1236=9\frac{1}{2} \cdot 3 \cdot 6 = \mathbf{9} meters.

Now watch this. We know from §4 that the derivative of t2t^2 is 2t2t. So ask: which function has 2t2t as its rate? Answer: t2t^2. Evaluate it at t=3t=3:   32=9\;3^2 = \mathbf{9}.

The slicing gave 99. The reverse-derivative gave 99. Same answer, two completely different roads. That is not a coincidence. That is the Fundamental Theorem of Calculus, seen working in front of you.


9. Why They Undo Each Other — Seen, Not Stated

Last chapter declared that differentiation and integration are inverses. Now we can actually see why, and it takes one paragraph.

Imagine the accumulated total as a quantity that grows while you sweep from left to right under the curve — like a paint roller filling in the area:

      │     ╱──────╲
      │   ╱█████████╲▏←  the sweeping edge
      │  ╱██████████ ▏
      │ ╱███████████ ▏   ask: as the edge advances one sliver,
      └──────────────────  how fast does the painted area GROW?

         it grows by (height of curve) × (sliver width)
         → the GROWTH RATE of the area = the HEIGHT of the curve

As the edge advances, the area grows by one thin strip — and the strip's size is just the curve's height right there. In other words:

Note

The rate at which the accumulated area grows is the height of the curve you're accumulating. Rate-of-growth is the derivative; accumulated area is the integral. So the derivative of the integral hands you back the original function. Gathering and zooming are the same staircase — walked up, then walked down.

The car makes it concrete. Your speedometer is the derivative of your odometer (it reads how fast the mileage is climbing right now). Your odometer is the integral of your speedometer (it accumulates speed, moment by moment, into distance). Two gauges, one truth, read in opposite directions — and every car on the road carries a working model of the Fundamental Theorem on its dashboard.

flowchart LR
    O["ODOMETER<br/>total distance<br/>(the accumulation)"]
    S["SPEEDOMETER<br/>speed right now<br/>(the rate)"]
    O -->|"differentiate<br/>(zoom in on the instant)"| S
    S -->|"integrate<br/>(gather every moment)"| O

10. The Honest Asymmetry: Why Integration Feels Harder

One practical truth, told now so it never surprises you: in actual practice, differentiation is mechanical, integration is an art. It helps to know why, because the reason is structural, not personal.

  • Differentiation is a forward move. You're given the function and you do something to it — nudge, divide, let the corner die. There's a recipe, and the recipe always terminates. Like demolition: any building can be knocked down by following procedure.
  • Integration is a backward move. The cleanest way to integrate is to ask "whose derivative is this?" — you're trying to un-do a process by recognizing its fingerprints. Like reconstruction: given the rubble, rebuild the exact building. Some you recognize instantly, some take ingenuity, and some provably cannot be written in closed form at all — for those, mathematicians simply slice-and-sum numerically, which always works, because slicing is the definition.
Note

Not because you're missing something, but because recognizing is harder than doing, in math as in life. You can square a number in your head; finding a square root takes searching. Differentiation squares; integration searches.


11. The Two Moves at Five Depths

In the house tradition — the same truth, descending:

At the school level:

Differentiation finds the slope of a curve at a point; integration finds the area under a curve between two points.

At the deeper mathematical level:

Differentiation extracts a function's local linear behavior — the straight line it secretly is, up close. Integration assembles global quantity from local data, gathering infinitely many vanishing contributions into a finite whole.

At the philosophical level:

Differentiation trusts that the complicated, up close, becomes simple — that every curve looks straight if you zoom in far enough. Integration trusts the reverse: that the infinite, gathered carefully, becomes finite — that a whole can be honestly known through its vanishing parts.

At the poetic level:

The derivative is a question whispered to a single instant. The integral is the patience to listen to every instant in turn. The Fundamental Theorem is the discovery that the whisper and the listening were one conversation.

At the first-principle level:

Both moves are the same act of disciplined approach: build a sequence of answerable questions whose answers converge, and define the unanswerable one by where they're heading. Zooming converges onto a slope. Slicing converges onto an area. Calculus is the art of the converging question.


12. What to Carry Out of This Chapter

Note
  1. Differentiation = the sneak-up. Two points, slide one home, watch rise-over-run settle. The derivative is where it settles — the slope of the tangent line.
  2. The rules are shortcuts for the sneak-up. x22xx^2 \to 2x is just two strips survive, the corner dies. Every rule you'll meet is a pre-computed limit, nothing more.
  3. Integration = slice, pretend, add, refine. Thin honest rectangles, hugging the curve ever tighter, converging on the exact total — the area under the rate.
  4. They undo each other because the painted area grows exactly as fast as the curve is tall. Speedometer and odometer. One truth, two directions.
  5. When stuck, return to the picture. Every differentiation question is secretly "how steep, right here?" Every integration question is secretly "how much, all together?" The symbols are just those two pictures, written down.

13. The Bridge Forward

We now own the two moves — and one row of the table from §5 is about to become the hero of this whole book: the derivative is zero at the bottom of a valley.

Because here is what's coming. A neural network's error is a function — feed in the network's settings, out comes how wrong it is. Training a network means finding the bottom of that error valley. And the way down? At every step, ask the derivative: which direction is downhill, right here? — then step that way, and ask again. Millions of tiny sneak-ups, walking a machine toward understanding. That walk is called gradient descent, and it is differentiation — the same square-growing, corner-vanishing move from §4 — running at planetary scale inside every model you have ever talked to.

Calculus studies how the shape and position change. Differentiation reads the instant; integration gathers the whole. Optimization follows the derivative downhill — and that is where machines learn.

The microscope and the basket are in hand. Next, we point the microscope at a valley floor.


← Back: What Is Calculus, Really? | Back to Index | Next: Gradient Descent →