Back to Calculator Index
This page is a little less active than the others, but I want to share the analysis that I did to achieve this solution. First, what it is, then the calculator, and then what I've done with it.
In a general sense, this article also applies to many kinds of numerical computing, curve-fitting and modeling. The best approach is kind of just to do everything and see what's best; "best" being determined by the numerical range and accuracy needed, and how much compute time or memory you can spend doing it. Think of this article as an applied example, where many methods are tried, most are eliminated, and a few succeed as the best options. It's tempting to think the most elegant solution might be the best, but real calculations are hardware-dependent, and often a more crude method ends up the best.
The NCP18XH103F03RB, from Murata, is a cheap, plentiful, and accurate NTC thermistor. The manufacturer's tabulated data (as of 2013 when I downloaded it) is here. This is pretty easy to work with, but you need to measure the resistance in circuit. It would be helpful if it could be simplified for, say, a simple resistor divider ratio. Better still: if we're measuring with an ADC and microcontroller, the conversion function should use little memory and CPU.
This uses a linear interpolation on the manufacturer's data to find the resistance or temperature. R1 and Vref are inputs which affect only the value of Vo. R25 is listed for reference. (In the future, I may put a list of thermistors in the selector box. E-mail me suggestions for preferred parts!) The last three fields (Thermistor Value at Temperature, Temperature, and Output Voltage) act as a group, where the other two are calculated from the one that changed.
The very first, simplest, naivest approach, would be to do the opposite of what we've just done: reverse the voltage divider equation (Vo = Vref R2 / (R1 + R2)), solving for R2. Then look up R2 in the table. This at least has the advantage of most generality: we can plug in any resistor value, and any thermistor table, on the fly!
But this stinks for an embedded system. The sensor probably won't ever change, we need several division steps (which goes slow on almost every MCU), the resistor value is huge (between, say, 0 and a million!), and has to be looked up in a table (also of big numbers). At least the table is sorted and one-to-one, so a binary search will suffice, but geez, that's a lot of work. (I'd estimate that you'd need about 600 bytes for the array, 200 bytes for the code, and 2000 cycles to run it.)
(For what it's worth, memory footprints and instruction timings will be relative to an AVR 8-bit core, justified with a wave-of-the-hand where implementations aren't provided, and with an instruction count when they are. More powerful systems (like an ARM core) will probably use fewer cycles to compute the same steps, and run much faster anyway. Pick whichever has the best overall cost: from production all the way back to development time.)
So first, I simply looked at the function. Above is the ADC count (as seen by a 12 bit ideal ADC) for a voltage divider using a 4.75kΩ pull-up and a 10k thermistor. Importantly, it's still a one-to-one function, so we have that going for us. But a linear best-fit wouldn't be nearly accurate enough (±10°C or so!) for this 1% part.
A word about approximations. Having decided not to do it brute-force, we'll have to look at something at least partially pre-computed. That is, the data in memory is not the R(T) table, but something calculated from it. We should prefer working in ADC counts, so the following will use that.
The simplest of these, for a computer, is a dumb lookup table. The ADC has 212 output values, so we only need a 4k array (of at least 8 bits precision). Well, "only". Memory is at a premium for embedded systems. Though if you have the free space, and need the near-instantaneous conversion time (probably all of about 20 clock cycles), this is a fine way to "compute" many kinds of things, quickly!
The table doesn't need to cover the complete span. We could range-reduce it, and we could use fewer points (less frequent sampling of the curve). The range is naturally limited, because the thermistor data only extends to certain temperatures, and therefore resistances and voltage-divider outputs. The range is about 400 to 4000 counts. But that only saves about 10%, not a big deal. We could also cut out a hole in the middle, and fill in with a straight line segment—assuming it's a good enough fit. (Judging by the results below, this could save about 50%, not bad.).
Or we could use fewer points. If we reduce the count by a power of 2, we only need to arithmetic-shift the address. The downside is, we still need relatively many points in order to constrain the error. In fact, the oversampling rate is proportional to the derivative, and the peak derivative (found at the highest and lowest extremes) is about 0.1°C/count. For an error under 1°C, we need a sample about every 10 counts. An array of 512 bytes would about do it. Together, we could get down to about 256 bytes this way.
We could cheat this even further, by making the step size variable, say among different powers of 2. But then we need to keep track of which step sizes go where, which may take up more array size, or code complexity. It could be worthwhile in a number of applications, though. For example, you could build a tree which has different scale sizes, referenced by depth in the tree, and compacted by node and leaf structure. (Expanded to higher dimensions, similar methods give rise to BSP (Binary Space Partitioning), octrees and other juicy algorithms!)
Whereas a look-up table might be considered a zeroeth order interpolation, we can get away with fewer points if we linearly interpolate between them. Now the number of points is limited, not by the first derivative, but by the second. Since the curve isn't changing all that quickly, this will save a lot of memory! The obvious drawback is, interpolation requires division, which sucks on any platform that doesn't have a hardware division unit.
We can avoid division if we store the pre-calculated slopes in another array, but this doubles the memory requirement. On the other hand, not needing to compute a division will save some code space, perhaps balancing it. Meanwhile, the execution time goes way down! Simple time-memory tradeoffs, like these, are quite typical of computational optimizations.
If we pick points that are evenly spaced (ideally, a power of 2), we again run into the sampled derivative problem. If we sample optimally, we have to search for the nearest points, which is also very slow.
Here's a typical result, for 12 points (about 48 bytes, though it could be packed tighter), cherry-picked for the least error for arbitrarily spaced points. Note that the points are clustered closer together towards the ends, where the slope is steeper. Code is not given, but the points yielding this graph are listed.
Okay, if zeroeth and first order lookups aren't good enough, what about 2nd or higher? This is the space of quadratric and cubic splines (Bezier curves), at least for the general parametric-curve case. N+1 points are required to define each segment of an order-N curve, so we need more data per segment. But we need fewer segments, because they fit the original data that much better.
Now, whereas the linear case requires a division to calculate the slope (if we don't provide one, precalculated), the quadratic case requires a square root! This isn't a difficult thing to compute, but it is even slower. We would definitely want to precalculate these. The coordinates and coefficients (offset, linear and quadratic) together will take up more than twice as much data (per segment) as the linear case. I would expect the better fit leads to a more-than-four-fold reduction in the number of segments, so it should be a win on memory.
I haven't evaluated a solution based on this, but it would be interesting to work with. It will take some complexity in matching up the derivatives, so that the curve is CN continuous (which, to be fair, isn't a necessary constraint to this problem, and if better RMS or peak error can be had by violating it, well, there's not really anything wrong with that... it just won't look as smooth). It will definitely save on the number of points (probably only needing a dozen nodes and control points), and shouldn't be too hard to compute (only needing a few multiplications).
If we have a priority to save memory, we might reconsider the naive approach, but, cleaning it up a little bit. First, we reverse the voltage divider equation, but round it to as few bits as necessary. (It turns out, there's reasonable accuracy fitting the quotient into 16 bits. Great news for 8 and 16 bit MCUs!) Then we take the logarithm, then the reciprocal, and rescale the result (and maybe some other adjustments). What makes this physics-based is, we're taking advantage of what type of NTC thermistor this is: "high gain" NTCs are made from a random network of semiconducting crystals, whose resistance varies exponentially with reciprocal temperature (that is, as exp(1/T)). The slope of that exponent is given by the B value, which is 3380 for this part (based on its resistances at 25 and 50°C, anyway; but more on that later). At worst, a low-order Taylor series can mop up the remaining error.
Now hold on, log? Isn't that floating point? From some massive library? Exponentially slower than everything else?! Yes... if you did it the easy way. But what's the meaning of an integer logarithm, anyway?
In fact, computing the log isn't much worse than long division: it involves iterating an operation, and shifting out bits. On the most basic level, an integer log is just counting the first occupied bit position: what order of magnitude the number is. We can measure that very easily (some CPUs even come with bit-counting instructions). That gets a number 0-15, which obviously isn't accurate enough, so we need to know the fractional part of the exponent, too.
Once we've counted the bits of the input term, left-align it. Now, align your mental reading of what that number represents: we'll be using it in 1.15 fixed point. Thus, by aligning it so there's a 1 in the 1's position, we've divided out the whole-valued exponent (2exp), and are left with a 1.xxx format mantissa. Now, much as with long division, we iterate and shift. But instead of doing an arithmetic shift, we square the mantissa. On every step, shift the overflow (carry) bit into the output; also, every time carry is true, shift the mantissa right (i.e., divide by 2). Eventually the output's fractional part is complete (to as many bits as needed, 12 in this case), and the mantissa will be an accumulation of squaring and halving steps. (We can discard the mantissa at this point—or, save it as a log remainder, should we have need for such a thing.) This is, conveniently enough, the base-2 logarithm, which differs from any other log by only a constant ratio. And for a variety of computing applications, may be rather useful on its own.
So, all this is to say: while the logarithm is a pretty intensive bit of math to compute, it's not exponentially bad on a processor with hardware multiply. The AVR can compute each iteration of the log in maybe 20 cycles, so that the whole operation takes about 360 cycles for 16 bits. That's slow, but not as slow as pulling it through a floating point library, and about the same time as two divisions.
But why should condensed-matter physics be so easy? If we simply take the reciprocal (and do a linear adjustment to get it back on the right scale), we'll have the "physics based" solution. That looks like this:
Da heck...?! It's not horrendous, but... that's a lot of effort to go through, to get a 7.15°C worst case error. Why?
There's a very good reason why manufacturers specify B for the resistance ratio at two particular temperatures: it varies with temperature. In reality, the semiconductor is not one pure sample, nor are the grain boundaries always the same widths and angles and areas throughout. The usual solution is to use higher-order terms: R is a polynomial in log(1/T), and B is only the first coefficient in a power series, which usually needs two or three terms to be usefully accurate.
So we're kind of screwed by that. However, the intrepid physicist also realizes his medium is cantankerous, and prepares his trusty old bludgeon, the power series.
In fact, as long as we don't need to go too close to the axis, we can toss the 1/T behind the approximation, too. With a second order (quadratic) correction, it's pretty good (give or take 1°C), and a third order (cubic) is as good as we can hope (the rounding error at the ends is comparable).
Despite our hackery with the power series, this formula quite possibly has minimal degrees of freedom: only four parameters for the cubic. Both sets of coefficients are given below, and the cubic error is plotted.
The remaining error is high order (4th or worse)—notice the error makes four zero crossings. But we're well within the performance of the thermistor already, so this isn't worth solving.
My first instinct: those curves look suspiciously like the beginnings of hyperbolic asymptotes. Let's try a "sum of reciprocals", rational function, and see what happens. Now, this breaks the first implementation rule already: division sucks! But, maybe if we only have to do one, it won't be too bad?
I found this formula to give a good fit, using the following coefficients: (The units are provided below for reference, for all you dimensional-analysis freaks out there. "#" means a unit of ADC counts, or consider counts as dimensionless.)
|e||-0.01835||°C / #|
Note that the formula shows two divisions; these can be merged into one, at the price of three additional multiplications (a times x + d, c times x + b, x + b times x + d).
It's noteworthy that, after using a 2nd order rational equation (that is, with everything summed together on a common denominator, the highest order in the numerator or denominator is 2), there are six zero crossings in the error: suggesting that we've achieved a remarkably good fit (a 6th order residual), so this was a good guess indeed! But, with that expensive division, it's still a pretty good bit of computing to get there.
This was the first approach I implemented, giving not-terrible performance in AVR assembler. Here's the procedure. Notice the expression was rearranged for best computation; the parameters were renamed to reflect the new combinations and values. (The parameters may be slightly different values than what you'd calculate from the above list, because I've been refining the spreadsheet while writing this page, but not making updates to the code. What's in the code examples is still quite good accuracy, though.)
; ; countToTemp ; ; Converts a 12 bit ratiometric ADC count into a ; temperature (in 10.6 fixed point format, °C), ; using a rational approximation: ; T = (q * x - r) / (s + x * (t - x)) - e * x + f ; ; Executes in approximately 420 cycles. ; ; Input: ; r17:r16: 12 bit ADC count to convert ; Returns: ; r1:r0: 10.6 fixed point, temperature in °C ; countToTemp: push zh push zl push r19 ; r19 = temp push r18 ; r18 = zero or shift counter push r7 push r6 ; r7:r6 = (t - x), then q, push r5 push r4 ; r5:r4 = denominator push r3 push r2 ; r3:r2 = numerator, then quotient, ; then running sum ldi zh, HIGH(tmprtable << 1) ; pointer to coefficients ldi zl, LOW(tmprtable << 1) clr r19 ; sign flag for division ; make the denominator, (s + x*(t - x)) lpm r6, z+ lpm r7, z+ ; get t sub r6, r16 sbc r7, r17 ; r7:r6 = t - x rcall tmprmulx ; multiply r7:r6 by r17:r16, ; add PGM(Z+), result in r5:r4 ; now r5:r4 = x*(t - x) + s ; make the numerator, (q*x - r) lpm r6, z+ lpm r7, z+ ; get q push r5 push r4 rcall tmprmulx ; q*x + (-r) movw r2, r4 ; in r3:r2 pop r4 pop r5 ; save numerator sign (for following unsigned division) brpl tmprsignout ser r19 ; r19 was CLR'd earlier com r2 com r3 sub r2, r19 ; two's complement = (NOT) + 1 sbc r3, r19 ; = (NOT) - (-1) tmprsignout: ; calculate quotient ; register use: ; r1:r0 = remainder ; r3:r2 = numerator (being shifted into remainder) ; (r3 will be used for initialization to shave off 8 bits ; of zeros, so only r2 actually needs to be shifted in) ; r5:r4 = denominator (subtracting from remainder) ; r7:r6 = quotient (shifting in carries from remainder) ; r18 = shift counter clr r1 mov r0, r3 ; r1:r0 = remainder (initialize ; with 8 bits of numerator) ldi r18, 19 ; = 16 bits + 11 fixed point adjust - ; 8 already shifted lsl r2 ; shift numerator top into carry tmprdivloop: rol r0 rol r1 ; rotate into remainder sub r0, r4 sbc r1, r5 ; check remainder brcc tmprdivcc add r0, r4 adc r1, r5 ; nope, add it back (sets carry) tmprdivcc: rol r6 rol r7 ; shift carry into quotient lsl r2 ; and shift numerator into carry dec r18 ; and keep going until done brne tmprdivloop com r6 com r7 ; complemented carry was shifted in, ; so complement the quotient tst r19 brne tmprsignout2 ; invert sign com r6 com r7 sub r6, r19 ; two's complement = (NOT) - (-1) sbc r7, r19 tmprsignout2: ; make the linear term push r7 push r6 lpm r6, z+ lpm r7, z+ ; get u rcall tmprmulx ; r5:r4 = x*u + (-v) pop r0 pop r1 sub r0, r4 ; T = quot - (x*u + (-v)) sbc r1, r5 ; r1:r0 = return temperature ; (10.6 fixed point) pop r2 pop r3 pop r4 pop r5 pop r6 pop r7 pop r18 pop r19 pop zl pop zh ret tmprtable: .dw 4125 ; t .dw 4036 ; s .dw 1842 ; q .dw -21404 ; r .dw 301 ; u .dw -4693 ; v tmprmulx: mul r16, r6 ; low * low (only need top byte) mov r4, r1 clr r5 ; r5:r4 = result mul r17, r6 ; high * low add r4, r0 adc r5, r1 mul r16, r7 ; low * high add r4, r0 adc r5, r1 mul r17, r7 ; high * high (high byte should be zero) add r5, r0 lpm r0, z+ lpm r1, z+ ; get s or r add r4, r0 adc r5, r1 ; r5:r4 = denominator ret ; END PROC countToTemp
This uses about 96 words PROGMEM and 16 bytes stack. The execution speed is definitely better than other methods, but that division is a killer: 63% of the time is spent doing the division. So even just one hurts bad! In contrast, that multiply subroutine is about 30 cycles, including calling overhead. We could crank quite a few multiplies in the span of just one division.
To go with this routine, there is FixedSix.asm, which contains a routine to convert the 10.6 fixed point result into an ASCII string. I don't see it's worth copying the code inline here, but it may be useful.
So with multiplies being so (relatively) cheap, and the physicist's old friend the power series being so effective, why not go for the gold? No more screwing around, just shove the whole damn thing into a power series! (Terminology note: I probably shouldn't be calling it a "Taylor series", because that series is given by the derivatives of a continuous function around a point. We don't even have derivatives here, only a data series—at best, finite differences. Calling this a generic power series is probably most accurate. Now, if we were approximating a continuous function, like cos x, we might generate the coefficients using a Taylor series approach.)
The big problem is, polynomial functions don't fit very well to something that—as we've seen—fits better to a rational function, or even something with a transcendental operator (log) in it. This is basically rehashing the problem from earlier, where the reciprocal (1/x) function can be replaced by a power series; but it's worse, because this covers the full range of the data series, not just a relatively flat piece of curve, used in one of the steps.
There are methods to figure out what order of polynomial is necessary to fit to a data set of certain properties: range and domain, derivatives, tolerance and so on.
I... didn't use any of these. I just put in enough terms until it looked about right. Hey, I'm an engineer...
A note about solutions. All of these results have been computed using Excel's Solver plug-in. This is a fantastic tool—well, when it works, at least. But let's be fair, these are difficult problems to solve. We need to help it along wherever we can, otherwise the convergence will be slow at the very least; or, often, it just completely gives up and refuses to touch some variables.
The first thing I tried was as brutish of a power series as I could: just coefficients on powers of x. This hardly went anywhere. The sizes of coefficients are wildly different; every single parameter affects every other one, so the path of descent is extremely slow and complicated (if it even decides there's a path at all). What to do?
Talk to the approximator's best old friend: Chebyshev. Namely, there is a series of orthogonal polynomials, of increasing order, which are well-behaved on the interval [0, 1]. If we scale the problem's inputs and outputs around this domain and range, then it should be relatively trivial to adjust the coefficients of each Chebyshev polynomial (probably in the -1 to 1 range) to obtain a good fit. Bingo: each term behaves relatively independently of each other, and the solver works on very reasonably sized parameters.
The main downside is, I've burdened it unnecessarily, again, by adding rounding errors. This really hurts a solver that's expecting continuous behavior; it can adjust parameters all it wants, over a sufficiently small range, and find absolutely no derivatives (all zeroes, that is). Zero derivative, oh, that must mean it's solved, right? Excellent! ...Yyyyeah, that.
In any case, the solution still goes pretty well, as you can see:
The maximum error isn't fantastic, but it's comparable to the other methods presented here. The result is relatively smooth, which is good for working with precision temperature differences. And, the worst case is at low temperature, where worse error can be tolerated. Here's the code listing:
; ; countToTemp ; ; Converts a 12 bit ratiometric ADC count into a ; temperature (in 12.4 fixed point format, °C), ; using a 7th order polynomial approximation. ; ; 74 words, approximately 310 cycles. ; ; Input: ; r17:r16: 12 bit ADC count to convert ; Returns: ; r1:r0: 12.4 fixed point, temperature in °C ; countToTemp: push zh ; Register use: push zl ; Z --> tmprtable push r20 ; r20 = temp (in MAC) push r19 push r18 ; r19:r18 = additive constant ; (from lpm), temp push r5 ; r17:r16 = input operand push r4 push r3 ; r5:r4:r3 = MAC accumulator ; (bytes 1, 2, 3; byte 0 unused) ; input conditioning ldi r20, high(0x0800) add r17, r20 ; signed offset (r17:r16 + 0x0800) ldi r20, 4 tmprshift: lsl r16 rol r17 dec r20 ; left-align brne tmprshift ; initialize MAC (Multiply-ACcumulate) chain ldi zh, HIGH(tmprtable << 1) ; load pointer to coefficients ldi zl, LOW(tmprtable << 1) clr r5 clr r4 rcall tmprmac ; r5:r4 <= ((r5:r4 + [z+]:[z+]) rcall tmprmac ; * r17:r16) >> 16 rcall tmprmac ; the first two MACs are normal... ldi r19, 4 tmprsloop: lsl r3 ; but the third needs a correction rol r4 ; (note: r3 is still MAC accumulator rol r5 ; byte 1, get that in the shift too) dec r19 brne tmprsloop rcall tmprmac ; the remaining MACs are normal rcall tmprmac rcall tmprmac rcall tmprmac lpm r18, z+ lpm r19, z+ add r4, r18 adc r5, r19 tmprstempleave: movw r0, r4 ; add the final offset, move, ; and we're done pop r3 pop r4 pop r5 pop r18 pop r19 pop r20 pop zl pop zh ret tmprtable: ; Coefficients for Murata thermistor .dw -14285 ; coeff 0 .dw 1850 ; 1 .dw 2802 ; 2 .dw -1640 ; 3 .dw -6812 ; 4 .dw 676 ; 5 .dw -1829 ; 6 .dw 735 ; 7 ; Calculate r5:r4 <= ((r5:r4 + [z+]:[z+]) * r17:r16) >> 16. ; Takes 29 cycles (not counting rcall). tmprmac: lpm r18, z+ lpm r19, z+ add r18, r4 ; r19:r18 = multiplicand, adc r19, r5 ; r17:r16 = multiplier (input) ; Using r5:r4:r3 = accumulator (bytes 1-3, byte 0 not needed), r20 = zero clr r20 muls r17, r19 ; high * high (only high bytes movw r4, r0 ; are signed) mul r16, r18 ; low * low = byte 1:byte 0 mov r3, r1 ; only need byte 1 mulsu r19, r16 ; high * low sbc r5, r20 ; sign extend hack add r3, r0 adc r4, r1 adc r5, r20 mulsu r17, r18 ; low * high sbc r5, r20 ; sign extend hack add r3, r0 adc r4, r1 adc r5, r20 ret ; END PROC countToTemp
As you can see, it's a lot cleaner than the rational approximation. About 27 words go towards piddly house-keeping actions; the rest either goes to multiplication (20 words), the coefficients (8 words), or the program of multiplying coefficients and shifting bits around as needed (21 words). If the intermediate bit-shifts could be avoided, the MAC chain would be more compact, in a loop. Instead, since it's three and four RCALLs, I decided to leave it unrolled.
To go with, there's also FixedFour.asm, which converts the 12.4 fixed point result into an ASCII string. Just to mix things up, it uses a fixed point hack to implement the division-by-10 steps: dividing by a constant is equivalent to multiplying by a constant divided by the radix (i.e., shifted over a bunch). Downside is, it takes more multiplication—which goes plenty fast on this platform, but ends up using more instructions. A cleaner pattern, or a loop or something, is probably possible.
An Excel® spreadsheet is available here: Thermistor_Power_Series.xlsx which shows how the coefficients are obtained. An exact numerical simulation of the program (i.e., including rounding) is used for best results.
Closed-Form Solutions: this spreadsheet could be greatly simplified by getting rid of the need for Solver at all. Going back to the discussion of Taylor series, one could consider the data series as a sampling of a real function, and therefore approximate derivatives on it by using suitably smoothed difference functions on the data. This is notoriously tricky—in signal processing, you want to avoid differencing samples, because whereas the signal changes gradually from sample to sample, random noise is a random chance per sample. You can filter the noise, but this fundamentally changes the function: a low-pass filter is a convolution, which produces a different Taylor series. (Offhand, I don't know what the effect is, when taking the Taylor series of a function convolved with a filter function. This appears to be an "interesting" question.)
In any case, there are existing solutions for deriving the best-fit polynomial from a data set. If it is necessary to obtain these coefficients from an arbitrary table (say, making it possible to use multiple different sensors, or apply calibration), such a method could be used.
Back to Calculator Index