shortening analogRead()

In order to ensure the correct order of reading the registers isn't it sufficient to use the following:

 return ADCL | (ADCH << 8);

This eliminates two uint vars and two stores and two reads in wiring_analog.c

#if defined(ADCSRA) && defined(ADCL)
...
//	low  = ADCL;
//	high = ADCH;
//	return (high << 8) | low;
        return (ADCL | ADCH<<8);
#else
	// we dont have an ADC, return 0
//	low  = 0;
//	high = 0;
        return(0);
#endif

Is there some compiler optimisation that could mess that up ?

	// without a delay, we seem to read from the wrong channel
	delay(1);

Huh? That's already five times slower than sleeping the chip for the full conversion time.

Datasheet says changing Mux after starting a conversion will only take effect for the next conversion, which seems logical. I dont see how else it can get the "wrong" channel.

:~

I'm not certain the "optimized" version will necessarily read the registers in the correct order. In any case this code:

int foo ()
{
uint8_t low, high;
 
  low  = ADCL;
  high = ADCH;

  return (high << 8) | low;
}

void setup() {  
}

void loop() {
  int a = foo ();

  Serial.println (a);
}

Generates:

void loop();
int foo ()
{
uint8_t low, high;
 
  low  = ADCL;
  c0:	80 91 78 00 	lds	r24, 0x0078
  high = ADCH;
  c4:	20 91 79 00 	lds	r18, 0x0079
}

void loop() {
  int a = foo ();

  Serial.println (a);
  c8:	72 2f       	mov	r23, r18
  ca:	60 e0       	ldi	r22, 0x00	; 0
  cc:	90 e0       	ldi	r25, 0x00	; 0
  ce:	68 2b       	or	r22, r24
  d0:	79 2b       	or	r23, r25
  d2:	88 e9       	ldi	r24, 0x98	; 152
  d4:	91 e0       	ldi	r25, 0x01	; 1
  d6:	4a e0       	ldi	r20, 0x0A	; 10
  d8:	50 e0       	ldi	r21, 0x00	; 0
  da:	0e 94 f9 02 	call	0x5f2	; 0x5f2 <_ZN5Print7printlnEii>
}
  de:	08 95       	ret

The entire function call is optimized away, the two variables are optimized away, so it can't get much better than that.

My understanding is that it does, so can be coded as

int foo ()
{
  return ADC;
}

It appears you are correct, but I don't know if the compiler would necessarily guarantee that the code will always be generated this way:

Code:

int foo ()
{
 return ADC;
}

void setup() {  
}

void loop() {
  int a = foo ();

  Serial.println (a);
}

Generates:

000000c0 <loop>:
  c0:	80 91 78 00 	lds	r24, 0x0078
  c4:	20 91 79 00 	lds	r18, 0x0079
  c8:	72 2f       	mov	r23, r18
  ca:	60 e0       	ldi	r22, 0x00	; 0
  cc:	90 e0       	ldi	r25, 0x00	; 0
  ce:	68 2b       	or	r22, r24
  d0:	79 2b       	or	r23, r25
  d2:	88 e9       	ldi	r24, 0x98	; 152
  d4:	91 e0       	ldi	r25, 0x01	; 1
  d6:	4a e0       	ldi	r20, 0x0A	; 10
  d8:	50 e0       	ldi	r21, 0x00	; 0
  da:	0e 94 f9 02 	call	0x5f2	; 0x5f2 <_ZN5Print7printlnEii>
  de:	08 95       	ret

I was thinking so because of a statement in section 16.3 of the ATmega328 datasheet,

Note that when using “C”, the compiler handles the 16-bit access.

But maybe that's assuming too much, as (a) that section of the datasheet is about timer registers (so does the same thing apply to other 16-bit registers), and (b) it references a "C" compiler, but I'm not aware if a specific C compiler is being referred to. Seem a bit odd to make such a statement and not reference a specific compiler. Maybe all C compilers do this? That'd be great but not sure it's something I'd absolutely bank on.

OTOH, I've been assuming the compiler does guarantee it, and coding accordingly, and haven't been bitten to my knowledge, but not sure if I've tried anything other than timer registers. I did code ADC that way once, but it was only a one-time test.

I agree that statement does not have a rigorous meaning without the actual compiler being specified so it has to be taken with a pinch of salt.

Nicks simplified example based on the library code allows the function call to be removed but the shifting and the or-ing still needs to be done . This ends up taking quite a few more instructions.

int foo ()
{
uint8_t low, high;
 
  low  = ADCL;
  c0:	80 91 78 00 	lds	r24, 0x0078
  high = ADCH;
  c4:	20 91 79 00 	lds	r18, 0x0079
}

void loop() {
  int a = foo ();

  Serial.println (a);
  c8:	72 2f       	mov	r23, r18
  ca:	60 e0       	ldi	r22, 0x00	; 0
  cc:	90 e0       	ldi	r25, 0x00	; 0
  ce:	68 2b       	or	r22, r24
  d0:	79 2b       	or	r23, r25

Oddly , the way I'd coded it without the explicit variables produces equally inefficient code:

2b6: 20 91 78 00 lds r18, 0x0078
2ba: 40 91 79 00 lds r20, 0x0079
2be: 94 2f mov r25, r20
2c0: 80 e0 ldi r24, 0x00 ; 0
2c2: 30 e0 ldi r19, 0x00 ; 0
2c4: 28 2b or r18, r24
2c6: 39 2b or r19, r25
2c8: c9 01 movw r24, r18
2ca: 1f 91 pop r17
2cc: 08 95 ret

Andy's code seems to produce the shortest result:

     2b6:       20 91 78 00     lds     r18, 0x0078
     2ba:       30 91 79 00     lds     r19, 0x0079
     2be:       c9 01           movw    r24, r18
     2c0:       1f 91           pop     r17
     2c2:       08 95           ret

digging a bit more finds ADC is volatile uint16_t which saves manipulating all the 8 bit values.

#define ADC     _SFR_MEM16(0x78)
#define _SFR_MEM16(mem_addr) _MMIO_WORD(mem_addr)
#define _MMIO_WORD(mem_addr) (*(volatile uint16_t *)(mem_addr))

So it seems it is just down to the compiler "knowing" that those volatiles need reading in a special order.

Thanks Andy, that's trimmed a bit of dead wood from my sampling loop.