[PATCH 2/4] Consolidate and unify state change handling

Discussion:

Andri Yngvason

2014-09-18 16:38:29 UTC

Signed-off-by: Andri Yngvason <***@marel.com>
---
 drivers/net/can/flexcan.c | 66 +++--------------------------------------------
 1 file changed, 3 insertions(+), 63 deletions(-)

diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
index 2700865..96a0755 100644
--- a/drivers/net/can/flexcan.c
+++ b/drivers/net/can/flexcan.c
@@ -559,74 +559,15 @@ static int flexcan_poll_bus_err(struct net_device *dev, u32 reg_esr)
 static void do_state(struct net_device *dev,
 		     struct can_frame *cf, enum can_state new_state)
 {
-	struct flexcan_priv *priv = netdev_priv(dev);
 	struct can_berr_counter bec;

 	__flexcan_get_berr_counter(dev, &bec);

-	switch (priv->can.state) {
-	case CAN_STATE_ERROR_ACTIVE:
-		/*
-		 * from: ERROR_ACTIVE
-		 * to  : ERROR_WARNING, ERROR_PASSIVE, BUS_OFF
-		 * =>  : there was a warning int
-		 */
-		if (new_state >= CAN_STATE_ERROR_WARNING &&
-		    new_state <= CAN_STATE_BUS_OFF) {
-			netdev_dbg(dev, "Error Warning IRQ\n");
-			priv->can.can_stats.error_warning++;
-
-			cf->can_id |= CAN_ERR_CRTL;
-			cf->data[1] = (bec.txerr > bec.rxerr) ?
-				CAN_ERR_CRTL_TX_WARNING :
-				CAN_ERR_CRTL_RX_WARNING;
-		}
-	case CAN_STATE_ERROR_WARNING:	/* fallthrough */
-		/*
-		 * from: ERROR_ACTIVE, ERROR_WARNING
-		 * to  : ERROR_PASSIVE, BUS_OFF
-		 * =>  : error passive int
-		 */
-		if (new_state >= CAN_STATE_ERROR_PASSIVE &&
-		    new_state <= CAN_STATE_BUS_OFF) {
-			netdev_dbg(dev, "Error Passive IRQ\n");
-			priv->can.can_stats.error_passive++;
-
-			cf->can_id |= CAN_ERR_CRTL;
-			cf->data[1] = (bec.txerr > bec.rxerr) ?
-				CAN_ERR_CRTL_TX_PASSIVE :
-				CAN_ERR_CRTL_RX_PASSIVE;
-		}
-		break;
-	case CAN_STATE_BUS_OFF:
-		netdev_err(dev, "BUG! "
-			   "hardware recovered automatically from BUS_OFF\n");
-		break;
-	default:
-		break;
-	}
+	can_change_state(dev, cf, new_state,
+			 can_get_err_dir(bec.rxerr, bec.txerr));

-	/* process state changes depending on the new state */
-	switch (new_state) {
-	case CAN_STATE_ERROR_WARNING:
-		netdev_dbg(dev, "Error Warning\n");
-		cf->can_id |= CAN_ERR_CRTL;
-		cf->data[1] = (bec.txerr > bec.rxerr) ?
-			CAN_ERR_CRTL_TX_WARNING :
-			CAN_ERR_CRTL_RX_WARNING;
-		break;
-	case CAN_STATE_ERROR_ACTIVE:
-		netdev_dbg(dev, "Error Active\n");
-		cf->can_id |= CAN_ERR_PROT;
-		cf->data[2] = CAN_ERR_PROT_ACTIVE;
-		break;
-	case CAN_STATE_BUS_OFF:
-		cf->can_id |= CAN_ERR_BUSOFF;
+	if (unlikely(new_state == CAN_STATE_BUS_OFF))
 		can_bus_off(dev);
-		break;
-	default:
-		break;
-	}
 }

static int flexcan_poll_state(struct net_device *dev, u32 reg_esr)
@@ -658,7 +599,6 @@ static int flexcan_poll_state(struct net_device *dev, u32 reg_esr)
return 0;

do_state(dev, cf, new_state);
- priv->can.state = new_state;
netif_receive_skb(skb);

dev->stats.rx_packets++;
--
To unsubscribe from this list: send the line "unsubscribe linux-can" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Andri Yngvason

2014-09-18 16:25:54 UTC

Permalink

Wolfgang Grandegger

2014-09-19 21:10:48 UTC

Permalink

Post by Andri Yngvason
---
drivers/net/can/flexcan.c | 66
+++--------------------------------------------
1 file changed, 3 insertions(+), 63 deletions(-)
diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
index 2700865..96a0755 100644
--- a/drivers/net/can/flexcan.c
+++ b/drivers/net/can/flexcan.c
@@ -559,74 +559,15 @@ static int flexcan_poll_bus_err(struct net_device
*dev, u32 reg_esr)
static void do_state(struct net_device *dev,
struct can_frame *cf, enum can_state new_state)
{
- struct flexcan_priv *priv = netdev_priv(dev);
struct can_berr_counter bec;
__flexcan_get_berr_counter(dev, &bec);
- switch (priv->can.state) {
- /*
- * from: ERROR_ACTIVE
- * to : ERROR_WARNING, ERROR_PASSIVE, BUS_OFF
- * => : there was a warning int
- */
- if (new_state >= CAN_STATE_ERROR_WARNING &&
- new_state <= CAN_STATE_BUS_OFF) {
- netdev_dbg(dev, "Error Warning IRQ\n");
- priv->can.can_stats.error_warning++;
-
- cf->can_id |= CAN_ERR_CRTL;
- cf->data[1] = (bec.txerr > bec.rxerr) ?
- CAN_ERR_CRTL_TX_WARNING :
- CAN_ERR_CRTL_RX_WARNING;

Hm, can_change_state() handles the equal case differently. In the
SJA1000 manual I found:

"Errors detected during reception or transmission will affect the error
counters according to the CAN 2.0B protocol
specification. The error status bit is set when at least one of the
error counters has reached or exceeded the CPU
warning limit of 96. An error interrupt is generated, if enabled."

If both are equal we do not know whether rx or tx has caused the state
change, and therefore setting "CAN_ERR_CRTL_TX_WARNING |
CAN_ERR_CRTL_RX_WARNING" seems more logical, indeed. But maybe it simply
does not happen. Any other opinions?
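
To make the tie case concrete, here is a minimal stand-alone sketch. It is not taken from the patch: `warning_flags` is a hypothetical helper, and the two flag values are copied from <linux/can/error.h> so that the snippet compiles outside the kernel. It only illustrates the "set both flags when the counters are equal" idea under discussion.

```c
#include <stdint.h>

/* Flag values for data[1] of an error frame, as in <linux/can/error.h>. */
#define CAN_ERR_CRTL_RX_WARNING 0x04 /* reached warning level for RX errors */
#define CAN_ERR_CRTL_TX_WARNING 0x08 /* reached warning level for TX errors */

/*
 * Hypothetical helper (not part of the patch): choose the warning flag(s)
 * from the raw error counters and report both directions on a tie.
 */
static uint8_t warning_flags(unsigned int txerr, unsigned int rxerr)
{
	if (txerr > rxerr)
		return CAN_ERR_CRTL_TX_WARNING;
	if (txerr < rxerr)
		return CAN_ERR_CRTL_RX_WARNING;

	/* Equal counters: the direction is unknown, so flag both. */
	return CAN_ERR_CRTL_TX_WARNING | CAN_ERR_CRTL_RX_WARNING;
}
```

The removed flexcan code, by contrast, used `(bec.txerr > bec.rxerr) ? TX : RX`, which reports RX whenever the counters are equal.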

Post by Andri Yngvason
- }
- case CAN_STATE_ERROR_WARNING: /* fallthrough */
- /*
- * from: ERROR_ACTIVE, ERROR_WARNING
- * to : ERROR_PASSIVE, BUS_OFF
- * => : error passive int
- */
- if (new_state >= CAN_STATE_ERROR_PASSIVE &&
- new_state <= CAN_STATE_BUS_OFF) {
- netdev_dbg(dev, "Error Passive IRQ\n");
- priv->can.can_stats.error_passive++;
-
- cf->can_id |= CAN_ERR_CRTL;
- cf->data[1] = (bec.txerr > bec.rxerr) ?
- CAN_ERR_CRTL_RX_PASSIVE;
- }
- break;
- netdev_err(dev, "BUG! "
- "hardware recovered automatically from BUS_OFF\n");
- break;
- break;
- }
+ can_change_state(dev, cf, new_state,
+ can_get_err_dir(bec.rxerr, bec.txerr));

Saves a lot of lines :).

Post by Andri Yngvason
- /* process state changes depending on the new state */
- switch (new_state) {
- netdev_dbg(dev, "Error Warning\n");
- cf->can_id |= CAN_ERR_CRTL;
- cf->data[1] = (bec.txerr > bec.rxerr) ?
- CAN_ERR_CRTL_RX_WARNING;
- break;
- netdev_dbg(dev, "Error Active\n");
- cf->can_id |= CAN_ERR_PROT;
- cf->data[2] = CAN_ERR_PROT_ACTIVE;
- break;
- cf->can_id |= CAN_ERR_BUSOFF;
+ if (unlikely(new_state == CAN_STATE_BUS_OFF))
can_bus_off(dev);
- break;
- break;
- }
}
static int flexcan_poll_state(struct net_device *dev, u32 reg_esr)
@@ -658,7 +599,6 @@ static int flexcan_poll_state(struct net_device
*dev, u32 reg_esr)
return 0;
do_state(dev, cf, new_state);
- priv->can.state = new_state;
netif_receive_skb(skb);
dev->stats.rx_packets++;
--

To validate the correct behaviour, could you please send messages while
the cable is disconnected? Then reconnect the cable and see how the
error state decreases. You can monitor the behaviour with "candump -td
-e any,0:0,#FFFFFFFF" in another shell.

Wolfgang.


Andri Yngvason

2014-09-21 14:47:05 UTC

Permalink

Post by Wolfgang Grandegger
- cf->can_id |= CAN_ERR_CRTL;
- cf->data[1] = (bec.txerr > bec.rxerr) ?
- CAN_ERR_CRTL_TX_WARNING :
- CAN_ERR_CRTL_RX_WARNING;

Hm, can_change_state() handles the equal case differently. In the
SJA1000 manual I found:

"Errors detected during reception or transmission will affect the error
counters according to the CAN 2.0B protocol
specification. The error status bit is set when at least one of the
error counters has reached or exceeded the CPU
warning limit of 96. An error interrupt is generated, if enabled."

If both are equal we do not know whether rx or tx has caused the state
change, and therefore setting "CAN_ERR_CRTL_TX_WARNING |
CAN_ERR_CRTL_RX_WARNING" seems more logical, indeed. But maybe it simply
does not happen. Any other opinions?

I think that not specifically handling the equal case would be wrong. Let's
consider the following sequence of events:
* txerr reaches warning level
* rxerr reaches warning level
If they are both equal at this point, you will only get a second
CAN_ERR_CRTL_TX_WARNING in the current implementation, whereas in the
proposed implementation, the user would get
CAN_ERR_CRTL_TX_WARNING | CAN_ERR_CRTL_RX_WARNING and because the user
can know the prior error state message, he can find out which state
actually changed.

But this is all based on the premise that txerr hasn't progressed since.
In fact, because we cannot assume that txerr stays in place until rxerr
catches up, this is what we should be doing:
enum can_state errcount_to_state(unsigned int count)
{
	if (unlikely(count > 127))
		return CAN_STATE_ERROR_PASSIVE;

	if (unlikely(count > 96))
		return CAN_STATE_ERROR_WARNING;

	return CAN_STATE_ERROR_ACTIVE;
}

enum can_err_dir can_get_err_dir(unsigned int txerr, unsigned int rxerr)
{
	enum can_state tx_state = errcount_to_state(txerr);
	enum can_state rx_state = errcount_to_state(rxerr);

	if (tx_state > rx_state)
		return CAN_ERR_DIR_TX;

	if (tx_state < rx_state)
		return CAN_ERR_DIR_RX;

	return CAN_ERR_DIR_TX | CAN_ERR_DIR_RX;
}

However, now that we've introduced errcount_to_state(), it seems to me
that it would be simpler to dump the proposed CAN_ERR_DIR enum in favour
of passing the two states directly to can_change_state().
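
As a cross-check of the thresholds, the following stand-alone sketch maps counters to states and derives a direction from the two states. It is an illustration under stated assumptions, not the code that was merged: the enum mirrors the ordering of the kernel's `enum can_state`, `ERR_DIR_TX`/`ERR_DIR_RX` and `err_dir` are hypothetical names, and the warning limit is taken as `>= 96` because the SJA1000 text quoted in this thread says the bit is set once a counter has "reached or exceeded" 96 (the proposal above uses `> 96`).

```c
/* Mirrors the ordering of the kernel's enum can_state (first four values). */
enum can_state {
	CAN_STATE_ERROR_ACTIVE = 0,
	CAN_STATE_ERROR_WARNING,
	CAN_STATE_ERROR_PASSIVE,
	CAN_STATE_BUS_OFF
};

/* Hypothetical direction bits, standing in for the proposed CAN_ERR_DIR_*. */
#define ERR_DIR_TX 0x1u
#define ERR_DIR_RX 0x2u

/* Assumption: warning at >= 96, error passive at >= 128. */
static enum can_state errcount_to_state(unsigned int count)
{
	if (count >= 128)
		return CAN_STATE_ERROR_PASSIVE;
	if (count >= 96)
		return CAN_STATE_ERROR_WARNING;
	return CAN_STATE_ERROR_ACTIVE;
}

/* Compare the per-direction states, not the raw counters. */
static unsigned int err_dir(unsigned int txerr, unsigned int rxerr)
{
	enum can_state tx_state = errcount_to_state(txerr);
	enum can_state rx_state = errcount_to_state(rxerr);

	if (tx_state > rx_state)
		return ERR_DIR_TX;
	if (tx_state < rx_state)
		return ERR_DIR_RX;
	return ERR_DIR_TX | ERR_DIR_RX;
}
```

Comparing states rather than counters means that txerr = 97 and rxerr = 120 report both directions, since both counters sit in the warning band.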

Post by Wolfgang Grandegger
+ can_change_state(dev, cf, new_state,
+ can_get_err_dir(bec.rxerr, bec.txerr));

Saves a lot of lines :).

Indeed ;)

Post by Wolfgang Grandegger
To validate the correct behaviour could you please send messages while
the cable is disconnected. Then reconnect the cable and see how the
error state decreases. You can monitor the behaviour with "candump -td
-e any,0:0,#FFFFFFFF" in another shell.

I'm using PCAN-USB Pro to generate errors on the bus. It works quite well.
I can generate tx errors by sending from the device and then have the pcan
ruin a few frames. rx errors can be generated by having another device on
the bus outputting random data and then let the pcan corrupt the frames.

Sadly the error generation mechanism only works on windows. :(

I've tried the "disconnected cable" method too in the past. It usually
puts mscan into bus-off quite fast.

Thanks for the comments!

Andri

Wolfgang Grandegger

2014-09-21 15:30:48 UTC

Permalink

Post by Andri Yngvason
I think that not specifically handling the equal case would be wrong. Let's
consider the following sequence of events:
* txerr reaches warning level
* rxerr reaches warning level
If they are both equal at this point, you will only get a second
CAN_ERR_CRTL_TX_WARNING in the current implementation, whereas in the
proposed implementation, the user would get
CAN_ERR_CRTL_TX_WARNING | CAN_ERR_CRTL_RX_WARNING and because the user
can know the prior error state message, he can find out which state
actually changed.

The question is which error (rx or tx) triggered the error state
change interrupt. I doubt that such an interrupt is triggered when one
error counter catches up, e.g. txerr was > 128 and rxerr exceeded 128.
It's not even certain that all controllers act the same way. Therefore
keeping the current behaviour would also be fine for me.

Post by Andri Yngvason
However, now that we've introduced errcount_to_state(), it seems to me
that it would be simpler to dump the proposed CAN_ERR_DIR enum in favour
of passing the two states directly to can_change_state().

D'accord.


Post by Andri Yngvason
I'm using PCAN-USB Pro to generate errors on the bus. It works quite well.
I can generate tx errors by sending from the device and then have the pcan
ruin a few frames. rx errors can be generated by having another device on
the bus outputting random data and then let the pcan corrupt the frames.

Short-circuiting the CAN low and high lines is a simple method to

Post by Andri Yngvason
Sadly the error generation mechanism only works on windows. :(

I've tried the "disconnected cable" method too in the past. It usually
puts mscan into bus-off quite fast.

Sending a message without a cable should never trigger a bus-off. The tx
error counter never exceeds 128.

Here is an example output of "candump -td -e any,0:0,#FFFFFFFF"
for a recovery from error passive state due to no ack/cable (reconnect
after 5s) for a SJA1000 on an EMS PCI card:

(000.201913) can0 1C [0]
(000.212241) can0 20000204 [8] 00 08 00 00 00 00 60 00 ERRORFRAME
controller-problem{tx-error-warning}
state-change{tx-error-warning}
error-counter-tx-rx{{96}{0}}
(000.003544) can0 20000204 [8] 00 20 00 00 00 00 80 00 ERRORFRAME
controller-problem{tx-error-passive}
state-change{tx-error-passive}
error-counter-tx-rx{{128}{0}}
(004.901842) can0 1D [7] 1D F6 33 52 31 4B DE
(000.000116) can0 20000200 [8] 00 08 00 00 00 00 7F 00 ERRORFRAME
state-change{tx-error-warning}
error-counter-tx-rx{{127}{0}}
(000.000678) can0 1E [6] 42 05 14 82 23 B6
...
(000.201927) can0 49 [4] 2F 1A 97 25
(000.000096) can0 20000200 [8] 00 40 00 00 00 00 5F 00 ERRORFRAME
state-change{back-to-error-active}
error-counter-tx-rx{{95}{0}}
(000.202184) can0 4A [8] 7F 87 0E FE 03 BA 78 91

This is from my related patch-set.
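
For readers decoding such a trace by hand, here is a small stand-alone sketch of how the fields line up. It is an illustration, not code from the patch-set: `struct err_summary` and `summarize_error_frame` are hypothetical names, the flag values are copied from <linux/can/error.h>, and the placement of the tx/rx counters in data[6] and data[7] is inferred from the output above (0x60 = 96, 0x80 = 128). The 0x200 bit in the CAN ID is left uninterpreted.

```c
#include <stdint.h>

#define CAN_ERR_FLAG            0x20000000U /* frame is an error message frame */
#define CAN_ERR_CRTL            0x00000004U /* controller problems, see data[1] */
#define CAN_ERR_CRTL_TX_WARNING 0x08
#define CAN_ERR_CRTL_TX_PASSIVE 0x20

/* Hypothetical summary of the fields visible in the candump trace. */
struct err_summary {
	int is_error_frame;  /* CAN_ERR_FLAG set in the CAN ID */
	int has_ctrl_info;   /* CAN_ERR_CRTL set in the CAN ID */
	uint8_t ctrl_flags;  /* data[1]: controller state flags */
	uint8_t txerr;       /* data[6]: assumed tx error counter */
	uint8_t rxerr;       /* data[7]: assumed rx error counter */
};

static struct err_summary summarize_error_frame(uint32_t can_id,
						const uint8_t data[8])
{
	struct err_summary s = {0};

	s.is_error_frame = (can_id & CAN_ERR_FLAG) != 0;
	if (!s.is_error_frame)
		return s; /* ordinary data frame, nothing to decode */

	s.has_ctrl_info = (can_id & CAN_ERR_CRTL) != 0;
	s.ctrl_flags = data[1];
	s.txerr = data[6];
	s.rxerr = data[7];
	return s;
}
```

Applied to the first error frame in the trace (ID 20000204, data 00 08 00 00 00 00 60 00), this yields the TX warning flag with counters tx = 96 and rx = 0, matching the decoded text that candump prints.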

Wolfgang.

Andri Yngvason

2014-09-21 17:27:28 UTC

Permalink


Post by Wolfgang Grandegger
The question is which error (rx or tx) triggered the error state
change interrupt. I doubt that such an interrupt is triggered when one
error counter catches up, e.g. txerr was > 128 and rxerr exceeded 128.
It's not even certain that all controllers act the same way. Therefore
keeping the current behaviour would also be fine for me.

Also, because of the state != priv->state assert, the equal case won't
happen when the state increases, but it might happen when it goes down.
Perhaps that should be changed?

But in the case where the state goes down, there will definitely be an
interrupt generated. E.g. rx_state = warn, tx_state = passive and then when
tx_state -> warn, we will have the controller's state go to warn from
passive, and then rx_state == tx_state. So, if we only want to send which
state changed, we actually have to keep copies of each counter's current
(last) state, as is done in priv->state, for the whole controller.

I think it would be easier, simpler and more useful to just send the current
state of each counter whenever the state changes. Consider this:
diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
index 02492d2..6199571 100644
--- a/drivers/net/can/dev.c
+++ b/drivers/net/can/dev.c
@@ -273,6 +273,118 @@ static int can_get_bittiming(struct net_device *dev, struct can_bittiming *bt,
 	return err;
 }

+static void can_update_error_counters(struct net_device *dev,
+				      enum can_state new_state)
+{
+	struct can_priv *priv = netdev_priv(dev);
+
+	if (new_state < priv->state)
+		return;
+
+	switch (new_state) {
+	case CAN_STATE_ERROR_ACTIVE:
+		netdev_warn(dev, "%s: oops, did we come from a state less than error-active?",
+			    __func__);
+		break;
...
+}
+
+static int can_txstate_to_frame(enum can_state state)
+{
+	switch(state)
+	{
+	case CAN_STATE_ERROR_ACTIVE:
+		return CAN_ERR_CRTL_TX_ACTIVE;
...
+}
+
+static int can_rxstate_to_frame(enum can_state state)
+{
+	switch(state)
+	{
+	case CAN_STATE_ERROR_ACTIVE:
+		return CAN_ERR_CRTL_RX_ACTIVE;
...
+}
+
+void can_change_state(struct net_device *dev, struct can_frame *cf,
+		      enum can_state new_state, enum can_state tx_state,
+		      enum can_state rx_state)
+{
+	struct can_priv *priv = netdev_priv(dev);
+
+	if (unlikely(new_state == priv->state)) {
+		netdev_warn(dev, "%s: oops, state did not change", __func__);
+		return;
+	}
+
+	can_update_error_counters(dev, new_state);
+
+	if (unlikely(new_state == CAN_STATE_BUS_OFF)) {
+		cf->can_id |= CAN_ERR_BUSOFF;
+	} else {
+		cf->can_id |= CAN_ERR_CRTL;
+		/* Absolute: */
+		cf->data[1] |= can_txstate_to_frame(tx_state)
+			     | can_rxstate_to_frame(rx_state);
+		/* Alternatively, the difference:
+		 * if (tx_state > rx_state)
+		 *	cf->data[1] |= can_txstate_to_frame(tx_state);
+		 * else if (tx_state < rx_state)
+		 *	cf->data[1] |= can_rxstate_to_frame(rx_state);
+		 * else
+		 *	cf->data[1] |= can_txstate_to_frame(tx_state)
+		 *		     | can_rxstate_to_frame(rx_state);
+		 * Or even, disregarding the equal case:
+		 * cf->data[1] |= (tx_state > rx_state) ?
+		 *	can_txstate_to_frame(tx_state) :
+		 *	can_rxstate_to_frame(rx_state);
+		 */
+
+	}
+
+	priv->state = new_state;
+}
+EXPORT_SYMBOL_GPL(can_change_state);
+
 /*
  * Local echo of CAN messages
  *
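
The "absolute" variant can be tried outside the kernel with a small user-space model. This is only a sketch under assumptions: `struct frame`, `tx_flag`, `rx_flag` and `change_state` are hypothetical stand-ins for `struct can_frame` and the helpers proposed above, the flag values are copied from <linux/can/error.h>, and a direction in error-active contributes no flag here because the thread does not define values for CAN_ERR_CRTL_TX_ACTIVE or CAN_ERR_CRTL_RX_ACTIVE.

```c
#include <stdint.h>

enum can_state {
	CAN_STATE_ERROR_ACTIVE = 0,
	CAN_STATE_ERROR_WARNING,
	CAN_STATE_ERROR_PASSIVE,
	CAN_STATE_BUS_OFF
};

#define CAN_ERR_CRTL            0x00000004U
#define CAN_ERR_BUSOFF          0x00000040U
#define CAN_ERR_CRTL_RX_WARNING 0x04
#define CAN_ERR_CRTL_TX_WARNING 0x08
#define CAN_ERR_CRTL_RX_PASSIVE 0x10
#define CAN_ERR_CRTL_TX_PASSIVE 0x20

/* Hypothetical stand-in for struct can_frame. */
struct frame {
	uint32_t can_id;
	uint8_t data[8];
};

static uint8_t tx_flag(enum can_state s)
{
	switch (s) {
	case CAN_STATE_ERROR_WARNING: return CAN_ERR_CRTL_TX_WARNING;
	case CAN_STATE_ERROR_PASSIVE: return CAN_ERR_CRTL_TX_PASSIVE;
	default:                      return 0; /* assumption: no flag */
	}
}

static uint8_t rx_flag(enum can_state s)
{
	switch (s) {
	case CAN_STATE_ERROR_WARNING: return CAN_ERR_CRTL_RX_WARNING;
	case CAN_STATE_ERROR_PASSIVE: return CAN_ERR_CRTL_RX_PASSIVE;
	default:                      return 0; /* assumption: no flag */
	}
}

/* Returns 1 and fills *cf if the state changed, 0 otherwise. */
static int change_state(enum can_state *cur, struct frame *cf,
			enum can_state new_state,
			enum can_state tx_state, enum can_state rx_state)
{
	if (new_state == *cur)
		return 0;

	if (new_state == CAN_STATE_BUS_OFF) {
		cf->can_id |= CAN_ERR_BUSOFF;
	} else {
		cf->can_id |= CAN_ERR_CRTL;
		/* "Absolute": report the current state of both directions. */
		cf->data[1] |= tx_flag(tx_state) | rx_flag(rx_state);
	}

	*cur = new_state;
	return 1;
}
```

With this model, a transition to warning caused by the transmitter alone yields data[1] = 0x08, and a later transition where both directions are in warning yields 0x0c, so a listener never has to guess which counter moved.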

Post by Wolfgang Grandegger
Short-circuiting the CAN low and high lines is a simple method to

Ahh, yes, I tried that too. That's what triggered bus-off. I got it
mixed up in my head. :)

Post by Wolfgang Grandegger
Sending a message without a cable should never trigger a bus-off. The tx
error counter never exceeds 128.
Here is an example output of "candump -td -e any,0:0,#FFFFFFFF"
for a recovery from error passive state due to no ack/cable (reconnect
after 5s) for a SJA1000 on an EMS PCI card:

(000.201913) can0 1C [0]
(000.212241) can0 20000204 [8] 00 08 00 00 00 00 60 00 ERRORFRAME
controller-problem{tx-error-warning}
state-change{tx-error-warning}
error-counter-tx-rx{{96}{0}}
(000.003544) can0 20000204 [8] 00 20 00 00 00 00 80 00 ERRORFRAME
controller-problem{tx-error-passive}
state-change{tx-error-passive}
error-counter-tx-rx{{128}{0}}
(004.901842) can0 1D [7] 1D F6 33 52 31 4B DE
(000.000116) can0 20000200 [8] 00 08 00 00 00 00 7F 00 ERRORFRAME
state-change{tx-error-warning}
error-counter-tx-rx{{127}{0}}
(000.000678) can0 1E [6] 42 05 14 82 23 B6
...
(000.201927) can0 49 [4] 2F 1A 97 25
(000.000096) can0 20000200 [8] 00 40 00 00 00 00 5F 00 ERRORFRAME
state-change{back-to-error-active}
error-counter-tx-rx{{95}{0}}
(000.202184) can0 4A [8] 7F 87 0E FE 03 BA 78 91

This is from my related patch-set.

Okay, I'll try that but the -e flag won't help much because candump expects
the PROT abuse.

Andri

PS.: I must admit that I don't actually know why it's useful to know which
error counter changed; tx or rx. I think it would be much simpler to send
the max of both and be done with it. Can anyone point out a case where this
helps?

Wolfgang Grandegger

2014-09-23 20:33:48 UTC

Permalink


Post by Andri Yngvason
Also, because of the state != priv->state assert, the equal case won't
happen when the state increases, but it might happen when it goes down.
Perhaps that should be changed?

But in the case where the state goes down, there will definitely be an
interrupt generated. E.g. rx_state = warn, tx_state = passive and then when
tx_state -> warn, we will have the controller's state go to warn from
passive, and then rx_state == tx_state. So, if we only want to send which
state changed, we actually have to keep copies of each counter's current
(last) state, as is done in priv->state, for the whole controller.

Well, that's definitely too sophisticated.

Post by Andri Yngvason
I think it would be easier, simpler and more useful to just send the current
state of each counter whenever the state changes.

For simplicity, I vote for setting (CAN_ERR_CRTL_TX_WARNING |
CAN_ERR_CRTL_RX_WARNING) if the tx and rx error counters are equal.

Post by Andri Yngvason
PS.: I must admit that I don't actually know why it's useful to know which
error counter changed; tx or rx. I think it would be much simpler to send
the max of both and be done with it. Can anyone point out a case where this
helps?

I agree that it would be much simpler not to distinguish between rx and
tx state changes. This is for historical reasons. Oliver, do you
remember why we adopted that solution?

Wolfgang.

Oliver Hartkopp

2014-09-23 22:31:49 UTC

Permalink

Post by Wolfgang Grandegger

PS.: I must admit that I don't actually know why it's useful to know which
error counter changed; tx or rx. I think it would be much simpler to send
the max of both and be done with it. Can anyone point out a case where this
helps?

I agree that it would be much simpler not to distinguish between rx and
tx state changes. This is for historical reasons. Oliver, do you
remember why we adapted that solution?

No. Indeed I was not even aware of the fact that error counters should be set
into any kind of relation.

When the error counters change, the error message should be fired.
And when the thresholds e.g. for CAN_ERR_CRTL_*X_WARNING are triggered these
flags should be set accordingly.

So can_state errcount_to_state() makes perfect sense.

But I don't know why to compare tx error counters to rx error counters either.

Regards,
Oliver


Wolfgang Grandegger

2014-09-24 06:28:19 UTC

Permalink

On Wed, 24 Sep 2014 00:31:49 +0200, Oliver Hartkopp

Post by Oliver Hartkopp

Post by Wolfgang Grandegger

I agree that it would be much simpler not to distinguish between rx and
tx state changes. This is for historical reasons. Oliver, do you
remember why we adapted that solution?

No. Indeed I was not even aware of the fact that error counters should be set
into any kind of relation.
When the error counters change, the error message should be fired.
And when the thresholds e.g. for CAN_ERR_CRTL_*X_WARNING are triggered these
flags should be set accordingly.

Well, unfortunately it's not that simple. Normally just the state change is
triggered and the software has to find out the direction. On most controllers
we therefore fiddle with the RX and TX error counters. Or do you suggest to
monitor (cache) the error counters? They are not available on some CAN
controllers.

Post by Oliver Hartkopp
So can_state errcount_to_state() makes perfectly sense.
But I don't know why to compare tx error counters to rx error counters either.

See above.

Wolfgang.