Re: SARA-R4 unreliable LTE-M connection problem



Hello,

just some updates on this, because I'm still struggling with that connection.

On Thursday, 24 June 2021, 07:47:19 CEST, Alexander Dahl wrote:
Hei hei,

just answering myself with what I found so far …

On Tue, Jun 15, 2021 at 04:52:58PM +0200, Alexander Dahl wrote:
Hello everyone,

I need some help to further debug a mobile broadband modem connection
problem.

We are using a mikroe LTE IoT Click board [1] with a u-blox
SARA-R410M-02B cellular modem (LTE-M, NB-IoT) to connect some custom
embedded ARM SoC based hardware to the internet. The LTE module is
connected through the serial UART only.

This is somewhat cumbersome. RTS/CTS for hardware flow control is
currently not working, probably due to issues with the old 4.9 kernel
on i.MX6. However I have trouble reading the datasheets for the
SARA-R4 modules regarding flow control. In general you can choose
between hardware (RTS/CTS), software (xon-xoff) and none, and
different hardware variants seem to support different things. Options
for ModemManager and NetworkManager overlap at least for rts/cts.

In some places hardware flow control is highly recommended, while in
other places it seems the module does not support that.

The setting chosen automatically by ModemManager/NetworkManager works
at least sometimes. However, the log indicated that it tried to switch
to hardware flow control, although that could not have worked correctly
according to oscilloscope measurements of the RTS/CTS lines.

The current software stack is running on a custom ptxdist based board
support package (BSP) with Linux kernel 4.9.201, ModemManager 1.16.6,
NetworkManager 1.30.4, and pppd 2.4.9. I have full control over the
software, and can apply and test patches if needed.

I will update the kernel to the v5.10 series to rule out problems with
RTS/CTS. We need that anyway to test with a different, SARA-R5 based
module. Maybe that helps somehow?

Meanwhile I'm running kernel v5.10.41, and I tested with the LTE IoT 5 Click
board with a SARA-R510M8S module. I could not get this to talk over the serial
UART at all.

With an LTE IoT 9 Click board with a Cinterion EXS62-W modem module, I had a
segfault with ModemManager 1.16.6, which has been fixed in the meantime.

The provider is Deutsche Telekom (DT). We are using special SIM cards on
a so-called Business Smart Connect plan.

They were actually kind and willing to help, provided me with some
helpful information on roaming etc.

The symptoms we face are like this: after a reboot of the whole system,
NetworkManager connects successfully. On the Linux side everything looks
right: there's a ppp0 device with the correct IPv4 address, route setup
looks fine, resolv.conf looks fine, `mmcli -m 0` shows the modem is
connected, and both `journalctl -u ModemManager` and `journalctl -u
NetworkManager` look fine. We can also see the modem is connected in the
dashboard provided by DT [2].

However we can't receive any data. :-/

The problem persists.

The strange thing is: the problem is now the same with both the LTE IoT Click
(u-blox SARA-R4) and the LTE IoT 9 Click (Cinterion EXS62-W). I think that
means it is not a hardware problem?

The connection can be established. I see data on the modem's RX line but
nothing on its TX line, which means the host sends data to the modem but gets
nothing back.

Can this be a problem with the baud rate? I mean it's a classical serial UART,
so I would assume the baud rates of host and modem must match. In AT command
mode the baud rate is 115200, and ModemManager can successfully send AT
commands and receive responses. Is it possible that NetworkManager and/or pppd
change this somehow?
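
One way to check this from the host side would be to read back the termios
settings of the port while the PPP session is up and compare them with the
115200 baud used in AT command mode. A minimal sketch in Python, assuming the
modem sits on /dev/ttymxc4 (taken from the interface name in the connection
profile quoted further down); `describe_tty_attrs` and `describe_port` are
illustrative helper names, and opening the port may fail or need root if
another process holds it exclusively:

```python
import os
import termios

# Map termios speed constants (B9600, B115200, ...) to plain integers.
BAUD_RATES = {
    getattr(termios, name): int(name[1:])
    for name in dir(termios)
    if name.startswith("B") and name[1:].isdigit()
}


def describe_tty_attrs(attrs):
    """Summarise a termios.tcgetattr() result: speeds and flow control."""
    iflag, _oflag, cflag, _lflag, ispeed, ospeed, _cc = attrs
    return {
        # Fall back to the raw value if the speed constant is unknown.
        "ispeed": BAUD_RATES.get(ispeed, ispeed),
        "ospeed": BAUD_RATES.get(ospeed, ospeed),
        "rtscts": bool(cflag & termios.CRTSCTS),
        "xonxoff": bool(iflag & (termios.IXON | termios.IXOFF)),
    }


def describe_port(path="/dev/ttymxc4"):  # assumed device node
    # O_NONBLOCK avoids blocking on carrier detect; O_NOCTTY keeps the
    # port from becoming our controlling terminal.
    fd = os.open(path, os.O_RDONLY | os.O_NOCTTY | os.O_NONBLOCK)
    try:
        return describe_tty_attrs(termios.tcgetattr(fd))
    finally:
        os.close(fd)
```

Calling `describe_port()` once in AT command mode and once after pppd has
started would show whether the speed or the flow control setting differs
between the two phases.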

I will probably try without NetworkManager/ModemManager next week, using just
a terminal and pppd manually. I must admit I don't fully understand yet what
should happen and why it does not. Any hints appreciated.

Greets
Alex

After roughly 10 minutes (some random time between 9 and 11 minutes) we
get a disconnect (LCP terminated by peer), and this happens always, every
time. Sometimes after automatic reconnect, we can send/receive data then,
but reconnect is not always successful. :-/

See nm journal output of such a disconnect:
    Feb 01 00:10:17 unit pppd[230]: LCP terminated by peer
    Feb 01 00:10:17 unit pppd[230]: nm-ppp-plugin: status 8 / phase 'network'
    Feb 01 00:10:17 unit NetworkManager[230]: LCP terminated by peer
    Feb 01 00:10:17 unit pppd[230]: Connect time 9.8 minutes.
    Feb 01 00:10:17 unit pppd[230]: nm-ppp-plugin: status 5 / phase 'establish'
    Feb 01 00:10:17 unit NetworkManager[230]: Connect time 9.8 minutes.
    Feb 01 00:10:17 unit NetworkManager[230]: Sent 22602 bytes, received 0 bytes.
    Feb 01 00:10:17 unit pppd[230]: Sent 22602 bytes, received 0 bytes.
    Feb 01 00:10:17 unit NetworkManager[130]: <info>  [1612138217.8665] device (ppp0): state change: disconnected -> unmanaged (reason 'connection-assumed', sys-iface-state: 'external')
    Feb 01 00:10:20 unit pppd[230]: nm-ppp-plugin: status 11 / phase 'disconnect'
    Feb 01 00:10:20 unit NetworkManager[230]: Connection terminated.
    Feb 01 00:10:20 unit pppd[230]: Connection terminated.
    Feb 01 00:10:21 unit pppd[230]: Modem hangup
    Feb 01 00:10:21 unit pppd[230]: nm-ppp-plugin: status 1 / phase 'dead'
    Feb 01 00:10:21 unit NetworkManager[230]: Modem hangup
    Feb 01 00:10:21 unit NetworkManager[130]: <info>  [1612138221.8974] device (ttymxc4): state change: activated -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')
    Feb 01 00:10:21 unit pppd[230]: Exit.
    Feb 01 00:10:21 unit pppd[230]: nm-ppp-plugin: cleaning up
    Feb 01 00:10:21 unit NetworkManager[130]: <error> [1612138221.9571] kill child process 'pppd' (230): failed due to unexpected return value -1 by waitpid (No child processes, 10) after sending SIGTERM (15)
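
To compare sessions more easily, the key facts pppd logs at the end of a
session (connect time, byte counters, who terminated LCP) could be pulled out
of the journal with a small script. This is only a sketch: the regular
expressions are derived from the message format in the excerpt above, each log
entry is assumed to be on a single line, and `summarize_ppp_session` is a
made-up helper name.

```python
import re

SENT_RE = re.compile(r"pppd\[\d+\]: Sent (\d+) bytes, received (\d+) bytes")
TIME_RE = re.compile(r"pppd\[\d+\]: Connect time ([\d.]+) minutes")


def summarize_ppp_session(lines):
    """Collect the key facts pppd logs when a session ends."""
    summary = {"minutes": None, "sent": None, "received": None,
               "terminated_by_peer": False}
    for line in lines:
        if (m := TIME_RE.search(line)):
            summary["minutes"] = float(m.group(1))
        elif (m := SENT_RE.search(line)):
            summary["sent"] = int(m.group(1))
            summary["received"] = int(m.group(2))
        elif "pppd[" in line and "LCP terminated by peer" in line:
            summary["terminated_by_peer"] = True
    return summary


# Lines copied from the excerpt above.
SAMPLE = [
    "Feb 01 00:10:17 unit pppd[230]: LCP terminated by peer",
    "Feb 01 00:10:17 unit pppd[230]: Connect time 9.8 minutes.",
    "Feb 01 00:10:17 unit pppd[230]: Sent 22602 bytes, received 0 bytes.",
]
print(summarize_ppp_session(SAMPLE))
```

A session with `received` equal to 0 that was terminated by the peer matches
the failing case described here; a working session should show a non-zero
receive counter.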

None of this is the root cause of the problem. The provider disconnects
inactive connections (no data transmitted or received) after some
timeout. Things go wrong later, usually with the SARA-R4 no longer
answering AT commands and ModemManager eventually dropping the modem.

What I have not been able to get so far is logs from a successful
connection, to compare with the failing connections. Those could shed
some light on the root cause? :-(

I'm currently struggling to debug the whole thing. I see at least 4
components interacting (kernel, mm, nm, pppd), and I'm not sure where to
start debugging, but I think nm is worth a try.

I get logs as shown above, however I could not get NetworkManager to
increase the log level. I tried to set it in
/etc/NetworkManager/NetworkManager.conf like this:
    root@unit:~ cat /etc/NetworkManager/NetworkManager.conf
    [main]
    plugins=ifupdown,keyfile
    rc-manager=file
    
    [ifupdown]
    managed=false
    
    [logging]
    domains="MB:DEBUG,PPP:DEBUG"

This worked:

    level=DEBUG

The connection itself is defined like this:
    root@unit:~ cat /etc/NetworkManager/system-connections/gsm-ttymxc4
    [connection]
    id=gsm-ttymxc4
    type=gsm
    interface-name=ttymxc4
    permissions=
    autoconnect=yes

Must be:

    autoconnect=true

    autoconnect-retries=0
    
    [gsm]
    apn=iot.telekom.net
    
    [ipv4]
    dns-search=
    method=auto
    
    [ipv6]
    addr-gen-mode=stable-privacy
    dns-search=
    method=auto

Set ipv6 to method=ignore for now, because I read about problems with
IPv6 with u-blox modules …

I'm a little puzzled about that log message:
    Feb 01 00:59:34 unit NetworkManager[342]: <warn>  [1612141174.8636] config: invalid logging configuration: Unknown log level 'DEBUG"'

Can certain log levels be deactivated by meson options at build time?

Yes, indeed. I had to change meson build option "more_logging" from
false to true to enable debug level log messages.
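
Apart from the build option, the warning itself reports the level as `DEBUG"`
with a trailing quote, which suggests the double quotes around the `domains`
value were taken literally rather than stripped. That reading is an inference
from the log message, not something confirmed in this thread. A small Python
sketch to spot such values; `quoted_values` and `naive_domain_split` are
illustrative helpers, and the split is a naive stand-in, not NetworkManager's
actual parser:

```python
import configparser

# The configuration as quoted earlier in this mail.
NM_CONF = '''\
[main]
plugins=ifupdown,keyfile
rc-manager=file

[ifupdown]
managed=false

[logging]
domains="MB:DEBUG,PPP:DEBUG"
'''


def quoted_values(conf_text, section="logging"):
    """Return (key, value) pairs in `section` whose value contains a quote."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section(section):
        return []
    return [(k, v) for k, v in parser.items(section) if '"' in v or "'" in v]


def naive_domain_split(value):
    """Split 'DOMAIN:LEVEL,...' the simple way, keeping quotes literally."""
    return [tuple(item.split(":", 1)) for item in value.split(",")]


for key, value in quoted_values(NM_CONF):
    print(key, naive_domain_split(value))
```

With the quotes kept, the last pair ends up with the level `DEBUG"`, which is
the string the warning complains about. Writing the value without quotes
avoids that.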




