Discussion:
sssd responders fail regularly on busy servers
Patrick Coleman
2016-03-23 15:51:57 UTC
Permalink
Hi,

We run sssd to bind a number of machines to LDAP for auth. On a subset
of these machines, we have software that makes several thousand IPv6
route changes per second.

Recently, we found that on these hosts the sssd_nss responder process
fails several times a day[1], and will not recover until sssd is
restarted. strace[2] of the main sssd process indicates that sssd is
receiving many, many netlink messages - so many, in fact, that sssd
cannot process them fast enough and is receiving ENOBUFS from
recvmsg(2).
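For context, netlink(7) documents that ENOBUFS means the kernel dropped messages because the socket's receive buffer overflowed. A quick generic sketch of checking and raising SO_RCVBUF (shown on a plain UDP socket so it runs anywhere; nothing here is sssd code, and raising the buffer only buys headroom rather than fixing a sustained flood):

```python
import socket

# Generic sketch, not sssd code: the same setsockopt applies to a
# netlink socket, where an overflowing receive buffer surfaces as
# ENOBUFS on the next recvmsg(2).
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
default = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1 << 20)  # request 1 MiB
raised = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
# On Linux the kernel roughly doubles the requested size (it accounts
# for bookkeeping overhead) and caps it at net.core.rmem_max unless
# SO_RCVBUFFORCE is used.
print(default, raised)
s.close()
```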

The messages that are received seem to get forwarded[3] to the sssd
responders over the unix socket and flood them until they fail.

From what I can see, the netlink code in
src/monitor/monitor_netlink.c:setup_netlink() subscribes to netlink
notifications with the aim of detecting things like wifi network
changes. This isn't something we'd find useful on our servers and
seems to have performance implications - is there any easy way of
turning off this functionality in sssd that I've missed?
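For reference, the groups=00000400 in the strace below is the rtnetlink multicast-group bitmask the monitor's socket was bound with. A quick decode (constants copied from <linux/rtnetlink.h>, not from sssd; the table is deliberately incomplete) shows it is exactly the IPv6 route group, which matches the flood we are seeing:

```python
# Netlink multicast-group bits from <linux/rtnetlink.h>; values come
# from the kernel header, not from sssd itself, and only a few groups
# are listed here.
RTNL_GROUPS = {
    0x001: "RTMGRP_LINK",
    0x010: "RTMGRP_IPV4_IFADDR",
    0x040: "RTMGRP_IPV4_ROUTE",
    0x100: "RTMGRP_IPV6_IFADDR",
    0x400: "RTMGRP_IPV6_ROUTE",
}

def decode_groups(mask):
    """Name the rtnetlink multicast groups present in a bind() mask."""
    return [name for bit, name in sorted(RTNL_GROUPS.items()) if mask & bit]

# strace item [2] shows recvmsg(11, {... groups=00000400} ...):
print(decode_groups(0x400))  # ['RTMGRP_IPV6_ROUTE']
```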

We see this issue running sssd 1.11.7.

Cheers,

Patrick


1. The failures look something like this. I have replaced our sssd
domain with "ourdomain".
/var/log/sssd/sssd_nss.log

(Tue Mar 22 02:58:01 2016) [sssd[nss]] [accept_fd_handler] (0x0100):
Client connected!
(Tue Mar 22 02:58:01 2016) [sssd[nss]] [nss_cmd_initgroups] (0x0100):
Requesting info for [systemuser] from [<ALL>]
(Tue Mar 22 02:58:01 2016) [sssd[nss]] [nss_cmd_initgroups_search]
(0x0100): Requesting info for [***@ourdomain]
(Tue Mar 22 02:59:04 2016) [sssd[nss]]
[nss_cmd_initgroups_dp_callback] (0x0040): Unable to get information
from Data Provider
Error: 3, 5, (null)
Will try to return what we have in cache
(Tue Mar 22 02:59:14 2016) [sssd[nss]] [server_setup] (0x0080):
CONFDB: /var/lib/sss/db/config.ldb
(Tue Mar 22 02:59:14 2016) [sssd[nss]] [monitor_common_send_id]
(0x0100): Sending ID: (nss,1)
(Tue Mar 22 02:59:14 2016) [sssd[nss]] [check_file] (0x0020): lstat
for [/var/lib/sss/pipes/private/sbus-dp_ourdomain] failed: [2][No such
file or directory].
(Tue Mar 22 02:59:14 2016) [sssd[nss]] [sbus_client_init] (0x0020):
check_file failed for [/var/lib/sss/pipes/private/sbus-dp_ourdomain].
(Tue Mar 22 02:59:14 2016) [sssd[nss]] [sss_dp_init] (0x0010): Failed
to connect to monitor services.
(Tue Mar 22 02:59:14 2016) [sssd[nss]] [sss_process_init] (0x0010):
fatal error setting up backend connector
(Tue Mar 22 02:59:14 2016) [sssd[nss]] [server_setup] (0x0080):
CONFDB: /var/lib/sss/db/config.ldb
(Tue Mar 22 02:59:14 2016) [sssd[nss]] [monitor_common_send_id]
(0x0100): Sending ID: (nss,1)
(Tue Mar 22 02:59:14 2016) [sssd[nss]] [check_file] (0x0020): lstat
for [/var/lib/sss/pipes/private/sbus-dp_ourdomain] failed: [2][No such
file or directory].
(Tue Mar 22 02:59:14 2016) [sssd[nss]] [sbus_client_init] (0x0020):
check_file failed for [/var/lib/sss/pipes/private/sbus-dp_ourdomain].
(Tue Mar 22 02:59:14 2016) [sssd[nss]] [sss_dp_init] (0x0010): Failed
to connect to monitor services.
(Tue Mar 22 02:59:14 2016) [sssd[nss]] [sss_process_init] (0x0010):
fatal error setting up backend connector
(Tue Mar 22 02:59:14 2016) [sssd[nss]] [server_setup] (0x0080):
CONFDB: /var/lib/sss/db/config.ldb
(Tue Mar 22 02:59:14 2016) [sssd[nss]] [monitor_common_send_id]
(0x0100): Sending ID: (nss,1)
(Tue Mar 22 02:59:14 2016) [sssd[nss]] [check_file] (0x0020): lstat
for [/var/lib/sss/pipes/private/sbus-dp_ourdomain] failed: [2][No such
file or directory].
(Tue Mar 22 02:59:14 2016) [sssd[nss]] [sbus_client_init] (0x0020):
check_file failed for [/var/lib/sss/pipes/private/sbus-dp_ourdomain].
(Tue Mar 22 02:59:14 2016) [sssd[nss]] [sss_dp_init] (0x0010): Failed
to connect to monitor services.
(Tue Mar 22 02:59:14 2016) [sssd[nss]] [sss_process_init] (0x0010):
fatal error setting up backend connector
(no further messages until service restarts)

2. strace sssd -i
sendmsg(12, {msg_name(0)=NULL,
msg_iov(2)=[{"l\1\0\1\0\0\0\0M\10\0\0e\0\0\0\1\1o\0\35\0\0\0/org/fre"...,
120}, {"", 0}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 120
gettimeofday({1458663090, 595705}, NULL) = 0
epoll_wait(5, {{EPOLLIN, {u32=150507384, u64=150507384}}}, 1, 1563) = 1
recvmsg(12, {msg_name(0)=NULL,
msg_iov(1)=[{"l\2\1\1\0\0\0\0M\10\0\0\10\0\0\0\5\1u\0M\10\0\0\0\2\1\1\0\0\0\0"...,
2048}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC},
MSG_CMSG_CLOEXEC) = 24
gettimeofday({1458663090, 595978}, NULL) = 0
recvmsg(12, 0xffcdcac0, MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource
temporarily unavailable)
gettimeofday({1458663090, 596110}, NULL) = 0
gettimeofday({1458663090, 596151}, NULL) = 0
epoll_wait(5, {{EPOLLIN|EPOLLERR, {u32=150512424, u64=150512424}}}, 1, 1563) = 1
recvmsg(11, 0xffcdcb3c, 0) = -1 ENOBUFS (No buffer space available)
gettimeofday({1458663090, 596330}, NULL) = 0
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
write(2, "(Tue Mar 22 09:11:30 2016) [sssd"..., 65(Tue Mar 22 09:11:30
2016) [sssd] [netlink_fd_handler] (0x0020): ) = 65
write(2, "Error while reading from netlink"..., 36Error while reading
from netlink fd
) = 36
gettimeofday({1458663090, 596538}, NULL) = 0
epoll_wait(5, {{EPOLLIN, {u32=150512424, u64=150512424}}}, 1, 1563) = 1
recvmsg(11, {msg_name(12)={sa_family=AF_NETLINK, pid=0,
groups=00000400},
msg_iov(1)=[{"\224\0\0\0\31\0\0\0\0\0\0\0\0\0\0\0\n\200\0\0\1\0\0\1\0\2\0\0\10\0\17\0"...,
4096}], msg_controllen=24, {cmsg_len=24, cmsg_level=SOL_SOCKET,
cmsg_type=SCM_CREDENTIALS{pid=0, uid=0, gid=0}}, msg_flags=0}, 0) =
148
gettimeofday({1458663090, 596679}, NULL) = 0
sendmsg(12, {msg_name(0)=NULL,
msg_iov(2)=[{"l\1\0\1\0\0\0\0N\10\0\0e\0\0\0\1\1o\0\35\0\0\0/org/fre"...,
120}, {"", 0}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 120
gettimeofday({1458663090, 596777}, NULL) = 0

3. /var/log/sssd/sssd.log
(Wed Mar 23 06:59:07 2016) [sssd] [message_type] (0x0200): netlink
Message type: 24
(Wed Mar 23 06:59:07 2016) [sssd] [route_msg_debug_print] (0x1000):
route idx 209591 flags 0X200 family 10 addr
fd0a:9b09:1f7:0:218:aff:fe34:3d08/128
(Wed Mar 23 06:59:07 2016) [sssd] [network_status_change_cb] (0x2000):
A networking status change detected signaling providers to reset
offline status
(Wed Mar 23 06:59:07 2016) [sssd] [service_signal] (0x0020): Could not
signal service [ourdomain].
(Wed Mar 23 06:59:07 2016) [sssd] [message_type] (0x0200): netlink
Message type: 24
(Wed Mar 23 06:59:07 2016) [sssd] [route_msg_debug_print] (0x1000):
route idx 209591 flags 0X200 family 10 addr
fd0a:9b09:1f7:0:218:aff:fe33:3b66/128
(Wed Mar 23 06:59:07 2016) [sssd] [network_status_change_cb] (0x2000):
A networking status change detected signaling providers to reset
offline status
(Wed Mar 23 06:59:07 2016) [sssd] [service_signal] (0x0020): Could not
signal service [ourdomain].
(Wed Mar 23 06:59:07 2016) [sssd] [message_type] (0x0200): netlink
Message type: 24

$ grep network_status_change_cb sssd.log | grep '06:59:12' | wc -l
1245
Lukas Slebodnik
2016-03-23 15:58:47 UTC
Permalink
Post by Patrick Coleman
Hi,
We run sssd to bind a number of machines to LDAP for auth. On a subset
of these machines, we have software that makes several thousand IPv6
route changes per second.
Recently, we found that on these hosts the sssd_nss responder process
fails several times a day[1], and will not recover until sssd is
restarted. strace[2] of the main sssd process indicates that sssd is
receiving many, many netlink messages - so many, in fact, that sssd
cannot process them fast enough and is receiving ENOBUFS from
recvmsg(2).
The messages that are received seem to get forwarded[3] to the sssd
responders over the unix socket and flood them until they fail.
From what I can see, the netlink code in
src/monitor/monitor_netlink.c:setup_netlink() subscribes to netlink
notifications with the aim of detecting things like wifi network
changes. This isn't something we'd find useful on our servers and
seems to have performance implications - is there any easy way of
turning off this functionality in sssd that I've missed?
We see this issue running sssd 1.11.7.
Cheers,
Patrick
1. The failures look something like this. I have replaced our sss
domain with "ourdomain"
/var/log/sssd/sssd_nss.log
Client connected!
Requesting info for [systemuser] from [<ALL>]
(Tue Mar 22 02:58:01 2016) [sssd[nss]] [nss_cmd_initgroups_search]
(Tue Mar 22 02:59:04 2016) [sssd[nss]]
[nss_cmd_initgroups_dp_callback] (0x0040): Unable to get information
from Data Provider
Error: 3, 5, (null)
The real error is in sssd_$domain.log

neither sssd.log nor sssd_nss.log will help you.

@see https://fedorahosted.org/sssd/wiki/Troubleshooting

LS
Patrick Coleman
2016-03-23 17:47:41 UTC
Permalink
Hi Lukas,
Post by Lukas Slebodnik
Post by Patrick Coleman
$ grep network_status_change_cb sssd.log | grep '06:59:12' | wc -l
1245
I believe the root cause of our problem is that sssd is receiving more
than a thousand netlink notifications a second - because of IPv6 route
updates - per above. While this might be something we can reduce on
our end, it would be great if there was some way to unsubscribe sssd
from receiving these.

Any ideas here appreciated - eg. if netlink is only used to get a more
rapid notification of network changes, could we remove the call to
setup_netlink in src/monitor/monitor.c in a local build? Does sssd
rely on netlink for anything else?
Post by Lukas Slebodnik
Post by Patrick Coleman
(Wed Mar 23 06:59:07 2016) [sssd] [route_msg_debug_print] (0x1000): route idx 209591 flags 0X200 family 10 addr fd0a:9b09:1f7:0:218:aff:fe33:3b66/128
(Wed Mar 23 06:59:07 2016) [sssd] [network_status_change_cb] (0x2000): A networking status change detected signaling providers to reset offline status
If I interpret this correctly, every single netlink notification seems
to be causing sssd to send a message over dbus?
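If that is right, some form of coalescing in the monitor would help a lot. Purely as an illustration (this is not sssd code, sssd 1.11 has no such option, and all names here are made up), a rate-limiter like this would turn our ~1245 events per second into at most one downstream signal per second:

```python
import time

class Coalescer:
    """Collapse bursts of change events into one downstream signal.

    Illustrative sketch only -- sssd has no such knob; the class and
    parameter names are invented for this example.
    """
    def __init__(self, min_interval, signal_fn, clock=time.monotonic):
        self.min_interval = min_interval  # seconds between signals
        self.signal_fn = signal_fn        # e.g. "tell providers to reset"
        self.clock = clock
        self._last = float("-inf")
        self.suppressed = 0               # events swallowed since start

    def on_event(self):
        now = self.clock()
        if now - self._last >= self.min_interval:
            self._last = now
            self.signal_fn()
        else:
            self.suppressed += 1
```

With a one-second floor, a burst of 1245 route-change notifications would produce a single "network changed" signal instead of 1245 dbus messages.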
Post by Lukas Slebodnik
The real error is in sssd_$domain.log
neither sssd.log nor sssd_nss.log will help you.
I see nothing wrong in the domain logs. I've included samples of
normal operation before the failure this morning in the domain log
below in any case. If there's anything further I can provide let me
know.

Cheers,

Patrick


(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]]
[sdap_initgr_rfc2307_next_base] (0x0400): Searching for groups with
base [dc=meraki,dc=com]
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]]
[sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with
[(&(memberuid=blinken)(objectClass=posixGroup)(cn=*)(&(gidNumber=*)(!(gidNumber=0))))][dc=meraki,dc=com].
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]]
[sdap_get_generic_ext_done] (0x0400): Search result: Success(0), no
errmsg set
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]]
[sdap_get_groups_next_base] (0x0400): Searching for groups with base
[dc=meraki,dc=com]
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]]
[sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with
[(&(gidNumber=1111)(objectClass=posixGroup)(cn=*)(&(gidNumber=*)(!(gidNumber=0))))][dc=meraki,dc=com].
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]]
[sdap_get_generic_ext_done] (0x0400): Search result: Success(0), no
errmsg set
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]]
[sdap_get_groups_process] (0x0400): Search for groups, returned 1
results.
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]]
[sdap_attrs_get_sid_str] (0x0080): No [objectSID] attribute while
id-mapping. [0][Success]
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]]
[sdap_get_primary_name] (0x0400): Processing object blinken
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]] [sdap_save_group]
(0x0400): Processing group blinken
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]] [sdap_save_group]
(0x0400): Original USN value is not available for [blinken].
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]]
[sdap_process_ghost_members] (0x0400): Group has 0 members
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]] [sdap_save_group]
(0x0400): Storing info for group blinken
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]]
[sysdb_search_group_by_name] (0x0400): No such entry
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]]
[sysdb_search_group_by_gid] (0x0400): No such entry
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]] [acctinfo_callback]
(0x0100): Request processed. Returned 0,0,Success
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]] [be_get_account_info]
(0x0100): Got request for [4099][1][name=blinken]
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]] [be_req_set_domain]
(0x0400): Changing request domain from [ourdomain] to [ourdomain]
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]]
[sdap_get_initgr_next_base] (0x0400): Searching for users with base
[dc=meraki,dc=com]
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]]
[sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with
[(&(uid=blinken)(objectclass=posixAccount)(&(uidNumber=*)(!(uidNumber=0))))][dc=meraki,dc=com].
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]]
[sdap_get_generic_ext_done] (0x0400): Search result: Success(0), no
errmsg set
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]] [sdap_save_user]
(0x0400): Save user
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]]
[sdap_attrs_get_sid_str] (0x0080): No [objectSID] attribute while
id-mapping. [0][Success]
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]]
[sdap_get_primary_name] (0x0400): Processing object blinken
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]] [sdap_save_user]
(0x0400): Processing user blinken
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]] [sdap_save_user]
(0x0400): Original memberOf is not available for [blinken].
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]] [sdap_save_user]
(0x0400): Original USN value is not available for [blinken].
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]] [sdap_save_user]
(0x0400): User principal is not available for [blinken].
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]] [sdap_save_user]
(0x0400): Storing info for user blinken
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]]
[sysdb_search_user_by_name] (0x0400): No such entry
(Wed Mar 23 06:54:04 2016) [sssd[be[ourdomain]]]
[sysdb_search_user_by_uid] (0x0400): No such entry
(Wed Mar 23 06:54:05 2016) [sssd[be[ourdomain]]]
[sdap_initgr_rfc2307_next_base] (0x0400): Searching for groups with
base [dc=meraki,dc=com]
(Wed Mar 23 06:54:05 2016) [sssd[be[ourdomain]]]
[sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with
[(&(memberuid=blinken)(objectClass=posixGroup)(cn=*)(&(gidNumber=*)(!(gidNumber=0))))][dc=meraki,dc=com].
(Wed Mar 23 06:54:05 2016) [sssd[be[ourdomain]]]
[sdap_get_generic_ext_done] (0x0400): Search result: Success(0), no
errmsg set
(Wed Mar 23 06:54:05 2016) [sssd[be[ourdomain]]]
[sdap_get_groups_next_base] (0x0400): Searching for groups with base
[dc=meraki,dc=com]
(Wed Mar 23 06:54:05 2016) [sssd[be[ourdomain]]]
[sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with
[(&(gidNumber=1111)(objectClass=posixGroup)(cn=*)(&(gidNumber=*)(!(gidNumber=0))))][dc=meraki,dc=com].
(Wed Mar 23 06:54:05 2016) [sssd[be[ourdomain]]]
[sdap_get_generic_ext_done] (0x0400): Search result: Success(0), no
errmsg set
(Wed Mar 23 06:54:05 2016) [sssd[be[ourdomain]]]
[sdap_get_groups_process] (0x0400): Search for groups, returned 1
results.
(Wed Mar 23 06:54:05 2016) [sssd[be[ourdomain]]]
[sdap_attrs_get_sid_str] (0x0080): No [objectSID] attribute while
id-mapping. [0][Success]
(Wed Mar 23 06:54:05 2016) [sssd[be[ourdomain]]]
[sdap_get_primary_name] (0x0400): Processing object blinken
(Wed Mar 23 06:54:05 2016) [sssd[be[ourdomain]]] [sdap_save_group]
(0x0400): Processing group blinken
(Wed Mar 23 06:54:05 2016) [sssd[be[ourdomain]]] [sdap_save_group]
(0x0400): Original USN value is not available for [blinken].
(Wed Mar 23 06:54:05 2016) [sssd[be[ourdomain]]]
[sdap_process_ghost_members] (0x0400): Group has 0 members
(Wed Mar 23 06:54:05 2016) [sssd[be[ourdomain]]] [sdap_save_group]
(0x0400): Storing info for group blinken
(Wed Mar 23 06:54:05 2016) [sssd[be[ourdomain]]]
[sysdb_search_group_by_name] (0x0400): No such entry
(Wed Mar 23 06:54:05 2016) [sssd[be[ourdomain]]]
[sysdb_search_group_by_gid] (0x0400): No such entry
Lukas Slebodnik
2016-03-23 19:18:14 UTC
Permalink
Post by Patrick Coleman
Hi Lukas,
Post by Patrick Coleman
$ grep network_status_change_cb sssd.log | grep '06:59:12' | wc -l
1245
I believe the root cause of our problem is that sssd is receiving more
than a thousand netlink notifications a second - because of IPv6 route
updates - per above. While this might be something we can reduce on
our end, it would be great if there was some way to unsubscribe sssd
from receiving these.
Any ideas here appreciated - eg. if netlink is only used to get a more
rapid notification of network changes, could we remove the call to
setup_netlink in src/monitor/monitor.c in a local build? Does sssd
rely on netlink for anything else?
ahh, netlink might explain your observation.

It isn't possible to change it at runtime, IIRC.
You might recompile sssd with --with-libnl=no

LS
Patrick Coleman
2016-03-24 17:55:25 UTC
Permalink
Post by Lukas Slebodnik
ahh, netlink might explain your observation.
It isn't possible to change it at runtime, IIRC.
You might recompile sssd with --with-libnl=no
Thanks Lukas, rebuilding the package without netlink seems to have
fixed the issue. Will test further next week.

Appreciate the quick response.

Cheers,

Patrick
