Dear all,
I’m facing a problem similar to the one described in "Job gets stuck in a queue after a fresh install". However, the advice given there did not resolve it, and the behaviour differs in some details.
I have installed PBS (master branch from GitHub, the version passing the CI tests on Travis) on Ubuntu 20.04 using the commands given in INSTALL. The intended use is to run PBS on a local machine, both to debug scripts before submitting them to a cluster and to schedule jobs on that same machine.
I was able to configure users, queues, etc. The behaviour is quite similar to the one in "Job gets stuck in a queue after a fresh install", with one difference: the log files seem to suggest that PBS is unable to authenticate (error 15008). Unfortunately, I was unable to resolve this issue.
badin@fermi:~$ sudo /etc/init.d/pbs start
Starting PBS
/opt/pbs/sbin/pbs_comm ready (pid=457933), Proxy Name:fermi:17001, Threads:4
PBS comm
PBS mom
PBS sched
pgrep: cannot allocate 4611686018427387903 bytes
Connecting to PBS dataservice...connected to PBS dataservice@fermi
Licenses valid for 1000000 Floating hosts
PBS server
Everything seems to work …
badin@fermi:~$ sudo /etc/init.d/pbs status
pbs_server is pid 458087
pbs_mom is pid 457943
pbs_sched is pid 457955
pbs_comm is 457933
However, after submitting a simple job to the queue, the job stays in the queue …
badin@fermi:~/...$ qstat
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
1012.fermi job_name badin 0 Q batch
The server, scheduler and communicator sometimes go down and sometimes stay up, but in either case the job stays in the queue.
server_logs:
02/01/2021 15:10:41;0002;Server@fermi;Svr;Log;Log opened
02/01/2021 15:10:41;0002;Server@fermi;Svr;Server@fermi;pbs_version=20.0.0
02/01/2021 15:10:41;0002;Server@fermi;Svr;Server@fermi;pbs_build=mach=N/A:security=N/A:configure_args=N/A
02/01/2021 15:10:41;0002;Server@fermi;Svr;Server@fermi;hostname=fermi;pbs_leaf_name=N/A;pbs_mom_node_name=N/A
02/01/2021 15:10:41;0002;Server@fermi;Svr;Server@fermi;ipv4 interface lo: localhost
02/01/2021 15:10:41;0002;Server@fermi;Svr;Server@fermi;ipv4 interface enp4s0: fermi
02/01/2021 15:10:41;0002;Server@fermi;Svr;Server@fermi;ipv6 interface lo: ip6-loopback
02/01/2021 15:10:41;0002;Server@fermi;Svr;Server@fermi;ipv6 interface enp4s0: fermi
02/01/2021 15:10:41;0006;Server@fermi;Fil;Server@fermi;Version 20.0.0, started, initialization type = 1
02/01/2021 15:10:41;0002;Server@fermi;Svr;Server@fermi;pbs_status_db exit code 1
02/01/2021 15:10:41;0002;Server@fermi;Svr;Server@fermi;Starting PBS dataservice
02/01/2021 15:10:44;0002;Server@fermi;Svr;Server@fermi;connected to PBS dataservice@fermi
02/01/2021 15:10:44;0086;Server@fermi;Svr;pbs_python_ext_quick_start_interpreter;--> Python Interpreter quick started, compiled with version:'3.8.5 (default, Jul 28 2020, 12:59:40)
[GCC 9.3.0]' <--
02/01/2021 15:10:44;0086;Server@fermi;Svr;pbs_python_ext_quick_start_interpreter;--> Inserted Altair PBS Python modules dir '/opt/pbs/lib/python/altair' '/opt/pbs/lib/python/altair/pbs/v1'<--
02/01/2021 15:10:44;0002;Server@fermi;n/a;setup_env;read environment from /var/spool/pbs/pbs_environment
02/01/2021 15:10:44;0000;Server@fermi;Svr;Server@fermi;Supported authentication method: resvport
02/01/2021 15:10:44;0004;Server@fermi;Svr;Server@fermi;node_fail_requeue value changed to 310
02/01/2021 15:10:44;0004;Server@fermi;Svr;Server@fermi;svr_max_job_sequence_id set to val 9999999
02/01/2021 15:10:44;0004;Server@fermi;Req;default;'throughput_mode' is being deprecated, it is recommended to use 'job_run_wait'
02/01/2021 15:10:44;0004;Server@fermi;Svr;Server@fermi;Licenses valid for 1000000 Floating hosts
02/01/2021 15:10:44;0002;Server@fermi;Svr;Act;Account file /var/spool/p
02/01/2021 15:10:44;0086;Server@fermi;Svr;Server@fermi;Recovered queue batch
02/01/2021 15:10:44;0086;Server@fermi;Svr;Server@fermi;Recovered queue prior
02/01/2021 15:10:44;0100;Server@fermi;Job;1012.fermi;enqueuing into batch, state Q hop 1
02/01/2021 15:10:44;0086;Server@fermi;Job;1012.fermi;Requeueing job, substate: 10 Requeued in queue: batch
02/01/2021 15:10:44;0080;Server@fermi;Svr;Server@fermi;No jobs to open
02/01/2021 15:10:44;0002;Server@fermi;Svr;Server@fermi;Recovered 1 jobs
02/01/2021 15:10:44;0086;Server@fermi;Svr;Server@fermi;Found hook PBS_cray_atom type=pbs
02/01/2021 15:10:44;0086;Server@fermi;Svr;Server@fermi;Found hook PBS_power type=pbs
02/01/2021 15:10:44;0086;Server@fermi;Svr;Server@fermi;Found hook PBS_alps_inventory_check type=pbs
02/01/2021 15:10:44;0086;Server@fermi;Svr;Server@fermi;Found hook pbs_cgroups type=site
02/01/2021 15:10:44;0080;Server@fermi;Hook;print_hook;ALLHOOKS hook[0] = {PBS_cray_atom, order=100, type=1, enabled=0 user=0, debug=(0) fail_action=(2), event=(execjob_begin,execjob_end), alarm=300, freq=120}
02/01/2021 15:10:44;0080;Server@fermi;Hook;print_hook;ALLHOOKS hook[1] = {PBS_power, order=2000, type=1, enabled=0 user=0, debug=(0) fail_action=(1), event=(periodic,execjob_begin,execjob_prologue,execjob_epilogue,execjob_end,exechost_periodic,exechost_startup), alarm=180, freq=300}
02/01/2021 15:10:44;0080;Server@fermi;Hook;print_hook;ALLHOOKS hook[2] = {PBS_alps_inventory_check, order=1, type=1, enabled=0 user=0, debug=(0) fail_action=(1), event=(exechost_periodic), alarm=90, freq=300}
02/01/2021 15:10:44;0080;Server@fermi;Hook;print_hook;ALLHOOKS hook[3] = {pbs_cgroups, order=100, type=0, enabled=0 user=0, debug=(0) fail_action=(2), event=(execjob_begin,execjob_epilogue,execjob_end,execjob_launch,execjob_attach,execjob_resize,execjob_abort,execjob_postsuspend,execjob_preresume,exechost_periodic,exechost_startup), alarm=90, freq=120}
02/01/2021 15:10:44;0080;Server@fermi;Hook;print_hook;periodic hook[0] = {PBS_power, order=2000, type=1, enabled=0 user=0, debug=(0) fail_action=(1), event=(periodic,execjob_begin,execjob_prologue,execjob_epilogue,execjob_end,exechost_periodic,exechost_startup), alarm=180, freq=300}
02/01/2021 15:10:44;0086;Server@fermi;Svr;pbs_python_ext_quick_shutdown_interpreter;--> Stopping Python interpreter <--
02/01/2021 15:10:44;0d80;Server@fermi;TPP;Server@fermi(Main Thread);TPP authentication method = resvport
02/01/2021 15:10:44;0c06;Server@fermi;TPP;Server@fermi(Main Thread);TPP leaf node names = 192.168.0.100:15001,127.0.0.1:15001,192.168.0.100:15001
02/01/2021 15:10:44;0d80;Server@fermi;TPP;Server@fermi(Main Thread);Initializing TPP transport Layer
02/01/2021 15:10:44;0d80;Server@fermi;TPP;Server@fermi(Main Thread);Max files allowed = 16384
02/01/2021 15:10:44;0d80;Server@fermi;TPP;Server@fermi(Main Thread);TPP initialization done
02/01/2021 15:10:44;0d80;Server@fermi;TPP;Server@fermi(Main Thread);Connecting to pbs_comm fermi:17001
02/01/2021 15:10:44;0002;Server@fermi;Svr;Server@fermi;Server pid = 471827 ready; using ports Server:15001 MOM:15002 RM:15003
02/01/2021 15:10:44;0c06;Server@fermi;TPP;Server@fermi(Thread 0);Thread ready
02/01/2021 15:10:44;0c06;Server@fermi;TPP;Server@fermi(Thread 0);Registering address 192.168.0.100:15001 to pbs_comm fermi:17001
02/01/2021 15:10:44;0c06;Server@fermi;TPP;Server@fermi(Thread 0);Connected to pbs_comm fermi:17001
02/01/2021 15:10:44;0106;Server@fermi;Svr;Server@fermi;BEGIN setting up all resource attributes
02/01/2021 15:10:44;0106;Server@fermi;Svr;Server@fermi;DONE setting up all resource attributes, number set <51>
02/01/2021 15:10:44;0106;Server@fermi;Svr;Server@fermi;BEGIN setting up all queue attributes
02/01/2021 15:10:44;0106;Server@fermi;Svr;Server@fermi;DONE setting up all queue attributes, number set <56>
02/01/2021 15:10:44;0106;Server@fermi;Svr;Server@fermi;BEGIN setting up all job attributes
02/01/2021 15:10:44;0106;Server@fermi;Svr;Server@fermi;DONE setting up all job attributes, number set <110>
02/01/2021 15:10:44;0106;Server@fermi;Svr;Server@fermi;BEGIN setting up all server attributes
02/01/2021 15:10:44;0106;Server@fermi;Svr;Server@fermi;DONE setting up all server attributes, number set <101>
02/01/2021 15:10:44;0106;Server@fermi;Svr;Server@fermi;BEGIN setting up all reservation attributes
02/01/2021 15:10:44;0106;Server@fermi;Svr;Server@fermi;DONE setting up all reservation attributes, number set <48>
02/01/2021 15:10:44;0106;Server@fermi;Svr;Server@fermi;BEGIN setting up all vnode attributes
02/01/2021 15:10:44;0106;Server@fermi;Svr;Server@fermi;DONE setting up all vnode attributes, number set <36>
02/01/2021 15:10:44;0080;Server@fermi;Svr;Server@fermi;successfully set up signal.default_int_handler
02/01/2021 15:10:44;0001;Server@fermi;Svr;net_restore_handler;net restore handler called
02/01/2021 15:10:45;0100;Server@fermi;Req;;Type 0 request received from root@localhost, sock=16
02/01/2021 15:10:45;0100;Server@fermi;Req;;Type 95 request received from root@localhost, sock=17
02/01/2021 15:10:45;0100;Server@fermi;Req;;Type 0 request received from root@localhost, sock=17
02/01/2021 15:10:45;0100;Server@fermi;Req;;Type 95 request received from root@localhost, sock=18
02/01/2021 15:10:45;0100;Server@fermi;Req;;Type 98 request received from root@localhost, sock=16
with the last message denoting error code 15008:
02/01/2021 15:10:45;00a0;Server@fermi;Req;req_reject;Reject reply code=15008, aux=0, type=98, from root@localhost
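For anyone who wants to look for the same rejection in their own logs, here is a minimal Python sketch. It assumes only what the lines above show: each record has six semicolon-separated fields, and the reject message has the form "Reject reply code=..., aux=..., type=..., from ...". The names `LogRecord`, `parse_line` and `find_rejects` are hypothetical helpers, not part of PBS.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Tuple


class LogRecord(NamedTuple):
    timestamp: str
    event: str
    daemon: str
    obj_type: str
    obj_name: str
    message: str


def parse_line(line: str) -> Optional[LogRecord]:
    """Split one log line into its six fields; return None for other lines."""
    # Split at most five times so semicolons inside the message are preserved.
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None  # e.g. a continuation line such as "[GCC 9.3.0]' <--"
    return LogRecord(*parts)


# Pattern taken from the req_reject line shown above.
REJECT = re.compile(r"Reject reply code=(\d+), aux=(\d+), type=(\d+), from (\S+)")


def find_rejects(lines: Iterable[str]) -> List[Tuple[str, int, int, str]]:
    """Return (timestamp, reply code, request type, requester) per reject."""
    found = []
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        match = REJECT.search(record.message)
        if match:
            found.append(
                (record.timestamp, int(match.group(1)),
                 int(match.group(3)), match.group(4))
            )
    return found
```

Passing an open log file to `find_rejects` returns one tuple per rejected request, which makes it easier to see which requester and request type trigger code 15008.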
mom_logs:
02/01/2021 15:10:41;0002;pbs_mom;Svr;Log;Log opened
02/01/2021 15:10:41;0002;pbs_mom;Svr;pbs_mom;pbs_version=20.0.0
02/01/2021 15:10:41;0002;pbs_mom;Svr;pbs_mom;pbs_build=mach=N/A:security=N/A:configure_args=N/A
02/01/2021 15:10:41;0002;pbs_mom;Svr;pbs_mom;hostname=fermi;pbs_leaf_name=N/A;pbs_mom_node_name=N/A
02/01/2021 15:10:41;0002;pbs_mom;Svr;pbs_mom;ipv4 interface lo: localhost
02/01/2021 15:10:41;0002;pbs_mom;Svr;pbs_mom;ipv4 interface enp4s0: fermi
02/01/2021 15:10:41;0002;pbs_mom;Svr;pbs_mom;ipv6 interface lo: ip6-loopback
02/01/2021 15:10:41;0002;pbs_mom;Svr;pbs_mom;ipv6 interface enp4s0: fermi
02/01/2021 15:10:41;0100;pbs_mom;Svr;parse_config;file config
02/01/2021 15:10:41;0002;pbs_mom;Svr;pbs_mom;Adding IP address 127.0.1.1 as authorized
02/01/2021 15:10:41;0002;pbs_mom;Svr;pbs_mom;Adding IP address 192.168.0.100 as authorized
02/01/2021 15:10:41;0002;pbs_mom;n/a;set_restrict_user_maxsys;setting 999
02/01/2021 15:10:41;0002;pbs_mom;n/a;read_config;max_check_poll = 120, min_check_poll = 10
02/01/2021 15:10:41;0002;pbs_mom;Svr;pbs_mom;Adding IP address 127.0.0.1 as authorized
02/01/2021 15:10:41;0002;pbs_mom;Svr;set_checkpoint_path;Using default checkpoint path.
...
02/01/2021 15:10:41;0002;pbs_mom;n/a;ncpus;hyperthreading enabled
02/01/2021 15:10:41;0002;pbs_mom;n/a;initialize;pcpus=32, OS reports 32 cpu(s)
02/01/2021 15:10:41;0d80;pbs_mom;TPP;pbs_mom(Main Thread);TPP authentication method = resvport
02/01/2021 15:10:41;0c06;pbs_mom;TPP;pbs_mom(Main Thread);TPP leaf node names = 192.168.0.100:15003,127.0.0.1:15003,192.168.0.100:15003
02/01/2021 15:10:41;0d80;pbs_mom;TPP;pbs_mom(Main Thread);Initializing TPP transport Layer
02/01/2021 15:10:41;0d80;pbs_mom;TPP;pbs_mom(Main Thread);Max files allowed = 16384
02/01/2021 15:10:41;0d80;pbs_mom;TPP;pbs_mom(Main Thread);TPP initialization done
02/01/2021 15:10:41;0d80;pbs_mom;TPP;pbs_mom(Main Thread);Connecting to pbs_comm fermi:17001
02/01/2021 15:10:41;0c06;pbs_mom;TPP;pbs_mom(Thread 0);Thread ready
02/01/2021 15:10:41;0006;pbs_mom;Fil;pbs_mom;Version 20.0.0, started, initialization type = 0
02/01/2021 15:10:41;0002;pbs_mom;Svr;pbs_mom;Mom pid = 471684 ready, using ports Server:15001 MOM:15002 RM:15003
02/01/2021 15:10:41;0c06;pbs_mom;TPP;pbs_mom(Thread 0);Registering address 192.168.0.100:15003 to pbs_comm fermi:17001
02/01/2021 15:10:41;0c06;pbs_mom;TPP;pbs_mom(Thread 0);Connected to pbs_comm fermi:17001
02/01/2021 15:10:41;0001;pbs_mom;Svr;net_restore_handler;net restore handler called
02/01/2021 15:10:43;0002;pbs_mom;Svr;pbs_mom;HELLO sent to server at fermi:15001, stream:0
02/01/2021 15:10:43;0001;pbs_mom;Svr;pbs_mom;im_eof, Premature end of message from addr 192.168.0.100:15001 on stream 0
02/01/2021 15:10:43;0002;pbs_mom;Svr;im_eof;Server closed connection.
02/01/2021 15:10:47;0002;pbs_mom;Svr;pbs_mom;HELLO sent to server at fermi:15001, stream:1
02/01/2021 15:10:47;0002;pbs_mom;Svr;pbs_mom;ReplyHello from server at 192.168.0.100:15001
02/01/2021 15:15:28;0001;pbs_mom;Svr;pbs_mom;im_eof, Premature end of message from addr 192.168.0.100:15001 on stream 1
02/01/2021 15:15:28;0002;pbs_mom;Svr;im_eof;Server closed connection.
02/01/2021 15:15:28;0002;pbs_mom;Svr;pbs_mom;HELLO sent to server at fermi:15001, stream:2
02/01/2021 15:15:28;0001;pbs_mom;Svr;pbs_mom;im_eof, Premature end of message from addr 192.168.0.100:15001 on stream 2
02/01/2021 15:15:28;0002;pbs_mom;Svr;im_eof;Server closed connection.
sched_logs:
02/01/2021 15:40:15;0002;pbs_sched;Svr;Log;Log opened
02/01/2021 15:40:15;0002;pbs_sched;Svr;pbs_sched;pbs_version=20.0.0
02/01/2021 15:40:15;0002;pbs_sched;Svr;pbs_sched;pbs_build=mach=N/A:security=N/A:configure_args=N/A
02/01/2021 15:40:15;0002;pbs_sched;Svr;pbs_sched;hostname=fermi;pbs_leaf_name=N/A;pbs_mom_node_name=N/A
02/01/2021 15:40:15;0002;pbs_sched;Svr;pbs_sched;ipv4 interface lo: localhost
02/01/2021 15:40:15;0002;pbs_sched;Svr;pbs_sched;ipv4 interface enp4s0: fermi
02/01/2021 15:40:15;0002;pbs_sched;Svr;pbs_sched;ipv6 interface lo: ip6-loopback
02/01/2021 15:40:15;0002;pbs_sched;Svr;pbs_sched;ipv6 interface enp4s0: fermi
02/01/2021 15:40:15;0002;pbs_sched;n/a;setup_env;read environment from /var/spool/pbs/pbs_environment
02/01/2021 15:40:15;0006;pbs_sched;Fil;pbs_sched;Version 20.0.0, started, initialization type = 0
02/01/2021 15:40:15;0002;pbs_sched;Svr;sched_main;/opt/pbs/sbin/pbs_sched startup pid 476837
02/01/2021 15:40:15;0040;pbs_sched;Fil;fairshare usage;Creating usage database for fairshare
02/01/2021 15:40:15;0080;pbs_sched;Req;;Launching 16 worker threads
02/01/2021 15:40:19;0001;pbs_sched;Svr;pbs_sched;Access from host not allowed, or unknown host (15008) in connect_svrpool, Couldn't register the scheduler default with the configured servers
02/01/2021 15:40:21;0001;pbs_sched;Svr;pbs_sched;Access from host not allowed, or unknown host (15008) in connect_svrpool, Couldn't register the scheduler default with the configured servers
02/01/2021 15:40:23;0001;pbs_sched;Svr;pbs_sched;Access from host not allowed, or unknown host (15008) in connect_svrpool, Couldn't register the scheduler default with the configured servers
02/01/2021 15:40:25;0001;pbs_sched;Svr;pbs_sched;Access from host not allowed, or unknown host (15008) in connect_svrpool, Couldn't register the scheduler default with the configured servers
02/01/2021 15:40:27;0001;pbs_sched;Svr;pbs_sched;Access from host not allowed, or unknown host (15008) in connect_svrpool, Couldn't register the scheduler default with the configured servers
comm_logs:
02/01/2021 15:40:15;0002;Comm@fermi;Svr;Log;Log opened
02/01/2021 15:40:15;0002;Comm@fermi;Svr;Comm@fermi;pbs_version=20.0.0
02/01/2021 15:40:15;0002;Comm@fermi;Svr;Comm@fermi;pbs_build=mach=N/A:security=N/A:configure_args=N/A
02/01/2021 15:40:15;0002;Comm@fermi;Svr;Comm@fermi;hostname=fermi;pbs_leaf_name=N/A;pbs_mom_node_name=N/A
02/01/2021 15:40:15;0002;Comm@fermi;Svr;Comm@fermi;ipv4 interface lo: localhost
02/01/2021 15:40:15;0002;Comm@fermi;Svr;Comm@fermi;ipv4 interface enp4s0: fermi
02/01/2021 15:40:15;0002;Comm@fermi;Svr;Comm@fermi;ipv6 interface lo: ip6-loopback
02/01/2021 15:40:15;0002;Comm@fermi;Svr;Comm@fermi;ipv6 interface enp4s0: fermi
02/01/2021 15:40:15;0002;Comm@fermi;Svr;Comm@fermi;/opt/pbs/sbin/pbs_comm ready (pid=476815), Proxy Name:fermi:17001, Threads:4
02/01/2021 15:40:15;0000;Comm@fermi;Svr;Comm@fermi;Supported authentication method: resvport
02/01/2021 15:40:15;0c06;Comm@fermi;TPP;Comm@fermi(Thread 1);Thread ready
02/01/2021 15:40:15;0c06;Comm@fermi;TPP;Comm@fermi(Thread 0);Thread ready
02/01/2021 15:40:15;0c06;Comm@fermi;TPP;Comm@fermi(Thread 2);Thread ready
02/01/2021 15:40:15;0c06;Comm@fermi;TPP;Comm@fermi(Thread 3);Thread ready
02/01/2021 15:40:15;0c06;Comm@fermi;TPP;Comm@fermi(Thread 1);tfd=14, Leaf registered address 192.168.0.100:15003
02/01/2021 15:40:18;0c06;Comm@fermi;TPP;Comm@fermi(Thread 2);tfd=16, Leaf registered address 192.168.0.100:15001
02/01/2021 15:46:14;0c06;Comm@fermi;TPP;Comm@fermi(Thread 2);tfd=16, Connection from leaf 192.168.0.100:15001 down
02/01/2021 15:46:14;0c06;Comm@fermi;TPP;Comm@fermi(Thread 1);tfd=14, Connection from leaf 192.168.0.100:15003 down
02/01/2021 15:46:14;0001;Comm@fermi;Svr;Comm@fermi;stop_me, Caught signal 15
My configuration:
badin@fermi:~$ qmgr -c "p s"
#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch Priority = 50
set queue batch enabled = True
set queue batch started = True
#
# Create and define queue prior
#
create queue prior
set queue prior queue_type = Execution
set queue prior Priority = 1000
set queue prior enabled = True
set queue prior started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = localhost
set server acl_hosts += fermi
set server acl_users = badin@localhost
set server acl_users += badin@fermi
set server acl_users += root@localhost
set server acl_roots = badin@localhost
set server acl_roots += badin@fermi
set server acl_roots += root@localhost
set server managers = badin@fermi
set server managers += root@localhost
set server operators = badin@fermi
set server operators += root@localhost
set server default_queue = batch
set server log_events = 511
set server mailer = /usr/sbin/sendmail
set server mail_from = adm
set server query_other_jobs = True
set server resources_default.ncpus = 1
set server resources_default.nodect = 1
set server resources_default.nodes = 1
set server default_chunk.ncpus = 1
set server scheduler_iteration = 600
set server node_pack = False
set server flatuid = True
set server resv_enable = True
set server node_fail_requeue = 310
set server max_array_size = 10000
set server pbs_license_min = 0
set server pbs_license_max = 2147483647
set server pbs_license_linger_time = 31536000
set server eligible_time_enable = False
set server max_concurrent_provision = 5
set server max_job_sequence_id = 9999999
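To compare the ACL entries with the requester names in the logs (for example root@localhost), the server attributes can be collected into lists. This is a minimal Python sketch that relies only on the "set server NAME = VALUE" and "set server NAME += VALUE" line shapes visible in the output above. The function name `server_attributes` is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches e.g. "set server acl_hosts = localhost" and "set server acl_hosts += fermi".
SET_SERVER = re.compile(r"^set server (\S+) (\+?=) (.+)$")


def server_attributes(qmgr_output: str) -> Dict[str, List[str]]:
    """Collect server attributes from printed qmgr output into lists of values."""
    attrs: Dict[str, List[str]] = defaultdict(list)
    for line in qmgr_output.splitlines():
        match = SET_SERVER.match(line.strip())
        if not match:
            continue  # comments, queue definitions, blank lines
        name, operator, value = match.groups()
        if operator == "=":
            attrs[name] = [value]      # plain assignment replaces the list
        else:
            attrs[name].append(value)  # "+=" appends to the existing list
    return dict(attrs)
```

With the configuration above, `server_attributes(text)["acl_hosts"]` gives the host names the server lists, which can then be checked against the names the host actually resolves to.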
/etc/pbs.conf:
PBS_EXEC=/opt/pbs
PBS_SERVER=fermi
PBS_START_SERVER=1
PBS_START_SCHED=1
PBS_START_COMM=1
PBS_START_MOM=1
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=16
PBS_SCP=/usr/bin/scp
/etc/hosts:
127.0.0.1 localhost
127.0.1.1 fermi
192.168.0.100 fermi
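In this file the name fermi is mapped to two addresses. Whether that is related to the 15008 rejection is not established here, but such names are easy to list. This is a minimal Python sketch that parses hosts-file text; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def addresses_by_name(hosts_text: str) -> Dict[str, List[str]]:
    """Map each host name in hosts-file text to the addresses listed for it."""
    table: Dict[str, List[str]] = defaultdict(list)
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            if address not in table[name]:
                table[name].append(address)
    return dict(table)


def ambiguous_names(hosts_text: str) -> Dict[str, List[str]]:
    """Return only the names that are mapped to more than one address."""
    return {
        name: addrs
        for name, addrs in addresses_by_name(hosts_text).items()
        if len(addrs) > 1
    }
```

Running `ambiguous_names(open("/etc/hosts").read())` on the machine reports every name with more than one mapping, in the order the addresses appear in the file.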
/etc/hosts_equiv:
+ badin
/var/spool/pbs/mom_priv/config:
$clienthost fermi
$restrict_user_maxsysid 999
Check for host resolvability:
badin@fermi:~$ pbs_hostn -v fermi
primary name: fermi (from gethostbyname())
aliases: -none-
address length: 4 bytes
address: 127.0.1.1 (16842879 dec) name: fermi
address: 192.168.0.100 (1677764800 dec) name: fermi
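The decimal values in this output look like the four address bytes read as a little-endian 32-bit integer. That is an assumption inferred from the two values shown, not something taken from the pbs_hostn documentation. A minimal Python sketch to decode them, and to repeat the lookup with the standard resolver, could look like this (the function names are hypothetical):

```python
import socket
import struct
from typing import List


def decode_dec(value: int) -> str:
    """Turn a decimal value as printed above back into dotted-quad form.

    Assumes the integer holds the address bytes in little-endian order.
    """
    return socket.inet_ntoa(struct.pack("<I", value))


def resolve(name: str) -> List[str]:
    """Return the IPv4 addresses the system resolver reports for a name."""
    _canonical, _aliases, addresses = socket.gethostbyname_ex(name)
    return addresses
```

Calling `resolve("fermi")` on the affected machine shows which addresses, and in which order, other programs get for the host name.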
nmaps:
badin@fermi:~$ nmap 127.0.0.1
Starting Nmap 7.80 ( https://nmap.org ) at 2021-02-01 16:05 CET
Nmap scan report for localhost (127.0.0.1)
Host is up (0.000035s latency).
Not shown: 992 closed ports
PORT STATE SERVICE
22/tcp open ssh
25/tcp open smtp
111/tcp open rpcbind
587/tcp open submission
631/tcp open ipp
5432/tcp open postgresql
15002/tcp open onep-tls
15003/tcp open unknown
badin@fermi:~$ nmap 192.168.0.100
Starting Nmap 7.80 ( https://nmap.org ) at 2021-02-01 16:05 CET
Nmap scan report for fermi (192.168.0.100)
Host is up (0.000036s latency).
Not shown: 996 closed ports
PORT STATE SERVICE
22/tcp open ssh
111/tcp open rpcbind
15002/tcp open onep-tls
15003/tcp open unknown
badin@fermi:~$ nmap fermi
Starting Nmap 7.80 ( https://nmap.org ) at 2021-02-01 16:06 CET
Nmap scan report for fermi (127.0.1.1)
Host is up (0.000036s latency).
Other addresses for fermi (not scanned): 192.168.0.100
Not shown: 996 closed ports
PORT STATE SERVICE
22/tcp open ssh
111/tcp open rpcbind
15002/tcp open onep-tls
15003/tcp open unknown
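By default nmap scans only its 1000 most common ports, so the absence of 15001 and 17001 from the lists above may or may not mean that those ports are closed. A direct TCP connection attempt gives a clearer answer. This is a minimal Python sketch with hypothetical function names; the port numbers in the usage note are the ones that appear in the logs above.

```python
import socket
from typing import Dict, Iterable


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and resolution failures.
        return False


def check_ports(host: str, ports: Iterable[int]) -> Dict[int, bool]:
    """Check several ports on one host and report the result per port."""
    return {port: port_open(host, port) for port in ports}
```

For this setup the call would be `check_ports("fermi", [15001, 15002, 15003, 17001])`, repeated for 127.0.0.1 and 192.168.0.100 to see whether the results differ per address.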
Ubuntu firewall:
badin@fermi:~$ sudo ufw status verbose
Status: active
Logging: on (low)
Default: deny (incoming), allow (outgoing), disabled (routed)
New profiles: skip
To Action From
-- ------ ----
22/tcp ALLOW IN Anywhere
15002 ALLOW IN Anywhere
15003 ALLOW IN Anywhere
15001 ALLOW IN Anywhere
17001 ALLOW IN Anywhere
22/tcp (v6) ALLOW IN Anywhere (v6)
15002 (v6) ALLOW IN Anywhere (v6)
15003 (v6) ALLOW IN Anywhere (v6)
15001 (v6) ALLOW IN Anywhere (v6)
17001 (v6) ALLOW IN Anywhere (v6)
pbsnodes -a
badin@fermi:~$ pbsnodes -a
fermi
Mom = fermi
ntype = PBS
state = free
pcpus = 32
resources_available.arch = linux
resources_available.host = fermi
resources_available.mem = 65832120kb
resources_available.ncpus = 16
resources_available.vnode = fermi
resources_assigned.accelerator_memory = 0kb
resources_assigned.hbmem = 0kb
resources_assigned.mem = 0kb
resources_assigned.naccelerators = 0
resources_assigned.ncpus = 0
resources_assigned.vmem = 0kb
resv_enable = True
sharing = default_shared
license = l
last_state_change_time = Mon Feb 1 16:03:56 2021
The address 192.168.0.100 is assigned by the local router. I understand that the problem lies in
02/01/2021 15:46:14;0c06;Comm@fermi;TPP;Comm@fermi(Thread 2);tfd=16, Connection from leaf 192.168.0.100:15001 down
02/01/2021 15:46:14;0c06;Comm@fermi;TPP;Comm@fermi(Thread 1);tfd=14, Connection from leaf 192.168.0.100:15003 down
but I do not understand what is causing it or how to solve it. I would appreciate any help in resolving this issue.