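Conceptual
Cannot connect to the server
Always ask three questions:
- Source:
a) Ping 127.0.0.1 (loopback) to check that the local TCP/IP stack works. This does not test the cable or Wi-Fi link.
b) Ping another website to check whether the machine can reach the outside at all.
c) Ping the default gateway. It is usually the first address of the subnet: for a 192.168.8.x host, the gateway is typically 192.168.8.1.
- Destination:
a) Ping other hosts on the destination's subnet to tell whether the whole network is down or just that one computer.
b) The service on the destination computer may be down.
- Protocol:
a) Test with ping, telnet, or curl.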
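The source-side checks above can be sketched as a small shell script. This is only a sketch: guess_gateway and source_checks are hypothetical helper names, example.com stands in for "another website", and the ".1 of a /24 network" rule is the rule of thumb from the notes, not a guarantee. The checks run from the inside out (loopback, gateway, outside) so the first failure shows how far packets get.

```shell
#!/bin/sh
# Hypothetical helper: guess the default gateway of a /24 network by
# replacing the last octet of the host's IP with 1 (rule of thumb only).
guess_gateway() {
    echo "${1%.*}.1"
}

# Hypothetical helper: ping loopback, the guessed gateway, and an outside
# host in that order, and stop at the first failure.
# $1 is this machine's own IP address.
source_checks() {
    gateway=$(guess_gateway "$1")
    for target in 127.0.0.1 "$gateway" example.com; do
        if ping -c 1 -W 2 "$target" >/dev/null 2>&1; then
            echo "ok:   $target"
        else
            echo "FAIL: $target"
            return 1
        fi
    done
}
```

Usage: source_checks 192.168.8.42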
System Access Troubleshooting
Server is not reachable
Ping the server by name. If the name cannot be resolved, check DNS with
nslookup
~ % nslookup google.com
Server: 192.168.1.254
Address: 192.168.1.254#53
Non-authoritative answer:
Name: google.com
netstat -rnv
Look up the default gateway
[root@localhost etc]# netstat -rnv
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 192.168.1.254 0.0.0.0 UG 0 0 0 enp0s3
192.168.1.0 0.0.0.0 255.255.255.0 U 0 0 0 enp0s3
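To pull just the gateway out of that table, a small awk filter is enough. A minimal sketch, assuming the Linux netstat layout shown above, where the default route has destination 0.0.0.0 and a G in the Flags column; default_gateway is a hypothetical helper name.

```shell
# Hypothetical helper: print the default gateway from `netstat -rn` output
# read on standard input. The default route is the row whose destination
# is 0.0.0.0 and whose flags include G (gateway).
default_gateway() {
    awk '$1 == "0.0.0.0" && $4 ~ /G/ { print $2; exit }'
}
```

Usage: netstat -rn | default_gateway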
Cannot connect to a website or application
Some networking commands
telnet IP port
[root@localhost etc]# telnet 142.250.179.78 80
Trying 142.250.179.78...
Connected to 142.250.179.78.
Escape character is '^]'.
If the output says Connected, the service is running and reachable on that port.
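Where telnet is not installed, the same port test can be sketched with bash's built-in /dev/tcp pseudo-device. port_open is a hypothetical helper name and the 3-second timeout is an arbitrary choice. It assumes bash and the coreutils timeout command are available.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port can be
# opened within 3 seconds. $1 = host or IP, $2 = port.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}
```

Usage: port_open 142.250.179.78 80 && echo "service is listening"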
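Cannot SSH as root/user
If
telnet ip 22
connects, then:
- Root login may not be permitted.
Check PermitRootLogin in /etc/ssh/sshd_config (the server configuration, not the client file /etc/ssh/ssh_config). If it is set to yes, the most likely cause is a wrong password.
Failed login attempts are recorded in the file secure under /var/log, so check there.
- The user may not exist.
For example, id aws shows that the user exists, while id ali reports "no such user".
[root@localhost log]# id aws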
uid=1002(aws) gid=1002(aws) groups=1002(aws)
[root@localhost log]# id ali
id: ‘ali’: no such user
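Both checks can be scripted. A minimal sketch with two hypothetical helpers, user_exists and permit_root_login. The second one only reads the first uncommented PermitRootLogin line of the file it is given and ignores Match blocks and Include directives, so treat its answer as a hint.

```shell
# Hypothetical helper: succeed if the named account exists.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# Hypothetical helper: print the PermitRootLogin value from an
# sshd_config-style file ($1). Commented-out lines are skipped and the
# first match wins; keywords are matched case-insensitively.
permit_root_login() {
    awk 'tolower($1) == "permitrootlogin" { print $2; exit }' "$1"
}
```

Usage: user_exists ali || echo "no such user"; permit_root_login /etc/ssh/sshd_config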
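Firewall
- Check whether an iptables process is running with
ps -ef | grep iptable
(if the only match is the grep command itself, nothing is running)
- Then check firewalld with
systemctl status firewalld
and see whether its state is active
- Use
systemctl stop firewalld
or
systemctl disable firewalld
to turn the firewall off, then retry telnet
- The difference between stop and disable: stop halts the service now, but it starts again at the next boot if it is enabled; disable keeps it from starting at boot, but does not stop an instance that is already running
[root@localhost log]# ps -ef | grep iptable
root 2013 1511 0 12:26 pts/0 00:00:00 grep --color=auto iptable
[root@localhost log]# systemctl status firewalld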
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor p>
Active: active (running) since Thu 2021-09-09 23:21:06 EDT; 13h ago
Docs: man:firewalld(1)
Main PID: 837 (firewalld)
Tasks: 2 (limit: 4928)
Memory: 26.9M
CGroup: /system.slice/firewalld.service
└─837 /usr/libexec/platform-python -s /usr/sbin/firewalld --nofork ->
Sep 09 23:21:05 localhost.localdomain systemd[1]: Starting firewalld - dynamic >
Sep 09 23:21:06 localhost.localdomain systemd[1]: Started firewalld - dynamic f>
Sep 09 23:21:07 localhost.localdomain firewalld[837]: WARNING: AllowZoneDriftin
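For a quick test it is safer to stop the firewall only for the duration of the retry and start it again afterwards. A minimal sketch: run and retest_without_firewall are hypothetical helpers, and by default the script only prints the commands (dry run). Set DRY_RUN=0 as root to execute them.

```shell
# Print each command; execute it only when DRY_RUN=0 is set explicitly.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Hypothetical helper: stop firewalld, retry the port, then start the
# firewall again so the machine is not left open. $1 = IP, $2 = port.
retest_without_firewall() {
    run systemctl stop firewalld
    run telnet "$1" "$2"
    run systemctl start firewalld
}
```

Usage: retest_without_firewall 142.250.179.78 80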
Filesystem Troubleshooting
Cannot cd into a directory
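Usually an absolute path vs. relative path problem: a relative path is resolved from the current directory, so check pwd or use the absolute path.
Cannot find a file
- Search with find
find / -name "name"
The / here
means start from the root directory, i.e. search the whole system. This is very important.
[root@localhost log]# find / -name "ssh_config"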
/etc/ssh/ssh_config
/usr/etc/ssh/ssh_config
- Search with locate
Note that locate reads from a database that is refreshed about once a day, not in real time, so run this first to pick up recent files:
updatedb
[root@localhost log]# cd ~
[root@localhost ~]# locate aya_test
[root@localhost ~]# updatedb
[root@localhost ~]# locate aya_test
/var/log/aya_test
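The role of find's first argument is easy to try out on a throwaway tree. A minimal sketch: the directory layout is made up for the demonstration and lives under a temporary directory, so nothing on the real system is touched.

```shell
# Build a throwaway tree so the search is safe to run anywhere.
demo=$(mktemp -d)
mkdir -p "$demo/etc/ssh" "$demo/usr/etc/ssh"
touch "$demo/etc/ssh/ssh_config" "$demo/usr/etc/ssh/ssh_config"

# The first argument to find is the starting directory: "$demo" here,
# "/" to search the whole system. sort makes the order predictable.
matches=$(find "$demo" -name "ssh_config" | sort)
echo "$matches"
```

Unlike locate, find walks the live filesystem, so it sees the new files immediately without any database update.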
Cannot create links
- Inode
Each file has an inode (index node). The inode is the file's metadata record: permissions, owner, size, timestamps, and the location of the data blocks. It is like a passport or ID card without your name on it, because it holds everything about the file except two things: the name of the file and the content of the file.
- Soft links
Soft links are much like shortcuts in Windows: they store the path of the target and break if the target is deleted or moved.
- Hard links
A hard link is a second name for the same inode, not a copy. Even after the original name is deleted, the data stays reachable through the hard link.
Creating a link
- First create a file named test
[root@localhost troubleshooting]# vim test
[root@localhost troubleshooting]# cat test
#!/bin/bash
#This is a file for testing
- Then create the link with
ln -s <absolute path of source> <absolute path of link>
- Then look for it in the target directory
- cat test_1 shows the same content
[root@localhost troubleshooting]# ln -s /root/troubleshooting/test /tmp/test_1
[root@localhost troubleshooting]# cd /tmp/
[root@localhost tmp]# ls -ltr
total 0
drwx------. 3 root root 17 Sep 10 10:59 systemd-private-4b0288c1199f499f8e5089f9bfa3e9e0-chronyd.service-baejoi
lrwxrwxrwx. 1 root root 26 Sep 11 05:25 test_1 -> /root/troubleshooting/test
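[root@localhost tmp]# cat test_1
#!/bin/bash
#This is a file for testing
Note: use an absolute path for the source here. A relative path stored in a soft link is resolved relative to the link's own directory, not the directory where you ran ln.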
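The difference between the two link types shows up once the original name is deleted. A minimal sketch in a temporary directory; the file names test_hard and test_soft are made up for the demonstration.

```shell
demo=$(mktemp -d)
echo "This is a file for testing" > "$demo/test"

ln "$demo/test" "$demo/test_hard"      # hard link: second name, same inode
ln -s "$demo/test" "$demo/test_soft"   # soft link: stores the path only

# The hard link reports the same inode number as the original file.
ls -i "$demo/test" "$demo/test_hard"

rm "$demo/test"                        # delete the original name

hard_content=$(cat "$demo/test_hard")  # data is still reachable
if [ -e "$demo/test_soft" ]; then      # -e follows the link to its target
    soft_state="valid"
else
    soft_state="dangling"
fi
echo "hard link: $hard_content"
echo "soft link: $soft_state"
```

The soft link still exists as a directory entry, but its target path no longer resolves, which is why cat on it would fail.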
Summary
Cannot write to a file
Cannot change permission or ownership
As root, create the directory helloworld/ inside the user's home directory /home/aws/
ls -ltr
shows that the directory belongs to root
[root@localhost ~]# cd /home/aws
[root@localhost aws]# mkdir helloworld
[root@localhost aws]# ls -ltr
total 1452
-rwxr-xr-x. 1 aws aws 1486618 Sep 11 06:31 messages
drwxr-xr-x. 2 root root 6 Sep 11 12:14 helloworld
So the aws user cannot create a file in it with touch
[aws@localhost ~]$ cd helloworld/
[aws@localhost helloworld]$ touch hello
touch: cannot touch 'hello': Permission denied
Change the ownership as root
chown user file/dir
[root@localhost aws]# chown aws helloworld/
[root@localhost aws]# ls -ltr
total 1452
-rwxr-xr-x. 1 aws aws 1486618 Sep 11 06:31 messages
drwxr-xr-x. 2 aws root 22 Sep 11 12:22 helloworld
The owner has changed from root to aws. Trying
touch file
again now succeeds
[aws@localhost ~]$ cd helloworld/
[aws@localhost helloworld]$ touch hello.py
[aws@localhost helloworld]$ ls
hello.py
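Note: deleting or renaming a file needs write permission on its parent directory. If the parent directory's ownership changes, the user can no longer remove a file in it, even one the user created.
For example, create hello.py in helloworld/. The file is owned by aws
[aws@localhost helloworld]$ ls -ltr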
total 0
-rw-rw-r--. 1 aws aws 0 Sep 11 12:22 hello.py
But if the ownership of helloworld/ is moved back to root
[root@localhost aws]# ls -ltr
total 1452
-rwxr-xr-x. 1 aws aws 1486618 Sep 11 06:31 messages
drwxr-xr-x. 2 root root 22 Sep 11 12:22 helloworld
then aws has no permission to delete hello.py, even though aws created it
[aws@localhost helloworld]$ rm hello.py
rm: cannot remove 'hello.py': Permission denied
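The rule behind this can be checked without root by removing write permission from a directory you own. A minimal sketch: can_delete is a hypothetical helper, and it ignores the sticky bit (as on /tmp), which adds an ownership check on top. When run as root, the permission tests succeed regardless, so try it as a regular user.

```shell
# Hypothetical helper: succeed if the current user may delete the file $1.
# Deleting needs write and search (execute) permission on the parent
# directory; the permissions of the file itself are not consulted.
can_delete() {
    parent=$(dirname "$1")
    [ -w "$parent" ] && [ -x "$parent" ]
}

demo=$(mktemp -d)
touch "$demo/hello.py"
chmod a-w "$demo"      # for us, same effect as losing ownership of the directory
if can_delete "$demo/hello.py"; then
    echo "can delete hello.py"
else
    echo "cannot delete hello.py: no write permission on $demo"
fi
chmod u+w "$demo"      # restore so the directory can be cleaned up
```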
Disk space full or Add more disk
iostat is the command for I/O statistics (CPU and per-device read/write activity).
Create a link to another filesystem that has free space. For example: ln -s /usr/var/log/sap /temp/sap
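Adding new disk and creating partition
- Add the disk
- List the disks with
fdisk -l | more
Disk /dev/sda: 8 GiB, 8589934592 bytes, 16777216 sectors
The newly added disk will probably appear as /dev/sdb, /dev/sdc, etc.
- Create a partition with
fdisk /dev/sdb
- Create a physical volume with
pvcreate /dev/sdb
(if you created a partition in the previous step, pass the partition, e.g. /dev/sdb1, here and to vgcreate)
- Create a volume group (named oracle_vg here) with
vgcreate oracle_vg /dev/sdb
- Create a logical volume (named oracle_lv here) with
lvcreate --name oracle_lv --size 1G oracle_vg
The trailing oracle_vg is the volume group that oracle_lv is allocated from
- Format the logical volume with
mkfs.xfs /dev/oracle_vg/oracle_lv
- Create a new directory to serve as the mount point
mkdir /oracle
- Mount the logical volume on it
mount /dev/oracle_vg/oracle_lv /oracle
then verify with
df -h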
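The whole sequence after partitioning can be collected in one script. A minimal sketch: run and provision_lv are hypothetical helpers, and the names and size come from the steps above. By default it only prints the commands (dry run), because running them for real needs root and overwrites the target device. The interactive fdisk step is left out.

```shell
# Print each command; execute it only when DRY_RUN=0 is set explicitly.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Hypothetical helper: physical volume -> volume group -> logical volume
# -> XFS filesystem -> mount.
# $1 = disk or partition, $2 = volume group, $3 = logical volume,
# $4 = size, $5 = mount point.
provision_lv() {
    run pvcreate "$1"
    run vgcreate "$2" "$1"
    run lvcreate --name "$3" --size "$4" "$2"
    run mkfs.xfs "/dev/$2/$3"
    run mkdir -p "$5"
    run mount "/dev/$2/$3" "$5"
    run df -h "$5"
}

provision_lv /dev/sdb oracle_vg oracle_lv 1G /oracle
```

Review the printed commands first, then rerun with DRY_RUN=0 as root.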
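Extend disk with LVM
The first steps are the same as above.
After creating the physical volume on the new disk (sdd here) with
pvcreate /dev/sdd
- Add it to the volume group with
vgextend oracle_vg /dev/sdd
- Extend the logical volume with
lvextend -L+1G /dev/oracle_vg/oracle_lv
- Grow the filesystem so it uses the new space; for the XFS filesystem created above, with
xfs_growfs /oracle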
How to delete old files
[root@localhost test]# find /root/test -type f -mtime +90 -exec ls -l {} \;
-rw-r--r--. 1 root root 0 Mar 1 2021 /root/test/a
-rw-r--r--. 1 root root 0 Mar 1 2021 /root/test/b
-rw-r--r--. 1 root root 0 Mar 1 2021 /root/test/c
-rw-r--r--. 1 root root 0 Mar 1 2021 /root/test/d
find /root/test -type f -mtime +90 -exec mv {} {}.old \;
Here
mv {} {}.old \;
means that whatever find matches gets the suffix .old appended to its name
[root@localhost test]# find /root/test -type f -mtime +90 -exec mv {} {}.old \;
[root@localhost test]# ls -ltr
total 4
-rw-r--r--. 1 root root 0 Mar 1 2021 d.old
-rw-r--r--. 1 root root 0 Mar 1 2021 c.old
-rw-r--r--. 1 root root 0 Mar 1 2021 b.old
-rw-r--r--. 1 root root 0 Mar 1 2021 a.old
-rwxr-xr-x. 1 root root 76 Sep 12 02:49 delteodlfile
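The preview-then-act pattern above can be rehearsed on throwaway files. A minimal sketch, assuming GNU touch (for the -d "100 days ago" syntax) and GNU find (which substitutes {} inside {}.old); the file names are made up.

```shell
demo=$(mktemp -d)
touch -d "100 days ago" "$demo/a" "$demo/b"   # files last modified 100 days ago
touch "$demo/new"                             # file modified just now

# Preview first: list what would be affected before changing anything.
find "$demo" -type f -mtime +90 -exec ls -l {} \;

# Then act. {} is replaced by each path that find matched.
find "$demo" -type f -mtime +90 -exec mv {} {}.old \;

ls "$demo"
```

Once the preview lists exactly the files you expect, the mv action can be swapped for rm to actually delete them.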
Filesystem is corrupted
Your system runs very slow; Identify the system
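Note:
- Run fsck against the device, e.g. /dev/sda1, not against the directory where it is mounted
- fsck fails on a mounted filesystem; umount it first
- On an XFS filesystem, as in the output below, fsck does nothing and points to xfs_repair instead
[root@localhost ~]# fsck /boot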
fsck from util-linux 2.32.1
If you wish to check the consistency of an XFS filesystem or
repair a damaged filesystem, see xfs_repair(8).
[root@localhost ~]# fsck /dev/sda1
fsck from util-linux 2.32.1
If you wish to check the consistency of an XFS filesystem or
repair a damaged filesystem, see xfs_repair(8).
/etc/fstab Corruption
[root@localhost etc]# vim fstab
#
# /etc/fstab
# Created by anaconda on Sun May 9 08:50:43 2021
#
# Accessible filesystems, by reference, are maintained under '/dev/disk/'.
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info.
#
# After editing this file, run 'systemctl daemon-reload' to update systemd
# units generated from this file.
#
/dev/mapper/cs-root / xfs defaults 0 0
UUID=677759ee-ef4a-4346-a958-1b27081de187 /boot xfs defaults 0 0
/dev/mapper/cs-swap none swap defaults 0 0
~
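The first column is the block device; the second (/boot, etc.) is the mount point; the third is the filesystem type; the fourth is the mount options; the fifth (0) is the dump flag (1 means the dump utility should back up the partition); the sixth (0) is the fsck pass order, where 0 means fsck does not check this device at boot.
If the contents of fstab are broken, boot into rescue mode
and then edit fstab under /mnt/sysimage/etc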
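Before rebooting after an edit, a quick field count catches the most common fstab mistakes. A minimal sketch: check_fstab is a hypothetical helper that only counts columns (four to six, since the last two are optional and default to 0) and does not check that the devices or mount points exist.

```shell
# Hypothetical helper: report fstab entries ($1 = file) that do not have
# between four and six fields. Blank lines and comments are skipped.
# Prints nothing and exits 0 when every entry has a plausible shape.
check_fstab() {
    awk '
        /^[ \t]*(#|$)/ { next }
        NF < 4 || NF > 6 {
            printf "line %d: expected 4 to 6 fields, found %d\n", NR, NF
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

Usage: check_fstab /etc/fstab (or check_fstab /mnt/sysimage/etc/fstab from rescue mode)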