11-Software Security
11.1 Software Security Issues
Introducing Software Security and Defensive Programming
- Many vulnerabilities result from poor programming practices
- Consequences from insufficient checking/validation of data and error codes in programs
- Unvalidated input
- Cross-site scripting
- Buffer overflow
- Injection flaws
- Improper error handling
- …
Software Error Categories
- Insecure interaction between components
- e.g., SQL injection, cross-site scripting, open redirect
- Risky resource management
- e.g., classical buffer overflow, path traversal, download of code without integrity check
- Porous defenses
- e.g., missing authentication for a critical function, missing authorization, missing encryption of sensitive data
Software Security: Software Quality/Reliability?
- Software Quality and Reliability
- Concern: accidental failure of a program
- Unanticipated input
- System interaction
- Use of incorrect code
- Not the total number of bugs, but how often they are triggered
- Improvement: structured design and testing to identify and eliminate bugs
- Software Security
- Attacker targets specific bugs that result in a failure that can be exploited
- Triggered by inputs that differ dramatically from what is usually expected
- Unlikely to be identified by common testing approaches
Defensive or Secure Programming
- The process of designing and implementing software so that it continues to function even when under attack
- Software written using this process
- Detect erroneous conditions resulting from some attack
- Either continue executing safely, or fail gracefully
- Key rule: never assume anything
- Check all assumptions and handle any possible error states
- Typical programmers
- Attention on the steps needed for success
- Follow the normal flow of execution of the program
- But do not consider every potential point of failure
- Often make assumptions about the type of inputs and the environment
- Defensive Programming
- The assumptions need to be validated by the program
- All potential failures handled gracefully and safely
- But this increases the amount of code and the time spent ➔ conflicts with business pressures
- Typical programmers: when changes are required
- Focus on the changes required and what needs to be achieved
- Defensive programming
- Must carefully check any assumptions made
- Check and handle all possible errors
- Carefully check any interactions with existing code
- Requires a changed mindset compared with traditional programming practices
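The rule "never assume anything" can be illustrated with a small C sketch. The helper name parse_port and the choice of a TCP port as the value being read are assumptions made for this example only; the point is that every way the conversion can go wrong is checked before the value is used.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a TCP port number from untrusted text.
 * Returns 0 and stores the port in *out on success, -1 on any error. */
int parse_port(const char *text, int *out)
{
    char *end = NULL;
    long value;

    if (text == NULL || out == NULL || *text == '\0')
        return -1;                       /* missing input */

    errno = 0;
    value = strtol(text, &end, 10);

    if (errno == ERANGE)
        return -1;                       /* does not fit in a long */
    if (end == text || *end != '\0')
        return -1;                       /* no digits, or trailing junk */
    if (value < 1 || value > 65535)
        return -1;                       /* outside the valid port range */

    *out = (int)value;
    return 0;
}
```

A caller would treat any -1 as a reason to reject the request rather than fall back to a default. Note that strtol itself skips leading whitespace and accepts a sign, so a stricter format would need an extra check.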
Security by Design
- Security and reliability are common design goals in most engineering disciplines
- Software development has not yet reached this level of maturity
- Recent years have seen increasing efforts to improve secure software development processes
- Software Assurance Forum for Excellence in Code (SAFECode)
- Outlining industry best practices for software assurance
- Providing practical advice for secure software development
11.2 Handling Program Input
- Incorrect handling of program input is one of the most common failings
- Input is any source of data from outside the program whose value is not explicitly known by the programmer when the code was written
- All sources of input data must be identified
- Explicitly validate assumptions on size and type of values before use
- Two key areas of concern: size and interpretation
- Programmers often make assumptions about the maximum expected size of input
- Allocated buffer size is not checked against the actual input
- May result in buffer overflows
- Testing may not identify the vulnerability
- Test inputs are unlikely to include large enough inputs to trigger the overflow
- Safe programming practices (in Chapter 10)
- Use of safe string and buffer copying routines, etc.
- Validate both the length and the content of the input
- Safe coding regards any input as dangerous
- Processes it in a manner that does not expose the program to danger
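A minimal sketch of checking input size before copying, in C. The function name copy_input is an assumption for illustration and is not one of the routines from Chapter 10; it simply refuses input that does not fit instead of trusting an expected maximum.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy untrusted text into a fixed-size buffer.
 * Returns 0 on success. Returns -1 if the input does not fit; in that
 * case dst is set to the empty string when possible. */
int copy_input(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    if (dst == NULL || src == NULL || dst_size == 0)
        return -1;

    len = strlen(src);                  /* src must be NUL-terminated */
    if (len >= dst_size) {              /* need room for the terminator */
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1);          /* copies the terminator too */
    return 0;
}
```

Rejecting oversized input, rather than silently truncating it, keeps later code from working on a value the user never supplied.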
- Program input may be binary (e.g., 0100101) or textual
- Binary data: depends on encoding and is usually app-specific (H.264)
- e.g., Ethernet frames, IP packets, and TCP segments
- e.g., DNS, SNMP, etc.: using binary encoding of the requests and responses
- Failure to validate may result in an exploitable vulnerability
- e.g., 2014 Heartbleed OpenSSL bug
- Failure to check the validity of a binary input value ➔ return too much data
- An increasing variety of character sets is being used (e.g., ASCII and Unicode encodings)
- Care is needed to identify just which set is being used, and just what characters are being read
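The binary-input failure noted above (a length value that is trusted without checking, so too much data is returned) can be sketched in C as follows. The two-byte length header and the name echo_payload are assumptions for illustration; this is not the real TLS heartbeat format or the OpenSSL code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout (not the real TLS heartbeat format):
 *   bytes 0..1 : claimed payload length, big-endian
 *   bytes 2..  : payload
 * Copies the payload into `reply` and returns its length, or -1 if the
 * claimed length is inconsistent with the bytes actually received. */
long echo_payload(const uint8_t *msg, size_t msg_len,
                  uint8_t *reply, size_t reply_size)
{
    size_t claimed;

    if (msg == NULL || reply == NULL || msg_len < 2)
        return -1;                          /* header incomplete */

    claimed = ((size_t)msg[0] << 8) | msg[1];

    if (claimed > msg_len - 2)
        return -1;                          /* claims more than was sent */
    if (claimed > reply_size)
        return -1;                          /* would overflow the reply */

    memcpy(reply, msg + 2, claimed);
    return (long)claimed;
}
```

Without the first length comparison, a message that claims 16384 bytes but carries one would cause memcpy to read past the end of the received data.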
Injection Attacks
- When program input data can accidentally or deliberately influence the flow of execution of the program
- Most common: input data are passed as a parameter to another helper program
- Often occurs when using scripting languages (e.g., Perl, PHP, Python)
- Such languages encourage the reuse of other existing programs
- Now, often used as Web CGI scripts to process data supplied from HTML forms
- Example
- Command injection attack
- Safety extension
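One possible form of such a safety check, sketched in C rather than a scripting language. The helper names and the use of a finger lookup as the external program are assumptions for illustration: the user-supplied name is accepted only if it contains nothing the shell could interpret.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if `user` is 1..32 characters long and consists only of
 * letters, digits, and underscores; returns 0 otherwise. */
int is_safe_username(const char *user)
{
    size_t i;

    if (user == NULL || user[0] == '\0')
        return 0;
    for (i = 0; user[i] != '\0'; i++) {
        if (i >= 32)
            return 0;
        if (!isalnum((unsigned char)user[i]) && user[i] != '_')
            return 0;
    }
    return 1;
}

/* Builds "finger <user>" in `cmd` only for validated names.
 * Returns 0 on success, -1 if the name is rejected or does not fit. */
int build_finger_command(char *cmd, size_t cmd_size, const char *user)
{
    int n;

    if (cmd == NULL || cmd_size == 0 || !is_safe_username(user))
        return -1;
    n = snprintf(cmd, cmd_size, "finger %s", user);
    if (n < 0 || (size_t)n >= cmd_size)
        return -1;
    return 0;
}
```

An input such as "alice; rm -rf /" is rejected because of the semicolon, spaces, and slash. Running the helper program without a shell at all (for example through the exec family) removes the interpretation step entirely and is often preferable.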
SQL Injection Example
Vulnerable PHP code
- Safer PHP code
Vulnerable PHP code
- HTTP exploit request
- The attacker uses the path value so that the functions in hack.txt are pulled into day.php and executed
Cross-site Scripting (XSS) Attacks
- Input provided to a program by one user that is subsequently output to another user
- Script code may need to access data associated with other pages
- Assumption: all content from one site is equally trusted and hence is permitted to interact with other content from that site
- Attacks exploit this assumption and attempt to bypass the browser’s security checks
- Most commonly seen in scripted Web apps
- Involving the inclusion of script code in the HTML content of a Web page displayed by a user’s browser
- e.g., JavaScript, ActiveX, VBScript, Flash
- Most common variant: XSS reflection
XSS Reflection
- Consider the widespread use of guestbook programs
- e.g., wikis and blogs
- Allow users accessing the site to leave comments, which are subsequently viewed by other users
- Prevention: any user-supplied input should be examined
- The browser interprets the following identically to the above code
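Alongside examining input, a common complementary defense is to encode user-supplied text before it is written into a page, so the browser displays it instead of running it. The sketch below is an illustration in C with an assumed function name, html_escape; it covers only the five characters most often escaped in HTML text and attribute values.

```c
#include <stddef.h>
#include <string.h>

/* Writes an HTML-escaped copy of `in` to `out` (capacity `out_size`).
 * Returns 0 on success, -1 if the escaped text does not fit. */
int html_escape(const char *in, char *out, size_t out_size)
{
    size_t used = 0;

    if (in == NULL || out == NULL || out_size == 0)
        return -1;

    for (; *in != '\0'; in++) {
        char one[2] = { *in, '\0' };
        const char *rep = one;
        size_t rep_len;

        switch (*in) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&#39;";  break;
        default:   break;
        }
        rep_len = strlen(rep);
        if (rep_len >= out_size - used) {   /* keep room for the NUL */
            out[0] = '\0';
            return -1;
        }
        memcpy(out + used, rep, rep_len);
        used += rep_len;
    }
    out[used] = '\0';
    return 0;
}
```

A comment containing a script tag is then stored or displayed as inert text. Escaping at output time also covers input that arrived already entity-encoded, since the ampersand itself is escaped.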
- Ensure that data conform with any assumptions made about the data before subsequent use
- e.g. textual data→contain only printable characters
- Input data should be compared against what is wanted
- i.e., accepting only valid input→whitelisting
- Alternative is to compare the input data with known dangerous values → blacklisting
- By only accepting known safe data, the program is more likely to remain secure
- using regular expressions
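Whitelisting with a regular expression might look like the following C sketch, which assumes a POSIX system providing regex.h. The product-code format and both function names are made up for illustration. The pattern is anchored at both ends so that nothing may precede or follow the accepted form.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if `input` matches the POSIX extended regular expression
 * `pattern`, 0 if it does not, and -1 if the pattern fails to compile. */
int matches_pattern(const char *pattern, const char *input)
{
    regex_t re;
    int rc;

    if (pattern == NULL || input == NULL)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, input, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}

/* Hypothetical rule: a product code is two capital letters, a dash,
 * and four digits, with nothing before or after. */
int is_valid_product_code(const char *input)
{
    return matches_pattern("^[A-Z]{2}-[0-9]{4}$", input) == 1;
}
```

Dropping either anchor would turn the check into "contains something valid", which accepts input with extra text appended to a valid code.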
- Numeric input: internally stored in a fixed-size value
- 8, 16, 32, 64-bit integers
- Floating point numbers depend on the processor used
- Values may be signed or unsigned
- Must correctly interpret text form
- Have issues comparing signed to unsigned (signed values can be positive or negative; unsigned values are never negative)
- Input as unsigned may be treated as a signed value
- Vulnerability: negative values have the top bit set
e.g., an unsigned input size s = 1xxxxxxx is negative when treated as signed, so the check s < max buffer size passes
- Could be used to thwart buffer overflow check
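The phrase "negative values have the top bit set" means that the most significant bit of a negative value is 1.
A bit is the basic unit of storage and holds either 0 or 1. Signed integers are normally represented in two's complement. In two's complement the most significant bit of a negative value is 1, and the representation of -n is obtained by inverting every bit of n and then adding 1.
For example, -1 in 8-bit two's complement is derived as follows:
Magnitude (+1): 00000001
Bits inverted (ones' complement): 11111110
One added (two's complement): 11111111
So the most significant bit of a negative value is 1.
If a program treats a negative value as unsigned, errors can follow. For example, the 8-bit value -1 treated as unsigned is 255.
The following C code shows the same conversion with int and unsigned int:
#include <stdio.h>
int main(void) {
    int x = -1;
    unsigned int y = x;   /* same bits, reinterpreted as unsigned */
    printf("x: %d\n", x);
    printf("y: %u\n", y);
    return 0;
}
Where unsigned int is 32 bits wide, this code prints:
x: -1
y: 4294967295
So when comparing signed and unsigned values, remember that a negative value has its top bit set and becomes a very large number when reinterpreted as unsigned.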
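How this can thwart a buffer-size check is sketched below in C. The names and the 64-byte limit are assumptions for illustration. The flawed check compares a signed length only against the upper bound, so a negative value passes and then turns into a huge size when handed to a routine that takes size_t.

```c
#include <stddef.h>

#define BUF_SIZE 64

/* Flawed check: `len` is signed, so a negative value passes the test.
 * Returns 1 if the length would be accepted. */
int length_accepted_flawed(int len)
{
    return len < BUF_SIZE;
}

/* Safer check: reject negative values before comparing. */
int length_accepted_safe(int len)
{
    return len >= 0 && len < BUF_SIZE;
}

/* The size that memcpy() would actually receive for a given int length. */
size_t length_as_size(int len)
{
    return (size_t)len;   /* -1 becomes SIZE_MAX */
}
```

Reading the length into an unsigned type from the start and comparing it against the buffer size is another way to avoid the mismatch.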
- Major issue of input testing: very large range of inputs
- Textual or graphic input
- Random network requests
- Random parameters from system or libraries
- etc.
- Fuzzing: a software testing technique – using randomly generated data as inputs to a program
- Developed by Professor Barton Miller at University of Wisconsin Madison in 1989
- Simplicity and freedom from assumptions
- Very low cost of generating large numbers of tests
- Identifying reliability and security deficiencies in programs
- Input can be completely randomly generated, or randomly generated according to some template
- Templates: likely scenarios for bugs
- e.g., excessively long inputs or textual inputs without spaces
- e.g., targeting critical aspects of the protocol
- Pros: increasing the likelihood of locating bugs
- Cons: templates embed assumptions about the input; bugs triggered by other forms of input may be missed
- A combination of random generation and template-based tests is needed for comprehensiveness
- Conceptually very simple, but identifies only simple types of faults
- Unlikely to locate some bugs, e.g., only triggered by a small number of very specific input values
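The idea of fuzzing can be sketched in a few lines of C. Everything here is an assumption for illustration: fragile_parser is a stand-in for the code under test that merely reports failure on long inputs, and rand() is used only because it is in the standard library. The max_len parameter plays the role of a very simple template, since it steers the generator toward excessively long inputs.

```c
#include <stddef.h>
#include <stdlib.h>

#define FUZZ_MAX_INPUT 1024

/* A target returns 0 if it handled the input and nonzero on failure. */
typedef int (*fuzz_target)(const unsigned char *data, size_t len);

/* Stand-in for code under test (hypothetical): it "fails" on inputs longer
 * than 64 bytes, as a parser with an unchecked 64-byte buffer might. */
int fragile_parser(const unsigned char *data, size_t len)
{
    (void)data;
    return len > 64 ? -1 : 0;
}

/* Feeds `iterations` random inputs of 0..max_len bytes to `target` and
 * returns how many of them made it fail, or -1 on bad arguments. */
int fuzz(fuzz_target target, int iterations, size_t max_len, unsigned seed)
{
    static unsigned char input[FUZZ_MAX_INPUT];
    int failures = 0;
    int i;

    if (target == NULL || max_len > FUZZ_MAX_INPUT)
        return -1;

    srand(seed);                        /* fixed seed: runs are repeatable */
    for (i = 0; i < iterations; i++) {
        size_t len = (size_t)rand() % (max_len + 1);
        size_t j;

        for (j = 0; j < len; j++)
            input[j] = (unsigned char)(rand() % 256);
        if (target(input, len) != 0)
            failures++;
    }
    return failures;
}
```

With max_len set to 64 the fuzzer never reports a failure for this target, while a larger limit exposes it at once. This mirrors the point above: whether a bug is found depends on whether the generated inputs ever reach the triggering condition.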
11.3 Writing Safe Program Code
- Key issues
- Correct algorithm implementation: correctly solving the specified problem
- Correct machine instructions for algorithm
- Valid manipulation of data
Correct Algorithm Implementation
- Issue: the algorithm may not correctly implement all cases or variants of the problem
- e.g., inappropriate interpretation or handling of program input
- Example I: a bug in some early releases of the Netscape Web browser
- Implementation of the random number generator: generating session keys
- Assumption: the numbers should be unguessable
- Bug: numbers were relatively easy to predict
- Due to a poor choice of the information used to seed the algorithm
- Fix: reimplementing the random number generator
- Example II: TCP session spoof or hijack attack
- Fooling the server into accepting packets using a spoofed source address
- Bug: initial sequence numbers are far too predictable
- Sequence number: an identifier and authenticator of packets
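TCP sessions use sequence numbers to identify and authenticate packets.
Initial sequence numbers should be generated randomly, but flaws in some implementations made them predictable.
An attacker who can guess the sequence number can send the server packets with a forged source address and a matching sequence number, fooling the server into accepting them.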
- Hijack attack
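- The response from the server, which carries its sequence number, will not be seen by the attacker
- By correctly guessing this number, a suitable ACK packet can be constructed and sent to the server
An attacker who guesses the next sequence number can construct an ACK packet carrying the correct number and send it to the server, hijacking an existing legitimate TCP connection.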
- Hijack variant
- Waiting until some authorized external user connects and logs into the server
- Guessing the sequence number used and injecting packets with spoofed details
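The attacker waits for a legitimate user to connect and log in to the server, guesses the sequence number in use, and then injects packets carrying spoofed details and the correct sequence number.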
- DoS attack
- Triggering RST packet from the server to terminate the connection
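The attacker can deliberately send erroneous or incomplete packets to induce the server to send an RST (reset) packet, terminating the connection with the legitimate user.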
- Fix: truly randomized initial sequence numbers
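TCP session spoofing and hijacking attacks exploit predictable initial sequence numbers: by forging packets and guessing the sequence number correctly, an attacker can steal information, hijack connections, or even launch DoS attacks. Using truly random initial sequence numbers is one important defense.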
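Examples I and II both come down to drawing values from a source an attacker cannot predict. A minimal C sketch follows, assuming a Unix-like system that provides /dev/urandom; the function names are illustrative, and this is not how any particular browser or TCP stack is implemented.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Fills `buf` with `len` bytes from the kernel's random source.
 * Returns 0 on success, -1 on failure. Assumes a Unix-like system that
 * provides /dev/urandom. */
int random_bytes(void *buf, size_t len)
{
    FILE *f;
    size_t got;

    if (buf == NULL)
        return -1;
    f = fopen("/dev/urandom", "rb");
    if (f == NULL)
        return -1;
    got = fread(buf, 1, len, f);
    fclose(f);
    return got == len ? 0 : -1;
}

/* Hypothetical helper: a 32-bit value suitable as an unpredictable
 * identifier, e.g. an initial sequence number. */
int random_u32(uint32_t *out)
{
    return out == NULL ? -1 : random_bytes(out, sizeof *out);
}
```

By contrast, seeding a general-purpose generator from guessable values such as the current time lets an attacker reproduce its output by trying the few plausible seeds.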
- Example III: Programmers deliberately include additional code in a program to help test and debug it
- Inappropriately release information to a user of the program
- Permit a user to bypass security checks
- Was seen in the sendmail mail delivery program in the late 1980s
- Famously exploited by Morris Internet Worm
- Left in support for a DEBUG command that allowed the user to remotely query and control the running program
- The sendmail program ran using superuser privileges
- Example IV: Interpreter for a high-level or intermediate-level language
- Failure to adequately reflect the language semantics: bugs
- Some early implementations of the JVM: inadequate security checks for remotely sourced code
The JVM executed remote code as if it were local code.
- Permit an attacker to introduce code remotely (e.g., on Web pages) and trick the JVM interpreter into treating them as locally sourced
Ensuring that Machine Language Corresponds to Algorithm
- Largely ignored by most programmers
- Assumption: the compiler or interpreter generates or executes code that validly implements the language statements
- Malicious compiler programmer
- Including instructions in the compiler to emit additional code
- Countermeasure: careful comparison of the machine code with the source
Correct Interpretation of Data Values
- All data on a computer are stored as groups of binary bits
- Interpreted as a character, an integer, a floating-point number?
- Different languages provide varying capabilities for restricting and validating interpretation of data in variables
- Strong typing: more limited and safer
- Looser typing: much more liberal interpretation of data, permitting program code to explicitly change the interpretation
- e.g., the C language
- Easy interpretation conversion between integers and memory addresses
- Significant benefits for system level programming
- However, many errors can be caused
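That the same bits mean different things under different interpretations can be shown with a short C sketch. The function names are illustrative, and float_bits assumes that float is a 32-bit IEEE 754 value, which holds on most current platforms but is not guaranteed by the language.

```c
#include <stdint.h>
#include <string.h>

/* Returns the bit pattern of a float as a 32-bit unsigned integer.
 * Assumes float is a 32-bit IEEE 754 value. */
uint32_t float_bits(float f)
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);   /* well-defined way to reinterpret */
    return bits;
}

/* Interprets one byte as an unsigned 8-bit integer. */
int as_unsigned(uint8_t byte)
{
    return byte;
}

/* Interprets the same byte as a signed 8-bit integer. */
int as_signed(uint8_t byte)
{
    int8_t s;

    memcpy(&s, &byte, 1);
    return s;
}
```

The byte 0xFF is 255 under one interpretation and -1 under the other, and the byte 0x41 is both the integer 65 and, in ASCII, the character 'A'. Code that converts between such views has to be sure which one the data were written in.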
Correct Use of Memory
- Correct use of memory
- Issue: allocation and management of dynamic memory storage (heap)
- Used to manipulate unknown amounts of data
- Must be allocated when needed and released when done
- Memory leak
- Steady reduction in memory available on the heap until it is completely exhausted
- DoS attack: cause the program to crash
- Many older languages, including C, have no explicit support for dynamically allocated memory
- Memory is allocated and released by explicitly calling standard library routines
- Determining exactly when the memory is no longer required can be difficult
- Memory leaks occur easily and are difficult to identify and correct
- Modern languages handle it largely automatically (e.g., garbage collection in Java; destructors and smart pointers in C++)
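Summary: Correct Interpretation of Data Values
This part describes how data are stored and interpreted in a computer, and some of the resulting risks.
Key points:
- All computer data are stored as groups of binary bits.
- The same group of bits can be interpreted as different things, such as a character, an integer, or a floating-point number.
- Programming languages differ in how far they restrict and validate the interpretation of data.
- Strongly typed languages (e.g., Java) restrict data types more strictly: safer, but more limited.
- Weakly typed languages (e.g., C) interpret data more flexibly and allow code to change how data are interpreted.
Risks of converting between interpretations:
- In weakly typed languages, integers and memory addresses are easily converted into each other.
- This flexibility is convenient for system-level programming but carries significant risk.
- One major risk is incorrect use of memory.
Memory management errors:
- Languages such as C usually require the programmer to manage heap memory (dynamically allocated memory) manually.
- Heap memory is used for data of unknown size; it must be allocated when needed and released when done.
- If memory is not released correctly, a memory leak results.
- A leak gradually consumes the available heap and can eventually crash the program, which an attacker can exploit as a DoS attack.
C and memory management:
- Traditional C has no direct language support for dynamic memory allocation; it is done by calling standard library functions.
- With these library functions the programmer must decide when to release memory, which is error-prone.
- Memory leaks are common in C programs and are hard to find and fix.
Memory management in modern languages:
- Modern languages (e.g., Java and C++) usually manage dynamically allocated memory automatically, which helps avoid memory leaks.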
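For C, where allocation and release are explicit library calls, one common discipline is to make ownership clear and to give each allocation a single release point. The sketch below uses made-up function names; it is one way to keep a leak from creeping in, not the only one.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated upper-case copy of `text`, or NULL on error.
 * Ownership passes to the caller, who must release it with free(). */
char *upper_copy(const char *text)
{
    size_t len, i;
    char *copy;

    if (text == NULL)
        return NULL;
    len = strlen(text);
    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;                    /* allocation can fail: check it */
    for (i = 0; i < len; i++)
        copy[i] = (char)toupper((unsigned char)text[i]);
    copy[len] = '\0';
    return copy;
}

/* Returns 1 if `text` upper-cases to `expected`, 0 if not, -1 on error.
 * Every path that leaves the function after malloc() also calls free(). */
int upper_equals(const char *text, const char *expected)
{
    char *copy;
    int result;

    if (expected == NULL)
        return -1;
    copy = upper_copy(text);
    if (copy == NULL)
        return -1;
    result = strcmp(copy, expected) == 0;
    free(copy);                         /* single release point */
    return result;
}
```

If upper_equals returned early between the allocation and the free call, each such call would leak one buffer, and a server handling attacker-chosen requests could be driven to exhaust its heap.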
Preventing Race Conditions with Shared Memory
- Race condition
- Multiple processes and threads compete to gain uncontrolled access to some resource
- Solution: correct selection and use of appropriate synchronization primitives
Use appropriate synchronization mechanisms, such as mutexes and semaphores, to ensure that only one process or thread at a time can access the shared resource.
- But deadlock can still be an issue
- Attackers may trigger the deadlock to launch DoS
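A minimal C sketch of a synchronization primitive protecting shared memory, assuming a POSIX system with pthreads (typically built with the -pthread option). The counter, the function names, and the number of increments are all illustrative.

```c
#include <pthread.h>
#include <stddef.h>

#define INCREMENTS 100000

static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: adds INCREMENTS to the shared counter, one step at a time. */
static void *add_to_counter(void *unused)
{
    int i;

    (void)unused;
    for (i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;                      /* critical section */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs two threads and returns the final counter, or -1 on error. */
long run_two_threads(void)
{
    pthread_t a, b;

    counter = 0;
    if (pthread_create(&a, NULL, add_to_counter, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, add_to_counter, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

Without the lock, the two threads can interleave their read-modify-write steps and lose updates. With a single lock there is nothing to deadlock on; once code holds two locks at a time, acquiring them in the same order everywhere is one common way to avoid deadlock.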
11.4 Interacting with the Operating System and Other Programs
Environment Variables
Using Appropriate, Least Privileges
- In general, programs do not run in isolation on most computer systems
- multiple users, multiple programs
- various shared files and devices
- OS mediates access to system resources
- OS shares their use between all the executing programs
- Several issues
- Environment variables
- Using appropriate, least privileges
- System calls and standard library functions
- Preventing race conditions with shared system resources
- Safe temporary file use
- Interacting with other programs
System Calls and Standard Library Functions
- Programs use system calls and standard library functions for common operations
- The programs may not perform as expected
- Incorrect assumptions made for the operations of the system calls and standard library functions
- May be a result of system optimizing access to shared resources
- Result in requests for services being buffered, resequenced, or otherwise modified to optimize system use
- Optimizations can conflict with program goals
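To optimize access to shared resources, the operating system may buffer, resequence, or otherwise modify a program's requests.
These optimizations improve overall system efficiency but can conflict with a program's specific needs.
A program may assume that a system call takes effect immediately, when the operating system may actually queue it for later execution.
A program may depend on system calls completing in a particular order, when the operating system may change that order to improve performance.
A program may assume that a standard function behaves exactly as documented, when some implementations differ slightly.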
Example: How to Securely Delete a File?
- Standard file delete utility: simply removes the linkage between the file’s name and its contents
- Initial secure file shredding program algorithm
- Incorrect assumptions
- System will write the new data to same disk blocks
- Data are written immediately to disk
To optimize disk speed, the OS holds written data in a buffer and only writes them out once the buffer has filled to some degree.
- When the I/O buffers are flushed and the file is closed, the data are then written to disk
- Better secure file shredding program algorithm
- Open the file for update: the existing data are still required
- Flush buffer after each pattern is written
- Synchronize the file system’s data with the values on the device
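The three steps of the better algorithm can be sketched in C as follows, assuming a POSIX system (for fileno and fsync). The function names and the three overwrite patterns are assumptions for illustration. The sketch addresses the buffering assumption only: on file systems or devices that do not overwrite blocks in place, the old data may still survive.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Overwrites the whole file with one byte value and pushes it to the device.
 * Returns 0 on success, -1 on failure. */
static int overwrite_pass(FILE *f, long size, unsigned char pattern)
{
    long i;

    if (fseek(f, 0L, SEEK_SET) != 0)
        return -1;
    for (i = 0; i < size; i++)
        if (fputc(pattern, f) == EOF)
            return -1;
    if (fflush(f) != 0)                 /* stdio buffer -> operating system */
        return -1;
    if (fsync(fileno(f)) != 0)          /* operating system -> device */
        return -1;
    return 0;
}

/* Sketch of the improved shredding algorithm: overwrite in place with
 * several patterns, flushing and syncing after each, then delete. */
int shred_file(const char *path)
{
    static const unsigned char patterns[] = { 0x00, 0xFF, 0xAA };
    FILE *f;
    long size;
    size_t p;
    int ok = 1;

    if (path == NULL)
        return -1;
    f = fopen(path, "r+b");             /* update mode: do not truncate */
    if (f == NULL)
        return -1;
    if (fseek(f, 0L, SEEK_END) != 0 || (size = ftell(f)) < 0) {
        fclose(f);
        return -1;
    }
    for (p = 0; p < sizeof patterns && ok; p++)
        ok = overwrite_pass(f, size, patterns[p]) == 0;
    if (fclose(f) != 0)
        ok = 0;
    if (!ok)
        return -1;
    return remove(path) == 0 ? 0 : -1;
}
```

Opening with "r+b" matters: a write mode such as "wb" would truncate the file first, which lets the system release the old blocks and write the patterns somewhere else.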
Preventing Race Conditions with Shared System Resources
Safe Temporary File Use
Interacting with Other Programs
11.5 Handling Program Output
- Program output
- May be stored for future use, sent over a network, or displayed
- May be binary or text
- Important: output conforms to the expected form and interpretation
- Programs must identify what is permissible output content
- Filter any possibly untrusted data to ensure that only valid output is displayed
- Character set should be specified
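A minimal C sketch of filtering output, under the assumption that the output channel is declared to carry printable ASCII text only. The function name is illustrative. Anything outside that set, for instance a terminal escape sequence embedded in untrusted data, is replaced before display.

```c
#include <stddef.h>

/* Replaces every byte that is not printable ASCII (other than newline and
 * tab) with '?'. Works in place and returns the number of replacements. */
size_t sanitize_output(char *text)
{
    size_t replaced = 0;

    if (text == NULL)
        return 0;
    for (; *text != '\0'; text++) {
        unsigned char c = (unsigned char)*text;

        if (c == '\n' || c == '\t')
            continue;
        if (c < 32 || c > 126) {
            *text = '?';
            replaced++;
        }
    }
    return replaced;
}
```

A program that legitimately outputs another character set would need a filter written for that set instead, which is one reason the character set should be stated explicitly.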
11.6 Key Terms, Review Questions, and Problems