Block (programming)
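In computer programming, a block or code block is a lexical structure that groups source code together. A block consists of one or more declarations and statements. A programming language that permits the creation of blocks, including blocks nested within other blocks, is called a block-structured programming language. Blocks and subroutines are fundamental to structured programming, whose control structures can be formed from blocks.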
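Blocks serve two functions in programming. They let a group of statements be treated as if it were a single statement, and they limit the lexical scope of objects declared in a block, such as variables, procedures and functions, so that these do not conflict with objects of the same name used elsewhere. In a block-structured programming language, the names of objects outside a block are visible inside it, unless they are shadowed by an object declared with the same name.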
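As a minimal sketch of these two roles, the TypeScript fragment below groups two statements under one if and confines a declaration to the inner block. TypeScript is used only because it belongs to the brace-delimited C family; the function name summarize and its values are hypothetical.

```typescript
// Hypothetical example: the names and values are illustrative only.
function summarize(limit: number): string[] {
  const lines: string[] = [];
  const label = "outer"; // declared in the function's outermost block

  // The braces group two statements into one compound statement,
  // so the `if` controls both of them as a unit.
  if (limit > 0) {
    const doubled = limit * 2; // visible only inside this block
    lines.push(`${label}: ${doubled}`); // `label` from the enclosing block is visible here
  }

  // `doubled` is out of scope here; referring to it would be a compile-time error.
  lines.push(`${label}: done`);
  return lines;
}

console.log(summarize(3)); // [ 'outer: 6', 'outer: done' ]
```

Calling summarize(0) skips the whole inner block, so only the final line is recorded.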
History
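The idea of block structure emerged in the 1950s during the development of the first autocodes, and was formalized in the ALGOL 58 and ALGOL 60 reports. ALGOL 58 introduced the notion of the "compound" statement, which was related solely to control flow[1]. The ALGOL 60 report introduced the notions of block and scope[2]. Finally, the Revised Report defined a compound statement as a sequence of statements enclosed between the statement brackets begin and end. It defined a block as a sequence of declarations followed by a sequence of statements, enclosed between begin and end; every declaration appears in a block in this way and is valid only for that block.[3]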
Syntax
Blocks use different syntax in different languages. Two broad families are:
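- The ALGOL family. ALGOL 60 and its successors such as Simula use the keywords begin and end to delimit compound statements and blocks. ALGOL 68, which became an expression-oriented language, prefers the parentheses ( and ), which are equivalent to begin and end[4]. The Pascal family uses begin and end to delimit compound statements[5]. The C family, following BCPL, uses the braces { and } to delimit compound statements or blocks[6].
- The Lisp family, which represents blocks as S-expressions with a syntactic keyword such as prog[7] or let[8]. An S-expression is prefix notation enclosed in parentheses.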
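In addition, compound statements can be delimited by:
When building control structures, the controlled statement sequence need not be a compound statement or anonymous block; other syntactic mechanisms can also be used: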
Limitations
Some languages influenced by ALGOL support blocks, but with their own limitations:
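Basic semantics
The semantic meaning of a block is twofold. First, it gives the programmer a way of creating arbitrarily large and complex structures that can be treated as units. Second, it enables the programmer to limit the scope of variables, and sometimes of other declared objects.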
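Early languages such as FORTRAN and BASIC had no statement blocks and only rudimentary control structures. Until FORTRAN 77 was standardized in 1978 there was no "block IF" statement, so conditional selection had to be implemented with GOTO statements. For example, the following FORTRAN code deducts from an employee's wages the tax due on the part above the basic tax threshold and the supertax due on the part above the supertax threshold: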
    C     LANGUAGE: ANSI STANDARD FORTRAN 66
    C     INITIALIZE VALUES TO BE CALCULATED
          PAYSTX = .FALSE.
          PAYSST = .FALSE.
          TAX = 0.0
          SUPTAX = 0.0
    C     SKIP TAX DEDUCTION IF WAGES DO NOT EXCEED THE TAX THRESHOLD
          IF (WAGES .LE. TAXTHR) GOTO 100
          PAYSTX = .TRUE.
          TAX = (WAGES - TAXTHR) * BASCRT
    C     SKIP SUPERTAX DEDUCTION IF WAGES DO NOT EXCEED THE SUPERTAX THRESHOLD
          IF (WAGES .LE. SUPTHR) GOTO 100
          PAYSST = .TRUE.
          SUPTAX = (WAGES - SUPTHR) * SUPRAT
      100 TAXED = WAGES - TAX - SUPTAX
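Because the logical structure of the program is not reflected in the language, it can be difficult to work out when a given statement is executed.
Blocks allow the programmer to treat a group of statements as a unit. For example, here is Pascal code corresponding to the FORTRAN code above: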
    { Language: Jensen and Wirth standard Pascal }
    if wages > tax_threshold then
        begin
        paystax := true;
        tax := (wages - tax_threshold) * tax_rate
        end
    else
        begin
        paystax := false;
        tax := 0
        end;
    if wages > supertax_threshold then
        begin
        pays_supertax := true;
        supertax := (wages - supertax_threshold) * supertax_rate
        end
    else
        begin
        pays_supertax := false;
        supertax := 0
        end;
    taxed := wages - tax - supertax;
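Compared with the FORTRAN code above, the default values that appeared in the initialization are now set where the relevant decisions are made, using compound statements, that is, blocks without declarations. The supertax code is also no longer nested inside the basic-tax code, which removes the implicit assumption that the supertax threshold must exceed the basic tax threshold before supertax can be handled. Block structure clarifies the programmer's intent and makes the structure of the code reflect the programmer's thinking more closely. Combined with indentation to improve readability, it makes the code easier to understand and modify.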
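The same calculation can be sketched in a brace-delimited language. The TypeScript version below is an illustration only: the function computeTaxed, the TaxResult shape, and the thresholds and rates in the sample call are assumptions, not part of the original example.

```typescript
// All names and figures here are hypothetical; they mirror the Pascal fragment.
interface TaxResult {
  paysTax: boolean;
  tax: number;
  paysSupertax: boolean;
  supertax: number;
  taxed: number;
}

function computeTaxed(
  wages: number,
  taxThreshold: number,
  taxRate: number,
  supertaxThreshold: number,
  supertaxRate: number
): TaxResult {
  let paysTax: boolean;
  let tax: number;
  let paysSupertax: boolean;
  let supertax: number;

  // Each branch is a compound statement: both assignments succeed or are skipped together.
  if (wages > taxThreshold) {
    paysTax = true;
    tax = (wages - taxThreshold) * taxRate;
  } else {
    paysTax = false;
    tax = 0;
  }

  // The supertax decision is independent of the basic-tax decision.
  if (wages > supertaxThreshold) {
    paysSupertax = true;
    supertax = (wages - supertaxThreshold) * supertaxRate;
  } else {
    paysSupertax = false;
    supertax = 0;
  }

  return { paysTax, tax, paysSupertax, supertax, taxed: wages - tax - supertax };
}

// Sample call with made-up figures: tax = 250, supertax = 50, taxed = 700.
console.log(computeTaxed(1000, 500, 0.5, 800, 0.25));
```

Because every variable is assigned in both branches of its if statement, no separate initialization step is needed.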
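In early languages, variables had broad scope. For example, in one part of a Fortran subroutine an integer variable named IEMPNO might hold the social security number (SSN) of an employee who is a manager. During maintenance of the same subroutine, a programmer might accidentally reuse IEMPNO for a different purpose, such as holding the SSNs of the manager's underlings, and this could introduce a bug that is difficult to trace.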
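Block structure makes it easy for programmers to control scope at a fine-grained level. For example, the following Scheme code prints the name of an employee who is a manager and the number of their underlings, followed by the name and role of each underling: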
    ;; Language: R5RS Standard Scheme
    (let ((empno (ssn-of employee-name)))
      (when (is-manager empno)
        (let ((employees (length (underlings-of empno))))
          (printf "~a has ~a employees working under him:~%" employee-name employees)
          (for-each
            (lambda (empno)
              ;; Within this lambda expression the variable empno refers to the ssn
              ;; of an underling. The variable empno in the outer expression,
              ;; which refers to the manager's ssn, is shadowed.
              (printf "Name: ~a, role: ~a~%"
                      (name-of empno)
                      (role-of empno)))
            (underlings-of empno)))))
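In this fragment, the SSNs of both the manager and the underlings are identified by empno, but because the underling SSN is declared within an inner block, it does not interact with the variable of the same name that holds the manager's SSN. In practice, considerations of clarity would probably lead the programmer to choose distinct variable names, but even if the names coincide, it is hard to introduce a bug inadvertently.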
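A comparable sketch in TypeScript shows the same shadowing. It assumes a hypothetical Employee record in place of the ssn-of, underlings-of, name-of and role-of helpers, and the function name managerReport is likewise invented for illustration. The arrow function's parameter emp hides the outer emp only inside the inner block.

```typescript
// Hypothetical data model; not part of the original Scheme example.
interface Employee {
  name: string;
  role: string;
  underlings: Employee[];
}

function managerReport(manager: Employee): string[] {
  const lines: string[] = [];
  const emp = manager; // outer binding: the manager

  if (emp.underlings.length > 0) {
    lines.push(`${emp.name} has ${emp.underlings.length} employees working under them:`);
    emp.underlings.forEach((emp) => {
      // Inside the arrow function, `emp` is an underling; the outer `emp` is shadowed.
      lines.push(`Name: ${emp.name}, role: ${emp.role}`);
    });
    // Back in the outer block, `emp` refers to the manager again.
    lines.push(`Report for ${emp.name} complete`);
  }
  return lines;
}

const sampleManager: Employee = {
  name: "Alice",
  role: "manager",
  underlings: [
    { name: "Bob", role: "engineer", underlings: [] },
    { name: "Carol", role: "analyst", underlings: [] },
  ],
};

console.log(managerReport(sampleManager).join("\n"));
```

The last line of the report names the manager, which suggests that the inner binding did not overwrite the outer one.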
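Hoisting
In some situations, code in a block is evaluated as if it were actually written at the top of the block or outside it. This is often colloquially called hoisting, and includes: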
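- Loop-invariant code motion, a compiler optimization that evaluates code which does not change within a loop before the loop begins.
- Variable hoisting, a scoping rule in JavaScript under which variables declared with var have function scope and behave as if they were declared (but not defined) at the top of the function.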
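Both kinds of hoisting can be illustrated with a short TypeScript sketch. The function names are hypothetical. The first function relies on JavaScript's function-scoped var semantics, which TypeScript preserves. The other two show by hand a rewrite that an optimizing compiler may or may not apply.

```typescript
// Variable hoisting: a `var` declaration is scoped to the whole function,
// so the name exists (holding undefined) before the line that assigns it.
function hoistingDemo(): string[] {
  const seen: string[] = [];
  seen.push(typeof total); // "undefined": declared (hoisted) but not yet assigned
  if (seen.length > 0) {
    var total: number | undefined = 42; // `var` ignores the block; its scope is the function
  }
  seen.push(typeof total); // "number": the assignment inside the block is visible here
  return seen;
}

// Loop-invariant code motion, written out by hand.
function scaleNaive(values: number[], a: number, b: number): number[] {
  const out: number[] = [];
  for (const v of values) {
    const factor = a * b; // does not depend on the loop variable
    out.push(v * factor);
  }
  return out;
}

function scaleHoisted(values: number[], a: number, b: number): number[] {
  const out: number[] = [];
  const factor = a * b; // evaluated once, before the loop
  for (const v of values) {
    out.push(v * factor);
  }
  return out;
}

console.log(hoistingDemo()); // [ 'undefined', 'number' ]
console.log(scaleNaive([1, 2, 3], 2, 3), scaleHoisted([1, 2, 3], 2, 3));
```

Declaring total with let instead of var would confine it to the if block, and the reference before the declaration would be rejected.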
References
[1] Perlis, A. J.; Samelson, K. Preliminary report: international algebraic language (PDF). Communications of the ACM (New York, NY, USA: ACM). 1958, 1 (12): 8–22. Retrieved 2023-02-20. doi:10.1145/377924.594925. (Archived from the original (PDF) on 2023-02-20).
    "Strings of one or more statements may be combined into a single (compound) statement by enclosing them within the "statement parentheses" begin and end. Single statements are separated by the statement separator ";"."
[2] John Backus; Friedrich L. Bauer; J. Green; C. Katz; John McCarthy; Alan Jay Perlis; Heinz Rutishauser; K. Samelson; B. Vauquois; J. H. Wegstein; A. van Wijngaarden; M. Woodger. Peter Naur, ed. Report on the Algorithmic Language ALGOL 60 (PDF). 3 (5). New York, NY, USA: ACM: 299–314. May 1960. Retrieved 2009-10-27. ISSN 0001-0782. doi:10.1145/367236.367262. (Archived from the original (PDF) on 2022-12-13).
    "Sequences of statements may be combined into compound statements by insertion of statement brackets. … Each declaration is attached to and valid for one compound statement. A compound statement which includes declarations is called a block."
[3] J. W. Backus, F. L. Bauer, J. Green, C. Katz, J. McCarthy, P. Naur, A. J. Perlis, H. Rutishauser, K. Samelson, B. Vauquois, J. H. Wegstein, A. van Wijngaarden, M. Woodger. Peter Naur, ed. Revised Report on the Algorithmic Language ALGOL 60. Communications of the ACM, Volume 6, Number 1, pages 1-17. January 1963. Retrieved 2023-02-20. (Archived from the original on 2023-02-20).
    "A sequence of statements may be enclosed between the statement brackets begin and end to form a compound statement. … A sequence of declarations followed by a sequence of statements and enclosed between begin and end constitutes a block. Every declaration appears in a block in this way and is valid only for that block."
[4] A. van Wijngaarden, B. J. Mailloux, J. E. L. Peck, C. H. A. Koster, M. Sintzoff, C. H. Lindsey, L. G. L. T. Meertens and R. G. Fisker. Revised Report on the Algorithmic Language Algol 68. IFIP W.G. 2.1. Retrieved 2023-02-20. (Archived from the original on 2020-07-11).
    "The ALGOL 60 concepts of block, compound statement and parenthesized expression are unified in ALGOL 68 into the serial-clause. A serial-clause may be an expression and yield a value. … A serial-clause consists of a possibly empty sequence of unlabelled phrases, the last of which, if any, is a declaration, followed by a sequence of possibly labelled units. The phrases and the units are separated by go-on-tokens, viz., semicolons. Some of the units may instead be separated by completers, viz., EXITs; after a completer, the next unit must be labelled so that it can be reached. The value of the final unit, or of a unit preceding an EXIT, determines the value of the serial-clause."
[5] Kathleen Jensen, Niklaus Wirth. PASCAL User Manual and Report - ISO Pascal Standard (PDF). 1991. Retrieved 2023-02-20. (Archived from the original (PDF) on 2023-02-20).
    "The compound statement specifies that its component statements be executed in the same sequence as they are written. The symbols begin and end act as statement brackets. … Pascal uses the semicolon to separate statements, not to terminate statements; i.e., the semicolon is not part of the statement."
[6] Brian Kernighan, Dennis Ritchie. The C Programming Language, Second Edition (PDF). Prentice Hall. 1988.
    "In C, the semicolon is a statement terminator, rather than a separator as it is in languages like Pascal. Braces { and } are used to group declarations and statements together into a compound statement, or block, so that they are syntactically equivalent to a single statement. The braces that surround the statements of a function are one obvious example; braces around multiple statements after an if, else, while, or for are another. (Variables can be declared inside any block; …) There is no semicolon after the right brace that ends a block. … C is not a block-structured language in the sense of Pascal or similar languages, because functions may not be defined within other functions. On the other hand, variables can be defined in a block-structured fashion within a function. Declarations of variables (including initializations) may follow the left brace that introduces any compound statement, not just the one that begins a function. Variables declared in this way hide any identically named variables in outer blocks, and remain in existence until the matching right brace. … An automatic variable declared and initialized in a block is initialized each time the block is entered. Automatic variables, including formal parameters, also hide external variables and functions of the same name."
[7] John McCarthy, Paul W. Abrahams, Daniel J. Edwards, Timothy P. Hart, Michael I. Levin. (PDF). 2nd ed. MIT Press. 1985 [1962]. Retrieved 2021-09-23. ISBN 0-262-13011-4. (Archived from the original (PDF) on 2021-03-02).
    "The LISP 1.5 program feature allows the user to write an Algol-like program containing LISP statements to be executed. … The program form has the structure - (PROG, list of program variables, sequence of statements and atomic symbols...) An atomic symbol in the list is the location marker for the statement that follows."
[8] Kent M. Pitman. . 1983, 2007. Retrieved 2021-10-14. (Archived from the original on 2021-12-21).
    "LET is used to bind some variables to some objects, and then to evaluate some forms (those which make up the body) in the context of those bindings. … LET* Same as LET but does bindings in sequence instead of in parallel."
[9] Kathleen Jensen, Niklaus Wirth. PASCAL User Manual and Report - ISO Pascal Standard (PDF). 1991. Retrieved 2023-02-20. (Archived from the original (PDF) on 2023-02-20).
    "The program is divided into a heading and a body, called a block. The heading gives the program a name and lists its parameters. … The block consists of six sections, where any except the last may be empty. They must appear in the order given in the definition for a block:
        Block = LabelDeclarationPart
                ConstantDefinitionPart
                TypeDefinitionPart
                VariableDeclarationPart
                ProcedureAndFunctionDeclarationPart
                StatementPart.
    … Each procedure and function declaration has a structure similar to a program; i.e., each consists of a heading and a block. … The compound statement is that of Algol, and corresponds to the DO group in PL/I. … The "block structure" differs from that of Algol and PL/I insofar as there are no anonymous blocks; i.e., each block is given a name and thereby is made into a procedure or function."