Source file src/runtime/mgc.go

     1	// Copyright 2009 The Go Authors. All rights reserved.
     2	// Use of this source code is governed by a BSD-style
     3	// license that can be found in the LICENSE file.
     4	
     5	// Garbage collector (GC).
     6	//
     7	// The GC runs concurrently with mutator threads, is type accurate (aka precise), allows multiple
     8	// GC thread to run in parallel. It is a concurrent mark and sweep that uses a write barrier. It is
     9	// non-generational and non-compacting. Allocation is done using size segregated per P allocation
    10	// areas to minimize fragmentation while eliminating locks in the common case.
    11	//
    12	// The algorithm decomposes into several steps.
    13	// This is a high level description of the algorithm being used. For an overview of GC a good
    14	// place to start is Richard Jones' gchandbook.org.
    15	//
    16	// The algorithm's intellectual heritage includes Dijkstra's on-the-fly algorithm, see
    17	// Edsger W. Dijkstra, Leslie Lamport, A. J. Martin, C. S. Scholten, and E. F. M. Steffens. 1978.
    18	// On-the-fly garbage collection: an exercise in cooperation. Commun. ACM 21, 11 (November 1978),
    19	// 966-975.
    20	// For journal quality proofs that these steps are complete, correct, and terminate see
    21	// Hudson, R., and Moss, J.E.B. Copying Garbage Collection without stopping the world.
    22	// Concurrency and Computation: Practice and Experience 15(3-5), 2003.
    23	//
    24	// 1. GC performs sweep termination.
    25	//
    26	//    a. Stop the world. This causes all Ps to reach a GC safe-point.
    27	//
    28	//    b. Sweep any unswept spans. There will only be unswept spans if
    29	//    this GC cycle was forced before the expected time.
    30	//
    31	// 2. GC performs the mark phase.
    32	//
    33	//    a. Prepare for the mark phase by setting gcphase to _GCmark
    34	//    (from _GCoff), enabling the write barrier, enabling mutator
    35	//    assists, and enqueueing root mark jobs. No objects may be
    36	//    scanned until all Ps have enabled the write barrier, which is
    37	//    accomplished using STW.
    38	//
    39	//    b. Start the world. From this point, GC work is done by mark
    40	//    workers started by the scheduler and by assists performed as
    41	//    part of allocation. The write barrier shades both the
    42	//    overwritten pointer and the new pointer value for any pointer
    43	//    writes (see mbarrier.go for details). Newly allocated objects
    44	//    are immediately marked black.
    45	//
    46	//    c. GC performs root marking jobs. This includes scanning all
    47	//    stacks, shading all globals, and shading any heap pointers in
    48	//    off-heap runtime data structures. Scanning a stack stops a
    49	//    goroutine, shades any pointers found on its stack, and then
    50	//    resumes the goroutine.
    51	//
    52	//    d. GC drains the work queue of grey objects, scanning each grey
    53	//    object to black and shading all pointers found in the object
    54	//    (which in turn may add those pointers to the work queue).
    55	//
    56	//    e. Because GC work is spread across local caches, GC uses a
    57	//    distributed termination algorithm to detect when there are no
    58	//    more root marking jobs or grey objects (see gcMarkDone). At this
    59	//    point, GC transitions to mark termination.
    60	//
    61	// 3. GC performs mark termination.
    62	//
    63	//    a. Stop the world.
    64	//
    65	//    b. Set gcphase to _GCmarktermination, and disable workers and
    66	//    assists.
    67	//
    68	//    c. Perform housekeeping like flushing mcaches.
    69	//
    70	// 4. GC performs the sweep phase.
    71	//
    72	//    a. Prepare for the sweep phase by setting gcphase to _GCoff,
    73	//    setting up sweep state and disabling the write barrier.
    74	//
    75	//    b. Start the world. From this point on, newly allocated objects
    76	//    are white, and allocating sweeps spans before use if necessary.
    77	//
    78	//    c. GC does concurrent sweeping in the background and in response
    79	//    to allocation. See description below.
    80	//
    81	// 5. When sufficient allocation has taken place, replay the sequence
    82	// starting with 1 above. See discussion of GC rate below.
    83	
    84	// Concurrent sweep.
    85	//
    86	// The sweep phase proceeds concurrently with normal program execution.
    87	// The heap is swept span-by-span both lazily (when a goroutine needs another span)
    88	// and concurrently in a background goroutine (this helps programs that are not CPU bound).
    89	// At the end of STW mark termination all spans are marked as "needs sweeping".
    90	//
    91	// The background sweeper goroutine simply sweeps spans one-by-one.
    92	//
    93	// To avoid requesting more OS memory while there are unswept spans, when a
    94	// goroutine needs another span, it first attempts to reclaim that much memory
    95	// by sweeping. When a goroutine needs to allocate a new small-object span, it
    96	// sweeps small-object spans for the same object size until it frees at least
    97	// one object. When a goroutine needs to allocate large-object span from heap,
    98	// it sweeps spans until it frees at least that many pages into heap. There is
    99	// one case where this may not suffice: if a goroutine sweeps and frees two
   100	// nonadjacent one-page spans to the heap, it will allocate a new two-page
   101	// span, but there can still be other one-page unswept spans which could be
   102	// combined into a two-page span.
   103	//
   104	// It's critical to ensure that no operations proceed on unswept spans (that would corrupt
   105	// mark bits in GC bitmap). During GC all mcaches are flushed into the central cache,
   106	// so they are empty. When a goroutine grabs a new span into mcache, it sweeps it.
   107	// When a goroutine explicitly frees an object or sets a finalizer, it ensures that
   108	// the span is swept (either by sweeping it, or by waiting for the concurrent sweep to finish).
   109	// The finalizer goroutine is kicked off only when all spans are swept.
   110	// When the next GC starts, it sweeps all not-yet-swept spans (if any).
   111	
   112	// GC rate.
   113	// Next GC is after we've allocated an extra amount of memory proportional to
   114	// the amount already in use. The proportion is controlled by GOGC environment variable
   115	// (100 by default). If GOGC=100 and we're using 4M, we'll GC again when we get to 8M
   116	// (this mark is tracked in next_gc variable). This keeps the GC cost in linear
   117	// proportion to the allocation cost. Adjusting GOGC just changes the linear constant
   118	// (and also the amount of extra memory used).
   119	
   120	// Oblets
   121	//
   122	// In order to prevent long pauses while scanning large objects and to
   123	// improve parallelism, the garbage collector breaks up scan jobs for
   124	// objects larger than maxObletBytes into "oblets" of at most
   125	// maxObletBytes. When scanning encounters the beginning of a large
   126	// object, it scans only the first oblet and enqueues the remaining
   127	// oblets as new scan jobs.
   128	
   129	package runtime
   130	
   131	import (
   132		"internal/cpu"
   133		"runtime/internal/atomic"
   134		"unsafe"
   135	)
   136	
   137	const (
   138		_DebugGC         = 0
   139		_ConcurrentSweep = true
   140		_FinBlockSize    = 4 * 1024
   141	
   142		// sweepMinHeapDistance is a lower bound on the heap distance
   143		// (in bytes) reserved for concurrent sweeping between GC
   144		// cycles.
   145		sweepMinHeapDistance = 1024 * 1024
   146	)
   147	
   148	// heapminimum is the minimum heap size at which to trigger GC.
   149	// For small heaps, this overrides the usual GOGC*live set rule.
   150	//
   151	// When there is a very small live set but a lot of allocation, simply
   152	// collecting when the heap reaches GOGC*live results in many GC
   153	// cycles and high total per-GC overhead. This minimum amortizes this
   154	// per-GC overhead while keeping the heap reasonably small.
   155	//
   156	// During initialization this is set to 4MB*GOGC/100. In the case of
   157	// GOGC==0, this will set heapminimum to 0, resulting in constant
   158	// collection even when the heap size is small, which is useful for
   159	// debugging.
   160	var heapminimum uint64 = defaultHeapMinimum
   161	
   162	// defaultHeapMinimum is the value of heapminimum for GOGC==100.
   163	const defaultHeapMinimum = 4 << 20
   164	
   165	// Initialized from $GOGC.  GOGC=off means no GC.
   166	var gcpercent int32
   167	
   168	func gcinit() {
   169		if unsafe.Sizeof(workbuf{}) != _WorkbufSize {
   170			throw("size of Workbuf is suboptimal")
   171		}
   172	
   173		// No sweep on the first cycle.
   174		mheap_.sweepdone = 1
   175	
   176		// Set a reasonable initial GC trigger.
   177		memstats.triggerRatio = 7 / 8.0
   178	
   179		// Fake a heap_marked value so it looks like a trigger at
   180		// heapminimum is the appropriate growth from heap_marked.
   181		// This will go into computing the initial GC goal.
   182		memstats.heap_marked = uint64(float64(heapminimum) / (1 + memstats.triggerRatio))
   183	
   184		// Set gcpercent from the environment. This will also compute
   185		// and set the GC trigger and goal.
   186		_ = setGCPercent(readgogc())
   187	
   188		work.startSema = 1
   189		work.markDoneSema = 1
   190	}
   191	
   192	func readgogc() int32 {
   193		p := gogetenv("GOGC")
   194		if p == "off" {
   195			return -1
   196		}
   197		if n, ok := atoi32(p); ok {
   198			return n
   199		}
   200		return 100
   201	}
   202	
   203	// gcenable is called after the bulk of the runtime initialization,
   204	// just before we're about to start letting user code run.
   205	// It kicks off the background sweeper goroutine, the background
   206	// scavenger goroutine, and enables GC.
   207	func gcenable() {
   208		// Kick off sweeping and scavenging.
   209		c := make(chan int, 2)
   210		go bgsweep(c)
   211		go bgscavenge(c)
   212		<-c
   213		<-c
   214		memstats.enablegc = true // now that runtime is initialized, GC is okay
   215	}
   216	
   217	//go:linkname setGCPercent runtime/debug.setGCPercent
   218	func setGCPercent(in int32) (out int32) {
   219		// Run on the system stack since we grab the heap lock.
   220		systemstack(func() {
   221			lock(&mheap_.lock)
   222			out = gcpercent
   223			if in < 0 {
   224				in = -1
   225			}
   226			gcpercent = in
   227			heapminimum = defaultHeapMinimum * uint64(gcpercent) / 100
   228			// Update pacing in response to gcpercent change.
   229			gcSetTriggerRatio(memstats.triggerRatio)
   230			unlock(&mheap_.lock)
   231		})
   232		// Pacing changed, so the scavenger should be awoken.
   233		wakeScavenger()
   234	
   235		// If we just disabled GC, wait for any concurrent GC mark to
   236		// finish so we always return with no GC running.
   237		if in < 0 {
   238			gcWaitOnMark(atomic.Load(&work.cycles))
   239		}
   240	
   241		return out
   242	}
   243	
   244	// Garbage collector phase.
   245	// Indicates to write barrier and synchronization task to perform.
   246	var gcphase uint32
   247	
   248	// The compiler knows about this variable.
   249	// If you change it, you must change builtin/runtime.go, too.
   250	// If you change the first four bytes, you must also change the write
   251	// barrier insertion code.
   252	var writeBarrier struct {
   253		enabled bool    // compiler emits a check of this before calling write barrier
   254		pad     [3]byte // compiler uses 32-bit load for "enabled" field
   255		needed  bool    // whether we need a write barrier for current GC phase
   256		cgo     bool    // whether we need a write barrier for a cgo check
   257		alignme uint64  // guarantee alignment so that compiler can use a 32 or 64-bit load
   258	}
   259	
   260	// gcBlackenEnabled is 1 if mutator assists and background mark
   261	// workers are allowed to blacken objects. This must only be set when
   262	// gcphase == _GCmark.
   263	var gcBlackenEnabled uint32
   264	
   265	const (
   266		_GCoff             = iota // GC not running; sweeping in background, write barrier disabled
   267		_GCmark                   // GC marking roots and workbufs: allocate black, write barrier ENABLED
   268		_GCmarktermination        // GC mark termination: allocate black, P's help GC, write barrier ENABLED
   269	)
   270	
   271	//go:nosplit
   272	func setGCPhase(x uint32) {
   273		atomic.Store(&gcphase, x)
   274		writeBarrier.needed = gcphase == _GCmark || gcphase == _GCmarktermination
   275		writeBarrier.enabled = writeBarrier.needed || writeBarrier.cgo
   276	}
   277	
   278	// gcMarkWorkerMode represents the mode that a concurrent mark worker
   279	// should operate in.
   280	//
   281	// Concurrent marking happens through four different mechanisms. One
   282	// is mutator assists, which happen in response to allocations and are
   283	// not scheduled. The other three are variations in the per-P mark
   284	// workers and are distinguished by gcMarkWorkerMode.
   285	type gcMarkWorkerMode int
   286	
   287	const (
   288		// gcMarkWorkerDedicatedMode indicates that the P of a mark
   289		// worker is dedicated to running that mark worker. The mark
   290		// worker should run without preemption.
   291		gcMarkWorkerDedicatedMode gcMarkWorkerMode = iota
   292	
   293		// gcMarkWorkerFractionalMode indicates that a P is currently
   294		// running the "fractional" mark worker. The fractional worker
   295		// is necessary when GOMAXPROCS*gcBackgroundUtilization is not
   296		// an integer. The fractional worker should run until it is
   297		// preempted and will be scheduled to pick up the fractional
   298		// part of GOMAXPROCS*gcBackgroundUtilization.
   299		gcMarkWorkerFractionalMode
   300	
   301		// gcMarkWorkerIdleMode indicates that a P is running the mark
   302		// worker because it has nothing else to do. The idle worker
   303		// should run until it is preempted and account its time
   304		// against gcController.idleMarkTime.
   305		gcMarkWorkerIdleMode
   306	)
   307	
   308	// gcMarkWorkerModeStrings are the strings labels of gcMarkWorkerModes
   309	// to use in execution traces.
   310	var gcMarkWorkerModeStrings = [...]string{
   311		"GC (dedicated)",
   312		"GC (fractional)",
   313		"GC (idle)",
   314	}
   315	
   316	// gcController implements the GC pacing controller that determines
   317	// when to trigger concurrent garbage collection and how much marking
   318	// work to do in mutator assists and background marking.
   319	//
   320	// It uses a feedback control algorithm to adjust the memstats.gc_trigger
   321	// trigger based on the heap growth and GC CPU utilization each cycle.
   322	// This algorithm optimizes for heap growth to match GOGC and for CPU
   323	// utilization between assist and background marking to be 25% of
   324	// GOMAXPROCS. The high-level design of this algorithm is documented
   325	// at https://golang.org/s/go15gcpacing.
   326	//
   327	// All fields of gcController are used only during a single mark
   328	// cycle.
   329	var gcController gcControllerState
   330	
   331	type gcControllerState struct {
   332		// scanWork is the total scan work performed this cycle. This
   333		// is updated atomically during the cycle. Updates occur in
   334		// bounded batches, since it is both written and read
   335		// throughout the cycle. At the end of the cycle, this is how
   336		// much of the retained heap is scannable.
   337		//
   338		// Currently this is the bytes of heap scanned. For most uses,
   339		// this is an opaque unit of work, but for estimation the
   340		// definition is important.
   341		scanWork int64
   342	
   343		// bgScanCredit is the scan work credit accumulated by the
   344		// concurrent background scan. This credit is accumulated by
   345		// the background scan and stolen by mutator assists. This is
   346		// updated atomically. Updates occur in bounded batches, since
   347		// it is both written and read throughout the cycle.
   348		bgScanCredit int64
   349	
   350		// assistTime is the nanoseconds spent in mutator assists
   351		// during this cycle. This is updated atomically. Updates
   352		// occur in bounded batches, since it is both written and read
   353		// throughout the cycle.
   354		assistTime int64
   355	
   356		// dedicatedMarkTime is the nanoseconds spent in dedicated
   357		// mark workers during this cycle. This is updated atomically
   358		// at the end of the concurrent mark phase.
   359		dedicatedMarkTime int64
   360	
   361		// fractionalMarkTime is the nanoseconds spent in the
   362		// fractional mark worker during this cycle. This is updated
   363		// atomically throughout the cycle and will be up-to-date if
   364		// the fractional mark worker is not currently running.
   365		fractionalMarkTime int64
   366	
   367		// idleMarkTime is the nanoseconds spent in idle marking
   368		// during this cycle. This is updated atomically throughout
   369		// the cycle.
   370		idleMarkTime int64
   371	
   372		// markStartTime is the absolute start time in nanoseconds
   373		// that assists and background mark workers started.
   374		markStartTime int64
   375	
   376		// dedicatedMarkWorkersNeeded is the number of dedicated mark
   377		// workers that need to be started. This is computed at the
   378		// beginning of each cycle and decremented atomically as
   379		// dedicated mark workers get started.
   380		dedicatedMarkWorkersNeeded int64
   381	
   382		// assistWorkPerByte is the ratio of scan work to allocated
   383		// bytes that should be performed by mutator assists. This is
   384		// computed at the beginning of each cycle and updated every
   385		// time heap_scan is updated.
   386		assistWorkPerByte float64
   387	
   388		// assistBytesPerWork is 1/assistWorkPerByte.
   389		assistBytesPerWork float64
   390	
   391		// fractionalUtilizationGoal is the fraction of wall clock
   392		// time that should be spent in the fractional mark worker on
   393		// each P that isn't running a dedicated worker.
   394		//
   395		// For example, if the utilization goal is 25% and there are
   396		// no dedicated workers, this will be 0.25. If the goal is
   397		// 25%, there is one dedicated worker, and GOMAXPROCS is 5,
   398		// this will be 0.05 to make up the missing 5%.
   399		//
   400		// If this is zero, no fractional workers are needed.
   401		fractionalUtilizationGoal float64
   402	
   403		_ cpu.CacheLinePad
   404	}
   405	
   406	// startCycle resets the GC controller's state and computes estimates
   407	// for a new GC cycle. The caller must hold worldsema.
   408	func (c *gcControllerState) startCycle() {
   409		c.scanWork = 0
   410		c.bgScanCredit = 0
   411		c.assistTime = 0
   412		c.dedicatedMarkTime = 0
   413		c.fractionalMarkTime = 0
   414		c.idleMarkTime = 0
   415	
   416		// Ensure that the heap goal is at least a little larger than
   417		// the current live heap size. This may not be the case if GC
   418		// start is delayed or if the allocation that pushed heap_live
   419		// over gc_trigger is large or if the trigger is really close to
   420		// GOGC. Assist is proportional to this distance, so enforce a
   421		// minimum distance, even if it means going over the GOGC goal
   422		// by a tiny bit.
   423		if memstats.next_gc < memstats.heap_live+1024*1024 {
   424			memstats.next_gc = memstats.heap_live + 1024*1024
   425		}
   426	
   427		// Compute the background mark utilization goal. In general,
   428		// this may not come out exactly. We round the number of
   429		// dedicated workers so that the utilization is closest to
   430		// 25%. For small GOMAXPROCS, this would introduce too much
   431		// error, so we add fractional workers in that case.
   432		totalUtilizationGoal := float64(gomaxprocs) * gcBackgroundUtilization
   433		c.dedicatedMarkWorkersNeeded = int64(totalUtilizationGoal + 0.5)
   434		utilError := float64(c.dedicatedMarkWorkersNeeded)/totalUtilizationGoal - 1
   435		const maxUtilError = 0.3
   436		if utilError < -maxUtilError || utilError > maxUtilError {
   437			// Rounding put us more than 30% off our goal. With
   438			// gcBackgroundUtilization of 25%, this happens for
   439			// GOMAXPROCS<=3 or GOMAXPROCS=6. Enable fractional
   440			// workers to compensate.
   441			if float64(c.dedicatedMarkWorkersNeeded) > totalUtilizationGoal {
   442				// Too many dedicated workers.
   443				c.dedicatedMarkWorkersNeeded--
   444			}
   445			c.fractionalUtilizationGoal = (totalUtilizationGoal - float64(c.dedicatedMarkWorkersNeeded)) / float64(gomaxprocs)
   446		} else {
   447			c.fractionalUtilizationGoal = 0
   448		}
   449	
   450		// In STW mode, we just want dedicated workers.
   451		if debug.gcstoptheworld > 0 {
   452			c.dedicatedMarkWorkersNeeded = int64(gomaxprocs)
   453			c.fractionalUtilizationGoal = 0
   454		}
   455	
   456		// Clear per-P state
   457		for _, p := range allp {
   458			p.gcAssistTime = 0
   459			p.gcFractionalMarkTime = 0
   460		}
   461	
   462		// Compute initial values for controls that are updated
   463		// throughout the cycle.
   464		c.revise()
   465	
   466		if debug.gcpacertrace > 0 {
   467			print("pacer: assist ratio=", c.assistWorkPerByte,
   468				" (scan ", memstats.heap_scan>>20, " MB in ",
   469				work.initialHeapLive>>20, "->",
   470				memstats.next_gc>>20, " MB)",
   471				" workers=", c.dedicatedMarkWorkersNeeded,
   472				"+", c.fractionalUtilizationGoal, "\n")
   473		}
   474	}
   475	
   476	// revise updates the assist ratio during the GC cycle to account for
   477	// improved estimates. This should be called either under STW or
   478	// whenever memstats.heap_scan, memstats.heap_live, or
   479	// memstats.next_gc is updated (with mheap_.lock held).
   480	//
   481	// It should only be called when gcBlackenEnabled != 0 (because this
   482	// is when assists are enabled and the necessary statistics are
   483	// available).
   484	func (c *gcControllerState) revise() {
   485		gcpercent := gcpercent
   486		if gcpercent < 0 {
   487			// If GC is disabled but we're running a forced GC,
   488			// act like GOGC is huge for the below calculations.
   489			gcpercent = 100000
   490		}
   491		live := atomic.Load64(&memstats.heap_live)
   492	
   493		var heapGoal, scanWorkExpected int64
   494		if live <= memstats.next_gc {
   495			// We're under the soft goal. Pace GC to complete at
   496			// next_gc assuming the heap is in steady-state.
   497			heapGoal = int64(memstats.next_gc)
   498	
   499			// Compute the expected scan work remaining.
   500			//
   501			// This is estimated based on the expected
   502			// steady-state scannable heap. For example, with
   503			// GOGC=100, only half of the scannable heap is
   504			// expected to be live, so that's what we target.
   505			//
   506			// (This is a float calculation to avoid overflowing on
   507			// 100*heap_scan.)
   508			scanWorkExpected = int64(float64(memstats.heap_scan) * 100 / float64(100+gcpercent))
   509		} else {
   510			// We're past the soft goal. Pace GC so that in the
   511			// worst case it will complete by the hard goal.
   512			const maxOvershoot = 1.1
   513			heapGoal = int64(float64(memstats.next_gc) * maxOvershoot)
   514	
   515			// Compute the upper bound on the scan work remaining.
   516			scanWorkExpected = int64(memstats.heap_scan)
   517		}
   518	
   519		// Compute the remaining scan work estimate.
   520		//
   521		// Note that we currently count allocations during GC as both
   522		// scannable heap (heap_scan) and scan work completed
   523		// (scanWork), so allocation will change this difference will
   524		// slowly in the soft regime and not at all in the hard
   525		// regime.
   526		scanWorkRemaining := scanWorkExpected - c.scanWork
   527		if scanWorkRemaining < 1000 {
   528			// We set a somewhat arbitrary lower bound on
   529			// remaining scan work since if we aim a little high,
   530			// we can miss by a little.
   531			//
   532			// We *do* need to enforce that this is at least 1,
   533			// since marking is racy and double-scanning objects
   534			// may legitimately make the remaining scan work
   535			// negative, even in the hard goal regime.
   536			scanWorkRemaining = 1000
   537		}
   538	
   539		// Compute the heap distance remaining.
   540		heapRemaining := heapGoal - int64(live)
   541		if heapRemaining <= 0 {
   542			// This shouldn't happen, but if it does, avoid
   543			// dividing by zero or setting the assist negative.
   544			heapRemaining = 1
   545		}
   546	
   547		// Compute the mutator assist ratio so by the time the mutator
   548		// allocates the remaining heap bytes up to next_gc, it will
   549		// have done (or stolen) the remaining amount of scan work.
   550		c.assistWorkPerByte = float64(scanWorkRemaining) / float64(heapRemaining)
   551		c.assistBytesPerWork = float64(heapRemaining) / float64(scanWorkRemaining)
   552	}
   553	
   554	// endCycle computes the trigger ratio for the next cycle.
   555	func (c *gcControllerState) endCycle() float64 {
   556		if work.userForced {
   557			// Forced GC means this cycle didn't start at the
   558			// trigger, so where it finished isn't good
   559			// information about how to adjust the trigger.
   560			// Just leave it where it is.
   561			return memstats.triggerRatio
   562		}
   563	
   564		// Proportional response gain for the trigger controller. Must
   565		// be in [0, 1]. Lower values smooth out transient effects but
   566		// take longer to respond to phase changes. Higher values
   567		// react to phase changes quickly, but are more affected by
   568		// transient changes. Values near 1 may be unstable.
   569		const triggerGain = 0.5
   570	
   571		// Compute next cycle trigger ratio. First, this computes the
   572		// "error" for this cycle; that is, how far off the trigger
   573		// was from what it should have been, accounting for both heap
   574		// growth and GC CPU utilization. We compute the actual heap
   575		// growth during this cycle and scale that by how far off from
   576		// the goal CPU utilization we were (to estimate the heap
   577		// growth if we had the desired CPU utilization). The
   578		// difference between this estimate and the GOGC-based goal
   579		// heap growth is the error.
   580		goalGrowthRatio := gcEffectiveGrowthRatio()
   581		actualGrowthRatio := float64(memstats.heap_live)/float64(memstats.heap_marked) - 1
   582		assistDuration := nanotime() - c.markStartTime
   583	
   584		// Assume background mark hit its utilization goal.
   585		utilization := gcBackgroundUtilization
   586		// Add assist utilization; avoid divide by zero.
   587		if assistDuration > 0 {
   588			utilization += float64(c.assistTime) / float64(assistDuration*int64(gomaxprocs))
   589		}
   590	
   591		triggerError := goalGrowthRatio - memstats.triggerRatio - utilization/gcGoalUtilization*(actualGrowthRatio-memstats.triggerRatio)
   592	
   593		// Finally, we adjust the trigger for next time by this error,
   594		// damped by the proportional gain.
   595		triggerRatio := memstats.triggerRatio + triggerGain*triggerError
   596	
   597		if debug.gcpacertrace > 0 {
   598			// Print controller state in terms of the design
   599			// document.
   600			H_m_prev := memstats.heap_marked
   601			h_t := memstats.triggerRatio
   602			H_T := memstats.gc_trigger
   603			h_a := actualGrowthRatio
   604			H_a := memstats.heap_live
   605			h_g := goalGrowthRatio
   606			H_g := int64(float64(H_m_prev) * (1 + h_g))
   607			u_a := utilization
   608			u_g := gcGoalUtilization
   609			W_a := c.scanWork
   610			print("pacer: H_m_prev=", H_m_prev,
   611				" h_t=", h_t, " H_T=", H_T,
   612				" h_a=", h_a, " H_a=", H_a,
   613				" h_g=", h_g, " H_g=", H_g,
   614				" u_a=", u_a, " u_g=", u_g,
   615				" W_a=", W_a,
   616				" goalΔ=", goalGrowthRatio-h_t,
   617				" actualΔ=", h_a-h_t,
   618				" u_a/u_g=", u_a/u_g,
   619				"\n")
   620		}
   621	
   622		return triggerRatio
   623	}
   624	
   625	// enlistWorker encourages another dedicated mark worker to start on
   626	// another P if there are spare worker slots. It is used by putfull
   627	// when more work is made available.
   628	//
   629	//go:nowritebarrier
   630	func (c *gcControllerState) enlistWorker() {
   631		// If there are idle Ps, wake one so it will run an idle worker.
   632		// NOTE: This is suspected of causing deadlocks. See golang.org/issue/19112.
   633		//
   634		//	if atomic.Load(&sched.npidle) != 0 && atomic.Load(&sched.nmspinning) == 0 {
   635		//		wakep()
   636		//		return
   637		//	}
   638	
   639		// There are no idle Ps. If we need more dedicated workers,
   640		// try to preempt a running P so it will switch to a worker.
   641		if c.dedicatedMarkWorkersNeeded <= 0 {
   642			return
   643		}
   644		// Pick a random other P to preempt.
   645		if gomaxprocs <= 1 {
   646			return
   647		}
   648		gp := getg()
   649		if gp == nil || gp.m == nil || gp.m.p == 0 {
   650			return
   651		}
   652		myID := gp.m.p.ptr().id
   653		for tries := 0; tries < 5; tries++ {
   654			id := int32(fastrandn(uint32(gomaxprocs - 1)))
   655			if id >= myID {
   656				id++
   657			}
   658			p := allp[id]
   659			if p.status != _Prunning {
   660				continue
   661			}
   662			if preemptone(p) {
   663				return
   664			}
   665		}
   666	}
   667	
   668	// findRunnableGCWorker returns the background mark worker for _p_ if it
   669	// should be run. This must only be called when gcBlackenEnabled != 0.
   670	func (c *gcControllerState) findRunnableGCWorker(_p_ *p) *g {
   671		if gcBlackenEnabled == 0 {
   672			throw("gcControllerState.findRunnable: blackening not enabled")
   673		}
   674		if _p_.gcBgMarkWorker == 0 {
   675			// The mark worker associated with this P is blocked
   676			// performing a mark transition. We can't run it
   677			// because it may be on some other run or wait queue.
   678			return nil
   679		}
   680	
   681		if !gcMarkWorkAvailable(_p_) {
   682			// No work to be done right now. This can happen at
   683			// the end of the mark phase when there are still
   684			// assists tapering off. Don't bother running a worker
   685			// now because it'll just return immediately.
   686			return nil
   687		}
   688	
   689		decIfPositive := func(ptr *int64) bool {
   690			if *ptr > 0 {
   691				if atomic.Xaddint64(ptr, -1) >= 0 {
   692					return true
   693				}
   694				// We lost a race
   695				atomic.Xaddint64(ptr, +1)
   696			}
   697			return false
   698		}
   699	
   700		if decIfPositive(&c.dedicatedMarkWorkersNeeded) {
   701			// This P is now dedicated to marking until the end of
   702			// the concurrent mark phase.
   703			_p_.gcMarkWorkerMode = gcMarkWorkerDedicatedMode
   704		} else if c.fractionalUtilizationGoal == 0 {
   705			// No need for fractional workers.
   706			return nil
   707		} else {
   708			// Is this P behind on the fractional utilization
   709			// goal?
   710			//
   711			// This should be kept in sync with pollFractionalWorkerExit.
   712			delta := nanotime() - gcController.markStartTime
   713			if delta > 0 && float64(_p_.gcFractionalMarkTime)/float64(delta) > c.fractionalUtilizationGoal {
   714				// Nope. No need to run a fractional worker.
   715				return nil
   716			}
   717			// Run a fractional worker.
   718			_p_.gcMarkWorkerMode = gcMarkWorkerFractionalMode
   719		}
   720	
   721		// Run the background mark worker
   722		gp := _p_.gcBgMarkWorker.ptr()
   723		casgstatus(gp, _Gwaiting, _Grunnable)
   724		if trace.enabled {
   725			traceGoUnpark(gp, 0)
   726		}
   727		return gp
   728	}
   729	
   730	// pollFractionalWorkerExit reports whether a fractional mark worker
   731	// should self-preempt. It assumes it is called from the fractional
   732	// worker.
   733	func pollFractionalWorkerExit() bool {
   734		// This should be kept in sync with the fractional worker
   735		// scheduler logic in findRunnableGCWorker.
   736		now := nanotime()
   737		delta := now - gcController.markStartTime
   738		if delta <= 0 {
   739			return true
   740		}
   741		p := getg().m.p.ptr()
   742		selfTime := p.gcFractionalMarkTime + (now - p.gcMarkWorkerStartTime)
   743		// Add some slack to the utilization goal so that the
   744		// fractional worker isn't behind again the instant it exits.
   745		return float64(selfTime)/float64(delta) > 1.2*gcController.fractionalUtilizationGoal
   746	}
   747	
   748	// gcSetTriggerRatio sets the trigger ratio and updates everything
   749	// derived from it: the absolute trigger, the heap goal, mark pacing,
   750	// and sweep pacing.
   751	//
   752	// This can be called any time. If GC is the in the middle of a
   753	// concurrent phase, it will adjust the pacing of that phase.
   754	//
   755	// This depends on gcpercent, memstats.heap_marked, and
   756	// memstats.heap_live. These must be up to date.
   757	//
   758	// mheap_.lock must be held or the world must be stopped.
   759	func gcSetTriggerRatio(triggerRatio float64) {
   760		// Compute the next GC goal, which is when the allocated heap
   761		// has grown by GOGC/100 over the heap marked by the last
   762		// cycle.
   763		goal := ^uint64(0)
   764		if gcpercent >= 0 {
   765			goal = memstats.heap_marked + memstats.heap_marked*uint64(gcpercent)/100
   766		}
   767	
   768		// Set the trigger ratio, capped to reasonable bounds.
   769		if triggerRatio < 0 {
   770			// This can happen if the mutator is allocating very
   771			// quickly or the GC is scanning very slowly.
   772			triggerRatio = 0
   773		} else if gcpercent >= 0 {
   774			// Ensure there's always a little margin so that the
   775			// mutator assist ratio isn't infinity.
   776			maxTriggerRatio := 0.95 * float64(gcpercent) / 100
   777			if triggerRatio > maxTriggerRatio {
   778				triggerRatio = maxTriggerRatio
   779			}
   780		}
   781		memstats.triggerRatio = triggerRatio
   782	
   783		// Compute the absolute GC trigger from the trigger ratio.
   784		//
   785		// We trigger the next GC cycle when the allocated heap has
   786		// grown by the trigger ratio over the marked heap size.
   787		trigger := ^uint64(0)
   788		if gcpercent >= 0 {
   789			trigger = uint64(float64(memstats.heap_marked) * (1 + triggerRatio))
   790			// Don't trigger below the minimum heap size.
   791			minTrigger := heapminimum
   792			if !isSweepDone() {
   793				// Concurrent sweep happens in the heap growth
   794				// from heap_live to gc_trigger, so ensure
   795				// that concurrent sweep has some heap growth
   796				// in which to perform sweeping before we
   797				// start the next GC cycle.
   798				sweepMin := atomic.Load64(&memstats.heap_live) + sweepMinHeapDistance
   799				if sweepMin > minTrigger {
   800					minTrigger = sweepMin
   801				}
   802			}
   803			if trigger < minTrigger {
   804				trigger = minTrigger
   805			}
   806			if int64(trigger) < 0 {
   807				print("runtime: next_gc=", memstats.next_gc, " heap_marked=", memstats.heap_marked, " heap_live=", memstats.heap_live, " initialHeapLive=", work.initialHeapLive, "triggerRatio=", triggerRatio, " minTrigger=", minTrigger, "\n")
   808				throw("gc_trigger underflow")
   809			}
   810			if trigger > goal {
   811				// The trigger ratio is always less than GOGC/100, but
   812				// other bounds on the trigger may have raised it.
   813				// Push up the goal, too.
   814				goal = trigger
   815			}
   816		}
   817	
   818		// Commit to the trigger and goal.
   819		memstats.gc_trigger = trigger
   820		memstats.next_gc = goal
   821		if trace.enabled {
   822			traceNextGC()
   823		}
   824	
   825		// Update mark pacing.
   826		if gcphase != _GCoff {
   827			gcController.revise()
   828		}
   829	
   830		// Update sweep pacing.
   831		if isSweepDone() {
   832			mheap_.sweepPagesPerByte = 0
   833		} else {
   834			// Concurrent sweep needs to sweep all of the in-use
   835			// pages by the time the allocated heap reaches the GC
   836			// trigger. Compute the ratio of in-use pages to sweep
   837			// per byte allocated, accounting for the fact that
   838			// some might already be swept.
   839			heapLiveBasis := atomic.Load64(&memstats.heap_live)
   840			heapDistance := int64(trigger) - int64(heapLiveBasis)
   841			// Add a little margin so rounding errors and
   842			// concurrent sweep are less likely to leave pages
   843			// unswept when GC starts.
   844			heapDistance -= 1024 * 1024
   845			if heapDistance < _PageSize {
   846				// Avoid setting the sweep ratio extremely high
   847				heapDistance = _PageSize
   848			}
   849			pagesSwept := atomic.Load64(&mheap_.pagesSwept)
   850			sweepDistancePages := int64(mheap_.pagesInUse) - int64(pagesSwept)
   851			if sweepDistancePages <= 0 {
   852				mheap_.sweepPagesPerByte = 0
   853			} else {
   854				mheap_.sweepPagesPerByte = float64(sweepDistancePages) / float64(heapDistance)
   855				mheap_.sweepHeapLiveBasis = heapLiveBasis
   856				// Write pagesSweptBasis last, since this
   857				// signals concurrent sweeps to recompute
   858				// their debt.
   859				atomic.Store64(&mheap_.pagesSweptBasis, pagesSwept)
   860			}
   861		}
   862	
   863		gcPaceScavenger()
   864	}
   865	
   866	// gcEffectiveGrowthRatio returns the current effective heap growth
   867	// ratio (GOGC/100) based on heap_marked from the previous GC and
   868	// next_gc for the current GC.
   869	//
   870	// This may differ from gcpercent/100 because of various upper and
   871	// lower bounds on gcpercent. For example, if the heap is smaller than
   872	// heapminimum, this can be higher than gcpercent/100.
   873	//
   874	// mheap_.lock must be held or the world must be stopped.
   875	func gcEffectiveGrowthRatio() float64 {
   876		egogc := float64(memstats.next_gc-memstats.heap_marked) / float64(memstats.heap_marked)
   877		if egogc < 0 {
   878			// Shouldn't happen, but just in case.
   879			egogc = 0
   880		}
   881		return egogc
   882	}
   883	
   884	// gcGoalUtilization is the goal CPU utilization for
   885	// marking as a fraction of GOMAXPROCS.
   886	const gcGoalUtilization = 0.30
   887	
   888	// gcBackgroundUtilization is the fixed CPU utilization for background
   889	// marking. It must be <= gcGoalUtilization. The difference between
   890	// gcGoalUtilization and gcBackgroundUtilization will be made up by
   891	// mark assists. The scheduler will aim to use within 50% of this
   892	// goal.
   893	//
   894	// Setting this to < gcGoalUtilization avoids saturating the trigger
   895	// feedback controller when there are no assists, which allows it to
   896	// better control CPU and heap growth. However, the larger the gap,
   897	// the more mutator assists are expected to happen, which impact
   898	// mutator latency.
   899	const gcBackgroundUtilization = 0.25
   900	
   901	// gcCreditSlack is the amount of scan work credit that can
   902	// accumulate locally before updating gcController.scanWork and,
   903	// optionally, gcController.bgScanCredit. Lower values give a more
   904	// accurate assist ratio and make it more likely that assists will
   905	// successfully steal background credit. Higher values reduce memory
   906	// contention.
   907	const gcCreditSlack = 2000
   908	
   909	// gcAssistTimeSlack is the nanoseconds of mutator assist time that
   910	// can accumulate on a P before updating gcController.assistTime.
   911	const gcAssistTimeSlack = 5000
   912	
   913	// gcOverAssistWork determines how many extra units of scan work a GC
   914	// assist does when an assist happens. This amortizes the cost of an
   915	// assist by pre-paying for this many bytes of future allocations.
   916	const gcOverAssistWork = 64 << 10
   917	
   918	var work struct {
   919		full  lfstack          // lock-free list of full blocks workbuf
   920		empty lfstack          // lock-free list of empty blocks workbuf
   921		pad0  cpu.CacheLinePad // prevents false-sharing between full/empty and nproc/nwait
   922	
   923		wbufSpans struct {
   924			lock mutex
   925			// free is a list of spans dedicated to workbufs, but
   926			// that don't currently contain any workbufs.
   927			free mSpanList
   928			// busy is a list of all spans containing workbufs on
   929			// one of the workbuf lists.
   930			busy mSpanList
   931		}
   932	
   933		// Restore 64-bit alignment on 32-bit.
   934		_ uint32
   935	
   936		// bytesMarked is the number of bytes marked this cycle. This
   937		// includes bytes blackened in scanned objects, noscan objects
   938		// that go straight to black, and permagrey objects scanned by
   939		// markroot during the concurrent scan phase. This is updated
   940		// atomically during the cycle. Updates may be batched
   941		// arbitrarily, since the value is only read at the end of the
   942		// cycle.
   943		//
   944		// Because of benign races during marking, this number may not
   945		// be the exact number of marked bytes, but it should be very
   946		// close.
   947		//
   948		// Put this field here because it needs 64-bit atomic access
   949		// (and thus 8-byte alignment even on 32-bit architectures).
   950		bytesMarked uint64
   951	
   952		markrootNext uint32 // next markroot job
   953		markrootJobs uint32 // number of markroot jobs
   954	
   955		nproc  uint32
   956		tstart int64
   957		nwait  uint32
   958		ndone  uint32
   959	
   960		// Number of roots of various root types. Set by gcMarkRootPrepare.
   961		nFlushCacheRoots                               int
   962		nDataRoots, nBSSRoots, nSpanRoots, nStackRoots int
   963	
   964		// Each type of GC state transition is protected by a lock.
   965		// Since multiple threads can simultaneously detect the state
   966		// transition condition, any thread that detects a transition
   967		// condition must acquire the appropriate transition lock,
   968		// re-check the transition condition and return if it no
   969		// longer holds or perform the transition if it does.
   970		// Likewise, any transition must invalidate the transition
   971		// condition before releasing the lock. This ensures that each
   972		// transition is performed by exactly one thread and threads
   973		// that need the transition to happen block until it has
   974		// happened.
   975		//
   976		// startSema protects the transition from "off" to mark or
   977		// mark termination.
   978		startSema uint32
   979		// markDoneSema protects transitions from mark to mark termination.
   980		markDoneSema uint32
   981	
   982		bgMarkReady note   // signal background mark worker has started
   983		bgMarkDone  uint32 // cas to 1 when at a background mark completion point
   984		// Background mark completion signaling
   985	
   986		// mode is the concurrency mode of the current GC cycle.
   987		mode gcMode
   988	
   989		// userForced indicates the current GC cycle was forced by an
   990		// explicit user call.
   991		userForced bool
   992	
   993		// totaltime is the CPU nanoseconds spent in GC since the
   994		// program started if debug.gctrace > 0.
   995		totaltime int64
   996	
   997		// initialHeapLive is the value of memstats.heap_live at the
   998		// beginning of this GC cycle.
   999		initialHeapLive uint64
  1000	
  1001		// assistQueue is a queue of assists that are blocked because
  1002		// there was neither enough credit to steal or enough work to
  1003		// do.
  1004		assistQueue struct {
  1005			lock mutex
  1006			q    gQueue
  1007		}
  1008	
  1009		// sweepWaiters is a list of blocked goroutines to wake when
  1010		// we transition from mark termination to sweep.
  1011		sweepWaiters struct {
  1012			lock mutex
  1013			list gList
  1014		}
  1015	
  1016		// cycles is the number of completed GC cycles, where a GC
  1017		// cycle is sweep termination, mark, mark termination, and
  1018		// sweep. This differs from memstats.numgc, which is
  1019		// incremented at mark termination.
  1020		cycles uint32
  1021	
  1022		// Timing/utilization stats for this cycle.
  1023		stwprocs, maxprocs                 int32
  1024		tSweepTerm, tMark, tMarkTerm, tEnd int64 // nanotime() of phase start
  1025	
  1026		pauseNS    int64 // total STW time this cycle
  1027		pauseStart int64 // nanotime() of last STW
  1028	
  1029		// debug.gctrace heap sizes for this cycle.
  1030		heap0, heap1, heap2, heapGoal uint64
  1031	}
  1032	
  1033	// GC runs a garbage collection and blocks the caller until the
  1034	// garbage collection is complete. It may also block the entire
  1035	// program.
  1036	func GC() {
  1037		// We consider a cycle to be: sweep termination, mark, mark
  1038		// termination, and sweep. This function shouldn't return
  1039		// until a full cycle has been completed, from beginning to
  1040		// end. Hence, we always want to finish up the current cycle
  1041		// and start a new one. That means:
  1042		//
  1043		// 1. In sweep termination, mark, or mark termination of cycle
  1044		// N, wait until mark termination N completes and transitions
  1045		// to sweep N.
  1046		//
  1047		// 2. In sweep N, help with sweep N.
  1048		//
  1049		// At this point we can begin a full cycle N+1.
  1050		//
  1051		// 3. Trigger cycle N+1 by starting sweep termination N+1.
  1052		//
  1053		// 4. Wait for mark termination N+1 to complete.
  1054		//
  1055		// 5. Help with sweep N+1 until it's done.
  1056		//
  1057		// This all has to be written to deal with the fact that the
  1058		// GC may move ahead on its own. For example, when we block
  1059		// until mark termination N, we may wake up in cycle N+2.
  1060	
  1061		// Wait until the current sweep termination, mark, and mark
  1062		// termination complete.
  1063		n := atomic.Load(&work.cycles)
  1064		gcWaitOnMark(n)
  1065	
  1066		// We're now in sweep N or later. Trigger GC cycle N+1, which
  1067		// will first finish sweep N if necessary and then enter sweep
  1068		// termination N+1.
  1069		gcStart(gcTrigger{kind: gcTriggerCycle, n: n + 1})
  1070	
  1071		// Wait for mark termination N+1 to complete.
  1072		gcWaitOnMark(n + 1)
  1073	
  1074		// Finish sweep N+1 before returning. We do this both to
  1075		// complete the cycle and because runtime.GC() is often used
  1076		// as part of tests and benchmarks to get the system into a
  1077		// relatively stable and isolated state.
  1078		for atomic.Load(&work.cycles) == n+1 && sweepone() != ^uintptr(0) {
  1079			sweep.nbgsweep++
  1080			Gosched()
  1081		}
  1082	
  1083		// Callers may assume that the heap profile reflects the
  1084		// just-completed cycle when this returns (historically this
  1085		// happened because this was a STW GC), but right now the
  1086		// profile still reflects mark termination N, not N+1.
  1087		//
  1088		// As soon as all of the sweep frees from cycle N+1 are done,
  1089		// we can go ahead and publish the heap profile.
  1090		//
  1091		// First, wait for sweeping to finish. (We know there are no
  1092		// more spans on the sweep queue, but we may be concurrently
  1093		// sweeping spans, so we have to wait.)
  1094		for atomic.Load(&work.cycles) == n+1 && atomic.Load(&mheap_.sweepers) != 0 {
  1095			Gosched()
  1096		}
  1097	
  1098		// Now we're really done with sweeping, so we can publish the
  1099		// stable heap profile. Only do this if we haven't already hit
  1100		// another mark termination.
  1101		mp := acquirem()
  1102		cycle := atomic.Load(&work.cycles)
  1103		if cycle == n+1 || (gcphase == _GCmark && cycle == n+2) {
  1104			mProf_PostSweep()
  1105		}
  1106		releasem(mp)
  1107	}
  1108	
  1109	// gcWaitOnMark blocks until GC finishes the Nth mark phase. If GC has
  1110	// already completed this mark phase, it returns immediately.
  1111	func gcWaitOnMark(n uint32) {
  1112		for {
  1113			// Disable phase transitions.
  1114			lock(&work.sweepWaiters.lock)
  1115			nMarks := atomic.Load(&work.cycles)
  1116			if gcphase != _GCmark {
  1117				// We've already completed this cycle's mark.
  1118				nMarks++
  1119			}
  1120			if nMarks > n {
  1121				// We're done.
  1122				unlock(&work.sweepWaiters.lock)
  1123				return
  1124			}
  1125	
  1126			// Wait until sweep termination, mark, and mark
  1127			// termination of cycle N complete.
  1128			work.sweepWaiters.list.push(getg())
  1129			goparkunlock(&work.sweepWaiters.lock, waitReasonWaitForGCCycle, traceEvGoBlock, 1)
  1130		}
  1131	}
  1132	
  1133	// gcMode indicates how concurrent a GC cycle should be.
  1134	type gcMode int
  1135	
  1136	const (
  1137		gcBackgroundMode gcMode = iota // concurrent GC and sweep
  1138		gcForceMode                    // stop-the-world GC now, concurrent sweep
  1139		gcForceBlockMode               // stop-the-world GC now and STW sweep (forced by user)
  1140	)
  1141	
  1142	// A gcTrigger is a predicate for starting a GC cycle. Specifically,
  1143	// it is an exit condition for the _GCoff phase.
  1144	type gcTrigger struct {
  1145		kind gcTriggerKind
  1146		now  int64  // gcTriggerTime: current time
  1147		n    uint32 // gcTriggerCycle: cycle number to start
  1148	}
  1149	
  1150	type gcTriggerKind int
  1151	
  1152	const (
  1153		// gcTriggerHeap indicates that a cycle should be started when
  1154		// the heap size reaches the trigger heap size computed by the
  1155		// controller.
  1156		gcTriggerHeap gcTriggerKind = iota
  1157	
  1158		// gcTriggerTime indicates that a cycle should be started when
  1159		// it's been more than forcegcperiod nanoseconds since the
  1160		// previous GC cycle.
  1161		gcTriggerTime
  1162	
  1163		// gcTriggerCycle indicates that a cycle should be started if
  1164		// we have not yet started cycle number gcTrigger.n (relative
  1165		// to work.cycles).
  1166		gcTriggerCycle
  1167	)
  1168	
  1169	// test reports whether the trigger condition is satisfied, meaning
  1170	// that the exit condition for the _GCoff phase has been met. The exit
  1171	// condition should be tested when allocating.
  1172	func (t gcTrigger) test() bool {
  1173		if !memstats.enablegc || panicking != 0 || gcphase != _GCoff {
  1174			return false
  1175		}
  1176		switch t.kind {
  1177		case gcTriggerHeap:
  1178			// Non-atomic access to heap_live for performance. If
  1179			// we are going to trigger on this, this thread just
  1180			// atomically wrote heap_live anyway and we'll see our
  1181			// own write.
  1182			return memstats.heap_live >= memstats.gc_trigger
  1183		case gcTriggerTime:
  1184			if gcpercent < 0 {
  1185				return false
  1186			}
  1187			lastgc := int64(atomic.Load64(&memstats.last_gc_nanotime))
  1188			return lastgc != 0 && t.now-lastgc > forcegcperiod
  1189		case gcTriggerCycle:
  1190			// t.n > work.cycles, but accounting for wraparound.
  1191			return int32(t.n-work.cycles) > 0
  1192		}
  1193		return true
  1194	}
  1195	
  1196	// gcStart starts the GC. It transitions from _GCoff to _GCmark (if
  1197	// debug.gcstoptheworld == 0) or performs all of GC (if
  1198	// debug.gcstoptheworld != 0).
  1199	//
  1200	// This may return without performing this transition in some cases,
  1201	// such as when called on a system stack or with locks held.
  1202	func gcStart(trigger gcTrigger) {
  1203		// Since this is called from malloc and malloc is called in
  1204		// the guts of a number of libraries that might be holding
  1205		// locks, don't attempt to start GC in non-preemptible or
  1206		// potentially unstable situations.
  1207		mp := acquirem()
  1208		if gp := getg(); gp == mp.g0 || mp.locks > 1 || mp.preemptoff != "" {
  1209			releasem(mp)
  1210			return
  1211		}
  1212		releasem(mp)
  1213		mp = nil
  1214	
  1215		// Pick up the remaining unswept/not being swept spans concurrently
  1216		//
  1217		// This shouldn't happen if we're being invoked in background
  1218		// mode since proportional sweep should have just finished
  1219		// sweeping everything, but rounding errors, etc, may leave a
  1220		// few spans unswept. In forced mode, this is necessary since
  1221		// GC can be forced at any point in the sweeping cycle.
  1222		//
  1223		// We check the transition condition continuously here in case
  1224		// this G gets delayed in to the next GC cycle.
  1225		for trigger.test() && sweepone() != ^uintptr(0) {
  1226			sweep.nbgsweep++
  1227		}
  1228	
  1229		// Perform GC initialization and the sweep termination
  1230		// transition.
  1231		semacquire(&work.startSema)
  1232		// Re-check transition condition under transition lock.
  1233		if !trigger.test() {
  1234			semrelease(&work.startSema)
  1235			return
  1236		}
  1237	
  1238		// For stats, check if this GC was forced by the user.
  1239		work.userForced = trigger.kind == gcTriggerCycle
  1240	
  1241		// In gcstoptheworld debug mode, upgrade the mode accordingly.
  1242		// We do this after re-checking the transition condition so
  1243		// that multiple goroutines that detect the heap trigger don't
  1244		// start multiple STW GCs.
  1245		mode := gcBackgroundMode
  1246		if debug.gcstoptheworld == 1 {
  1247			mode = gcForceMode
  1248		} else if debug.gcstoptheworld == 2 {
  1249			mode = gcForceBlockMode
  1250		}
  1251	
  1252		// Ok, we're doing it! Stop everybody else
  1253		semacquire(&worldsema)
  1254	
  1255		if trace.enabled {
  1256			traceGCStart()
  1257		}
  1258	
  1259		// Check that all Ps have finished deferred mcache flushes.
  1260		for _, p := range allp {
  1261			if fg := atomic.Load(&p.mcache.flushGen); fg != mheap_.sweepgen {
  1262				println("runtime: p", p.id, "flushGen", fg, "!= sweepgen", mheap_.sweepgen)
  1263				throw("p mcache not flushed")
  1264			}
  1265		}
  1266	
  1267		gcBgMarkStartWorkers()
  1268	
  1269		systemstack(gcResetMarkState)
  1270	
  1271		work.stwprocs, work.maxprocs = gomaxprocs, gomaxprocs
  1272		if work.stwprocs > ncpu {
  1273			// This is used to compute CPU time of the STW phases,
  1274			// so it can't be more than ncpu, even if GOMAXPROCS is.
  1275			work.stwprocs = ncpu
  1276		}
  1277		work.heap0 = atomic.Load64(&memstats.heap_live)
  1278		work.pauseNS = 0
  1279		work.mode = mode
  1280	
  1281		now := nanotime()
  1282		work.tSweepTerm = now
  1283		work.pauseStart = now
  1284		if trace.enabled {
  1285			traceGCSTWStart(1)
  1286		}
  1287		systemstack(stopTheWorldWithSema)
  1288		// Finish sweep before we start concurrent scan.
  1289		systemstack(func() {
  1290			finishsweep_m()
  1291		})
  1292		// clearpools before we start the GC. If we wait they memory will not be
  1293		// reclaimed until the next GC cycle.
  1294		clearpools()
  1295	
  1296		work.cycles++
  1297	
  1298		gcController.startCycle()
  1299		work.heapGoal = memstats.next_gc
  1300	
  1301		// In STW mode, disable scheduling of user Gs. This may also
  1302		// disable scheduling of this goroutine, so it may block as
  1303		// soon as we start the world again.
  1304		if mode != gcBackgroundMode {
  1305			schedEnableUser(false)
  1306		}
  1307	
  1308		// Enter concurrent mark phase and enable
  1309		// write barriers.
  1310		//
  1311		// Because the world is stopped, all Ps will
  1312		// observe that write barriers are enabled by
  1313		// the time we start the world and begin
  1314		// scanning.
  1315		//
  1316		// Write barriers must be enabled before assists are
  1317		// enabled because they must be enabled before
  1318		// any non-leaf heap objects are marked. Since
  1319		// allocations are blocked until assists can
  1320		// happen, we want enable assists as early as
  1321		// possible.
  1322		setGCPhase(_GCmark)
  1323	
  1324		gcBgMarkPrepare() // Must happen before assist enable.
  1325		gcMarkRootPrepare()
  1326	
  1327		// Mark all active tinyalloc blocks. Since we're
  1328		// allocating from these, they need to be black like
  1329		// other allocations. The alternative is to blacken
  1330		// the tiny block on every allocation from it, which
  1331		// would slow down the tiny allocator.
  1332		gcMarkTinyAllocs()
  1333	
  1334		// At this point all Ps have enabled the write
  1335		// barrier, thus maintaining the no white to
  1336		// black invariant. Enable mutator assists to
  1337		// put back-pressure on fast allocating
  1338		// mutators.
  1339		atomic.Store(&gcBlackenEnabled, 1)
  1340	
  1341		// Assists and workers can start the moment we start
  1342		// the world.
  1343		gcController.markStartTime = now
  1344	
  1345		// Concurrent mark.
  1346		systemstack(func() {
  1347			now = startTheWorldWithSema(trace.enabled)
  1348			work.pauseNS += now - work.pauseStart
  1349			work.tMark = now
  1350		})
  1351		// In STW mode, we could block the instant systemstack
  1352		// returns, so don't do anything important here. Make sure we
  1353		// block rather than returning to user code.
  1354		if mode != gcBackgroundMode {
  1355			Gosched()
  1356		}
  1357	
  1358		semrelease(&work.startSema)
  1359	}
  1360	
  1361	// gcMarkDoneFlushed counts the number of P's with flushed work.
  1362	//
  1363	// Ideally this would be a captured local in gcMarkDone, but forEachP
  1364	// escapes its callback closure, so it can't capture anything.
  1365	//
  1366	// This is protected by markDoneSema.
  1367	var gcMarkDoneFlushed uint32
  1368	
  1369	// debugCachedWork enables extra checks for debugging premature mark
  1370	// termination.
  1371	//
  1372	// For debugging issue #27993.
  1373	const debugCachedWork = false
  1374	
  1375	// gcWorkPauseGen is for debugging the mark completion algorithm.
  1376	// gcWork put operations spin while gcWork.pauseGen == gcWorkPauseGen.
  1377	// Only used if debugCachedWork is true.
  1378	//
  1379	// For debugging issue #27993.
  1380	var gcWorkPauseGen uint32 = 1
  1381	
  1382	// gcMarkDone transitions the GC from mark to mark termination if all
  1383	// reachable objects have been marked (that is, there are no grey
  1384	// objects and can be no more in the future). Otherwise, it flushes
  1385	// all local work to the global queues where it can be discovered by
  1386	// other workers.
  1387	//
  1388	// This should be called when all local mark work has been drained and
  1389	// there are no remaining workers. Specifically, when
  1390	//
  1391	//   work.nwait == work.nproc && !gcMarkWorkAvailable(p)
  1392	//
  1393	// The calling context must be preemptible.
  1394	//
  1395	// Flushing local work is important because idle Ps may have local
  1396	// work queued. This is the only way to make that work visible and
  1397	// drive GC to completion.
  1398	//
  1399	// It is explicitly okay to have write barriers in this function. If
  1400	// it does transition to mark termination, then all reachable objects
  1401	// have been marked, so the write barrier cannot shade any more
  1402	// objects.
  1403	func gcMarkDone() {
  1404		// Ensure only one thread is running the ragged barrier at a
  1405		// time.
  1406		semacquire(&work.markDoneSema)
  1407	
  1408	top:
  1409		// Re-check transition condition under transition lock.
  1410		//
  1411		// It's critical that this checks the global work queues are
  1412		// empty before performing the ragged barrier. Otherwise,
  1413		// there could be global work that a P could take after the P
  1414		// has passed the ragged barrier.
  1415		if !(gcphase == _GCmark && work.nwait == work.nproc && !gcMarkWorkAvailable(nil)) {
  1416			semrelease(&work.markDoneSema)
  1417			return
  1418		}
  1419	
  1420		// Flush all local buffers and collect flushedWork flags.
  1421		gcMarkDoneFlushed = 0
  1422		systemstack(func() {
  1423			gp := getg().m.curg
  1424			// Mark the user stack as preemptible so that it may be scanned.
  1425			// Otherwise, our attempt to force all P's to a safepoint could
  1426			// result in a deadlock as we attempt to preempt a worker that's
  1427			// trying to preempt us (e.g. for a stack scan).
  1428			casgstatus(gp, _Grunning, _Gwaiting)
  1429			forEachP(func(_p_ *p) {
  1430				// Flush the write barrier buffer, since this may add
  1431				// work to the gcWork.
  1432				wbBufFlush1(_p_)
  1433				// For debugging, shrink the write barrier
  1434				// buffer so it flushes immediately.
  1435				// wbBuf.reset will keep it at this size as
  1436				// long as throwOnGCWork is set.
  1437				if debugCachedWork {
  1438					b := &_p_.wbBuf
  1439					b.end = uintptr(unsafe.Pointer(&b.buf[wbBufEntryPointers]))
  1440					b.debugGen = gcWorkPauseGen
  1441				}
  1442				// Flush the gcWork, since this may create global work
  1443				// and set the flushedWork flag.
  1444				//
  1445				// TODO(austin): Break up these workbufs to
  1446				// better distribute work.
  1447				_p_.gcw.dispose()
  1448				// Collect the flushedWork flag.
  1449				if _p_.gcw.flushedWork {
  1450					atomic.Xadd(&gcMarkDoneFlushed, 1)
  1451					_p_.gcw.flushedWork = false
  1452				} else if debugCachedWork {
  1453					// For debugging, freeze the gcWork
  1454					// until we know whether we've reached
  1455					// completion or not. If we think
  1456					// we've reached completion, but
  1457					// there's a paused gcWork, then
  1458					// that's a bug.
  1459					_p_.gcw.pauseGen = gcWorkPauseGen
  1460					// Capture the G's stack.
  1461					for i := range _p_.gcw.pauseStack {
  1462						_p_.gcw.pauseStack[i] = 0
  1463					}
  1464					callers(1, _p_.gcw.pauseStack[:])
  1465				}
  1466			})
  1467			casgstatus(gp, _Gwaiting, _Grunning)
  1468		})
  1469	
  1470		if gcMarkDoneFlushed != 0 {
  1471			if debugCachedWork {
  1472				// Release paused gcWorks.
  1473				atomic.Xadd(&gcWorkPauseGen, 1)
  1474			}
  1475			// More grey objects were discovered since the
  1476			// previous termination check, so there may be more
  1477			// work to do. Keep going. It's possible the
  1478			// transition condition became true again during the
  1479			// ragged barrier, so re-check it.
  1480			goto top
  1481		}
  1482	
  1483		if debugCachedWork {
  1484			throwOnGCWork = true
  1485			// Release paused gcWorks. If there are any, they
  1486			// should now observe throwOnGCWork and panic.
  1487			atomic.Xadd(&gcWorkPauseGen, 1)
  1488		}
  1489	
  1490		// There was no global work, no local work, and no Ps
  1491		// communicated work since we took markDoneSema. Therefore
  1492		// there are no grey objects and no more objects can be
  1493		// shaded. Transition to mark termination.
  1494		now := nanotime()
  1495		work.tMarkTerm = now
  1496		work.pauseStart = now
  1497		getg().m.preemptoff = "gcing"
  1498		if trace.enabled {
  1499			traceGCSTWStart(0)
  1500		}
  1501		systemstack(stopTheWorldWithSema)
  1502		// The gcphase is _GCmark, it will transition to _GCmarktermination
  1503		// below. The important thing is that the wb remains active until
  1504		// all marking is complete. This includes writes made by the GC.
  1505	
  1506		if debugCachedWork {
  1507			// For debugging, double check that no work was added after we
  1508			// went around above and disable write barrier buffering.
  1509			for _, p := range allp {
  1510				gcw := &p.gcw
  1511				if !gcw.empty() {
  1512					printlock()
  1513					print("runtime: P ", p.id, " flushedWork ", gcw.flushedWork)
  1514					if gcw.wbuf1 == nil {
  1515						print(" wbuf1=<nil>")
  1516					} else {
  1517						print(" wbuf1.n=", gcw.wbuf1.nobj)
  1518					}
  1519					if gcw.wbuf2 == nil {
  1520						print(" wbuf2=<nil>")
  1521					} else {
  1522						print(" wbuf2.n=", gcw.wbuf2.nobj)
  1523					}
  1524					print("\n")
  1525					if gcw.pauseGen == gcw.putGen {
  1526						println("runtime: checkPut already failed at this generation")
  1527					}
  1528					throw("throwOnGCWork")
  1529				}
  1530			}
  1531		} else {
  1532			// For unknown reasons (see issue #27993), there is
  1533			// sometimes work left over when we enter mark
  1534			// termination. Detect this and resume concurrent
  1535			// mark. This is obviously unfortunate.
  1536			//
  1537			// Switch to the system stack to call wbBufFlush1,
  1538			// though in this case it doesn't matter because we're
  1539			// non-preemptible anyway.
  1540			restart := false
  1541			systemstack(func() {
  1542				for _, p := range allp {
  1543					wbBufFlush1(p)
  1544					if !p.gcw.empty() {
  1545						restart = true
  1546						break
  1547					}
  1548				}
  1549			})
  1550			if restart {
  1551				getg().m.preemptoff = ""
  1552				systemstack(func() {
  1553					now := startTheWorldWithSema(true)
  1554					work.pauseNS += now - work.pauseStart
  1555				})
  1556				goto top
  1557			}
  1558		}
  1559	
  1560		// Disable assists and background workers. We must do
  1561		// this before waking blocked assists.
  1562		atomic.Store(&gcBlackenEnabled, 0)
  1563	
  1564		// Wake all blocked assists. These will run when we
  1565		// start the world again.
  1566		gcWakeAllAssists()
  1567	
  1568		// Likewise, release the transition lock. Blocked
  1569		// workers and assists will run when we start the
  1570		// world again.
  1571		semrelease(&work.markDoneSema)
  1572	
  1573		// In STW mode, re-enable user goroutines. These will be
  1574		// queued to run after we start the world.
  1575		schedEnableUser(true)
  1576	
  1577		// endCycle depends on all gcWork cache stats being flushed.
  1578		// The termination algorithm above ensured that up to
  1579		// allocations since the ragged barrier.
  1580		nextTriggerRatio := gcController.endCycle()
  1581	
  1582		// Perform mark termination. This will restart the world.
  1583		gcMarkTermination(nextTriggerRatio)
  1584	}
  1585	
  1586	func gcMarkTermination(nextTriggerRatio float64) {
  1587		// World is stopped.
  1588		// Start marktermination which includes enabling the write barrier.
  1589		atomic.Store(&gcBlackenEnabled, 0)
  1590		setGCPhase(_GCmarktermination)
  1591	
  1592		work.heap1 = memstats.heap_live
  1593		startTime := nanotime()
  1594	
  1595		mp := acquirem()
  1596		mp.preemptoff = "gcing"
  1597		_g_ := getg()
  1598		_g_.m.traceback = 2
  1599		gp := _g_.m.curg
  1600		casgstatus(gp, _Grunning, _Gwaiting)
  1601		gp.waitreason = waitReasonGarbageCollection
  1602	
  1603		// Run gc on the g0 stack. We do this so that the g stack
  1604		// we're currently running on will no longer change. Cuts
  1605		// the root set down a bit (g0 stacks are not scanned, and
  1606		// we don't need to scan gc's internal state).  We also
  1607		// need to switch to g0 so we can shrink the stack.
  1608		systemstack(func() {
  1609			gcMark(startTime)
  1610			// Must return immediately.
  1611			// The outer function's stack may have moved
  1612			// during gcMark (it shrinks stacks, including the
  1613			// outer function's stack), so we must not refer
  1614			// to any of its variables. Return back to the
  1615			// non-system stack to pick up the new addresses
  1616			// before continuing.
  1617		})
  1618	
  1619		systemstack(func() {
  1620			work.heap2 = work.bytesMarked
  1621			if debug.gccheckmark > 0 {
  1622				// Run a full non-parallel, stop-the-world
  1623				// mark using checkmark bits, to check that we
  1624				// didn't forget to mark anything during the
  1625				// concurrent mark process.
  1626				gcResetMarkState()
  1627				initCheckmarks()
  1628				gcw := &getg().m.p.ptr().gcw
  1629				gcDrain(gcw, 0)
  1630				wbBufFlush1(getg().m.p.ptr())
  1631				gcw.dispose()
  1632				clearCheckmarks()
  1633			}
  1634	
  1635			// marking is complete so we can turn the write barrier off
  1636			setGCPhase(_GCoff)
  1637			gcSweep(work.mode)
  1638		})
  1639	
  1640		_g_.m.traceback = 0
  1641		casgstatus(gp, _Gwaiting, _Grunning)
  1642	
  1643		if trace.enabled {
  1644			traceGCDone()
  1645		}
  1646	
  1647		// all done
  1648		mp.preemptoff = ""
  1649	
  1650		if gcphase != _GCoff {
  1651			throw("gc done but gcphase != _GCoff")
  1652		}
  1653	
  1654		// Record next_gc and heap_inuse for scavenger.
  1655		memstats.last_next_gc = memstats.next_gc
  1656		memstats.last_heap_inuse = memstats.heap_inuse
  1657	
  1658		// Update GC trigger and pacing for the next cycle.
  1659		gcSetTriggerRatio(nextTriggerRatio)
  1660	
  1661		// Pacing changed, so the scavenger should be awoken.
  1662		wakeScavenger()
  1663	
  1664		// Update timing memstats
  1665		now := nanotime()
  1666		sec, nsec, _ := time_now()
  1667		unixNow := sec*1e9 + int64(nsec)
  1668		work.pauseNS += now - work.pauseStart
  1669		work.tEnd = now
  1670		atomic.Store64(&memstats.last_gc_unix, uint64(unixNow)) // must be Unix time to make sense to user
  1671		atomic.Store64(&memstats.last_gc_nanotime, uint64(now)) // monotonic time for us
  1672		memstats.pause_ns[memstats.numgc%uint32(len(memstats.pause_ns))] = uint64(work.pauseNS)
  1673		memstats.pause_end[memstats.numgc%uint32(len(memstats.pause_end))] = uint64(unixNow)
  1674		memstats.pause_total_ns += uint64(work.pauseNS)
  1675	
  1676		// Update work.totaltime.
  1677		sweepTermCpu := int64(work.stwprocs) * (work.tMark - work.tSweepTerm)
  1678		// We report idle marking time below, but omit it from the
  1679		// overall utilization here since it's "free".
  1680		markCpu := gcController.assistTime + gcController.dedicatedMarkTime + gcController.fractionalMarkTime
  1681		markTermCpu := int64(work.stwprocs) * (work.tEnd - work.tMarkTerm)
  1682		cycleCpu := sweepTermCpu + markCpu + markTermCpu
  1683		work.totaltime += cycleCpu
  1684	
  1685		// Compute overall GC CPU utilization.
  1686		totalCpu := sched.totaltime + (now-sched.procresizetime)*int64(gomaxprocs)
  1687		memstats.gc_cpu_fraction = float64(work.totaltime) / float64(totalCpu)
  1688	
  1689		// Reset sweep state.
  1690		sweep.nbgsweep = 0
  1691		sweep.npausesweep = 0
  1692	
  1693		if work.userForced {
  1694			memstats.numforcedgc++
  1695		}
  1696	
  1697		// Bump GC cycle count and wake goroutines waiting on sweep.
  1698		lock(&work.sweepWaiters.lock)
  1699		memstats.numgc++
  1700		injectglist(&work.sweepWaiters.list)
  1701		unlock(&work.sweepWaiters.lock)
  1702	
  1703		// Finish the current heap profiling cycle and start a new
  1704		// heap profiling cycle. We do this before starting the world
  1705		// so events don't leak into the wrong cycle.
  1706		mProf_NextCycle()
  1707	
  1708		systemstack(func() { startTheWorldWithSema(true) })
  1709	
  1710		// Flush the heap profile so we can start a new cycle next GC.
  1711		// This is relatively expensive, so we don't do it with the
  1712		// world stopped.
  1713		mProf_Flush()
  1714	
  1715		// Prepare workbufs for freeing by the sweeper. We do this
  1716		// asynchronously because it can take non-trivial time.
  1717		prepareFreeWorkbufs()
  1718	
  1719		// Free stack spans. This must be done between GC cycles.
  1720		systemstack(freeStackSpans)
  1721	
  1722		// Ensure all mcaches are flushed. Each P will flush its own
  1723		// mcache before allocating, but idle Ps may not. Since this
  1724		// is necessary to sweep all spans, we need to ensure all
  1725		// mcaches are flushed before we start the next GC cycle.
  1726		systemstack(func() {
  1727			forEachP(func(_p_ *p) {
  1728				_p_.mcache.prepareForSweep()
  1729			})
  1730		})
  1731	
  1732		// Print gctrace before dropping worldsema. As soon as we drop
  1733		// worldsema another cycle could start and smash the stats
  1734		// we're trying to print.
  1735		if debug.gctrace > 0 {
  1736			util := int(memstats.gc_cpu_fraction * 100)
  1737	
  1738			var sbuf [24]byte
  1739			printlock()
  1740			print("gc ", memstats.numgc,
  1741				" @", string(itoaDiv(sbuf[:], uint64(work.tSweepTerm-runtimeInitTime)/1e6, 3)), "s ",
  1742				util, "%: ")
  1743			prev := work.tSweepTerm
  1744			for i, ns := range []int64{work.tMark, work.tMarkTerm, work.tEnd} {
  1745				if i != 0 {
  1746					print("+")
  1747				}
  1748				print(string(fmtNSAsMS(sbuf[:], uint64(ns-prev))))
  1749				prev = ns
  1750			}
  1751			print(" ms clock, ")
  1752			for i, ns := range []int64{sweepTermCpu, gcController.assistTime, gcController.dedicatedMarkTime + gcController.fractionalMarkTime, gcController.idleMarkTime, markTermCpu} {
  1753				if i == 2 || i == 3 {
  1754					// Separate mark time components with /.
  1755					print("/")
  1756				} else if i != 0 {
  1757					print("+")
  1758				}
  1759				print(string(fmtNSAsMS(sbuf[:], uint64(ns))))
  1760			}
  1761			print(" ms cpu, ",
  1762				work.heap0>>20, "->", work.heap1>>20, "->", work.heap2>>20, " MB, ",
  1763				work.heapGoal>>20, " MB goal, ",
  1764				work.maxprocs, " P")
  1765			if work.userForced {
  1766				print(" (forced)")
  1767			}
  1768			print("\n")
  1769			printunlock()
  1770		}
  1771	
  1772		semrelease(&worldsema)
  1773		// Careful: another GC cycle may start now.
  1774	
  1775		releasem(mp)
  1776		mp = nil
  1777	
  1778		// now that gc is done, kick off finalizer thread if needed
  1779		if !concurrentSweep {
  1780			// give the queued finalizers, if any, a chance to run
  1781			Gosched()
  1782		}
  1783	}
  1784	
  1785	// gcBgMarkStartWorkers prepares background mark worker goroutines.
  1786	// These goroutines will not run until the mark phase, but they must
  1787	// be started while the work is not stopped and from a regular G
  1788	// stack. The caller must hold worldsema.
  1789	func gcBgMarkStartWorkers() {
  1790		// Background marking is performed by per-P G's. Ensure that
  1791		// each P has a background GC G.
  1792		for _, p := range allp {
  1793			if p.gcBgMarkWorker == 0 {
  1794				go gcBgMarkWorker(p)
  1795				notetsleepg(&work.bgMarkReady, -1)
  1796				noteclear(&work.bgMarkReady)
  1797			}
  1798		}
  1799	}
  1800	
  1801	// gcBgMarkPrepare sets up state for background marking.
  1802	// Mutator assists must not yet be enabled.
  1803	func gcBgMarkPrepare() {
  1804		// Background marking will stop when the work queues are empty
  1805		// and there are no more workers (note that, since this is
  1806		// concurrent, this may be a transient state, but mark
  1807		// termination will clean it up). Between background workers
  1808		// and assists, we don't really know how many workers there
  1809		// will be, so we pretend to have an arbitrarily large number
  1810		// of workers, almost all of which are "waiting". While a
  1811		// worker is working it decrements nwait. If nproc == nwait,
  1812		// there are no workers.
  1813		work.nproc = ^uint32(0)
  1814		work.nwait = ^uint32(0)
  1815	}
  1816	
  1817	func gcBgMarkWorker(_p_ *p) {
  1818		gp := getg()
  1819	
  1820		type parkInfo struct {
  1821			m      muintptr // Release this m on park.
  1822			attach puintptr // If non-nil, attach to this p on park.
  1823		}
  1824		// We pass park to a gopark unlock function, so it can't be on
  1825		// the stack (see gopark). Prevent deadlock from recursively
  1826		// starting GC by disabling preemption.
  1827		gp.m.preemptoff = "GC worker init"
  1828		park := new(parkInfo)
  1829		gp.m.preemptoff = ""
  1830	
  1831		park.m.set(acquirem())
  1832		park.attach.set(_p_)
  1833		// Inform gcBgMarkStartWorkers that this worker is ready.
  1834		// After this point, the background mark worker is scheduled
  1835		// cooperatively by gcController.findRunnable. Hence, it must
  1836		// never be preempted, as this would put it into _Grunnable
  1837		// and put it on a run queue. Instead, when the preempt flag
  1838		// is set, this puts itself into _Gwaiting to be woken up by
  1839		// gcController.findRunnable at the appropriate time.
  1840		notewakeup(&work.bgMarkReady)
  1841	
  1842		for {
  1843			// Go to sleep until woken by gcController.findRunnable.
  1844			// We can't releasem yet since even the call to gopark
  1845			// may be preempted.
  1846			gopark(func(g *g, parkp unsafe.Pointer) bool {
  1847				park := (*parkInfo)(parkp)
  1848	
  1849				// The worker G is no longer running, so it's
  1850				// now safe to allow preemption.
  1851				releasem(park.m.ptr())
  1852	
  1853				// If the worker isn't attached to its P,
  1854				// attach now. During initialization and after
  1855				// a phase change, the worker may have been
  1856				// running on a different P. As soon as we
  1857				// attach, the owner P may schedule the
  1858				// worker, so this must be done after the G is
  1859				// stopped.
  1860				if park.attach != 0 {
  1861					p := park.attach.ptr()
  1862					park.attach.set(nil)
  1863					// cas the worker because we may be
  1864					// racing with a new worker starting
  1865					// on this P.
  1866					if !p.gcBgMarkWorker.cas(0, guintptr(unsafe.Pointer(g))) {
  1867						// The P got a new worker.
  1868						// Exit this worker.
  1869						return false
  1870					}
  1871				}
  1872				return true
  1873			}, unsafe.Pointer(park), waitReasonGCWorkerIdle, traceEvGoBlock, 0)
  1874	
  1875			// Loop until the P dies and disassociates this
  1876			// worker (the P may later be reused, in which case
  1877			// it will get a new worker) or we failed to associate.
  1878			if _p_.gcBgMarkWorker.ptr() != gp {
  1879				break
  1880			}
  1881	
  1882			// Disable preemption so we can use the gcw. If the
  1883			// scheduler wants to preempt us, we'll stop draining,
  1884			// dispose the gcw, and then preempt.
  1885			park.m.set(acquirem())
  1886	
  1887			if gcBlackenEnabled == 0 {
  1888				throw("gcBgMarkWorker: blackening not enabled")
  1889			}
  1890	
  1891			startTime := nanotime()
  1892			_p_.gcMarkWorkerStartTime = startTime
  1893	
  1894			decnwait := atomic.Xadd(&work.nwait, -1)
  1895			if decnwait == work.nproc {
  1896				println("runtime: work.nwait=", decnwait, "work.nproc=", work.nproc)
  1897				throw("work.nwait was > work.nproc")
  1898			}
  1899	
  1900			systemstack(func() {
  1901				// Mark our goroutine preemptible so its stack
  1902				// can be scanned. This lets two mark workers
  1903				// scan each other (otherwise, they would
  1904				// deadlock). We must not modify anything on
  1905				// the G stack. However, stack shrinking is
  1906				// disabled for mark workers, so it is safe to
  1907				// read from the G stack.
  1908				casgstatus(gp, _Grunning, _Gwaiting)
  1909				switch _p_.gcMarkWorkerMode {
  1910				default:
  1911					throw("gcBgMarkWorker: unexpected gcMarkWorkerMode")
  1912				case gcMarkWorkerDedicatedMode:
  1913					gcDrain(&_p_.gcw, gcDrainUntilPreempt|gcDrainFlushBgCredit)
  1914					if gp.preempt {
  1915						// We were preempted. This is
  1916						// a useful signal to kick
  1917						// everything out of the run
  1918						// queue so it can run
  1919						// somewhere else.
  1920						lock(&sched.lock)
  1921						for {
  1922							gp, _ := runqget(_p_)
  1923							if gp == nil {
  1924								break
  1925							}
  1926							globrunqput(gp)
  1927						}
  1928						unlock(&sched.lock)
  1929					}
  1930					// Go back to draining, this time
  1931					// without preemption.
  1932					gcDrain(&_p_.gcw, gcDrainFlushBgCredit)
  1933				case gcMarkWorkerFractionalMode:
  1934					gcDrain(&_p_.gcw, gcDrainFractional|gcDrainUntilPreempt|gcDrainFlushBgCredit)
  1935				case gcMarkWorkerIdleMode:
  1936					gcDrain(&_p_.gcw, gcDrainIdle|gcDrainUntilPreempt|gcDrainFlushBgCredit)
  1937				}
  1938				casgstatus(gp, _Gwaiting, _Grunning)
  1939			})
  1940	
  1941			// Account for time.
  1942			duration := nanotime() - startTime
  1943			switch _p_.gcMarkWorkerMode {
  1944			case gcMarkWorkerDedicatedMode:
  1945				atomic.Xaddint64(&gcController.dedicatedMarkTime, duration)
  1946				atomic.Xaddint64(&gcController.dedicatedMarkWorkersNeeded, 1)
  1947			case gcMarkWorkerFractionalMode:
  1948				atomic.Xaddint64(&gcController.fractionalMarkTime, duration)
  1949				atomic.Xaddint64(&_p_.gcFractionalMarkTime, duration)
  1950			case gcMarkWorkerIdleMode:
  1951				atomic.Xaddint64(&gcController.idleMarkTime, duration)
  1952			}
  1953	
  1954			// Was this the last worker and did we run out
  1955			// of work?
  1956			incnwait := atomic.Xadd(&work.nwait, +1)
  1957			if incnwait > work.nproc {
  1958				println("runtime: p.gcMarkWorkerMode=", _p_.gcMarkWorkerMode,
  1959					"work.nwait=", incnwait, "work.nproc=", work.nproc)
  1960				throw("work.nwait > work.nproc")
  1961			}
  1962	
  1963			// If this worker reached a background mark completion
  1964			// point, signal the main GC goroutine.
  1965			if incnwait == work.nproc && !gcMarkWorkAvailable(nil) {
  1966				// Make this G preemptible and disassociate it
  1967				// as the worker for this P so
  1968				// findRunnableGCWorker doesn't try to
  1969				// schedule it.
  1970				_p_.gcBgMarkWorker.set(nil)
  1971				releasem(park.m.ptr())
  1972	
  1973				gcMarkDone()
  1974	
  1975				// Disable preemption and prepare to reattach
  1976				// to the P.
  1977				//
  1978				// We may be running on a different P at this
  1979				// point, so we can't reattach until this G is
  1980				// parked.
  1981				park.m.set(acquirem())
  1982				park.attach.set(_p_)
  1983			}
  1984		}
  1985	}
  1986	
  1987	// gcMarkWorkAvailable reports whether executing a mark worker
  1988	// on p is potentially useful. p may be nil, in which case it only
  1989	// checks the global sources of work.
  1990	func gcMarkWorkAvailable(p *p) bool {
  1991		if p != nil && !p.gcw.empty() {
  1992			return true
  1993		}
  1994		if !work.full.empty() {
  1995			return true // global work available
  1996		}
  1997		if work.markrootNext < work.markrootJobs {
  1998			return true // root scan work available
  1999		}
  2000		return false
  2001	}
  2002	
  2003	// gcMark runs the mark (or, for concurrent GC, mark termination)
  2004	// All gcWork caches must be empty.
  2005	// STW is in effect at this point.
  2006	func gcMark(start_time int64) {
  2007		if debug.allocfreetrace > 0 {
  2008			tracegc()
  2009		}
  2010	
  2011		if gcphase != _GCmarktermination {
  2012			throw("in gcMark expecting to see gcphase as _GCmarktermination")
  2013		}
  2014		work.tstart = start_time
  2015	
  2016		// Check that there's no marking work remaining.
  2017		if work.full != 0 || work.markrootNext < work.markrootJobs {
  2018			print("runtime: full=", hex(work.full), " next=", work.markrootNext, " jobs=", work.markrootJobs, " nDataRoots=", work.nDataRoots, " nBSSRoots=", work.nBSSRoots, " nSpanRoots=", work.nSpanRoots, " nStackRoots=", work.nStackRoots, "\n")
  2019			panic("non-empty mark queue after concurrent mark")
  2020		}
  2021	
  2022		if debug.gccheckmark > 0 {
  2023			// This is expensive when there's a large number of
  2024			// Gs, so only do it if checkmark is also enabled.
  2025			gcMarkRootCheck()
  2026		}
  2027		if work.full != 0 {
  2028			throw("work.full != 0")
  2029		}
  2030	
  2031		// Clear out buffers and double-check that all gcWork caches
  2032		// are empty. This should be ensured by gcMarkDone before we
  2033		// enter mark termination.
  2034		//
  2035		// TODO: We could clear out buffers just before mark if this
  2036		// has a non-negligible impact on STW time.
  2037		for _, p := range allp {
  2038			// The write barrier may have buffered pointers since
  2039			// the gcMarkDone barrier. However, since the barrier
  2040			// ensured all reachable objects were marked, all of
  2041			// these must be pointers to black objects. Hence we
  2042			// can just discard the write barrier buffer.
  2043			if debug.gccheckmark > 0 || throwOnGCWork {
  2044				// For debugging, flush the buffer and make
  2045				// sure it really was all marked.
  2046				wbBufFlush1(p)
  2047			} else {
  2048				p.wbBuf.reset()
  2049			}
  2050	
  2051			gcw := &p.gcw
  2052			if !gcw.empty() {
  2053				printlock()
  2054				print("runtime: P ", p.id, " flushedWork ", gcw.flushedWork)
  2055				if gcw.wbuf1 == nil {
  2056					print(" wbuf1=<nil>")
  2057				} else {
  2058					print(" wbuf1.n=", gcw.wbuf1.nobj)
  2059				}
  2060				if gcw.wbuf2 == nil {
  2061					print(" wbuf2=<nil>")
  2062				} else {
  2063					print(" wbuf2.n=", gcw.wbuf2.nobj)
  2064				}
  2065				print("\n")
  2066				throw("P has cached GC work at end of mark termination")
  2067			}
  2068			// There may still be cached empty buffers, which we
  2069			// need to flush since we're going to free them. Also,
  2070			// there may be non-zero stats because we allocated
  2071			// black after the gcMarkDone barrier.
  2072			gcw.dispose()
  2073		}
  2074	
  2075		throwOnGCWork = false
  2076	
  2077		cachestats()
  2078	
  2079		// Update the marked heap stat.
  2080		memstats.heap_marked = work.bytesMarked
  2081	
  2082		// Update other GC heap size stats. This must happen after
  2083		// cachestats (which flushes local statistics to these) and
  2084		// flushallmcaches (which modifies heap_live).
  2085		memstats.heap_live = work.bytesMarked
  2086		memstats.heap_scan = uint64(gcController.scanWork)
  2087	
  2088		if trace.enabled {
  2089			traceHeapAlloc()
  2090		}
  2091	}
  2092	
  2093	// gcSweep must be called on the system stack because it acquires the heap
  2094	// lock. See mheap for details.
  2095	//go:systemstack
  2096	func gcSweep(mode gcMode) {
  2097		if gcphase != _GCoff {
  2098			throw("gcSweep being done but phase is not GCoff")
  2099		}
  2100	
  2101		lock(&mheap_.lock)
  2102		mheap_.sweepgen += 2
  2103		mheap_.sweepdone = 0
  2104		if mheap_.sweepSpans[mheap_.sweepgen/2%2].index != 0 {
  2105			// We should have drained this list during the last
  2106			// sweep phase. We certainly need to start this phase
  2107			// with an empty swept list.
  2108			throw("non-empty swept list")
  2109		}
  2110		mheap_.pagesSwept = 0
  2111		mheap_.sweepArenas = mheap_.allArenas
  2112		mheap_.reclaimIndex = 0
  2113		mheap_.reclaimCredit = 0
  2114		unlock(&mheap_.lock)
  2115	
  2116		if !_ConcurrentSweep || mode == gcForceBlockMode {
  2117			// Special case synchronous sweep.
  2118			// Record that no proportional sweeping has to happen.
  2119			lock(&mheap_.lock)
  2120			mheap_.sweepPagesPerByte = 0
  2121			unlock(&mheap_.lock)
  2122			// Sweep all spans eagerly.
  2123			for sweepone() != ^uintptr(0) {
  2124				sweep.npausesweep++
  2125			}
  2126			// Free workbufs eagerly.
  2127			prepareFreeWorkbufs()
  2128			for freeSomeWbufs(false) {
  2129			}
  2130			// All "free" events for this mark/sweep cycle have
  2131			// now happened, so we can make this profile cycle
  2132			// available immediately.
  2133			mProf_NextCycle()
  2134			mProf_Flush()
  2135			return
  2136		}
  2137	
  2138		// Background sweep.
  2139		lock(&sweep.lock)
  2140		if sweep.parked {
  2141			sweep.parked = false
  2142			ready(sweep.g, 0, true)
  2143		}
  2144		unlock(&sweep.lock)
  2145	}
  2146	
  2147	// gcResetMarkState resets global state prior to marking (concurrent
  2148	// or STW) and resets the stack scan state of all Gs.
  2149	//
  2150	// This is safe to do without the world stopped because any Gs created
  2151	// during or after this will start out in the reset state.
  2152	//
  2153	// gcResetMarkState must be called on the system stack because it acquires
  2154	// the heap lock. See mheap for details.
  2155	//
  2156	//go:systemstack
  2157	func gcResetMarkState() {
  2158		// This may be called during a concurrent phase, so make sure
  2159		// allgs doesn't change.
  2160		lock(&allglock)
  2161		for _, gp := range allgs {
  2162			gp.gcscandone = false  // set to true in gcphasework
  2163			gp.gcscanvalid = false // stack has not been scanned
  2164			gp.gcAssistBytes = 0
  2165		}
  2166		unlock(&allglock)
  2167	
  2168		// Clear page marks. This is just 1MB per 64GB of heap, so the
  2169		// time here is pretty trivial.
  2170		lock(&mheap_.lock)
  2171		arenas := mheap_.allArenas
  2172		unlock(&mheap_.lock)
  2173		for _, ai := range arenas {
  2174			ha := mheap_.arenas[ai.l1()][ai.l2()]
  2175			for i := range ha.pageMarks {
  2176				ha.pageMarks[i] = 0
  2177			}
  2178		}
  2179	
  2180		work.bytesMarked = 0
  2181		work.initialHeapLive = atomic.Load64(&memstats.heap_live)
  2182	}
  2183	
  2184	// Hooks for other packages
  2185	
  2186	var poolcleanup func()
  2187	
  2188	//go:linkname sync_runtime_registerPoolCleanup sync.runtime_registerPoolCleanup
  2189	func sync_runtime_registerPoolCleanup(f func()) {
  2190		poolcleanup = f
  2191	}
  2192	
  2193	func clearpools() {
  2194		// clear sync.Pools
  2195		if poolcleanup != nil {
  2196			poolcleanup()
  2197		}
  2198	
  2199		// Clear central sudog cache.
  2200		// Leave per-P caches alone, they have strictly bounded size.
  2201		// Disconnect cached list before dropping it on the floor,
  2202		// so that a dangling ref to one entry does not pin all of them.
  2203		lock(&sched.sudoglock)
  2204		var sg, sgnext *sudog
  2205		for sg = sched.sudogcache; sg != nil; sg = sgnext {
  2206			sgnext = sg.next
  2207			sg.next = nil
  2208		}
  2209		sched.sudogcache = nil
  2210		unlock(&sched.sudoglock)
  2211	
  2212		// Clear central defer pools.
  2213		// Leave per-P pools alone, they have strictly bounded size.
  2214		lock(&sched.deferlock)
  2215		for i := range sched.deferpool {
  2216			// disconnect cached list before dropping it on the floor,
  2217			// so that a dangling ref to one entry does not pin all of them.
  2218			var d, dlink *_defer
  2219			for d = sched.deferpool[i]; d != nil; d = dlink {
  2220				dlink = d.link
  2221				d.link = nil
  2222			}
  2223			sched.deferpool[i] = nil
  2224		}
  2225		unlock(&sched.deferlock)
  2226	}
  2227	
  2228	// Timing
  2229	
  2230	// itoaDiv formats val/(10**dec) into buf.
  2231	func itoaDiv(buf []byte, val uint64, dec int) []byte {
  2232		i := len(buf) - 1
  2233		idec := i - dec
  2234		for val >= 10 || i >= idec {
  2235			buf[i] = byte(val%10 + '0')
  2236			i--
  2237			if i == idec {
  2238				buf[i] = '.'
  2239				i--
  2240			}
  2241			val /= 10
  2242		}
  2243		buf[i] = byte(val + '0')
  2244		return buf[i:]
  2245	}
  2246	
  2247	// fmtNSAsMS nicely formats ns nanoseconds as milliseconds.
  2248	func fmtNSAsMS(buf []byte, ns uint64) []byte {
  2249		if ns >= 10e6 {
  2250			// Format as whole milliseconds.
  2251			return itoaDiv(buf, ns/1e6, 0)
  2252		}
  2253		// Format two digits of precision, with at most three decimal places.
  2254		x := ns / 1e3
  2255		if x == 0 {
  2256			buf[0] = '0'
  2257			return buf[:1]
  2258		}
  2259		dec := 3
  2260		for x >= 100 {
  2261			x /= 10
  2262			dec--
  2263		}
  2264		return itoaDiv(buf, x, dec)
  2265	}
  2266
View as plain text