Skip to content

Commit

Permalink
[clang][modules] Do not resolve HeaderFileInfo externally in `ASTWr…
Browse files Browse the repository at this point in the history
…iter` (llvm#87848)

Clang uses the `HeaderFileInfo` struct to track bits of information on
header files, which gets used throughout the compiler. We also use this
to compute the set of affecting module maps in `ASTWriter` and in the
end serialize the information into the `HEADER_SEARCH_TABLE` record of a
PCM file, allowing clients to learn about headers from the module. In
doing so, Clang asks for existing `HeaderFileInfo` for all known
`FileEntries`. Note that this asks the loaded PCM files for the
information they have on each header file in question. This seems
unnecessary: we only want to serialize information on header files that
either belong to the current module or that got included textually.
Loaded PCM files can't provide us with any useful information.

For explicit modules with lazy loading (using `-fmodule-map-file=<path>`
with `-fmodule-file=<name>=<path>`) the compiler knows about header
files listed in the module map files on the command-line. This can be a
large number.

Asking for existing `HeaderFileInfo` can trigger deserialization of
`HEADER_SEARCH_TABLE` from loaded PCM files. Keys of the on-disk hash
table consist of the header file size and modification time. However,
with explicit modules Clang zeroes out the modification time. Moreover,
if you import lots of modules, some of their header files end up having
identical sizes. This means lots of hash collisions that can only be
resolved by running the serialized filename through `FileManager` and
comparing equality of the `FileEntry`. This ends up being super
expensive, essentially re-stating lots of the transitively loaded SDK
header files.

This patch cleans up the API for getting `HeaderFileInfo` and makes sure
`ASTWriter` uses the version that doesn't ask loaded PCM files for more
information. This removes the excessive stat traffic coming from
`ASTWriter` hopefully without changing observable behavior.
  • Loading branch information
jansvoboda11 authored Apr 11, 2024
1 parent a952c12 commit 84df7a0
Show file tree
Hide file tree
Showing 4 changed files with 60 additions and 54 deletions.
2 changes: 1 addition & 1 deletion clang-tools-extra/clangd/index/SymbolCollector.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -409,7 +409,7 @@ class SymbolCollector::HeaderFileURICache {
// Framework headers are spelled as <FrameworkName/Foo.h>, not
// "path/FrameworkName.framework/Headers/Foo.h".
auto &HS = PP->getHeaderSearchInfo();
if (const auto *HFI = HS.getExistingFileInfo(*FE, /*WantExternal*/ false))
if (const auto *HFI = HS.getExistingLocalFileInfo(*FE))
if (!HFI->Framework.empty())
if (auto Spelling =
getFrameworkHeaderIncludeSpelling(*FE, HFI->Framework, HS))
Expand Down
24 changes: 13 additions & 11 deletions clang/include/clang/Lex/HeaderSearch.h
Original file line number Diff line number Diff line change
Expand Up @@ -547,14 +547,15 @@ class HeaderSearch {
/// Return whether the specified file is a normal header,
/// a system header, or a C++ friendly system header.
SrcMgr::CharacteristicKind getFileDirFlavor(FileEntryRef File) {
return (SrcMgr::CharacteristicKind)getFileInfo(File).DirInfo;
if (const HeaderFileInfo *HFI = getExistingFileInfo(File))
return (SrcMgr::CharacteristicKind)HFI->DirInfo;
return (SrcMgr::CharacteristicKind)HeaderFileInfo().DirInfo;
}

/// Mark the specified file as a "once only" file due to
/// \#pragma once.
void MarkFileIncludeOnce(FileEntryRef File) {
HeaderFileInfo &FI = getFileInfo(File);
FI.isPragmaOnce = true;
getFileInfo(File).isPragmaOnce = true;
}

/// Mark the specified file as a system header, e.g. due to
Expand Down Expand Up @@ -834,16 +835,17 @@ class HeaderSearch {

unsigned header_file_size() const { return FileInfo.size(); }

/// Return the HeaderFileInfo structure for the specified FileEntry,
/// in preparation for updating it in some way.
/// Return the HeaderFileInfo structure for the specified FileEntry, in
/// preparation for updating it in some way.
HeaderFileInfo &getFileInfo(FileEntryRef FE);

/// Return the HeaderFileInfo structure for the specified FileEntry,
/// if it has ever been filled in.
/// \param WantExternal Whether the caller wants purely-external header file
/// info (where \p External is true).
const HeaderFileInfo *getExistingFileInfo(FileEntryRef FE,
bool WantExternal = true) const;
/// Return the HeaderFileInfo structure for the specified FileEntry, if it has
/// ever been filled in (either locally or externally).
const HeaderFileInfo *getExistingFileInfo(FileEntryRef FE) const;

/// Return the headerFileInfo structure for the specified FileEntry, if it has
/// ever been filled in locally.
const HeaderFileInfo *getExistingLocalFileInfo(FileEntryRef FE) const;

SearchDirIterator search_dir_begin() { return {*this, 0}; }
SearchDirIterator search_dir_end() { return {*this, SearchDirs.size()}; }
Expand Down
77 changes: 42 additions & 35 deletions clang/lib/Lex/HeaderSearch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -946,9 +946,13 @@ OptionalFileEntryRef HeaderSearch::LookupFile(
// If we have no includer, that means we're processing a #include
// from a module build. We should treat this as a system header if we're
// building a [system] module.
bool IncluderIsSystemHeader =
Includer ? getFileInfo(*Includer).DirInfo != SrcMgr::C_User :
BuildSystemModule;
bool IncluderIsSystemHeader = [&]() {
if (!Includer)
return BuildSystemModule;
const HeaderFileInfo *HFI = getExistingFileInfo(*Includer);
assert(HFI && "includer without file info");
return HFI->DirInfo != SrcMgr::C_User;
}();
if (OptionalFileEntryRef FE = getFileAndSuggestModule(
TmpDir, IncludeLoc, IncluderAndDir.second, IncluderIsSystemHeader,
RequestingModule, SuggestedModule)) {
Expand All @@ -963,10 +967,11 @@ OptionalFileEntryRef HeaderSearch::LookupFile(
// Note that we only use one of FromHFI/ToHFI at once, due to potential
// reallocation of the underlying vector potentially making the first
// reference binding dangling.
HeaderFileInfo &FromHFI = getFileInfo(*Includer);
unsigned DirInfo = FromHFI.DirInfo;
bool IndexHeaderMapHeader = FromHFI.IndexHeaderMapHeader;
StringRef Framework = FromHFI.Framework;
const HeaderFileInfo *FromHFI = getExistingFileInfo(*Includer);
assert(FromHFI && "includer without file info");
unsigned DirInfo = FromHFI->DirInfo;
bool IndexHeaderMapHeader = FromHFI->IndexHeaderMapHeader;
StringRef Framework = FromHFI->Framework;

HeaderFileInfo &ToHFI = getFileInfo(*FE);
ToHFI.DirInfo = DirInfo;
Expand Down Expand Up @@ -1153,10 +1158,12 @@ OptionalFileEntryRef HeaderSearch::LookupFile(
// "Foo" is the name of the framework in which the including header was found.
if (!Includers.empty() && Includers.front().first && !isAngled &&
!Filename.contains('/')) {
HeaderFileInfo &IncludingHFI = getFileInfo(*Includers.front().first);
if (IncludingHFI.IndexHeaderMapHeader) {
const HeaderFileInfo *IncludingHFI =
getExistingFileInfo(*Includers.front().first);
assert(IncludingHFI && "includer without file info");
if (IncludingHFI->IndexHeaderMapHeader) {
SmallString<128> ScratchFilename;
ScratchFilename += IncludingHFI.Framework;
ScratchFilename += IncludingHFI->Framework;
ScratchFilename += '/';
ScratchFilename += Filename;

Expand Down Expand Up @@ -1286,11 +1293,11 @@ OptionalFileEntryRef HeaderSearch::LookupSubframeworkHeader(
}

// This file is a system header or C++ unfriendly if the old file is.
//
// Note that the temporary 'DirInfo' is required here, as either call to
// getFileInfo could resize the vector and we don't want to rely on order
// of evaluation.
unsigned DirInfo = getFileInfo(ContextFileEnt).DirInfo;
const HeaderFileInfo *ContextHFI = getExistingFileInfo(ContextFileEnt);
assert(ContextHFI && "context file without file info");
// Note that the temporary 'DirInfo' is required here, as the call to
// getFileInfo could resize the vector and might invalidate 'ContextHFI'.
unsigned DirInfo = ContextHFI->DirInfo;
getFileInfo(*File).DirInfo = DirInfo;

FrameworkName.pop_back(); // remove the trailing '/'
Expand Down Expand Up @@ -1348,8 +1355,6 @@ static void mergeHeaderFileInfo(HeaderFileInfo &HFI,
HFI.Framework = OtherHFI.Framework;
}

/// getFileInfo - Return the HeaderFileInfo structure for the specified
/// FileEntry.
HeaderFileInfo &HeaderSearch::getFileInfo(FileEntryRef FE) {
if (FE.getUID() >= FileInfo.size())
FileInfo.resize(FE.getUID() + 1);
Expand All @@ -1366,27 +1371,20 @@ HeaderFileInfo &HeaderSearch::getFileInfo(FileEntryRef FE) {
}

HFI->IsValid = true;
// We have local information about this header file, so it's no longer
// strictly external.
// We assume the caller has local information about this header file, so it's
// no longer strictly external.
HFI->External = false;
return *HFI;
}

const HeaderFileInfo *
HeaderSearch::getExistingFileInfo(FileEntryRef FE, bool WantExternal) const {
// If we have an external source, ensure we have the latest information.
// FIXME: Use a generation count to check whether this is really up to date.
const HeaderFileInfo *HeaderSearch::getExistingFileInfo(FileEntryRef FE) const {
HeaderFileInfo *HFI;
if (ExternalSource) {
if (FE.getUID() >= FileInfo.size()) {
if (!WantExternal)
return nullptr;
if (FE.getUID() >= FileInfo.size())
FileInfo.resize(FE.getUID() + 1);
}

HFI = &FileInfo[FE.getUID()];
if (!WantExternal && (!HFI->IsValid || HFI->External))
return nullptr;
// FIXME: Use a generation count to check whether this is really up to date.
if (!HFI->Resolved) {
auto ExternalHFI = ExternalSource->GetHeaderFileInfo(FE);
if (ExternalHFI.IsValid) {
Expand All @@ -1395,16 +1393,25 @@ HeaderSearch::getExistingFileInfo(FileEntryRef FE, bool WantExternal) const {
mergeHeaderFileInfo(*HFI, ExternalHFI);
}
}
} else if (FE.getUID() >= FileInfo.size()) {
return nullptr;
} else {
} else if (FE.getUID() < FileInfo.size()) {
HFI = &FileInfo[FE.getUID()];
} else {
HFI = nullptr;
}

if (!HFI->IsValid || (HFI->External && !WantExternal))
return nullptr;
return (HFI && HFI->IsValid) ? HFI : nullptr;
}

const HeaderFileInfo *
HeaderSearch::getExistingLocalFileInfo(FileEntryRef FE) const {
HeaderFileInfo *HFI;
if (FE.getUID() < FileInfo.size()) {
HFI = &FileInfo[FE.getUID()];
} else {
HFI = nullptr;
}

return HFI;
return (HFI && HFI->IsValid && !HFI->External) ? HFI : nullptr;
}

bool HeaderSearch::isFileMultipleIncludeGuarded(FileEntryRef File) const {
Expand Down
11 changes: 4 additions & 7 deletions clang/lib/Serialization/ASTWriter.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -185,8 +185,7 @@ GetAffectingModuleMaps(const Preprocessor &PP, Module *RootModule) {
if (!File)
continue;

const HeaderFileInfo *HFI =
HS.getExistingFileInfo(*File, /*WantExternal*/ false);
const HeaderFileInfo *HFI = HS.getExistingLocalFileInfo(*File);
if (!HFI || (HFI->isModuleHeader && !HFI->isCompilingModuleHeader))
continue;

Expand Down Expand Up @@ -2052,14 +2051,12 @@ void ASTWriter::WriteHeaderSearch(const HeaderSearch &HS) {
if (!File)
continue;

// Get the file info. This will load info from the external source if
// necessary. Skip emitting this file if we have no information on it
// as a header file (in which case HFI will be null) or if it hasn't
// Get the file info. Skip emitting this file if we have no information on
// it as a header file (in which case HFI will be null) or if it hasn't
// changed since it was loaded. Also skip it if it's for a modular header
// from a different module; in that case, we rely on the module(s)
// containing the header to provide this information.
const HeaderFileInfo *HFI =
HS.getExistingFileInfo(*File, /*WantExternal*/!Chain);
const HeaderFileInfo *HFI = HS.getExistingLocalFileInfo(*File);
if (!HFI || (HFI->isModuleHeader && !HFI->isCompilingModuleHeader))
continue;

Expand Down

0 comments on commit 84df7a0

Please sign in to comment.