## File Relationships
Whilst working on my wiki system, I found I needed a way to determine the strength of file relationships, using tags as one of the sorting signals. The need came up while I was building the "further reading / links" section of the wiki page. Since I hope to fill my wiki with hundreds of pages, I wanted to automate this, so I chose Dataview queries to display files that share certain frontmatter. It works like this: when I create a new wiki page, I am prompted to enter certain key information, e.g. a profile's profession. There is further frontmatter I complete while filling out the page (main field of study etc.).
A "simple" `dataviewjs` query later, any wiki files sharing those properties are linked in collapsible tables. This works well for now, but to keep the number of linked files digestible as the wiki grows, I need a cut-off point that decides whether a file is linked or not.
I figured the following criteria could lead to workable solutions.
### Criteria for Determining File Relationships:
#### **1. Tag Similarity (Weighted by Relative Proportion)**
- **Explanation**:
- Calculate how many tags two files share relative to their total number of tags. Files with fewer tags should be prioritized if they share a high proportion of their tags with the parent file.
- **Formula**:
- Relevance = Shared Tags ÷ (Parent Tags + Child Tags - Shared Tags)
- This gives a proportional relevance score (closer to `1` means more relevant).
- **Improvement**:
- Weight tags differently. For example, some tags (e.g., `core`, `concept`) might carry more weight than others (e.g., `miscellaneous`).
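
As a rough sketch of the formula above, the score can be computed as a Jaccard-style ratio in plain JavaScript. The function name `tagRelevance` and the `weights` map are hypothetical; with no weights every tag counts as 1, which reduces to the unweighted formula. In a `dataviewjs` block the tag lists would presumably come from something like `file.tags`, but that wiring is left out here.

```javascript
// Relevance = shared / (parent + child - shared), i.e. shared / union.
// `weights` is a hypothetical Map of tag -> weight; unlisted tags count as 1.
function tagRelevance(parentTags, childTags, weights = new Map()) {
  const parent = new Set(parentTags);
  const child = new Set(childTags);
  const weightOf = (tag) => weights.get(tag) ?? 1;

  let shared = 0;
  let union = 0;
  for (const tag of new Set([...parent, ...child])) {
    union += weightOf(tag);
    if (parent.has(tag) && child.has(tag)) shared += weightOf(tag);
  }
  // Two untagged files have nothing in common, so score them 0.
  return union === 0 ? 0 : shared / union;
}

// One shared tag out of three distinct tags, so roughly 0.33.
console.log(tagRelevance(["physics", "core"], ["physics", "history"]));
```

Because the denominator is the union of both tag sets, a file with only a few tags that mostly match the parent scores higher than a heavily tagged file sharing the same number of tags.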
#### **2. Shared Frontmatter Fields**
- **Explanation**:
- Compare YAML metadata fields (e.g., `profession`, `fos`) between files. A high number of shared fields indicates stronger relationships.
- **Improvement**:
- Assign weights to specific fields. For example:
- Shared `profession` → Higher weight (e.g., 10 points).
- Shared `tags` → Moderate weight (e.g., 5 points).
- Shared `fos` → Lower weight (e.g., 2 points).
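
A minimal sketch of such a weighted comparison, using the example point values from the list above. `FIELD_WEIGHTS`, `asList`, and `frontmatterScore` are illustrative names, and the rule "a field counts as shared if at least one value overlaps" is an assumption, not something the current query does.

```javascript
// Example weights from the list above; adjust to taste.
const FIELD_WEIGHTS = { profession: 10, tags: 5, fos: 2 };

// Frontmatter values may be a single string, a list, or missing;
// normalise them to a list of trimmed, lowercase strings.
function asList(value) {
  if (value === undefined || value === null) return [];
  const items = Array.isArray(value) ? value : [value];
  return items.map((item) => String(item).trim().toLowerCase());
}

// Adds a field's weight once if parent and child share at least one value.
function frontmatterScore(parent, child, fieldWeights = FIELD_WEIGHTS) {
  let score = 0;
  for (const [field, weight] of Object.entries(fieldWeights)) {
    const parentValues = new Set(asList(parent[field]));
    if (asList(child[field]).some((value) => parentValues.has(value))) {
      score += weight;
    }
  }
  return score;
}

const parentPage = { profession: "Physicist", fos: ["Optics"], tags: ["wiki"] };
const childPage = { profession: "physicist", fos: "Acoustics", tags: ["wiki", "draft"] };
console.log(frontmatterScore(parentPage, childPage)); // profession + tags = 15
```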
#### **3. Shared Links (Network Similarity)**
- **Explanation**:
- If two files link to the same files, they are more likely to be related. This builds a **"graph-based" relationship**.
- **Improvement**:
- Weight files with high "centrality" in the graph (i.e., files that are linked to many other related files).
- Consider not just direct links but also second-degree links (e.g., the files that each file's link targets link to in turn).
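
One way to sketch the first-degree case is to treat the vault as an adjacency map and compare outgoing links. The `graph` structure (a `Map` of file name to linked file names) is a hypothetical stand-in for whatever link data the query can collect, and the inbound-link count is only a crude proxy for centrality, not a proper graph measure.

```javascript
// `graph` is a hypothetical Map: file name -> array of file names it links to.
function sharedLinkScore(graph, fileA, fileB) {
  const outA = new Set(graph.get(fileA) ?? []);
  const outB = new Set(graph.get(fileB) ?? []);
  // A link between the two files themselves is not a shared target.
  outA.delete(fileB);
  outB.delete(fileA);

  const shared = [...outA].filter((target) => outB.has(target)).length;
  const union = new Set([...outA, ...outB]).size;
  return union === 0 ? 0 : shared / union;
}

// Counts how many distinct files link to each target.
function inboundCounts(graph) {
  const counts = new Map();
  for (const targets of graph.values()) {
    for (const target of new Set(targets)) {
      counts.set(target, (counts.get(target) ?? 0) + 1);
    }
  }
  return counts;
}

const graph = new Map([
  ["Ada", ["Analytical Engine", "London"]],
  ["Babbage", ["Analytical Engine", "Cambridge"]],
  ["Analytical Engine", []],
]);
console.log(sharedLinkScore(graph, "Ada", "Babbage")); // 1 shared target of 3
console.log(inboundCounts(graph).get("Analytical Engine")); // 2
```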
#### **4. Proximity in the Folder Structure**
- **Explanation**:
- Files within the same folder (or subfolders) are more likely to be related.
- **Improvement**:
- Assign a proximity score based on folder depth. For example:
- Same folder → 10 points.
- Immediate subfolder → 7 points.
- Shared parent folder → 5 points.
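
The point values above could be applied to folder paths roughly as follows. This is a sketch under a few assumptions: paths use `/` as separator with no trailing slash, the vault root is the empty string, and two top-level folders are not treated as sharing a parent just because both sit in the vault root. The function names are illustrative.

```javascript
// Returns the folder that contains `folder`, or "" for top-level folders.
function enclosingFolder(folder) {
  const cut = folder.lastIndexOf("/");
  return cut === -1 ? "" : folder.slice(0, cut);
}

// Scores where the child file lives relative to the parent file's folder.
function folderProximity(parentFolder, childFolder) {
  if (childFolder === parentFolder) return 10; // same folder
  if (enclosingFolder(childFolder) === parentFolder) return 7; // immediate subfolder
  const shared = enclosingFolder(parentFolder);
  // Sibling folders under a common parent (the vault root does not count).
  if (shared !== "" && shared === enclosingFolder(childFolder)) return 5;
  return 0;
}

console.log(folderProximity("wiki/people", "wiki/people")); // 10
console.log(folderProximity("wiki", "wiki/people")); // 7
console.log(folderProximity("wiki/people", "wiki/places")); // 5
```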
#### **5. Link Strength Based on Context**
- **Explanation**:
- Links in the body of a file can have different strengths depending on where they appear. For example:
- Links in a **summary section** are more important.
- Links in a **footnote** or **reference** section are less important.
- **Improvement**:
- Assign weights based on the section where the link appears.
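
A sketch of section-based weighting, assuming the raw Markdown of the parent file is available as a string and that links are Obsidian-style `[[wikilinks]]`. The section names and weight values in `SECTION_WEIGHTS` are made up for illustration; only headings whose full text matches a listed name change the weight.

```javascript
// Hypothetical weights per section heading (matched case-insensitively).
const SECTION_WEIGHTS = new Map([
  ["summary", 3],
  ["references", 0.5],
  ["footnotes", 0.5],
]);
const DEFAULT_SECTION_WEIGHT = 1;

// Returns a Map of link target -> summed weight of all its mentions.
function contextualLinkScores(markdown) {
  const scores = new Map();
  let weight = DEFAULT_SECTION_WEIGHT;
  for (const line of markdown.split(/\r?\n/)) {
    const heading = line.match(/^#{1,6}\s+(.*)$/);
    if (heading) {
      const name = heading[1].trim().toLowerCase();
      weight = SECTION_WEIGHTS.get(name) ?? DEFAULT_SECTION_WEIGHT;
      continue;
    }
    // Matches [[Target]], [[Target|alias]] and [[Target#heading]].
    for (const match of line.matchAll(/\[\[([^\]|#]+)[^\]]*\]\]/g)) {
      const target = match[1].trim();
      scores.set(target, (scores.get(target) ?? 0) + weight);
    }
  }
  return scores;
}

const note = [
  "## Summary",
  "Worked with [[Charles Babbage|Babbage]].",
  "## References",
  "See [[Mary Somerville#Works]].",
].join("\n");
console.log(contextualLinkScores(note));
```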
#### **6. Frequency of Link Appearance**
- **Explanation**:
- If a file is linked multiple times within the same parent file (e.g., repeated mentions in various sections), it's more relevant.
- **Improvement**:
- Count how often a file is mentioned in another file and weight it higher if the frequency is greater.
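
Counting mentions could look like the sketch below, again assuming `[[wikilink]]` syntax and access to the parent file's raw text. The logarithmic damping in `frequencyWeight` is my own assumption, added so that ten mentions do not count ten times as much as one; a plain count would work as well.

```javascript
// Counts wikilinks in `markdown` that point at `target` (case-insensitive).
function mentionCount(markdown, target) {
  const wanted = target.trim().toLowerCase();
  let count = 0;
  for (const match of markdown.matchAll(/\[\[([^\]|#]+)[^\]]*\]\]/g)) {
    if (match[1].trim().toLowerCase() === wanted) count += 1;
  }
  return count;
}

// Damped weight: 0 mentions -> 0, 1 -> 1, 2 -> 2, 4 -> 3, 8 -> 4.
function frequencyWeight(count) {
  return count === 0 ? 0 : 1 + Math.log2(count);
}

const text = "[[Ada]] wrote to [[Bob]]. Later [[Ada|she]] wrote again.";
console.log(mentionCount(text, "Ada")); // 2
console.log(frequencyWeight(mentionCount(text, "Ada"))); // 2
```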
#### **7. Temporal Relevance**
- **Explanation**:
- Consider the **creation** or **modification date** of files. More recently edited or created files may be more relevant, especially if you're actively working on them.
- **Improvement**:
- Apply a decay function, such as giving higher scores to files edited or created within the last X days.
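
One possible decay function is exponential decay with a half-life, sketched below. The 30-day default is a placeholder for the "X days" above, not a recommendation, and the function takes plain millisecond timestamps so it stays independent of whatever date type the query returns.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns 1 for a file modified right now and halves every `halfLifeDays`.
// `modifiedMs` and `nowMs` are millisecond timestamps (as from Date.now()).
function recencyScore(modifiedMs, nowMs = Date.now(), halfLifeDays = 30) {
  // Clamp so files with timestamps in the future do not score above 1.
  const ageDays = Math.max(0, (nowMs - modifiedMs) / MS_PER_DAY);
  return Math.pow(0.5, ageDays / halfLifeDays);
}

const now = Date.now();
console.log(recencyScore(now - 30 * MS_PER_DAY, now)); // 0.5
console.log(recencyScore(now - 60 * MS_PER_DAY, now)); // 0.25
```

Unlike a hard "within the last X days" window, this never drops a file to zero; old files simply contribute less and less.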
#### **8. File Size (Optional)**
- **Explanation**:
- Smaller, concise files might be more relevant than larger files filled with general information.
- **Improvement**:
- Penalize files with very large word counts, as they are likely broad in scope.
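
A simple form of this penalty, sketched under the assumption that the file's text is available for a word count. The 1500-word soft limit is an arbitrary illustrative value; files below it are left alone and longer files are scaled down in proportion to how far they exceed it.

```javascript
// Counts whitespace-separated words; an empty or blank text has 0 words.
function wordCount(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Multiplier in (0, 1]: 1 up to the soft limit, then softLimit / words.
function sizePenalty(words, softLimit = 1500) {
  return words <= softLimit ? 1 : softLimit / words;
}

console.log(wordCount("Smaller, concise files")); // 3
console.log(sizePenalty(3000)); // 0.5
```

The result is meant to be multiplied into another relevance score rather than used on its own.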
#### **9. Shared Keywords in Body Text**
- **Explanation**:
- Look for keywords in the text of files (not just tags or frontmatter). Files sharing key terms might be closely related.
- **Improvement**:
- Use natural language processing (NLP) to extract key terms and compute overlap.
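
Full NLP is beyond what a query block can easily do, so the sketch below swaps in a much cruder technique: take each file's most frequent non-stopword terms and measure their overlap. The stopword list is a tiny illustrative sample, the word pattern only handles unaccented English letters, and the names `topTerms` and `keywordOverlap` are hypothetical.

```javascript
// Tiny illustrative stopword list; a real one would be much longer.
const STOPWORDS = new Set([
  "the", "and", "for", "are", "was", "with", "that", "this", "from", "have",
]);

// Returns up to `limit` of the most frequent terms (3+ letters, lowercase).
function topTerms(text, limit = 10) {
  const counts = new Map();
  for (const word of text.toLowerCase().match(/[a-z][a-z'-]{2,}/g) ?? []) {
    if (STOPWORDS.has(word)) continue;
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0])) // ties: alphabetical
    .slice(0, limit)
    .map(([word]) => word);
}

// Shared top terms divided by all distinct top terms of both texts.
function keywordOverlap(textA, textB, limit = 10) {
  const a = new Set(topTerms(textA, limit));
  const b = new Set(topTerms(textB, limit));
  const shared = [...a].filter((word) => b.has(word)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 0 : shared / union;
}

console.log(keywordOverlap("quantum optics lens", "quantum optics mirror")); // 0.5
```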