File Relationships - research & notes

## File Relationships

While working on my wiki system, I found I needed a way to measure how strongly two files are related, with tags as the main sorting signal. I was building the "further reading / links" section of the wiki page. Since I hope to fill the wiki with hundreds of pages, I automated this with dataview queries that display files sharing certain frontmatter.

It works like this: when I create a new wiki page, I am prompted to enter certain key information, e.g. a profile's profession. There is further frontmatter I complete when filling out the page (main field of study, etc.). One "simple" `dataviewjs` query later, any wiki files sharing those properties are linked in collapsible tables.

This works well for now, but to keep the number of linked files digestible as the wiki grows, I need a cut-off point that decides whether a file is linked or not. I figured the following criteria could lead to functional solutions.

### Criteria for Determining File Relationships:

#### **1. Tag Similarity (Weighted by Relative Proportion)**

- **Explanation**:
  - Calculate how many tags two files share relative to their total number of tags. Files with fewer tags should be prioritized if they share a high proportion of their tags with the parent file.
- **Formula**:
  - Relevance = Shared Tags ÷ (Parent Tags + Child Tags - Shared Tags)
  - This is the Jaccard index of the two tag sets. It gives a proportional relevance score between `0` and `1` (closer to `1` means more relevant).
- **Improvement**:
  - Weight tags differently. For example, some tags (e.g., `core`, `concept`) might carry more weight than others (e.g., `miscellaneous`).

#### **2. Shared Frontmatter Fields**

- **Explanation**:
  - Compare YAML metadata fields (e.g., `profession`, `fos`) between files. A high number of shared fields indicates a stronger relationship.
- **Improvement**:
  - Assign weights to specific fields. For example:
    - Shared `profession` → Higher weight (e.g., 10 points).
    - Shared `tags` → Moderate weight (e.g., 5 points).
    - Shared `fos` → Lower weight (e.g., 2 points).

#### **3. Shared Links (Network Similarity)**

- **Explanation**:
  - If two files are linked to the same files, they are more likely to be related. This builds a **"graph-based" relationship**.
- **Improvement**:
  - Weight files with high "centrality" in the graph (i.e., files that are linked to many other related files).
  - Consider not just direct links but also second-degree links (e.g., files that are linked to by both files).

#### **4. Proximity in the Folder Structure**

- **Explanation**:
  - Files within the same folder (or subfolders) are more likely to be related.
- **Improvement**:
  - Assign a proximity score based on folder depth. For example:
    - Same folder → 10 points.
    - Immediate subfolder → 7 points.
    - Shared parent folder → 5 points.

#### **5. Link Strength Based on Context**

- **Explanation**:
  - Links in the body of a file can have different strengths depending on where they appear. For example:
    - Links in a **summary section** are more important.
    - Links in a **footnote** or **reference** section are less important.
- **Improvement**:
  - Assign weights based on the section where the link appears.

#### **6. Frequency of Link Appearance**

- **Explanation**:
  - If a file is linked multiple times within the same parent file (e.g., repeated mentions in various sections), it's more relevant.
- **Improvement**:
  - Count how often a file is mentioned in another file and weight it higher if the frequency is greater.

#### **7. Temporal Relevance**

- **Explanation**:
  - Consider the **creation** or **modification date** of files. More recently edited or created files may be more relevant, especially if you're actively working on them.
- **Improvement**:
  - Apply a decay function, such as giving higher scores to files edited or created within the last X days.

#### **8. File Size (Optional)**

- **Explanation**:
  - Smaller, concise files might be more relevant than larger files filled with general information.
- **Improvement**:
  - Penalize files with very large word counts, as they are likely broad in scope.

#### **9. Shared Keywords in Body Text**

- **Explanation**:
  - Look for keywords in the text of files (not just tags or frontmatter). Files sharing key terms might be closely related.
- **Improvement**:
  - Use natural language processing (NLP) to extract key terms and compute overlap.
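
### Sketch: Scoring and Cut-off

To make the cut-off idea concrete, here is a minimal sketch of how criteria 1, 2 and 7 might be combined into one score. It is not a finished implementation. Plain objects stand in for wiki pages, so it runs outside Obsidian; in a `dataviewjs` block the same fields would have to come from the page metadata instead. All function names, `TAG_WEIGHT`, `RECENCY_WEIGHT`, the `modified` field and the 30-day half-life are assumptions made up for illustration. The point values reuse the examples from criterion 2.

```javascript
// Criterion 1: shared tags / (parent tags + child tags - shared tags).
function tagSimilarity(parentTags, childTags) {
  const a = new Set(parentTags);
  const b = new Set(childTags);
  let shared = 0;
  for (const tag of a) if (b.has(tag)) shared++;
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union; // two untagged files score 0
}

// Criterion 2: sum the weights of frontmatter fields that hold equal values.
// Tags are handled separately by tagSimilarity, so they are not listed here.
const FIELD_WEIGHTS = { profession: 10, fos: 2 };

function fieldScore(parent, child, weights = FIELD_WEIGHTS) {
  let score = 0;
  for (const [field, weight] of Object.entries(weights)) {
    if (parent[field] !== undefined && parent[field] === child[field]) {
      score += weight;
    }
  }
  return score;
}

// Criterion 7: exponential decay. The factor halves every `halfLifeDays`.
// `modified` and `now` are millisecond timestamps.
function recencyFactor(modified, now, halfLifeDays = 30) {
  const ageDays = Math.max(0, (now - modified) / 86400000);
  return Math.pow(0.5, ageDays / halfLifeDays);
}

// Hypothetical weights: a full tag match is worth 5 points (the "moderate"
// example above), a file edited just now earns 3 extra points.
const TAG_WEIGHT = 5;
const RECENCY_WEIGHT = 3;

function relevance(parent, child, now) {
  return (
    fieldScore(parent, child) +
    TAG_WEIGHT * tagSimilarity(parent.tags ?? [], child.tags ?? []) +
    RECENCY_WEIGHT * recencyFactor(child.modified, now)
  );
}

// The cut-off: keep only candidates scoring at least `cutoff`, best first.
function relatedPages(parent, candidates, cutoff, now = Date.now()) {
  return candidates
    .filter((page) => page !== parent)
    .map((page) => ({ page, score: relevance(parent, page, now) }))
    .filter((entry) => entry.score >= cutoff)
    .sort((x, y) => y.score - x.score);
}
```

With these weights the maximum score is 20, so a cut-off of around 10 would require at least a shared `profession` or a strong tag overlap plus recent activity. The other criteria (shared links, folder proximity, link context) could be added as further terms in `relevance` in the same way, and weighting individual tags would mean replacing the plain counts in `tagSimilarity` with sums of per-tag weights.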