This assignment analyzes the frequency of Greek letters in two physics books: “Introduction to Electrodynamics” by David J. Griffiths and “Galaxy Formation and Evolution” by Houjun Mo, Frank van den Bosch and Simon White. The first covers a foundational subject in physics, while the latter deals with a more specialized and advanced one. From here on, “Introduction to Electrodynamics” is referred to as Community A and “Galaxy Formation and Evolution” as Community B.
A random sample of 300 pages was taken from the content-bearing pages of the books and analyzed with Python code to track the number of new Greek characters encountered, as well as the total number of Greek characters present, as more and more pages were sampled. Random sampling ensures that each page has an equal chance of being selected, which helps make the sample representative of the entire text.
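
The sampling and counting step can be sketched roughly as follows. This is a minimal illustration, not the code used for the assignment: it assumes the page text has already been extracted into a list of strings, and it treats any alphabetic character whose Unicode name starts with "GREEK" as a Greek character. The names `is_greek_letter`, `accumulation_curve` and `toy_pages` are hypothetical.

```python
import random
import unicodedata


def is_greek_letter(ch):
    # Assumption: a "Greek character" is an alphabetic code point whose
    # Unicode name starts with "GREEK" (covers variants such as ϕ, ϑ, ϖ).
    return ch.isalpha() and unicodedata.name(ch, "").startswith("GREEK")


def accumulation_curve(pages, sample_size, seed=0):
    """Sample pages without replacement and track, after each page,
    (distinct Greek characters seen so far, total Greek characters so far)."""
    rng = random.Random(seed)
    sampled = rng.sample(pages, sample_size)  # every page equally likely
    seen = set()
    total = 0
    curve = []
    for text in sampled:
        greek = [ch for ch in text if is_greek_letter(ch)]
        seen.update(greek)
        total += len(greek)
        curve.append((len(seen), total))
    return curve


# Toy stand-in for extracted page text (hypothetical data).
toy_pages = ["αβ", "no greek here", "ααγ", "Δ x"]
for richness, total in accumulation_curve(toy_pages, sample_size=4, seed=1):
    print(richness, total)
```

Plotting the first value of each pair against the number of pages sampled gives a richness curve; a curve that flattens out suggests that most of the distinct characters have been found.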
#### Relative abundance vs Rank plot
Community A (“Introduction to Electrodynamics”):

Character | Frequency | Relative abundance, Pi |
---|---|---|
μ | 491 | 0.167234 |
π | 456 | 0.155313 |
θ | 438 | 0.149183 |
φ | 213 | 0.072548 |
σ | 202 | 0.068801 |
ρ | 186 | 0.063351 |
ω | 185 | 0.063011 |
τ | 122 | 0.041553 |
λ | 107 | 0.036444 |
α | 101 | 0.034401 |
δ | 87 | 0.029632 |
χ | 84 | 0.028610 |
γ | 84 | 0.028610 |
β | 71 | 0.024183 |
ν | 68 | 0.023161 |
η | 16 | 0.005450 |
κ | 14 | 0.004768 |
Δ | 7 | 0.002384 |
ψ | 3 | 0.001022 |
ε | 1 | 0.000341 |

Community B (“Galaxy Formation and Evolution”):

Character | Frequency | Relative abundance, Pi |
---|---|---|
δ | 555 | 0.083208 |
ν | 534 | 0.08006 |
ρ | 511 | 0.076612 |
γ | 389 | 0.058321 |
α | 372 | 0.055772 |
τ | 345 | 0.051724 |
Ω | 312 | 0.046777 |
λ | 295 | 0.044228 |
μ | 289 | 0.043328 |
σ | 252 | 0.037781 |
π | 251 | 0.037631 |
β | 247 | 0.037031 |
ξ | 245 | 0.036732 |
Φ | 229 | 0.034333 |
ε | 205 | 0.030735 |
Δ | 194 | 0.029085 |
Λ | 147 | 0.022039 |
Σ | 144 | 0.021589 |
φ | 142 | 0.021289 |
ω | 127 | 0.01904 |
θ | 118 | 0.017691 |
κ | 113 | 0.016942 |
ϕ | 105 | 0.015742 |
Ψ | 95 | 0.014243 |
Γ | 79 | 0.011844 |
ζ | 76 | 0.011394 |
η | 66 | 0.009895 |
ϑ | 59 | 0.008846 |
χ | 58 | 0.008696 |
Θ | 54 | 0.008096 |
Π | 31 | 0.004648 |
ϒ | 17 | 0.002549 |
ψ | 11 | 0.001649 |
ϖ | 3 | 0.00045 |
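
Here Pi is the proportion of individuals belonging to the ith species, that is, the frequency of the ith Greek character divided by the total number of Greek characters counted, and S is the total number of species (distinct Greek characters).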
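
As a rough illustration of how the relative abundance column and the ranking follow from the raw frequencies, the sketch below recomputes them for Community A. The counts are copied from the Community A table; the function name `relative_abundance` is hypothetical and this is not necessarily how the original analysis was coded.

```python
# Frequencies copied from the Community A table.
counts_a = {
    "μ": 491, "π": 456, "θ": 438, "φ": 213, "σ": 202,
    "ρ": 186, "ω": 185, "τ": 122, "λ": 107, "α": 101,
    "δ": 87, "χ": 84, "γ": 84, "β": 71, "ν": 68,
    "η": 16, "κ": 14, "Δ": 7, "ψ": 3, "ε": 1,
}


def relative_abundance(counts):
    """Return rows of (rank, character, frequency, Pi), most frequent first."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [
        (rank, ch, n, n / total)
        for rank, (ch, n) in enumerate(ranked, start=1)
    ]


for rank, ch, n, p in relative_abundance(counts_a):
    print(f"{rank:2d} {ch} {n:4d} {p:.6f}")
```

Plotting Pi against rank, typically with a logarithmic vertical axis, gives the relative abundance vs rank plot.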
Community A has fewer species than community B, which can be seen in the richness plot, where the curve for community B saturates at a higher value.
The number of species in communities A and B is 20 and 34, respectively.
Community B has higher Simpson’s and Shannon’s indices than community A, indicating a more diverse use of Greek letters in community B.
Community B also shows a higher equitability than community A, suggesting a more even distribution of the Greek letters in community B.
The character with the highest frequency is ‘μ’ in community A and ‘δ’ in community B.
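
The diversity measures mentioned above can be recomputed from the two frequency tables with a short sketch like the one below. The exact formulas used in the assignment are not stated, so the following are assumptions: Shannon's index is taken as H = -Σ Pi ln Pi, Simpson's index as the Gini-Simpson form 1 - Σ Pi², and equitability as H / ln S. The function names are hypothetical.

```python
import math

# Frequencies copied from the two tables (order does not matter here).
counts_a = [491, 456, 438, 213, 202, 186, 185, 122, 107, 101,
            87, 84, 84, 71, 68, 16, 14, 7, 3, 1]
counts_b = [555, 534, 511, 389, 372, 345, 312, 295, 289, 252, 251, 247,
            245, 229, 205, 194, 147, 144, 142, 127, 118, 113, 105, 95,
            79, 76, 66, 59, 58, 54, 31, 17, 11, 3]


def proportions(counts):
    """Relative abundances Pi, ignoring species with zero count."""
    total = sum(counts)
    return [n / total for n in counts if n > 0]


def shannon_index(counts):
    # Assumed form: H = -sum(Pi * ln Pi)
    return -sum(p * math.log(p) for p in proportions(counts))


def simpson_index(counts):
    # Assumed form (Gini-Simpson): 1 - sum(Pi^2); other conventions exist.
    return 1.0 - sum(p * p for p in proportions(counts))


def equitability(counts):
    # Assumed form: H / ln S, where S is the number of species present.
    s = sum(1 for n in counts if n > 0)
    return shannon_index(counts) / math.log(s) if s > 1 else 0.0


for name, counts in (("A", counts_a), ("B", counts_b)):
    print(name, len(counts), shannon_index(counts),
          simpson_index(counts), equitability(counts))
```

Under these assumed definitions, all three measures come out higher for community B than for community A, in line with the comparison above.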
Links to the books
Introduction to Electrodynamics
Galaxy Formation and Evolution