This is a Preprint and has not been peer reviewed. This is version 4 of this Preprint.
The job shop problem is an NP-hard problem of high practical relevance, which has received and continues to receive considerable attention in the literature. Approaches to the problem are typically benchmarked on publicly available datasets containing sets of problem instances. These problem instances are usually generated by some mechanism involving randomisation of instance properties or by maximising instance difficulty, but do not explicitly address properties such as product mix. Product mix, or more generally, diversity in jobs and operations, can be highly variable across different use cases and may affect the suitability of different scheduling methods. By formalising the concept of diversity, we generate a dataset that explicitly varies this property. To this end, we measure the diversity of jobs and operations in job shop instances using the Shannon entropy and generate instances with specific values of entropy. While our interest is specifically in learning-based approaches to scheduling, the generated instances can serve as a common basis to investigate the impact of instance diversity on a wider variety of scheduling methods.
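As a rough illustration of the entropy measure mentioned above, the following minimal sketch computes the Shannon entropy (in bits) of the empirical distribution of job types in a toy instance. The representation of a job as a tuple of machine indices, the function name `shannon_entropy`, and the choice to treat identical tuples as the same job type are assumptions made for this example only; the exact definition used for the dataset may differ.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of `labels`."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


# Hypothetical toy instance: each job is a tuple of machine indices
# giving the order in which its operations visit the machines.
jobs = [(0, 1, 2), (0, 1, 2), (2, 0, 1), (1, 2, 0)]

# Two of the four jobs share a type, so the type frequencies are 1/2, 1/4, 1/4.
job_entropy = shannon_entropy(jobs)
print(job_entropy)  # 1.5
```

Under these assumptions, an instance in which all jobs are identical has entropy 0, while an instance with n equally frequent job types has entropy log2(n), so the measure grows with the diversity of the product mix.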